Skip to content

Conversation

@inv-jishnu
Copy link
Contributor

@inv-jishnu inv-jishnu commented Nov 11, 2025

Description

This pull request refactors the ScalarDB Data Loader export functionality to use the DistributedTransactionManager instead of the DistributedStorage interface.

Currently, only the export functionality has been migrated to use the transactional API. Other components of the Data Loader that still depend on DistributedStorage—such as certain import operations and internal utilities—will be updated in future PRs to ensure a consistent data access approach across all modules.

This migration is required for ScalarDB Cluster compatibility, since the DistributedStorage interface is not supported in clustered environments. By using DistributedTransactionManager, the export process can now operate seamlessly in both standalone and clustered ScalarDB deployments.

Related issues and/or PRs

NA

Changes made

  • Replaced all usages of DistributedStorage in export-related classes with DistributedTransactionManager.

  • Updated constructors and factory methods in ExportManager and its concrete subclasses (CsvExportManager, JsonExportManager, JsonLinesExportManager) to use DistributedTransactionManager.

  • Modified logic for reading data during export to use transaction-based reads.

  • Updated corresponding unit tests to mock or inject DistributedTransactionManager instead of DistributedStorage.

  • Cleaned up related Javadoc comments and updated descriptions to reflect the new transactional behaviour.

Checklist

The following is a best-effort checklist. If any items in this checklist are not applicable to this PR or are dependent on other, unmerged PRs, please still mark the checkboxes after you have read and understood each item.

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes.
  • I have considered whether similar issues could occur in other products, components, or modules if this PR is for bug fixes.
  • Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
  • Tests (unit, integration, etc.) have been added for the changes.
  • My changes generate no new warnings.
  • Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

There are still usage of DistributerStorage in the other sections including table metadata usage in import which will be removed in later PRs

Release notes

Replaced DistributedStorage with DistributedTransactionManager in Data Loader export

@inv-jishnu inv-jishnu self-assigned this Nov 11, 2025
@inv-jishnu inv-jishnu added the enhancement New feature or request label Nov 11, 2025
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @inv-jishnu, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request initiates a significant architectural shift within the ScalarDB Data Loader by migrating its data export capabilities to leverage the DistributedTransactionManager. This change is a foundational step towards a more unified and robust data access strategy, ensuring that export operations benefit from transactional guarantees. While currently focused solely on export, this refactoring sets the stage for subsequent updates to other Data Loader components, aiming for a consistent transactional approach throughout the system.

Highlights

  • Refactor Data Loader Export: The export functionality of the ScalarDB Data Loader has been refactored to utilize the DistributedTransactionManager interface instead of DistributedStorage.
  • Transactional Reads: The logic for reading data during export now employs transaction-based reads, aligning with the new DistributedTransactionManager usage.
  • Updated Dependencies and Constructors: All export-related classes, including ExportManager and its concrete subclasses (CsvExportManager, JsonExportManager, JsonLinesExportManager), have had their constructors and factory methods updated to accept DistributedTransactionManager.
  • Test and Documentation Updates: Corresponding unit tests have been modified to mock or inject DistributedTransactionManager, and Javadoc comments and descriptions have been updated to reflect the new transactional behavior.
  • Phased Migration: This change specifically targets the export functionality. Other parts of the Data Loader, such as import operations and internal utility classes, will be migrated in future pull requests to ensure a consistent data access pattern across all modules.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively refactors the data loader's export functionality to use DistributedTransactionManager instead of DistributedStorage, which is a good step towards unifying data access patterns. The changes are well-contained to the export logic and are consistently applied across the affected classes and their corresponding unit tests. The Javadoc comments have also been helpfully updated to reflect the new transactional behavior.

I have a couple of suggestions for improvement: one regarding error logging for better diagnostics, and another to correct a minor issue in a test setup for clarity. Overall, this is a solid refactoring.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the data loader export functionality to use DistributedTransactionManager instead of DistributedStorage. The changes are well-implemented and consistent across the modified files. I've identified a few areas for improvement, mainly related to documentation clarity, code consistency, and a minor bug in a test file. Overall, this is a solid refactoring effort.

@ypeckstadt ypeckstadt changed the title Refactor Data Loader Export to Use DistributedTransactionManager Instead of DistributedStorage Use DistributedTransactionManager instead of DistributedStorage in Data Loader export Nov 12, 2025
@ypeckstadt ypeckstadt changed the title Use DistributedTransactionManager instead of DistributedStorage in Data Loader export Replace DistributedStorage with DistributedTransactionManager in Data Loader export Nov 12, 2025
Copy link
Contributor

@ypeckstadt ypeckstadt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(apart from AI review comments ) LGTM. Thank you.

@inv-jishnu Best to add the reason why this PR is necessary in the PR description as well, not just what it does. This PR is reviewed by a lot of members so they need the context.

@ypeckstadt ypeckstadt requested review from a team, Torch3333, brfrn169, feeblefakie and komamitsu and removed request for a team November 12, 2025 01:09
Copy link
Contributor

@feeblefakie feeblefakie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

@brfrn169 brfrn169 added improvement and removed enhancement New feature or request labels Nov 12, 2025
Copy link
Contributor

@komamitsu komamitsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

Comment on lines 69 to 70
TableMetadataService metaDataService =
new TableMetadataService(storageFactory.getStorageAdmin());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also use DistributedTransactionAdmin in TableMetadataService:

Suggested change
TableMetadataService metaDataService =
new TableMetadataService(storageFactory.getStorageAdmin());
TableMetadataService metaDataService =
new TableMetadataService(transactionFactory.getTransactionAdmin());

I think we can replace all usages of the storage API with the transaction API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brfrn169 Jishnu pushed 2 other PRs for this. I still have to open them up after checking them. Would it be easier to review it all in one PR? In that case, I will ask Jishu to just merge it all into this PR.

#3146
#3147

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants