
@inv-jishnu (Contributor) commented Nov 12, 2025

Description

This pull request refactors the import functionality in the Data Loader core module to use the DistributedTransactionManager instead of the DistributedStorage interface when operating in storage mode.

This update is part of the broader effort to make the Data Loader compatible with ScalarDB Cluster, which does not support DistributedStorage. By replacing storage-based operations with transaction-based APIs, the import process now follows the transactional architecture used by ScalarDB Cluster, which keeps its behavior consistent and makes future maintenance easier.

In addition to this refactor, the PR also removes unused classes and methods that were no longer relevant after the transition, and updates the corresponding unit tests to reflect the new transactional behavior.

Related issues and/or PRs

Please review this PR after the following PR is merged

Changes made

  • Replaced all usages of DistributedStorage with DistributedTransactionManager in import-related components operating in storage mode.
  • Updated DAO classes to execute put and get operations using DistributedTransactionManager.
  • Removed obsolete or unused classes and methods related to the old storage-based implementation.
  • Updated unit tests to mock or inject DistributedTransactionManager instead of DistributedStorage.
  • Cleaned up documentation and inline comments to reflect the new transaction-based design.
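For reviewers less familiar with the pattern, the following is a minimal, self-contained sketch of a DAO that routes reads and writes through a transaction manager instead of a storage handle. It is not the code in this PR: `TxManager`, `Tx`, `ImportDao`, and `InMemoryTxManager` are hypothetical simplifications. ScalarDB's real `DistributedTransactionManager` and `DistributedTransaction` work with `Get`/`Put` operation objects, return `Result`s, and throw ScalarDB-specific exceptions. The sketch also assumes each DAO call runs in its own short transaction (start, operate, commit, abort on failure); the description above does not say how transactions are scoped in the actual change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-ins. These are NOT ScalarDB's real
// DistributedTransactionManager / DistributedTransaction signatures.
interface TxManager {
  Tx start() throws Exception;
}

interface Tx {
  Optional<String> get(String key) throws Exception;

  void put(String key, String value) throws Exception;

  void commit() throws Exception;

  void abort();
}

// Hypothetical DAO: every get/put runs inside a transaction obtained from the manager.
final class ImportDao {
  private final TxManager manager;

  ImportDao(TxManager manager) {
    this.manager = manager;
  }

  Optional<String> get(String key) throws Exception {
    Tx tx = manager.start();
    try {
      Optional<String> result = tx.get(key);
      tx.commit();
      return result;
    } catch (Exception e) {
      tx.abort(); // roll back on any failure, then surface the error
      throw e;
    }
  }

  void put(String key, String value) throws Exception {
    Tx tx = manager.start();
    try {
      tx.put(key, value);
      tx.commit();
    } catch (Exception e) {
      tx.abort();
      throw e;
    }
  }
}

// In-memory fake for unit tests. It buffers writes per transaction and applies
// them on commit; it has no isolation or conflict detection.
final class InMemoryTxManager implements TxManager {
  final Map<String, String> committed = new HashMap<>();

  @Override
  public Tx start() {
    Map<String, String> writes = new HashMap<>();
    return new Tx() {
      @Override
      public Optional<String> get(String key) {
        // Reads see this transaction's own writes first, then committed data.
        if (writes.containsKey(key)) {
          return Optional.of(writes.get(key));
        }
        return Optional.ofNullable(committed.get(key));
      }

      @Override
      public void put(String key, String value) {
        writes.put(key, value);
      }

      @Override
      public void commit() {
        committed.putAll(writes);
      }

      @Override
      public void abort() {
        writes.clear();
      }
    };
  }
}
```

Because `ImportDao` depends only on the manager interface, a unit test can inject an in-memory fake like this one or a mock, which mirrors how the updated tests inject DistributedTransactionManager instead of DistributedStorage.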

Checklist

The following is a best-effort checklist. If any items in this checklist are not applicable to this PR or are dependent on other, unmerged PRs, please still mark the checkboxes after you have read and understood each item.

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes.
  • I have considered whether similar issues could occur in other products, components, or modules if this PR is for bug fixes.
  • Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
  • Tests (unit, integration, etc.) have been added for the changes.
  • My changes generate no new warnings.
  • Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

@inv-jishnu Don't forget to change the target branch once the other PRs have been merged.

Release notes

Refactor Data Loader Import to Use DistributedTransactionManager for Storage Mode and Clean Up Unused Code

@inv-jishnu self-assigned this Nov 12, 2025
@inv-jishnu marked this pull request as draft November 12, 2025 12:32
@gemini-code-assist (Contributor)

Summary of Changes

Hello @inv-jishnu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the Data Loader's core import module to enhance compatibility with ScalarDB Cluster. By transitioning from DistributedStorage to DistributedTransactionManager for storage mode operations, the import process now aligns with ScalarDB Cluster's transactional architecture, ensuring consistent behavior and improved maintainability. This change also includes the removal of outdated components and updates to relevant tests.

Highlights

  • Core Refactoring: The Data Loader's import functionality now exclusively uses DistributedTransactionManager for storage mode operations, replacing DistributedStorage to align with ScalarDB Cluster's transactional architecture.
  • DAO Updates: The ScalarDbDao methods for get and put have been updated to interact directly with DistributedTransactionManager, ensuring all data operations are transaction-based.
  • Code Cleanup: Obsolete classes ScalarDbStorageManager and ScalarDbTransactionManager have been completely removed, along with unused methods and imports, simplifying the codebase.
  • Test Alignment: Unit tests for import processors (CSV, JSON, JSON Lines) have been adjusted to mock and utilize DistributedTransactionManager instead of DistributedStorage, reflecting the new implementation.

@gemini-code-assist bot left a comment

Code Review

This pull request effectively refactors the data loader's import functionality to use DistributedTransactionManager instead of DistributedStorage for storage mode operations. The changes are clean, consistent, and significantly simplify the codebase by removing conditional logic based on ScalarDbMode and eliminating now-obsolete wrapper classes. The updates to the DAO layer and unit tests are correct and align with the goal of making the data loader compatible with ScalarDB Cluster. I have one suggestion for further cleanup to remove a now-unused field, but overall this is a solid improvement.

@ypeckstadt (Contributor)

@inv-jishnu The CI/Gradle Check is failing for this PR, and the failure does seem to be Data Loader-related. Can you check it? If we can't keep the build working with this PR on its own, then we need to group these PRs together.

@ypeckstadt requested review from a team, brfrn169, feeblefakie and komamitsu and removed request for a team November 13, 2025 06:44
@ypeckstadt requested a review from Torch3333 November 13, 2025 06:44
@brfrn169 (Collaborator)

@inv-jishnu It seems like this PR includes unrelated changes.

@inv-jishnu (Contributor, Author)

> @inv-jishnu It seems like this PR includes unrelated changes.

@brfrn169 san,
Sorry for the confusion. The base branch is not set to master but to the branch this PR was created from. I will update it.

@inv-jishnu changed the base branch from feat/data-loader/table-metadata-replace-storage to master November 13, 2025 12:22
@inv-jishnu (Contributor, Author)

@brfrn169 san,

I have updated the base branch to master, but the diff still includes some changes that belong to other PRs. Those changes will drop out once the other PRs are merged, and I will then rebase this branch onto master.

@feeblefakie left a comment

LGTM! Thank you!
