Introduce KeySpacePath.importData to import previously exported data #3578

ScottDugas · 2025-09-09T16:47:32Z

This introduces a new method, KeySpacePath.importData, which imports the DataInKeySpacePath items gathered by KeySpacePath.exportAllData.

The new method works when importing data exported from other clusters.

Resolves: #3573

ohadzeliger · 2025-11-12T20:59:37Z

...main/java/com/apple/foundationdb/record/provider/foundationdb/keyspace/KeySpacePathImpl.java


+    @Nonnull
+    @Override
+    public CompletableFuture<Void> importData(@Nonnull FDBRecordContext context,


Is the assumption here that resource control will be handled externally? Meaning, if the given stream of data is too large, someone else will trim it and retry? In that sense, would it make sense to provide a single import method (for a single item) that returns the future, since there seems to be no efficiency gain from working on the whole collection?

Yes, the idea is that the calling process limits dataToImport and decreases that limit in response to failures.
I hadn't really considered limiting in response to the number of items that were imported. Our existing code does have some dynamic responses (e.g. ThrottlingRetryingIterator and IndexingThrottle), so it is probably a good idea to support that here. It could do something like add the number imported to the exception, but that doesn't seem great.
There is some performance advantage to taking an iterable: toTupleAsync may have a cost if the path contains a DirectoryLayerDirectory. That cost should be minimal, but it could go away completely by moving the method to ResolvedKeySpacePath.
Unfortunately, that would put exportAllData and importData on different classes.

I can try moving this if you think it's worthwhile at this point.
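A minimal sketch of the caller-side approach described above, assuming the caller can tell whether a batch failed: the caller hands importData a bounded slice of the exported data and halves that slice after each failure. AdaptiveBatchSketch, importAll, and the tryImport predicate are hypothetical names; tryImport stands in for opening a transaction, calling importData with the batch, and committing.

```java
import java.util.List;
import java.util.function.Predicate;

public class AdaptiveBatchSketch {
    /**
     * Imports items in successive batches, halving the batch size whenever a batch fails.
     * tryImport is a hypothetical stand-in for "open a transaction, call importData with
     * this batch, commit"; it returns false when the attempt failed (for example because
     * the transaction grew too large). Returns the total number of attempts made.
     */
    public static <T> int importAll(List<T> items, int initialLimit, Predicate<List<T>> tryImport) {
        if (initialLimit < 1) {
            throw new IllegalArgumentException("initialLimit must be at least 1");
        }
        int limit = initialLimit;
        int position = 0;
        int attempts = 0;
        while (position < items.size()) {
            List<T> batch = items.subList(position, Math.min(position + limit, items.size()));
            attempts++;
            if (tryImport.test(batch)) {
                // Success: move past the imported items and keep the current limit
                position += batch.size();
            } else if (limit > 1) {
                // Failure: retry from the same position with a smaller batch
                limit = limit / 2;
            } else {
                throw new IllegalStateException("Could not import a single item at position " + position);
            }
        }
        return attempts;
    }
}
```

This sketch never grows the limit again after a success; a real caller might also raise it back up once batches start succeeding.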

I forgot/misread one other advantage: this imports all the data concurrently.
The set itself should be incredibly efficient, so the primary performance gain comes from resolving the paths of all the data items concurrently.
If a client wants to do imports one-at-a-time they can always provide dataToImport with just one element, and call it multiple times.

I feel like that is a good enough reason to stick with the current implementation, at least for now.
That being said, it may make sense to use a MapPipelinedCursor so that we only resolve X items at a time.
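To illustrate the difference between resolving everything at once and resolving only X at a time, here is a plain-Java sketch that is batched rather than truly pipelined: each batch of batchSize items is resolved concurrently, and the next batch starts only when the previous one has finished. A MapPipelinedCursor would instead keep a sliding window of in-flight work. Item, resolver, and store are hypothetical stand-ins for DataInKeySpacePath, toTupleAsync, and the transaction's set.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class BatchedImportSketch {
    /** Hypothetical stand-in for DataInKeySpacePath: a logical path plus a value. */
    public record Item(List<String> path, String value) {}

    /**
     * Imports items with at most batchSize path resolutions in flight at once.
     * resolver stands in for toTupleAsync (which may be costly for directory-layer paths);
     * store stands in for the transaction that receives the writes.
     */
    public static CompletableFuture<Void> importInBatches(List<Item> items, int batchSize,
                                                          Function<Item, CompletableFuture<String>> resolver,
                                                          Map<String, String> store) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);
        for (int start = 0; start < items.size(); start += batchSize) {
            List<Item> batch = items.subList(start, Math.min(start + batchSize, items.size()));
            // Each batch starts only after the previous batch has fully completed
            chain = chain.thenCompose(ignored -> importBatch(batch, resolver, store));
        }
        return chain;
    }

    private static CompletableFuture<Void> importBatch(List<Item> batch,
                                                       Function<Item, CompletableFuture<String>> resolver,
                                                       Map<String, String> store) {
        // Resolve every item in the batch concurrently, then write each resolved key
        CompletableFuture<?>[] futures = batch.stream()
                .map(item -> resolver.apply(item).thenAccept(key -> store.put(key, item.value())))
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(futures);
    }
}
```

Because the resolver may complete on other threads, the store passed in should be thread-safe (for example a ConcurrentHashMap).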

ohadzeliger · 2025-11-12T21:35:50Z

...main/java/com/apple/foundationdb/record/provider/foundationdb/keyspace/KeySpacePathImpl.java

+                    }
+
+                    // Store the data
+                    byte[] keyBytes = keyTuple.pack();


Should we add some fdb timer metrics for future use (imported_count)?

Yeah, I think a timer around importFuture makes sense.
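A sketch of what a timer around importFuture might record, assuming one elapsed-time metric plus an imported_count counter that only advances on success. The AtomicLong fields and the timed helper are hypothetical placeholders; a real change would presumably report through the Record Layer's existing timer classes instead.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class ImportTimingSketch {
    // Hypothetical metric holders standing in for store-timer events
    public static final AtomicLong importNanos = new AtomicLong();
    public static final AtomicLong importedCount = new AtomicLong();

    /** Runs the import, always recording elapsed time, and recording the count only on success. */
    public static <T> CompletableFuture<T> timed(Supplier<CompletableFuture<T>> importWork, int itemCount) {
        final long start = System.nanoTime();
        return importWork.get().whenComplete((result, error) -> {
            importNanos.addAndGet(System.nanoTime() - start);
            if (error == null) {
                importedCount.addAndGet(itemCount);
            }
        });
    }
}
```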

ohadzeliger · 2025-11-12T21:40:40Z

...record-layer-core/src/test/java/com/apple/foundationdb/record/test/FDBDatabaseExtension.java

+     * @return a random subset of the databases available. This may be less than {@code count} if there aren't that many
+     * databases available.
+     */
+    public List<FDBDatabase> getDatabases(int count) {


Suggested change

-    public List<FDBDatabase> getDatabases(int count) {
+    public List<FDBDatabase> getRandomDatabasesSubset(int count) {

This method may no longer make sense once I remove the databases field, but if I keep it around, I'll rename it.

ohadzeliger · 2025-11-12T21:58:31Z

...com/apple/foundationdb/record/provider/foundationdb/keyspace/KeySpacePathImportDataTest.java

+
+    @BeforeEach
+    void setUp() {
+        databases = dbExtension.getDatabases(2);


Maybe use sourceDb and targetDb, with some logic (like an assume) around whether the target DB is null?

Given that running with multiple clusters was only recently added, I felt it would be better to have these tests run re-using the same cluster, rather than have them not run if a developer doesn't have two FDB clusters.
(Also, I wrote these tests before that functionality existed).

Changing these to sourceDB and destDB fields seems reasonable enough. I think I would make them both Nonnull, with special logic to clear the data if they are the same.

It would be good to have a couple of these tests parameterized to copy within a cluster vs. across clusters. I'll make sure to do that. At the very least: copying back to where it was (clearing in between), copying to a different path, and copying with the DirectoryLayer.

Given that importData works with a DataInKeySpacePath, I actually think copying to a different path doesn't make sense, because the DataInKeySpacePath needs to be scoped to this path.

ohadzeliger · 2025-11-12T22:02:51Z

...com/apple/foundationdb/record/provider/foundationdb/keyspace/KeySpacePathImportDataTest.java

+        final List<DataInKeySpacePath> exportedData = getExportedData(sourcePath);
+
+        if (databases.size() > 1) {
+            database = databases.get(1);


Was this intended to change the class' field value?
(I guess, it is, as after the copy all operations would be running on the "target" DB, but it is a little obscure to the reader)

Yes, it was intended, but I can see how it would be confusing.
When I do the refactoring to have sourceDB and targetDB, I'll see how that shakes out. I think the only code interacting with the targetDB would be verifySingleKey so I'll make sure to check that.
One thing I especially want to make clear is that we aren't accidentally reading back from the source.
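A minimal in-memory sketch of the sourceDB/destDB copy flow discussed in this thread, with each cluster modeled as a sorted map (a hypothetical simplification; the real tests use FDBDatabase and KeySpacePath, and copyData here is not the PR's helper). When both arguments refer to the same cluster, the exported range is cleared before importing, so a successful read afterwards can only come from the imported data.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class CopyDataSketch {
    /**
     * Exports every entry under prefix from source and imports it into dest.
     * Each NavigableMap is a hypothetical in-memory stand-in for a cluster.
     */
    public static void copyData(NavigableMap<String, String> source,
                                NavigableMap<String, String> dest,
                                String prefix) {
        // "Export": keys sharing a prefix are contiguous in sorted order, starting at prefix
        Map<String, String> exported = new TreeMap<>();
        for (Map.Entry<String, String> entry : source.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break;
            }
            exported.put(entry.getKey(), entry.getValue());
        }
        if (source == dest) {
            // Same cluster: clear first so verification cannot read back the original data
            dest.keySet().removeAll(exported.keySet());
        }
        // "Import": write the exported entries into the destination
        dest.putAll(exported);
    }
}
```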

ohadzeliger · 2025-11-12T22:32:41Z

...main/java/com/apple/foundationdb/record/provider/foundationdb/keyspace/KeySpacePathImpl.java

+            List<CompletableFuture<Void>> importFutures = new ArrayList<>();
+
+            for (DataInKeySpacePath dataItem : dataToImport) {
+                CompletableFuture<Void> importFuture = dataItem.getPath().toTupleAsync(context).thenCompose(itemPathTuple -> {


I'm assuming that the directory layer implementation of Directory creates a new entry if the requested value is not found, right? Is this done in the same transaction, so that it is rolled back in case of failure? Can there be cases where multiple entries are created for the same resolved value? Do we care?

Yes, the directory layer will create any entries that don't exist, but I should ensure there are tests of that.
The DirectoryLayer creates entries in separate transactions, borrowing the read version. I believe it does this because (1) it isolates transaction conflict risk, and (2) once an entry is created, it is cached.

The general assumption is that anything that you're interacting with in the DirectoryLayer should almost always already exist, and most likely already be in the cache.

And the DirectoryLayer ensures that both the logical values (String) and resolved values (Long) are unique.

There is a test where the directory-layer entry won't exist: importDataWithDirectoryLayer is using a uuid for the string value, so when copying between clusters it won't exist on the destination.
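The following hypothetical model of one cluster's directory layer (a String-to-Long mapping that allocates a new unique value on a miss) illustrates why importing into another cluster presumably has to re-resolve each logical path on the destination rather than reuse the source's resolved key: the same string can map to different values on different clusters. It does not model the separate transactions or the caching described above; DirectoryResolverSketch is not a Record Layer class.

```java
import java.util.HashMap;
import java.util.Map;

public class DirectoryResolverSketch {
    // Hypothetical in-memory model of one cluster's directory layer: String <-> Long
    private final Map<String, Long> byName = new HashMap<>();
    private final Map<Long, String> byValue = new HashMap<>();
    private long nextValue = 1;

    /** Returns the existing value for name, creating a new unique one if it is missing. */
    public synchronized long resolve(String name) {
        Long existing = byName.get(name);
        if (existing != null) {
            return existing;
        }
        long allocated = nextValue++;
        byName.put(name, allocated);
        byValue.put(allocated, name);
        return allocated;
    }

    /** Maps a resolved value back to its logical name, or null if it is unknown. */
    public synchronized String reverseLookup(long value) {
        return byValue.get(value);
    }
}
```

For example, resolving "tenant-uuid" on a fresh source yields 1, while on a destination that has already allocated one other name it yields 2; both maps stay one-to-one, mirroring the uniqueness guarantee described above.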

ohadzeliger · 2025-11-12T22:37:34Z

...com/apple/foundationdb/record/provider/foundationdb/keyspace/KeySpacePathImportDataTest.java

+            context.commit();
+        }
+
+        copyData(root.path("company"), root.path("company"));


Will there be cases where we have to import into a different root? Say where we export from "/local/company/..." and import into "/remote/company/..."?

Yes, but it is the responsibility of the caller to convert the path.
In my draft for serialization I cover this: https://github.com/FoundationDB/fdb-record-layer/pull/3747/files#diff-9422726206a402dab506820d748c97cd0b4a1b76c5f9ed8ff0f60c50a8bad8b0R342
However, it's not really a test that makes sense at this point, because the DataInKeySpacePath should already be referring to /remote/company by the time it gets to import.
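As a sketch of that caller-side conversion, assuming a logical path can be modeled as a list of segment names (a simplification; PathRebaseSketch and rebase are hypothetical and not part of the KeySpacePath API), replacing a leading /local with /remote before import could look like this:

```java
import java.util.ArrayList;
import java.util.List;

public class PathRebaseSketch {
    /**
     * Replaces the leading fromPrefix of a logical path with toPrefix, e.g.
     * [local, company, ...] becomes [remote, company, ...].
     */
    public static List<String> rebase(List<String> path, List<String> fromPrefix, List<String> toPrefix) {
        // Reject paths that are not under the expected source root
        if (path.size() < fromPrefix.size() || !path.subList(0, fromPrefix.size()).equals(fromPrefix)) {
            throw new IllegalArgumentException(path + " does not start with " + fromPrefix);
        }
        List<String> rebased = new ArrayList<>(toPrefix);
        rebased.addAll(path.subList(fromPrefix.size(), path.size()));
        return rebased;
    }
}
```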

ohadzeliger · 2025-11-12T22:43:32Z

...com/apple/foundationdb/record/provider/foundationdb/keyspace/KeySpacePathImportDataTest.java

+    private void clearPath(final FDBDatabase database, final KeySpacePath path) {
+        try (FDBRecordContext context = database.openContext()) {
+            Transaction tr = context.ensureActive();
+            tr.clear(path.toSubspace(context).range());


This would not clear the directory layer, right?

ohadzeliger · 2025-11-12T22:59:47Z

...com/apple/foundationdb/record/provider/foundationdb/keyspace/KeySpacePathImportDataTest.java

+        final KeySpacePath dataPath = root.path("tenant").add("user_id", 999L);
+        setSingleKey(dataPath, Tuple.from("data"), Tuple.from("directory_test"));
+
+        copyData(root.path("tenant"), root.path("tenant"));


Would it make sense to add assertions for the resolved exported value and the way it was resolved by directory layer?

That is implicitly checked by verifySingleKey, which takes the dataPath. If copyData didn't resolve the path correctly, the key wouldn't be found by verifySingleKey.

ohadzeliger · 2025-11-12T23:26:14Z

...com/apple/foundationdb/record/provider/foundationdb/keyspace/KeySpacePathImportDataTest.java

+
+        verifySingleKey(dataPath, Tuple.from("item"), Tuple.from("final_value"));
+    }
+


Additional potential tests:

Large data (or any out of band error) during import

Import into partial path (no leaves in import data) + some remainders

Import where data is of the wrong type

Yes, a test of more data than can be inserted into a single transaction would make sense, but not if I move it to ResolvedKeySpacePath and just have it take a single DataInKeySpacePath.

I'm not sure what you mean by a partial path.

If by data you mean the value, there is no validation; it is not KeySpacePath's responsibility to know what is in the value. If you mean the object in the path, that should be validated above this call, and should be trustworthy by the time you get a DataInKeySpacePath. Ideally this would be validated when you create the KeySpacePath, but that is covered in the serialization work, and I explain a bit more about the situation there: https://github.com/FoundationDB/fdb-record-layer/pull/3747/files#diff-15120b2e222e6bb7c2647b670f676b719cce8602e410487604bc87e9ea30a3b0R179

ScottDugas force-pushed the keyspace-import branch from 6a071ed to abd67ac October 24, 2025 18:29

ScottDugas added 11 commits November 7, 2025 16:29

Initial pass at KeySpacePath.importData (178cd61)
Cleanup some of the tests for importing data (938d240)
Cleanup import tests (bee1806)
A little more test cleanup (44d609b)
Respond to api change on main for DataInKeySpacePath (b810497)
Extract helper for export+import (9132184)
Change the import test to use 2 clusters if available (f63ae18)
Respond to DataInKeySpacePath not having Resolved on main (after rebase) (714faf3)
Cleanup some of the export tests (c255044)
Fix minor typo (f30cb54)
Reduce some duplication in import tests (287ff3f)

ScottDugas force-pushed the keyspace-import branch from abd67ac to 287ff3f November 7, 2025 21:53

ScottDugas added 2 commits November 9, 2025 15:58

Remove unused parameter (2cac9f3)
Create better helpers to simplify test (77d85a2)

ScottDugas added the enhancement label (New feature or request) Nov 9, 2025

ScottDugas changed the title from "Keyspace import" to "Introduce KeySpacePath.importData to import previously exported data" Nov 9, 2025

Minor cleanup after self-review (afc98fb)

ScottDugas requested review from alecgrieser and ohadzeliger November 10, 2025 16:08

ScottDugas marked this pull request as ready for review November 10, 2025 16:08

ohadzeliger requested changes Nov 12, 2025


ScottDugas mentioned this pull request Nov 13, 2025 in "Improve handling of importing with DirectoryLayerDirectory #3751" (open)
