-
Notifications
You must be signed in to change notification settings - Fork 40
Add support for S3 #3141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add support for S3 #3141
Conversation
|
/gemini review |
| @@ -0,0 +1,93 @@ | |||
| name: Test Object Storage Adapter | |||
|
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since AWS does not provide an S3 emulator, and other S3-compatible storage services do not support conditional deletes, we will need to run integration tests when releasing a new ScalarDB version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces S3 support for the Object Storage adapter, which is a significant feature addition. The implementation is comprehensive, leveraging the AWS CRT-based S3 client for better performance and includes new configuration options, exception types, and both unit and integration tests. My review has identified a critical issue in the delete method's logic that needs to be addressed. Additionally, I've pointed out several areas for improvement regarding code duplication, potential memory inefficiencies when handling large numbers of objects, and making exception handling more robust. Overall, this is a solid contribution, and addressing these points will enhance its correctness and maintainability.
core/src/main/java/com/scalar/db/storage/objectstorage/s3/S3Wrapper.java
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/s3/S3Wrapper.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/s3/S3Wrapper.java
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/s3/S3Wrapper.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/s3/S3Wrapper.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/s3/S3Wrapper.java
Show resolved
Hide resolved
| } else { | ||
| throw new IllegalArgumentException( | ||
| "Invalid contact points format. Expected: BLOB_URI/BUCKET_NAME"); | ||
| "Invalid contact points format. Expected: BLOB_URI/CONTAINER_NAME"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Blob Storage, a bucket is called a container.
| import com.scalar.db.storage.objectstorage.ObjectStorageConfig; | ||
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The performance-related parameters may be integrated into ObjectStorageConfig if GCS has similar options.
| .httpClientBuilder( | ||
| AwsCrtAsyncHttpClient.builder() | ||
| .maxConcurrency(config.getParallelUploadMaxParallelism())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CRT-based client is used for performance.
https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds Amazon S3 support to the Object Storage adapter, enabling ScalarDB to use S3 as a backend storage option alongside the existing Azure Blob Storage adapter.
- Implemented
S3Wrapperclass that provides S3-specific operations (get, insert, update, delete, etc.) - Added configuration, provider, and error handling classes for S3
- Refactored shared exception constructors and interface methods to support both storage types
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| S3Wrapper.java | Core S3 implementation with async client operations and error handling |
| S3Config.java | Configuration parser for S3-specific settings (region, bucket, parallelism, etc.) |
| S3Provider.java | Service provider for S3 storage |
| S3ErrorCode.java | Enum for S3 HTTP status codes |
| S3WrapperTest.java | Comprehensive unit tests for S3Wrapper |
| S3ConfigTest.java | Unit tests for S3Config |
| ObjectStorageWrapperFactory.java | Added S3 instantiation logic |
| ObjectStorageUtils.java | Added S3 config factory logic |
| ObjectStorageConfig.java | Removed getEndpoint() method (S3-specific) |
| BlobStorageConfig.java | Moved getEndpoint() from interface to implementation, fixed error message |
| PreconditionFailedException.java | Removed unused single-argument constructor |
| ObjectStorageWrapperException.java | Removed unused single-argument constructor |
| ConflictOccurredException.java | New exception class for conflict errors |
| ObjectStorageWrapperIntegrationTest.java | Fixed test signature and improved test cleanup |
| ObjectStorageEnv.java | Added storage type configuration support |
| core/build.gradle | Added AWS S3 SDK dependencies |
| DistributedStorageProvider | Registered S3Provider service |
| object-storage-adapter-check.yaml | New GitHub workflow for S3 integration tests |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| public S3Wrapper(S3Config config) { | ||
| this.client = | ||
| S3AsyncClient.builder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S3 async client is used because it supports automatic parallel transfers.
http://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/s3-async-client-multipart.html
core/src/integration-test/java/com/scalar/db/storage/objectstorage/ObjectStorageEnv.java
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/s3/S3Config.java
Outdated
Show resolved
Hide resolved
| if (cause instanceof S3Exception) { | ||
| Optional<S3Exception> s3Exception = findS3Exception(e); | ||
| if (s3Exception.isPresent()) { | ||
| if (s3Exception.get().statusCode() == S3ErrorCode.CONFLICT.get()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could happen without version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. For example, for If-None-Match, the official document said as follows:
If-None-Match - Uploads the object only if the object key name does not already exist in the specified bucket. Otherwise, Amazon S3 returns a 412 Precondition Failed error. If a conflicting operation occurs during the upload, S3 returns a 409 ConditionalRequestConflict response. On a 409 failure, retry the upload.
But it doesn't have a similar description for if-Match header and a situation without a header. The catch block is put just in case.
brfrn169
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left several comments. PTAL!
...ation-test/java/com/scalar/db/storage/objectstorage/ObjectStorageWrapperIntegrationTest.java
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/ObjectStorageUtils.java
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/ObjectStorageWrapperFactory.java
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/s3/S3Wrapper.java
Show resolved
Hide resolved
komamitsu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍
brfrn169
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you!
Torch3333
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
feeblefakie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looking good!
Left some suggestions. PTAL!
| public static final long DEFAULT_PARALLEL_UPLOAD_BLOCK_SIZE_IN_BYTES = 8 * 1024 * 1024; // 8MB | ||
| public static final int DEFAULT_PARALLEL_UPLOAD_MAX_PARALLELISM = 50; | ||
| public static final long DEFAULT_PARALLEL_UPLOAD_THRESHOLD_IN_BYTES = 8 * 1024 * 1024; // 8MB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any recommended practice for the values depending on workloads?
Or, are they just set randomly?
If there is one, it would be helpful to note it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the lack of reference.
8MB is the default value for minimumPartSizeInBytes and thresholdInBytes.
Ref: https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/multipart/MultipartConfiguration.Builder.html
For maxConcurrency, it seems that the following default value is used:
https://github.com/aws/aws-sdk-java-v2/blob/e3b5e0a44bcfe2779757610179e8fb00014a8bed/http-client-spi/src/main/java/software/amazon/awssdk/http/SdkHttpConfigurationOption.java#L151
For the Blob Storage's parallel upload options, I couldn't find the default value. So, I set the random value in the previous PR. Should we set the same default values on the Blob Storage side? I want to hear your opinion.
| import software.amazon.awssdk.services.s3.model.S3Exception; | ||
| import software.amazon.awssdk.services.s3.multipart.MultipartConfiguration; | ||
|
|
||
| public class S3Wrapper implements ObjectStorageWrapper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S3AsyncClient seems to be thread-safe, so should this be marked @ThreadSafe?
BTW, the implementation of ObjectStorageWrapper must be thread-safe, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, the implementation of ObjectStorageWrapper must be thread-safe, right?
Yes. Because each statement handler must be thread-safe, ObjectStorageWrapper must also be thread-safe. Let me mark it as thread-safe. Thank you for pointing it out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've applied it in b7825fb.
| } | ||
| if (s3Exception.get().statusCode() == S3ErrorCode.NOT_FOUND.get() | ||
| || s3Exception.get().statusCode() == S3ErrorCode.PRECONDITION_FAILED.get()) { | ||
| throw new PreconditionFailedException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason for defining Object Storage adapter-specific exceptions?
Can we just throw RetriableExecutionException here and throw ExecutionException instead of ObjectStorageWrapperException, which is simpler and more consistent with the other adapters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the decision to throw a RetriableExecutionException should be made within the class that uses ObjectStorageWrapper, such as MutateStatementHandler or ObjectStorageAdmin. In the current implementation, if a PreconditionFailedException is caught, a RetriableExecutionException is thrown, but this depends on how the update method is used. Therefore, it would be better to distinguish between PreconditionFailedException and ConflictOccurredException, and ExecutionException alone is not sufficient. Also, since ObjectStorageWrapper is an abstraction layer, I don't think it's unnatural for it to have its own defined exceptions. For example, JDBC databases basically throw SQLException.
Description
This PR adds S3 support to the Object Storage adapter.
Related issues and/or PRs
Changes made
ObjectStorageWrapperimplementation for S3.Checklist
Additional notes (optional)
N/A
Release notes
Added Amazon S3 adapter.