[SYNPY-1591] test upload speed of uploading large files to synapse #1264
base: develop
docs/scripts/uploadBenchmark.py
Outdated
print(
    f"total_size_of_files_gib: {total_size_of_files_bytes / (1024 * MiB)}"
)
# 32 MiB chunks
Since the files that I want to upload are huge, I think it no longer makes sense to upload them in 1 MiB chunks. I changed it to 32 MiB chunks for files that are bigger than 5 GB. I am not sure if that affects anything.
You are misunderstanding what this portion of the code is for. The chunk size does not mean the files are uploaded in 1 MiB chunks; it is the size of the chunks used when creating the files on disk. This has nothing to do with how the file is uploaded.
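To illustrate the distinction, here is a minimal sketch of writing a test file to disk in fixed-size chunks. It is not the code from `uploadBenchmark.py`; the names `create_file` and `chunk_size` are hypothetical and chosen only for this example. The point is that the chunk size affects how the local file is written, not its final size or how it is later uploaded.

```python
import os

MiB = 2**20


def create_file(path: str, size_bytes: int, chunk_size: int = MiB) -> None:
    """Write `size_bytes` of random data to `path`, at most `chunk_size` bytes per write.

    Hypothetical helper for illustration; not the benchmark script's implementation.
    """
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            # The last chunk may be smaller than chunk_size.
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))
            remaining -= n
```

Under these assumptions, a file written with 1 MiB chunks and one written with 32 MiB chunks have the same size on disk; only the number of write calls differs.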
Oh, I see. But if I am creating a file that is 100 GiB, should I create it in 32 MiB chunks or keep it at 1 MiB chunks?
It should not matter. I used this script for a 100G file when I was benchmarking the code.
Okay, I can revert this part. Thanks for explaining.
    )  # nosec
    if delete_synapse:
-        for child in syn.getChildren(PARENT_PROJECT, includeTypes=["folder"]):
+        for child in syn.getChildren(PARENT_PROJECT, includeTypes=["folder", "file"]):
Hmm, I added "file" here because I am uploading all files directly to a project. If this is not desired, I can revert.
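For context, a cleanup step along these lines might look like the sketch below. The helper name `delete_children` is hypothetical, and the sketch assumes that `syn.getChildren` yields dictionaries with an `"id"` key and that `syn.delete` accepts that ID; both are assumptions about the client rather than something shown in this diff.

```python
from typing import Any, Iterable, List


def delete_children(
    syn: Any,
    parent_id: str,
    include_types: Iterable[str] = ("folder", "file"),
) -> List[str]:
    """Delete the direct children of `parent_id` whose type is in `include_types`.

    Hypothetical helper. Assumes each child is a dict with an "id" key and
    that `syn.delete` accepts an entity ID.
    """
    deleted = []
    for child in syn.getChildren(parent_id, includeTypes=list(include_types)):
        syn.delete(child["id"])
        deleted.append(child["id"])
    return deleted
```

Including "file" matters only when files sit directly under the project; files inside folders would presumably be removed together with their parent folder.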
docs/scripts/uploadBenchmark.py
Outdated
    print(f"\nTime to S3 sync: {perf_counter() - time_before_sync}")


async def upload_multi_files_under_folder(path: str, total_files: int = 1) -> Project:
    # Create a project
The idea here is that I create all the File objects under the Project and then store the project at the end, with all of its files.
This is the same as execute_walk_test_oop, with a little less functionality.
    )


def execute_walk_file_sequential(
@BryanFauble I modified the execute_walk_test_oop function, and I am going to test uploading files sequentially. Do you mind taking a look? I already tested it with small files to ensure that this function works and files can be uploaded.
Yeah this looks good to me!
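As a rough sketch of what timing sequential uploads can look like, independent of the actual `execute_walk_file_sequential` implementation: the helper `time_sequential` and the `upload_one` callable below are hypothetical stand-ins, with `upload_one` representing whatever call stores a single file.

```python
from time import perf_counter
from typing import Callable, Iterable, List, Tuple


def time_sequential(
    paths: Iterable[str],
    upload_one: Callable[[str], object],
) -> Tuple[float, List[float]]:
    """Upload files one at a time and return (total_seconds, per_file_seconds).

    `upload_one` is a hypothetical callable that uploads a single path.
    """
    per_file = []
    start = perf_counter()
    for path in paths:
        t0 = perf_counter()
        upload_one(path)  # blocks until this file is done before moving on
        per_file.append(perf_counter() - t0)
    return perf_counter() - start, per_file
```

Recording per-file times alongside the total makes it easier to tell whether one very large file dominates the run.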
Problem:
Test the upload speed of large files to Synapse.