
Conversation

@linglp (Contributor) commented Oct 28, 2025

Problem:

Measure the upload speed of large files to Synapse.

    print(
        f"total_size_of_files_gib: {total_size_of_files_bytes / (1024 * MiB)}"
    )
    # 32 MiB chunks
Contributor Author:

Since the files I want to upload are huge, I don't think it makes sense to create them in 1 MiB chunks anymore. I changed it to 32 MiB chunks for files bigger than 5 GB. I am not sure whether that affects anything.

Member:

You are misunderstanding what this portion of the code is for. The chunk size does not mean the files are uploaded in 1 MiB chunks; it is the write size used to create the test files on disk. It has nothing to do with how the file is uploaded.
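The distinction above can be sketched: the chunk size only controls the granularity of the writes that generate a test file on disk, and the resulting file is byte-identical regardless of the chunk size used. The `create_test_file` helper below is a hypothetical illustration, not the script's actual code:

```python
import os
import tempfile

MiB = 1024 * 1024

def create_test_file(path: str, total_size_bytes: int, chunk_size: int = 32 * MiB) -> None:
    """Create a file of total_size_bytes by writing zero-filled chunks.

    chunk_size only affects how the file is generated on disk; it has no
    bearing on how the file is later uploaded.
    """
    chunk = b"\0" * chunk_size
    written = 0
    with open(path, "wb") as f:
        while written < total_size_bytes:
            to_write = min(chunk_size, total_size_bytes - written)
            f.write(chunk[:to_write])
            written += to_write

# Example: a 3 MiB file written in 1 MiB chunks
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
create_test_file(tmp.name, 3 * MiB, chunk_size=1 * MiB)
print(os.path.getsize(tmp.name) == 3 * MiB)  # True
os.unlink(tmp.name)
```

Writing in larger chunks can speed up file generation slightly (fewer syscalls), but the content produced is identical either way.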

Contributor Author:

Oh, I see. But if I am creating a file that is 100 GiB, should I create it with 32 MiB chunks, or keep it at 1 MiB chunks?

Member:

It should not matter. I used this script for a 100G file when I was benchmarking the code.

Contributor Author:

Okay, I can revert this part. Thanks for explaining.

    ) # nosec
    if delete_synapse:
    -    for child in syn.getChildren(PARENT_PROJECT, includeTypes=["folder"]):
    +    for child in syn.getChildren(PARENT_PROJECT, includeTypes=["folder", "file"]):
Contributor Author:

Hmm, I added "file" here because I am uploading all files directly to the project, so the cleanup needs to pick up files as well as folders. If that is not desired, I can revert it.
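The effect of widening `includeTypes` can be shown structurally. The stub below is illustrative only, not the real synapseclient API; it mimics how a children listing filtered by type would miss files stored directly under the project unless "file" is included:

```python
# Stub standing in for a Synapse client (illustrative; not the real synapseclient)
class StubSynapse:
    def __init__(self, children):
        self._children = children

    def getChildren(self, parent, includeTypes):
        # Yield only children whose type is listed in includeTypes
        for child in self._children.get(parent, []):
            if child["type"] in includeTypes:
                yield child

syn = StubSynapse({
    "syn123": [
        {"id": "syn1", "type": "folder"},
        {"id": "syn2", "type": "file"},   # uploaded directly under the project
    ]
})

only_folders = [c["id"] for c in syn.getChildren("syn123", includeTypes=["folder"])]
both = [c["id"] for c in syn.getChildren("syn123", includeTypes=["folder", "file"])]
print(only_folders)  # ['syn1']
print(both)          # ['syn1', 'syn2']
```

With only `["folder"]`, the file `syn2` is never returned, so a cleanup loop built on this listing would leave it behind.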

    print(f"\nTime to S3 sync: {perf_counter() - time_before_sync}")

    async def upload_multi_files_under_folder(path: str, total_files: int = 1) -> Project:
        # Create a project
Contributor Author:

The idea here is that I create all the File objects under the Project, and then store the project at the end with all of its files.
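The attach-then-store pattern described above can be sketched structurally. The `Project` and `File` classes here are hypothetical stand-ins, not the synapseclient OOP models; the point is only the shape of the flow, where children are attached first and the parent is stored once:

```python
import asyncio
from dataclasses import dataclass, field

# Hypothetical stand-ins for the client's OOP models (illustrative only)
@dataclass
class File:
    path: str

@dataclass
class Project:
    name: str
    files: list = field(default_factory=list)

    async def store_async(self) -> "Project":
        # In a real client this single call would persist the project
        # and upload every attached file; here it is simulated.
        await asyncio.sleep(0)
        return self

async def upload_multi_files_under_project(paths: list) -> Project:
    project = Project(name="benchmark-project")
    # Attach all File objects first, then store the parent once
    project.files = [File(path=p) for p in paths]
    return await project.store_async()

project = asyncio.run(upload_multi_files_under_project(["a.txt", "b.txt"]))
print(len(project.files))  # 2
```

The design choice is that the parent owns the upload: one `store_async` call at the end, rather than a separate store per file.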

Member:

This is the same as execute_walk_test_oop, just with a little less functionality.

    )


    def execute_walk_file_sequential(
Contributor Author:

@BryanFauble I modified the execute_walk_test_oop function, and I am going to test uploading files sequentially. Do you mind taking a look? I already tested it with small files to ensure that this function works and files can be uploaded.
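Sequential uploading in an async function usually means awaiting each upload before starting the next, rather than gathering them all at once. The sketch below simulates this with a fake upload (the actual benchmark code is not reproduced here), contrasting the two patterns:

```python
import asyncio
from time import perf_counter

async def fake_upload(path: str) -> str:
    # Stand-in for a real upload call (e.g. storing a file to Synapse)
    await asyncio.sleep(0.05)
    return path

async def upload_sequential(paths: list) -> list:
    results = []
    for p in paths:
        # Await each upload to finish before starting the next one
        results.append(await fake_upload(p))
    return results

async def upload_concurrent(paths: list) -> list:
    # All uploads run at once; useful as a baseline for comparison
    return await asyncio.gather(*(fake_upload(p) for p in paths))

paths = [f"file_{i}.bin" for i in range(4)]

start = perf_counter()
asyncio.run(upload_sequential(paths))
print(f"sequential: {perf_counter() - start:.2f}s")  # roughly 4 x 0.05s

start = perf_counter()
asyncio.run(upload_concurrent(paths))
print(f"concurrent: {perf_counter() - start:.2f}s")  # roughly 0.05s
```

For a throughput benchmark, the sequential variant isolates per-file upload time, which is the behavior being tested in this change.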

Member:

Yeah this looks good to me!
