SNOW-1675422: add write pandas to asyncio #2630
Conversation
```python
len(copy_results),
sum(int(e[3]) for e in copy_results),
copy_results,
)
```
pd_writer and make_pd_writer were skipped, since pandas.DataFrame.to_sql does not support async operations anyway. We could consider using get_running_loop, or creating a new loop inside pd_writer, to run the whole async write_pandas from the sync pandas.to_sql callback - but I am not sure we should promise such an interface with that behaviour right now; it may be unstable. The overhead of spinning up a loop per call may also be noticeable.
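A minimal sketch of the sync-to-async bridge discussed above. `write_pandas_async` is a hypothetical stand-in for the real coroutine, and `sync_pd_writer` is a hypothetical wrapper name, shown only to illustrate the per-call loop start-up cost and the running-loop pitfall:

```python
import asyncio


async def write_pandas_async(df_rows):
    # Hypothetical stand-in for the async write_pandas coroutine.
    await asyncio.sleep(0)  # simulate awaiting network I/O
    return len(df_rows)


def sync_pd_writer(df_rows):
    # Bridge so a synchronous callback (e.g. pandas.to_sql's `method=`)
    # can drive the async implementation. Each call pays the cost of
    # creating and tearing down a fresh event loop - the overhead
    # concern raised above.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: asyncio.run() is safe.
        return asyncio.run(write_pandas_async(df_rows))
    # asyncio.run() raises if a loop is already running; a real
    # implementation would need another strategy (e.g. a worker thread).
    raise RuntimeError("sync_pd_writer cannot be called from a running event loop")


rows_written = sync_pd_writer([("a", 1), ("b", 2)])
```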
```python
use_scoped_temp_object: bool,
) -> None:
create_stage_sql = f"CREATE {get_temp_type_for_object(use_scoped_temp_object)} STAGE /* Python:snowflake.connector.aio._pandas_tools.write_pandas() */ identifier(?) FILE_FORMAT=(TYPE=PARQUET COMPRESSION={compression}{' BINARY_AS_TEXT=FALSE' if auto_create_table or overwrite else ''})"
params = (stage_location,)
```
Any chance we can bind the compression type? Just to avoid SQL injection.
Sure, created a Jira to fix that in the sync code as well: https://snowflakecomputing.atlassian.net/browse/SNOW-2690895
Actually, on second thought - I think it is already addressed by:

```python
compression_map = {"gzip": "auto", "snappy": "snappy"}
if compression not in compression_map.keys():
    raise ProgrammingError(
        f"Invalid compression '{compression}', only acceptable values are: {compression_map.keys()}"
    )
```

in write_pandas
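The snippet above is the allow-list pattern: a value is only interpolated into SQL text after it has been checked against a fixed mapping, so arbitrary strings never reach the statement. A self-contained sketch of the same idea, with a hypothetical helper name and ValueError standing in for the connector's ProgrammingError:

```python
# Fixed mapping of accepted user-facing names to SQL-safe tokens.
compression_map = {"gzip": "auto", "snappy": "snappy"}


def build_file_format_clause(compression: str) -> str:
    # Hypothetical helper illustrating allow-list validation before
    # string interpolation into SQL.
    if compression not in compression_map:
        raise ValueError(
            f"Invalid compression '{compression}', only acceptable values are: "
            f"{list(compression_map)}"
        )
    # Safe to interpolate: the value is now one of the known tokens.
    return f"FILE_FORMAT=(TYPE=PARQUET COMPRESSION={compression_map[compression]})"
```

This complements (rather than replaces) binding where the database supports it; literal keywords inside DDL, like a compression option, typically cannot be bound as parameters, which is why validation is used here.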
```python
)

if create_temp_table:
    warnings.warn(
```
Should we consider removing deprecation warnings from the aio code? The asyncio version is a new API, and we can take advantage of that.
Good catch - I think we can make that decision and remove it here.
Force-pushed from ef96cc1 to 6109387
sfc-gh-turbaszek left a comment:
LGTM 🚀
SNOW-1675422-implement-write-pandas-in-async-io-version
Please answer these questions before submitting your pull requests. Thanks!
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes #NNNN
Fill out the following pre-review checklist:
Please describe how your code solves the related issue: write a short description of how your code change solves the related issue.
(Optional) PR for stored-proc connector: