DOC: Add floating point precision on writing/reading to csv (#13159) … #62770
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -1671,6 +1671,36 @@ function takes a number of arguments. Only the first is required. | |||||
| * ``chunksize``: Number of rows to write at a time | ||||||
| * ``date_format``: Format string for datetime objects | ||||||
|
|
||||||
| Floating Point Precision on Writing and Reading to CSV Files | ||||||
| +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | ||||||
|
|
||||||
| Floating point inaccuracies can appear when writing to and reading from CSV files because of how numeric data is represented as text and parsed back by pandas. | ||||||
|
Member
I think the content in its current state talks about implementation details and is a bit harsh on pandas, to the extent that I think it's missing the larger point that floating point values are by nature not exact. Taking a step back - what is the overall goal that this documentation is trying to achieve?
Author
Hi Will, thank you for the feedback. The overall goal is to explain that, because of how computer arithmetic works, which is outside of our control, floating point numbers are not always stored or returned exactly. My intent with this doc addition is to show that floating point numbers cannot always be stored precisely, and that differences can arise when values are converted to text and later read back. To help with this, pandas provides the float_format parameter (for writing) and the float_precision="round_trip" option (for reading), which help preserve precision when writing to and reading from CSV, so that values come back just as they were.
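The point discussed in this thread can be illustrated without pandas at all. The following is a minimal sketch in plain Python, using the same value as the example in the diff; the variable names are illustrative only. It shows that binary floats are inexact by nature, that a short format string discards digits, and that 17 significant digits are enough to recover a 64-bit float from its text form.

```python
# Binary floating point cannot represent most decimal fractions exactly,
# so even simple arithmetic can differ from the decimal result.
sum_is_exact = (0.1 + 0.2) == 0.3

x = 18292498239.824

# Formatting with only 6 significant digits discards information.
short_text = "%.6g" % x
# Formatting with 17 significant digits keeps enough to rebuild the float.
full_text = "%.17g" % x

short_round_trip = float(short_text) == x
full_round_trip = float(full_text) == x
# Python's repr() also produces a string that converts back to the same float.
repr_round_trip = float(repr(x)) == x

print(sum_is_exact, short_round_trip, full_round_trip, repr_round_trip)
```

The same reasoning applies to CSV files: the text written to disk determines how much of the original value a reader can recover.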
||||||
| During the write process, pandas converts all the numeric values into text that is stored as bytes in the CSV file. However, when we read the CSV back, pandas parses those | ||||||
|
Member
Suggested change
|
||||||
| text values and converts them back into different types (floats, integers, strings), which is when floating point precision can be lost. | ||||||
| The conversion is not guaranteed to be exact: small differences in representation between the original and the reloaded DataFrame can occur, leading to precision loss. | ||||||
|
|
||||||
| * ``float_format``: Format string for floating point numbers | ||||||
|
|
||||||
| ``df.to_csv('file.csv', float_format='%.17g')`` sets the precision used when writing floating point values to the CSV file. In this example, each value is written with 17 significant digits, which is enough to uniquely identify any 64-bit float. | ||||||
|
|
||||||
| ``df = pd.read_csv('file.csv', float_precision='round_trip')`` selects the converter used for floating point values when reading from the CSV file. The ``'round_trip'`` converter is guaranteed to round-trip values after they have been written to a file, so pandas reads the numbers back without losing or changing digits. | ||||||
|
Member
Suggested change
|
||||||
|
|
||||||
| .. ipython:: python | ||||||
|
|
||||||
| from io import StringIO | ||||||
|
|
||||||
| x0 = 18292498239.824 | ||||||
| df1 = pd.DataFrame({'One': [x0]}, index=["bignum"]) | ||||||
|
|
||||||
| csv_string = df1.to_csv(float_format='%.17g') | ||||||
| df2 = pd.read_csv(StringIO(csv_string), index_col=0, float_precision='round_trip') | ||||||
|
|
||||||
| x1 = df1.iloc[0, 0] | ||||||
| x2 = df2.iloc[0, 0] | ||||||
|
|
||||||
| print(f"x0 = {x0}; x1 = {x1}; Are they equal? {x0 == x1}") | ||||||
| print(f"x0 = {x0}; x2 = {x2}; Are they equal? {x0 == x2}") | ||||||
|
|
||||||
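As a complement to the example in the diff, the sketch below contrasts a lossy format string with the 17-digit one. It is an illustration only and assumes pandas is installed; the ``'%.6g'`` format and the variable names are chosen for this sketch and do not come from the pull request.

```python
from io import StringIO

import pandas as pd

x0 = 18292498239.824
df = pd.DataFrame({"One": [x0]}, index=["bignum"])

# Write the same frame twice: once with too few digits, once with 17.
lossy_csv = df.to_csv(float_format="%.6g")
exact_csv = df.to_csv(float_format="%.17g")

# Read both back with the round-trip converter.
lossy = pd.read_csv(StringIO(lossy_csv), index_col=0, float_precision="round_trip")
exact = pd.read_csv(StringIO(exact_csv), index_col=0, float_precision="round_trip")

# The reader cannot restore digits that were never written.
lossy_equal = bool(lossy.iloc[0, 0] == x0)
exact_equal = bool(exact.iloc[0, 0] == x0)

print(f"'%.6g' round-trips: {lossy_equal}; '%.17g' round-trips: {exact_equal}")
```

The two options work together: ``float_format`` controls how much information reaches the file, and ``float_precision`` controls how faithfully that text is parsed.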
| Writing a formatted string | ||||||
| ++++++++++++++++++++++++++ | ||||||
|
|
||||||
|
|
||||||