IO/Stata: clarify error when input is not a Stata dataset; add GitHub… #62957
Summary
Improve the error message raised by `pd.read_stata` when the input is not a valid Stata dataset (e.g., when users paste a GitHub `/blob` or `/tree` page URL). The new message is clearer about which Stata versions are supported and, when the input looks like a GitHub page, appends a short hint to use the Raw file URL.
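To make "the Raw file URL" concrete, here is a minimal sketch of the `/blob/` to `/raw/` rewrite that the hint asks users to do by hand. `github_blob_to_raw` is a hypothetical helper for illustration only; it is not part of this PR or of pandas, and it assumes the usual `https://github.com/<owner>/<repo>/blob/<ref>/<path>` page layout.

```python
from urllib.parse import urlparse, urlunparse


def github_blob_to_raw(url: str) -> str:
    """Hypothetical helper: turn a github.com .../blob/<ref>/<path> page URL
    into the .../raw/<ref>/<path> form that serves the file bytes."""
    parsed = urlparse(url)
    # Split path is ["", owner, repo, "blob", ref, ...file path parts].
    segments = parsed.path.split("/")
    if (
        parsed.netloc.lower() not in ("github.com", "www.github.com")
        or len(segments) < 5
        or segments[3] != "blob"
    ):
        # /tree/ URLs point at directories, so there is no single raw file to return.
        raise ValueError(f"not a GitHub /blob/ file URL: {url!r}")
    segments[3] = "raw"
    return urlunparse(parsed._replace(path="/".join(segments)))
```

The result can be passed straight to `pd.read_stata`, which accepts URL strings. The equivalent `https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>` form works as well.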
Motivation
Students frequently paste GitHub page URLs into `read_stata`, which previously surfaced a confusing “Version of given Stata file is …” error. This PR makes that failure mode actionable.
Behavior change
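Replace the confusing version/format error with a clearer message. If the input string contains `github.com` together with `/blob` or `/tree`, also append:
“Use the Raw file URL (replace `/blob/` with `/raw/` or click ‘Raw’).”
Implementation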
Wrap the `StataReader(...)` construction in `try`/`except ValueError`. If the message matches the existing “version/format” text, re-raise with the clearer message. Append the GitHub hint only when the input path is a GitHub page URL.
No I/O is performed in the new logic; this only changes the error text.
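The wrap-and-re-raise pattern above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the PR's code: `open_with_clear_error` and `_looks_like_github_page` are hypothetical names, `open_reader` stands in for the `StataReader(...)` construction so the sketch runs without pandas, and `CLEAR_MESSAGE` is placeholder wording rather than the PR's actual text. Only the matched prefix, “Version of given Stata file is”, comes from the existing error quoted above.

```python
from typing import Callable, TypeVar
from urllib.parse import urlparse

T = TypeVar("T")

# Prefix of the existing, confusing error (quoted in the Motivation section).
VERSION_ERROR_PREFIX = "Version of given Stata file is"
# Placeholder wording (assumption); the PR's real message is not reproduced here.
CLEAR_MESSAGE = "The input does not appear to be a valid Stata (.dta) dataset."
GITHUB_RAW_HINT = "Use the Raw file URL (replace /blob/ with /raw/ or click 'Raw')."


def _looks_like_github_page(path: object) -> bool:
    """Hypothetical check: True for github.com /<owner>/<repo>/(blob|tree)/... URLs."""
    if not isinstance(path, str):
        # File handles, bytes and Path objects are never GitHub page URLs.
        return False
    parsed = urlparse(path)
    segments = parsed.path.split("/")  # ["", owner, repo, "blob" | "tree", ...]
    return (
        parsed.netloc.lower() in ("github.com", "www.github.com")
        and len(segments) > 3
        and segments[3] in ("blob", "tree")
    )


def open_with_clear_error(open_reader: Callable[[object], T], path: object) -> T:
    """Call open_reader(path), replacing the version/format ValueError with a clearer one."""
    try:
        return open_reader(path)
    except ValueError as err:
        if VERSION_ERROR_PREFIX not in str(err):
            raise  # Unrelated ValueError: propagate it unchanged.
        message = CLEAR_MESSAGE
        if _looks_like_github_page(path):
            message += " " + GITHUB_RAW_HINT
        # Chain the original error so the low-level detail stays in the traceback.
        raise ValueError(message) from err
```

Because the new logic only inspects the path string and the caught exception's text, it performs no extra I/O, which matches the note above. Matching on a message prefix is brittle if that wording ever changes; the tests listed below guard against that.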
Tests
- `test_non_stata_gives_clear_message`
- `test_github_blob_hint_is_appended`
- `test_github_tree_hint_is_appended`

User-visible change?
Yes (error text), but no API change. No performance impact.
Docs
Optional; happy to add a short note to the Stata I/O docs showing the wrong form (`/blob`, `/tree`) vs. the right form (`/raw` or `raw.githubusercontent.com`) if preferred.
Checklist
`pre-commit run -a` clean locally