@oosei25 commented Nov 2, 2025

Summary
Improve the error message raised by pd.read_stata when the input is not a
valid Stata dataset (e.g., users paste a GitHub /blob or /tree URL).
The new message is clearer about supported Stata versions and, when the input
looks like a GitHub page, appends a short hint to use the Raw file URL.

Motivation
Students frequently paste GitHub page URLs into read_stata, which previously
surfaced a confusing “Version of given Stata file is …” error. This PR makes
that failure mode actionable.

Behavior change

  • Before: generic version/format error from the parser.
  • After: explicit “not a valid Stata dataset … pandas supports versions …”
    message; if the string contains github.com with /blob or /tree, add:
    “Use the Raw file URL (replace /blob/ with /raw/ or click ‘Raw’).”

Implementation
Wrap the StataReader(...) construction in try/except ValueError. If the
message matches the existing “version/format” text, re-raise with the clearer
message. Append the GitHub hint only when the input path is a GitHub page URL.
No I/O is performed in the new logic; this only changes the error text.
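
A minimal sketch of the wrapping described above, not the actual diff. The names `open_with_clear_error`, `_is_github_page`, `GITHUB_HINT`, and the `reader_factory` parameter (standing in for `StataReader`) are hypothetical. The real message also lists the supported Stata versions, which this sketch leaves out.

```python
# Hint text quoted from the "Behavior change" section of this PR.
GITHUB_HINT = "Use the Raw file URL (replace /blob/ with /raw/ or click 'Raw')."


def _is_github_page(path):
    """Return True for strings that look like a GitHub /blob or /tree page URL."""
    return (
        isinstance(path, str)
        and "github.com" in path
        and ("/blob" in path or "/tree" in path)
    )


def open_with_clear_error(path, reader_factory):
    """Call reader_factory(path), rewording the version/format ValueError.

    reader_factory is a hypothetical stand-in for StataReader.
    """
    try:
        return reader_factory(path)
    except ValueError as err:
        # Only reword the existing version/format failure; let others through.
        if "Version of given Stata file" not in str(err):
            raise
        msg = "Input is not a valid Stata dataset."
        if _is_github_page(path):
            msg = f"{msg} {GITHUB_HINT}"
        # Pure string handling: no I/O happens on this path.
        raise ValueError(msg) from err
```

Chaining with `from err` keeps the original parser error in the traceback, so the clearer text does not hide the underlying cause.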

Tests

  • test_non_stata_gives_clear_message
  • test_github_blob_hint_is_appended
  • test_github_tree_hint_is_appended

Tests monkeypatch pandas.io.stata.StataReader to raise the original parsing
ValueError so there is no network dependency. All pass locally.
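
The patching approach can be sketched as follows. This is a self-contained illustration, not the PR's test code: `fake_stata` is a hypothetical stand-in for `pandas.io.stata`, its `read_stata` is a simplified version of the wrapped function, and `unittest.mock` is used here in place of pytest's `monkeypatch`.

```python
import types
from unittest import mock

# Hypothetical stand-in namespace for pandas.io.stata.
fake_stata = types.SimpleNamespace()


class StataReader:
    """Placeholder reader; the real one would open and parse the path."""

    def __init__(self, path):
        raise NotImplementedError("the real reader would perform I/O here")


def read_stata(path):
    """Simplified wrapper that rewords the version/format error."""
    try:
        # Looked up at call time so the patched class is the one used.
        return fake_stata.StataReader(path)
    except ValueError as err:
        if "Version of given Stata file" not in str(err):
            raise
        msg = "Input is not a valid Stata dataset."
        if "github.com" in path and ("/blob" in path or "/tree" in path):
            msg += " Use the Raw file URL (replace /blob/ with /raw/ or click 'Raw')."
        raise ValueError(msg) from err


fake_stata.StataReader = StataReader
fake_stata.read_stata = read_stata


def test_github_blob_hint_is_appended():
    url = "https://github.com/user/repo/blob/main/data.dta"
    original = ValueError("Version of given Stata file is ...")
    # Replace the reader so it raises the parser error without any network access.
    with mock.patch.object(fake_stata, "StataReader", side_effect=original):
        try:
            fake_stata.read_stata(url)
        except ValueError as err:
            assert "Raw file URL" in str(err)
        else:
            raise AssertionError("expected ValueError")
```

Because the reader is replaced before it is called, the test exercises only the message-rewriting branch and never touches the URL.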

User-visible change?
Yes (error text), but no API change. No performance impact.

Docs
Optional; happy to add a short note in the Stata I/O docs showing wrong
(/blob/ or /tree/) vs right (/raw/ or raw.githubusercontent.com) URLs if preferred.
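
As an illustration of what such a note might show, here is a small sketch that turns a GitHub page URL into its Raw form. The helper name `to_raw_url` and the example repository path are hypothetical; the helper only applies the /blob/ to /raw/ replacement mentioned in the hint.

```python
def to_raw_url(url: str) -> str:
    """Replace the first /blob/ segment of a GitHub page URL with /raw/."""
    return url.replace("/blob/", "/raw/", 1)


# Wrong: this is an HTML page, not a Stata file.
page_url = "https://github.com/user/repo/blob/main/data.dta"

# Right: this form serves the file contents themselves.
raw_url = to_raw_url(page_url)
print(raw_url)  # https://github.com/user/repo/raw/main/data.dta
```

A /tree/ URL points at a directory listing, so there is no Raw equivalent to rewrite it to; the user needs to open the file first and use its Raw link.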

Checklist

  • Tests added and passing locally
  • pre-commit run -a clean locally
  • No API changes; only clearer error text
  • (optional) Add small docs note if maintainers want it

@oosei25 closed this Nov 3, 2025
@oosei25 deleted the fix/read-stata-github-blob-warning branch November 3, 2025 01:45