@FazeelUsmani

This PR adds a new configuration option linkcheck_ignore_case to enable case-insensitive URL and anchor checking in the linkcheck builder.

Problem

Some web servers (e.g., GitHub, certain hosting platforms) are case-insensitive and may return URLs with different casing than the original link. This causes the linkcheck builder to report false-positive redirects when the URLs differ only in case, even though they point to the same resource.
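To make the failure mode concrete, here is a minimal sketch of a strict comparison between the requested URL and the URL the server finally reports. The function name and the example.org URLs are hypothetical illustrations, not Sphinx's actual code.

```python
def is_reported_as_redirect(requested_url: str, final_url: str) -> bool:
    """Strict comparison: any textual difference counts as a redirect."""
    return requested_url != final_url


# Hypothetical URLs that differ only in letter case.
requested = "https://example.org/Docs/Guide"
final = "https://example.org/docs/guide"

# The strict check flags the pair as a redirect...
flagged = is_reported_as_redirect(requested, final)
# ...even though the two strings are equal once case is folded.
differs_only_in_case = requested.casefold() == final.casefold()
```

With a strict check like this, both values end up true: the link is flagged although only the casing changed.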

Solution

  • Added linkcheck_ignore_case boolean configuration option (default: False)
  • Modified URL comparison logic to support case-insensitive matching when enabled
  • Modified anchor comparison in AnchorCheckParser to support case-insensitive matching when enabled
  • Added comprehensive tests for both URL and anchor case-insensitive checking
  • Updated documentation in doc/usage/configuration.rst
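As a rough sketch of the behaviour described in the list above, the comparison could take a flag that mirrors the proposed option. The helper name `urls_match` is hypothetical, and the `conf.py` line assumes the option is available under the name proposed in this PR.

```python
# conf.py (assumes the option exists under the name proposed in this PR)
linkcheck_ignore_case = True


# Hypothetical helper, not the PR's actual implementation.
def urls_match(requested_url: str, final_url: str, *, ignore_case: bool = False) -> bool:
    """Return True when the two URLs should be treated as the same resource."""
    if ignore_case:
        # Fold case on both sides before comparing.
        return requested_url.casefold() == final_url.casefold()
    # Default behaviour: exact, case-sensitive comparison.
    return requested_url == final_url
```

Keeping the default at `False` preserves the existing strict behaviour for projects that do not opt in.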

@FazeelUsmani FazeelUsmani marked this pull request as draft November 7, 2025 15:08
@jayaddison
Contributor

Hi @FazeelUsmani - thank you for developing and describing this pull request.

I have a concern that enabling the option reduces the precision of other hyperlinks that are checked.

Could you explain the use case where it would be easier for a documentation project to enable this option by editing the conf.py instead of fixing the URLs/anchors in their documentation sources to use the correct casing?

@FazeelUsmani FazeelUsmani marked this pull request as ready for review November 10, 2025 08:56
@FazeelUsmani
Author

That’s a good point, @jayaddison.
This option is off by default and meant mainly for large or older docs where many links hit servers that normalise URL casing (like GitHub) or are case-insensitive (like Windows). Enabling it just filters out harmless casing-related redirects so teams can focus on real link issues instead of noise.

@jayaddison
Contributor

@FazeelUsmani got it, understood. As often happens, I had a misunderstanding to begin with - you are saying that this only affects whether case-adjusted response URLs are considered to be redirects rather than successful responses.

Let me think about this a little more; I do understand the value in this now, but am wary of (and trying to think of) any problem side-effects.

@jayaddison
Contributor

(also, thank you for the explanation)

@jayaddison
Contributor

Separately: I do think that we should probably isolate the redirect-case-sensitivity handling from the HTML anchor case-sensitivity; they seem fairly functionally different from each other to me.

@FazeelUsmani
Author

Hmm, makes sense. I can refactor this into two separate options:

  • linkcheck_ignore_case_urls: for comparing URL paths (the redirect scenario)
  • linkcheck_ignore_case_anchors: for comparing HTML anchors

This would give users more granular control. Most users would likely want linkcheck_ignore_case_urls = True (for case-insensitive servers) while keeping linkcheck_ignore_case_anchors = False (since HTML IDs are technically case-sensitive per spec). What do you say?

@AA-Turner
Member

AA-Turner commented Nov 10, 2025

Two options seem overkill for this use-case. What do browsers do de facto on case mismatches on fragment IDs?

A

@FazeelUsmani
Author

Fair point. Browsers generally treat fragment IDs as case-sensitive, though behavior can vary depending on the HTML generator. My thought was mainly to avoid false negatives in edge cases (like auto-generated anchors that normalize casing differently).
That said, I’m fine keeping it as a single option if we note the anchor behavior clearly in the docs.

@jayaddison
Contributor

I can't think of drawbacks to the redirect case-folding -- and although it's maybe slightly controversial, I wonder whether we should enable it by default.

The anchor-checking I'm less certain about. Browsers seem to navigate to anchors case-sensitively (something I also checked locally, and that is certainly the case in Firefox 140.4), so I'd be reluctant to offer that without a demonstrable use-case (again, one that can't be solved easily by fixing the source documentation).
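The case-sensitive anchor matching discussed here can be sketched with the standard library's `html.parser`. The class and function names below are hypothetical and this is not Sphinx's `AnchorCheckParser`; it only illustrates an exact comparison against `id` and `name` attribute values.

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Hypothetical parser: records whether an exact id/name match exists."""

    def __init__(self, anchor: str) -> None:
        super().__init__()
        self.anchor = anchor
        self.found = False

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            # Exact comparison: "Install" and "install" are different anchors.
            if key in ("id", "name") and value == self.anchor:
                self.found = True


def has_anchor(html: str, anchor: str) -> bool:
    """Return True if the document contains an element with that exact anchor."""
    finder = AnchorFinder(anchor)
    finder.feed(html)
    return finder.found
```

Under this sketch, a link to `#install` would not match an element whose id is `Install`, which is consistent with the browser behaviour described above.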

@FazeelUsmani
Author

That makes sense. I’ll keep it as a single linkcheck_ignore_case option limited to the URL path. Anchor checks will remain case-sensitive to align with browser behavior, and I’ll clarify this distinction in the docs so users understand the expected behavior.
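A minimal sketch of that split, assuming the path is compared with case folded while the query and fragment stay exact. The function name `same_target` and the example URLs are hypothetical, and comparing the whole network location in lower case is a simplification made here, not something the thread specifies.

```python
from urllib.parse import urlsplit


def same_target(requested_url: str, final_url: str) -> bool:
    """Compare two URLs: path case-insensitively, fragment case-sensitively."""
    req, fin = urlsplit(requested_url), urlsplit(final_url)
    return (
        req.scheme == fin.scheme                         # urlsplit lowercases the scheme
        and req.netloc.lower() == fin.netloc.lower()     # simplification: whole netloc folded
        and req.path.casefold() == fin.path.casefold()   # the case-insensitive part
        and req.query == fin.query                       # query kept exact (assumption)
        and req.fragment == fin.fragment                 # anchors stay case-sensitive
    )
```

For example, `https://example.org/Docs/Guide#Install` and `https://example.org/docs/guide#Install` would be treated as the same target, while changing the fragment to `#install` would not.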

@jayaddison
Contributor

Sounds good to me! Thanks @FazeelUsmani.

@FazeelUsmani FazeelUsmani marked this pull request as draft November 11, 2025 13:13
@FazeelUsmani FazeelUsmani force-pushed the linkcheck_case_insensitive branch 3 times, most recently from 56d6a63 to d115b1e Compare November 11, 2025 14:31
@FazeelUsmani FazeelUsmani force-pushed the linkcheck_case_insensitive branch 2 times, most recently from 90d4145 to d115b1e Compare November 11, 2025 14:55
@FazeelUsmani FazeelUsmani marked this pull request as ready for review November 11, 2025 16:15
FazeelUsmani and others added 3 commits November 12, 2025 13:16
@jayaddison
Contributor

Hi @FazeelUsmani - I received a ping about this commit: FazeelUsmani@6411c4f

Some of it looks OK, but the Claude AI seems to have (unexpectedly?) adjusted the name of the config value setting to something different again. I notice that you haven't pushed the commit here yet, so maybe it's off-topic, but I figured it'd be worth mentioning.

I'm not too keen on re-using the ignore verb in the config variable name, because it has a well-defined meaning in the linkcheck builder already (i.e. some links are effectively skipped and not checked at all).
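For context, the existing `linkcheck_ignore` option takes a list of regular expressions, and URIs that match are skipped rather than checked. A small `conf.py` fragment shows the existing meaning of "ignore"; the localhost pattern is only an illustration.

```python
# conf.py
# URIs matching any of these regular expressions are not checked at all.
linkcheck_ignore = [r"http://localhost:\d+/"]
```

A case-folding option named with the same verb could suggest that links are skipped, when they would in fact still be requested and checked.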
