Conversation


@AlphaBs AlphaBs commented Oct 22, 2025

Description

This PR adds an optional feature that resolves relative Markdown links (e.g., ../assets/image.png or ./another-doc.md) to absolute URLs during conversion.

Motivation

This new option ensures link integrity by converting all relative paths to absolute URLs (e.g., <base_url>/assets/image.png), making them accessible from anywhere and preserving the correct context for the LLM.

Changes

New Config Options:

  • absolute_link: bool: Added a new option, defaulting to False for backward compatibility. When set to True, the absolute path conversion is activated.

  • index_file_name: str: Added a new option, defaulting to index.md. This is used to correctly resolve directory links (e.g., [link](./docs/)) to their corresponding index file (e.g., [link](<base_url>/docs/index.md)).

Conversion Logic:

The logic runs before the final HTML is converted to Markdown. It parses the HTML, finds all <a> tags, and reads their href attributes. If absolute_link is true, it converts relative paths into absolute URLs using the base_url and the page's directory.

Example

mkdocs.yml

plugins:
  - llmstxt:
      full_output: llms-full.txt
      base_url: https://example.com
      absolute_link: true

docs/page.md

- [Another Page](./another-page.md)
- [Image](../assets/image.png)
- [Docs Section](./section/)
- [Google](https://google.com)
- [Anchor](#title)

llms.txt

- [Another Page](https://example.com/docs/another-page.md)
- [Image](https://example.com/assets/image.png)
- [Docs Section](https://example.com/docs/section/index.md)
- [Google](https://google.com)
- [Anchor](#title)
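
The resolution shown in this example can be sketched with the standard library alone. This is a minimal sketch, not the PR's actual code: the helper name `to_absolute` and its `page_dir` parameter are hypothetical, and finding the <a> tags in the HTML is left out. It assumes the page lives in `docs/` and that `base_url` is `https://example.com`, as in the example.

```python
from urllib.parse import urljoin, urlsplit


def to_absolute(href: str, base_url: str, page_dir: str, index_file_name: str = "index.md") -> str:
    """Resolve a relative href against base_url and the page's directory (hypothetical helper)."""
    parts = urlsplit(href)
    # Leave external links (scheme or host present) and pure anchors untouched.
    if parts.scheme or parts.netloc or not parts.path:
        return href
    # URL of the directory containing the page; the trailing slash makes
    # urljoin treat it as a directory rather than as a file.
    page_dir_url = "/".join(p for p in (base_url.rstrip("/"), page_dir.strip("/")) if p) + "/"
    resolved = urlsplit(urljoin(page_dir_url, href))
    # Directory links point at that directory's index file.
    if resolved.path.endswith("/"):
        resolved = resolved._replace(path=resolved.path + index_file_name)
    return resolved.geturl()


for link in ["./another-page.md", "../assets/image.png", "./section/", "https://google.com", "#title"]:
    print(link, "->", to_absolute(link, "https://example.com", "docs"))
```

`urljoin` takes care of the `./` and `../` segments, so the only custom steps are building the directory URL and appending the index file name to links that end in a slash.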

- Introduced `index_file_name` option in the plugin configuration with a default value of "index.md".
- Refactored link handling in the Markdown generation process to convert relative links to absolute URLs.
- Introduced `absolute_link` option in the plugin configuration to control the conversion of relative links to absolute URLs.
Owner

pawamoy commented Oct 25, 2025

Thanks a lot for the PR @AlphaBs!

Don't we always want to output absolute links? Is the option really useful here, if the "backward-compatible" behavior is actually a bug?

Also, I'm not sure I understand the purpose of the index_file_name option. Aren't index files always named index.md?

Author

AlphaBs commented Oct 25, 2025

Thanks for the feedback!

  • For my use case, I always need absolute links. However, I set the default to false because I wasn't sure about other users' needs, even though I personally believe the default should be true. I'm happy to go with your opinion on this. I've never personally needed relative links, so I can't judge if the previous behavior was a required feature for someone else.

  • I created the index_file_name option because I wasn't completely sure how mkdocs works internally. I added it just in case there was a configuration in mkdocs that might change the output filename (e.g., from README.md to something other than index.md). If mkdocs always converts README.md to index.md, then you are correct, and this option is unnecessary.

Owner

pawamoy commented Oct 25, 2025

Let's go with always rendering absolute links then!

> I created the index_file_name option because I wasn't completely sure how mkdocs works internally. I added it just in case there was a configuration in mkdocs that might change the output filename (e.g., from README.md to something other than index.md). If mkdocs always converts README.md to index.md, then you are correct, and this option is unnecessary.

I see where you're coming from. MkDocs will consider README.md to be an index page, but this index page will always be named index.html. From there, the equivalent Markdown page we build will always be named index.md. Well, I think. Let me check.

Owner

pawamoy commented Oct 25, 2025

Yeah that's probably it:

path_md = Path(page.file.abs_dest_path).with_suffix(".md")

The absolute destination path would definitely be index.html even if the page is named readme.md.

So, you can remove both options and update the code accordingly 🙂

Thanks for the super fast answer!
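
The effect of the `with_suffix(".md")` line quoted above can be checked in isolation. This is a small sketch with a made-up destination path; it assumes, as described in the comment, that a README.md in `guide/` is built to `guide/index.html`.

```python
from pathlib import PurePosixPath

# Hypothetical value of page.file.abs_dest_path for a page whose source is guide/README.md.
abs_dest_path = "/site/guide/index.html"

# Swapping the suffix keeps the "index" stem, so no separate option for the file name is needed.
path_md = PurePosixPath(abs_dest_path).with_suffix(".md")
print(path_md)
```

Because the stem comes from the destination file rather than the source file, the name of the source (README.md or index.md) does not matter here.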

Author

AlphaBs commented Oct 25, 2025

Thanks for clarifying! I'll remove the absolute_link and index_file_name options and update the code to always generate absolute links right now.

Author

AlphaBs commented Oct 25, 2025

Updated! Let me know if there's anything else you'd like me to change.

Owner

@pawamoy pawamoy left a comment

A few suggestions and a request to revert the architectural changes (I prefer functions with parameters over methods conveniently depending on self).

Owner

pawamoy commented Oct 26, 2025

Did you use AI by the way? Just curious.

Comment on lines 279 to 281
# Skip if it's a `mailto:` or other protocol links.
if ":" in href and not href.startswith("/"):
    continue
Owner

This one seems a bit brittle? Any thoughts?

Author

I thought the ':' character wasn't allowed in paths, but I see it's possible on Linux. I'll need to think about a better approach.
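
One possible alternative to the substring check, sketched here as an assumption rather than as what the PR ended up doing, is to let `urllib.parse.urlsplit` decide whether the href carries a scheme or a host. The helper name `is_external` is hypothetical.

```python
from urllib.parse import urlsplit


def is_external(href: str) -> bool:
    """Return True when the href has a scheme (mailto:, https:) or a host (//cdn.example.com)."""
    parts = urlsplit(href)
    return bool(parts.scheme or parts.netloc)


# A colon inside a later path segment is no longer mistaken for a protocol.
for href in ["mailto:someone@example.com", "https://google.com", "notes/a:b.md", "./a:b.md", "#title"]:
    print(href, "->", is_external(href))
```

A bare first segment such as `a:b.md` is still read as scheme `a`, which matches RFC 3986: a relative reference whose first segment contains a colon has to be written `./a:b.md`.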

    continue

# Convert relative link to absolute.
if href.startswith("/"):
Owner

IMO we should skip links that start with /. MkDocs will warn about those when the link is to an actual page, and will leave such links as they are. We should assume users know what they are doing and they want this exact link, and we shouldn't assume there's a corresponding llmstxt/markdown page.

Author

I agree, that sounds like a better approach.
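
The behavior agreed on here, leaving links that start with `/` exactly as written, could look roughly like the predicate below. This is only a sketch: `should_rewrite` is a hypothetical name, and treating anchors and scheme links as "leave alone" is carried over from the earlier discussion, not from the final code.

```python
from urllib.parse import urlsplit


def should_rewrite(href: str) -> bool:
    """Return True only for relative links that should be made absolute."""
    # Empty hrefs, in-page anchors, and root-relative links ("/...") are kept as-is:
    # the user asked for exactly that link, and no matching Markdown page is assumed.
    if not href or href.startswith(("#", "/")):
        return False
    # Links with a scheme (https:, mailto:) already point somewhere absolute.
    return not urlsplit(href).scheme


links = ["./another-page.md", "../assets/image.png", "/absolute/page.md", "#title", "https://google.com"]
print([link for link in links if should_rewrite(link)])
```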

Author

AlphaBs commented Oct 29, 2025

> Did you use AI by the way? Just curious.

I used AI for writing the comments and the PR message, as I'm not a native English speaker. I wrote the code logic myself.

Author

AlphaBs commented Oct 30, 2025

What do you think about separating the relative path conversion logic from plugin.py?
Currently, we're testing this logic within test_plugin.py, and it seems to make debugging difficult when tests fail, and the test code is becoming overly complicated.
We could thoroughly test the conversion logic in its own test_converter.py and leave only the very basic integration cases in test_plugin.py.

Owner

pawamoy commented Nov 5, 2025

Sure, feel free to do that in this PR. You can import the private function in a new test module (and ignore any lint warning from doing so) to test it in isolation.
