Commit f3e3903

alan707 and claude authored
Fix broken links (#27)
* Fix anchor links in markdown conversion

  - Handle .md#anchor links properly by preserving anchor fragments
  - Split URL and anchor parts before processing
  - Reassemble with trailing slash and anchor intact

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* Fix context-aware link conversion

  - Add source file parameter to link conversion methods
  - Resolve relative paths against source file location
  - Apply proper category mapping based on section
  - Handle relative links like faq.md#anchor correctly

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* Add external API link redirection

  - Add external_links configuration in config.yaml
  - Configure bazel_api_base for external API documentation
  - Add logic to redirect /rules/ and /reference/ links to bazel.build
  - Preserve anchors and fragments in external links

  🤖 Generated with [Claude Code](https://claude.ai/code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* add claude code

---------

Co-authored-by: Claude <noreply@anthropic.com>
1 parent 3e72c6e commit f3e3903

3 files changed: +167 -10 lines changed

CLAUDE.md

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+This is a Python-based tool that converts Google Devsite documentation (from bazel.build/docs) into Hugo/Docsy format for easier navigation and modification. The converter transforms Devsite frontmatter, directory layout, and styling to be compatible with the Hugo static site generator and Docsy theme.
+
+## Core Commands
+
+### Running the Converter
+```bash
+# Basic conversion
+python cli.py convert --source /path/to/devsite/source --output /path/to/hugo/output
+
+# With dry run validation
+python cli.py convert --source /path/to/devsite/source --output /path/to/hugo/output --dry-run
+
+# Incremental conversion (only changed files)
+python cli.py convert --source /path/to/devsite/source --output /path/to/hugo/output --incremental
+
+# View converter info
+python cli.py info
+```
+
+### Environment Setup
+```bash
+# Install dependencies
+pip install -r requirements.txt
+
+# Or using the project setup
+pip install -e .
+```
+
+### Docker Usage
+```bash
+# Run the Docker container
+docker run -it -p 1313:1313 alan707/bazel-docs:latest bash
+
+# Inside container: convert docs
+python /app/cli.py convert --source /app/work/bazel-source/site/en/ --output /app/docs/
+
+# Inside container: setup Hugo modules and run server
+cd /app/docs
+hugo mod init github.com/alan707/bazel-docs && \
+hugo mod get github.com/google/docsy@v0.12.0 && \
+hugo mod tidy
+hugo server --bind 0.0.0.0 --baseURL "http://localhost:1313"
+```
+
+## Architecture
+
+### Core Components
+
+1. **CLI Interface (`cli.py`)**: Click-based command line interface with convert and info commands
+2. **Main Converter (`devsite_to_hugo_converter.py`)**: Orchestrates the conversion process using parser and generator
+3. **Devsite Parser (`utils/devsite_parser.py`)**: Parses Google Devsite structure, including `_book.yaml` and `_index.yaml` files
+4. **Hugo Generator (`utils/hugo_generator.py`)**: Generates Hugo site structure and configuration using Jinja2 templates
+
+### Configuration System
+
+The `config.yaml` file controls all aspects of the conversion:
+
+- **Content Mapping**: Maps Devsite sections to Hugo categories (tutorials, how-to-guides, explanations, reference)
+- **External Links**: Handles redirects to legacy Bazel API documentation
+- **Code Language Detection**: Automatic language detection for code blocks using pattern matching
+- **CSS Conversion**: Transforms CSS/SCSS for Docsy theme compatibility
+- **File Patterns**: Controls which files are included/excluded during conversion
+
+### Template System
+
+Uses Jinja2 templates in the `templates/` directory:
+- `hugo_config.yaml.jinja2`: Generates Hugo site configuration
+- `section_index.jinja2`: Creates section index pages
+
+### Content Organization
+
+The converter maps Devsite sections to Hugo content types:
+- Tutorials → tutorials category (weight 1-3)
+- Install/Configure/Build guides → how-to-guides category
+- Concepts/Extending → explanations category
+- Reference materials → reference category
+
+## Development Notes
+
+### Code Language Detection
+The system automatically detects programming languages for code blocks without explicit language identifiers using pattern matching defined in `config.yaml`. Supports Starlark (Bazel), Bash, Python, C++, Java, JavaScript, TypeScript, and more.
+
+### Link Conversion
+The converter handles both internal link conversion within the Hugo site and external link redirection to maintain compatibility with existing Bazel API documentation.
+
+### CSS/SCSS Processing
+PostCSS and Autoprefixer are used for CSS processing (see package.json dependencies), though the main conversion logic is in Python.
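
The "Code Language Detection" note in CLAUDE.md describes pattern matching driven by `config.yaml`, but this commit does not show the patterns or the detection function. The following is a minimal sketch of how such a detector could work. The `LANGUAGE_PATTERNS` table, its regexes, and the `detect_language` name are all assumptions for illustration, not the repository's actual configuration or API.

```python
import re
from typing import Dict, List, Optional

# Hypothetical patterns: the real ones live in config.yaml and are not
# part of this commit. Order matters, because the first match wins.
LANGUAGE_PATTERNS: Dict[str, List[str]] = {
    "starlark": [
        r"\b(cc_binary|cc_library|java_library|py_binary)\s*\(",
        r"\bload\s*\(\s*[\"']",
    ],
    "bash": [
        r"^\s*\$\s+\S+",
        r"\bbazel\s+(build|test|run)\b",
    ],
    "python": [
        r"^\s*def\s+\w+\s*\(.*\)\s*:",
        r"^\s*import\s+\w+",
    ],
}


def detect_language(
    code: str, patterns: Dict[str, List[str]] = LANGUAGE_PATTERNS
) -> Optional[str]:
    """Return the first language with a matching pattern, or None."""
    for language, regexes in patterns.items():
        # MULTILINE lets '^' anchor at the start of every line in the block.
        if any(re.search(rx, code, flags=re.MULTILINE) for rx in regexes):
            return language
    return None
```

With these assumed patterns, a block containing `cc_binary(` would be labeled `starlark` and a line such as `$ bazel build //main:hello` would be labeled `bash`. Because the first match wins, the more specific languages need to come first in the table.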

config.yaml

Lines changed: 10 additions & 0 deletions
@@ -13,6 +13,16 @@ hugo:
   baseURL: "https://bazel-docs-68tmf.ondigitalocean.app/"
   languageCode: "en-us"
   theme: "docsy"
+
+# External links configuration
+external_links:
+  # Base URL for legacy Bazel API documentation
+  bazel_api_base: "https://bazel.build"
+  # Paths that should be redirected to external API docs
+  external_paths:
+    - "/rules/"
+    - "/reference/"
+    - "/docs/build-ref"
 
 content_mapping:
   # set 'enable_category_indices' to true to generate _index.md files for categories
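
To show how the `external_links` block above is consumed, here is a small standalone sketch of the prefix check and URL rewrite. It follows the logic of `_should_redirect_to_external` and the external-link branch of `replace_link` in this commit, but as free functions over a plain dict. The `CONFIG` literal and the `external_url` helper name are assumptions made for the example.

```python
from typing import Any, Dict, Optional

# Plain-dict stand-in for the parsed external_links block in config.yaml.
CONFIG: Dict[str, Any] = {
    "external_links": {
        "bazel_api_base": "https://bazel.build",
        "external_paths": ["/rules/", "/reference/", "/docs/build-ref"],
    }
}


def should_redirect_to_external(config: Dict[str, Any], link_url: str) -> bool:
    """True when link_url starts with any configured external path prefix."""
    external_paths = config.get("external_links", {}).get("external_paths", [])
    return any(link_url.startswith(path) for path in external_paths)


def external_url(config: Dict[str, Any], link_url: str) -> Optional[str]:
    """Return the absolute external URL, or None if the link stays internal.

    Hypothetical helper: in the commit this logic sits inline in replace_link.
    """
    if not (link_url.startswith("/") and should_redirect_to_external(config, link_url)):
        return None
    base = config.get("external_links", {}).get("bazel_api_base", "https://bazel.build")
    # The fragment is part of link_url, so anchors survive the rewrite.
    return f"{base}{link_url}"
```

Because the match is a plain `startswith`, the entry `"/docs/build-ref"` (no trailing slash) also matches longer paths such as `/docs/build-reference`. Adding a trailing slash to the entry would narrow it to that directory only.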

devsite_to_hugo_converter.py

Lines changed: 64 additions & 10 deletions
@@ -233,7 +233,7 @@ def _convert_single_file(self, source_file: Path, output_file: Path, dry_run: bo
             hugo_frontmatter['title'] = title_from_h1
 
         # Convert body content
-        hugo_body = self._convert_body_content(body)
+        hugo_body = self._convert_body_content(body, source_file)
 
         # Remove duplicate H1 title if it matches frontmatter title
         if 'title' in hugo_frontmatter:
@@ -311,7 +311,7 @@ def _convert_frontmatter(self, frontmatter: Dict) -> Dict:
 
         return hugo_frontmatter
 
-    def _convert_body_content(self, body: str) -> str:
+    def _convert_body_content(self, body: str, source_file: Path) -> str:
         """Convert Devsite-specific content to Hugo format"""
         # Remove [TOC] directive (let Docsy handle TOC automatically)
         body = re.sub(r'\[TOC\]', '', body)
@@ -342,7 +342,7 @@ def _convert_body_content(self, body: str) -> str:
         body = body.strip()  # Remove trailing whitespace
 
         # Convert internal links
-        body = self._convert_internal_links(body)
+        body = self._convert_internal_links(body, source_file)
 
         # Fix directory structure formatting
         body = self._fix_directory_structures(body)
@@ -383,7 +383,7 @@ def _remove_duplicate_h1_title(self, body: str,
 
         return body
 
-    def _convert_internal_links(self, content: str) -> str:
+    def _convert_internal_links(self, content: str, source_file: Path) -> str:
         """Convert internal links to Hugo format"""
         # Pattern for markdown links - handle multi-line links
         link_pattern = r'\[([^\]]+)\]\(([^)]+)\)'
@@ -404,19 +404,68 @@ def replace_link(match):
             if link_url.startswith('#'):
                 return match.group(0)
 
+            # Handle external API links (absolute paths to external documentation)
+            if link_url.startswith('/') and self._should_redirect_to_external(link_url):
+                external_base = self.config.get('external_links', {}).get('bazel_api_base', 'https://bazel.build')
+                return f'[{link_text}]({external_base}{link_url})'
+
             # Handle relative links to .md files
-            if link_url.endswith('.md'):
+            if link_url.endswith('.md') or '.md#' in link_url:
+                # Split URL and anchor
+                if '#' in link_url:
+                    url_part, anchor_part = link_url.split('#', 1)
+                    anchor = f'#{anchor_part}'
+                else:
+                    url_part = link_url
+                    anchor = ''
+
                 # Normalize the path
-                normalized_path = link_url.replace('.md', '')
-                # Remove leading './' if present
-                if normalized_path.startswith('./'):
-                    normalized_path = normalized_path[2:]
+                normalized_path = url_part.replace('.md', '')
+
+                # Handle relative paths by resolving against source file location
+                if not normalized_path.startswith('/'):
+                    # Get the source file's directory relative to the source root
+                    source_dir = source_file.parent
+                    # Find the source root (work/bazel-source/site/en)
+                    source_root = None
+                    for parent in source_file.parents:
+                        if parent.name == 'en' and parent.parent.name == 'site':
+                            source_root = parent
+                            break
+
+                    if source_root:
+                        # Get relative directory from source root
+                        rel_source_dir = source_dir.relative_to(source_root)
+
+                        # Remove leading './' if present
+                        if normalized_path.startswith('./'):
+                            normalized_path = normalized_path[2:]
+
+                        # Resolve relative path
+                        if str(rel_source_dir) == '.':
+                            # File is in root, just use the filename
+                            full_path = normalized_path
+                        else:
+                            # Combine source directory with relative path
+                            full_path = str(rel_source_dir / normalized_path)
+
+                        # Get category mapping for this path
+                        path_parts = full_path.split('/')
+                        if path_parts:
+                            section_name = path_parts[0]
+                            if section_name in self.config.get('content_mapping', {}):
+                                mapping = self.config['content_mapping'][section_name]
+                                category_type = mapping['type']
+                                return f'[{link_text}](/{category_type}/{full_path}/{anchor})'
+                            else:
+                                return f'[{link_text}]({full_path}/{anchor})'
+
                 # Remove leading '/' if present (absolute paths within site)
                 if normalized_path.startswith('/'):
                     normalized_path = normalized_path[1:]
 
                 # Use simple relative links to avoid shortcode issues
-                return f'[{link_text}](/{normalized_path}/)'
+                return f'[{link_text}](/{normalized_path}/{anchor})'
 
             # Handle relative links to directories (assume they have index pages)
             if '/' in link_url and not '.' in link_url.split('/')[-1]:
@@ -434,6 +483,11 @@ def replace_link(match):
 
         return re.sub(link_pattern, replace_link, content, flags=re.DOTALL)
 
+    def _should_redirect_to_external(self, link_url: str) -> bool:
+        """Check if a link should be redirected to external Bazel API docs"""
+        external_paths = self.config.get('external_links', {}).get('external_paths', [])
+        return any(link_url.startswith(path) for path in external_paths)
+
     def _fix_directory_structures(self, content: str) -> str:
         """Fix directory structure formatting to use proper code blocks"""
         # Pattern to match directory structures with Unicode tree characters
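
The anchor handling and relative-path resolution added to `replace_link` can be easier to follow outside the diff. The sketch below is a simplified, standalone approximation rather than the committed code. It takes the source file's directory as a string that is already relative to the source root, instead of walking `source_file.parents`, and it strips only a trailing `.md`. The names `split_anchor`, `convert_md_link`, and `EXAMPLE_MAPPING` are hypothetical, and the mapping entry is an assumed example of a `content_mapping` section.

```python
from pathlib import PurePosixPath
from typing import Dict, Tuple

# Assumed example entry: the real content_mapping lives in config.yaml.
EXAMPLE_MAPPING: Dict[str, Dict[str, str]] = {
    "concepts": {"type": "explanations"},
}


def split_anchor(link_url: str) -> Tuple[str, str]:
    """Split 'page.md#frag' into ('page.md', '#frag'); anchor is '' if absent."""
    if "#" in link_url:
        url_part, anchor_part = link_url.split("#", 1)
        return url_part, f"#{anchor_part}"
    return link_url, ""


def convert_md_link(
    link_url: str,
    source_rel_dir: str,
    content_mapping: Dict[str, Dict[str, str]],
) -> str:
    """Rewrite a .md link target to a Hugo-style path, keeping the anchor."""
    url_part, anchor = split_anchor(link_url)
    # Strip only a trailing '.md' (the commit uses str.replace instead).
    path = url_part[: -len(".md")] if url_part.endswith(".md") else url_part

    if path.startswith("/"):
        # Site-absolute link: normalize to one leading and one trailing slash.
        return f"/{path.lstrip('/')}/{anchor}"

    if path.startswith("./"):
        path = path[2:]

    # Join with the directory of the file that contains the link.
    if source_rel_dir in ("", "."):
        full_path = path
    else:
        full_path = str(PurePosixPath(source_rel_dir) / path)

    # The first path segment selects the category, as in the commit.
    section = full_path.split("/")[0]
    mapping = content_mapping.get(section)
    if mapping:
        return f"/{mapping['type']}/{full_path}/{anchor}"
    return f"{full_path}/{anchor}"
```

For example, `convert_md_link("build-ref.md#packages", "concepts", EXAMPLE_MAPPING)` yields `/explanations/concepts/build-ref/#packages`. Note that joining paths this way does not collapse `..` segments, so a link like `../faq.md` keeps the `..` in the sketch's output.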
