30 commits
0fd5d2f
[DOC] Can't find k1 parameter using search (not indexed?)
aryasoni98 Oct 29, 2025
0458b0b
Update plugins.calcite.enabled setting default (#11435)
kolchfa-aws Oct 29, 2025
1eeb672
updating the example for more_like_this (#11454)
AntonEliatra Oct 30, 2025
5dfeec3
updating the debian install with apt (#11456)
AntonEliatra Oct 30, 2025
554396a
updating the logstash migration example (#11201)
AntonEliatra Oct 30, 2025
6ad513d
Add tag documentation to ingest processors and pipelines topic (#11459)
kolchfa-aws Oct 30, 2025
22ecfae
Update blueprints.md (#11463)
kolchfa-aws Oct 30, 2025
2cc4cd5
Add copy buttons and highlighting to data prepper code samples (#11465)
kolchfa-aws Oct 30, 2025
cf148e3
Add Polish and Ukranian analyzer documentation (#11469)
kolchfa-aws Oct 30, 2025
4734a57
Add missing PPL settings (#11470)
kolchfa-aws Oct 30, 2025
6e4e1f5
Bump docs to 3.3.2 version with OS updates only (#11404)
peterzhuamazon Oct 30, 2025
5e24988
adding examples to http source of data prepper (#11347)
AntonEliatra Nov 4, 2025
1eb2d41
adding file source page (#11355)
AntonEliatra Nov 4, 2025
b8d3d5d
Add explain filtering functionality for ISM docs (#11462)
kolchfa-aws Nov 4, 2025
aa299ed
Update documentation for arrays that semantic field cannot support it…
bzhangam Nov 4, 2025
734d8ab
expanding example for split string processor (#11246)
AntonEliatra Nov 6, 2025
f2d1880
Expanding conditional routing example (#11237)
AntonEliatra Nov 6, 2025
27a957b
Add field masking search limitation (#11489)
kolchfa-aws Nov 6, 2025
8ecca38
adding example to write json processor data prepper (#11282)
AntonEliatra Nov 6, 2025
4ad656e
Correct kubectl commands (#11492)
violuke Nov 6, 2025
c2da0e9
Update cluster-settings.md (#11503)
kolchfa-aws Nov 6, 2025
4236b7b
Add 2.19.4 to version history (#11494)
kolchfa-aws Nov 6, 2025
e6920e5
Fix broken external links (#11508)
kolchfa-aws Nov 10, 2025
421b438
adding examples to key value processor data prepper (#11232)
AntonEliatra Nov 10, 2025
ca8f9ab
Updating the Cross Cluster Replication documentation for the index le…
darjisagar7 Nov 10, 2025
2ab87ab
Update documentation for delete_entries and select_entries processors…
kennedy-onyia Nov 10, 2025
1626e43
delete incorrect output value in delete_entries processor example (#1…
kennedy-onyia Nov 10, 2025
266aa77
[DOC] Downloadable PDF Developer Guides
aryasoni98 Nov 11, 2025
3a6b124
Merge branch 'main' into issue-11192
aryasoni98 Nov 11, 2025
1234757
Merge branch 'main' into issue-11192
aryasoni98 Nov 17, 2025
74 changes: 74 additions & 0 deletions .github/workflows/generate-pdfs.yml
@@ -0,0 +1,74 @@
name: Generate PDFs

on:
  workflow_dispatch:
  schedule:
    # Run weekly on Sundays at 2 AM UTC
    - cron: "0 2 * * 0"
  # Optional: Run after main branch updates (uncomment if desired)
  # push:
  #   branches:
  #     - main

jobs:
  generate-pdfs:
    if: github.repository == 'opensearch-project/documentation-website'
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Setup Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: '3.4.5'
          bundler-cache: true

      - name: Build Jekyll site
        env:
          JEKYLL_ENV: production
        run: |
          bundle exec jekyll build --future

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Generate PDFs
        run: |
          npm run generate-pdfs

      - name: List generated PDFs
        run: |
          echo "Generated PDFs:"
          ls -lh pdfs/ || echo "No PDFs generated"

      - name: Upload PDFs as artifacts
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: opensearch-documentation-pdfs
          path: pdfs/*.pdf
          retention-days: 30

      # Optional: Create a GitHub release with PDFs
      # Uncomment and configure if you want to automatically create releases
      # - name: Create GitHub Release
      #   if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
      #   uses: softprops/action-gh-release@v1
      #   with:
      #     files: pdfs/*.pdf
      #     tag_name: pdfs-${{ github.run_number }}
      #     name: Documentation PDFs
      #     body: |
      #       Automatically generated PDF documentation for OpenSearch.
      #       Generated on ${{ github.run_id }}
      #   env:
      #     GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

3 changes: 3 additions & 0 deletions .gitignore
@@ -9,3 +9,6 @@ Gemfile.lock
.jekyll-cache
.project
vendor/bundle
node_modules
pdfs
package-lock.json
75 changes: 75 additions & 0 deletions DEVELOPER_GUIDE.md
@@ -300,3 +300,78 @@
bundle exec rake generate_dry_run_report
```
This will also generate a markdown (.md) file for each API with their rendered components in the `spec-insert/dry_run` folder. This allows you to preview the rendered components for all APIs without modifying the original documentation files. A report summarizing the errors found during the dry-run will be generated in the `spec-insert/dry_run_report.md` file.

## PDF generation

The documentation website supports generating PDF versions of the developer guides and other documentation sections. This feature allows users to download complete documentation sets for offline use, easier searching, and integration with AI tools.

### Generating PDFs locally

To generate PDFs locally:

1. **Install Node.js dependencies:**
   ```shell
   npm install
   ```

2. **Build the Jekyll site:**
   ```shell
   bundle exec jekyll build
   ```

3. **Generate PDFs:**
   ```shell
   npm run generate-pdfs
   ```

   Or generate a PDF for a specific collection:
   ```shell
   npm run generate-pdfs -- --collection developer-documentation
   ```

The generated PDFs will be saved in the `pdfs/` directory.
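
The three steps above can be chained in a small wrapper script. The following is a minimal sketch, not part of the repository: the `run` and `build_pdfs` helpers and the `DRY_RUN` variable are hypothetical names introduced for illustration, and only the `npm` and `bundle` commands come from the steps above.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not part of the repository): run a command, or only
# print it when DRY_RUN=1 so the sequence can be inspected safely.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: run the three documented steps in order.
# An optional first argument names a single collection to generate.
build_pdfs() {
  collection="${1:-}"
  run npm install
  run bundle exec jekyll build
  if [ -n "$collection" ]; then
    run npm run generate-pdfs -- --collection "$collection"
  else
    run npm run generate-pdfs
  fi
}
```

Calling `DRY_RUN=1 build_pdfs developer-documentation` prints the three commands without running them. Calling `build_pdfs` with no arguments runs the full sequence and stops at the first failing step because of `set -eu`.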

### PDF generation configuration

PDF generation is configured in `pdf-config.json`. This file defines:
- Which collections to convert to PDFs
- PDF output settings (format, margins, headers, footers)
- Base URL and output directory

You can customize the configuration by editing `pdf-config.json`.

### CI/CD integration

PDF generation runs automatically in CI/CD through the [generate-pdfs.yml](.github/workflows/generate-pdfs.yml) GitHub Actions workflow. This workflow:

- Runs weekly on Sundays at 2 AM UTC
- Can be triggered manually via `workflow_dispatch`
- Builds the Jekyll site
- Generates PDFs for all configured collections
- Uploads PDFs as GitHub Actions artifacts

The workflow runs separately from the main Jekyll build to avoid adding to build time.
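
As an illustration of the manual trigger, the workflow could be started and its artifact downloaded with the GitHub CLI. This is a sketch under assumptions: it presumes `gh` is installed and authenticated and that you have permission to run workflows in the repository. The function names are hypothetical; the workflow file name and artifact name are taken from `generate-pdfs.yml`.

```shell
#!/usr/bin/env bash
set -eu

# Values taken from the workflow file in this change.
REPO="opensearch-project/documentation-website"
WORKFLOW="generate-pdfs.yml"
ARTIFACT="opensearch-documentation-pdfs"

# Hypothetical helper: start a run through the workflow_dispatch trigger.
trigger_pdf_workflow() {
  gh workflow run "$WORKFLOW" --repo "$REPO"
}

# Hypothetical helper: download the PDFs uploaded by the most recent run
# of the workflow into ./pdfs.
download_latest_pdfs() {
  run_id="$(gh run list --repo "$REPO" --workflow "$WORKFLOW" --limit 1 \
    --json databaseId --jq '.[0].databaseId')"
  gh run download "$run_id" --repo "$REPO" --name "$ARTIFACT" --dir pdfs
}
```

The most recent run may still be in progress, in which case the artifact is not yet available. Artifacts also expire after the 30-day retention period set in the workflow.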

### Available PDFs

The following documentation sections are available as PDFs (as configured in `pdf-config.json`):

- OpenSearch Developer Guide
- Getting Started Guide
- API Reference
- Install and Configure Guide
- Cluster Tuning Guide
- Security Guide
- Query DSL Guide
- Search Features Guide
- Vector Search Guide
- Machine Learning Guide

### Copyright and usage

OpenSearch documentation is licensed under the Apache License 2.0, which allows you to:
- Use the PDFs for personal or commercial purposes
- Upload PDFs to AI tools (such as ChatGPT or NotebookLM) for knowledge summarization
- Share and distribute the PDFs

Maintain proper attribution when you use or redistribute the documentation.
9 changes: 8 additions & 1 deletion _search-plugins/keyword-search.md
@@ -3,6 +3,8 @@ layout: default
title: Keyword search
has_children: false
nav_order: 10
meta_description: Learn about BM25 keyword search in OpenSearch, including how to configure BM25 parameters k1 and b for better search relevance
meta_keywords: BM25, keyword search, k1, b, term frequency, inverse document frequency, TF/IDF, search relevance, Okapi BM25
---

# Keyword search
@@ -165,7 +167,12 @@ PUT /testindex

## Configuring BM25 similarity

You can configure BM25 similarity parameters at the index level as follows:
You can configure BM25 similarity parameters at the index level. The BM25 algorithm has two tunable parameters, `k1` and `b`, that control how documents are scored:

- The `k1` parameter controls term frequency saturation: it determines how quickly a term's contribution to the relevance score levels off as the term appears more often in a document. The default is `1.2`.
- The `b` parameter controls length normalization: it determines how strongly a document's length, relative to the average document length, affects the score. A value of `0` disables length normalization, and `1` applies it fully. The default is `0.75`.
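
To make the roles of the two parameters concrete, the standard Okapi BM25 formula can be written as follows. This is the textbook form, given here as an illustration. The symbols $f(q_i, D)$ (frequency of term $q_i$ in document $D$), $|D|$ (document length), and $\text{avgdl}$ (average document length) are notation assumed for this sketch, and some implementations drop the constant $(k_1 + 1)$ factor, which does not change the ranking.

$$
\text{score}(D, Q) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}
$$

For a document of average length ($|D| = \text{avgdl}$), the length term equals 1 and the fraction reduces to $\frac{f \cdot (k_1 + 1)}{f + k_1}$. With $k_1 = 1.2$:

$$
f = 1:\ \frac{2.2}{2.2} = 1.0, \qquad f = 2:\ \frac{4.4}{3.2} = 1.375, \qquad f = 10:\ \frac{22}{11.2} \approx 1.96
$$

The factor approaches but never exceeds $k_1 + 1 = 2.2$, which is the saturation that `k1` controls. Setting $b = 0$ removes $|D|$ from the denominator, so document length no longer affects the score.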

You can configure these parameters at the index level as follows:

```json
PUT /testindex