Commit d66b35e

Merge pull request #106 from jchunk-io/dev/pablosanchi/issue-104
docs(issue-104): move antora to docosaurus
2 parents 423657d + 5fa8f3c commit d66b35e

40 files changed: 19,073 additions and 721 deletions

.github/workflows/docs.yml

Lines changed: 18 additions & 19 deletions
Original file line number | Diff line number | Diff line change
@@ -1,11 +1,10 @@
1-
name: Deploy Antora to GitHub Pages
1+
name: Deploy Docusaurus to GitHub Pages
22

33
on:
44
push:
55
branches: [ main ]
66
paths:
77
- 'docs/**'
8-
- 'antora-playbook.yml'
98
- '.github/workflows/docs.yml'
109
workflow_dispatch:
1110

@@ -20,38 +19,38 @@ concurrency:
2019

2120
jobs:
2221
build:
22+
name: Build Jchunk docs
2323
runs-on: ubuntu-latest
2424
steps:
25-
- name: Checkout
26-
uses: actions/checkout@v4
25+
- uses: actions/checkout@v4
26+
with:
27+
fetch-depth: 0
2728

28-
- name: Setup Node.js
29-
uses: actions/setup-node@v4
29+
- uses: actions/setup-node@v4
3030
with:
3131
node-version: 20
32+
cache: npm
33+
cache-dependency-path: docs/package-lock.json
3234

33-
- name: Setup Pages
34-
uses: actions/configure-pages@v5
35-
36-
- name: Build Antora site
37-
run: |
38-
npx -y -p @antora/cli@3.1 -p @antora/site-generator@3.1 antora -r @antora/site-generator antora-playbook.yml
39-
40-
- name: Disable Jekyll on Pages
41-
run: |
42-
echo > build/site/.nojekyll
35+
- name: Install dependencies
36+
working-directory: docs
37+
run: npm ci
38+
- name: Build website
39+
working-directory: docs
40+
run: npm run build
4341

44-
- name: Upload Pages artifact
42+
- name: Upload Build Artifact
4543
uses: actions/upload-pages-artifact@v3
4644
with:
47-
path: build/site
45+
path: docs/build
4846

4947
deploy:
48+
name: Deploy to GitHub Pages
49+
needs: build
5050
environment:
5151
name: github-pages
5252
url: ${{ steps.deployment.outputs.page_url }}
5353
runs-on: ubuntu-latest
54-
needs: build
5554
steps:
5655
- name: Deploy to GitHub Pages
5756
id: deployment

.gitignore

Lines changed: 20 additions & 1 deletion
@@ -32,4 +32,23 @@ build/
3232
### VS Code ###
3333
.vscode/
3434

35-
.DS_Store
35+
.DS_Store
36+
37+
### Docusaurus ###
38+
39+
# Dependencies
40+
node_modules
41+
42+
# Generated files
43+
.docusaurus
44+
.cache-loader
45+
46+
# Misc
47+
.env.local
48+
.env.development.local
49+
.env.test.local
50+
.env.production.local
51+
52+
npm-debug.log*
53+
yarn-debug.log*
54+
yarn-error.log*

README.md

Lines changed: 1 addition & 18 deletions
@@ -1,7 +1,7 @@
11
# JChunk
22

33
[![GitHub Actions Status](https://img.shields.io/github/actions/workflow/status/jchunk-io/jchunk/build.yml?branch=main&logo=GitHub&style=for-the-badge)](.)
4-
[![Apache 2.0 License](https://img.shields.io/github/license/arconia-io/arconia?style=for-the-badge&logo=apache&color=brightgreen)](.)
4+
[![Apache 2.0 License](https://img.shields.io/github/license/jchunk-io/jchunk?style=for-the-badge&logo=apache&color=brightgreen)](.)
55

66
## A Java Library for Text Chunking
77

@@ -55,23 +55,6 @@ To check javadocs using the javadoc:javadoc
5555
./mvnw javadoc:javadoc -Pjavadoc
5656
```
5757

58-
## Building the docs locally
59-
60-
You can build and preview the Antora documentation locally without installing anything globally.
61-
62-
Prerequisites:
63-
- Node.js 18+ (20 recommended).
64-
- Download from https://nodejs.org/
65-
66-
Build the site:
67-
68-
```sh
69-
npx -y -p @antora/cli@3.1 -p @antora/site-generator@3.1 antora -r @antora/site-generator antora-playbook.yml
70-
```
71-
72-
Open the generated site:
73-
- `build/site/index.html`
74-
7558
## Contributing
7659

7760
Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct, and the process for submitting pull requests to us.

antora-playbook.yml

Lines changed: 0 additions & 25 deletions
This file was deleted.

docs/README.md

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
# Website
2+
3+
This website is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.
4+
5+
## Installation
6+
7+
```bash
8+
yarn
9+
```
10+
11+
## Local Development
12+
13+
```bash
14+
yarn start
15+
```
16+
17+
This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.
18+
19+
## Build
20+
21+
```bash
22+
yarn build
23+
```
24+
25+
This command generates static content into the `build` directory, which can be served by any static content hosting service.
26+
27+
## Deployment
28+
29+
Using SSH:
30+
31+
```bash
32+
USE_SSH=true yarn deploy
33+
```
34+
35+
Not using SSH:
36+
37+
```bash
38+
GIT_USER=<Your GitHub username> yarn deploy
39+
```
40+
41+
If you are using GitHub Pages for hosting, this command is a convenient way to build the website and push it to the `gh-pages` branch.

docs/antora.yml

Lines changed: 0 additions & 8 deletions
This file was deleted.

docs/docs/chunkers/_category_.json

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
{
2+
"label": "Chunkers",
3+
"position": 3,
4+
"link": {
5+
"type": "generated-index"
6+
}
7+
}
Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
1+
# Fixed Character Chunker
2+
3+
## Overview
4+
5+
The Fixed Character Chunker divides text into chunks of a fixed number of characters. It is the simplest chunking strategy and a good starting point for understanding the fundamentals of text splitting.
6+
7+
## Installation
8+
9+
```xml
10+
<dependency>
11+
<groupId>io.jchunk</groupId>
12+
<artifactId>jchunk-fixed</artifactId>
13+
<version>${jchunk.version}</version>
14+
</dependency>
15+
```
16+
17+
```groovy
18+
implementation group: 'io.jchunk', name: 'jchunk-fixed', version: "${JCHUNK_VERSION}"
19+
```
20+
21+
## Configuration
22+
23+
```java
24+
// using default config
25+
FixedChunker chunker = new FixedChunker();
26+
27+
// with custom config
28+
Config config = Config.builder()
29+
.chunkSize(10)
30+
.chunkOverlap(0)
31+
.delimiter(";")
32+
.trimWhitespace(true)
33+
.keepDelimiter(Delimiter.START)
34+
.build();
35+
36+
FixedChunker customChunker = new FixedChunker(config); // distinct name so both declarations compile in one scope
37+
```
38+
39+
### Configuration Options
40+
41+
- `chunkSize`: Target maximum number of characters per chunk. If a single delimiter-separated segment is longer than this value, the resulting chunk may exceed the limit.
42+
- Default: `1000`.
43+
- `chunkOverlap`: Number of characters to overlap between consecutive chunks (preserves context).
44+
- Default: `100`.
45+
- `delimiter`: Regex string used to split text before forming chunks. Common values: `" "` for spaces, `"\n"` for newlines, `""` for character-level.
46+
- Default: `" "` (a single space).
47+
- `trimWhitespace`: Whether to trim leading/trailing whitespace from each chunk.
48+
- Default: `true`.
49+
- `keepDelimiter`: How to keep delimiters in chunks: `NONE`, `START`, or `END`.
50+
- Default: `NONE`.
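The three `keepDelimiter` modes can be illustrated with plain `String.split` and regex lookarounds. This is a minimal sketch of the idea only, not JChunk's implementation: the class `KeepDelimiterSketch`, the enum `Keep`, and the method `splitKeeping` are hypothetical names introduced for this example, and the `END` case assumes a delimiter regex of bounded length, because Java lookbehind requires one.

```java
import java.util.Arrays;
import java.util.List;

class KeepDelimiterSketch {

    // Hypothetical stand-in for the NONE / START / END options described above.
    public enum Keep { NONE, START, END }

    // Splits text on a delimiter regex, optionally keeping the delimiter
    // at the start or at the end of each piece.
    public static List<String> splitKeeping(String text, String delimiterRegex, Keep keep) {
        String regex;
        switch (keep) {
            case START:
                // Zero-width lookahead: cut just before each delimiter.
                regex = "(?=" + delimiterRegex + ")";
                break;
            case END:
                // Zero-width lookbehind: cut just after each delimiter.
                regex = "(?<=" + delimiterRegex + ")";
                break;
            default:
                // Plain split: the delimiter is consumed and dropped.
                regex = delimiterRegex;
        }
        return Arrays.asList(text.split(regex));
    }

    public static void main(String[] args) {
        String text = "a;b;c";
        System.out.println(splitKeeping(text, ";", Keep.NONE));  // [a, b, c]
        System.out.println(splitKeeping(text, ";", Keep.START)); // [a, ;b, ;c]
        System.out.println(splitKeeping(text, ";", Keep.END));   // [a;, b;, c]
    }
}
```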
51+
52+
## Examples
53+
54+
### Basic Chunking
55+
56+
Chunk size of 10 and no overlap (0):
57+
58+
```java
59+
Config config = Config.builder()
60+
.chunkSize(10)
61+
.chunkOverlap(0)
62+
.build();
63+
FixedChunker chunker = new FixedChunker(config);
64+
String text = "This is an example of character splitting.";
65+
66+
List<Chunk> chunks = chunker.split(text);
67+
68+
// Result: ["This is an", "example of", "character", "splitting."]
69+
```
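One way to arrive at the result above is to split on the delimiter and then greedily pack pieces into a chunk until the next piece would push it past `chunkSize`. The sketch below shows that idea under stated assumptions: it is not JChunk's code, `GreedyPackSketch` and `pack` are hypothetical names, and pieces are rejoined with a single space, which only matches the original text when the delimiter is a space.

```java
import java.util.ArrayList;
import java.util.List;

class GreedyPackSketch {

    // Splits on the delimiter regex, then packs consecutive pieces into chunks
    // of at most chunkSize characters. A single piece longer than chunkSize
    // is emitted as its own oversized chunk.
    public static List<String> pack(String text, String delimiterRegex, int chunkSize) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String piece : text.split(delimiterRegex)) {
            if (piece.isEmpty()) {
                continue; // skip empty pieces produced by repeated delimiters
            }
            // The +1 accounts for the space used to rejoin pieces.
            boolean overflows = current.length() + 1 + piece.length() > chunkSize;
            if (current.length() > 0 && overflows) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(piece);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "This is an example of character splitting.";
        // Prints: [This is an, example of, character, splitting.]
        System.out.println(pack(text, " ", 10));
    }
}
```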
70+
71+
### With Overlap
72+
73+
Chunk size of 35 with 4 characters of overlap and an empty delimiter (`""`), which splits at the character level:
74+
75+
```java
76+
Config config = Config.builder()
77+
.chunkSize(35)
78+
.chunkOverlap(4)
79+
.delimiter("")
80+
.build();
81+
FixedChunker chunker = new FixedChunker(config);
82+
String text = "This is the text I would like to chunk up. It is the example text for this exercise";
83+
List<Chunk> chunks = chunker.split(text);
84+
85+
// Result: ["This is the text I would like to ch", "o chunk up. It is the example text", "ext for this exercise"]
86+
```
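The overlap result above is what a sliding window produces when it advances by `chunkSize - chunkOverlap` characters (here 35 - 4 = 31) and trims each window. The following is a minimal sketch of that arithmetic, assuming character-level splitting. It is not JChunk's implementation, and `SlidingWindowSketch` and `window` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class SlidingWindowSketch {

    // Cuts text into windows of chunkSize characters, where each window
    // starts (chunkSize - overlap) characters after the previous one.
    public static List<String> window(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        int step = chunkSize - overlap;
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            // Trimming mirrors trimWhitespace = true.
            chunks.add(text.substring(start, end).trim());
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "This is the text I would like to chunk up. "
                + "It is the example text for this exercise";
        for (String chunk : window(text, 35, 4)) {
            System.out.println("[" + chunk + "]");
        }
        // [This is the text I would like to ch]
        // [o chunk up. It is the example text]
        // [ext for this exercise]
    }
}
```

The second chunk is 34 characters rather than 35 because its window ends in a space that is trimmed away. The overlap is still taken from the untrimmed window, which is why the third chunk starts with `ext`.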
87+
88+
## Pros and Cons
89+
90+
### Pros
91+
- Easy to implement and understand
92+
- Predictable chunk sizes
93+
- Fast processing
94+
95+
### Cons
96+
- Doesn't consider text structure or context
97+
- May split words inappropriately
98+
- Overlap creates duplicate data
