Commit 8f37c5f

infinityCurator and RAGFlow Curator authored
Synchronize documentation. (#439)
Co-authored-by: RAGFlow Curator <infinitydocs.curator@users.noreply.github.com>
1 parent 0f0cd04 commit 8f37c5f

1 file changed: +69 -25 lines changed

website/docs/faq.mdx

Lines changed: 69 additions & 25 deletions
@@ -512,43 +512,85 @@ See [here](./guides/chat/best_practices/accelerate_question_answering.mdx).
512512

513513
See [here](./guides/agent/best_practices/accelerate_agent_question_answering.md).
514514

515-
---
516-
517515
### How to use MinerU to parse PDF documents?
518516

519-
MinerU PDF document parsing is available starting from v0.21.1. To use this feature, follow these steps:
517+
MinerU PDF document parsing is available starting from v0.21.1. RAGFlow supports MinerU (>= 2.6.3) as an optional PDF parser with multiple backends. RAGFlow itself only acts as a client: it calls MinerU to parse documents, reads the output files, and ingests the parsed content into RAGFlow. To use this feature, follow these steps:
520518

521-
1. Before deploying ragflow-server, update your **docker/.env** file:
522-
- Enable `HF_ENDPOINT=https://hf-mirror.com`
523-
- Add a MinerU entry: `MINERU_EXECUTABLE=/ragflow/uv_tools/.venv/bin/mineru`
519+
1. **Prepare MinerU**
524520

525-
2. Start the ragflow-server and run the following commands inside the container:
521+
- **If you run RAGFlow from source**, install MinerU into an isolated virtual environment (recommended path: `$HOME/uv_tools`):
526522

527-
```bash
528-
mkdir uv_tools
529-
cd uv_tools
530-
uv venv .venv
531-
source .venv/bin/activate
532-
uv pip install -U "mineru[core]" -i https://mirrors.aliyun.com/pypi/simple
533-
```
523+
```bash
524+
mkdir -p "$HOME/uv_tools"
525+
cd "$HOME/uv_tools"
526+
uv venv .venv
527+
source .venv/bin/activate
528+
uv pip install -U "mineru[core]" -i https://mirrors.aliyun.com/pypi/simple
529+
# or
530+
# uv pip install -U "mineru[all]" -i https://mirrors.aliyun.com/pypi/simple
531+
```
532+
533+
- **If you run RAGFlow with Docker**, you usually only need to turn on MinerU support in `docker/.env`:
534+
535+
```bash
536+
# docker/.env
537+
...
538+
USE_MINERU=true
539+
...
540+
```
541+
542+
Enabling `USE_MINERU=true` will internally perform the same setup as the manual configuration (including setting the MinerU executable path and related environment variables). You only need the manual installation above if you are running from source or want full control over the MinerU installation.
543+
544+
2. **Start RAGFlow with MinerU enabled**
534545

535-
3. Restart the ragflow-server.
536-
4. In the web UI, navigate to the **Configuration** page of your dataset. Click **Built-in** in the **Ingestion pipeline** section, select a chunking method from the **Built-in** dropdown, which supports PDF parsing, and slect **MinerU** in **PDF parser**.
537-
5. If you use a custom ingestion pipeline instead, you must also complete the first three steps before selecting **MinerU** in the **Parsing method** section of the **Parser** component.
546+
- **Source deployment** – in the RAGFlow repo, export the key MinerU-related variables and start the backend service:
547+
548+
```bash
549+
# in RAGFlow repo
550+
export MINERU_EXECUTABLE="$HOME/uv_tools/.venv/bin/mineru"
551+
export MINERU_DELETE_OUTPUT=0 # keep output directory
552+
export MINERU_BACKEND=pipeline # or another backend you prefer
553+
554+
source .venv/bin/activate
555+
export PYTHONPATH=$(pwd)
556+
bash docker/launch_backend_service.sh
557+
```
558+
559+
- **Docker deployment** – after setting `USE_MINERU=true`, restart the containers so that the new settings take effect:
560+
561+
```bash
562+
# in RAGFlow repo
563+
docker compose -f docker/docker-compose.yml restart
564+
```
565+
566+
3. In the web UI, navigate to the **Configuration** page of your dataset. Click **Built-in** in the **Ingestion pipeline** section, select a chunking method from the **Built-in** dropdown (which supports PDF parsing), and select **MinerU** in **PDF parser**.
567+
4. If you use a custom ingestion pipeline instead, you must also complete the first two steps before selecting **MinerU** in the **Parsing method** section of the **Parser** component.
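
Before selecting **MinerU** in the web UI, it can help to confirm that the executable RAGFlow will call is actually runnable. The following is a minimal sketch, not part of RAGFlow: `check_mineru_executable` is a hypothetical helper name, and it relies only on the documented `MINERU_EXECUTABLE` variable and its default value `mineru`.

```bash
# Hypothetical helper (not shipped with RAGFlow or MinerU): report whether
# the MinerU executable that RAGFlow would call can be found and executed.
check_mineru_executable() {
  # Fall back to the documented default, `mineru`, when the variable is unset.
  local exe="${MINERU_EXECUTABLE:-mineru}"
  local found=1
  case "$exe" in
    # An explicit path must point to an executable file.
    */*) [ -x "$exe" ] && found=0 ;;
    # A bare command name must be resolvable through PATH.
    *) command -v "$exe" >/dev/null 2>&1 && found=0 ;;
  esac
  if [ "$found" -eq 0 ]; then
    echo "ok: $exe"
  else
    echo "missing: $exe"
    return 1
  fi
}
```

Calling `check_mineru_executable` in the same shell (or container) that runs the RAGFlow backend prints `ok: <path>` when the executable is usable and returns a non-zero status otherwise.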
538568

539569
---
540570

541571
### How to configure MinerU-specific settings?
542572

543-
1. Set `MINERU_EXECUTABLE` (default: `mineru`) to the path to the MinerU executable.
544-
2. Set `MINERU_DELETE_OUTPUT` to `0` to keep MinerU's output. (Default: `1`, which deletes temporary output)
545-
3. Set `MINERU_OUTPUT_DIR` to specify the output directory for MinerU.
573+
The table below summarizes the most commonly used MinerU-related environment variables:
574+
575+
| Environment variable | Description | Default | Example |
576+
| ---------------------- | ---------------------------------- | ----------------------------------- | ----------------------------------------------------------------------------------------------- |
577+
| `MINERU_EXECUTABLE` | Path to the local MinerU executable | `mineru` | `MINERU_EXECUTABLE=/home/ragflow/uv_tools/.venv/bin/mineru` |
578+
| `MINERU_DELETE_OUTPUT` | Whether to delete MinerU output directory | `1` (do **not** keep the output directory) | `MINERU_DELETE_OUTPUT=0` |
579+
| `MINERU_OUTPUT_DIR` | Directory for MinerU output files | System-defined temporary directory | `MINERU_OUTPUT_DIR=/home/ragflow/mineru/output` |
580+
| `MINERU_BACKEND` | MinerU parsing backend | `pipeline` | `MINERU_BACKEND=pipeline\|vlm-transformers\|vlm-vllm-engine\|vlm-http-client` |
581+
| `MINERU_SERVER_URL` | URL of remote vLLM server (only for `vlm-http-client` backend) | _unset_ | `MINERU_SERVER_URL=http://your-vllm-server-ip:30000` |
582+
| `MINERU_APISERVER` | URL of remote MinerU service used as the parser (instead of local MinerU) | _unset_ | `MINERU_APISERVER=http://your-mineru-server:port` |
583+
584+
1. Set `MINERU_EXECUTABLE` to the path to the MinerU executable if the default `mineru` is not on `PATH`.
585+
2. Set `MINERU_DELETE_OUTPUT` to `0` to keep MinerU's output. (Default: `1`, which deletes temporary output.)
586+
3. Set `MINERU_OUTPUT_DIR` to specify the output directory for MinerU (otherwise a system temp directory is used).
546587
4. Set `MINERU_BACKEND` to specify a parsing backend:
547588
- `"pipeline"` (default): The traditional multimodel pipeline.
548589
- `"vlm-transformers"`: A vision-language model using HuggingFace Transformers.
549-
- `"vlm-vllm-engine"`: A vision-language model using local vLLM engine (requires a local GPU).
550-
- `"vlm-http-client"`: A vision-language model via HTTP client to remote vLLM server (RAGFlow only requires CPU).
590+
- `"vlm-vllm-engine"`: A vision-language model using a local vLLM engine (requires a local GPU).
591+
- `"vlm-http-client"`: A vision-language model via HTTP client to a remote vLLM server (RAGFlow only requires CPU).
551592
5. If using the `"vlm-http-client"` backend, you must also set `MINERU_SERVER_URL` to the URL of your vLLM server.
593+
6. If you want RAGFlow to call a **remote MinerU service** (instead of a MinerU process running locally with RAGFlow), set `MINERU_APISERVER` to the URL of the remote MinerU server.
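
The rules in this list can be checked before starting RAGFlow. The sketch below is illustrative only: `validate_mineru_env` is a hypothetical helper, not something RAGFlow provides. It assumes the four backend names listed above are the only valid values and that an unset `MINERU_BACKEND` means `pipeline`.

```bash
# Hypothetical helper: validate the MinerU-related variables described above.
validate_mineru_env() {
  # An unset backend falls back to the documented default, `pipeline`.
  local backend="${MINERU_BACKEND:-pipeline}"
  case "$backend" in
    pipeline|vlm-transformers|vlm-vllm-engine)
      ;;
    vlm-http-client)
      # This backend delegates parsing to a remote vLLM server, so a URL is required.
      if [ -z "${MINERU_SERVER_URL:-}" ]; then
        echo "error: MINERU_SERVER_URL must be set for vlm-http-client"
        return 1
      fi
      ;;
    *)
      echo "error: unknown MINERU_BACKEND '$backend'"
      return 1
      ;;
  esac
  echo "ok: backend=$backend"
}
```

Running `validate_mineru_env` after exporting the variables reports the first problem it finds, which is cheaper than discovering a misconfigured backend during a parsing job.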
552594

553595
:::tip NOTE
554596
For information about other environment variables natively supported by MinerU, see [here](https://opendatalab.github.io/MinerU/usage/cli_tools/#environment-variables-description).
@@ -561,16 +603,18 @@ For information about other environment variables natively supported by MinerU,
561603
RAGFlow supports MinerU's `vlm-http-client` backend, enabling you to delegate document parsing tasks to a remote vLLM server. With this configuration, RAGFlow will connect to your remote vLLM server as a client and use its powerful GPU resources for document parsing. This significantly improves performance for parsing complex documents while reducing the resources required on your RAGFlow server. To configure MinerU with a vLLM server:
562604

563605
1. Set up a vLLM server running MinerU:
606+
564607
```bash
565608
mineru-vllm-server --port 30000
566609
```
567610

568-
2. Configure the following environment variables in your **docker/.env** file:
569-
- `MINERU_EXECUTABLE=/ragflow/uv_tools/.venv/bin/mineru` (or the path to your MinerU executable)
611+
2. Configure the following environment variables in your **docker/.env** file (or your shell if running from source):
612+
613+
- `MINERU_EXECUTABLE=/home/ragflow/uv_tools/.venv/bin/mineru` (or the path to your MinerU executable)
570614
- `MINERU_BACKEND="vlm-http-client"`
571615
- `MINERU_SERVER_URL="http://your-vllm-server-ip:30000"`
572616

573-
3. Complete the rest standard MinerU setup steps as described [here](#how-to-configure-mineru-specific-settings).
617+
3. Complete the rest of the standard MinerU setup steps as described [here](#how-to-configure-mineru-specific-settings).
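
As a rough illustration of step 2, the two backend-related variables can be generated for a given server. `print_vlm_http_client_env` is a hypothetical helper, and the default port of 30000 is taken from the `mineru-vllm-server` example in step 1. `MINERU_EXECUTABLE` is left out because its path depends on your installation.

```bash
# Hypothetical helper: print the backend variables from step 2.
# Arguments: $1 = vLLM server host or IP, $2 = port (defaults to 30000).
print_vlm_http_client_env() {
  local host="$1"
  local port="${2:-30000}"
  printf 'MINERU_BACKEND="vlm-http-client"\n'
  printf 'MINERU_SERVER_URL="http://%s:%s"\n' "$host" "$port"
}
```

For example, `print_vlm_http_client_env your-vllm-server-ip >> docker/.env` appends both lines to the Docker environment file. Review the file afterwards to make sure the variables are not defined twice.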
574618

575619
:::tip NOTE
576620
When using the `vlm-http-client` backend, the RAGFlow server requires no GPU, only network connectivity. This enables cost-effective distributed deployment with multiple RAGFlow instances sharing one remote vLLM server.
