See [here](./guides/agent/best_practices/accelerate_agent_question_answering.md).

---

### How to use MinerU to parse PDF documents?
MinerU PDF document parsing is available starting from v0.21.1. RAGFlow supports MinerU (>= 2.6.3) as an optional PDF parser with multiple backends. RAGFlow itself only acts as a client: it calls MinerU to parse documents, reads the output files, and ingests the parsed content into RAGFlow. To use this feature, follow these steps:
1. **Prepare MinerU**

   - **If you run RAGFlow from source**, install MinerU into an isolated virtual environment (recommended path: `$HOME/uv_tools`):
   - **If you run RAGFlow with Docker**, you usually only need to turn on MinerU support in `docker/.env`:

     ```bash
     # docker/.env
     ...
     USE_MINERU=true
     ...
     ```

   Enabling `USE_MINERU=true` internally performs the same setup as the manual configuration (including setting the MinerU executable path and related environment variables). You only need the manual installation above if you run RAGFlow from source or want full control over the MinerU installation.
2. **Start RAGFlow with MinerU enabled**

   - **Source deployment** – in the RAGFlow repo, export the key MinerU-related variables and start the backend service:

3. In the web UI, navigate to the **Configuration** page of your dataset. Click **Built-in** in the **Ingestion pipeline** section, select a chunking method that supports PDF parsing from the **Built-in** dropdown, and select **MinerU** in **PDF parser**.
4. If you use a custom ingestion pipeline instead, you must also complete the first two steps before selecting **MinerU** in the **Parsing method** section of the **Parser** component.
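
For a source deployment, the variables exported in step 2 might look like the following minimal sketch. It assumes MinerU was installed into a virtual environment under the recommended `$HOME/uv_tools` path, so that the executable is `$HOME/uv_tools/.venv/bin/mineru`; the last two lines only restate the documented defaults. Adjust the values to your own installation before starting the backend service.

```bash
# Sketch only: assumes MinerU lives in $HOME/uv_tools/.venv (the recommended path).
export MINERU_EXECUTABLE="$HOME/uv_tools/.venv/bin/mineru"
# Optional: these values are the documented defaults, shown here for clarity.
export MINERU_BACKEND=pipeline
export MINERU_DELETE_OUTPUT=1
```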
---
### How to configure MinerU-specific settings?
The table below summarizes the most commonly used MinerU-related environment variables:
| Environment variable | Description | Default | Example |
| --- | --- | --- | --- |
| `MINERU_SERVER_URL` | URL of a remote vLLM server (only for the `vlm-http-client` backend) | _unset_ | `MINERU_SERVER_URL=http://your-vllm-server-ip:30000` |
| `MINERU_APISERVER` | URL of a remote MinerU service used as the parser (instead of local MinerU) | _unset_ | `MINERU_APISERVER=http://your-mineru-server:port` |

1. Set `MINERU_EXECUTABLE` to the path of the MinerU executable if the default `mineru` is not on `PATH`.
2. Set `MINERU_DELETE_OUTPUT` to `0` to keep MinerU's output. (Default: `1`, which deletes temporary output.)
3. Set `MINERU_OUTPUT_DIR` to specify the output directory for MinerU (otherwise a system temp directory is used).
4. Set `MINERU_BACKEND` to specify a parsing backend:
   - `"pipeline"` (default): The traditional multi-model pipeline.
   - `"vlm-transformers"`: A vision-language model using Hugging Face Transformers.
   - `"vlm-vllm-engine"`: A vision-language model using a local vLLM engine (requires a local GPU).
   - `"vlm-http-client"`: A vision-language model accessed via an HTTP client on a remote vLLM server (RAGFlow only requires CPU).
5. If you use the `"vlm-http-client"` backend, you must also set `MINERU_SERVER_URL` to the URL of your vLLM server.
6. If you want RAGFlow to call a **remote MinerU service** (instead of a MinerU process running locally with RAGFlow), set `MINERU_APISERVER` to the URL of the remote MinerU server.
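
The rules above can be checked before RAGFlow starts. The Bash function below is a hypothetical helper (`check_mineru_env` is not part of RAGFlow or MinerU) that only encodes the constraints listed in this section. It additionally assumes that when `MINERU_APISERVER` points at a remote MinerU service, no local MinerU executable is required.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of RAGFlow or MinerU): checks the
# MinerU-related variables described above before RAGFlow starts.
check_mineru_env() {
  local backend="${MINERU_BACKEND:-pipeline}"

  # MINERU_BACKEND must be one of the four documented backends.
  case "$backend" in
    pipeline | vlm-transformers | vlm-vllm-engine | vlm-http-client) ;;
    *)
      echo "Unknown MINERU_BACKEND: $backend" >&2
      return 1
      ;;
  esac

  # The vlm-http-client backend needs the URL of the remote vLLM server.
  if [ "$backend" = "vlm-http-client" ] && [ -z "${MINERU_SERVER_URL:-}" ]; then
    echo "MINERU_SERVER_URL is required for vlm-http-client" >&2
    return 1
  fi

  # This helper only accepts the two documented values:
  # 1 (default) deletes temporary output, 0 keeps it.
  case "${MINERU_DELETE_OUTPUT:-1}" in
    0 | 1) ;;
    *)
      echo "MINERU_DELETE_OUTPUT must be 0 or 1" >&2
      return 1
      ;;
  esac

  # Assumption: with a remote MinerU service (MINERU_APISERVER),
  # no local MinerU executable is needed.
  if [ -z "${MINERU_APISERVER:-}" ]; then
    local exe="${MINERU_EXECUTABLE:-mineru}"
    if ! command -v "$exe" >/dev/null 2>&1; then
      echo "MinerU executable not found: $exe" >&2
      return 1
    fi
  fi
  return 0
}
```

For example, running `MINERU_BACKEND=vlm-http-client check_mineru_env` without `MINERU_SERVER_URL` prints an error and returns a non-zero status.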
:::tip NOTE
For information about other environment variables natively supported by MinerU, see [here](https://opendatalab.github.io/MinerU/usage/cli_tools/#environment-variables-description).
:::

RAGFlow supports MinerU's `vlm-http-client` backend, enabling you to delegate document parsing tasks to a remote vLLM server. With this configuration, RAGFlow will connect to your remote vLLM server as a client and use its powerful GPU resources for document parsing. This significantly improves performance for parsing complex documents while reducing the resources required on your RAGFlow server. To configure MinerU with a vLLM server:
1. Set up a vLLM server running MinerU:

   ```bash
   mineru-vllm-server --port 30000
   ```

2. Configure the following environment variables in your **docker/.env** file (or your shell if running from source):

   - `MINERU_EXECUTABLE=/home/ragflow/uv_tools/.venv/bin/mineru` (or the path to your MinerU executable)
3. Complete the rest of the standard MinerU setup steps as described [here](#how-to-configure-mineru-specific-settings).
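
Putting these steps together, a **docker/.env** fragment for this setup might look like the sketch below. The host `your-vllm-server-ip` is a placeholder, and port `30000` simply matches the `mineru-vllm-server --port 30000` command above; replace both with your own values.

```bash
# docker/.env (illustrative values; replace host and port with your own)
MINERU_EXECUTABLE=/home/ragflow/uv_tools/.venv/bin/mineru
MINERU_BACKEND=vlm-http-client
MINERU_SERVER_URL=http://your-vllm-server-ip:30000
```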
:::tip NOTE
When using the `vlm-http-client` backend, the RAGFlow server requires no GPU, only network connectivity. This enables cost-effective distributed deployment with multiple RAGFlow instances sharing one remote vLLM server.