README.md: 1 line changed (1 addition & 0 deletions)
@@ -202,6 +202,7 @@ It defines an index flow like this:
|[Custom Output Files](examples/custom_output_files)| Convert markdown files to HTML files and save them to a local directory, using *CocoIndex Custom Targets*|
|[Patient intake form extraction](examples/patient_intake_extraction)| Use LLM to extract structured data from patient intake forms with different formats |
|[HackerNews Trending Topics](examples/hn_trending_topics)| Extract trending topics from HackerNews threads and comments, using *CocoIndex Custom Source* and LLM |
|[Patient Intake Form Extraction with BAML](examples/patient_intake_extraction_baml)| Extract structured data from patient intake forms using BAML |
We appreciate a star ⭐ at [CocoIndex GitHub](https://github.com/cocoindex-io/cocoindex) if this is helpful.

This example shows how to use [BAML](https://boundaryml.com/) to extract structured data from patient intake PDFs. BAML provides type-safe structured data extraction with native PDF support.

- **BAML Schema** (`baml_src/patient.baml`) - Defines the data structure and extraction function
- **CocoIndex Flow** (`main.py`) - Wraps BAML in a custom function and defines the flow that processes files incrementally
## Prerequisites

1. [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.

2. Install dependencies

    ```sh
    pip install -U cocoindex baml-py
    ```

3. **Generate BAML client code** (required step!)

    ```sh
    baml generate
    ```

    This generates the `baml_client/` directory with Python code to call your BAML functions.

4. Create a `.env` file. You can copy it from `.env.example` first:

    ```sh
    cp .env.example .env
    ```

    Then edit the file to fill in your `GEMINI_API_KEY`.
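Because a skipped `baml generate` step or an empty `GEMINI_API_KEY` only shows up once the flow runs, a small preflight check can catch both earlier. The script below is a minimal sketch and is not part of this example's code: the helper names `read_env_file` and `preflight` are hypothetical, and the parser assumes the `.env` file uses plain `KEY=VALUE` lines. It uses only the Python standard library.

```python
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Assumption: the .env file uses plain KEY=VALUE syntax (no `export`,
    no multi-line values).
    """
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def preflight(project_dir: Path) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems: list[str] = []

    # Step 3 above: the generated client directory must exist.
    if not (project_dir / "baml_client").is_dir():
        problems.append("baml_client/ is missing; run `baml generate` first")

    # Step 4 above: .env must exist and carry a non-empty GEMINI_API_KEY.
    env_path = project_dir / ".env"
    if not env_path.is_file():
        problems.append(".env is missing; copy it from .env.example")
    elif not read_env_file(env_path).get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is empty or absent in .env")

    return problems


if __name__ == "__main__":
    for problem in preflight(Path.cwd()):
        print(f"[preflight] {problem}")
```

Saved under any name in the example directory and run with `python`, it prints one line per problem and nothing when both checks pass. It does not check Postgres or validate the key itself.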
## Run

Update index:

```sh
cocoindex update main
```
## CocoInsight

I used CocoInsight (free beta now) to troubleshoot the index generation and understand the data lineage of the pipeline. It only connects to your local CocoIndex server, with zero pipeline data retention. Run the following command to start CocoInsight:

```sh
cocoindex server -ci main
```

Then open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight).