docs/getting-started/example-datasets/hacker-news-vector-search.md
5 additions & 5 deletions
@@ -16,7 +16,7 @@ real world vector search application built on top of user generated, textual dat
## Dataset details {#dataset-details}
-The complete dataset with vector embeddings is made available by ClickHouse as a single `Parquet` file in a `S3` bucket : https://clickhouse-datasets.s3.amazonaws.com/hackernews-miniLM/hackernews_part_1_of_1.parquet
+The complete dataset with vector embeddings is made available by ClickHouse as a single `Parquet` file in a [S3 bucket](https://clickhouse-datasets.s3.amazonaws.com/hackernews-miniLM/hackernews_part_1_of_1.parquet)
We recommend users first run a sizing exercise to estimate the storage and memory requirements for this dataset by referring to the [documentation](../../engines/table-engines/mergetree-family/annindexes.md).
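
As a rough illustration of such a sizing exercise, the sketch below estimates the memory needed for the quantized vectors and for the HNSW graph. It assumes the figures that appear elsewhere in this diff (28.74 million rows, 384 dimensions, `bf16` quantization, 64 maximum connections per layer). The graph formula and its two constants (4 bytes per node id, a layer repetition factor of 2) are assumptions recalled from the linked index documentation and should be verified there before relying on the numbers.

```python
# Back-of-the-envelope sizing for the vector index on the hackernews table.
# All inputs are taken from this document except the two graph constants,
# which are ASSUMPTIONS to be checked against the linked documentation.

NUM_VECTORS = 28_740_000        # rows inserted into the table
DIMENSIONS = 384                # embedding size used in the index definition
BYTES_PER_BF16 = 2              # 'bf16' quantization stores 2 bytes per value
HNSW_MAX_CONNECTIONS = 64       # fifth argument of vector_similarity(...)
BYTES_PER_NODE_ID = 4           # ASSUMPTION: size of one graph node id
LAYER_REPETITION_FACTOR = 2     # ASSUMPTION: overhead of upper HNSW layers


def vector_memory_bytes(num_vectors: int, dimensions: int, bytes_per_value: int) -> int:
    """Memory taken by the quantized vectors themselves."""
    return num_vectors * dimensions * bytes_per_value


def graph_memory_bytes(
    num_vectors: int,
    max_connections: int,
    bytes_per_node_id: int = BYTES_PER_NODE_ID,
    repetition_factor: int = LAYER_REPETITION_FACTOR,
) -> int:
    """Memory taken by the HNSW graph links (assumed formula, see above)."""
    return num_vectors * max_connections * bytes_per_node_id * repetition_factor


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


vectors = vector_memory_bytes(NUM_VECTORS, DIMENSIONS, BYTES_PER_BF16)
graph = graph_memory_bytes(NUM_VECTORS, HNSW_MAX_CONNECTIONS)

print(f"vectors: {to_gib(vectors):.2f} GiB")
print(f"graph:   {to_gib(graph):.2f} GiB")
print(f"total:   {to_gib(vectors + graph):.2f} GiB")
```

Keeping the inputs as named constants makes it easy to see how a different quantization or connection count would change the estimate.
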
@@ -63,11 +63,11 @@ To load the dataset from the `Parquet` file, run the following SQL statement:
INSERT INTO hackernews SELECT * FROM s3('https://clickhouse-datasets.s3.amazonaws.com/hackernews-miniLM/hackernews_part_1_of_1.parquet');
```
-The loading of 28.74 million rows into the table will take a few minutes.
+Inserting 28.74 million rows into the table will take a few minutes.
### Build a vector similarity index {#build-vector-similarity-index}
-Run the following SQL to define and build a vector similarity index on the `vector` column of the `hackernews` table:
+Run the following SQL to define and build a vector similarity index on the `vector` column of the `hackernews` table:
```sql
ALTER TABLE hackernews ADD INDEX vector_index vector TYPE vector_similarity('hnsw', 'cosineDistance', 384, 'bf16', 64, 512);
@@ -218,7 +218,7 @@ A very simple but high potential generative AI example application is presented
The application performs the following steps:
1. Accepts a _topic_ as input from the user
-2. Generates an embedding vector for the _topic_ by using `SentenceTransformers` with model `all-MiniLM-L6-v2`
+2. Generates an embedding vector for the _topic_ by using the `SentenceTransformers` with model `all-MiniLM-L6-v2`
3. Retrieves highly relevant posts/comments using vector similarity search on the `hackernews` table
4. Uses `LangChain` and OpenAI `gpt-3.5-turbo` Chat API to **summarize** the content retrieved in step #3.
The posts/comments retrieved in step #3 are passed as _context_ to the Chat API and are the key link in Generative AI.
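
The retrieval in step 3 can be sketched as a small helper that turns an already computed embedding into a similarity query. This is a minimal sketch, not the application's actual code: the function name `build_similarity_query`, the selected columns `id` and `text`, and the default `LIMIT` are hypothetical. Only the `hackernews` table, its `vector` column, the 384 dimensions, and `cosineDistance` come from this document.

```python
import math
from typing import Sequence

EMBEDDING_DIMENSIONS = 384  # matches the dimension in the index definition


def build_similarity_query(embedding: Sequence[float], limit: int = 20) -> str:
    """Build a SQL string that returns the rows closest to `embedding`.

    The column names `id` and `text` are ASSUMPTIONS for illustration;
    adjust them to the real table schema.
    """
    if len(embedding) != EMBEDDING_DIMENSIONS:
        raise ValueError(
            f"expected {EMBEDDING_DIMENSIONS} dimensions, got {len(embedding)}"
        )
    if limit <= 0:
        raise ValueError("limit must be positive")
    if not all(math.isfinite(x) for x in embedding):
        raise ValueError("embedding must contain only finite numbers")

    # Render the embedding as an array literal, e.g. [0.1, -0.2, ...].
    vector_literal = "[" + ", ".join(repr(float(x)) for x in embedding) + "]"

    return (
        "SELECT id, text "  # hypothetical columns
        "FROM hackernews "
        f"ORDER BY cosineDistance(vector, {vector_literal}) ASC "
        f"LIMIT {int(limit)}"
    )


# Example: a placeholder all-zero embedding stands in for the model output.
query = build_similarity_query([0.0] * EMBEDDING_DIMENSIONS, limit=5)
```

In a real application the embedding would come from the model named in step 2, and the query would preferably be sent with bound parameters rather than an inlined literal.
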
@@ -256,7 +256,7 @@ as a powerful tool for real-time data processing, analytics, and handling large
efficiently, gaining popularity for its impressive performance and cost-effectiveness.