Commit 6962f87

1 parent ef74c44 commit 6962f87

1 file changed: +5 -5 lines changed

docs/getting-started/example-datasets/hacker-news-vector-search.md

Lines changed: 5 additions & 5 deletions
@@ -16,7 +16,7 @@ real world vector search application built on top of user generated, textual dat

 ## Dataset details {#dataset-details}

-The complete dataset with vector embeddings is made available by ClickHouse as a single `Parquet` file in a `S3` bucket : https://clickhouse-datasets.s3.amazonaws.com/hackernews-miniLM/hackernews_part_1_of_1.parquet
+The complete dataset with vector embeddings is made available by ClickHouse as a single `Parquet` file in a [S3 bucket](https://clickhouse-datasets.s3.amazonaws.com/hackernews-miniLM/hackernews_part_1_of_1.parquet)

 We recommend users first run a sizing exercise to estimate the storage and memory requirements for this dataset by referring to the [documentation](../../engines/table-engines/mergetree-family/annindexes.md).

@@ -63,11 +63,11 @@ To load the dataset from the `Parquet` file, run the following SQL statement:
 INSERT INTO hackernews SELECT * FROM s3('https://clickhouse-datasets.s3.amazonaws.com/hackernews-miniLM/hackernews_part_1_of_1.parquet');
 ```

-The loading of 28.74 million rows into the table will take a few minutes.
+Inserting 28.74 million rows into the table will take a few minutes.

 ### Build a vector similarity index {#build-vector-similarity-index}

-Run the following SQL to define and build a vector similarity index on the `vector` column of the `hackernews` table :
+Run the following SQL to define and build a vector similarity index on the `vector` column of the `hackernews` table:

 ```sql
 ALTER TABLE hackernews ADD INDEX vector_index vector TYPE vector_similarity('hnsw', 'cosineDistance', 384, 'bf16', 64, 512);
@@ -218,7 +218,7 @@ A very simple but high potential generative AI example application is presented
 The application performs the following steps:

 1. Accepts a _topic_ as input from the user
-2. Generates an embedding vector for the _topic_ by using `SentenceTransformers` with model `all-MiniLM-L6-v2`
+2. Generates an embedding vector for the _topic_ by using the `SentenceTransformers` with model `all-MiniLM-L6-v2`
 3. Retrieves highly relevant posts/comments using vector similarity search on the `hackernews` table
 4. Uses `LangChain` and OpenAI `gpt-3.5-turbo` Chat API to **summarize** the content retrieved in step #3.
 The posts/comments retrieved in step #3 are passed as _context_ to the Chat API and are the key link in Generative AI.
@@ -256,7 +256,7 @@ as a powerful tool for real-time data processing, analytics, and handling large
 efficiently, gaining popularity for its impressive performance and cost-effectiveness.
 ```

-Code for above application :
+Code for the above application :

 ```python
 print("Initializing...")
