
Commit 0c6a0b0

vector search in nodejs tutorial - draft 1 completed
1 parent 78886dd commit 0c6a0b0

File tree

1 file changed: +60 −51 lines


docs/howtos/solutions/vector/getting-started-vector/index-getting-started-vector.mdx

Lines changed: 60 additions & 51 deletions
@@ -46,19 +46,19 @@ In a more complex scenario, like natural language processing (NLP), words or ent
Vector similarity is a measure that quantifies how alike two vectors are, typically by evaluating the `distance` or `angle` between them in a multi-dimensional space.
When vectors represent data points, such as texts or images, the similarity score can indicate how similar the underlying data points are in terms of their features or content.

-### Use cases for vector similarity:
+### Use cases for vector similarity

- **Recommendation Systems**: If you have vectors representing user preferences or item profiles, you can quickly find items that are most similar to a user's preference vector.
- **Image Search**: Store vectors representing image features, and then retrieve images most similar to a given image's vector.
- **Textual Content Retrieval**: Store vectors representing textual content (e.g., articles, product descriptions) and find the most relevant texts for a given query vector.

## How to calculate vector similarity?

-There are several ways to calculate vector similarity, but some of the most common methods include:
+Several techniques are available to assess vector similarity, with some of the most prevalent ones being:

### Euclidean Distance (L2 norm)

-**Euclidean Distance (L2 norm)** computes the "straight line" distance between two points in a multi-dimensional space. Lower values indicate closer proximity, and hence higher similarity.
+**Euclidean Distance (L2 norm)** calculates the linear distance between two points within a multi-dimensional space. Lower values indicate closer proximity, and hence higher similarity.

<img
src={EuclideanDistanceFormulaImage}
@@ -67,7 +67,7 @@ There are several ways to calculate vector similarity, but some of the most comm
className="margin-bottom--md"
/>

-Lets consider `product 1` and `product 2` from above ecommerce dataset and calculate `Euclidean Distance` between those products with all features.
+For illustration purposes, let's assess `product 1` and `product 2` from the earlier ecommerce dataset and determine the `Euclidean Distance` considering all features.

<img
src={EuclideanDistanceSampleImage}
@@ -76,7 +76,7 @@ Lets consider `product 1` and `product 2` from above ecommerce dataset and calcu
className="margin-bottom--md"
/>

-For the purpose of this demonstration, We will consider 2D chart built using [chart.js](https://www.chartjs.org/) of `Price vs. Quality` features (rather all features with multi dimensional chart) for our products and calculate `Euclidean Distance` between products with Price & Quality features only.
+As an example, we will use a 2D chart made with [chart.js](https://www.chartjs.org/) comparing the `Price vs. Quality` features of our products, focusing solely on these two attributes to compute the `Euclidean Distance`.

![chart](./images/euclidean-distance-chart.png)
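
For a concrete feel for the formula, here is a minimal TypeScript sketch of the two-feature case plotted above (the price/quality numbers are illustrative placeholders, not values from the tutorial's dataset):

```ts
// Euclidean (L2) distance between two products using only price & quality features.
type Features = { price: number; quality: number };

const euclideanDistance = (a: Features, b: Features) =>
  Math.sqrt((a.price - b.price) ** 2 + (a.quality - b.quality) ** 2);

console.log(euclideanDistance({ price: 20, quality: 3 }, { price: 60, quality: 5 }));
// => ~40.05 — a smaller value would mean the two products are more similar
```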

@@ -91,13 +91,16 @@ For the purpose of this demonstration, We will consider 2D chart built using [ch
className="margin-bottom--md"
/>

-Note: If two vectors are pointing in the same direction, the cosine of the angle between them is 1. If they're orthogonal, the cosine is 0, and if they're pointing in opposite directions, the cosine is -1.
+:::note
+If two vectors are pointing in the same direction, the cosine of the angle between them is 1. If they're orthogonal, the cosine is 0, and if they're pointing in opposite directions, the cosine is -1.
+:::

-Lets consider same `product 1` and `product 2` from above ecommerce dataset and calculate `Cosine Distance` between those products with all features.
+Again, consider `product 1` and `product 2` from the previous dataset and calculate the `Cosine Distance` for all features.

![sample](./images/cosine-sample.png)

-Lets consider 2D chart built using [chart.js](https://www.chartjs.org/) of `Price vs. Quality` features for our products and visualize `Cosine Similarity` between products with Price & Quality features only.
+Using [chart.js](https://www.chartjs.org/), we've crafted a 2D chart of `Price vs. Quality` features. It visualizes the `Cosine Similarity` solely based on these attributes.
+
![chart](./images/cosine-chart.png)
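
As a quick reference, here is a minimal TypeScript sketch of the underlying arithmetic (generic vectors, not tied to the tutorial's dataset):

```ts
// Dot product and magnitude are the building blocks of cosine similarity.
const dot = (a: number[], b: number[]) => a.reduce((sum, x, i) => sum + x * b[i], 0);
const magnitude = (a: number[]) => Math.sqrt(dot(a, a));

// 1 = same direction, 0 = orthogonal, -1 = opposite direction
const cosineSimilarity = (a: number[], b: number[]) =>
  dot(a, b) / (magnitude(a) * magnitude(b));

// Vector databases usually report cosine *distance*, i.e. 1 - similarity
const cosineDistance = (a: number[], b: number[]) => 1 - cosineSimilarity(a, b);

console.log(cosineSimilarity([20, 3], [60, 5])); // close to 1: the two points roughly share a direction
```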

### Inner Product
@@ -110,25 +113,27 @@ Lets consider 2D chart built using [chart.js](https://www.chartjs.org/) of `Pric
width="450"
className="margin-bottom--md"
/>
-Note: The inner product can be thought of as a measure of how much two vectors "align"
+
+:::note
+The inner product can be thought of as a measure of how much two vectors "align"
in a given vector space. Higher values indicate higher similarity. However, the raw
values can be large for long vectors; hence, normalization is recommended for better
-interpretation. If the vectors are normalized, their dot product will be `1 if they
-are identical` and `0 if they are orthogonal` (uncorrelated).
+interpretation. If the vectors are normalized, their dot product will be `1 if they are identical` and `0 if they are orthogonal` (uncorrelated).
+:::

-Lets consider same `product 1` and `product 2` from above ecommerce dataset and calculate `inner Product` between those products with all features.
+Considering our `product 1` and `product 2`, let's compute the `Inner Product` across all features.

![sample](./images/ip-sample.png)
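
For completeness, a small TypeScript sketch of the raw and normalized inner product (generic example vectors, not the tutorial's data):

```ts
// Raw inner (dot) product of two vectors.
const innerProduct = (a: number[], b: number[]) => a.reduce((sum, x, i) => sum + x * b[i], 0);

// Normalizing first makes the inner product behave like cosine similarity.
const normalize = (a: number[]) => {
  const norm = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  return a.map((x) => x / norm);
};

const a = [1, 2, 3];
console.log(innerProduct(a, a)); // 14 — raw values grow with vector magnitude
console.log(innerProduct(normalize(a), normalize(a))); // 1 — identical normalized vectors
```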

-:::note
+:::tip
Vectors can also be stored in databases in **binary formats** to save space. In practical applications, it's crucial to strike a balance between the dimensionality of the vectors (which impacts storage and computational costs) and the quality or granularity of the information they capture.
:::

## Generating vectors

-In our context, we're interested in generating sentence (product description) and image (product image) embeddings/ vector.There are many AI model repositories (say like GitHub) where pre-trained AI models are stored, maintained, and shared with the public.
+In our scenario, our focus revolves around generating sentence (product description) and image (product image) embeddings or vectors. There's an abundance of AI model repositories, like GitHub, where AI models are pre-trained, maintained, and shared.

-Let's use one of model from [Hugging Face Model Hub](https://huggingface.co/models) for sentence embeddings and from [TensorFlow Hub](https://tfhub.dev/) for image embeddings for diversity.
+For sentence embeddings, we'll employ a model from [Hugging Face Model Hub](https://huggingface.co/models), and for image embeddings, one from [TensorFlow Hub](https://tfhub.dev/) will be leveraged for variety.

:::tip GITHUB CODE

@@ -139,29 +144,27 @@ git clone https://github.com/redis-developer/redis-vector-nodejs-solutions.git

### Sentence vector

-To obtain sentence embeddings, let's use a hugging face model called [Xenova/all-distilroberta-v1](https://huggingface.co/Xenova/all-distilroberta-v1) which is compatible version of [sentence-transformers/all-distilroberta-v1](https://huggingface.co/sentence-transformers/all-distilroberta-v1) for transformer.js with ONNX weights.
+To procure sentence embeddings, we'll make use of a Hugging Face model titled [Xenova/all-distilroberta-v1](https://huggingface.co/Xenova/all-distilroberta-v1). It's a compatible version of [sentence-transformers/all-distilroberta-v1](https://huggingface.co/sentence-transformers/all-distilroberta-v1) for transformers.js with ONNX weights.

-:::note
+:::info

-<u>
-[Hugging Face Transformers](https://huggingface.co/docs/transformers.js/index)
-</u> is a widely-used open-source library for Natural Language Processing (NLP) tasks.
-It provides an accessible and straightforward way to use many state-of-the-art NLP
-models
+[<u>Hugging Face Transformers</u>](https://huggingface.co/docs/transformers.js/index)
+is a renowned open-source tool for Natural Language Processing (NLP) tasks.
+It simplifies the use of cutting-edge NLP models.

-transformers.j library is essentially the JavaScript version of Hugging Face's popular Python library.
+The transformers.js library is essentially the JavaScript version of Hugging Face's popular Python library.

:::

-:::note
+:::info

-<u>[ONNX (Open Neural Network eXchange)](https://onnx.ai) </u> is an open standard
+[<u>ONNX (Open Neural Network eXchange)</u>](https://onnx.ai) is an open standard
that defines a common set of operators and a common file format to represent deep
learning models in a wide variety of frameworks, including PyTorch and TensorFlow

:::

-Below is a Node.js code sample that showcases how to generate vector embeddings for any sentence provided:
+Below, you'll find a Node.js code snippet that illustrates how to create vector embeddings for any provided `sentence`:

```sh
npm install @xenova/transformers
@@ -186,7 +189,7 @@ async function generateSentenceEmbeddings(_sentence): Promise<number[]> {
export { generateSentenceEmbeddings };
```
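
The body of `generateSentenceEmbeddings` is elided by the hunk above. A minimal sketch of one way to implement it with `@xenova/transformers` (the lazy pipeline caching is an assumption, not necessarily the tutorial's exact code):

```ts
import { pipeline } from '@xenova/transformers';

let extractor: any; // cache the pipeline so the ONNX model is only loaded once

async function generateSentenceEmbeddings(_sentence): Promise<number[]> {
  extractor ??= await pipeline('feature-extraction', 'Xenova/all-distilroberta-v1');

  // Mean-pool token embeddings and normalize to get a single 768-dimension sentence vector
  const output = await extractor(_sentence, { pooling: 'mean', normalize: true });
  return Array.from(output.data as Float32Array);
}

export { generateSentenceEmbeddings };
```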

-Please find vector output for a sample text
+Here's a glimpse of the vector output for a sample text:

```js title="sample output"
const embeddings = await generateSentenceEmbeddings('I Love Redis !');
@@ -203,9 +206,9 @@ console.log(embeddings);

### Image vector

-To obtain image embeddings, let's use a tensor flow model called [mobilenet](https://github.com/tensorflow/tfjs-models/tree/master/mobilenet)
+To obtain image embeddings, we'll leverage the [mobilenet](https://github.com/tensorflow/tfjs-models/tree/master/mobilenet) model from TensorFlow.

-Below is a Node.js code sample that showcases how to generate vector embeddings for any image provided:
+Below, you'll find a Node.js code snippet that illustrates how to create vector embeddings for any provided `image`:

```sh
npm i @tensorflow/tfjs @tensorflow/tfjs-node @tensorflow-models/mobilenet jpeg-js
@@ -259,11 +262,12 @@ async function generateImageEmbeddings(imagePath: string) {

<div>

-:::note
-We are using <u>[mobilenet model](https://github.com/tensorflow/tfjs-models/tree/master/mobilenet)</u> which is trained only on small <u>[set of image classes](https://github.com/tensorflow/tfjs-examples/blob/master/mobilenet/imagenet_classes.js)</u>. Selecting an image classification model depends on various factors, such as the dataset size, dataset diversity, computational resources, and the specific needs of the application. There are many image classification models like EfficientNet, ResNets, Vision Transformers (ViT)..etc which can be chosen from based on your requirements.
+:::tip Image classification model
+
+We are using the <u>[mobilenet model](https://github.com/tensorflow/tfjs-models/tree/master/mobilenet)</u>, which is trained only on a small <u>[set of image classes](https://github.com/tensorflow/tfjs-examples/blob/master/mobilenet/imagenet_classes.js)</u>. The choice of an image classification model depends on various factors, such as the dataset size, dataset diversity, computational resources, and the specific requirements of your application. There are various alternative image classification models, such as EfficientNet, ResNets, and Vision Transformers (ViT), that you can select based on your needs.
:::

-Please find vector output for a sample watch image
+Below is an illustration of the vector output for a sample watch image:

</div>
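
The implementation of `generateImageEmbeddings` is mostly elided by this diff. A minimal sketch of one possible approach with `@tensorflow-models/mobilenet` (using `tf.node.decodeImage` for brevity instead of the `jpeg-js` decoding the tutorial installs):

```ts
import * as tf from '@tensorflow/tfjs-node';
import * as mobilenet from '@tensorflow-models/mobilenet';
import { readFileSync } from 'node:fs';

async function generateImageEmbeddings(imagePath: string) {
  // Decode the image file into a 3-channel tensor
  const imageTensor = tf.node.decodeImage(readFileSync(imagePath), 3) as tf.Tensor3D;

  // Load mobilenet and grab the intermediate activation (true => embedding, not class scores)
  const model = await mobilenet.load();
  const inferred = model.infer(imageTensor, true);
  const vector = Array.from(await inferred.data());

  imageTensor.dispose();
  inferred.dispose();
  return vector;
}

export { generateImageEmbeddings };
```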
@@ -307,7 +311,7 @@ git clone https://github.com/redis-developer/redis-vector-nodejs-solutions.git

### Sample Data seeding

-Let's assume a simplified e-commerce scenario. consider below `products` JSON for vector search demonstration in this tutorial.
+For the purposes of this tutorial, let's consider a simplified e-commerce context. The `products` JSON provided offers a glimpse into the vector search functionalities we'll be discussing.

```js title="src/data.ts"
const products = [
@@ -354,7 +358,7 @@ const products = [
];
```

-Below is the sample code to add `products` data as JSON in Redis along with vectors of product description and product image.
+Below is the sample code to seed `products` data as JSON in Redis. The data also includes vectors of both product descriptions and images.

```js title="src/index.ts"
async function addProductWithEmbeddings(_products) {
@@ -383,17 +387,17 @@ async function addProductWithEmbeddings(_products) {
}
```
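
Most of `addProductWithEmbeddings` is elided by the hunk above. A rough sketch of what the seeding loop could look like with node-redis (the client setup, key prefix, import paths, and product field names are assumptions based on the surrounding text):

```ts
import { createClient } from 'redis';
// Embedding helpers from the earlier sections (paths are illustrative)
import { generateSentenceEmbeddings } from './text-vector-gen';
import { generateImageEmbeddings } from './image-vector-gen';

const client = createClient({ url: 'redis://localhost:6379' });

async function addProductWithEmbeddings(_products) {
  await client.connect();

  for (const product of _products) {
    // Compute one vector for the description and one for the image of each product
    const productDescriptionEmbeddings = await generateSentenceEmbeddings(product.productDescription);
    const productImageEmbeddings = await generateImageEmbeddings(product.imageURL);

    // Store the product and its vectors as a single JSON document
    await client.json.set(`products:${product.productId}`, '$', {
      ...product,
      productDescriptionEmbeddings,
      productImageEmbeddings,
    });
  }
}
```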

-Data view in RedisInsight
+You can observe products JSON data in RedisInsight:

![products data in RedisInsight](./images/products-data-gui.png)

:::tip
-Download <u>[RedisInsight](https://redis.com/redis-enterprise/redis-insight/)</u> to view your Redis data or to play with raw Redis commands in the workbench. learn more about <u>[RedisInsight in tutorials](/explore/redisinsight/)</u>
+Download <u>[RedisInsight](https://redis.com/redis-enterprise/redis-insight/)</u> to visually explore your Redis data or to engage with raw Redis commands in the workbench. Dive deeper into RedisInsight with these <u>[tutorials](/explore/redisinsight/)</u>.
:::

### Create vector index

-JSON fields must be indexed in Redis to perform search on them. Below implementation shows indexing different field types including vector fields like productDescriptionEmbeddings and productImageEmbeddings.
+For searches to be conducted on JSON fields in Redis, they must be indexed. The methodology below highlights the process of indexing different types of fields. This encompasses vector fields such as `productDescriptionEmbeddings` and `productImageEmbeddings`.

```ts title="src/redis-index.ts"
import {
@@ -515,15 +519,14 @@ const createRedisIndex = async () => {
};
```
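
The full `src/redis-index.ts` is elided here. A minimal sketch of a vector-capable index definition with node-redis (the index name, key prefix, and the 768 dimension are assumptions consistent with the rest of the tutorial):

```ts
import { createClient, SchemaFieldTypes, VectorAlgorithms } from 'redis';

const client = createClient({ url: 'redis://localhost:6379' });

const createRedisIndex = async () => {
  await client.connect();

  await client.ft.create(
    'idx:products',
    {
      '$.productDisplayName': { type: SchemaFieldTypes.TEXT, AS: 'productDisplayName' },
      '$.productDescriptionEmbeddings': {
        type: SchemaFieldTypes.VECTOR,
        AS: 'productDescriptionEmbeddings',
        ALGORITHM: VectorAlgorithms.FLAT, // or VectorAlgorithms.HNSW for larger datasets
        TYPE: 'FLOAT32',
        DIM: 768, // must match the embedding model's output size
        DISTANCE_METRIC: 'L2', // L2, IP or COSINE
        // INITIAL_CAP / BLOCK_SIZE can also be tuned here (see the note below)
      },
    },
    { ON: 'JSON', PREFIX: 'products:' },
  );
};
```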

-:::note FLAT VS HNSW indexing
-FLAT : When you index your vectors in a "FLAT" manner, you're essentially storing them as they are, without any additional structure or hierarchy. When you query against a FLAT index, the algorithm will perform a linear scan through all the vectors to find the most similar ones. This is a more accurate, but much slower and compute intensive approach (suitable for smaller dataset).
+:::info FLAT VS HNSW indexing
+FLAT: When vectors are indexed in a "FLAT" structure, they're stored in their original form without any added hierarchy. A search against a FLAT index will require the algorithm to scan each vector linearly to find the most similar matches. While this is accurate, it's computationally intensive and slower, making it ideal for smaller datasets.

-HNSW : (Hierarchical Navigable Small World) :
-HNSW is a graph-based method for indexing high-dimensional data. For bigger datasets it becomes slower to compare with every single vector in the index, so a probabilistic approach through the HNSW algorithm provides very fast search results (but sacrifices some accuracy)
+HNSW (Hierarchical Navigable Small World): HNSW is a graph-centric method tailored for indexing high-dimensional data. With larger datasets, linear comparisons against every vector in the index become time-consuming. HNSW employs a probabilistic approach, ensuring faster search results but with a slight trade-off in accuracy.
:::

-:::note INITIAL_CAP and BLOCK_SIZE parameters
-INITIAL_CAP and BLOCK_SIZE are configuration parameters related to how vectors are stored and indexed.
+:::info INITIAL_CAP and BLOCK_SIZE parameters
+Both INITIAL_CAP and BLOCK_SIZE are configuration parameters that control how vectors are stored and indexed.

INITIAL_CAP defines the initial capacity of the vector index. It helps in pre-allocating space for the index.
@@ -538,7 +541,7 @@ KNN, or k-Nearest Neighbors, is an algorithm used in both classification and reg

Redis allows you to index and then search for vectors [using the KNN approach](https://redis.io/docs/stack/search/reference/vectors/#pure-knn-queries).

-Below is a Node.js code sample that showcases how to perform a KNN query for any search term provided:
+Below, you'll find a Node.js code snippet that illustrates how to perform a `KNN query` for any provided `search text`:

```ts title="src/knn-query.ts"
const float32Buffer = (arr) => {
@@ -593,12 +596,12 @@ const queryProductDescriptionEmbeddingsByKNN = async (
};
```
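
The query function itself is largely elided by this hunk. A rough sketch of how such a KNN search could be written with node-redis (the index/field names and helper import mirror the assumptions above):

```ts
import { createClient } from 'redis';
import { generateSentenceEmbeddings } from './text-vector-gen'; // illustrative path

const client = createClient({ url: 'redis://localhost:6379' });

// Pack a number[] into the binary FLOAT32 blob RediSearch expects for vector params
const float32Buffer = (arr: number[]) => Buffer.from(new Float32Array(arr).buffer);

const queryProductDescriptionEmbeddingsByKNN = async (_searchTxt: string, _resultCount: number) => {
  // Vectorize the search text with the same model used at indexing time
  const searchBlob = float32Buffer(await generateSentenceEmbeddings(_searchTxt));

  // `*` = no pre-filter; return the _resultCount nearest neighbours by description vector
  return client.ft.search(
    'idx:products',
    `*=>[KNN ${_resultCount} @productDescriptionEmbeddings $searchBlob AS score]`,
    {
      PARAMS: { searchBlob },
      RETURN: ['productDisplayName', 'score'],
      SORTBY: 'score',
      DIALECT: 2,
    },
  );
};
```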

-Please find output for a KNN query in Redis **(Lower score/distance in output indicates higher similarity)**
+Please find output for a KNN query in Redis **(A lower score or distance in the output signifies a higher degree of similarity.)**

```js title="sample output"
const result = await queryProductDescriptionEmbeddingsByKNN(
-  'Puma watch with cat', //search term
-  3, //max no of results expected
+  'Puma watch with cat', //search text
+  3, //max number of results expected
);
console.log(JSON.stringify(result, null, 4));
@@ -638,7 +641,9 @@ console.log(JSON.stringify(result, null, 4));
*/
```

-Note : Can combine KNN query with regular Redis search feature using [hybrid knn queries](https://redis.io/docs/interact/search-and-query/search/vectors/#hybrid-knn-queries)
+:::note
+KNN queries can be combined with standard Redis search functionalities using <u>[hybrid knn queries](https://redis.io/docs/interact/search-and-query/search/vectors/#hybrid-knn-queries).</u>
:::
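
For illustration, a hybrid query simply places a regular filter in front of the KNN clause — a sketch (treating `productDisplayName` as an indexed TEXT field is an assumption):

```ts
// Only products whose display name matches "puma" are considered for the KNN step
const hybridQuery =
  '(@productDisplayName:puma)=>[KNN 3 @productDescriptionEmbeddings $searchBlob AS score]';
```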

## What is vector range query?
@@ -647,7 +652,7 @@ For vectors, a "range query" typically refers to retrieving all vectors within a

### Range query with Redis

-Below is a Node.js code sample that showcases how to perform a vector range query for any radius (distance) range provided:
+Below, you'll find a Node.js code snippet that illustrates how to perform a vector `range query` for any provided range (radius/distance):

```js title="src/range-query.ts"
const queryProductDescriptionEmbeddingsByRange = async (_searchTxt, _range) => {
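  // NOTE: the function body is elided by this hunk. A rough sketch of a range query
  // with node-redis, reusing the index/field assumptions from the KNN example above:
  //
  //   const searchBlob = float32Buffer(await generateSentenceEmbeddings(_searchTxt));
  //   return nodeRedisClient.ft.search(
  //     'idx:products',
  //     '@productDescriptionEmbeddings:[VECTOR_RANGE $searchRange $searchBlob]=>{$YIELD_DISTANCE_AS: score}',
  //     {
  //       PARAMS: { searchRange: _range, searchBlob },
  //       RETURN: ['productDisplayName', 'score'],
  //       SORTBY: 'score',
  //       DIALECT: 2,
  //     },
  //   );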
@@ -697,8 +702,8 @@ Please find output for a range query in Redis

```js title="sample output"
const result2 = await queryProductDescriptionEmbeddingsByRange(
-  'Puma watch with cat', //search term
-  1.0, //with in score or distance
+  'Puma watch with cat', //search text
+  1.0, //within the given range (distance)
);
console.log(JSON.stringify(result2, null, 4));
/*
@@ -727,3 +732,7 @@ console.log(JSON.stringify(result2, null, 4));
}
*/
```
+
+:::info Image vs text vector query
+The syntax for KNN/range vector queries remains consistent whether you're dealing with image vectors or text vectors.
+:::
