docs/howtos/solutions/vector/getting-started-vector/index-getting-started-vector.mdx
Vector similarity is a measure that quantifies how alike two vectors are, typically by evaluating the `distance` or `angle` between them in a multi-dimensional space.
When vectors represent data points, such as texts or images, the similarity score can indicate how similar the underlying data points are in terms of their features or content.
### Use cases for vector similarity
- **Recommendation Systems**: If you have vectors representing user preferences or item profiles, you can quickly find items that are most similar to a user's preference vector.
- **Image Search**: Store vectors representing image features, and then retrieve images most similar to a given image's vector.
- **Textual Content Retrieval**: Store vectors representing textual content (e.g., articles, product descriptions) and find the most relevant texts for a given query vector.
## How to calculate vector similarity?
Several techniques are available to assess vector similarity, with some of the most prevalent ones being:
### Euclidean Distance (L2 norm)
**Euclidean Distance (L2 norm)** calculates the linear distance between two points within a multi-dimensional space. Lower values indicate closer proximity, and hence higher similarity.
<img
  src={EuclideanDistanceFormulaImage}
  className="margin-bottom--md"
/>
For illustration purposes, let's assess `product 1` and `product 2` from the earlier ecommerce dataset and determine the `Euclidean Distance` considering all features.
<img
  src={EuclideanDistanceSampleImage}
  className="margin-bottom--md"
/>
As an example, we will use a 2D chart made with [chart.js](https://www.chartjs.org/) comparing the `Price vs. Quality` features of our products, focusing solely on these two attributes to compute the `Euclidean Distance`.

### Cosine Similarity

:::note

If two vectors are pointing in the same direction, the cosine of the angle between them is 1. If they're orthogonal, the cosine is 0, and if they're pointing in opposite directions, the cosine is -1.

:::
Again, consider `product 1` and `product 2` from the previous dataset and calculate the `Cosine Distance` for all features.

Using [chart.js](https://www.chartjs.org/), we've crafted a 2D chart of `Price vs. Quality` features. It visualizes the `Cosine Similarity` solely based on these attributes.

102
105
103
106
### Inner Product
:::note

The inner product can be thought of as a measure of how much two vectors "align" in a given vector space. Higher values indicate higher similarity. However, the raw values can be large for long vectors; hence, normalization is recommended for better interpretation. If the vectors are normalized, their dot product will be `1 if they are identical` and `0 if they are orthogonal` (uncorrelated).

:::
Considering our `product 1` and `product 2`, let's compute the `Inner Product` across all features.

:::tip

Vectors can also be stored in databases in **binary formats** to save space. In practical applications, it's crucial to strike a balance between the dimensionality of the vectors (which impacts storage and computational costs) and the quality or granularity of the information they capture.

:::
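For instance, in Node.js a vector of 32-bit floats can be packed into a raw byte buffer (4 bytes per dimension) before being written to the database — a sketch, not the tutorial's exact code:

```javascript
// An illustrative 3-dimensional embedding; a real 768-dim FLOAT32
// vector would occupy 768 * 4 = 3072 bytes in this form
const embedding = [0.12, -0.05, 0.33];

// Pack the floats into a binary buffer (4 bytes each)
const blob = Buffer.from(new Float32Array(embedding).buffer);
console.log(blob.length); // 12
```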
## Generating vectors
In our scenario, our focus revolves around generating sentence (product description) and image (product image) embeddings or vectors. There's an abundance of AI model repositories, similar to GitHub, where pre-trained AI models are stored, maintained, and shared.
For sentence embeddings, we'll employ a model from [Hugging Face Model Hub](https://huggingface.co/models); for image embeddings, we'll use one from [TensorFlow Hub](https://tfhub.dev/), for variety.
### Sentence vector

To procure sentence embeddings, we'll make use of a Hugging Face model titled [Xenova/all-distilroberta-v1](https://huggingface.co/Xenova/all-distilroberta-v1). It's a compatible version of [sentence-transformers/all-distilroberta-v1](https://huggingface.co/sentence-transformers/all-distilroberta-v1) for Transformers.js with ONNX weights.
:::info

[<u>Hugging Face Transformers</u>](https://huggingface.co/docs/transformers.js/index) is a renowned open-source tool for Natural Language Processing (NLP) tasks. It simplifies the use of cutting-edge NLP models.

The Transformers.js library is essentially the JavaScript version of Hugging Face's popular Python library.

:::
:::info

[<u>ONNX (Open Neural Network eXchange)</u>](https://onnx.ai) is an open standard that defines a common set of operators and a common file format to represent deep learning models in a wide variety of frameworks, including PyTorch and TensorFlow.

:::
Below, you'll find a Node.js code snippet that illustrates how to create vector embeddings for any provided `sentence`:
```sh
npm install @xenova/transformers
```

```ts
import * as transformers from '@xenova/transformers';

async function generateSentenceEmbeddings(_sentence): Promise<number[]> {
  // Load a feature-extraction pipeline backed by the ONNX model
  const pipe = await transformers.pipeline(
    'feature-extraction',
    'Xenova/all-distilroberta-v1',
  );

  // Mean-pool and normalize token embeddings into a single sentence vector
  const vectorOutput = await pipe(_sentence, {
    pooling: 'mean',
    normalize: true,
  });

  return Array.from(vectorOutput?.data ?? []);
}

export { generateSentenceEmbeddings };
```
Here's a glimpse of the vector output for a sample text:
```js title="sample output"
const embeddings = await generateSentenceEmbeddings('I Love Redis !');
console.log(embeddings);
```
### Image vector
To obtain image embeddings, we'll leverage the [mobilenet](https://github.com/tensorflow/tfjs-models/tree/master/mobilenet) model from TensorFlow.
Below, you'll find a Node.js code snippet that illustrates how to create vector embeddings for any provided `image`:
```sh
npm i @tensorflow/tfjs @tensorflow/tfjs-node @tensorflow-models/mobilenet jpeg-js
```

```ts
import * as tf from '@tensorflow/tfjs-node';
import * as mobilenet from '@tensorflow-models/mobilenet';
import * as jpeg from 'jpeg-js';
import * as fs from 'node:fs';

async function generateImageEmbeddings(imagePath: string) {
  // Read and decode the JPEG into raw RGBA pixel data
  const imageBuffer = fs.readFileSync(imagePath);
  const { data, width, height } = jpeg.decode(imageBuffer, { useTArray: true });

  // Build an RGB tensor from the RGBA pixels
  const imageTensor = tf
    .tensor3d(data, [height, width, 4], 'int32')
    .slice([0, 0, 0], [height, width, 3]);

  // `infer` with `true` returns the model's internal embedding
  // instead of classification logits
  const model = await mobilenet.load();
  const inferred = model.infer(imageTensor, true);
  return Array.from(await inferred.data());
}

export { generateImageEmbeddings };
```
<div>
:::tip Image classification model

We are using the <u>[mobilenet model](https://github.com/tensorflow/tfjs-models/tree/master/mobilenet)</u>, which is trained only on a small <u>[set of image classes](https://github.com/tensorflow/tfjs-examples/blob/master/mobilenet/imagenet_classes.js)</u>. The choice of an image classification model depends on various factors, such as the dataset size, dataset diversity, computational resources, and the specific requirements of your application. There are various alternative image classification models, such as EfficientNet, ResNets, and Vision Transformers (ViT), that you can select based on your needs.

:::
265
269
266
-
Please find vector output for a sample watch image
270
+
Below is an illustration of the vector output for a sample watch image:
For the purposes of this tutorial, let's consider a simplified e-commerce context. The `products` JSON below offers a glimpse into the vector search functionalities we'll be discussing.
```js title="src/data.ts"
const products = [
  // ...
];
```
Below is the sample code to seed `products` data as JSON in Redis. The data also includes vectors of both product descriptions and images.
```js
async function addProductWithEmbeddings(_products) {
  // ...
}
```
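Such a routine can be sketched as follows — assuming a connected node-redis (v4) client and the embedding helpers from the earlier sections; the key prefix and parameter names here are illustrative:

```javascript
// Illustrative key scheme for product JSON documents
const productKey = (product) => `products:${product.productId}`;

// Attach embeddings to each product, then store it as JSON.
// `client` is assumed to be a connected node-redis client; `embedText` and
// `embedImage` stand in for generateSentenceEmbeddings / generateImageEmbeddings.
async function seedProducts(client, products, embedText, embedImage) {
  for (const product of products) {
    const doc = {
      ...product,
      productDescriptionEmbeddings: await embedText(product.productDescription),
      productImageEmbeddings: await embedImage(product.imagePath),
    };
    await client.json.set(productKey(product), '$', doc);
  }
}

console.log(productKey({ productId: 1 })); // products:1
```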
You can observe the products JSON data in RedisInsight:

:::tip

Download <u>[RedisInsight](https://redis.com/redis-enterprise/redis-insight/)</u> to visually explore your Redis data or to engage with raw Redis commands in the workbench. Dive deeper into RedisInsight with these <u>[tutorials](/explore/redisinsight/)</u>.

:::
### Create vector index
For searches to be conducted on JSON fields in Redis, they must be indexed. The methodology below highlights how different field types are indexed, including vector fields such as `productDescriptionEmbeddings` and `productImageEmbeddings`.
:::info FLAT vs HNSW indexing

FLAT: When vectors are indexed in a "FLAT" structure, they're stored in their original form without any added hierarchy. A search against a FLAT index requires the algorithm to scan each vector linearly to find the most similar matches. While this is accurate, it's computationally intensive and slower, making it best suited to smaller datasets.

HNSW (Hierarchical Navigable Small World): HNSW is a graph-centric method tailored for indexing high-dimensional data. With larger datasets, linear comparisons against every vector in the index become time-consuming. HNSW employs a probabilistic approach, ensuring faster search results but with a slight trade-off in accuracy.

:::
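To make the distinction concrete, here is the general shape of an `FT.CREATE` call declaring an HNSW vector field over the product-description embeddings — the index name, dimension, and distance metric below are illustrative, not the tutorial's exact values:

```
FT.CREATE idx:products ON JSON PREFIX 1 products:
  SCHEMA $.productDescriptionEmbeddings AS productDescriptionEmbeddings
    VECTOR HNSW 6 TYPE FLOAT32 DIM 768 DISTANCE_METRIC L2
```

Swapping `HNSW 6` for `FLAT 6` (with the same attribute pairs) would select the exact, linear-scan index instead.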
:::info INITIAL_CAP and BLOCK_SIZE parameters

Both INITIAL_CAP and BLOCK_SIZE are configuration parameters that control how vectors are stored and indexed.
INITIAL_CAP defines the initial capacity of the vector index. It helps in pre-allocating space for the index.
BLOCK_SIZE defines the size of each additional block of vectors allocated once the initial capacity is exhausted.

:::

KNN, or k-Nearest Neighbors, is an algorithm used in both classification and regression tasks.
Redis allows you to index and then search for vectors [using the KNN approach](https://redis.io/docs/stack/search/reference/vectors/#pure-knn-queries).
Below, you'll find a Node.js code snippet that illustrates how to perform a `KNN query` for any provided `search text`:
:::note

KNN queries can be combined with standard Redis search functionalities using <u>[hybrid KNN queries](https://redis.io/docs/interact/search-and-query/search/vectors/#hybrid-knn-queries)</u>.

:::
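For reference, a hybrid query pre-filters results with a regular search expression before applying KNN. Its general shape (the field names, result count, and parameter name below are hypothetical) looks like:

```
(@productDisplayName:"watch")=>[KNN 3 @productDescriptionEmbeddings $searchBlob AS score]
```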
## What is vector range query?
For vectors, a "range query" typically refers to retrieving all vectors within a certain distance (radius) of a given query vector.
### Range query with Redis
Below, you'll find a Node.js code snippet that illustrates how to perform a vector `range query` for any provided range (radius/distance):
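At the query-syntax level, the heart of such a snippet is a `VECTOR_RANGE` expression. Its general shape (the field and parameter names below are hypothetical) looks like:

```
@productDescriptionEmbeddings:[VECTOR_RANGE $searchRadius $searchBlob]=>{$YIELD_DISTANCE_AS: vector_dist}
```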