Commit 22cbeae

Upgrade Algolia to v4 and discover search regression
1 parent 9d18373 commit 22cbeae

4 files changed

+62
-55
lines changed

scripts/search/README.md

Lines changed: 22 additions & 7 deletions
@@ -7,6 +7,7 @@
 ```bash
 pip install -r requirements.txt
 ```
+
 ### Running
 
 ```bash
@@ -33,7 +34,7 @@ options:
 
 ## Search scripts
 
-We use these to evaluate search performance. `results.csv` contains a list of authoriative search results for 200 terms.
+We use these to evaluate search performance. `results.csv` contains a list of authoritative search results for 200 terms.
 
 We use this to compute an average nDCG.
 
@@ -47,6 +48,19 @@ pip install -r requirements.txt
 
 ### Running
 
+You need to comment out either Dev or Prod depending on what you want to test.
+The API key is the public search key, don't worry.
+
+```python
+# dev details
+# ALGOLIA_APP_ID = "7AL1W7YVZK"
+# ALGOLIA_API_KEY = "43bd50d4617a97c9b60042a2e8a348f9"
+
+# Prod details
+ALGOLIA_APP_ID = "5H9UG7CX5W"
+ALGOLIA_API_KEY = "4a7bf25cf3edbef29d78d5e1eecfdca5"
+```
+
 ```bash
 python compute_ndcg.py -d
 ```
@@ -67,12 +81,13 @@ options:
 
 ### Results
 
-| **Date** | **Average nDCG** | **Results** | **Changes** |
-|------------|------------------|--------------------------------------------------------------------------------------------------------|--------------------------------------------------|
-| 20/01/2024 | 0.4700 | [View Results](https://pastila.nl/?008231f5/bc107912f8a5074d70201e27b1a66c6c#cB/yJOsZPOWi9h8xAkuTUQ==) | Baseline |
-| 21/01/2024 | 0.5021 | [View Results](https://pastila.nl/?00bb2c2f/936a9a3af62a9bdda186af5f37f55782#m7Hg0i9F1YCesMW6ot25yA==) | Index `_` character and move language to English |
-| 24/01/2024 | 0.7072 | [View Results](https://pastila.nl/?065e3e67/e4ad889d0c166226118e6160b4ee53ff#x1NPd2R7hU90CZvvrE4nhg==) | Process markdown, and tune settings. |
-| 24/01/2024 | 0.7412 | [View Results](https://pastila.nl/?0020013d/e69b33aaae82e49bc71c5ee2cea9ad46#pqq3VtRd4eP4JM5/izcBcA==) | Include manual promotions for ambigious terms. |
+| **Date** | **Average nDCG** | **Results** | **Changes** |
+|------------|------------------|---------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|
+| 20/01/2025 | 0.4700 | [View Results](https://pastila.nl/?008231f5/bc107912f8a5074d70201e27b1a66c6c#cB/yJOsZPOWi9h8xAkuTUQ==) | Baseline |
+| 21/01/2025 | 0.5021 | [View Results](https://pastila.nl/?00bb2c2f/936a9a3af62a9bdda186af5f37f55782#m7Hg0i9F1YCesMW6ot25yA==) | Index `_` character and move language to English |
+| 24/01/2025 | 0.7072 | [View Results](https://pastila.nl/?065e3e67/e4ad889d0c166226118e6160b4ee53ff#x1NPd2R7hU90CZvvrE4nhg==) | Process markdown, and tune settings. |
+| 24/01/2025 | 0.7412 | [View Results](https://pastila.nl/?0020013d/e69b33aaae82e49bc71c5ee2cea9ad46#pqq3VtRd4eP4JM5/izcBcA==) | Include manual promotions for ambigious terms. |
+| 28/08/2025 | 0.5729 | [View Results](https://pastila.nl/?00ab66a7/9eb511690e3b2f53ac7ae95e3f42113c#tK6gf8G9W7mbAQd3aD5f4Q==) | This was unfortunately not run or recorded for search improvements which were made recently |
 
 Note: exact scores may vary due to constant content changes.
 

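For readers unfamiliar with the metric in the README, here is a minimal sketch of how an average nDCG could be computed from authoritative results such as those in `results.csv`. It is not taken from `compute_ndcg.py`; the function names, the cutoff `k`, and the grading scheme (earlier authoritative URLs count as more relevant) are all assumptions made for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(rank_i + 1), ranks start at 1."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(returned_urls, expected_urls, k=3):
    """nDCG@k for one search term.

    ASSUMPTION: expected_urls is ordered from most to least authoritative,
    and an expected URL at position i gets the grade len(expected_urls) - i.
    """
    grades = {url: len(expected_urls) - i for i, url in enumerate(expected_urls)}
    # Gain of each returned result; URLs not in the authoritative list score 0.
    gains = [grades.get(url, 0) for url in returned_urls[:k]]
    # The ideal ranking returns the authoritative URLs in their given order.
    ideal_dcg = dcg(sorted(grades.values(), reverse=True)[:k])
    return dcg(gains) / ideal_dcg if ideal_dcg else 0.0


def average_ndcg(results_by_term, expected_by_term, k=3):
    """Mean nDCG@k over all terms that have authoritative results."""
    scores = [
        ndcg(results_by_term.get(term, []), expected, k)
        for term, expected in expected_by_term.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Under these assumptions, a term whose top hits match the authoritative list in order scores 1.0, and a term with no matching hits scores 0.0, so a drop in the average (as in the last row of the table) means authoritative pages moved down or out of the top results.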
scripts/search/index_pages.py

Lines changed: 2 additions & 10 deletions
@@ -382,9 +382,7 @@ def process_markdown_directory(directory, base_directory, base_url):
 def send_to_algolia(client, index_name, records):
     """Send records to Algolia."""
     if records:
-        client.batch(index_name=index_name, batch_write_params={
-            "requests": [{"action": "addObject", "body": record} for record in records],
-        })
+        client.save_objects(index_name, records)
         print(f"Successfully sent {len(records)} records to Algolia.")
     else:
         print("No records to send to Algolia.")
@@ -449,13 +447,7 @@ def main(base_directory, algolia_app_id, algolia_api_key, algolia_index_name,
     print(f'total {'processed' if dry_run else 'indexed'} {t} records')
     if not dry_run:
         print('switching temporary index...', end='')
-        client.operation_index(
-            index_name=temp_index_name,
-            operation_index_params={
-                "operation": "move",
-                "destination": algolia_index_name
-            },
-        )
+        client.operation_index(temp_index_name, {"operation": "move", "destination": algolia_index_name})
         print('done')
 
 

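The two hunks in `index_pages.py` together follow a "fill a temporary index, then move it over the live one" pattern. The sketch below isolates that pattern using only the two client calls visible in this diff, `save_objects` and `operation_index`, with the same positional arguments. The helper `reindex_via_temp_index`, the `_tmp` suffix, and the `RecordingClient` stand-in are hypothetical and exist only so the example runs without network access or credentials.

```python
def reindex_via_temp_index(client, index_name, records, temp_suffix="_tmp"):
    """Write records to a temporary index, then move it onto the live index.

    ASSUMPTION: `client` exposes save_objects(index_name, objects) and
    operation_index(index_name, params) as called in this commit.
    The temp index naming scheme here is hypothetical.
    """
    if not records:
        print("No records to send to Algolia.")
        return 0
    temp_index_name = f"{index_name}{temp_suffix}"
    # Step 1: populate the temporary index so the live index stays untouched.
    client.save_objects(temp_index_name, records)
    # Step 2: replace the live index with the temporary one in a single move.
    client.operation_index(
        temp_index_name,
        {"operation": "move", "destination": index_name},
    )
    return len(records)


class RecordingClient:
    """Hypothetical stand-in that records calls instead of contacting Algolia."""

    def __init__(self):
        self.calls = []

    def save_objects(self, index_name, objects):
        self.calls.append(("save_objects", index_name, len(objects)))

    def operation_index(self, index_name, operation_index_params):
        self.calls.append(("operation_index", index_name, operation_index_params))
```

Passing a `RecordingClient()` in place of the real search client makes it possible to check the order of operations in a dry run before pointing the script at the Dev or Prod application.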
scripts/search/requirements.txt

Lines changed: 37 additions & 37 deletions
@@ -1,37 +1,37 @@
-aiohappyeyeballs==2.4.4
-aiohttp==3.12.14
-aiosignal==1.3.2
-algoliasearch==4.12.0
-annotated-types==0.7.0
-async-timeout==5.0.1
-attrs==24.3.0
-certifi==2024.12.14
-charset-normalizer==3.4.1
-Deprecated==1.2.15
-frozenlist==1.5.0
-idna==3.10
-jaconv==0.4.0
-Markdown==3.7
-multidict==6.1.0
-networkx==3.4.2
-numpy==2.2.2
-propcache==0.2.1
-pydantic==2.10.5
-pydantic_core==2.27.2
-pykakasi==2.3.0
-python-dateutil==2.9.0.post0
-python-slugify==8.0.4
-PyYAML==6.0.2
-remember==0.1
-requests==2.32.4
-ruamel.yaml==0.18.10
-ruamel.yaml.clib==0.2.12
-scipy==1.15.1
-six==1.17.0
-slugger==0.2.2
-text-unidecode==1.3
-typing_extensions==4.12.2
-Unihandecode==0.81
-urllib3==2.5.0
-wrapt==1.17.2
-yarl==1.18.3
+aiohappyeyeballs
+aiohttp
+aiosignal
+algoliasearch>=4.25.0
+annotated-types
+async-timeout
+attrs
+certifi
+charset-normalizer
+Deprecated
+frozenlist
+idna
+jaconv
+Markdown
+multidict
+networkx
+numpy
+propcache
+pydantic
+pydantic_core
+pykakasi
+python-dateutil
+python-slugify
+PyYAML
+remember
+requests
+ruamel.yaml
+ruamel.yaml.clib
+scipy
+six
+slugger
+text-unidecode
+typing_extensions
+Unihandecode
+urllib3
+wrapt
+yarl

scripts/search/results.csv

Lines changed: 1 addition & 1 deletion
@@ -82,7 +82,7 @@ sum,https://clickhouse.com/docs/sql-reference/aggregate-functions/reference/sum,
 keeper,https://clickhouse.com/docs/guides/sre/keeper/clickhouse-keeper,https://clickhouse.com/docs/knowledgebase/why_recommend_clickhouse_keeper_over_zookeeper,
 type,https://clickhouse.com/docs/sql-reference/data-types,https://clickhouse.com/docs/sql-reference/functions/type-conversion-functions,
 nullable,https://clickhouse.com/docs/sql-reference/data-types/nullable,https://clickhouse.com/docs/cloud/bestpractices/avoid-nullable-columns,https://clickhouse.com/docs/sql-reference/functions/functions-for-nulls
-projection,https://clickhouse.com/docs/sql-reference/statements/alter/projection,https://clickhouse.com/docs/engines/table-engines/mergetree-family/mergetree#projections,https://clickhouse.com/docs/knowledgebase/projection_example
+projection,https://clickhouse.com/docs/data-modeling/projections,https://clickhouse.com/docs/sql-reference/statements/alter/projection,https://clickhouse.com/docs/engines/table-engines/mergetree-family/mergetree#projections
 jdbc,https://clickhouse.com/docs/interfaces/jdbc,https://clickhouse.com/docs/integrations/language-clients/java/jdbc,https://clickhouse.com/docs/engines/table-engines/integrations/jdbc
 ifnull,https://clickhouse.com/docs/sql-reference/functions/functions-for-nulls#ifnull,https://clickhouse.com/docs/sql-reference/functions/conditional-functions,
 any,https://clickhouse.com/docs/sql-reference/aggregate-functions/reference/any,https://clickhouse.com/docs/sql-reference/aggregate-functions/reference/first_value,
