Commit 075c538

Only skip known indexing errors
Why these changes are being introduced:
Previously when bulk-indexing returned an error from Opensearch, we logged the error, skipped the record, and continued indexing. We still want to do that for the known error around mapper parsing (since that is a data quality/transformation issue, not an Opensearch issue) but for all other errors we want to stop the process with an exception for further investigation.

How this addresses that need:
* Adds BulkIndexingError to errors module.
* During bulk indexing, checks for the known error we want to skip and otherwise raises the new exception.
* Updates tests and fixtures to reflect changes.
* Minor documentation update in cli module.

Side effects of this change:
If we hit any errors other than the occasional expected mapping error, the Step Function will fail. This is good.
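The check described in this commit message could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: only the name `BulkIndexingError` and the `mapper_parsing_exception` error type come from this page, while the constructor signature, `KNOWN_SKIPPABLE_ERROR`, and `count_bulk_results` are hypothetical names chosen for the example. It assumes each item has the shape of an entry in the `items` array of a `_bulk` response.

```python
from typing import Any, Iterable

# Hypothetical constant: the one error type treated as skippable.
KNOWN_SKIPPABLE_ERROR = "mapper_parsing_exception"


class BulkIndexingError(Exception):
    """Raised when a bulk item fails for any reason other than the known one.

    The constructor arguments are an assumption made for this sketch.
    """

    def __init__(self, record_id: str, error: Any) -> None:
        self.record_id = record_id
        self.error = error
        super().__init__(f"Error indexing record '{record_id}': {error}")


def count_bulk_results(items: Iterable[dict[str, Any]]) -> dict[str, int]:
    """Tally per-item results, skipping only the known mapping error."""
    counts = {"indexed": 0, "skipped": 0}
    for item in items:
        result = item["index"]
        error = result.get("error")
        if error is None:
            counts["indexed"] += 1
        elif isinstance(error, dict) and error.get("type") == KNOWN_SKIPPABLE_ERROR:
            # Data quality problem in the record: skip it and keep going.
            counts["skipped"] += 1
        else:
            # Anything else stops the run so it can be investigated.
            raise BulkIndexingError(result.get("_id", "<unknown>"), error)
    return counts
```

Raising on the first unknown error, rather than collecting errors and reporting at the end, matches the stated side effect: the surrounding process fails as soon as something unexpected happens.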
1 parent bf1e59b commit 075c538

7 files changed: +355 -85 lines changed
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+interactions:
+- request:
+    body:
+      "{\"index\":{\"_id\":\"mit:alma:990026671500206761\",\"_index\":\"test-index\"}}\n{\"alternate_titles\":[{\"kind\":\"Alternate
+      title\",\"value\":\"Best of Paquito D'Rivera\"}],\"call_numbers\":[\"781.657\"],\"citation\":\"D'Rivera,
+      Paquito et al. 2008. Portraits of Cuba\",\"content_type\":[\"Sound recording\"],\"contents\":[\"Chucho
+      -- Havana cafe -- The peanut vendor -- A night in Tunisia -- Mambo a la Kenton
+      -- Echale salsita -- Drume negrita -- Tropicana nights -- Who's smoking -- Tico
+      tico -- Portraits of Cuba -- Excerpt from Aires tropicales -- What are you doing
+      tomorrow night -- A mi que/El manisero.\"],\"contributors\":[{\"kind\":\"author\",\"value\":\"D'Rivera,
+      Paquito, 1948-\"},{\"kind\":\"contributor\",\"value\":\"D'Rivera, Paquito, 1948-\"},{\"kind\":\"contributor\",\"value\":\"P\xE9rez,
+      Danilo.\"},{\"kind\":\"contributor\",\"value\":\"Gilbert, Wolfe.\"},{\"kind\":\"contributor\",\"value\":\"Gillespie,
+      Dizzy, 1917-1993.\"},{\"kind\":\"contributor\",\"value\":\"P\xE9rez Prado, 1916-1989.\"},{\"kind\":\"contributor\",\"value\":\"Pi\xF1eiro,
+      Ignacio, 1888-1969.\"},{\"kind\":\"contributor\",\"value\":\"Grenet, Ernesto
+      Wood.\"},{\"kind\":\"contributor\",\"value\":\"Roditi, Claudio.\"},{\"kind\":\"contributor\",\"value\":\"Abreu,
+      Zequinha de, 1880-1935.\"},{\"kind\":\"contributor\",\"value\":\"Godoy, Lucio.\"},{\"kind\":\"contributor\",\"value\":\"Hern\xE1ndez,
+      Rafael.\"}],\"dates\":[{\"kind\":\"Date of publication\",\"value\":\"this isn't
+      a date\"}],\"identifiers\":[{\"kind\":\"oclc\",\"value\":\"811549562\"}],\"languages\":[\"No
+      linguistic content\"],\"links\":[{\"kind\":\"Digital object link\",\"text\":\"Naxos
+      Music Library\",\"url\":\"http://BLCMIT.NaxosMusicLibrary.com/catalogue/item.asp?cid=JD-342\"}],\"locations\":[{\"kind\":\"Place
+      of publication\",\"value\":\"New York (State)\"}],\"notes\":[{\"value\":[\"Paquito
+      d' Rivera, saxophone ; Paquito d' Rivera, soprano saxophone.\",\"Description
+      based on hard copy version record.\"]}],\"physical_description\":\"1 online
+      resource (1 sound file)\",\"publication_information\":[\"[New York, N.Y.] :
+      Chesky Records, p2008.\"],\"source\":\"MIT Alma\",\"source_link\":\"https://mit.primo.exlibrisgroup.com/discovery/fulldisplay?vid=01MIT_INST:MIT&docid=alma990026671500206761\",\"subjects\":[{\"value\":[\"Jazz.\",\"Latin
+      jazz.\",\"Clarinet music (Jazz)\",\"Saxophone music (Jazz)\"]}],\"timdex_record_id\":\"mit:alma:990026671500206761\",\"title\":\"Spice
+      it up! the best of Paquito D'Rivera.\"}\n"
+    headers:
+      Content-Length:
+      - "2261"
+      content-type:
+      - application/json
+      user-agent:
+      - opensearch-py/2.2.0 (Python 3.11.2)
+    method: POST
+    uri: http://localhost:9200/_bulk
+  response:
+    body:
+      string:
+        '{"took":9,"errors":true,"items":[{"index":{"_index":"test-index","_id":"mit:alma:990026671500206761","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed
+        to parse field [dates.value.as_date] of type [date] in document with id ''mit:alma:990026671500206761''.
+        Preview of field''s value: ''this isn''t a date''","caused_by":{"type":"illegal_argument_exception","reason":"failed
+        to parse date field [this isn''t a date] with format [strict_year||strict_year_month||date_optional_time||date||basic_date||yyyy/MM/dd||MM/dd/yyyy||MM/dd/yy||M/d/yyyy||M/d/yy]","caused_by":{"type":"date_time_parse_exception","reason":"Failed
+        to parse with all enclosed parsers"}}}}}]}'
+    headers:
+      content-length:
+      - "687"
+      content-type:
+      - application/json; charset=UTF-8
+    status:
+      code: 200
+      message: OK
+- request:
+    body: null
+    headers:
+      Content-Length:
+      - "0"
+      content-type:
+      - application/json
+      user-agent:
+      - opensearch-py/2.2.0 (Python 3.11.2)
+    method: POST
+    uri: http://localhost:9200/test-index/_refresh
+  response:
+    body:
+      string: '{"_shards":{"total":2,"successful":1,"failed":0}}'
+    headers:
+      content-length:
+      - "49"
+      content-type:
+      - application/json; charset=UTF-8
+    status:
+      code: 200
+      message: OK
+version: 1
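
In the recording above, the `_bulk` request returns HTTP status 200 even though the single item in it failed with status 400, so a caller has to inspect the response body rather than the HTTP status. A small sketch of that inspection, using only the standard library, might look like the following. The function name `failed_items` is hypothetical, and `recorded_body` is an abridged copy of the recorded response with the `reason` and `caused_by` fields left out.

```python
import json


def failed_items(body: str) -> list[tuple[str, int, str]]:
    """Return (id, status, error type) for each failed item in a _bulk response body."""
    response = json.loads(body)
    if not response.get("errors"):
        return []
    failures = []
    for item in response["items"]:
        # Each item is keyed by its action name ("index" in the recording).
        for result in item.values():
            if "error" in result:
                failures.append(
                    (result["_id"], result["status"], result["error"]["type"])
                )
    return failures


# Abridged from the recorded response: "reason" and "caused_by" are omitted.
recorded_body = (
    '{"took":9,"errors":true,"items":[{"index":{"_index":"test-index",'
    '"_id":"mit:alma:990026671500206761","status":400,'
    '"error":{"type":"mapper_parsing_exception"}}}]}'
)

print(failed_items(recorded_body))
# [('mit:alma:990026671500206761', 400, 'mapper_parsing_exception')]
```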