You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Why these changes are being introduced:
We were getting 429 error responses during bulk indexing and needed to
adjust configurations for the opensearchpy bulk helper. Although those
adjustments didn't solve the problem, having them configurable via ENV
variables is a better long-term solution than hard-coding them, so this
commit adds that option.
How this addresses that need:
* Adds a new config function to configure opensearch bulk settings,
reading from ENV variables if present with defaults if not.
* Adds tests for new function.
* Updates opensearch bulk_index function to configure settings using the
new function.
* Updates README to document new optional ENV variables.
Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-128
Copy file name to clipboardExpand all lines: README.md
+3-1Lines changed: 3 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -9,7 +9,9 @@ TIMDEX! Index Manager (TIM) is a Python cli application for managing TIMDEX inde
9
9
## Optional ENV
10
10
11
11
-`AWS_REGION` = Only needed if AWS region changes from the default of us-east-1.
12
-
-`OPENSEARCH_REQUEST_TIMEOUT` = Only used for OpenSearch requests that tend to take longer than the default timeout of 10 seconds, such as bulk or index refresh requests. Defaults to 30 seconds if not set.
12
+
-`OPENSEARCH_BULK_MAX_CHUNK_BYTES` = Chunk size limit for sending requests to the bulk indexing endpoint, in bytes. Defaults to 100 MB (the opensearchpy default) if not set.
13
+
-`OPENSEARCH_BULK_MAX_RETRIES` = Maximum number of retries when sending requests to the bulk indexing endpoint. Defaults to 8 if not set.
14
+
-`OPENSEARCH_REQUEST_TIMEOUT` = Only used for OpenSearch requests that tend to take longer than the default timeout of 10 seconds, such as bulk or index refresh requests. Defaults to 120 seconds if not set.
13
15
-`SENTRY_DSN` = If set to a valid Sentry DSN, enables Sentry exception monitoring. This is not needed for local development.
14
16
-`STATUS_UPDATE_INTERVAL` = The ingest process logs the # of records indexed every nth record (1000 by default). Set this env variable to any integer to change the frequency of logging status updates. Can be useful for development/debugging.
15
17
-`TIMDEX_OPENSEARCH_ENDPOINT` = If using a local Docker OpenSearch instance, this isn't needed. Otherwise set to OpenSearch instance endpoint _without_ the http scheme, e.g. `search-timdex-env-1234567890.us-east-1.es.amazonaws.com`. Can also be passed directly to the CLI via the `--url` option.
0 commit comments