Commit b98e476

Update README
1 parent 8bc25c4 commit b98e476

2 files changed: +134 -38 lines changed

README.md

Lines changed: 133 additions & 34 deletions
@@ -1,59 +1,158 @@
11
# timdex-index-manager (tim)
22

3-
TIMDEX! Index Manager (TIM) is a Python cli application for managing TIMDEX indexes in OpenSearch.
4-
5-
## Required ENV
6-
7-
- `WORKSPACE` = Set to `dev` for local development, this will be set to `stage` and `prod` in those environments by Terraform.
8-
9-
## Optional ENV
10-
11-
- `AWS_REGION` = Only needed if AWS region changes from the default of us-east-1.
12-
- `OPENSEARCH_BULK_MAX_CHUNK_BYTES` = Chunk size limit for sending requests to the bulk indexing endpoint, in bytes. Defaults to 100 MB (the opensearchpy default) if not set.
13-
- `OPENSEARCH_BULK_MAX_RETRIES` = Maximum number of retries when sending requests to the bulk indexing endpoint. Defaults to 8 if not set.
14-
- `OPENSEARCH_REQUEST_TIMEOUT` = Only used for OpenSearch requests that tend to take longer than the default timeout of 10 seconds, such as bulk or index refresh requests. Defaults to 120 seconds if not set.
15-
- `SENTRY_DSN` = If set to a valid Sentry DSN, enables Sentry exception monitoring. This is not needed for local development.
16-
- `STATUS_UPDATE_INTERVAL` = The ingest process logs the # of records indexed every nth record (1000 by default). Set this env variable to any integer to change the frequency of logging status updates. Can be useful for development/debugging.
17-
- `TIMDEX_OPENSEARCH_ENDPOINT` = If using a local Docker OpenSearch instance, this isn't needed. Otherwise set to OpenSearch instance endpoint _without_ the http scheme, e.g. `search-timdex-env-1234567890.us-east-1.es.amazonaws.com`. Can also be passed directly to the CLI via the `--url` option.
3+
TIMDEX! Index Manager (TIM) is a Python CLI application for managing TIMDEX indices in OpenSearch.
184

195
## Development
206

7+
- To preview a list of available Makefile commands: `make help`
218
- To install with dev dependencies: `make install`
229
- To update dependencies: `make update`
2310
- To run unit tests: `make test`
2411
- To lint the repo: `make lint`
2512
- To run the app: `pipenv run tim --help`
2613

27-
### Local OpenSearch with Docker
14+
**Important note:** The sections that follow provide instructions for running OpenSearch **locally with Docker**. These instructions are useful for testing. Please make sure the environment variable `TIMDEX_OPENSEARCH_ENDPOINT` is **not** set before proceeding.
2815

29-
A local OpenSearch instance can be started for development purposes by running:
16+
### Running OpenSearch locally with Docker
3017

31-
``` bash
32-
$ docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" \
33-
-e "plugins.security.disabled=true" \
34-
opensearchproject/opensearch:2.11.1
35-
```
18+
1. Run the following command:
3619

37-
To confirm the instance is up, run `pipenv run tim -u localhost ping`.
20+
``` bash
21+
docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" \
22+
-e "plugins.security.disabled=true" \
23+
opensearchproject/opensearch:2.11.1
24+
```
3825

39-
Alternately, you can use the included Docker Compose file to start an OpenSearch node along with an OpenSearch Dashboard. This should leave you with the same
26+
2. To confirm the instance is up, run `pipenv run tim -u localhost ping` or visit http://localhost:9200/. The `ping` command should produce a log that looks like the following:
27+
```
28+
2024-02-08 13:22:16,826 INFO tim.cli.main(): OpenSearch client configured for endpoint 'localhost'
4029

41-
```bash
42-
docker pull opensearchproject/opensearch:latest
43-
docker pull opensearchproject/opensearch-dashboards:latest
44-
docker compose up
45-
```
30+
Name: docker-cluster
31+
UUID: RVCmwQ_LQEuh1GrtwGnRMw
32+
OpenSearch version: 2.11.1
33+
Lucene version: 9.7.0
34+
35+
2024-02-08 13:22:16,930 INFO tim.cli.log_process_time(): Total time to complete process: 0:00:00.105506
36+
```
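As an alternative to the `ping` command, the check against http://localhost:9200/ can be scripted. The following is a minimal sketch, not part of TIM: the function names `fetch_cluster_info` and `summarize_cluster_info` are hypothetical, and the sketch assumes the root endpoint returns the usual OpenSearch JSON document with `cluster_name`, `cluster_uuid`, and a `version` object.

```python
import json
from urllib.request import urlopen


def fetch_cluster_info(url: str = "http://localhost:9200/", timeout: float = 5.0) -> dict:
    """Fetch the JSON document served at the OpenSearch root endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_cluster_info(info: dict) -> str:
    """Format the fields that the ping log above reports (hypothetical helper)."""
    version = info.get("version", {})
    return "\n".join(
        [
            f"Name: {info.get('cluster_name')}",
            f"UUID: {info.get('cluster_uuid')}",
            f"OpenSearch version: {version.get('number')}",
            f"Lucene version: {version.get('lucene_version')}",
        ]
    )
```

With the container running, `print(summarize_cluster_info(fetch_cluster_info()))` should print values comparable to those in the log above. A connection error usually means the container is not up yet.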
37+
38+
### Running OpenSearch and OpenSearch Dashboards locally with Docker
39+
40+
You can use the included Docker Compose file ([compose.yaml](compose.yaml)) to start an OpenSearch instance along with OpenSearch Dashboards, "[the user interface that lets you visualize your OpenSearch data and run and scale your OpenSearch clusters](https://opensearch.org/docs/latest/dashboards/)". Two tools that are useful for exploring indices are [DevTools](https://opensearch.org/docs/latest/dashboards/dev-tools/index-dev/) and [Discover](https://opensearch.org/docs/latest/dashboards/discover/index-discover/).
41+
42+
**Note:** To use Discover, you'll need to create an index pattern. When creating the index pattern, decline the option to set a date field. When set, it detects a date field in our indices but then crashes trying to use it. When prompted, enter an index or alias to pull patterns from, and it will automatically be configured to work well enough for initial data exploration.
4643
47-
To confirm the instance is up, run `pipenv run tim -u localhost ping`.
44+
1. Run the following commands:
45+
```bash
46+
docker pull opensearchproject/opensearch:latest
47+
docker pull opensearchproject/opensearch-dashboards:latest
48+
docker compose up
49+
```
4850
49-
To access the Dashboard, access <http://localhost:5601>.
51+
2. To confirm the instance is up, run `pipenv run tim -u localhost ping` or visit http://localhost:9200/.
5052
51-
DevTools is useful for writing/testing OpenSearch queries.
53+
3. Access OpenSearch Dashboards through <http://localhost:5601>.
5254
53-
Discover is useful for browsing data. An index pattern will be required to use this tool. Note: do not set a date filed (choose the option to skip selecting a date field). It detects a date field in our indexes but then crashes trying to use it. Once you skip the data select field, just enter an index or alias to pull patterns from and it will automatically be configured to work well enough for initial data exploration.
55+
For a more detailed example with test data, please refer to the Confluence document: [How to run and query OpenSearch locally](https://mitlibraries.atlassian.net/wiki/spaces/D/pages/3586129972/How+to+run+and+query+OpenSearch+locally).
5456
55-
### OpenSearch on AWS
57+
### Index records into local OpenSearch Docker container
58+
59+
1. Follow the instructions in either [Running OpenSearch locally with Docker](#running-opensearch-locally-with-docker) or [Running OpenSearch and OpenSearch Dashboards locally with Docker](#running-opensearch-and-opensearch-dashboards-locally-with-docker).
60+
61+
2. Open a new terminal, and create a new index. Copy the name of the created index printed to the terminal's output.
62+
```
63+
pipenv run tim create -s <source-name>
64+
```
65+
66+
3. Promote the index to the alias, using the index name copied in the previous step.
67+
68+
```
69+
pipenv run tim promote -a <source-name> -i <index-name>
70+
```
71+
72+
4. Bulk index records from a specified directory (local or in S3).
73+
```
74+
pipenv run tim bulk-index -s <source-name> <filepath-to-records>
75+
```
76+
77+
5. After verifying that the bulk-index was successful, clean up your local OpenSearch instance by deleting the index.
78+
```
79+
pipenv run tim delete -i <index-name>
80+
```
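The four commands in this walkthrough can be assembled programmatically, which makes the order and the reuse of the source and index names explicit. This is a sketch only: `build_local_index_workflow` is a hypothetical helper, and it assumes that `-s` takes the source name (as in the `bulk-index` step) and that the index name is the one printed by `tim create`.

```python
import shlex
from typing import List


def build_local_index_workflow(
    source: str, index_name: str, records_path: str
) -> List[List[str]]:
    """Return the create, promote, bulk-index, and delete commands in order."""
    tim = ["pipenv", "run", "tim"]
    return [
        tim + ["create", "-s", source],
        tim + ["promote", "-a", source, "-i", index_name],
        tim + ["bulk-index", "-s", source, records_path],
        tim + ["delete", "-i", index_name],
    ]


def render(commands: List[List[str]]) -> str:
    """Render each command as a shell-safe line for copying into a terminal."""
    return "\n".join(shlex.join(command) for command in commands)
```

Because the real index name is only known after `create` has run, the helper is most useful for printing the remaining steps once that name has been copied from the terminal.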
81+
82+
### Running OpenSearch on AWS
5683

5784
1. Ensure that you have the correct AWS credentials set for the Dev1 (or desired) account.
85+
5886
2. Set the `TIMDEX_OPENSEARCH_ENDPOINT` variable in your .env to match the Dev1 (or desired) TIMDEX OpenSearch endpoint (note: do not include the http scheme prefix).
87+
5988
3. Run `pipenv run tim ping` to confirm the client is connected to the expected TIMDEX OpenSearch instance.
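Step 2 asks for the endpoint without the http scheme, which is easy to get wrong when the value is copied from a browser or the AWS console. A small sketch of normalizing such a value before placing it in `.env` follows; `strip_http_scheme` is a hypothetical helper and not part of TIM.

```python
import re


def strip_http_scheme(endpoint: str) -> str:
    """Remove a leading http:// or https:// and any trailing slash."""
    without_scheme = re.sub(r"^https?://", "", endpoint.strip(), flags=re.IGNORECASE)
    return without_scheme.rstrip("/")
```

For example, passing `https://search-timdex-env-1234567890.us-east-1.es.amazonaws.com/` yields the bare host name in the form that `TIMDEX_OPENSEARCH_ENDPOINT` expects.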
89+
90+
91+
## Environment Variables
92+
93+
### Required ENV
94+
95+
```
96+
# Set to `dev` for local development. Terraform sets this to `stage` and `prod` in those environments.
97+
WORKSPACE=dev
98+
```
99+
100+
### Optional ENV
101+
102+
```
103+
# Only needed if AWS region changes from the default of us-east-1.
104+
AWS_REGION=
105+
106+
# Chunk size limit for sending requests to the bulk indexing endpoint, in bytes. Defaults to 104857600 (100 * 1024 * 1024) if not set.
107+
OPENSEARCH_BULK_MAX_CHUNK_BYTES=
108+
109+
# Maximum number of retries when sending requests to the bulk indexing endpoint. Defaults to 50 if not set.
110+
OPENSEARCH_BULK_MAX_RETRIES=
111+
112+
# Only used for OpenSearch requests that tend to take longer than the default timeout of 10 seconds, such as bulk or index refresh requests. Defaults to 120 seconds if not set.
113+
OPENSEARCH_REQUEST_TIMEOUT=
114+
115+
# The ingest process logs the # of records indexed every nth record. Set this env variable to any integer to change the frequency of logging status updates. Can be useful for development/debugging. Defaults to 1000 if not set.
116+
STATUS_UPDATE_INTERVAL=
117+
118+
# If using a local Docker OpenSearch instance, this isn't needed. Otherwise set to OpenSearch instance endpoint without the http scheme (e.g., "search-timdex-env-1234567890.us-east-1.es.amazonaws.com"). Can also be passed directly to the CLI via the `--url` option.
119+
TIMDEX_OPENSEARCH_ENDPOINT=
120+
121+
# If set to a valid Sentry DSN, enables Sentry exception monitoring. This is not needed for local development.
122+
SENTRY_DSN=
123+
```
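The defaults documented above can be summarized in code. The following sketch shows one way an application might read these variables and fall back to the stated defaults; it is not TIM's actual configuration code. `TimSettings` and `load_settings` are hypothetical names, and treating an empty value (such as `AWS_REGION=`) as unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TimSettings:
    workspace: str
    aws_region: str
    bulk_max_chunk_bytes: int
    bulk_max_retries: int
    request_timeout: int
    status_update_interval: int
    opensearch_endpoint: str
    sentry_dsn: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> TimSettings:
    """Read settings from env, using the defaults documented in the README."""

    def value(name: str, default):
        raw = env.get(name, "")
        # Assumption: an empty or whitespace-only value counts as "not set".
        return raw.strip() if raw.strip() else default

    return TimSettings(
        workspace=env["WORKSPACE"],  # required, so a missing value raises KeyError
        aws_region=value("AWS_REGION", "us-east-1"),
        bulk_max_chunk_bytes=int(value("OPENSEARCH_BULK_MAX_CHUNK_BYTES", 100 * 1024 * 1024)),
        bulk_max_retries=int(value("OPENSEARCH_BULK_MAX_RETRIES", 50)),
        request_timeout=int(value("OPENSEARCH_REQUEST_TIMEOUT", 120)),
        status_update_interval=int(value("STATUS_UPDATE_INTERVAL", 1000)),
        opensearch_endpoint=value("TIMDEX_OPENSEARCH_ENDPOINT", "localhost"),
        sentry_dsn=value("SENTRY_DSN", None),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests without modifying the process environment.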
124+
125+
## CLI commands
126+
127+
All CLI commands can be run with `pipenv run`.
128+
129+
```
130+
Usage: tim [OPTIONS] COMMAND [ARGS]...
131+
132+
TIM provides commands for interacting with OpenSearch indexes.
133+
For more details on a specific command, run tim COMMAND -h.
134+
135+
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
136+
│ --url -u TEXT The OpenSearch instance endpoint minus the http scheme, e.g. │
137+
│ 'search-timdex-env-1234567890.us-east-1.es.amazonaws.com'. If not provided, will attempt to get from the │
138+
│ TIMDEX_OPENSEARCH_ENDPOINT environment variable. Defaults to 'localhost'. │
139+
│ --verbose -v Pass to log at debug level instead of info │
140+
│ --help -h Show this message and exit. │
141+
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
142+
╭─ Get cluster-level information ─────────────────────────────────────────────────────────────────────────────────────────────────────╮
143+
│ ping Ping OpenSearch and display information about the cluster. │
144+
│ indexes Display summary information about all indexes in the cluster. │
145+
│ aliases List OpenSearch aliases and their associated indexes. │
146+
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
147+
╭─ Index management commands ─────────────────────────────────────────────────────────────────────────────────────────────────────────╮
148+
│ create Create a new index in the cluster. │
149+
│ delete Delete an index. │
150+
│ promote Promote index as the primary alias and add it to any additional provided aliases. │
151+
│ demote Demote an index from all its associated aliases. │
152+
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
153+
╭─ Bulk record processing commands ───────────────────────────────────────────────────────────────────────────────────────────────────╮
154+
│ bulk-index Bulk index records into an index. │
155+
│ bulk-delete Bulk delete records from an index. │
156+
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
157+
```
158+

tests/fixtures/sample_records.json

Lines changed: 1 addition & 4 deletions
@@ -617,10 +617,7 @@
617617
"literary_form": "nonfiction",
618618
"locations": [
619619
{
620-
"geopoint": [
621-
-77.025955,
622-
38.942142
623-
],
620+
"geoshape": "BBOX (-77.11806895668957,-76.90988990509905, 38.99435963428633, 38.79162154730547)",
624621
"kind": "Place of publication",
625622
"value": "District of Columbia"
626623
}
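The fixture change above replaces a `geopoint` coordinate pair with a `geoshape` bounding box string. The sketch below shows how such a string could be parsed; it is illustrative only and assumes the WKT envelope ordering used by OpenSearch (minimum longitude, maximum longitude, maximum latitude, minimum latitude). `BoundingBox` and `parse_bbox` are hypothetical names.

```python
import re
from typing import NamedTuple


class BoundingBox(NamedTuple):
    min_lon: float
    max_lon: float
    max_lat: float
    min_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        """Return True if the point lies inside the box (edges included)."""
        return self.min_lon <= lon <= self.max_lon and self.min_lat <= lat <= self.max_lat


_BBOX_PATTERN = re.compile(r"^\s*BBOX\s*\(([^)]*)\)\s*$", re.IGNORECASE)


def parse_bbox(value: str) -> BoundingBox:
    """Parse a 'BBOX (min_lon, max_lon, max_lat, min_lat)' string."""
    match = _BBOX_PATTERN.match(value)
    if match is None:
        raise ValueError(f"Not a BBOX string: {value!r}")
    numbers = [float(part) for part in match.group(1).split(",")]
    if len(numbers) != 4:
        raise ValueError(f"Expected 4 coordinates, got {len(numbers)}")
    return BoundingBox(*numbers)
```

Under that ordering assumption, the point from the removed `geopoint` value (-77.025955, 38.942142) falls inside the new bounding box.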
