Commit 52ddc64

update synonym converter document
1 parent 9a22d2b commit 52ddc64

1 file changed

+3
-4
lines changed

docs/synonym.md

Lines changed: 3 additions & 4 deletions
@@ -30,15 +30,16 @@ You can partially make use of the Sudachi synonym resource's detailed informatio
3030

3131
### Punctuation Symbols
3232

33-
You may need to remove certain synonym words such as `` and `` when you use the analyzer with setting `"discard_punctuation": true` (otherwise you will get an error, e.g., `"term: € was completely eliminated by analyzer"`). Alternatively, you can set `"lenient": true` for the synonym filter to ignore the exceptions.
33+
You may need to remove certain synonym words such as `` and `` when you use the analyzer with setting `"discard_punctuation": true` (otherwise you will get an error, e.g., `"term: € was completely eliminated by analyzer"`). If you are using [ssyn2es.py](./ssyn2es.py), use the `--discard-punctuation` option to skip those words. Alternatively, you can set `"lenient": true` for the synonym filter to ignore the exceptions.
3434

35-
These symbols are defined as punctuations; See [SudachiTokenizer.java](https://github.com/WorksApplications/elasticsearch-sudachi/blob/develop/src/main/java/com/worksap/nlp/lucene/sudachi/ja/SudachiTokenizer.java#L140) for the detail.
35+
These symbols are defined as punctuations; See [Strings.java](https://github.com/WorksApplications/elasticsearch-sudachi/blob/develop/src/main/java/com/worksap/nlp/lucene/sudachi/ja/util/Strings.java) for the detail.
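
To illustrate what skipping punctuation-only synonym words might look like, here is a minimal Python sketch. It is not the implementation of `ssyn2es.py` or of `Strings.java`. It assumes that a word counts as punctuation when every character falls in a Unicode `P*` or `S*` general category, and that the synonym file uses the comma-separated equivalence form of the Solr format. The function names `is_punctuation_only` and `filter_synonym_line` are hypothetical.

```python
import unicodedata
from typing import Optional


def is_punctuation_only(term: str) -> bool:
    """Return True if every character is a Unicode punctuation or symbol.

    Assumption: this approximates the plugin's notion of punctuation;
    the authoritative definition lives in the plugin source.
    """
    return bool(term) and all(
        unicodedata.category(ch)[0] in ("P", "S") for ch in term
    )


def filter_synonym_line(line: str) -> Optional[str]:
    """Drop punctuation-only words from one comma-separated synonym line.

    Returns None when fewer than two words remain, since a synonym
    group with a single word carries no information.
    """
    stripped = line.strip()
    # Pass blank lines and comments through unchanged.
    if not stripped or stripped.startswith("#"):
        return stripped
    terms = [t.strip() for t in stripped.split(",")]
    kept = [t for t in terms if t and not is_punctuation_only(t)]
    if len(kept) < 2:
        return None
    return ",".join(kept)
```

For example, `filter_synonym_line("ユーロ,€,EURO")` keeps `ユーロ` and `EURO` and drops `€`, which is the word reported in the error message above.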
3636

3737

3838
## Synonym Filter
3939

4040
You can use the converted Solr format file with Elasticsearch's default synonym filters, [Synonym token filter](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html) or [Synonym graph filter](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-graph-tokenfilter.html).
4141

42+
As `sudachi_split` filter produces token graph, you *cannot* use it with synonym filter.
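
As a rough sketch of how these settings could fit together, the Python snippet below builds an index settings body with `discard_punctuation` enabled on the tokenizer and `lenient` enabled on the synonym filter, without `sudachi_split` in the filter chain. The filter and analyzer names (`sudachi_synonym`, `sudachi_synonym_analyzer`) are hypothetical, and the synonym path assumes the file location `sudachi/synonym.txt` under `$ES_PATH_CONF`. Treat it as an illustration rather than the project's recommended configuration.

```python
import json

# Hypothetical index settings; only the option names discussed in the
# text (discard_punctuation, lenient) are taken from the document.
settings = {
    "settings": {
        "index": {
            "analysis": {
                "tokenizer": {
                    "sudachi_tokenizer": {
                        "type": "sudachi_tokenizer",
                        "discard_punctuation": True,
                    }
                },
                "filter": {
                    "sudachi_synonym": {
                        "type": "synonym",
                        # Path is resolved relative to the config directory.
                        "synonyms_path": "sudachi/synonym.txt",
                        # Ignore synonym words the analyzer eliminates.
                        "lenient": True,
                    }
                },
                "analyzer": {
                    "sudachi_synonym_analyzer": {
                        "type": "custom",
                        "tokenizer": "sudachi_tokenizer",
                        # No sudachi_split here: it produces a token graph.
                        "filter": ["sudachi_synonym"],
                    }
                },
            }
        }
    }
}

# Serialize to the JSON body that an index-creation request would carry.
body = json.dumps(settings, ensure_ascii=False, indent=2)
```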
4243

4344
### Example: Set up
4445

@@ -73,8 +74,6 @@ You can use the converted Solr format file with Elasticsearch's default synonym
7374

7475
Here we assume that the converted synonym file is placed as `$ES_PATH_CONF/sudachi/synonym.txt`.
7576

76-
If you would like to use `sudachi_split` filter, set it *after* the synonym filter (otherwise you will get an error, e.g., `term: 不明確 analyzed to a token (不) with position increment != 1 (got: 0)`).
77-
7877

7978
### Example: Analysis
8079
