Commit 6e45504

update readme
1 parent d59f84b commit 6e45504

1 file changed: +2 -0 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -102,6 +102,8 @@ The `sudachi_tokenizer` tokenizer tokenizes input texts using Sudachi.
 - A: The shortest units equivalent to the UniDic short unit
 - Ex) 選挙,管理,委員,会
 - discard_punctuation: Select to discard punctuation or not. (bool, default: true)
+- allow_empty_morpheme: Allow output morpheme to have an empty span. (bool, default: false)
+- This happens when an input text contains a composite character (e.g. ㍿) and it is split into morphemes. If false (default), all split morphemes will contain the span of the character. If true, only the first morpheme will contain the span and the span of other morphemes can be empty.
 - settings_path: Sudachi setting file path. The path may be absolute or relative; relative paths are resolved with respect to es_config. (string, default: null)
 - resources_path: Sudachi dictionary path. The path may be absolute or relative; relative paths are resolved with respect to es_config. (string, default: null)
 - additional_settings: Describes a configuration JSON string for Sudachi. This JSON string will be merged into the default configuration. If this property is set, `settings_path` will be overridden.
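
As a rough illustration of where the new option would sit, the fragment below enables `allow_empty_morpheme` on a `sudachi_tokenizer` in an index's analysis settings. It is a sketch under assumptions: the names `my_sudachi_tokenizer` and `my_sudachi_analyzer` are placeholders, and the surrounding `settings.index.analysis` layout is the usual Elasticsearch custom-analyzer structure, which this diff does not show.

```json
{
  "settings": {
    "index": {
      "analysis": {
        "tokenizer": {
          "my_sudachi_tokenizer": {
            "type": "sudachi_tokenizer",
            "discard_punctuation": true,
            "allow_empty_morpheme": true
          }
        },
        "analyzer": {
          "my_sudachi_analyzer": {
            "type": "custom",
            "tokenizer": "my_sudachi_tokenizer"
          }
        }
      }
    }
  }
}
```

With `allow_empty_morpheme` set to `true`, a composite character such as ㍿ that is split into several morphemes would give the character's span only to the first morpheme, and the remaining morphemes may have empty spans. Leaving the option at its default (`false`) keeps the character's span on every split morpheme.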
