|
2 | 2 |
|
3 | 3 | Natural language for human and machine. |
4 | 4 |
|
5 | | ---- |
| 5 | +**NLCST** discloses the parts of natural language as a concrete syntax |
| 6 | +tree. Concrete means all information is stored in this tree and an |
| 7 | +exact replica of the original document can be re-created. |
6 | 8 |
|
7 | | -> Note: Several projects use this document. Do not make changes without consulting with [TextOM](https://github.com/wooorm/textom), [parse-latin](https://github.com/wooorm/parse-latin), and [retext](https://github.com/wooorm/retext). |
| 9 | +**NLCST** is a subset of [**Unist**][unist], and implemented by |
| 10 | +[**retext**][retext]. |
8 | 11 |
|
9 | | -## CST |
10 | | - |
11 | | -### Node |
| 12 | +## Table of Contents |
12 | 13 |
|
13 | | -Node represents any unit in NLCST hierarchy. |
| 14 | +- [CST](#cst) |
14 | 15 |
|
15 | | -``` |
16 | | -interface Node { |
17 | | - type: string; |
18 | | - data: Data | null; |
19 | | -} |
20 | | -``` |
| 16 | + - [Root](#root) |
| 17 | + - [Paragraph](#paragraph) |
| 18 | + - [Sentence](#sentence) |
| 19 | + - [Word](#word) |
| 20 | + - [Symbol](#symbol) |
| 21 | + - [Punctuation](#punctuation) |
| 22 | + - [WhiteSpace](#whitespace) |
| 23 | + - [Source](#source) |
| 24 | + - [TextNode](#textnode) |
21 | 25 |
|
22 | | -### Data |
| 26 | +- [List of Utilities](#list-of-utilities) |
23 | 27 |
|
24 | | -Data represents data associated with any node. Data is a scope for plug-ins to store any information. Its only limitation being that each property should by stringifyable: not throw when passed to `JSON.stringify()`. |
| 28 | +- [License](#license) |
25 | 29 |
|
26 | | -``` |
27 | | -interface Data { } |
28 | | -``` |
| 30 | +## CST |
29 | 31 |
|
30 | | -### Parent |
| 32 | +### `Root` |
31 | 33 |
|
32 | | -Parent ([Node](#node)) represents a unit in NLCST hierarchy which can have zero or more children. |
| 34 | +`Root` ([`Parent`][parent]) houses all nodes. |
33 | 35 |
|
34 | | -``` |
35 | | -interface Parent <: Node { |
36 | | - children: []; |
| 36 | +```idl |
| 37 | +interface Root <: Parent { |
| 38 | + type: "RootNode"; |
37 | 39 | } |
38 | 40 | ``` |
39 | 41 |
|
40 | | -### Text |
| 42 | +### `Paragraph` |
41 | 43 |
|
42 | | -Text ([Node](#node)) represents a unit in NLCST hierarchy which has value. |
| 44 | +`Paragraph` ([`Parent`][parent]) represents a self-contained unit of |
| 45 | +discourse in writing dealing with a particular point or idea. |
43 | 46 |
|
44 | | -``` |
45 | | -interface Text <: Node { |
46 | | - value: string; |
| 47 | +```idl |
| 48 | +interface Paragraph <: Parent { |
| 49 | + type: "ParagraphNode"; |
47 | 50 | } |
48 | 51 | ``` |
49 | 52 |
|
50 | | -### RootNode |
| 53 | +### `Sentence` |
51 | 54 |
|
52 | | -Root ([Parent](#parent)) represents a document. |
| 55 | +`Sentence` ([`Parent`][parent]) represents a grouping of grammatically |
| 56 | +linked words, that in principle tells a complete thought, although it |
| 57 | +may make little sense taken in isolation out of context. |
53 | 58 |
|
54 | | -``` |
55 | | -interface RootNode < Parent { |
56 | | - type: "RootNode"; |
| 59 | +```idl |
| 60 | +interface Sentence <: Parent { |
| 61 | + type: "SentenceNode"; |
57 | 62 | } |
58 | 63 | ``` |
59 | 64 |
|
60 | | -### ParagraphNode |
| 65 | +### `Word` |
61 | 66 |
|
62 | | -Paragraph ([Parent](#parent)) represents a self-contained unit of discourse in writing dealing with a particular point or idea. |
| 67 | +`Word` ([`Parent`][parent]) represents the smallest element that may |
| 68 | +be uttered in isolation with semantic or pragmatic content. |
63 | 69 |
|
64 | | -``` |
65 | | -interface ParagraphNode < Parent { |
66 | | - type: "ParagraphNode"; |
| 70 | +```idl |
| 71 | +interface Word <: Parent { |
| 72 | + type: "WordNode"; |
67 | 73 | } |
68 | 74 | ``` |
69 | 75 |
|
70 | | -### SentenceNode |
| 76 | +### `Symbol` |
71 | 77 |
|
72 | | -Sentence ([Parent](#parent)) represents grouping of grammatically linked words, that in principle tells a complete thought, although it may make little sense taken in isolation out of context. |
| 78 | +`Symbol` ([`Text`][text]) represents typographical devices like |
| 79 | +white space, punctuation, signs, and more, different from characters |
| 80 | +which represent sounds (like letters and numerals). |
73 | 81 |
|
74 | | -``` |
75 | | -interface SentenceNode < Parent { |
76 | | - type: "SentenceNode"; |
| 82 | +```idl |
| 83 | +interface Symbol <: Text { |
| 84 | + type: "SymbolNode"; |
77 | 85 | } |
78 | 86 | ``` |
79 | 87 |
|
80 | | -### WordNode |
| 88 | +### `Punctuation` |
81 | 89 |
|
82 | | -Word ([Parent](#parent)) represents the smallest element that may be uttered in isolation with semantic or pragmatic content. |
| 90 | +`Punctuation` ([`Symbol`][symbol]) represents typographical devices |
| 91 | +which aid understanding and correct reading of other grammatical |
| 92 | +units. |
83 | 93 |
|
84 | | -``` |
85 | | -interface WordNode < Parent { |
86 | | - type: "WordNode"; |
| 94 | +```idl |
| 95 | +interface Punctuation <: Symbol { |
| 96 | + type: "PunctuationNode"; |
87 | 97 | } |
88 | 98 | ``` |
89 | 99 |
|
90 | | -### SymbolNode |
| 100 | +### `WhiteSpace` |
91 | 101 |
|
92 | | -Symbol ([Text](#text)) represents typographical devices like white space, punctuation, signs, and more, different from characers which represent sounds (like letters and numerals). |
| 102 | +`WhiteSpace` ([`Symbol`][symbol]) represents typographical devices |
| 103 | +devoid of content, separating other grammatical units. |
93 | 104 |
|
94 | | -``` |
95 | | -interface SymbolNode < Text { |
96 | | - type: "SymbolNode"; |
| 105 | +```idl |
| 106 | +interface WhiteSpace <: Symbol { |
| 107 | + type: "WhiteSpaceNode"; |
97 | 108 | } |
98 | 109 | ``` |
99 | 110 |
|
100 | | -### PunctuationNode |
| 111 | +### `Source` |
101 | 112 |
|
102 | | -Punctuation ([SymbolNode](#symbolnode)) represents typographical devices which aid understanding and correct reading of other grammatical units. |
| 113 | +`Source` ([`Text`][text]) represents an external (ungrammatical) value |
| 114 | +embedded into a grammatical unit: a hyperlink, a line, and such. |
103 | 115 |
|
104 | | -``` |
105 | | -interface PunctuationNode < SymbolNode { |
106 | | - type: "PunctuationNode"; |
| 116 | +```idl |
| 117 | +interface Source <: Text { |
| 118 | + type: "SourceNode"; |
107 | 119 | } |
108 | 120 | ``` |
109 | 121 |
|
110 | | -### WhiteSpaceNode |
| 122 | +### `TextNode` |
111 | 123 |
|
112 | | -White Space ([SymbolNode](#symbolnode)) represents typographical devices devoid of content, separating other grammatical units. |
| 124 | +`TextNode` ([`Text`][text]) represents actual content in an NLCST |
| 125 | +document: one or more characters. Note that its `type` property |
| 126 | +is `TextNode`, but it is different from the abstract [`Text`][text] |
| 127 | +interface. |
113 | 128 |
|
114 | | -``` |
115 | | -interface WhiteSpaceNode < SymbolNode { |
116 | | - type: "WhiteSpaceNode"; |
| 129 | +```idl |
| 130 | +interface TextNode <: Text { |
| 131 | + type: "TextNode"; |
117 | 132 | } |
118 | 133 | ``` |
119 | 134 |
|
120 | | -### SourceNode |
| 135 | +## List of Utilities |
121 | 136 |
|
122 | | -Source ([Text](#text)) represents an external (ungrammatical) value embedded into a grammatical unit: a hyperlink, a line, and such. |
| 137 | +<!--lint disable list-item-spacing--> |
123 | 138 |
|
124 | | -``` |
125 | | -interface SourceNode < Text { |
126 | | - type: "SourceNode"; |
127 | | -} |
128 | | -``` |
| 139 | +- [`wooorm/nlcst-is-literal`](https://github.com/wooorm/nlcst-is-literal) |
| 140 | + — Check whether a node is meant literally; |
| 141 | +- [`wooorm/nlcst-normalize`](https://github.com/wooorm/nlcst-normalize) |
| 142 | + — Normalize a word for easier comparison; |
| 143 | +- [`wooorm/nlcst-search`](https://github.com/wooorm/nlcst-search) |
| 144 | + — Search for patterns in an NLCST tree; |
| 145 | +- [`wooorm/nlcst-to-string`](https://github.com/wooorm/nlcst-to-string) |
| 146 | + — Stringify a node; |
| 147 | +- [`wooorm/nlcst-test`](https://github.com/wooorm/nlcst-test) |
| 148 | + — Validate an NLCST node. |
129 | 149 |
|
130 | | -### TextNode |
| 150 | +In addition, see [**Unist**][unist] for other utilities which |
| 151 | +work with **retext** nodes. |
131 | 152 |
|
132 | | -Text ([Text](#text)) represents actual content in an NLCST document: one or more characters. |
| 153 | +## License |
133 | 154 |
|
134 | | -``` |
135 | | -interface TextNode < Text { |
136 | | - type: "TextNode"; |
137 | | -} |
138 | | -``` |
| 155 | +MIT © Titus Wormer |
139 | 156 |
|
140 | | -## Related |
| 157 | +<!--Definitions--> |
141 | 158 |
|
142 | | -- [retext](https://github.com/wooorm/retext) — Analyse and Manipulate natural language, 20+ plug-ins. |
143 | | -- [parse-latin](https://github.com/wooorm/parse-latin) — Transforms latin-script natural language into a CST; |
144 | | -- [TextOM](https://github.com/wooorm/textom) — Provides an object-oriented manipulation interface to NLCST; |
145 | | -- [nlcst-to-string](https://github.com/wooorm/nlcst-to-string) — Transforms a CST into a string; |
146 | | -- [nlcst-to-textom](https://github.com/wooorm/nlcst-to-textom) — Transforms a CST into a [TextOM](https://github.com/wooorm/textom) object model; |
147 | | -- [nlcst-test](https://github.com/wooorm/nlcst-test) — Validate an NLCST node. |
| 159 | +[unist]: https://github.com/wooorm/unist |
148 | 160 |
|
149 | | -## License |
| 161 | +[retext]: https://github.com/wooorm/retext |
150 | 162 |
|
151 | | -MIT © Titus Wormer |
| 163 | +[parent]: https://github.com/wooorm/unist#parent |
| 164 | + |
| 165 | +[text]: https://github.com/wooorm/unist#text |
| 166 | + |
| 167 | +[symbol]: #symbol |
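
The "concrete" property described in the new introduction (the original document can be re-created from the tree) can be illustrated with a small sketch. This is only an illustration, not the project's implementation: the names `TextLike`, `ParentLike`, `NlcstNode`, and `stringify` are hypothetical, and the sketch assumes, as in the removed `Parent` and `Text` definitions, that parents carry `children` and text nodes carry `value`.

```typescript
// Hypothetical TypeScript rendering of the node types listed in the diff.
// Assumption: Text-derived nodes hold `value`, Parent-derived nodes hold `children`.
interface TextLike {
  type: "TextNode" | "SymbolNode" | "PunctuationNode" | "WhiteSpaceNode" | "SourceNode";
  value: string;
}

interface ParentLike {
  type: "RootNode" | "ParagraphNode" | "SentenceNode" | "WordNode";
  children: NlcstNode[];
}

type NlcstNode = TextLike | ParentLike;

// Re-create the source text by concatenating every leaf value in document order.
function stringify(node: NlcstNode): string {
  return "children" in node
    ? node.children.map(stringify).join("")
    : node.value;
}

// A hand-built tree for the sentence "Hello, world!".
const tree: NlcstNode = {
  type: "RootNode",
  children: [
    {
      type: "ParagraphNode",
      children: [
        {
          type: "SentenceNode",
          children: [
            { type: "WordNode", children: [{ type: "TextNode", value: "Hello" }] },
            { type: "PunctuationNode", value: "," },
            { type: "WhiteSpaceNode", value: " " },
            { type: "WordNode", children: [{ type: "TextNode", value: "world" }] },
            { type: "PunctuationNode", value: "!" },
          ],
        },
      ],
    },
  ],
};

const text: string = stringify(tree);
```

Because white space and punctuation are stored as nodes of their own, nothing is lost between the input text and the tree, which is what makes the round trip possible.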