Commit 6204674

‼️ BREAKING: Change Token.attrs to a dict (#144)
Instead of storing `attrs` as `[["key1", "value1"], ["key2", "value2"]]`, use `{"key1": "value1", "key2": "value2"}`. Upstream, the list format is only used to guarantee order (markdown-it/markdown-it#142), but in Python 3.7+ dictionary order is guaranteed by the language specification (in Python 3.6 it is also preserved as an implementation detail). This change improves typing and performance.

In any case, you should generally use the `attrGet`, `attrSet`, `attrPush` and `attrJoin` methods to manipulate `Token.attrs`; these all have signatures identical to those upstream.

To minimize how breaking this change is, auto-conversion is done on `Token` initialization, i.e. you can still use `Token("type", "tag", 0, attrs=[["key", "value"]])`. In addition, `Token.as_dict(as_upstream=True)` converts the dict back to `null`/`list`, so that tokens can still be directly compared to those produced in the `debug` tab of https://markdown-it.github.io/. The `meta_serializer` option has also been added to `Token.as_dict`, which now ensures that this method is always able to produce valid JSON.
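The conversion described in the commit message can be illustrated with a small standard-library sketch. It does not import markdown-it-py: `attrs_to_dict` and `attrs_to_upstream` are hypothetical helper names used only here, and the empty-dict-to-`None` behaviour is an assumption based on the "`null`/`list`" wording above, not a copy of the library's code.

```python
from typing import Dict, List, Optional, Union

AttrValue = Union[str, int, float]


def attrs_to_dict(attrs) -> Dict[str, AttrValue]:
    """Hypothetical helper: accept None, a list of [key, value] pairs, or a dict."""
    if attrs is None:
        return {}
    if isinstance(attrs, dict):
        return dict(attrs)
    # Dicts keep insertion order in Python 3.7+, so pair order is preserved.
    return {key: value for key, value in attrs}


def attrs_to_upstream(attrs: Dict[str, AttrValue]) -> Optional[List[List[AttrValue]]]:
    """Hypothetical helper: convert back to the upstream list form.

    Assumption: an empty dict maps to None (JSON null).
    """
    if not attrs:
        return None
    return [[key, value] for key, value in attrs.items()]


pairs = [["src", "image.png"], ["alt", ""], ["title", "A title"]]
converted = attrs_to_dict(pairs)
print(converted)
print(attrs_to_upstream(converted) == pairs)
```

The round trip only works because dictionary order is guaranteed; on an unordered mapping the list produced by the second helper could differ from the input.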
1 parent 00a28a6 commit 6204674

18 files changed

+428
-408
lines changed

.mypy.ini

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 [mypy]
+show_error_codes = True
 warn_unused_ignores = True
 warn_redundant_casts = True
 no_implicit_optional = True

docs/architecture.md

Lines changed: 4 additions & 8 deletions
@@ -118,10 +118,10 @@ vimeoRE = re.compile(r'^https?:\/\/(www\.)?vimeo.com\/(\d+)($|\/)')

 def render_vimeo(self, tokens, idx, options, env):
     token = tokens[idx]
-    aIndex = token.attrIndex('src')
-    if (vimeoRE.match(token.attrs[aIndex][1])):

-        ident = vimeoRE.match(token.attrs[aIndex][1])[2]
+    if vimeoRE.match(token.attrs["src"]):
+
+        ident = vimeoRE.match(token.attrs["src"])[2]

         return ('<div class="embed-responsive embed-responsive-16by9">\n' +
                 ' <iframe class="embed-responsive-item" src="//player.vimeo.com/video/' +
@@ -140,11 +140,7 @@ Here is another example, how to add `target="_blank"` to all links:
 from markdown_it import MarkdownIt

 def render_blank_link(self, tokens, idx, options, env):
-    aIndex = tokens[idx].attrIndex('target')
-    if (aIndex < 0):
-        tokens[idx].attrPush(['target', '_blank']) # add new attribute
-    else:
-        tokens[idx].attrs[aIndex][1] = '_blank' # replace value of existing attr
+    tokens[idx].attrSet("target", "_blank")

     # pass token to default renderer.
     return self.renderToken(tokens, idx, options, env)
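The simplification in the `render_blank_link` hunk follows from the data structure change. As a rough standalone sketch (plain lists and dicts rather than real `Token` objects; `set_attr_in_list` and `set_attr_in_dict` are hypothetical names, not library functions), the old find-then-replace-or-append logic collapses into a single assignment:

```python
def set_attr_in_list(attrs, name, value):
    """Old pattern: search the list of pairs, then replace or append."""
    for pair in attrs:
        if pair[0] == name:
            pair[1] = value  # replace value of existing attribute
            return
    attrs.append([name, value])  # add new attribute


def set_attr_in_dict(attrs, name, value):
    """New pattern: one assignment covers both the add and replace cases."""
    attrs[name] = value


list_attrs = [["href", "https://example.com"]]
dict_attrs = {"href": "https://example.com"}

# Setting the attribute twice must not create a duplicate entry.
for _ in range(2):
    set_attr_in_list(list_attrs, "target", "_blank")
    set_attr_in_dict(dict_attrs, "target", "_blank")

print(list_attrs)
print(dict_attrs)
```

Both containers end up with the same attributes in the same order; the dict version simply needs no index lookup.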

docs/using.md

Lines changed: 34 additions & 38 deletions
@@ -27,17 +27,17 @@ then these are converted to other formats using 'renderers'.

 The simplest way to understand how text will be parsed is using:

-```{code-cell}
+```{code-cell} python
 from pprint import pprint
 from markdown_it import MarkdownIt
 ```

-```{code-cell}
+```{code-cell} python
 md = MarkdownIt()
 md.render("some *text*")
 ```

-```{code-cell}
+```{code-cell} python
 for token in md.parse("some *text*"):
     print(token)
     print()
@@ -59,48 +59,48 @@ You can define this configuration *via* directly supplying a dictionary or a pre
 Compared to `commonmark`, it enables the table, strikethrough and linkify components.
 **Important**, to use this configuration you must have `linkify-it-py` installed.

-```{code-cell}
+```{code-cell} python
 from markdown_it.presets import zero
 zero.make()
 ```

-```{code-cell}
+```{code-cell} python
 md = MarkdownIt("zero")
 md.options
 ```

 You can also override specific options:

-```{code-cell}
+```{code-cell} python
 md = MarkdownIt("zero", {"maxNesting": 99})
 md.options
 ```

-```{code-cell}
+```{code-cell} python
 pprint(md.get_active_rules())
 ```

 You can find all the parsing rules in the source code:
 `parser_core.py`, `parser_block.py`,
 `parser_inline.py`.

-```{code-cell}
+```{code-cell} python
 pprint(md.get_all_rules())
 ```

 Any of the parsing rules can be enabled/disabled, and these methods are "chainable":

-```{code-cell}
+```{code-cell} python
 md.render("- __*emphasise this*__")
 ```

-```{code-cell}
+```{code-cell} python
 md.enable(["list", "emphasis"]).render("- __*emphasise this*__")
 ```

 You can temporarily modify rules with the `reset_rules` context manager.

-```{code-cell}
+```{code-cell} python
 with md.reset_rules():
     md.disable("emphasis")
     print(md.render("__*emphasise this*__"))
@@ -109,7 +109,7 @@ md.render("__*emphasise this*__")

 Additionally `renderInline` runs the parser with all block syntax rules disabled.

-```{code-cell}
+```{code-cell} python
 md.renderInline("__*emphasise this*__")
 ```

@@ -140,7 +140,7 @@ The `smartquotes` and `replacements` components are intended to improve typograp

 Both of these components require typography to be turned on, as well as the components enabled:

-```{code-cell}
+```{code-cell} python
 md = MarkdownIt("commonmark", {"typographer": True})
 md.enable(["replacements", "smartquotes"])
 md.render("'single quotes' (c)")
@@ -151,7 +151,7 @@ md.render("'single quotes' (c)")
 The `linkify` component requires that [linkify-it-py](https://github.com/tsutsu3/linkify-it-py) be installed (e.g. *via* `pip install markdown-it-py[linkify]`).
 This allows URI autolinks to be identified, without the need for enclosing in `<>` brackets:

-```{code-cell}
+```{code-cell} python
 md = MarkdownIt("commonmark", {"linkify": True})
 md.enable(["linkify"])
 md.render("github.com")
@@ -161,7 +161,7 @@ md.render("github.com")

 Plugins load collections of additional syntax rules and render methods into the parser

-```{code-cell}
+```{code-cell} python
 from markdown_it import MarkdownIt
 from markdown_it.extensions.front_matter import front_matter_plugin
 from markdown_it.extensions.footnote import footnote_plugin
@@ -194,7 +194,7 @@ md.render(text)

 Before rendering, the text is parsed to a flat token stream of block level syntax elements, with nesting defined by opening (1) and closing (-1) attributes:

-```{code-cell}
+```{code-cell} python
 md = MarkdownIt("commonmark")
 tokens = md.parse("""
 Here's some *text*
@@ -208,37 +208,37 @@ Here's some *text*
 Naturally all openings should eventually be closed,
 such that:

-```{code-cell}
+```{code-cell} python
 sum([t.nesting for t in tokens]) == 0
 ```

 All tokens are the same class, which can also be created outside the parser:

-```{code-cell}
+```{code-cell} python
 tokens[0]
 ```

-```{code-cell}
+```{code-cell} python
 from markdown_it.token import Token
 token = Token("paragraph_open", "p", 1, block=True, map=[1, 2])
 token == tokens[0]
 ```

 The `'inline'` type token contain the inline tokens as children:

-```{code-cell}
+```{code-cell} python
 tokens[1]
 ```

 You can serialize a token (and its children) to a JSONable dictionary using:

-```{code-cell}
+```{code-cell} python
 print(tokens[1].as_dict())
 ```

 This dictionary can also be deserialized:

-```{code-cell}
+```{code-cell} python
 Token.from_dict(tokens[1].as_dict())
 ```

@@ -251,7 +251,7 @@ Token.from_dict(tokens[1].as_dict())
 In some use cases it may be useful to convert the token stream into a syntax tree,
 with opening/closing tokens collapsed into a single token that contains children.

-```{code-cell}
+```{code-cell} python
 from markdown_it.tree import SyntaxTreeNode

 md = MarkdownIt("commonmark")
@@ -271,11 +271,11 @@ print(node.pretty(indent=2, show_text=True))

 You can then use methods to traverse the tree

-```{code-cell}
+```{code-cell} python
 node.children
 ```

-```{code-cell}
+```{code-cell} python
 print(node[0])
 node[0].next_sibling
 ```
@@ -299,7 +299,7 @@ def function(renderer, tokens, idx, options, env):

 You can inject render methods into the instantiated render class.

-```{code-cell}
+```{code-cell} python
 md = MarkdownIt("commonmark")

 def render_em_open(self, tokens, idx, options, env):
@@ -316,7 +316,7 @@ Also `add_render_rule` method is specific to Python, rather than adding directly

 You can also subclass a render and add the method there:

-```{code-cell}
+```{code-cell} python
 from markdown_it.renderer import RendererHTML

 class MyRenderer(RendererHTML):
@@ -329,7 +329,7 @@ md.render("*a*")

 Plugins can support multiple render types, using the `__ouput__` attribute (this is currently a Python only feature).

-```{code-cell}
+```{code-cell} python
 from markdown_it.renderer import RendererHTML

 class MyRenderer1(RendererHTML):
@@ -355,18 +355,18 @@ print(md.render("*a*"))

 Here's a more concrete example; let's replace images with vimeo links to player's iframe:

-```{code-cell}
+```{code-cell} python
 import re
 from markdown_it import MarkdownIt

 vimeoRE = re.compile(r'^https?:\/\/(www\.)?vimeo.com\/(\d+)($|\/)')

 def render_vimeo(self, tokens, idx, options, env):
     token = tokens[idx]
-    aIndex = token.attrIndex('src')
-    if (vimeoRE.match(token.attrs[aIndex][1])):

-        ident = vimeoRE.match(token.attrs[aIndex][1])[2]
+    if vimeoRE.match(token.attrs["src"]):
+
+        ident = vimeoRE.match(token.attrs["src"])[2]

         return ('<div class="embed-responsive embed-responsive-16by9">\n' +
                 ' <iframe class="embed-responsive-item" src="//player.vimeo.com/video/' +
@@ -381,15 +381,11 @@ print(md.render("![](https://www.vimeo.com/123)"))

 Here is another example, how to add `target="_blank"` to all links:

-```{code-cell}
+```{code-cell} python
 from markdown_it import MarkdownIt

 def render_blank_link(self, tokens, idx, options, env):
-    aIndex = tokens[idx].attrIndex('target')
-    if (aIndex < 0):
-        tokens[idx].attrPush(['target', '_blank']) # add new attribute
-    else:
-        tokens[idx].attrs[aIndex][1] = '_blank' # replace value of existing attr
+    tokens[idx].attrSet("target", "_blank")

     # pass token to default renderer.
     return self.renderToken(tokens, idx, options, env)

markdown_it/port.yaml

Lines changed: 7 additions & 1 deletion
@@ -8,9 +8,15 @@
       - `len` -> `length`
       - `str` -> `string`
     - |
-      Convert JS for loops -to while loops
+      Convert JS `for` loops to `while` loops
       this is generally the main difference between the codes,
       because in python you can't do e.g. `for {i=1;i<x;i++} {}`
+    - |
+      `Token.attrs` is a dictionary, instead of a list of lists.
+      Upstream the list format is only used to guarantee order: https://github.com/markdown-it/markdown-it/issues/142,
+      but in Python 3.7+ order of dictionaries is guaranteed.
+      One should anyhow use the `attrGet`, `attrSet`, `attrPush` and `attrJoin` methods
+      to manipulate `Token.attrs`, which have an identical signature to those upstream.
     - Use python version of `charCodeAt`
     - |
       Reduce use of charCodeAt() by storing char codes in a srcCharCodes attribute for state

markdown_it/renderer.py

Lines changed: 10 additions & 26 deletions
@@ -153,19 +153,10 @@ def renderToken(
     @staticmethod
     def renderAttrs(token: Token) -> str:
         """Render token attributes to string."""
-        if not token.attrs:
-            return ""
-
         result = ""

-        for token_attr in token.attrs:
-            result += (
-                " "
-                + escapeHtml(str(token_attr[0]))
-                + '="'
-                + escapeHtml(str(token_attr[1]))
-                + '"'
-            )
+        for key, value in token.attrItems():
+            result += " " + escapeHtml(key) + '="' + escapeHtml(str(value)) + '"'

         return result

@@ -241,17 +232,9 @@ def fence(self, tokens: Sequence[Token], idx: int, options, env) -> str:
         # May be, one day we will add .deepClone() for token and simplify this part, but
         # now we prefer to keep things local.
         if info:
-            i = token.attrIndex("class")
-            tmpAttrs = token.attrs[:] if token.attrs else []
-
-            if i < 0:
-                tmpAttrs.append(["class", options.langPrefix + langName])
-            else:
-                tmpAttrs[i] = tmpAttrs[i][:]
-                tmpAttrs[i][1] += " " + options.langPrefix + langName
-
             # Fake token just to render attributes
-            tmpToken = Token(type="", tag="", nesting=0, attrs=tmpAttrs)
+            tmpToken = Token(type="", tag="", nesting=0, attrs=token.attrs.copy())
+            tmpToken.attrJoin("class", options.langPrefix + langName)

             return (
                 "<pre><code"
@@ -271,16 +254,17 @@ def fence(self, tokens: Sequence[Token], idx: int, options, env) -> str:

     def image(self, tokens: Sequence[Token], idx: int, options, env) -> str:
         token = tokens[idx]
-        assert token.attrs is not None, '"image" token\'s attrs must not be `None`'

         # "alt" attr MUST be set, even if empty. Because it's mandatory and
         # should be placed on proper position for tests.
-        #
+
+        assert (
+            token.attrs and "alt" in token.attrs
+        ), '"image" token\'s attrs must contain `alt`'
+
         # Replace content with actual value

-        token.attrs[token.attrIndex("alt")][1] = self.renderInlineAsText(
-            token.children, options, env
-        )
+        token.attrSet("alt", self.renderInlineAsText(token.children, options, env))

         return self.renderToken(tokens, idx, options, env)
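To make the renderer changes easier to follow, here is a minimal standalone sketch of the two ideas in the hunks above: rendering a dict of attributes to an HTML attribute string, and joining a class onto a copy so the original attributes stay untouched. It is not the library's code. `render_attrs` and `attr_join` are hypothetical names, and the standard library's `html.escape` is used as a stand-in for `escapeHtml`, whose exact escaping rules may differ.

```python
from html import escape


def render_attrs(attrs):
    """Render a dict of attributes as ' key="value"' pairs, in insertion order."""
    return "".join(
        f' {escape(key)}="{escape(str(value))}"' for key, value in attrs.items()
    )


def attr_join(attrs, name, value):
    """Append value to an existing attribute (space separated), or create it."""
    if name in attrs:
        attrs[name] = f"{attrs[name]} {value}"
    else:
        attrs[name] = value


original = {"class": "custom", "data-x": "a&b"}

# Work on a shallow copy, as the fence renderer does, so `original` is unchanged.
tmp = original.copy()
attr_join(tmp, "class", "language-python")

print(render_attrs(tmp))
print(original)
```

An empty dict renders to an empty string without a special case, which matches the removal of the early `return ""` guard in `renderAttrs`, on the assumption that the attributes are always a dict at that point.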

markdown_it/rules_block/list.py

Lines changed: 1 addition & 1 deletion
@@ -168,7 +168,7 @@ def list_block(state: StateBlock, startLine: int, endLine: int, silent: bool):
     if isOrdered:
         token = state.push("ordered_list_open", "ol", 1)
         if markerValue != 1:
-            token.attrs = [["start", markerValue]]
+            token.attrs = {"start": markerValue}

     else:
         token = state.push("bullet_list_open", "ul", 1)

markdown_it/rules_block/table.py

Lines changed: 2 additions & 2 deletions
@@ -150,7 +150,7 @@ def table(state: StateBlock, startLine: int, endLine: int, silent: bool):
     for i in range(len(columns)):
         token = state.push("th_open", "th", 1)
         if aligns[i]:
-            token.attrs = [["style", "text-align:" + aligns[i]]]
+            token.attrs = {"style": "text-align:" + aligns[i]}

         token = state.push("inline", "", 0)
         # note in markdown-it this map was removed in v12.0.0 however, we keep it,
@@ -198,7 +198,7 @@ def table(state: StateBlock, startLine: int, endLine: int, silent: bool):
         for i in range(columnCount):
             token = state.push("td_open", "td", 1)
             if aligns[i]:
-                token.attrs = [["style", "text-align:" + aligns[i]]]
+                token.attrs = {"style": "text-align:" + aligns[i]}

             token = state.push("inline", "", 0)
             # note in markdown-it this map was removed in v12.0.0 however, we keep it,
