|
1 | 1 | # XML dataclasses |
2 | 2 |
|
3 | | -[](https://opensource.org/licenses/MPL-2.0)  |
| 3 | +[](https://opensource.org/licenses/MPL-2.0)  |
4 | 4 |
|
5 | 5 | [XML dataclasses on PyPI](https://pypi.org/project/xml-dataclasses/) |
6 | 6 |
|
7 | | -This library enables (de)serialising XML into Python dataclasses. XML dataclasses build on normal dataclasses from the standard library and [`lxml`](https://pypi.org/project/lxml/) elements. Loading and saving these elements is left to the consumer for flexibility of the desired output. |
| 7 | +This library maps XML to and from Python dataclasses. It builds on normal dataclasses from the standard library and uses [`lxml`](https://pypi.org/project/lxml/) for parsing/generating XML. |
8 | 8 |
|
9 | | -It's currently in alpha. It isn't ready for production if you aren't willing to do your own evaluation/quality assurance. I don't recommend using this library with untrusted content. It inherits all of `lxml`'s flaws with regards to XML attacks, and recursively resolves data structures. Because deserialisation is driven from the dataclass definitions, it shouldn't be possible to execute arbitrary Python code (not a guarantee, see license). Denial of service attacks would very likely be feasible. One workaround may be to [use `lxml` to validate](https://lxml.de/validation.html) untrusted content with a strict schema. |
| 9 | +It's currently in alpha. It isn't ready for production if you aren't willing to do your own evaluation/quality assurance. |
10 | 10 |
|
11 | 11 | Requires Python 3.7 or higher. |
12 | 12 |
|
13 | 13 | ## Features |
14 | 14 |
|
15 | | -* XML dataclasses are also dataclasses, and only require a single decorator to work (but see type hinting section for issues) |
16 | | -* Convert XML documents to well-defined dataclasses, which should work with IDE auto-completion |
| 15 | +* Convert XML documents to well-defined dataclasses, which work with Mypy or IDE auto-completion |
| 16 | +* XML dataclasses are dataclasses |
| 17 | +* Full control of parsing and generating XML via `lxml` |
17 | 18 | * Loading and dumping of attributes, child elements, and text content |
18 | | -* Required and optional attributes and child elements |
| 19 | +* Required and optional attributes/child elements |
19 | 20 | * Lists of child elements are supported, as are unions and lists of unions |
20 | 21 | * Inheritance does work, but has the same limitations as dataclasses. Inheriting from base classes with optional fields (fields that have a default) and declaring required fields (fields without a default) doesn't work due to field order. This isn't recommended |
21 | 22 | * Namespace support is decent as long as correctly declared. I've tried on several real-world examples, although they were known to be valid. `lxml` does a great job at expanding namespace information when loading and simplifying it when saving |
22 | 23 | * Post-load validation hook `xml_validate` |
23 | 24 | * Fields not required in the constructor are ignored by this library (via `ignored()` or `init=False`) |
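
The field-order limitation on inheritance mentioned in the list above comes from the standard library's dataclasses. The following is a minimal sketch using plain `dataclasses` only. The class names `Base` and `Child` are hypothetical, and `@xml_dataclass` is deliberately left out, so this illustrates the underlying restriction rather than this library's own behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Base:
    # A field with a default, i.e. optional in the constructor
    lang: Optional[str] = None


def define_child() -> type:
    # Declaring a field without a default after an inherited field that
    # has a default violates the dataclass field ordering rules
    @dataclass
    class Child(Base):
        name: str

    return Child


try:
    define_child()
    error_message = None
except TypeError as exc:
    # The standard library rejects the class at definition time
    error_message = str(exc)

print(error_message)
```

Giving `name` a default, or moving the required field into the base class, avoids the error.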
24 | 25 |
|
| 26 | +## Limitations |
| 27 | + |
| 28 | +* Whitespace and comments aren't supported in the data model. They must be stripped when loading the XML |
| 29 | +* So far, I haven't found any examples where XML can't be mapped to a dataclass, but it's likely possible given how complex XML is |
| 30 | +* Strict mapping. Currently, if an unknown element is encountered, an error is raised (see [#3](https://github.com/tobywf/xml_dataclasses/issues/3), pull requests welcome) |
| 31 | +* No typing/type conversions. Since XML is untyped, only string values are currently allowed. Type conversions are tricky to implement in a type-safe and extensible manner. |
| 32 | +Dataclasses must be written by hand; no tools are provided to generate these from DTDs, XML schema definitions, or RELAX NG schemas |
| 33 | + |
| 34 | +## Security |
| 35 | + |
| 36 | +The caveats concerning untrusted content are roughly the same as with `lxml`, since that does the parsing. This is good, since `lxml`'s behaviour under XML attacks is well understood. This library recursively resolves data structures, which may have memory implications for unbounded payloads. Because loading is driven from the dataclass definitions, it shouldn't be possible to execute arbitrary Python code (not a guarantee, see license). If you must deal with untrusted content, a workaround is to [use `lxml` to validate](https://lxml.de/validation.html) it against a strict schema, which you may already be doing. |
| 37 | + |
25 | 38 | ## Patterns |
26 | 39 |
|
27 | 40 | ### Defining attributes |
@@ -146,7 +159,7 @@ class Container(XmlDataclass): |
146 | 159 |
|
147 | 160 | if __name__ == "__main__": |
148 | 161 | nsmap: NsMap = {None: CONTAINER_NS} |
149 | | - # see Gotchas, stripping whitespace is highly recommended |
| 162 | + # see Gotchas, stripping whitespace and comments is highly recommended |
150 | 163 | parser = etree.XMLParser(remove_blank_text=True, remove_comments=True) |
151 | 164 | lxml_el_in = etree.parse("container.xml", parser).getroot() |
152 | 165 | container = load(Container, lxml_el_in, "container") |
@@ -186,26 +199,18 @@ parser = etree.XMLParser(remove_blank_text=True, remove_comments=True) |
186 | 199 |
|
187 | 200 | By default, `lxml` preserves whitespace. This can cause a problem when checking if elements have no text. The library does attempt to strip these; literally via Python's `strip()`. But `lxml` is likely faster and more robust. |
188 | 201 |
|
189 | | -Similarly, comments are included by default, and because deserialization is strict, they will be considered as nodes that the dataclass has not declared. It is recommended to omit them during parsing. |
| 202 | +Similarly, comments are included by default, and because loading is strict, they will be considered as nodes that the dataclass has not declared. It is recommended to omit them during parsing. |
190 | 203 |
|
191 | 204 | ### Optional vs required |
192 | 205 |
|
193 | 206 | On dataclasses, optional fields also usually have a default value to be useful. But this isn't required; `Optional` is just a type hint to say `None` is allowed. This would occur e.g. if an element has no children. |
194 | 207 |
|
195 | | -For XML dataclasses, on loading/deserialisation, whether or not a field is required is determined by if it has a `default`/`default_factory` defined. If so, and it's missing, that default is used. Otherwise, an error is raised. |
| 208 | +For loading XML dataclasses, whether a field is required is determined by whether it has a `default`/`default_factory` defined. If it does and the value is missing from the XML, that default is used. If it doesn't and the value is missing, an error is raised. |
196 | 209 |
|
197 | | -For dumping/serialisation, the default isn't considered. Instead, if a value is marked as `Optional` and the value is `None`, it isn't written. |
| 210 | +For dumping, the default isn't considered. Instead, if a value is marked as `Optional` and the value is `None`, it isn't written. |
198 | 211 |
|
199 | 212 | This makes sense in many cases, but possibly not every case. |
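
The loading and dumping rules described above can be sketched with the standard library alone. This is not the library's implementation: `Example`, `is_required`, `load_values`, and `dump_values` are hypothetical names, and a plain dictionary stands in for the parsed XML. It only assumes that "has a default" means what `dataclasses` means by it.

```python
from dataclasses import MISSING, Field, dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class Example:
    # Hypothetical fields, for illustration only
    name: str  # no default: required when loading
    lang: Optional[str] = None  # default: used if the value is missing
    tags: List[str] = field(default_factory=list)  # default_factory: used if missing


def is_required(f: Field) -> bool:
    # A field is required if it has neither a default nor a default_factory
    return f.default is MISSING and f.default_factory is MISSING


def load_values(cls: type, raw: Dict[str, Any]) -> Any:
    kwargs = {}
    for f in fields(cls):
        if f.name in raw:
            kwargs[f.name] = raw[f.name]
        elif is_required(f):
            raise ValueError(f"Required field '{f.name}' is missing")
        # Otherwise the dataclass constructor applies the default
    return cls(**kwargs)


def dump_values(obj: Any) -> Dict[str, Any]:
    # Defaults are not consulted when dumping; None values are skipped.
    # Simplification: this sketch does not check the Optional type hint.
    return {
        f.name: getattr(obj, f.name)
        for f in fields(obj)
        if getattr(obj, f.name) is not None
    }


loaded = load_values(Example, {"name": "a"})
dumped = dump_values(loaded)
```

Here `loaded.lang` falls back to `None` and is therefore left out of `dumped`, while `load_values(Example, {})` raises because `name` has no default.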
200 | 213 |
|
201 | | -### Other limitations and Assumptions |
202 | | - |
203 | | -Most of these limitations/assumptions are enforced. They may make this project unsuitable for your use-case. |
204 | | - |
205 | | -* If you need to pass any parameters to the wrapped `@dataclass` decorator, apply it before the `@xml_dataclass` decorator |
206 | | -* Deserialisation is strict; missing required attributes and child elements will cause an error. I want this to be the default behaviour, but it should be straightforward to add a parameter to `load` for lenient operation |
207 | | -* Dataclasses must be written by hand, no tools are provided to generate these from, DTDs, XML schema definitions, or RELAX NG schemas |
208 | | - |
209 | 214 | ## Changelog |
210 | 215 |
|
211 | 216 | ### [0.0.6] - 2020-03-25 |
@@ -244,13 +249,12 @@ Dependencies are managed via [poetry](https://python-poetry.org/). To install al |
244 | 249 | poetry install |
245 | 250 | ``` |
246 | 251 |
|
247 | | -This will also install development dependencies such as `black`, `isort`, `pylint`, `mypy`, and `pytest`. I've provided a simple script to run these during development called `lint`. You can either run it from a shell session with the poetry-installed virtual environment, or run as follows: |
| 252 | +This will also install development dependencies such as `black`, `isort`, `pylint`, `mypy`, and `pytest`. Pre-defined tasks make it easy to run these, for example: |
248 | 253 |
|
249 | | -``` |
250 | | -poetry run ./lint |
251 | | -``` |
| 254 | +* `poetry run task lint` - this runs `black`, `isort`, `mypy`, and `pylint` |
| 255 | +* `poetry run task test` - this runs `pytest` with coverage |
252 | 256 |
|
253 | | -Auto-formatters will be applied, and static analysis/tests are run in order. The script stops on failure to allow quick iteration. |
| 257 | +For a full list of tasks, see `poetry run task --list`. |
254 | 258 |
|
255 | 259 | ## License |
256 | 260 |
|
|