Commit 65fdc96

Easier field definitions
`attr()` and `child()` are no longer required; attributes and children are now inferred. Type resolution has been improved, and I think it's more robust now, although incorrectly defined attributes will be harder to debug.
1 parent ceeca03 commit 65fdc96

18 files changed, +943 -826 lines changed

.pylintrc

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ disable=

 [BASIC]

-good-names=_,e,el,ex,f,tp,v
+good-names=_,e,el,ex,f,tp,k,v,ns

 [FORMAT]

README.md

Lines changed: 71 additions & 16 deletions
@@ -22,30 +22,29 @@ Requires Python 3.7 or higher.
 ```python
 from lxml import etree
 from typing import List
-from xml_dataclasses import xml_dataclass, attr, child, load, dump
+from xml_dataclasses import xml_dataclass, rename, load, dump

 CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

 @xml_dataclass
 class RootFile:
     __ns__ = CONTAINER_NS
-    full_path: str = attr(rename="full-path")
-    media_type: str = attr(rename="media-type")
+    full_path: str = rename(name="full-path")
+    media_type: str = rename(name="media-type")


 @xml_dataclass
 class RootFiles:
     __ns__ = CONTAINER_NS
-    rootfile: List[RootFile] = child()
+    rootfile: List[RootFile]


 @xml_dataclass
 class Container:
     __ns__ = CONTAINER_NS
-    version: str = attr()
-    rootfiles: RootFiles = child()
+    version: str
+    rootfiles: RootFiles
     # WARNING: this is an incomplete implementation of an OPF container
-    # (it's missing links)


 if __name__ == "__main__":
@@ -62,10 +61,61 @@ if __name__ == "__main__":
 * Convert XML documents to well-defined dataclasses, which should work with IDE auto-completion
 * Loading and dumping of attributes, child elements, and text content
 * Required and optional attributes and child elements
-* Lists of child elements are supported
+* Lists of child elements are supported, as are unions and lists of unions
 * Inheritance does work, but has the same limitations as dataclasses. Inheriting from base classes with required fields and declaring optional fields doesn't work due to field order. This isn't recommended
 * Namespace support is decent as long as correctly declared. I've tried on several real-world examples, although they were known to be valid. `lxml` does a great job at expanding namespace information when loading and simplifying it when saving
-* Union child types are supported. When loading XML, they are attempted to be parsed in order
+
+## Patterns
+
+### Defining attributes
+
+Attributes can be either `str` or `Optional[str]`; any other type won't work. Attributes can be renamed, or have their namespace changed, via the `rename` function. It can be used either on its own or wrapped around an existing field definition:
+
+```python
+@xml_dataclass
+class Foo:
+    __ns__ = None
+    required: str
+    namespaced: str = rename(ns="http://www.w3.org/XML/1998/namespace")
+    optional: Optional[str] = None
+    renamed_with_default: Optional[str] = rename(default=None, name="renamed-with-default")
+    existing_field: str = rename(field(...), name="existing-field")
+```
+
+I would like to add support for validation in future, which might also make it easier to support other types. For now, you can work around this limitation with properties that do the conversion.
+
+### Defining text
+
+Like attributes, text can be either `str` or `Optional[str]`. You must declare text content with the `text` function. Like `rename`, this function can wrap an existing field definition or take the `default` argument. Text cannot be renamed or namespaced. Each class can have at most one field defining text content. If a class has text content, it cannot have any children.
+
+```python
+@xml_dataclass
+class Foo:
+    __ns__ = None
+    value: str = text()
+
+@xml_dataclass
+class Foo:
+    __ns__ = None
+    content: Optional[str] = text(default=None)
+
+@xml_dataclass
+class Foo:
+    __ns__ = None
+    uuid: str = text(field(default_factory=lambda: str(uuid4())))
+```
+
+### Defining children/child elements
+
+Children must ultimately be other XML dataclasses. However, they can also be wrapped in `Optional`, `List`, and `Union` types:
+
+* `Optional` must be at the top level. Valid: `Optional[List[XmlDataclass]]`. Invalid: `List[Optional[XmlDataclass]]`
+* Next, `List` should be defined (if multiple child elements are allowed). Valid: `List[Union[XmlDataclass1, XmlDataclass2]]`. Invalid: `Union[List[XmlDataclass1], XmlDataclass2]`
+* Finally, if `Optional` or `List` were used, a union type should be the innermost (again, if needed)
+
+Children can be renamed via the `rename` function; however, attempting to set a namespace is invalid, since the namespace is provided by the child type's XML dataclass. Also, all XML dataclasses in a union must have the same namespace (use separate fields if they have different namespaces).
+
+If a class has children, it cannot have text content.

 ## Gotchas

@@ -79,19 +129,24 @@ parser = etree.XMLParser(remove_blank_text=True)

 By default, `lxml` preserves whitespace. This can cause a problem when checking if elements have no text. The library does attempt to strip these, literally via Python's `strip()`. But `lxml` is likely faster and more robust.

-## Limitations and Assumptions
+### Optional vs required
+
+On dataclasses, `Optional` fields usually also have a default value to be useful. But this isn't required; `Optional` is just a type hint saying `None` is allowed.
+
+For XML dataclasses, on loading/deserialisation, whether or not a field is required is determined by whether it has a `default` or `default_factory` defined. If it does and the value is missing, that default is used. Otherwise, an error is raised.
+
+For dumping/serialisation, the default isn't considered. Instead, if a field is annotated as `Optional` and its value is `None`, it isn't written.
+
+This makes sense in many cases, but possibly not every case.
+
+### Other limitations and assumptions

 Most of these limitations/assumptions are enforced. They may make this project unsuitable for your use-case.

-* All attributes are strings, no extra validation is performed. I would like to add support for validation in future, which might also make it easier to support other types
-* Elements can either have child elements or text content, not both
-* Child elements are other XML dataclasses
-* Text content is a string
 * It isn't possible to pass any parameters to the wrapped `@dataclass` decorator
-* Some properties of dataclass `field`s are not exposed: `default_factory`, `repr`, `hash`, `init`, `compare`. For most, it is because I don't understand the implications fully or how that would be useful for XML. `default_factory` is hard only because of [the overloaded type signatures](https://github.com/python/typeshed/blob/master/stdlib/3.7/dataclasses.pyi), and getting that to work with `mypy`
+* Setting the `init` parameter of a dataclass `field` isn't supported and will lead to bad things happening
 * Deserialisation is strict; missing required attributes and child elements will cause an error. I want this to be the default behaviour, but it should be straightforward to add a parameter to `load` for lenient operation
 * Dataclasses must be written by hand; no tools are provided to generate these from DTDs, XML schema definitions, or RELAX NG schemas
-* Union types must have the same element/tag name and namespace. Otherwise, two different dataclass attributes (XML child fields) may be used

 ## Development

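To make the README's "Optional vs required" rules concrete, here is a small standard-library sketch. It is not the library's `load`/`dump` code, which isn't part of this excerpt: plain `@dataclass` stands in for `@xml_dataclass`, `is_required_on_load` and `fields_to_dump` are hypothetical helpers, and `typing.get_origin`/`get_args` need Python 3.8+, whereas the project targets 3.7.

```python
from dataclasses import MISSING, Field, dataclass, fields
from typing import Any, Dict, Optional, Union, get_args, get_origin, get_type_hints


def is_required_on_load(f: Field) -> bool:
    # Required when loading only if there is neither a default nor a
    # default_factory; an Optional type hint makes no difference here.
    return f.default is MISSING and f.default_factory is MISSING


def is_optional(tp: Any) -> bool:
    # Optional[X] is just Union[X, None]
    return get_origin(tp) is Union and type(None) in get_args(tp)


def fields_to_dump(obj: Any) -> Dict[str, Any]:
    # When dumping, defaults are ignored: a field is skipped only if it is
    # annotated as Optional and its current value is None.
    hints = get_type_hints(type(obj))
    out: Dict[str, Any] = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if value is None and is_optional(hints[f.name]):
            continue
        out[f.name] = value
    return out


@dataclass
class Spine:
    itemref: str
    lang: Optional[str]  # Optional, but no default: still required on load
    toc: Optional[str] = None  # has a default: may be missing on load


required = {f.name: is_required_on_load(f) for f in fields(Spine)}
print(required)  # {'itemref': True, 'lang': True, 'toc': False}
print(fields_to_dump(Spine(itemref="chapter1", lang=None)))  # {'itemref': 'chapter1'}
```

`lang` stays required on load even though it is `Optional`, because it has no default; on dump, both `None`-valued `Optional` fields are left out.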
functional/container_test.py

Lines changed: 7 additions & 6 deletions
@@ -1,9 +1,10 @@
 from pathlib import Path
 from typing import List
+
 import pytest
 from lxml import etree

-from xml_dataclasses import attr, child, dump, load, xml_dataclass
+from xml_dataclasses import dump, load, rename, xml_dataclass

 from .utils import lmxl_dump

@@ -15,21 +16,21 @@
 @xml_dataclass
 class RootFile:
     __ns__ = CONTAINER_NS
-    full_path: str = attr(rename="full-path")
-    media_type: str = attr(rename="media-type")
+    full_path: str = rename(name="full-path")
+    media_type: str = rename(name="media-type")


 @xml_dataclass
 class RootFiles:
     __ns__ = CONTAINER_NS
-    rootfile: List[RootFile] = child()
+    rootfile: List[RootFile]


 @xml_dataclass
 class Container:
     __ns__ = CONTAINER_NS
-    version: str = attr()
-    rootfiles: RootFiles = child()
+    version: str
+    rootfiles: RootFiles
     # WARNING: this is an incomplete implementation of an OPF container
     # (it's missing links)

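The README suggests working around the `str`-only attribute restriction with properties that do the conversion. A minimal sketch of that idea on a cut-down version of this test's `Container`, assuming plain `@dataclass` in place of `@xml_dataclass`, with `rootfiles` omitted; `version_tuple` is a hypothetical property name:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Container:
    # Loaded from XML as a plain string, e.g. "1.0"
    version: str

    @property
    def version_tuple(self) -> Tuple[int, ...]:
        # Convert on read: "1.0" -> (1, 0)
        return tuple(int(part) for part in self.version.split("."))

    @version_tuple.setter
    def version_tuple(self, value: Tuple[int, ...]) -> None:
        # Convert back on write, so the stored attribute stays a str
        self.version = ".".join(str(part) for part in value)


c = Container(version="1.0")
print(c.version_tuple)  # (1, 0)
c.version_tuple = (2, 1)
print(c.version)  # 2.1
```

Since `version_tuple` has no type annotation, it isn't a dataclass field, so loading and dumping should only ever see the `version` string.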
functional/package_test.py

Lines changed: 25 additions & 25 deletions
@@ -4,7 +4,7 @@

 from lxml import etree

-from xml_dataclasses import attr, child, dump, load, text, xml_dataclass
+from xml_dataclasses import dump, load, rename, text, xml_dataclass

 from .utils import lmxl_dump

@@ -29,73 +29,73 @@ def to_dict(cls, default_ns: Optional[str] = None) -> Mapping[Optional[str], str
 class DublinCoreMd:
     __ns__ = NsMap.dc.value
     value: str = text()
-    id: Optional[str] = attr(default=None)
+    id: Optional[str] = None


 @xml_dataclass
 class MdMeta3:
     __ns__ = NsMap.opf.value

-    property: str = attr()
+    property: str
     value: str = text()


 @xml_dataclass
 class MdMeta2:
     __ns__ = NsMap.opf.value
-    content: str = attr()
-    name: str = attr()
+    content: str
+    name: str


 @xml_dataclass
 class Metadata3:
     __ns__ = NsMap.opf.value

-    identifier: List[DublinCoreMd] = child()
-    title: List[DublinCoreMd] = child()
-    language: List[DublinCoreMd] = child()
-    meta: List[Union[MdMeta3, MdMeta2]] = child()
+    identifier: List[DublinCoreMd]
+    title: List[DublinCoreMd]
+    language: List[DublinCoreMd]
+    meta: List[Union[MdMeta3, MdMeta2]]


 @xml_dataclass
 class Item3:
     __ns__ = NsMap.opf.value
-    id: str = attr()
-    href: str = attr()
-    media_type: str = attr(rename="media-type")
+    id: str
+    href: str
+    media_type: str = rename(name="media-type")


 @xml_dataclass
 class Manifest3:
     __ns__ = NsMap.opf.value
-    item: List[Item3] = child()
+    item: List[Item3]


 @xml_dataclass
 class ItemRef3:
     __ns__ = NsMap.opf.value
-    idref: str = attr()
-    properties: Optional[str] = attr(default=None)
+    idref: str
+    properties: Optional[str] = None


 @xml_dataclass
 class Spine3:
     __ns__ = NsMap.opf.value
-    itemref: List[ItemRef3] = child()
-    toc: Optional[str] = attr(default=None)
+    itemref: List[ItemRef3]
+    toc: Optional[str] = None


 @xml_dataclass
 class Package3:
     __ns__ = NsMap.opf.value
-    version: str = attr()
-    unique_identifier: str = attr(rename="unique-identifier")
-    metadata: Metadata3 = child()
-    manifest: Manifest3 = child()
-    spine: Spine3 = child()
-    id: Optional[str] = attr(default=None)
-    lang: Optional[str] = attr(default=None, namespace=NsMap.xml.value)
-    dir: Optional[str] = attr(default=None)
+    version: str
+    unique_identifier: str = rename(name="unique-identifier")
+    metadata: Metadata3
+    manifest: Manifest3
+    spine: Spine3
+    id: Optional[str] = None
+    lang: Optional[str] = rename(default=None, ns=NsMap.xml.value)
+    dir: Optional[str] = None


 def test_functional_package():

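`Metadata3.meta` is annotated as `List[Union[MdMeta3, MdMeta2]]`, which follows the nesting order the README describes: `Optional` outermost, then `List`, then `Union`. The resolver in `resolve_types.py` isn't shown in this excerpt, and this commit drops the `typing_inspect` dependency; the sketch below shows one way the standard library's `typing.get_origin`/`get_args` (Python 3.8+) could unwrap and check such annotations. `resolve_child_type` is a hypothetical name, and the stub classes only stand in for the test's XML dataclasses.

```python
from dataclasses import dataclass, is_dataclass
from typing import Any, List, Optional, Tuple, Union, get_args, get_origin

NoneType = type(None)


def resolve_child_type(tp: Any) -> Tuple[bool, bool, Tuple[type, ...]]:
    """Unwrap Optional -> List -> Union; return (optional, is_list, members)."""
    optional = False
    if get_origin(tp) is Union and NoneType in get_args(tp):
        optional = True
        rest = tuple(a for a in get_args(tp) if a is not NoneType)
        # Optional[Union[A, B]] flattens to Union[A, B, None], so rebuild it
        tp = rest[0] if len(rest) == 1 else Union[rest]

    is_list = False
    if get_origin(tp) is list:
        is_list = True
        (tp,) = get_args(tp)

    members = get_args(tp) if get_origin(tp) is Union else (tp,)
    for member in members:
        # Whatever remains must be a dataclass; anything else means the
        # wrappers were nested in the wrong order
        if not (isinstance(member, type) and is_dataclass(member)):
            raise TypeError(f"invalid child type: {member!r}")
    return optional, is_list, members


@dataclass
class MdMeta3:
    property: str
    value: str


@dataclass
class MdMeta2:
    content: str
    name: str


print(resolve_child_type(List[Union[MdMeta3, MdMeta2]]))
print(resolve_child_type(Optional[List[MdMeta3]]))
```

Both invalid examples from the README, `List[Optional[...]]` and `Union[List[...], ...]`, leave something that isn't a dataclass after unwrapping, so this sketch rejects them with a `TypeError`.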
lint

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ black $BLACK_CHECK src/ tests/ functional/
 mypy src/xml_dataclasses/ --strict
 pylint src/
 # always output coverage report
-if pytest tests/ --cov=xml_dataclasses --random-order $PYTEST_DEBUG; then
+if pytest tests/ --cov=xml_dataclasses --cov-report term --cov-report html --random-order $PYTEST_DEBUG; then
 coverage html
 else
 coverage html

pyproject.toml

Lines changed: 0 additions & 1 deletion
@@ -20,7 +20,6 @@ classifiers = [
 [tool.poetry.dependencies]
 python = "^3.7"
 lxml = "^4.5.0"
-typing_inspect = "^0.5.0"

 [tool.poetry.dev-dependencies]
 pytest = "^5.3.5"

src/xml_dataclasses/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -2,5 +2,6 @@

 logging.getLogger(__name__).addHandler(logging.NullHandler())

-from .structs import attr, child, text, xml_dataclass  # isort:skip
+from .modifiers import rename, text  # isort:skip
+from .resolve_types import xml_dataclass  # isort:skip
 from .serde import dump, load  # isort:skip

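With `attr()` and `child()` gone, `xml_dataclass` (now imported from `resolve_types`) has to infer each field's role. Going by the README's rules (attributes are `str` or `Optional[str]`, text is marked with `text()`, everything else is a child), a simplified sketch of that inference might look like this. It isn't the code in `resolve_types.py`, which this excerpt doesn't include: `classify_fields` is a hypothetical name, the `"xml:text"` metadata key is taken from `modifiers.py`, and plain `@dataclass` stands in for `@xml_dataclass`.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional, get_type_hints


def classify_fields(cls: type) -> Dict[str, str]:
    """Map each field name to "text", "attribute" or "child"."""
    hints = get_type_hints(cls)
    roles: Dict[str, str] = {}
    for f in fields(cls):
        tp = hints[f.name]
        if f.metadata.get("xml:text"):
            roles[f.name] = "text"  # marked by text()
        elif tp is str or tp == Optional[str]:
            roles[f.name] = "attribute"  # only str / Optional[str] qualify
        else:
            roles[f.name] = "child"  # must resolve to XML dataclasses
    return roles


@dataclass
class RootFile:
    full_path: str
    media_type: str


@dataclass
class RootFiles:
    rootfile: List[RootFile]


@dataclass
class Container:
    version: str
    rootfiles: RootFiles


@dataclass
class DublinCoreMd:
    # Equivalent to what text() produces: a field with "xml:text" metadata
    value: str = field(metadata={"xml:text": True})
    id: Optional[str] = None


print(classify_fields(Container))  # {'version': 'attribute', 'rootfiles': 'child'}
print(classify_fields(DublinCoreMd))  # {'value': 'text', 'id': 'attribute'}
```

Because anything that isn't `str`/`Optional[str]` falls through to "child", a mistyped attribute (say `int`) would be treated as a child and only fail later, which fits the commit message's warning that incorrectly defined attributes are harder to debug.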
src/xml_dataclasses/exceptions.py

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+class XmlDataclassError(Exception):
+    pass
+
+
+class XmlDataclassInternalError(XmlDataclassError):
+    pass
+
+
+class XmlDataclassNoNamespaceError(XmlDataclassError):
+    MESSAGE = "XML dataclass without namespace"
+
+    def __init__(self) -> None:
+        super().__init__(self.MESSAGE)
+
+
+class XmlDataclassModelError(XmlDataclassError):
+    pass
+
+
+class XmlDataclassContentsError(XmlDataclassModelError):
+    MESSAGE = "XML dataclass with text-only content has children declared"
+
+    def __init__(self) -> None:
+        super().__init__(self.MESSAGE)
+
+
+class XmlDataclassDuplicateFieldError(XmlDataclassModelError):
+    pass
+
+
+class XmlTypeError(XmlDataclassModelError):
+    pass

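The new `XmlDataclassModelError` subclasses cover mistakes in how a class is declared, as opposed to bad input XML. As a sketch of when two of them might be raised, following the README's rules (at most one text field; text and children are mutually exclusive), here is a hypothetical `check_model` function. The exception classes are copied from this file; `check_model` and its field-role test are illustrative assumptions, not the library's resolver.

```python
from dataclasses import dataclass, field, fields
from typing import Optional, get_type_hints


class XmlDataclassError(Exception):
    pass


class XmlDataclassModelError(XmlDataclassError):
    pass


class XmlDataclassContentsError(XmlDataclassModelError):
    MESSAGE = "XML dataclass with text-only content has children declared"

    def __init__(self) -> None:
        super().__init__(self.MESSAGE)


class XmlDataclassDuplicateFieldError(XmlDataclassModelError):
    pass


def check_model(cls: type) -> None:
    """Hypothetical check: at most one text field, and never text plus children."""
    hints = get_type_hints(cls)
    text_fields = [f.name for f in fields(cls) if f.metadata.get("xml:text")]
    child_fields = [
        f.name
        for f in fields(cls)
        if not f.metadata.get("xml:text") and hints[f.name] not in (str, Optional[str])
    ]
    if len(text_fields) > 1:
        raise XmlDataclassDuplicateFieldError(f"multiple text fields: {text_fields}")
    if text_fields and child_fields:
        raise XmlDataclassContentsError()


@dataclass
class Child:
    name: str


@dataclass
class TextWithChild:
    value: str = field(metadata={"xml:text": True})
    child: Optional[Child] = None


try:
    check_model(TextWithChild)
except XmlDataclassModelError as exc:  # the base class catches every model error
    print(type(exc).__name__, "-", exc)
    # XmlDataclassContentsError - XML dataclass with text-only content has children declared
```

Grouping these under one base class lets callers catch all declaration mistakes with a single `except XmlDataclassModelError`, separately from `XmlDataclassInternalError` or namespace problems.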
src/xml_dataclasses/modifiers.py

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+# pylint: disable=unsubscriptable-object
+# unsubscriptable-object clashes with type hints
+from __future__ import annotations
+
+from dataclasses import _MISSING_TYPE, MISSING, Field, field
+from typing import TYPE_CHECKING, Optional, TypeVar, Union, cast
+
+_T = TypeVar("_T")
+
+
+def make_field(default: Union[_T, _MISSING_TYPE]) -> Field[_T]:
+    if TYPE_CHECKING:  # pragma: no cover
+        return cast(Field[_T], field(default=default))
+    return field(default=default)
+
+
+def rename(
+    f: Optional[Field[_T]] = None,
+    default: Union[_T, _MISSING_TYPE] = MISSING,
+    name: Optional[str] = None,
+    ns: Optional[str] = None,
+) -> Field[_T]:
+    if f is None:
+        f = make_field(default=default)
+    metadata = dict(f.metadata)
+    if name:
+        metadata["xml:name"] = name
+    if ns:
+        metadata["xml:ns"] = ns
+    f.metadata = metadata
+    return f
+
+
+def text(
+    f: Optional[Field[_T]] = None, default: Union[_T, _MISSING_TYPE] = MISSING
+) -> Field[_T]:
+    if f is None:
+        f = make_field(default=default)
+    metadata = dict(f.metadata)
+    metadata["xml:text"] = True
+    f.metadata = metadata
+    return f

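`rename` and `text` don't change how a field behaves as a dataclass field; they only copy its metadata and add `"xml:name"`, `"xml:ns"` or `"xml:text"` keys, which can later be read back through `dataclasses.fields`. The sketch below repeats the two functions without the `make_field`/`TYPE_CHECKING` typing scaffolding and adds a hypothetical `xml_name` helper to show how a serialiser might consume the metadata; it isn't the library's `serde` code.

```python
from dataclasses import MISSING, Field, dataclass, field, fields
from typing import Any, Optional

XML_NS = "http://www.w3.org/XML/1998/namespace"


def rename(
    f: Optional[Field] = None,
    default: Any = MISSING,
    name: Optional[str] = None,
    ns: Optional[str] = None,
) -> Any:
    # Same steps as modifiers.rename, minus the mypy-only cast
    if f is None:
        f = field(default=default)
    metadata = dict(f.metadata)
    if name:
        metadata["xml:name"] = name
    if ns:
        metadata["xml:ns"] = ns
    f.metadata = metadata
    return f


def text(f: Optional[Field] = None, default: Any = MISSING) -> Any:
    # Same steps as modifiers.text
    if f is None:
        f = field(default=default)
    metadata = dict(f.metadata)
    metadata["xml:text"] = True
    f.metadata = metadata
    return f


def xml_name(f: Field) -> str:
    # Hypothetical helper: the XML name falls back to the Python field name
    return f.metadata.get("xml:name", f.name)


@dataclass
class Item:
    media_type: str = rename(name="media-type")
    value: str = text(field(default_factory=lambda: "untitled"))
    lang: Optional[str] = rename(default=None, ns=XML_NS)


item = Item(media_type="application/xhtml+xml")
print([xml_name(f) for f in fields(Item)])  # ['media-type', 'value', 'lang']
print(item.value)  # untitled
print(fields(Item)[2].metadata["xml:ns"])  # http://www.w3.org/XML/1998/namespace
```

Passing `default=MISSING` through to `field()` is the same as omitting it, which is why `rename(name=...)` on its own produces a required field.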