Commit eedea9c

Simple post-load validation hook
1 parent 65fdc96 commit eedea9c

4 files changed: 52 additions, 3 deletions

README.md

Lines changed: 22 additions & 2 deletions
@@ -1,6 +1,8 @@
 # XML dataclasses
 
-This is a very rough prototype of how a library might look like for (de)serialising XML into Python dataclasses. XML dataclasses build on normal dataclasses from the standard library and [`lxml`](https://pypi.org/project/lxml/) elements. Loading and saving these elements is left to the consumer for flexibility of the desired output.
+[![License: MPL 2.0](https://img.shields.io/badge/License-MPL%202.0-brightgreen.svg)](https://opensource.org/licenses/MPL-2.0)
+
+This is a prototype of how a library might look like for (de)serialising XML into Python dataclasses. XML dataclasses build on normal dataclasses from the standard library and [`lxml`](https://pypi.org/project/lxml/) elements. Loading and saving these elements is left to the consumer for flexibility of the desired output.
 
 It isn't ready for production if you aren't willing to do your own evaluation/quality assurance. I don't recommend using this library with untrusted content. It inherits all of `lxml`'s flaws with regards to XML attacks, and recursively resolves data structures. Because deserialisation is driven from the dataclass definitions, it shouldn't be possible to execute arbitrary Python code. But denial of service attacks would very likely be feasible.
 
@@ -46,10 +48,16 @@ class Container:
     rootfiles: RootFiles
     # WARNING: this is an incomplete implementation of an OPF container
 
+    def xml_validate(self):
+        if self.version != "1.0":
+            raise ValueError(f"Unknown container version '{self.version}'")
+
 
 if __name__ == "__main__":
     nsmap = {None: CONTAINER_NS}
-    lxml_el_in = etree.parse("container.xml").getroot()
+    # see Gotchas, stripping whitespace is highly recommended
+    parser = etree.XMLParser(remove_blank_text=True)
+    lxml_el_in = etree.parse("container.xml", parser).getroot()
     container = load(Container, lxml_el_in, "container")
     lxml_el_out = dump(container, "container", nsmap)
     print(etree.tostring(lxml_el_out, encoding="unicode", pretty_print=True))
@@ -64,6 +72,7 @@ if __name__ == "__main__":
 * Lists of child elements are supported, as are unions and lists or unions
 * Inheritance does work, but has the same limitations as dataclasses. Inheriting from base classes with required fields and declaring optional fields doesn't work due to field order. This isn't recommended
 * Namespace support is decent as long as correctly declared. I've tried on several real-world examples, although they were known to be valid. `lxml` does a great job at expanding namespace information when loading and simplifying it when saving
+* Post-load validation hook `xml_validate`
 
 ## Patterns
 
@@ -117,6 +126,17 @@ Children can be renamed via the `rename` function, however attempting to set a n
 
 If a class has children, it cannot have text content.
 
+### Defining post-load validation
+
+Simply implement an instance method called `xml_validate` with no parameters, and no return value (if you're using type hints):
+
+```python
+def xml_validate(self) -> None:
+    pass
+```
+
+If defined, the `load` function will call it after all values have been loaded and assigned to the XML dataclass. You can validate the fields you want inside this method. Return values are ignored; instead raise and catch exceptions.
+
 ## Gotchas
 
 ### Whitespace
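
As a rough illustration of the contract the new README section describes, the sketch below uses a plain standard-library dataclass and a hypothetical `load_stub` helper in place of the library's `load`. The helper does no XML parsing and is not the library's implementation. It only shows that the hook's return value is discarded and that raising an exception is how a loaded value gets rejected.

```python
from dataclasses import dataclass


@dataclass
class Container:
    version: str

    def xml_validate(self) -> None:
        # Reject by raising; a return value would be ignored by the loader.
        if self.version != "1.0":
            raise ValueError(f"Unknown container version '{self.version}'")


def load_stub(cls, **values):
    """Hypothetical stand-in for `load`: build the instance, then run the hook."""
    instance = cls(**values)
    validate_fn = getattr(instance, "xml_validate", None)
    if validate_fn is not None:
        validate_fn()  # return value deliberately discarded
    return instance


# A valid value loads normally.
ok = load_stub(Container, version="1.0")

# An invalid value surfaces as an exception at the call site.
try:
    load_stub(Container, version="2.0")
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None
```

The caller decides what to do with the exception, which keeps the hook itself free of any error-reporting policy.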

functional/container_test.py

Lines changed: 4 additions & 1 deletion
@@ -32,7 +32,10 @@ class Container:
     version: str
     rootfiles: RootFiles
     # WARNING: this is an incomplete implementation of an OPF container
-    # (it's missing links)
+
+    def xml_validate(self):
+        if self.version != "1.0":
+            raise ValueError(f"Unknown container version '{self.version}'")
 
 
 @pytest.mark.parametrize("remove_blank_text", [True, False])

src/xml_dataclasses/serde.py

Lines changed: 8 additions & 0 deletions
@@ -153,6 +153,14 @@ def load(cls: Type[XmlDataclass], el: Any, name: Optional[str] = None) -> XmlDat
 
     instance = cls(**attr_values, **text_values, **child_values) # type: ignore
     instance.__nsmap__ = el.nsmap
+
+    try:
+        validate_fn = instance.xml_validate # type: ignore
+    except AttributeError:
+        pass
+    else:
+        validate_fn()
+
     return instance
 
 
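
One detail of this hunk is worth spelling out: only the attribute lookup sits inside the `try`, and the call happens in the `else` branch. The sketch below shows why that likely matters. It uses hypothetical helper names (`run_hook`, `run_hook_naive`) and plain classes instead of the library's own code. If the call were inside the `try`, an `AttributeError` raised by a buggy validator would be silently swallowed.

```python
def run_hook(instance):
    """Mirrors the try/except/else shape from the diff."""
    try:
        validate_fn = instance.xml_validate
    except AttributeError:
        pass  # no hook defined: nothing to do
    else:
        validate_fn()  # errors raised here propagate to the caller


def run_hook_naive(instance):
    """Hypothetical variant with the call inside the try block."""
    try:
        instance.xml_validate()
    except AttributeError:
        pass  # also hides AttributeErrors raised *inside* the hook


class NoHook:
    pass


class BuggyHook:
    def xml_validate(self) -> None:
        self.missing_field  # typo-style bug: raises AttributeError


run_hook(NoHook())  # fine: the hook is optional
run_hook_naive(BuggyHook())  # the bug goes unnoticed

try:
    run_hook(BuggyHook())
except AttributeError:
    bug_surfaced = True
else:
    bug_surfaced = False
```

With the `else` form, a missing hook is tolerated while a broken hook still fails loudly.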
tests/load_test.py

Lines changed: 18 additions & 0 deletions
@@ -530,3 +530,21 @@ class Foo:
     foo = load(Foo, el, "foo")
     assert isinstance(foo.bar, Child1)
     assert foo.bar.spam == "eggs"
+
+
+def test_load_with_validation():
+    class MyError(Exception):
+        pass
+
+    @xml_dataclass
+    class Foo:
+        __ns__ = None
+        bar: str
+
+        def xml_validate(self) -> None:
+            if self.bar == "baz":
+                raise MyError()
+
+    el = etree.fromstring('<foo bar="baz" />')
+    with pytest.raises(MyError):
+        load(Foo, el, "foo")
