Forward type references are used to refer to types that have not yet been defined. They are implemented as strings which must be evaluated later to compute the actual type declaration.
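A minimal illustration (the class names are made up): quoting a not-yet-defined class name in an annotation stores the annotation as a plain string, and `typing.get_type_hints` evaluates it later, once the name exists.

```py
import dataclasses
import typing


@dataclasses.dataclass
class Employee:
    name: str
    # Forward reference: Manager is not defined yet, so the annotation
    # is written as a string and stored unevaluated.
    manager: "Manager"


@dataclasses.dataclass
class Manager:
    name: str


# The raw annotation is still just the string "Manager"...
raw = Employee.__annotations__["manager"]
# ...until typing.get_type_hints evaluates it into the real class.
hints = typing.get_type_hints(Employee)
```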
Whenever a string is evaluated (with `eval`), the evaluation takes place within a specific pair of global and local namespaces.
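A toy example with the builtin `eval` and two made-up namespace dicts: the same string evaluates to different objects depending on the namespaces supplied, and names in the local namespace shadow those in the global one.

```py
# The same string can evaluate to different objects depending on the
# global and local namespaces that eval() is given.
expr = "B"
globalns = {"B": int}
localns = {"B": str}

from_globals = eval(expr, globalns)          # no locals given: globalns is used for both
from_locals = eval(expr, globalns, localns)  # locals are searched before globals
```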
When computing a schema for a dataclass, marshmallow_dataclass needs access to the final resolved type hints for the dataclass' attributes.
It’s basically easy: we call `typing.get_type_hints` to compute those type hints.
The only tricky part is that, for type references to resolve properly, we often need to pass `get_type_hints` the correct local namespace (its `localns` argument) to use when evaluating them.
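A sketch of the problem using only the standard library (the classes and the `make_classes` helper are hypothetical): when the classes are defined inside a function, `get_type_hints` cannot find `B` through the module globals, but it can once `B` is supplied via the `localns` argument.

```py
import dataclasses
import typing


def make_classes():
    # A and B are local to this function, not module globals.
    @dataclasses.dataclass
    class A:
        b: "B"

    @dataclasses.dataclass
    class B:
        x: int

    return A, B


A, LocalB = make_classes()

# Without a local namespace, "B" is looked up in the module globals and fails.
error = None
try:
    typing.get_type_hints(A)
except NameError as exc:
    error = exc

# Supplying the right localns lets the reference resolve.
hints = typing.get_type_hints(A, localns={"B": LocalB})
```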
When resolving the type references of class attributes, we must use the local namespace from the scope where the class was defined. Unfortunately, there appears to be no sure-fire way to find that scope or that namespace.
Currently, our `class_schema` defaults to using the caller's locals. Often that's the right thing: either the dataclass was created in the same scope that calls `class_schema`, or it was created at module scope (where the local and global namespaces are one and the same). In the latter case, assuming the "wrong" locals doesn't really hurt: unless there's a name conflict, the reference will still be resolved correctly through the module globals.
E.g., using the caller’s local namespace does work here:

```py
import dataclasses

import marshmallow_dataclass


def f():
    @dataclasses.dataclass
    class A:
        b: "B"

    @dataclasses.dataclass
    class B:
        x: int

    MySchema = marshmallow_dataclass.class_schema(A)


f()
```
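The "caller's locals" default can be approximated with `inspect`. This is a hypothetical helper written for illustration, not marshmallow_dataclass's actual code: it looks one frame up, copies that frame's locals, and passes them to `typing.get_type_hints`.

```py
import dataclasses
import inspect
import typing


def hints_from_caller_locals(cls):
    """Hypothetical helper mimicking the "caller's locals" default:
    resolve cls's forward references in the namespace of whoever called us."""
    frame = inspect.currentframe()
    try:
        # Copy the caller's locals into a plain dict before handing them on.
        caller_locals = dict(frame.f_back.f_locals)
    finally:
        del frame  # avoid a frame -> local variable -> frame reference cycle
    return typing.get_type_hints(cls, localns=caller_locals)


def f():
    @dataclasses.dataclass
    class A:
        b: "B"

    @dataclasses.dataclass
    class B:
        x: int

    # f both defines the classes and calls the helper, so "B" is found.
    return B, hints_from_caller_locals(A)


B, hints = f()
```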
But using the caller's locals doesn't always work. Consider this case:
```py
import dataclasses

import marshmallow_dataclass


def f() -> None:
    @dataclasses.dataclass
    class A:
        b: "B"

    @dataclasses.dataclass
    class B:
        a: "A"

    def g():
        MySchema = marshmallow_dataclass.class_schema(A)
        print(locals())

    g()


f()
```
This currently doesn't work ("NameError: name 'B' is not defined"). We need the locals from f to resolve the type references, but class_schema uses the locals from g.
We can fix it by explicitly passing the frame of the function whose locals should be used:
```py
import dataclasses
import inspect

import marshmallow_dataclass


def f() -> None:
    @dataclasses.dataclass
    class A:
        b: "B"

    @dataclasses.dataclass
    class B:
        a: "A"

    def g():
        MySchema = marshmallow_dataclass.class_schema(A, clazz_frame=inspect.currentframe().f_back)
        print(locals())

    g()


f()
```

But that’s pretty ugly. Dealing with frames is also fraught with booby traps that can cause large memory leaks.
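The same failure and fix can be reproduced without marshmallow_dataclass. In this sketch the hypothetical `resolve` helper stands in for `class_schema(..., clazz_frame=...)`; the `try`/`finally` that deletes the local frame reference follows the `inspect` module documentation's advice for breaking reference cycles.

```py
import dataclasses
import inspect
import typing


def resolve(cls, frame):
    """Hypothetical stand-in for class_schema(cls, clazz_frame=frame):
    resolve cls's type hints using the given frame's locals."""
    return typing.get_type_hints(cls, localns=dict(frame.f_locals))


def f():
    @dataclasses.dataclass
    class A:
        b: "B"

    @dataclasses.dataclass
    class B:
        a: "A"

    def g():
        frame = inspect.currentframe()
        try:
            try:
                resolve(A, frame)  # g's own locals hold only A (a free variable) and frame
            except NameError:
                own_locals_failed = True
            else:
                own_locals_failed = False
            # f's locals contain both A and B.
            return own_locals_failed, resolve(A, frame.f_back)
        finally:
            del frame  # drop the frame reference promptly

    failed, hints = g()
    return B, failed, hints


B, failed, hints = f()
```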
Here's another interesting case, where using the caller’s locals to resolve type references is worse than not assuming any locals at all:
```py
import dataclasses

import marshmallow_dataclass


@dataclasses.dataclass
class A:
    b: "B"


@dataclasses.dataclass
class B:
    x: int


def f() -> None:
    @dataclasses.dataclass
    class B:
        y: str

    MySchema = marshmallow_dataclass.class_schema(A)


f()
```

This produces the wrong schema: the reference to "B" is resolved to f’s local class (`f.<locals>.B`) when it should be resolved to the module-level `B`.
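A standard-library sketch of this mis-resolution, with `typing.get_type_hints` standing in for `class_schema`: passing `f`'s locals makes `"B"` resolve to the local class, while omitting `localns` falls back to the module globals of `A`, which is the intended result here.

```py
import dataclasses
import typing


@dataclasses.dataclass
class A:
    b: "B"


@dataclasses.dataclass
class B:
    x: int


ModuleB = B  # keep a handle on the module-level class


def f():
    @dataclasses.dataclass
    class B:
        y: str

    # Resolving with f's locals (the caller's-locals default) picks f's B...
    wrong = typing.get_type_hints(A, localns=dict(locals()))
    # ...while resolving without localns uses A's module globals, as intended.
    right = typing.get_type_hints(A)
    return B, wrong, right


LocalB, wrong, right = f()
```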
But Wait, It Gets Worse!
The `marshmallow_dataclass.dataclass` decorator stores the computed schema in the `.Schema` attribute of the decorated class.
Except that there is a catch when a forward reference is involved:

```py
import marshmallow_dataclass


@marshmallow_dataclass.dataclass
class A:
    b: "B"

# ... definition of class B follows
```

At the time class A is decorated, class B has not yet been defined. No matter what locals we use, it is, in general, not possible to resolve the forward reference at class-decoration time.
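A small sketch of why decoration time is too early (the `eager` decorator is hypothetical): resolving hints inside the decorator raises `NameError` because `B` does not exist yet, whereas the same forward reference on another class resolves fine once `B` has been defined.

```py
import dataclasses
import typing


def eager(cls):
    """Hypothetical decorator that resolves hints immediately, at decoration time."""
    cls.resolved_hints = typing.get_type_hints(cls)
    return cls


try:
    @eager
    @dataclasses.dataclass
    class A:
        b: "B"
    eager_ok = True
except NameError:
    eager_ok = False  # B is not defined yet when A is decorated


@dataclasses.dataclass
class C:
    b: "B"  # same forward reference, but nothing resolves it yet


@dataclasses.dataclass
class B:
    x: int


# Resolving later, after B exists, succeeds.
late_hints = typing.get_type_hints(C)
```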
So, currently, we store the caller’s frame in an attribute descriptor, which defers calling `class_schema` (and thus computing the schema) until the `.Schema` attribute is first accessed.
This comes with the same problem discussed above — how to pick the correct frame for a local namespace — but, worse still, it also means we have to hold a reference to a stack frame for potentially quite a while.
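A minimal sketch of the deferred approach, assuming a hypothetical `LazyHints` descriptor and `lazy_dataclass` decorator (marshmallow_dataclass's real descriptor computes a schema rather than type hints, and may differ in detail). The descriptor keeps the decorating frame until first access, re-reads that frame's locals at that point, and then drops the frame.

```py
import dataclasses
import inspect
import typing


class LazyHints:
    """Hypothetical descriptor: resolves the owner class's type hints on first
    access, using the locals of the frame that applied the decorator."""

    def __init__(self, frame):
        self._frame = frame  # pins this frame and its f_back chain until used
        self._hints = None

    def __get__(self, instance, owner):
        if self._hints is None:
            # Read the locals *now*, so names defined after decoration are seen.
            localns = dict(self._frame.f_locals)
            self._hints = typing.get_type_hints(owner, localns=localns)
            self._frame = None  # release the frame once it is no longer needed
        return self._hints


def lazy_dataclass(cls):
    """Hypothetical decorator loosely modelled on marshmallow_dataclass.dataclass."""
    cls = dataclasses.dataclass(cls)
    frame = inspect.currentframe()
    try:
        cls.Hints = LazyHints(frame.f_back)  # the frame where cls was decorated
    finally:
        del frame
    return cls


def f():
    @lazy_dataclass
    class A:
        b: "B"  # B does not exist yet at decoration time

    @dataclasses.dataclass
    class B:
        x: int

    # First access happens after B is defined, so resolution succeeds.
    return A.Hints, B


hints, B = f()
```

Until the first access, `self._frame` keeps the whole frame alive, and with it every parent frame reachable through `f_back`; that is exactly the cost described above.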
So, we'd feel much better if, instead of holding onto the whole frame (which, through `f_back`, also keeps all the parent frames alive), we could just hold a reference to the frame's `f_locals` dict.
It turns out this doesn't work. The reason has to do with "fast locals", a CPython implementation detail. Not all of a frame’s locals (perhaps none of them) are actually stored in the `f_locals` dict. Instead, they are stored on the frame in some other, presumably more optimized, way, as "fast locals". When one accesses the locals dict via `frame.f_locals` (or by calling the builtin `locals()`), any fast locals in scope are copied into the local namespace dict before it is returned, so `f_locals` gives a complete view of the local namespace as of the moment it was accessed. Fast locals that are created or modified afterwards, however, do not automatically make it into the dict; they are only copied over the next time `f_locals` is accessed (or `locals()` is called).
```py
def f():
    localns = locals()
    x = 2
    print("x" in localns)
    locals()
    print("x" in localns)


f()
```

For me, this prints

```
False
True
```

both under CPython and PyPy.
So, it appears that if we want access to any locals which may be defined in the future, we have to hold a reference to the frame, not just the locals dict.
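Note that Python 3.13 (PEP 667) changed these semantics: inside a function, `locals()` now returns an independent snapshot on every call and `frame.f_locals` returns a write-through proxy, so on CPython 3.13 the `locals()` example above prints `False` twice. The sketch below contrasts a dict copied from `f_locals` early with reading through the frame later; it behaves the same way before and after that change.

```py
import inspect


def f():
    frame = inspect.currentframe()
    try:
        early = dict(frame.f_locals)  # copy of the locals as they are right now
        x = 2
        late = dict(frame.f_locals)   # read again through the frame: now includes x
    finally:
        del frame  # don't leave the frame referencing itself through its locals
    return "x" in early, "x" in late


early_has_x, late_has_x = f()
```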