Jeff Dairiki edited this page Sep 26, 2022 · 18 revisions

Resolving Type References in marshmallow_dataclass

Forward type references are used to refer to types that have not yet been defined. They are implemented as strings which must be evaluated later to compute the actual type declaration.
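To make this concrete, here is a minimal sketch (the class name `Node` is made up for illustration) showing that a forward reference is stored as a plain string until something evaluates it:

```python
import typing


class Node:
    # "Node" is not yet bound while the class body executes,
    # so the annotation has to be written as a string.
    next: "Node"


# The raw annotation is still just the string.
raw = Node.__annotations__["next"]

# typing.get_type_hints evaluates the string to the actual class.
resolved = typing.get_type_hints(Node, globalns=globals())["next"]
```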

Whenever a string is evaluated with eval, the evaluation happens in the context of a specific pair of global and local namespaces.
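As a small illustration of why the namespaces matter, the sketch below evaluates the same string "B" against different namespace pairs. The class names and the dicts are invented for the example; names in the local namespace take precedence over names in the global one.

```python
class ModuleLevel:
    """Stand-in for a class defined at module level."""


class FunctionLevel:
    """Stand-in for a class defined inside a function."""


global_ns = {"B": ModuleLevel}
local_ns = {"B": FunctionLevel}

# The local namespace is searched first.
from_local = eval("B", global_ns, local_ns)

# With an empty local namespace, the global one is used.
from_global = eval("B", global_ns, {})

# If neither namespace knows the name, evaluation fails.
try:
    eval("B", {}, {})
    missing = None
except NameError as exc:
    missing = str(exc)
```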

When computing a schema for a dataclass, marshmallow_dataclass needs access to the final resolved type hints for the dataclass's attributes. In principle this is easy: we call typing.get_type_hints to compute those type hints. The only tricky part is that, for type references to be resolved properly, we often need to give get_type_hints the correct local namespace to use when resolving them.
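The effect of the local namespace can be seen with typing.get_type_hints alone, without involving marshmallow_dataclass. This is only a sketch of the underlying mechanism, not of what class_schema does internally:

```python
import dataclasses
import typing


def f():
    @dataclasses.dataclass
    class A:
        b: "B"

    @dataclasses.dataclass
    class B:
        x: int

    # Without a local namespace, "B" is looked up only in the module
    # globals (and A's own class namespace), so it cannot be found.
    try:
        typing.get_type_hints(A)
        failed_without_locals = False
    except NameError:
        failed_without_locals = True

    # Passing f's locals lets the forward reference resolve.
    hints = typing.get_type_hints(A, localns=locals())
    return failed_without_locals, hints["b"] is B


result = f()
```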

Currently, our class_schema defaults to using the caller's locals. Often that's the right thing. For example, it allows the attribute types to be resolved correctly in:

def f():
    @dataclasses.dataclass
    class A:
        b: "B"

    @dataclasses.dataclass
    class B:
        x: int

    MySchema = marshmallow_dataclass.class_schema(A)

But using the caller's locals is just a guess — it doesn't always work. Consider this case:

def f() -> None:
    @dataclasses.dataclass
    class A:
        b: "B"

    @dataclasses.dataclass
    class B:
        a: "A"

    def g():
        MySchema = marshmallow_dataclass.class_schema(A)
        print(locals())
    g()

This currently doesn't work ("NameError: name 'B' is not defined"). We need the locals from f to resolve the type references, but class_schema uses the locals from g.
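The following sketch reproduces the same failure using typing.get_type_hints directly, as a stand-in for what happens when g's locals are used. Because g mentions A, A appears in g's locals as a free variable, but B does not:

```python
import dataclasses
import typing


def f():
    @dataclasses.dataclass
    class A:
        b: "B"

    @dataclasses.dataclass
    class B:
        a: "A"

    def g():
        # g refers to A, so A shows up in g's locals as a free variable;
        # B is never mentioned in g, so it does not.
        g_locals = dict(locals())
        try:
            typing.get_type_hints(A, localns=g_locals)
            error = None
        except NameError as exc:
            error = str(exc)
        return sorted(g_locals), error

    return g()


visible_names, error = f()
```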

We can fix it by explicitly passing the frame of the function whose locals should be used:

def f() -> None:
    @dataclasses.dataclass
    class A:
        b: "B"

    @dataclasses.dataclass
    class B:
        a: "A"

    def g():
        MySchema = marshmallow_dataclass.class_schema(A, clazz_frame=inspect.currentframe().f_back)
        print(locals())
    g()

but that’s pretty ugly. And dealing with frames is fraught with booby traps waiting to cause large memory leaks.
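One of those traps can be sketched as follows: as long as a frame object is referenced, everything in its locals stays alive, even after the function has returned. The names `Payload` and `capture_frame` are invented for the example, and the behaviour shown is what one would expect on CPython.

```python
import gc
import inspect
import weakref


class Payload:
    """Stand-in for some large object that lives in a local variable."""


def capture_frame():
    payload = Payload()
    # Returning the frame keeps it, and every local in it, alive
    # after the function has returned.
    return inspect.currentframe(), weakref.ref(payload)


frame, payload_ref = capture_frame()
alive_while_frame_held = payload_ref() is not None

# Dropping the last reference to the frame releases its locals.
del frame
gc.collect()
alive_after_release = payload_ref() is not None
```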

Here's another interesting case, where using the caller’s locals to resolve type references is worse than not using any locals at all:

@dataclasses.dataclass
class A:
    b: "B"

@dataclasses.dataclass
class B:
    x: int

def f() -> None:
    @dataclasses.dataclass
    class B:
        y: str

    MySchema = marshmallow_dataclass.class_schema(A)

This produces the wrong schema. The reference to "B" gets resolved to f.<locals>.B, when it should be resolved to the module-level B.
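The shadowing can be demonstrated with typing.get_type_hints on its own. This is a sketch of the name-resolution behaviour only; it does not call class_schema:

```python
import dataclasses
import typing


@dataclasses.dataclass
class A:
    b: "B"


@dataclasses.dataclass
class B:
    x: int


def f():
    @dataclasses.dataclass
    class B:
        y: str

    # Resolving with f's locals lets the local B shadow the global one.
    with_locals = typing.get_type_hints(
        A, globalns=globals(), localns=locals()
    )["b"]
    # Resolving against the module globals alone finds the intended class.
    without_locals = typing.get_type_hints(A, globalns=globals())["b"]
    return with_locals, without_locals, B


with_locals, without_locals, local_B = f()
```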

But Wait, It Gets Worse!

The marshmallow_dataclass.dataclass decorator sticks the computed schema into the .Schema attribute of the decorated class.

Except that in a case with a forward reference

@marshmallow_dataclass.dataclass
class A:
    b: "B"

#... definition of class B follows

at the time class A is decorated, class B has not yet been defined. No matter what locals we use, it is in general not possible to resolve the forward reference at class-decoration time.

So, currently, we store the caller’s frame in an attribute descriptor which waits until the .Schema attribute is first accessed to call class_schema and compute the schema.
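The general shape of such a lazy descriptor might look like the sketch below. It is not the actual marshmallow_dataclass implementation: `LazyHints` and `with_lazy_hints` are hypothetical names, and the sketch resolves type hints rather than building a schema.

```python
import inspect
import typing


class LazyHints:
    """Hypothetical descriptor: resolves type hints on first access."""

    def __init__(self, name, frame):
        self.name = name
        # Holding the frame keeps its locals (and parent frames) alive.
        self.frame = frame

    def __get__(self, instance, owner):
        # By now, classes defined after the decorated one exist in the
        # frame's locals, so forward references can be resolved.
        hints = typing.get_type_hints(owner, localns=dict(self.frame.f_locals))
        # Cache the result on the class; this replaces the descriptor
        # and so drops the reference to the frame.
        setattr(owner, self.name, hints)
        return hints


def with_lazy_hints(cls):
    # Remember the frame of whoever applied the decorator.
    cls.hints = LazyHints("hints", inspect.currentframe().f_back)
    return cls


def f():
    @with_lazy_hints
    class A:
        b: "B"

    class B:
        x: int

    # First access happens only after B has been defined.
    return A.hints, B


hints, B_class = f()
```

Note that the frame stays referenced until the attribute is first accessed; if it never is, the frame is held for as long as the class exists.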

This comes with the same problem discussed above (how to pick the correct frame for a local namespace), and it also means we have to hold a reference to a stack frame for potentially quite a while.

Why Can't We Just Hold a Reference to the Locals?

So, we'd feel much better if, instead of holding on to the whole frame (which includes all the parent frames), we could just hold a reference to the frame's f_locals dict.

It turns out this doesn't work. It has something to do with "fast locals" (a CPython implementation detail). Not all of a frame's locals (perhaps none of them) are actually stored in the f_locals dict. Instead they are stored on the frame in some other, optimized way. When one accesses the locals dict via frame.f_locals (or by calling builtins.locals()), any fast locals in scope are copied to the local namespace dict before it is returned. Any fast locals which are created or modified after the f_locals attribute was accessed do not automatically make it into the locals dict until the next time f_locals is accessed (or locals() is called).

def f():
    localns = locals()
    x = 2
    print("x" in localns)
    locals()
    print("x" in localns)

f()

For me, this prints

False
True

both under CPython and PyPy.

So, it appears that if we want access to any locals which may be defined in the future, we have to hold a reference to the frame, not just the locals dict.
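A sketch of that conclusion: a copy of the locals taken early misses later definitions, while going back to the frame picks them up. The exact behaviour of locals() is an implementation detail and may differ between interpreter versions, so this example relies only on copying frame.f_locals and re-reading it later.

```python
import inspect


def f():
    frame = inspect.currentframe()
    # Copy of the locals taken before x exists.
    early_snapshot = dict(frame.f_locals)
    x = 2
    # Going back to the frame picks up locals defined in the meantime.
    return "x" in early_snapshot, "x" in frame.f_locals


result = f()
```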
