Description
We recently noticed that a heavy JMESPath workload was triggering a large number of garbage collection runs. We are using jmespath.compile(), and we tracked this down to the jmespath.visitor.TreeInterpreter that is created on every call to ParsedResult.search():
jmespath.py/jmespath/parser.py
Line 508 in bbe7300
    interpreter = visitor.TreeInterpreter(options)
It appears that TreeInterpreter creates a reference cycle, which leads to the GC being triggered frequently to clean up the cycles. As far as I can tell, the problem comes from the Visitor._method_cache:
jmespath.py/jmespath/visitor.py
Lines 91 to 93 in bbe7300
    method = getattr(
        self, 'visit_%s' % node['type'], self.default_visit)
    self._method_cache[node_type] = method
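...which stores references to methods that are bound to self in an attribute of self. Once an interpreter has visited a node, it is part of a cycle (self -> _method_cache -> bound method -> self) that reference counting alone cannot free.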
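To illustrate the mechanism in isolation, here is a minimal sketch that does not use jmespath at all. `CachingVisitor` and `survives_without_gc` are hypothetical stand-ins written for this example; they only mimic the caching pattern quoted above. The check relies on CPython's reference counting.

```python
import gc
import weakref


class CachingVisitor:
    """Hypothetical stand-in that caches bound methods on itself."""

    def __init__(self):
        self._method_cache = {}

    def visit(self, node):
        node_type = node["type"]
        method = self._method_cache.get(node_type)
        if method is None:
            method = getattr(self, "visit_%s" % node_type, self.default_visit)
            # The bound method references self, and self references the cache:
            # self -> _method_cache -> method -> self.
            self._method_cache[node_type] = method
        return method(node)

    def visit_field(self, node):
        return node["value"]

    def default_visit(self, node):
        raise NotImplementedError(node["type"])


def survives_without_gc(use_cache):
    """Return True if a visitor outlives its last external reference."""
    was_enabled = gc.isenabled()
    gc.disable()  # rule out an automatic collection during the check
    try:
        visitor = CachingVisitor()
        if use_cache:
            visitor.visit({"type": "field", "value": "bar"})
        ref = weakref.ref(visitor)
        del visitor
        return ref() is not None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # clean up whatever cycle was left behind
```

On CPython, a visitor that never populated its cache is freed as soon as the last reference goes away, whereas one that has visited a node stays alive until the cycle collector runs.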
Possible solution
We worked around the problem by monkey patching ParsedResult so that it (1) caches a default_interpreter for use when options=None, and (2) uses it in search(). If I understand correctly, we could go further and use a global TreeInterpreter for all ParsedResult instances. The TreeInterpreter seems to be stateless apart from self._method_cache and that implementation seems to be thread-safe (with only the risk of multiple lookups for the same method in a multithreaded case).
I'd be happy to contribute a PR for either version if this would be welcome.
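For discussion, the idea behind the workaround can be sketched roughly as follows. This is not jmespath's code: `Interpreter`, `Compiled`, `search_fresh`, `search_cached` and `garbage_after` are hypothetical names that model the pattern with stdlib-only stand-ins, assuming the interpreter keeps no per-search state besides its method cache.

```python
import gc


class Interpreter:
    """Hypothetical stand-in for an interpreter with a self-referencing cache."""

    def __init__(self, options=None):
        self.options = options
        self._method_cache = {}

    def visit(self, node, value):
        method = self._method_cache.get(node["type"])
        if method is None:
            method = getattr(self, "visit_%s" % node["type"])
            self._method_cache[node["type"]] = method  # creates the cycle
        return method(node, value)

    def visit_field(self, node, value):
        return value.get(node["name"])


class Compiled:
    """Hypothetical stand-in for a compiled expression."""

    _default_interpreter = None  # shared by all instances, created lazily

    def __init__(self, parsed):
        self.parsed = parsed

    def search_fresh(self, value, options=None):
        # Behaviour described in this issue: a new interpreter per call.
        return Interpreter(options).visit(self.parsed, value)

    def search_cached(self, value, options=None):
        # Workaround: reuse one interpreter whenever no options are given.
        if options is not None:
            return Interpreter(options).visit(self.parsed, value)
        cls = type(self)
        if cls._default_interpreter is None:
            cls._default_interpreter = Interpreter()
        return cls._default_interpreter.visit(self.parsed, value)


def garbage_after(search, calls=1000):
    """Count unreachable objects found by the collector after `calls` searches."""
    gc.collect()  # start from a clean slate
    gc.disable()
    try:
        for _ in range(calls):
            search({"foo": "bar"})
    finally:
        gc.enable()
    return gc.collect()
```

With `compiled = Compiled({"type": "field", "name": "foo"})`, `garbage_after(compiled.search_fresh)` should report at least one unreachable object per call, while `garbage_after(compiled.search_cached)` should report far fewer, since the shared interpreter stays reachable through the class. Two threads racing on the lazy initialisation would at worst create an extra interpreter, which matches the thread-safety reasoning above.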
How to reproduce
The following reproducer shows the problem:
    import jmespath
    import gc

    # Print every object that the cycle collector finds and frees.
    gc.set_debug(gc.DEBUG_COLLECTABLE)

    pattern = jmespath.compile("foo")
    value = {"foo": "bar"}
    for _ in range(1000000):
        pattern.search(value)

...where the output contains one million repetitions of something like:
gc: collectable <TreeInterpreter 0x10f634fa0>
gc: collectable <dict 0x10f63e780>
gc: collectable <Options 0x10f634520>
gc: collectable <Functions 0x10f6345b0>
gc: collectable <method 0x10f63ee80>
gc: collectable <dict 0x10f63eb00>