Hi Team,
I use the crate-python driver in a CDC process that writes data to CrateDB.
While benchmarking and optimizing the process, I recorded CPU flamegraphs, which show that a lot of CPU time is spent in JSON serialization:
    return json.dumps(data, cls=CrateJsonEncoder)
Publicly available benchmarks report that the standard library's json.dumps is considerably slower than alternatives such as ujson or orjson, so changing the library can make a large difference. I can confirm that swapping json for ujson reduces the time spent on serialization by around 30-40%.
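For anyone who wants to reproduce a comparison like this, here is a minimal timing sketch. The payload shape (`make_payload`) and the helper `time_serializer` are hypothetical and only meant to resemble a bulk insert; they are not taken from the driver or from my CDC data. ujson is treated as optional, so the script still runs with only the standard library installed.

```python
import json
import timeit

try:
    import ujson  # third-party and optional; skipped if not installed
except ImportError:
    ujson = None


def make_payload(rows=1000):
    """Build a hypothetical bulk-insert style payload (not real CDC data)."""
    return {
        "stmt": "INSERT INTO t (id, name, value) VALUES (?, ?, ?)",
        "bulk_args": [[i, f"name-{i}", i * 0.5] for i in range(rows)],
    }


def time_serializer(dumps, payload, repeat=5, number=20):
    """Return the best average time in seconds for one dumps(payload) call."""
    timings = timeit.repeat(lambda: dumps(payload), repeat=repeat, number=number)
    return min(timings) / number


payload = make_payload()
results = {"json": time_serializer(json.dumps, payload)}
if ujson is not None:
    results["ujson"] = time_serializer(ujson.dumps, payload)

for name, seconds in results.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```

The numbers depend heavily on the payload and the machine, so this is only useful for relative comparisons on the same data.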
I haven't checked how this impacts correctness, since the current implementation relies on json.dumps(cls=CrateJsonEncoder) to provide custom transformation logic.
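One way the correctness question could be checked is to serialize the same data with both approaches and compare the decoded results. The sketch below is an assumption-laden illustration: `ExampleEncoder` is a hypothetical stand-in, not the actual CrateJsonEncoder, and the conversions it performs (Decimal to string, datetime to epoch milliseconds) are chosen only for demonstration. The candidate here still uses the standard library with a plain `default=` hook; the idea would be to replace `candidate_dumps` with the faster library's call, after confirming how that library handles custom types.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal


class ExampleEncoder(json.JSONEncoder):
    """Hypothetical stand-in for a custom encoder; not CrateJsonEncoder itself."""

    def default(self, o):
        if isinstance(o, Decimal):
            return str(o)
        if isinstance(o, datetime):
            return int(o.timestamp() * 1000)  # epoch milliseconds
        return super().default(o)


def example_default(o):
    """The same conversions written as a plain `default=` hook."""
    if isinstance(o, Decimal):
        return str(o)
    if isinstance(o, datetime):
        return int(o.timestamp() * 1000)
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")


def reference_dumps(data):
    return json.dumps(data, cls=ExampleEncoder)


def candidate_dumps(data):
    # Replace this with the faster library's dumps call when testing it.
    return json.dumps(data, default=example_default)


def same_output(data):
    """Compare decoded results so whitespace and key order are ignored."""
    return json.loads(reference_dumps(data)) == json.loads(candidate_dumps(data))


sample = {
    "id": 1,
    "price": Decimal("9.99"),
    "ts": datetime(2024, 1, 1, tzinfo=timezone.utc),
}
print(same_output(sample))
```

Because json.loads accepts both str and bytes, the same check should work for a serializer that returns bytes. A library that serializes some of these types natively would never call the hook for them, and a comparison like this is what would surface that difference.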
Cheers,
Gabriel