
124C41p (Contributor) commented Aug 6, 2023

Fixes #515

Since (de-)serialization is implemented purely in Python, it is quite slow compared to native implementations. I try to work around this by not deserializing repeated scalar fields immediately; instead, their byte representation is wrapped in a ScalarArray[T] object. This class behaves like a list: you can call len(a), a[i], and list(a) on any ScalarArray a, and only at that point is the data actually deserialized (which is still very slow for big arrays).
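To illustrate the lazy-decoding idea, here is a minimal sketch of a read-only, list-like wrapper over the packed bytes of a fixed-width repeated field. It is not the PR's ScalarArray: the class name `LazyFixedArray` and its `fmt` parameter are hypothetical, and the sketch assumes a packed payload (tag and length prefix already stripped) of little-endian values such as protobuf `double` (`"<d"`). Varint-encoded types like `int32` are not fixed-width and would not fit this scheme.

```python
import struct
from collections.abc import Sequence


class LazyFixedArray(Sequence):
    """Hypothetical sketch: a read-only, list-like view over a packed payload
    of fixed-width little-endian scalars. Nothing is decoded until an element
    is accessed."""

    def __init__(self, payload: bytes, fmt: str = "<d"):
        self._payload = payload
        self._struct = struct.Struct(fmt)  # e.g. "<d" = little-endian double
        if len(payload) % self._struct.size:
            raise ValueError("payload length is not a multiple of the item size")

    def __len__(self) -> int:
        # The length is known from the byte count alone; no decoding needed.
        return len(self._payload) // self._struct.size

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        # Decode exactly one element, on demand.
        return self._struct.unpack_from(self._payload, index * self._struct.size)[0]


payload = struct.pack("<3d", 1.0, 2.5, -4.0)  # three packed doubles
arr = LazyFixedArray(payload)
print(len(arr), arr[1], list(arr))  # 3 2.5 [1.0, 2.5, -4.0]
```

Because `collections.abc.Sequence` derives `__iter__`, `__contains__`, `index` and `count` from `__len__` and `__getitem__`, only those two methods need to be written. Each access still decodes in Python, so iterating over a large array remains slow, as noted above.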

On the other hand, when using numpy, you can also call np.asarray(a) on any ScalarArray a to turn it into a numpy array almost instantly, since no per-element decoding in Python is involved. Conversely, any numpy array b can be turned into a ScalarArray by calling ScalarArray.from_numpy(b) and then passed to a betterproto dataclass field (instead of a list) for faster serialization.
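A sketch of how such a NumPy fast path can work, assuming a packed repeated `double` field: NumPy's `__array__` protocol lets `np.asarray()` obtain an array from any object that defines it, and `np.frombuffer` reinterprets raw bytes without a per-element Python loop. The class name `NumpyBackedArray` and its `to_bytes` method are hypothetical; only the use of `np.asarray(a)` and a `from_numpy` constructor comes from the description above.

```python
import struct

import numpy as np


class NumpyBackedArray:
    """Hypothetical sketch (not the PR's code): raw bytes of a packed repeated
    `double` field, exposed to NumPy without element-wise decoding."""

    # protobuf `double` values are little-endian IEEE 754 doubles on the wire
    DTYPE = np.dtype("<f8")

    def __init__(self, payload: bytes):
        self._payload = payload

    def __array__(self, dtype=None, copy=None):
        # Reinterpret the bytes as a read-only array view; no Python loop.
        arr = np.frombuffer(self._payload, dtype=self.DTYPE)
        if dtype is not None:
            arr = arr.astype(dtype)  # astype returns a new array
        elif copy:
            arr = arr.copy()
        return arr

    @classmethod
    def from_numpy(cls, values) -> "NumpyBackedArray":
        # Contiguous little-endian doubles serialized with tobytes() are exactly
        # the packed payload of a repeated `double` field.
        return cls(np.ascontiguousarray(values, dtype=cls.DTYPE).tobytes())

    def to_bytes(self) -> bytes:
        return self._payload


b = NumpyBackedArray.from_numpy(np.array([1.0, 2.5, -4.0]))
print(b.to_bytes() == struct.pack("<3d", 1.0, 2.5, -4.0))  # True
print(np.asarray(b))
```

Since `np.frombuffer` over an immutable `bytes` object yields a read-only array, this approach pairs naturally with an immutable container type.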

I tried to keep the change as non-breaking as possible: you can still use lists everywhere you used them before. However, the generated type hints had to change from List[T] to Sequence[T]. Also note that ScalarArray is immutable, so calling .append() or .insert() on a repeated field that holds a ScalarArray (for example, after deserialization) no longer works as before (although it should be possible to make ScalarArray mutable if really needed).
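A minimal illustration of the typing change and of a workaround for immutability, using a hypothetical dataclass rather than betterproto-generated code: a `Sequence[float]` hint admits both a `list` and an immutable list-like object, and when an element must be added, a new list can be built and assigned back. A `tuple` stands in here for an immutable array type, and `with_appended` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Reading:
    # Hypothetical message class. Sequence[float] accepts a list as well as an
    # immutable, list-like container; a List[float] hint would make a static
    # type checker reject the latter.
    values: Sequence[float] = field(default_factory=list)


def with_appended(seq: Sequence[float], item: float) -> List[float]:
    """Return a new list with `item` appended, whatever kind of sequence `seq` is."""
    return [*seq, item]


r = Reading(values=(1.0, 2.0))  # a tuple has no .append() method
r.values = with_appended(r.values, 3.0)
print(r.values)  # [1.0, 2.0, 3.0]
```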

What do you think about this approach?

124C41p marked this pull request as ready for review on August 11, 2023, 09:00.

124C41p and others added 3 commits on August 12, 2023, 14:12:

This increases (de-)serialization speed of repeated scalar fields (of fixed length) drastically in the case they are used as numpy arrays.

unreachable code removed

generic type parameter removed from ScalarArray for compatibility with Python 3.7 and 3.8

code (auto-)reformatted

cetanu self-assigned this on Oct 16, 2023.
cetanu added the labels enhancement (New feature or request) and low priority on Oct 16, 2023.

Gobot1234 (Collaborator) commented:

Superseded by #545

Successfully merging this pull request may close the issue "Improve (de-)serialization performance for scalar arrays".

3 participants