Conversation

@thomasywang
Contributor
Summary:
Currently our pickle is still not truly zero copy, because the Pickler calls `Buffer::write()`, which copies bytes from `PyBytes` to `BytesMut` via `extend_from_slice()`.

To avoid copies, we can make `Buffer` backed by a `Vec<PyBytes>`, with each call to `Buffer::write()` pushing the `PyBytes` onto the `Vec`.
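As a rough sketch of the idea (hypothetical names; plain `Vec<u8>` stands in for `PyBytes`, since the real buffer holds Python-owned byte objects), a buffer that stores chunks instead of copying them might look like:

```rust
// Sketch: a buffer that moves each written chunk into a Vec rather than
// appending its bytes into one contiguous BytesMut. No extend_from_slice,
// so the payload bytes are never copied on the serialization path.
struct Buffer {
    chunks: Vec<Vec<u8>>, // stand-in for Vec<PyBytes>
    len: usize,           // total logical length across all chunks
}

impl Buffer {
    fn new() -> Self {
        Buffer { chunks: Vec::new(), len: 0 }
    }

    // The chunk is moved (pointer-sized transfer), not byte-copied.
    fn write(&mut self, chunk: Vec<u8>) {
        self.len += chunk.len();
        self.chunks.push(chunk);
    }
}

fn main() {
    let mut buf = Buffer::new();
    buf.write(vec![b'x'; 1024]);
    buf.write(vec![b'y'; 512]);
    assert_eq!(buf.len, 1536);
    assert_eq!(buf.chunks.len(), 2);
}
```

The trade-off is that the buffer is now physically fragmented, which is what the follow-up `FragmentedPart` change addresses.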

The following figures show a round trip produced from
```
await am.echo.call(b"x" * 1024 * 1024)
```

Before:
Send path: 600us pickle (purple), write frames (dark green), receive frames (light green), 130us unpickle,
Reply path: 600us pickle, write frames, receive frames, 130us unpickle
{F1983399784}

After:
Send path: 20us pickle (purple), write frames (dark green), receive frames (light green), 200us unpickle,
Reply path: 20us pickle, write frames, receive frames, 150us unpickle
{F1983399596}

Differential Revision: D86696391

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 11, 2025
@meta-codesync

meta-codesync bot commented Nov 11, 2025

@thomasywang has exported this pull request. If you are a Meta employee, you can view the originating Diff in D86696391.

Summary:

This sets us up for the next diff by creating `FragmentedPart`.

Currently our pickle is still not truly zero copy, because the Pickler calls `Buffer::write()`, which copies bytes from `PyBytes` to `BytesMut` via `extend_from_slice()`.

This is especially problematic for large messages (100KB+), as we spend a lot of CPU cycles handling page faults. For a 1MB message, pickling can take as long as 600us.

To avoid copies, we can make `Buffer` backed by a `Vec<PyBytes>`, with each call to `Buffer::write()` pushing the `PyBytes` onto the `Vec`. As a result, the `PyBytes` are physically fragmented despite being logically contiguous.

To make this work, we will introduce an enum called `FragmentedPart`, with a `::Fragmented` variant wrapping `Vec<Part>` and a `::Contiguous` variant wrapping `Part`. Like `Part`, `FragmentedPart` just collects bytes during serialization. When we receive the frame on the other end of the wire, we reconstruct it contiguously into the `FragmentedPart::Contiguous` variant so that we can easily consume it to create a single contiguous `bytes::Bytes`.
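A minimal sketch of that shape (hypothetical method name `into_contiguous`; `Vec<u8>` stands in for the real `Part` payload type):

```rust
// Sketch of the FragmentedPart enum described above.
enum FragmentedPart {
    // Send side: logically contiguous payload, physically split into parts.
    Fragmented(Vec<Vec<u8>>),
    // Receive side: reassembled into a single allocation.
    Contiguous(Vec<u8>),
}

impl FragmentedPart {
    // On receive, collapse the fragments into one contiguous buffer so the
    // payload can be consumed as a single byte slice. This is the one copy
    // that remains, and it happens off the pickling hot path.
    fn into_contiguous(self) -> Vec<u8> {
        match self {
            FragmentedPart::Contiguous(bytes) => bytes,
            FragmentedPart::Fragmented(parts) => {
                let total: usize = parts.iter().map(|p| p.len()).sum();
                let mut out = Vec::with_capacity(total);
                for part in parts {
                    out.extend_from_slice(&part);
                }
                out
            }
        }
    }
}

fn main() {
    let frag = FragmentedPart::Fragmented(vec![b"hel".to_vec(), b"lo".to_vec()]);
    assert_eq!(frag.into_contiguous(), b"hello".to_vec());
}
```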

Differential Revision: D86696390