@notfilippo
Contributor

Which issue does this PR close?

Rationale for this change

Related to #16841. Correctly accounting for the memory used by Arrow buffers in execution nodes is crucial for maximising resource usage while preventing OOMs.

What changes are included in this PR?

  • An implementation of arrow_buffer::MemoryPool for DataFusion's MemoryPool under the arrow_buffer_pool feature-flag
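For readers unfamiliar with the adapter idea, here is a minimal, self-contained sketch of how one pool can be exposed through another pool's interface. It does not use the real `arrow_buffer` or DataFusion types: `EnginePool`, `BufferPool`, `BufferReservation` and `Adapter` are hypothetical stand-ins invented for illustration, and the actual trait signatures in this PR may differ.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the engine-side pool: it only tracks a running total.
#[derive(Default)]
struct EnginePool {
    reserved: AtomicUsize,
}

impl EnginePool {
    fn grow(&self, bytes: usize) {
        self.reserved.fetch_add(bytes, Ordering::Relaxed);
    }

    fn shrink(&self, bytes: usize) {
        self.reserved.fetch_sub(bytes, Ordering::Relaxed);
    }

    fn reserved(&self) -> usize {
        self.reserved.load(Ordering::Relaxed)
    }
}

/// Hypothetical stand-in for the buffer-side trait that the adapter implements.
trait BufferPool {
    fn reserve(&self, bytes: usize) -> BufferReservation;
    fn used(&self) -> usize;
}

/// Returns its bytes to the engine pool when dropped.
struct BufferReservation {
    pool: Arc<EnginePool>,
    bytes: usize,
}

impl Drop for BufferReservation {
    fn drop(&mut self) {
        self.pool.shrink(self.bytes);
    }
}

/// Forwards buffer-side calls to the engine-side pool.
struct Adapter {
    inner: Arc<EnginePool>,
}

impl BufferPool for Adapter {
    fn reserve(&self, bytes: usize) -> BufferReservation {
        self.inner.grow(bytes);
        BufferReservation {
            pool: Arc::clone(&self.inner),
            bytes,
        }
    }

    fn used(&self) -> usize {
        self.inner.reserved()
    }
}
```

In this sketch every buffer allocation is charged to the engine pool through `reserve`, and the charge is released when the reservation is dropped, so the engine pool's total also covers buffer memory.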

Are these changes tested?

Yes!

Are there any user-facing changes?

Introduced new API.

@github-actions github-actions bot added the execution Related to the execution crate label Nov 25, 2025
parquet_encryption = [
"parquet/encryption",
]
arrow_buffer_pool = [
Contributor Author

@notfilippo Nov 25, 2025

The feature-flag name is fairly descriptive. Maybe we could change it to something better. I'm open to suggestions!

Comment on lines +51 to +67
    fn available(&self) -> isize {
        // The pool may be overfilled, so this method might return a negative value.
        (self.capacity() as i128 - self.used() as i128)
            .try_into()
            .unwrap_or(isize::MIN)
    }

    fn used(&self) -> usize {
        self.inner.reserved()
    }

    fn capacity(&self) -> usize {
        match self.inner.memory_limit() {
            MemoryLimit::Infinite | MemoryLimit::Unknown => usize::MAX,
            MemoryLimit::Finite(capacity) => capacity,
        }
    }
Contributor Author

I'm not really sure these methods have a use-case... I'm open to removing them in the upstream trait.
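A note on `available` as posted above: when the limit is `Infinite` or `Unknown`, `capacity()` is `usize::MAX`, so the difference can exceed `isize::MAX`. The conversion then fails and the `isize::MIN` fallback is returned, which reports a very negative value for what is effectively an unbounded pool. The sketch below reproduces that arithmetic with plain integers and shows one possible alternative that saturates in both directions. The free functions `available_as_posted` and `available_saturating` are hypothetical names used only for illustration, not part of the PR.

```rust
use std::convert::TryFrom;

/// Same arithmetic as the posted method, with `capacity` and `used` passed in directly.
fn available_as_posted(capacity: usize, used: usize) -> isize {
    let diff = capacity as i128 - used as i128;
    // Any out-of-range value, positive or negative, falls back to isize::MIN.
    isize::try_from(diff).unwrap_or(isize::MIN)
}

/// One possible alternative: clamp to the isize range in both directions.
fn available_saturating(capacity: usize, used: usize) -> isize {
    let diff = capacity as i128 - used as i128;
    // After clamping, the value is guaranteed to fit in isize, so the cast is lossless.
    diff.clamp(isize::MIN as i128, isize::MAX as i128) as isize
}
```

With `capacity = usize::MAX` and `used = 0`, the first function returns `isize::MIN` while the second returns `isize::MAX`. Both agree for ordinary finite limits, including the overfilled case where the result is negative.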

Development

Successfully merging this pull request may close these issues.

Implement arrow_buffer::MemoryPool for DataFusion's MemoryPool