
Conversation

@eustlb (Contributor) commented Nov 7, 2025

What does this PR do?

Before a deeper cleanup of X-codec (#42039), and to break things down a bit, this PR fixes some inefficient logic in X-codec that was responsible for unnecessary memory spikes.

This makes me think that the whole _get_output_length computation when applying conv1d layers, a pattern repeated throughout audio models, could benefit from some standardization. To that end, I've added a new audio util that I'll propagate in a subsequent PR.
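For reference, such a util boils down to PyTorch's standard Conv1d length formula. A minimal sketch of what it could look like, matching the conv1d_output_length(layer, input_length) call used below and assuming integer padding (the actual helper added in this PR may differ in details):

import torch.nn as nn

def conv1d_output_length(layer: nn.Conv1d, input_length: int) -> int:
    # Standard PyTorch Conv1d formula:
    # L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
    kernel_size, stride = layer.kernel_size[0], layer.stride[0]
    padding, dilation = layer.padding[0], layer.dilation[0]
    return (input_length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1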

@eustlb eustlb requested a review from Cyrilvallez November 7, 2025 16:03
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@eustlb eustlb requested a review from ArthurZucker November 10, 2025 09:48
Comment on lines +399 to +431
@lru_cache
def _get_conv1d_layers(self, module):
    """
    Recursively iterate to fetch all Conv1d layers, in the order they appear in the module tree.
    """

    def get_conv1d_layers_recursive(module: nn.Module):
        conv1d_layers = []

        if isinstance(module, nn.Conv1d):
            conv1d_layers.append(module)

        # Recursively check all child modules
        for child in module.children():
            conv1d_layers.extend(get_conv1d_layers_recursive(child))

        return conv1d_layers

    # Return a tuple so the result is hashable and can be memoized by lru_cache
    return tuple(get_conv1d_layers_recursive(module))

def _get_conv1d_output_lengths(self, input_length, module=None):
    """
    For a given module, compute the output length obtained after applying all of its Conv1d layers.
    """
    if module is None:
        module = self

    conv1d_layers = self._get_conv1d_layers(module)

    # Chain the per-layer length formula through each Conv1d in forward order
    for layer in conv1d_layers:
        input_length = conv1d_output_length(layer, input_length)

    return input_length
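For illustration, a hypothetical call site (the model and encoder names are illustrative, not the actual X-codec attributes):

# Map one second of 24 kHz audio to the number of frames produced
# by the encoder's Conv1d stack, without running a forward pass.
input_length = 24_000
num_frames = model._get_conv1d_output_lengths(input_length, module=model.encoder)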
Member

Is this something we can do earlier, in the config? It would avoid having to do these recursions on modules etc!

eustlb (Contributor, Author)

Once we standardize this computation (see #41203), we should be able to do it from the config, provided we ensure that every convolution layer's parameters are stored there, in the order in which the layers appear in the forward pass. However, I don't think expanding the config with such parameters is a good idea, especially since we decided to hardcode them to avoid exposing them to the user.

I was thinking more along the lines of handling this directly during model initialization, when the modules are already being iterated over. For now, is it okay to merge this?
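As a rough sketch of that init-time idea (hypothetical, not what this PR ships):

import torch.nn as nn
from transformers import PreTrainedModel

class XcodecModel(PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        ...  # build encoder/decoder submodules as usual
        # Cache the Conv1d layers once, while modules are being set up.
        # Assumes registration order matches the forward-pass order.
        self._conv1d_layers = tuple(m for m in self.modules() if isinstance(m, nn.Conv1d))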

@Cyrilvallez (Member) left a comment

Alright, yes it's fine to merge it like that to unblock, since I see you're already working on a refined version! 🤗
CI not happy though it seems!

@github-actions

[For maintainers] Suggested jobs to run (before merge)

run-slow: xcodec

@eustlb eustlb enabled auto-merge (squash) November 25, 2025 14:30
@eustlb eustlb merged commit f13b100 into huggingface:main Nov 25, 2025
23 checks passed