Skip to content

Conversation

@mollyxu
Copy link
Contributor

@mollyxu mollyxu commented Nov 4, 2025

Consolidate Metadata Fallback Logic in C++

Related issue: #1009

Our metadata fallback logic was previously scattered across both C++ and Python layers, making it confusing and potentially led to circular dependencies.

In C++, we had seek-mode-aware logic:

double SingleStreamDecoder::getMinSeconds(
const StreamMetadata& streamMetadata) {
switch (seekMode_) {
case SeekMode::custom_frame_mappings:
case SeekMode::exact:
return streamMetadata.beginStreamPtsSecondsFromContent.value();
case SeekMode::approximate:
return 0;
default:
TORCH_CHECK(false, "Unknown SeekMode");
}
}

In Python, we had fallback logic without seek mode awareness:

@property
def begin_stream_seconds(self) -> float:
"""Beginning of the stream, in seconds (float). Conceptually, this
corresponds to the first frame's :term:`pts`. If
``begin_stream_seconds_from_content`` is not None, then it is returned.
Otherwise, this value is 0.
"""
if self.begin_stream_seconds_from_content is None:
return 0
else:
return self.begin_stream_seconds_from_content

This separation meant:

  • Python code couldn't make seek-mode-aware decisions
  • Fallback logic was duplicated and could diverge

Changes Made

1. Added Computed Methods to C++ StreamMetadata

Created seek-mode-aware methods in **Metadata.cpp**:

  • getDurationSeconds(SeekMode)
  • getBeginStreamSeconds(SeekMode)
  • getEndStreamSeconds(SeekMode)
  • getNumFrames(SeekMode)
  • getAverageFps(SeekMode)

These methods encapsulate all fallback logic with proper seek mode handling.

2. Moved SeekMode Enum

Moved SeekMode from SingleStreamDecoder class to Metadata.h to make it accessible to StreamMetadata methods.

3. Updated C++ Custom Ops Layer

Modified custom_ops.cpp to call the new computed methods and pass computed values to Python via JSON serialization

4. Simplified Python Metadata Classes

Updated _metadata.py

  • Removed all @property methods with fallback logic
  • Added simple fields to receive computed values from C++

5. Updated Tests

Removed Python tests test_metadata.py

  • test_num_frames_fallback - fallback logic now in C++
  • test_calculate_num_frames_using_fps_and_duration - calculated in C++
  • test_duration_seconds_fallback - fallback logic now in C++
  • test_calculate_duration_seconds_using_fps_and_num_frames - calculated in C++

Added C++ tests in MetadataTest.cpp

  • NumFramesFallbackPriority
  • CalculateNumFramesUsingFpsAndDuration
  • DurationSecondsFallback
  • CalculateDurationSecondsUsingFpsAndNumFrames

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 4, 2025
@mollyxu mollyxu marked this pull request as ready for review November 5, 2025 16:22
return endStreamPtsSecondsFromContent.value() -
beginStreamPtsSecondsFromContent.value();
}
return std::nullopt;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the benefits of using the seek mode is that we can know that a scan should have been performed, and the metadata from the scan should definitely be there. I think we should instead make this:

  case SeekMode:exact:
    return endStreamPtsSecondsFromContent.value() -
            beginStreamPtsSecondsFromContent.value();

That is, we don't check if the value is present, and we don't return nullopt. The unconditional call to .value() will throw an exception if we somehow get into a situation where we're in exact seek mode and there is no value. But, that is what we want. Such a situation should never happen, and we want an exception thrown.

Copy link
Contributor Author

@mollyxu mollyxu Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you recommend modifying the logic such that exact mode is accessed directly but we keep the checks for custom_frame_mappings? Previously the Python logic didn't have knowledge of seek mode so custom_frame_mappings was oftentimes grouped with exact because scans are performed.

In custom_frame_mappings a scan only happens to specific streams upon add_video_stream() is called (ie. test_get_metadata in test_metadata.py). Removing the checks would lead to cases in which not all streams in the container would have scanned metadata.

Or should custom_frame_mappings that's missing the scanned information behave the same as approximate mode?

Copy link
Contributor

@Dan-Flores Dan-Flores Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only allow one stream to be added and scanned via add_video_stream, and I believe (but do correct me if I am mistaken!) when we access any stream metadata, its from the stream currently stored as activeStreamIndex_. This stream index is updated by the custom_frame_mappings equivalent "scan" function.

Removing the checks would lead to cases in which not all streams in the container would have scanned metadata.

Is it possible to access the stream metadata of a stream other than activeStreamIndex_? If not, it should be safe to assume that the metadata will be present, and treat it as exact mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we access any stream metadata, its from the stream currently stored as activeStreamIndex_

I think in get_container_metadata.py in _metadata.py we call get_stream_json_metadata for each stream in the container. It is at this step that we are triggering the logic in Metadata.cpp for custom_frame_mappings that doesn't exist for streams that are at the activeStreamIndex_

for stream_index in range(container_dict["numStreams"]):
stream_dict = json.loads(_get_stream_json_metadata(decoder, stream_index))

The error seems to come not from accessing a non-active stream but rather from how we store metadata.

@scotts
Copy link
Contributor

scotts commented Nov 5, 2025

@mollyxu, this is great! I think we can further refine the implementations of the fallback logic, but this is definitely the way. Thank you for working on this! :)

@mollyxu
Copy link
Contributor Author

mollyxu commented Nov 6, 2025

Thanks for the review!

To summarize, from my understanding, we currently store the metadata of all streams in a video (including streams that we are not decoding on).

  • The refactoring did not affect exact mode because we did a scan on the entire container and have the content metadata for all streams.
  • This was also not a problem in approximate mode because we are using fallback logic with checks.
  • However, with the way we are currently handling custom_frame_mappings, it would only have stream metadata for the stream that we are decoding on. This would lead to bad optional access on the other streams.

In my latest commit, I tried to expose the activeStreamIndex in SingleStreamDecoder so that in custom_ops.py we can handle the stream that we are decoding on differently from the other streams for seekmode = custom_frame_mappings.

Another potential approach that we could take is not modifying custom_ops.py but handling the case of custom_frame_mapings in a similar way as our approximate mode, falling back to default values after not finding values. This approach might overlook some underlying issues with how custom_frame_mapings should behave.

I would love to hear everyone's thoughts on these approaches and the general state of our metadata retrieval!

@scotts
Copy link
Contributor

scotts commented Nov 6, 2025

@mollyxu, I actually think this approach might be the best way to do it.

Looking more, we expose get_container_metadata(), and I think that's useful. It's a convenient way for people to inspect all metadata we can retrieve, both at the container level and all streams. And it's in there that we're hitting problems, because when seek_index==custom_frame_mappings, we don't have certain metadata for our non-active streams. During decoding, this doesn't matter because we are only concerned with the active stream. It's only when we're getting metadata from all streams we hit this.

The alternatives that I can think of:

  1. Change get_container_metadata() so that it only returns stream metadata from the active index, if there is one. That's actually a breaking change, so I'd rather not do it for that reason alone. But it also removes some useful functionality.
  2. Implement different member functions in StreamMetadata for decoding versus metadata retrieval. That goes against some of the point of this refactor, which is to do this logic in one place.
  3. Make the StreamMetadata member functions aware of the active stream, and behave differently for each case. This is basically the same thing as option 2, except we're putting all of the logic in one function. In option 2, the caller would be responsible for knowing which function to call when.

Tagging in @NicolasHug in case he has an opinion.

s += f"{SPACES}num_frames: {self.num_frames}\n"
s += f"{SPACES}average_fps: {self.average_fps}\n"
return s
return super().__repr__()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this looks like the default now, do we even need to define this method anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think without defining this method, the child class will generate a __repr__() that will override the custom __repr__() we had in the parent class.

The child class generated repr's formatting will be

VideoStreamMetadata(duration_seconds_from_header=13.013, begin_stream_seconds_from_header=0.0, bit_rate=128783.0, ...)

Rather than

VideoStreamMetadata:
  duration_seconds_from_header: 13.013
  begin_stream_seconds_from_header: 0.0
  bit_rate: 128783.0
  ...

Alternatively, to prevent this, we could also use

@dataclass(repr=False)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. Reading up on it, I think that if we do repr=False, then we won't get one at all! I think the current definition is fine.

// when getting metadata from non-active streams.
if ((seekMode != SeekMode::custom_frame_mappings) ||
(seekMode == SeekMode::custom_frame_mappings &&
stream_index == activeStreamIndex)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to factor out all of values that are common to both cases, such as durationSecondsFromHeader and mediaType. That way, it's more clear what the difference between the two cases is.

switch (seekMode) {
case SeekMode::custom_frame_mappings:
case SeekMode::exact:
if (getNumFrames(seekMode).has_value() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: let's define:

auto numFrames = getNumFrames(seekMode);

And use it inside the expressions.

Copy link
Contributor

@scotts scotts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's address my two new comments before merging, but approving to unblock. This is great, I'm happy we've finally cleaned this up! :)

@mollyxu
Copy link
Contributor Author

mollyxu commented Nov 7, 2025

Thanks for the reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants