refactor metadata fallback logic #1021

mollyxu · 2025-11-04T06:07:41Z

Consolidate Metadata Fallback Logic in C++

Related issue: #1009

Our metadata fallback logic was previously scattered across both C++ and Python layers, making it confusing and potentially led to circular dependencies.

In C++, we had seek-mode-aware logic:

torchcodec/src/torchcodec/_core/SingleStreamDecoder.cpp

Lines 1456 to 1467 in c552b60

    
           double SingleStreamDecoder::getMinSeconds( 
        
               const StreamMetadata& streamMetadata) { 
        
             switch (seekMode_) { 
        
               case SeekMode::custom_frame_mappings: 
        
               case SeekMode::exact: 
        
                 return streamMetadata.beginStreamPtsSecondsFromContent.value(); 
        
               case SeekMode::approximate: 
        
                 return 0; 
        
               default: 
        
                 TORCH_CHECK(false, "Unknown SeekMode"); 
        
             } 
        
           }

In Python, we had fallback logic without seek mode awareness:

torchcodec/src/torchcodec/_core/_metadata.py

Lines 116 to 126 in c552b60

    
               @property 
        
               def begin_stream_seconds(self) -> float: 
        
                   """Beginning of the stream, in seconds (float). Conceptually, this 
        
                   corresponds to the first frame's :term:`pts`. If 
        
                   ``begin_stream_seconds_from_content`` is not None, then it is returned. 
        
                   Otherwise, this value is 0. 
        
                   """ 
        
                   if self.begin_stream_seconds_from_content is None: 
        
                       return 0 
        
                   else: 
        
                       return self.begin_stream_seconds_from_content

This separation meant:

Python code couldn't make seek-mode-aware decisions
Fallback logic was duplicated and could diverge

Changes Made

1. Added Computed Methods to C++ `StreamMetadata`

Created seek-mode-aware methods in **Metadata.cpp**:

getDurationSeconds(SeekMode)
getBeginStreamSeconds(SeekMode)
getEndStreamSeconds(SeekMode)
getNumFrames(SeekMode)
getAverageFps(SeekMode)

These methods encapsulate all fallback logic with proper seek mode handling.

2. Moved `SeekMode` Enum

Moved SeekMode from SingleStreamDecoder class to Metadata.h to make it accessible to StreamMetadata methods.

3. Updated C++ Custom Ops Layer

Modified custom_ops.cpp to call the new computed methods and pass computed values to Python via JSON serialization

4. Simplified Python Metadata Classes

Updated _metadata.py

Removed all @property methods with fallback logic
Added simple fields to receive computed values from C++

5. Updated Tests

Removed Python tests test_metadata.py

test_num_frames_fallback - fallback logic now in C++
test_calculate_num_frames_using_fps_and_duration - calculated in C++
test_duration_seconds_fallback - fallback logic now in C++
test_calculate_duration_seconds_using_fps_and_num_frames - calculated in C++

Added C++ tests in MetadataTest.cpp

NumFramesFallbackPriority
CalculateNumFramesUsingFpsAndDuration
DurationSecondsFallback
CalculateDurationSecondsUsingFpsAndNumFrames

src/torchcodec/_core/_metadata.py

src/torchcodec/_core/SingleStreamDecoder.cpp

test/test_decoders.py

src/torchcodec/_core/_metadata.py

scotts · 2025-11-05T18:57:57Z

src/torchcodec/_core/Metadata.cpp

+        return endStreamPtsSecondsFromContent.value() -
+            beginStreamPtsSecondsFromContent.value();
+      }
+      return std::nullopt;


One of the benefits of using the seek mode is that we can know that a scan should have been performed, and the metadata from the scan should definitely be there. I think we should instead make this:

case SeekMode:exact: return endStreamPtsSecondsFromContent.value() - beginStreamPtsSecondsFromContent.value();

That is, we don't check if the value is present, and we don't return nullopt. The unconditional call to .value() will throw an exception if we somehow get into a situation where we're in exact seek mode and there is no value. But, that is what we want. Such a situation should never happen, and we want an exception thrown.

Would you recommend modifying the logic such that exact mode is accessed directly but we keep the checks for custom_frame_mappings? Previously the Python logic didn't have knowledge of seek mode so custom_frame_mappings was oftentimes grouped with exact because scans are performed.

In custom_frame_mappings a scan only happens to specific streams upon add_video_stream() is called (ie. test_get_metadata in test_metadata.py). Removing the checks would lead to cases in which not all streams in the container would have scanned metadata.

Or should custom_frame_mappings that's missing the scanned information behave the same as approximate mode?

We only allow one stream to be added and scanned via add_video_stream, and I believe (but do correct me if I am mistaken!) when we access any stream metadata, its from the stream currently stored as activeStreamIndex_. This stream index is updated by the custom_frame_mappings equivalent "scan" function.

Removing the checks would lead to cases in which not all streams in the container would have scanned metadata.

Is it possible to access the stream metadata of a stream other than activeStreamIndex_? If not, it should be safe to assume that the metadata will be present, and treat it as exact mode.

when we access any stream metadata, its from the stream currently stored as activeStreamIndex_

I think in get_container_metadata.py in _metadata.py we call get_stream_json_metadata for each stream in the container. It is at this step that we are triggering the logic in Metadata.cpp for custom_frame_mappings that doesn't exist for streams that are at the activeStreamIndex_

torchcodec/src/torchcodec/_core/_metadata.py

Lines 259 to 260 in afd5aba

for stream_index in range(container_dict["numStreams"]):

stream_dict = json.loads(_get_stream_json_metadata(decoder, stream_index))

The error seems to come not from accessing a non-active stream but rather from how we store metadata.

src/torchcodec/_core/Metadata.cpp

scotts · 2025-11-05T19:50:28Z

@mollyxu, this is great! I think we can further refine the implementations of the fallback logic, but this is definitely the way. Thank you for working on this! :)

mollyxu · 2025-11-06T15:46:02Z

Thanks for the review!

To summarize, from my understanding, we currently store the metadata of all streams in a video (including streams that we are not decoding on).

The refactoring did not affect exact mode because we did a scan on the entire container and have the content metadata for all streams.
This was also not a problem in approximate mode because we are using fallback logic with checks.
However, with the way we are currently handling custom_frame_mappings, it would only have stream metadata for the stream that we are decoding on. This would lead to bad optional access on the other streams.

In my latest commit, I tried to expose the activeStreamIndex in SingleStreamDecoder so that in custom_ops.py we can handle the stream that we are decoding on differently from the other streams for seekmode = custom_frame_mappings.

Another potential approach that we could take is not modifying custom_ops.py but handling the case of custom_frame_mapings in a similar way as our approximate mode, falling back to default values after not finding values. This approach might overlook some underlying issues with how custom_frame_mapings should behave.

I would love to hear everyone's thoughts on these approaches and the general state of our metadata retrieval!

src/torchcodec/_core/Metadata.cpp

scotts · 2025-11-06T20:14:41Z

@mollyxu, I actually think this approach might be the best way to do it.

Looking more, we expose get_container_metadata(), and I think that's useful. It's a convenient way for people to inspect all metadata we can retrieve, both at the container level and all streams. And it's in there that we're hitting problems, because when seek_index==custom_frame_mappings, we don't have certain metadata for our non-active streams. During decoding, this doesn't matter because we are only concerned with the active stream. It's only when we're getting metadata from all streams we hit this.

The alternatives that I can think of:

Change get_container_metadata() so that it only returns stream metadata from the active index, if there is one. That's actually a breaking change, so I'd rather not do it for that reason alone. But it also removes some useful functionality.
Implement different member functions in StreamMetadata for decoding versus metadata retrieval. That goes against some of the point of this refactor, which is to do this logic in one place.
Make the StreamMetadata member functions aware of the active stream, and behave differently for each case. This is basically the same thing as option 2, except we're putting all of the logic in one function. In option 2, the caller would be responsible for knowing which function to call when.

Tagging in @NicolasHug in case he has an opinion.

src/torchcodec/_core/custom_ops.cpp

scotts · 2025-11-07T15:32:41Z

src/torchcodec/_core/_metadata.py

-        s += f"{SPACES}num_frames: {self.num_frames}\n"
-        s += f"{SPACES}average_fps: {self.average_fps}\n"
-        return s
+        return super().__repr__()


Since this looks like the default now, do we even need to define this method anymore?

I think without defining this method, the child class will generate a __repr__() that will override the custom __repr__() we had in the parent class.

The child class generated repr's formatting will be

VideoStreamMetadata(duration_seconds_from_header=13.013, begin_stream_seconds_from_header=0.0, bit_rate=128783.0, ...)

Rather than

VideoStreamMetadata: duration_seconds_from_header: 13.013 begin_stream_seconds_from_header: 0.0 bit_rate: 128783.0 ...

Alternatively, to prevent this, we could also use

@dataclass(repr=False)

Hmmm. Reading up on it, I think that if we do repr=False, then we won't get one at all! I think the current definition is fine.

scotts · 2025-11-07T15:38:18Z

src/torchcodec/_core/custom_ops.cpp

+  // when getting metadata from non-active streams.
+  if ((seekMode != SeekMode::custom_frame_mappings) ||
+      (seekMode == SeekMode::custom_frame_mappings &&
+       stream_index == activeStreamIndex)) {


We should be able to factor out all of values that are common to both cases, such as durationSecondsFromHeader and mediaType. That way, it's more clear what the difference between the two cases is.

scotts · 2025-11-07T15:58:37Z

src/torchcodec/_core/Metadata.cpp

+  switch (seekMode) {
+    case SeekMode::custom_frame_mappings:
+    case SeekMode::exact:
+      if (getNumFrames(seekMode).has_value() &&


Nit: let's define:

auto numFrames = getNumFrames(seekMode);

And use it inside the expressions.

scotts

Let's address my two new comments before merging, but approving to unblock. This is great, I'm happy we've finally cleaned this up! :)

mollyxu · 2025-11-07T19:18:37Z

Thanks for the reviews!

refactor metadata fallback logic

8bee8d7

meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 4, 2025

Molly Xu and others added 5 commits November 4, 2025 11:55

fix tests and move seekmode to metadata.h

08c348e

add checks

868e23e

add cpp tests

e58cf39

Merge branch 'meta-pytorch:main' into refactor-metadata-fallback

47ff2db

use streametadata methods in singlestreamdecoder

35febfb

mollyxu marked this pull request as ready for review November 5, 2025 16:22

scotts reviewed Nov 5, 2025

View reviewed changes

src/torchcodec/_core/_metadata.py Outdated Show resolved Hide resolved

scotts reviewed Nov 5, 2025

View reviewed changes

src/torchcodec/_core/SingleStreamDecoder.cpp Outdated Show resolved Hide resolved

address feedback

20a7227

Dan-Flores reviewed Nov 5, 2025

View reviewed changes

test/test_decoders.py Outdated Show resolved Hide resolved

Dan-Flores reviewed Nov 5, 2025

View reviewed changes

src/torchcodec/_core/_metadata.py Outdated Show resolved Hide resolved

scotts reviewed Nov 5, 2025

View reviewed changes

src/torchcodec/_core/_metadata.py Outdated Show resolved Hide resolved

scotts reviewed Nov 5, 2025

View reviewed changes

src/torchcodec/_core/Metadata.cpp Outdated Show resolved Hide resolved

scotts reviewed Nov 5, 2025

View reviewed changes

src/torchcodec/_core/Metadata.cpp Show resolved Hide resolved

address feedback

b677863

scotts reviewed Nov 5, 2025

View reviewed changes

src/torchcodec/_core/Metadata.cpp Outdated Show resolved Hide resolved

scotts reviewed Nov 5, 2025

View reviewed changes

src/torchcodec/_core/Metadata.cpp Show resolved Hide resolved

modify docstrings

1478117

Molly Xu added 3 commits November 5, 2025 12:40

modified fallback logic

3494331

fix fallthrough error

cdabcb0

modify fallback logic and add torch_checks

8bf490c

scotts reviewed Nov 6, 2025

View reviewed changes

src/torchcodec/_core/Metadata.cpp Outdated Show resolved Hide resolved

scotts reviewed Nov 6, 2025

View reviewed changes

src/torchcodec/_core/custom_ops.cpp Outdated Show resolved Hide resolved

address feedback

e3b91ea

scotts reviewed Nov 7, 2025

View reviewed changes

scotts approved these changes Nov 7, 2025

View reviewed changes

address feedback

71f4861

	double SingleStreamDecoder::getMinSeconds(
	const StreamMetadata& streamMetadata) {
	switch (seekMode_) {
	case SeekMode::custom_frame_mappings:
	case SeekMode::exact:
	return streamMetadata.beginStreamPtsSecondsFromContent.value();
	case SeekMode::approximate:
	return 0;
	default:
	TORCH_CHECK(false, "Unknown SeekMode");
	}
	}

	@property
	def begin_stream_seconds(self) -> float:
	"""Beginning of the stream, in seconds (float). Conceptually, this
	corresponds to the first frame's :term:`pts`. If
	``begin_stream_seconds_from_content`` is not None, then it is returned.
	Otherwise, this value is 0.
	"""
	if self.begin_stream_seconds_from_content is None:
	return 0
	else:
	return self.begin_stream_seconds_from_content

	for stream_index in range(container_dict["numStreams"]):
	stream_dict = json.loads(_get_stream_json_metadata(decoder, stream_index))

refactor metadata fallback logic #1021

Are you sure you want to change the base?

refactor metadata fallback logic #1021

Uh oh!

Conversation

mollyxu commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Consolidate Metadata Fallback Logic in C++

Changes Made

1. Added Computed Methods to C++ StreamMetadata

2. Moved SeekMode Enum

3. Updated C++ Custom Ops Layer

4. Simplified Python Metadata Classes

5. Updated Tests

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

scotts Nov 5, 2025

Choose a reason for hiding this comment

Uh oh!

mollyxu Nov 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Dan-Flores Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mollyxu Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

scotts commented Nov 5, 2025

Uh oh!

mollyxu commented Nov 6, 2025

Uh oh!

Uh oh!

scotts commented Nov 6, 2025

Uh oh!

Uh oh!

scotts Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

mollyxu Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

scotts Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

scotts Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

scotts Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

scotts left a comment

Choose a reason for hiding this comment

Uh oh!

mollyxu commented Nov 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mollyxu commented Nov 4, 2025 •

edited

Loading

1. Added Computed Methods to C++ `StreamMetadata`

2. Moved `SeekMode` Enum

mollyxu Nov 5, 2025 •

edited

Loading

Dan-Flores Nov 6, 2025 •

edited

Loading