-
Notifications
You must be signed in to change notification settings - Fork 67
refactor metadata fallback logic #1021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
src/torchcodec/_core/Metadata.cpp
Outdated
| return endStreamPtsSecondsFromContent.value() - | ||
| beginStreamPtsSecondsFromContent.value(); | ||
| } | ||
| return std::nullopt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the benefits of using the seek mode is that we can know that a scan should have been performed, and the metadata from the scan should definitely be there. I think we should instead make this:
case SeekMode:exact:
return endStreamPtsSecondsFromContent.value() -
beginStreamPtsSecondsFromContent.value();That is, we don't check if the value is present, and we don't return nullopt. The unconditional call to .value() will throw an exception if we somehow get into a situation where we're in exact seek mode and there is no value. But, that is what we want. Such a situation should never happen, and we want an exception thrown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you recommend modifying the logic such that exact mode is accessed directly but we keep the checks for custom_frame_mappings? Previously the Python logic didn't have knowledge of seek mode so custom_frame_mappings was oftentimes grouped with exact because scans are performed.
In custom_frame_mappings a scan only happens to specific streams upon add_video_stream() is called (ie. test_get_metadata in test_metadata.py). Removing the checks would lead to cases in which not all streams in the container would have scanned metadata.
Or should custom_frame_mappings that's missing the scanned information behave the same as approximate mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only allow one stream to be added and scanned via add_video_stream, and I believe (but do correct me if I am mistaken!) when we access any stream metadata, its from the stream currently stored as activeStreamIndex_. This stream index is updated by the custom_frame_mappings equivalent "scan" function.
Removing the checks would lead to cases in which not all streams in the container would have scanned metadata.
Is it possible to access the stream metadata of a stream other than activeStreamIndex_? If not, it should be safe to assume that the metadata will be present, and treat it as exact mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we access any stream metadata, its from the stream currently stored as activeStreamIndex_
I think in get_container_metadata.py in _metadata.py we call get_stream_json_metadata for each stream in the container. It is at this step that we are triggering the logic in Metadata.cpp for custom_frame_mappings that doesn't exist for streams that are at the activeStreamIndex_
torchcodec/src/torchcodec/_core/_metadata.py
Lines 259 to 260 in afd5aba
| for stream_index in range(container_dict["numStreams"]): | |
| stream_dict = json.loads(_get_stream_json_metadata(decoder, stream_index)) |
The error seems to come not from accessing a non-active stream but rather from how we store metadata.
|
@mollyxu, this is great! I think we can further refine the implementations of the fallback logic, but this is definitely the way. Thank you for working on this! :) |
|
Thanks for the review! To summarize, from my understanding, we currently store the metadata of all streams in a video (including streams that we are not decoding on).
In my latest commit, I tried to expose the Another potential approach that we could take is not modifying I would love to hear everyone's thoughts on these approaches and the general state of our metadata retrieval! |
|
@mollyxu, I actually think this approach might be the best way to do it. Looking more, we expose The alternatives that I can think of:
Tagging in @NicolasHug in case he has an opinion. |
| s += f"{SPACES}num_frames: {self.num_frames}\n" | ||
| s += f"{SPACES}average_fps: {self.average_fps}\n" | ||
| return s | ||
| return super().__repr__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this looks like the default now, do we even need to define this method anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think without defining this method, the child class will generate a __repr__() that will override the custom __repr__() we had in the parent class.
The child class generated repr's formatting will be
VideoStreamMetadata(duration_seconds_from_header=13.013, begin_stream_seconds_from_header=0.0, bit_rate=128783.0, ...)
Rather than
VideoStreamMetadata:
duration_seconds_from_header: 13.013
begin_stream_seconds_from_header: 0.0
bit_rate: 128783.0
...
Alternatively, to prevent this, we could also use
@dataclass(repr=False)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. Reading up on it, I think that if we do repr=False, then we won't get one at all! I think the current definition is fine.
| // when getting metadata from non-active streams. | ||
| if ((seekMode != SeekMode::custom_frame_mappings) || | ||
| (seekMode == SeekMode::custom_frame_mappings && | ||
| stream_index == activeStreamIndex)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to factor out all of values that are common to both cases, such as durationSecondsFromHeader and mediaType. That way, it's more clear what the difference between the two cases is.
src/torchcodec/_core/Metadata.cpp
Outdated
| switch (seekMode) { | ||
| case SeekMode::custom_frame_mappings: | ||
| case SeekMode::exact: | ||
| if (getNumFrames(seekMode).has_value() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: let's define:
auto numFrames = getNumFrames(seekMode);And use it inside the expressions.
scotts
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's address my two new comments before merging, but approving to unblock. This is great, I'm happy we've finally cleaned this up! :)
|
Thanks for the reviews! |
Consolidate Metadata Fallback Logic in C++
Related issue: #1009
Our metadata fallback logic was previously scattered across both C++ and Python layers, making it confusing and potentially led to circular dependencies.
In C++, we had seek-mode-aware logic:
torchcodec/src/torchcodec/_core/SingleStreamDecoder.cpp
Lines 1456 to 1467 in c552b60
In Python, we had fallback logic without seek mode awareness:
torchcodec/src/torchcodec/_core/_metadata.py
Lines 116 to 126 in c552b60
This separation meant:
Changes Made
1. Added Computed Methods to C++
StreamMetadataCreated seek-mode-aware methods in
**Metadata.cpp**:getDurationSeconds(SeekMode)getBeginStreamSeconds(SeekMode)getEndStreamSeconds(SeekMode)getNumFrames(SeekMode)getAverageFps(SeekMode)These methods encapsulate all fallback logic with proper seek mode handling.
2. Moved
SeekModeEnumMoved
SeekModefromSingleStreamDecoderclass toMetadata.hto make it accessible toStreamMetadatamethods.3. Updated C++ Custom Ops Layer
Modified
custom_ops.cppto call the new computed methods and pass computed values to Python via JSON serialization4. Simplified Python Metadata Classes
Updated
_metadata.py@propertymethods with fallback logic5. Updated Tests
Removed Python tests
test_metadata.pytest_num_frames_fallback- fallback logic now in C++test_calculate_num_frames_using_fps_and_duration- calculated in C++test_duration_seconds_fallback- fallback logic now in C++test_calculate_duration_seconds_using_fps_and_num_frames- calculated in C++Added C++ tests in
MetadataTest.cppNumFramesFallbackPriorityCalculateNumFramesUsingFpsAndDurationDurationSecondsFallbackCalculateDurationSecondsUsingFpsAndNumFrames