-
Notifications
You must be signed in to change notification settings - Fork 72
Add pixel_format to VideoEncoder API #1027
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
d75f0eb
4ff25f6
7ef8a8f
dc86a8c
884f4dc
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -4,6 +4,10 @@ | |||||||
| #include "Encoder.h" | ||||||||
| #include "torch/types.h" | ||||||||
|
|
||||||||
| extern "C" { | ||||||||
| #include <libavutil/pixdesc.h> | ||||||||
| } | ||||||||
|
|
||||||||
| namespace facebook::torchcodec { | ||||||||
|
|
||||||||
| namespace { | ||||||||
|
|
@@ -534,6 +538,36 @@ torch::Tensor validateFrames(const torch::Tensor& frames) { | |||||||
| return frames.contiguous(); | ||||||||
| } | ||||||||
|
|
||||||||
| AVPixelFormat validatePixelFormat( | ||||||||
| const AVCodec& avCodec, | ||||||||
| const std::string& targetPixelFormat) { | ||||||||
| AVPixelFormat pixelFormat = av_get_pix_fmt(targetPixelFormat.c_str()); | ||||||||
|
|
||||||||
| // Validate that the encoder supports this pixel format | ||||||||
| const AVPixelFormat* supportedFormats = getSupportedPixelFormats(avCodec); | ||||||||
| if (supportedFormats != nullptr) { | ||||||||
| for (int i = 0; supportedFormats[i] != AV_PIX_FMT_NONE; ++i) { | ||||||||
| if (supportedFormats[i] == pixelFormat) { | ||||||||
| return pixelFormat; | ||||||||
| } | ||||||||
| } | ||||||||
| } | ||||||||
|
|
||||||||
| std::stringstream errorMsg; | ||||||||
| // av_get_pix_fmt failed to find a pix_fmt | ||||||||
| if (pixelFormat == AV_PIX_FMT_NONE) { | ||||||||
| errorMsg << "Unknown pixel format: " << targetPixelFormat; | ||||||||
| } else { | ||||||||
| errorMsg << "Specified pixel format " << targetPixelFormat | ||||||||
| << " is not supported by the " << avCodec.name << " encoder."; | ||||||||
| } | ||||||||
| // Build error message, similar to FFmpeg's error log | ||||||||
| errorMsg << "\nSupported pixel formats for " << avCodec.name << ":"; | ||||||||
| for (int i = 0; supportedFormats[i] != AV_PIX_FMT_NONE; ++i) { | ||||||||
| errorMsg << " " << av_get_pix_fmt_name(supportedFormats[i]); | ||||||||
| } | ||||||||
| TORCH_CHECK(false, errorMsg.str()); | ||||||||
| } | ||||||||
| } // namespace | ||||||||
|
|
||||||||
| VideoEncoder::~VideoEncoder() { | ||||||||
|
|
@@ -635,15 +669,17 @@ void VideoEncoder::initializeEncoder( | |||||||
| outWidth_ = inWidth_; | ||||||||
| outHeight_ = inHeight_; | ||||||||
|
|
||||||||
| // TODO-VideoEncoder: Enable other pixel formats | ||||||||
| // Let FFmpeg choose best pixel format to minimize loss | ||||||||
| outPixelFormat_ = avcodec_find_best_pix_fmt_of_list( | ||||||||
| getSupportedPixelFormats(*avCodec), // List of supported formats | ||||||||
| AV_PIX_FMT_GBRP, // We reorder input to GBRP currently | ||||||||
| 0, // No alpha channel | ||||||||
| nullptr // Discard conversion loss information | ||||||||
| ); | ||||||||
| TORCH_CHECK(outPixelFormat_ != -1, "Failed to find best pix fmt") | ||||||||
| if (videoStreamOptions.pixelFormat.has_value()) { | ||||||||
| outPixelFormat_ = | ||||||||
| validatePixelFormat(*avCodec, videoStreamOptions.pixelFormat.value()); | ||||||||
| } else { | ||||||||
| const AVPixelFormat* formats = getSupportedPixelFormats(*avCodec); | ||||||||
| // Use first listed pixel format as default. | ||||||||
| // If pixel formats are undefined for some reason, try yuv420p | ||||||||
| outPixelFormat_ = (formats && formats[0] != AV_PIX_FMT_NONE) | ||||||||
| ? formats[0] | ||||||||
| : AV_PIX_FMT_YUV420P; | ||||||||
| } | ||||||||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed, this makes sense and that's similar to what we do for the audio encoder when we can't validate: torchcodec/src/torchcodec/_core/Encoder.cpp Lines 85 to 87 in 8e615e3
|
||||||||
|
|
||||||||
| // Configure codec parameters | ||||||||
| avCodecContext_->codec_id = avCodec->id; | ||||||||
|
|
||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -213,6 +213,7 @@ def encode_video_to_file_like( | |
| format: str, | ||
| file_like: Union[io.RawIOBase, io.BufferedIOBase], | ||
| crf: Optional[int] = None, | ||
| pixel_format: Optional[str] = None, | ||
| ) -> None: | ||
| """Encode video frames to a file-like object. | ||
|
|
||
|
|
@@ -222,6 +223,7 @@ def encode_video_to_file_like( | |
| format: Video format (e.g., "mp4", "mov", "mkv") | ||
| file_like: File-like object that supports write() and seek() methods | ||
| crf: Optional constant rate factor for encoding quality | ||
| pixel_format: Optional pixel format (e.g., "yuv420p", "yuv444p") | ||
| """ | ||
| assert _pybind_ops is not None | ||
|
|
||
|
|
@@ -230,6 +232,7 @@ def encode_video_to_file_like( | |
| frame_rate, | ||
| format, | ||
| _pybind_ops.create_file_like_context(file_like, True), # True means for writing | ||
| pixel_format, | ||
| crf, | ||
| ) | ||
|
|
||
|
|
@@ -319,6 +322,7 @@ def encode_video_to_file_abstract( | |
| frame_rate: int, | ||
| filename: str, | ||
| crf: Optional[int], | ||
| pixel_format: Optional[str], | ||
|
||
| ) -> None: | ||
| return | ||
|
|
||
|
|
@@ -329,6 +333,7 @@ def encode_video_to_tensor_abstract( | |
| frame_rate: int, | ||
| format: str, | ||
| crf: Optional[int], | ||
| pixel_format: Optional[str], | ||
|
||
| ) -> torch.Tensor: | ||
| return torch.empty([], dtype=torch.long) | ||
|
|
||
|
|
@@ -340,6 +345,7 @@ def _encode_video_to_file_like_abstract( | |
| format: str, | ||
| file_like_context: int, | ||
| crf: Optional[int] = None, | ||
| pixel_format: Optional[str] = None, | ||
| ) -> None: | ||
| return | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,5 @@ | ||
| from pathlib import Path | ||
| from typing import Union | ||
| from typing import Optional, Union | ||
|
|
||
| import torch | ||
| from torch import Tensor | ||
|
|
@@ -35,29 +35,38 @@ def __init__(self, frames: Tensor, *, frame_rate: int): | |
| def to_file( | ||
| self, | ||
| dest: Union[str, Path], | ||
| *, | ||
| pixel_format: Optional[str] = None, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good job on making this a keyword only params 👍 |
||
| ) -> None: | ||
| """Encode frames into a file. | ||
|
|
||
| Args: | ||
| dest (str or ``pathlib.Path``): The path to the output file, e.g. | ||
| ``video.mp4``. The extension of the file determines the video | ||
| container format. | ||
| pixel_format (str, optional): The pixel format for encoding (e.g., | ||
| "yuv420p", "yuv444p"). If not specified, uses codec's default format. | ||
| """ | ||
| _core.encode_video_to_file( | ||
| frames=self._frames, | ||
| frame_rate=self._frame_rate, | ||
| filename=str(dest), | ||
| pixel_format=pixel_format, | ||
| ) | ||
|
|
||
| def to_tensor( | ||
| self, | ||
| format: str, | ||
| *, | ||
| pixel_format: Optional[str] = None, | ||
| ) -> Tensor: | ||
| """Encode frames into raw bytes, as a 1D uint8 Tensor. | ||
|
|
||
| Args: | ||
| format (str): The container format of the encoded frames, e.g. "mp4", "mov", | ||
| "mkv", "avi", "webm", "flv", or "gif" | ||
| pixel_format (str, optional): The pixel format to encode frames into (e.g., | ||
| "yuv420p", "yuv444p"). If not specified, uses codec's default format. | ||
|
|
||
| Returns: | ||
| Tensor: The raw encoded bytes as 4D uint8 Tensor. | ||
|
|
@@ -66,12 +75,15 @@ def to_tensor( | |
| frames=self._frames, | ||
| frame_rate=self._frame_rate, | ||
| format=format, | ||
| pixel_format=pixel_format, | ||
| ) | ||
|
|
||
| def to_file_like( | ||
| self, | ||
| file_like, | ||
| format: str, | ||
| *, | ||
| pixel_format: Optional[str] = None, | ||
| ) -> None: | ||
| """Encode frames into a file-like object. | ||
|
|
||
|
|
@@ -83,10 +95,13 @@ def to_file_like( | |
| int = 0) -> int``. | ||
| format (str): The container format of the encoded frames, e.g. "mp4", "mov", | ||
| "mkv", "avi", "webm", "flv", or "gif". | ||
| pixel_format (str, optional): The pixel format for encoding (e.g., | ||
| "yuv420p", "yuv444p"). If not specified, uses codec's default format. | ||
| """ | ||
| _core.encode_video_to_file_like( | ||
| frames=self._frames, | ||
| frame_rate=self._frame_rate, | ||
| format=format, | ||
| file_like=file_like, | ||
| pixel_format=pixel_format, | ||
| ) | ||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -629,6 +629,30 @@ def test_bad_input(self, tmp_path): | |||||
| ): | ||||||
| encoder.to_tensor(format="bad_format") | ||||||
|
|
||||||
| @pytest.mark.parametrize("method", ("to_file", "to_tensor", "to_file_like")) | ||||||
| def test_pixel_format_errors(self, method, tmp_path): | ||||||
| frames = torch.zeros((5, 3, 64, 64), dtype=torch.uint8) | ||||||
| encoder = VideoEncoder(frames, frame_rate=30) | ||||||
|
|
||||||
| if method == "to_file": | ||||||
| valid_params = dict(dest=str(tmp_path / "output.mp4")) | ||||||
| elif method == "to_tensor": | ||||||
| valid_params = dict(format="mp4") | ||||||
| elif method == "to_file_like": | ||||||
| valid_params = dict(file_like=io.BytesIO(), format="mp4") | ||||||
|
|
||||||
| with pytest.raises( | ||||||
| RuntimeError, | ||||||
| match=r"Unknown pixel format: invalid_pix_fmt[\s\S]*Supported pixel formats.*yuv420p", | ||||||
| ): | ||||||
| getattr(encoder, method)(**valid_params, pixel_format="invalid_pix_fmt") | ||||||
|
|
||||||
| with pytest.raises( | ||||||
| RuntimeError, | ||||||
| match=r"Specified pixel format rgb24 is not supported[\s\S]*Supported pixel formats.*yuv420p", | ||||||
| ): | ||||||
| getattr(encoder, method)(**valid_params, pixel_format="rgb24") | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess it's personal taste, but I think this test might be simpler and more clear if we did the more basic thing: with pytest.raises(
RuntimeError,
match=r"Unknown pixel format: invalid_pix_fmt[\s\S]*Supported pixel formats.*yuv420p",
):
encoder.to_file(str(tmp_path / "output.mp4", pixel_format="invalid_pix_fmt")
with pytest.raises(
RuntimeError,
match=r"Unknown pixel format: invalid_pix_fmt[\s\S]*Supported pixel formats.*yuv420p",
):
encoder.to_tensor(format="mp4", pixel_format="invalid_pix_fmt")
with pytest.raises(
RuntimeError,
match=r"Unknown pixel format: invalid_pix_fmt[\s\S]*Supported pixel formats.*yuv420p",
):
encoder.to_file_like(file_like=io.BytesIO(), format="mp4", pixel_format="invalid_pix_fmt")
...
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree that this is a bit confusing, but I reused it as the AudioEncoder tests also use this pattern to test across encoding methods. torchcodec/test/test_encoders.py Lines 193 to 194 in dc86a8c
In terms of taste, I would prefer to use this pattern and reduce code duplication. Alternatively, I don't think this test needs to be parametrized across encoding methods, since the error will always be hit in
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't have a strong pref on the style. The compact parametrized form is a pattern we use across all existing encode tests (both audio and video), but I agree it is harder to understand at first sight. We should however test all 3 public methods: we want to make sure all these errors are properly raised by all methods, and the fact that they all rely on the same |
||||||
|
|
||||||
| @pytest.mark.parametrize("method", ("to_file", "to_tensor", "to_file_like")) | ||||||
| def test_contiguity(self, method, tmp_path): | ||||||
| # Ensure that 2 sets of video frames with the same pixel values are encoded | ||||||
|
|
||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we were using
avcodec_find_best_pix_fmt_of_list, now our heuristic is to return the first format in the list. Do I understand correctly that the reason for this change is that you have empirically observe that the first in the list is oftenyuv420p, which is a good default?I think it makes sense, just confirming my understanding. IT might be worth making that very explicit through a comment explaining that
yuv420pis often the first entryThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes,
yuv420pis often the first in the list, and FFmpeg's avcodec_default_get_format uses the same heuristic.I'll add a comment here to mention this as well