-
Notifications
You must be signed in to change notification settings - Fork 68
Proper resize tests; remove swscale resize #1013
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 13 commits
dd24dfa
3a2df84
1541ab8
25f3002
b4efafb
a7c6d69
5717cb4
f0a6174
f853e3a
0e61aba
2a391b6
7668eea
08fa176
928ae4c
a95dd4c
4245bdd
225b6fb
02f884c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -87,15 +87,25 @@ class CpuDeviceInterface : public DeviceInterface { | |
| UniqueSwsContext swsContext_; | ||
| SwsFrameContext prevSwsFrameContext_; | ||
|
|
||
| // The filter we supply to filterGraph_, if it is used. The default is the | ||
| // copy filter, which just copies the input to the output. Computationally, it | ||
| // should be a no-op. If we get no user-provided transforms, we will use the | ||
| // copy filter. Otherwise, we will construct the string from the transforms. | ||
| // We pass these filters to FFmpeg's filtergraph API. It is a simple pipeline | ||
| // of what FFmpeg calls "filters" to apply to decoded frames before returning | ||
| // them. In the PyTorch ecosystem, we call these "transforms". During | ||
| // initialization, we convert the user-supplied transforms into this string of | ||
| // filters. | ||
| // | ||
| // Note that even if we only use the copy filter, we still get the desired | ||
| // colorspace conversion. We construct the filtergraph with its output sink | ||
| // set to RGB24. | ||
| std::string filters_ = "copy"; | ||
| // Note that we start with just the format conversion, and then we ensure that | ||
| // the user-supplied filters always happen AFTER the format conversion. We | ||
| // want the user-supplied filters to operate on frames in the output pixel | ||
| // format and colorspace. | ||
| // | ||
| // We apply the transforms on the output pixel format and colorspace because | ||
| // then decoder-native transforms are as close as possible to returning | ||
| // untransformed frames and applying TochVision transforms to them. | ||
| // | ||
| // We ensure that the transforms happen on the output pixel format and | ||
| // colorspace by making sure all of the user-supplied filters happen AFTER | ||
| // an explicit format conversion. | ||
| std::string filters_ = "format=rgb24"; | ||
|
||
|
|
||
| // The flags we supply to swsContext_, if it used. The flags control the | ||
| // resizing algorithm. We default to bilinear. Users can override this with a | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,38 +25,20 @@ std::string toFilterGraphInterpolation( | |
| } | ||
| } | ||
|
|
||
| int toSwsInterpolation(ResizeTransform::InterpolationMode mode) { | ||
| switch (mode) { | ||
| case ResizeTransform::InterpolationMode::BILINEAR: | ||
| return SWS_BILINEAR; | ||
| default: | ||
| TORCH_CHECK( | ||
| false, | ||
| "Unknown interpolation mode: " + | ||
| std::to_string(static_cast<int>(mode))); | ||
| } | ||
| } | ||
|
|
||
| } // namespace | ||
|
|
||
| std::string ResizeTransform::getFilterGraphCpu() const { | ||
| // Note that we turn on gamma correct scaling. This produces results that are | ||
| // closer to what TorchVision's resize produces. | ||
scotts marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| return "scale=" + std::to_string(outputDims_.width) + ":" + | ||
| std::to_string(outputDims_.height) + | ||
| ":sws_flags=" + toFilterGraphInterpolation(interpolationMode_); | ||
| ":flags=" + toFilterGraphInterpolation(interpolationMode_); | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From the FFmpeg docs:
Whereas |
||
| } | ||
|
|
||
| std::optional<FrameDims> ResizeTransform::getOutputFrameDims() const { | ||
| return outputDims_; | ||
| } | ||
|
|
||
| bool ResizeTransform::isResize() const { | ||
| return true; | ||
| } | ||
|
|
||
| int ResizeTransform::getSwsFlags() const { | ||
| return toSwsInterpolation(interpolationMode_); | ||
| } | ||
|
|
||
| CropTransform::CropTransform(const FrameDims& dims, int x, int y) | ||
| : outputDims_(dims), x_(x), y_(y) { | ||
| TORCH_CHECK(x_ >= 0, "Crop x position must be >= 0, got: ", x_); | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below I'm suggesting that maybe we could leave the default of
filters_to "copy", because I didn't understand this comment at first. I think the comment is only clear if one knows that the default offilters_is the format conversion. If we keepcopyas the default, then this becomesand this comment is more obvious.
I don't have a strong opinion and if I'm not making any sense, please feel free to ignore (it's the end of the day!)