Proper resize tests; remove swscale resize #1013
Conversation
```cpp
// we could achieve that by calling sws_scale() twice: once to do the resize
// and another time to do the format conversion. But that will be slower,
// which goes against the whole point of calling sws_scale() directly.
std::string filters_ = "format=rgb24";
```
@NicolasHug, do you think I should pull this explanation out into a named section, and then refer to it in a few different places?
I think it's fine here. I'd just suggest to remove (or re-write?) the last paragraph because it seems redundant with the second.
Is there any difference between "copy" and "format=rgb24" when passing AV_PIX_FMT_RGB24? I think we have made this slight change now.
```cpp
filtersContext(
    avFrame->width,
    avFrame->height,
    avFrameFormat,
    avFrame->sample_aspect_ratio,
    outputDims.width,
    outputDims.height,
    /*outputFormat=*/AV_PIX_FMT_RGB24,
    filters_,  // "copy" vs "format=rgb24" ????
    timeBase_);
```

I might suggest to leave the default to "copy"?
I used "copy" before because the empty string is not a valid filtergraph (I tried). So we needed to provide something, and "copy" is the closest I could find to a no-op. That, however, was back when the transforms were okay to happen in the input colorspace; we were okay with the automatically inserted format conversions.
Now that we want the transforms to happen in RGB24, we need to explicitly specify the format conversion. For the no-transformations case, there is no difference between explicitly specifying `format=rgb24` and not. However, when there are transformations, the explicit `format=rgb24` makes sure that the transforms happen in the RGB24 colorspace. Without it, filtergraph will automatically insert a `format` conversion after our transforms to do the conversion we specified when we said the output node is AV_PIX_FMT_RGB24.
With all that said, I don't love the current logic as it does require more knowledge. I'm going to try to make it more clear without introducing "copy".
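To make the ordering concrete, here is a hypothetical illustration (the `crop` transform and the sizes are made up for this example, not taken from the actual code) of the two filtergraph descriptions:

```cpp
// Hypothetical example: a user-requested crop, with the output node declared
// as AV_PIX_FMT_RGB24.

// Explicit conversion first: the crop runs on RGB24 data.
std::string filtersExplicit = "format=rgb24,crop=100:100";

// No explicit conversion: libavfilter auto-inserts the format conversion at
// the end to satisfy the RGB24 output node, so the crop effectively runs in
// the source pixel format (e.g. YUV420P) instead.
std::string filtersImplicit = "crop=100:100";
```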
```diff
 return "scale=" + std::to_string(outputDims_.width) + ":" +
        std::to_string(outputDims_.height) +
-       ":sws_flags=" + toFilterGraphInterpolation(interpolationMode_);
+       ":flags=" + toFilterGraphInterpolation(interpolationMode_);
```
From the FFmpeg docs:
> Libavfilter will automatically insert `scale` filters where format conversion is required. It is possible to specify swscale flags for those automatically inserted scalers by prepending `sws_flags=flags;` to the filtergraph description.

Whereas `flags` is the specific parameter to `scale`. They end up being semantically equivalent, but it's more clear to use the `scale` option here.
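For example (illustrative values, not from this PR), these two descriptions request bilinear scaling in equivalent ways:

```cpp
// Global sws_flags, which also applies to any auto-inserted scale filters:
std::string viaSwsFlags = "sws_flags=bilinear;scale=640:360";

// flags as an option of this particular scale filter:
std::string viaScaleFlags = "scale=640:360:flags=bilinear";
```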
```python
# As a consequence, they are not going to be identical.
assert_tensor_close_on_at_least(
    frame_resize, frame_tv, percentage=85, atol=3
)
```
@NicolasHug, is this tolerable?
This is promising, I'd suggest to be a bit more strict:
- test on `TEST_SRC_2_720P` instead of the nasa video. The nasa vid is really small, and in parts mostly pitch black.
- Increase the percentage. Looks like we can push it to 99.9.
- assert an upper bound on the max abs diff. Looks like the max abs diff is 4. It's higher than I hoped, but on 0.1% of the pixels it's probably OK.
- assert that those tests don't pass when comparing against `antialias=False`.
I wanted to check for all of these myself so let me share my diff:
```diff
diff --git a/test/test_transform_ops.py b/test/test_transform_ops.py
index 818de0a..62606e2 100644
--- a/test/test_transform_ops.py
+++ b/test/test_transform_ops.py
@@ -31,6 +31,7 @@ from .utils import (
     H265_VIDEO,
     NASA_VIDEO,
     needs_cuda,
+    TEST_SRC_2_720P,
 )
 torch._dynamo.config.capture_dynamic_output_shape_ops = True
@@ -171,31 +172,48 @@ class TestCoreVideoDecoderTransformOps:
         ((1.5, 1.31), (0.5, 0.71), (0.7, 1.31), (1.5, 0.71), (1.0, 1.0), (2.0, 2.0)),
     )
     def test_resize_torchvision(self, height_scaling_factor, width_scaling_factor):
-        height = int(NASA_VIDEO.get_height() * height_scaling_factor)
-        width = int(NASA_VIDEO.get_width() * width_scaling_factor)
+        asset = TEST_SRC_2_720P
+        height = int(asset.get_height() * height_scaling_factor)
+        width = int(asset.get_width() * width_scaling_factor)
         resize_spec = f"resize, {height}, {width}"
-        decoder_resize = create_from_file(str(NASA_VIDEO.path))
+        decoder_resize = create_from_file(str(asset.path))
         add_video_stream(decoder_resize, transform_specs=resize_spec)
-        decoder_full = create_from_file(str(NASA_VIDEO.path))
+        decoder_full = create_from_file(str(asset.path))
         add_video_stream(decoder_full)
-        for frame_index in [0, 10, 17, 100, 230, 389]:
-            expected_shape = (NASA_VIDEO.get_num_color_channels(), height, width)
+        for frame_index in [0, 10, 17, 20, 40, 50]:
+            expected_shape = (asset.get_num_color_channels(), height, width)
             frame_resize, *_ = get_frame_at_index(
                 decoder_resize, frame_index=frame_index
             )
             frame_full, *_ = get_frame_at_index(decoder_full, frame_index=frame_index)
             frame_tv = v2.functional.resize(frame_full, size=(height, width))
+            frame_tv_no_antialias = v2.functional.resize(
+                frame_full, size=(height, width), antialias=False
+            )
             assert frame_resize.shape == expected_shape
             assert frame_tv.shape == expected_shape
+            assert frame_tv_no_antialias.shape == expected_shape
             assert_tensor_close_on_at_least(
-                frame_resize, frame_tv, percentage=99, atol=1
+                frame_resize, frame_tv, percentage=99.9, atol=1
             )
+            torch.testing.assert_allclose(frame_resize, frame_tv, rtol=0, atol=4)
+
+            if height_scaling_factor < 1 or width_scaling_factor < 1:
+                # Antialias only relevant when down-scaling!
+                with pytest.raises(AssertionError, match="Expected at least"):
+                    assert_tensor_close_on_at_least(
+                        frame_resize, frame_tv_no_antialias, percentage=99, atol=1
+                    )
+                with pytest.raises(AssertionError, match="Tensor-likes are not close"):
+                    torch.testing.assert_allclose(
+                        frame_resize, frame_tv_no_antialias, rtol=0, atol=4
+                    )
     def test_resize_ffmpeg(self):
         height = 135
```
@NicolasHug, see code below: for the frames I'm testing against in the video, I had to increase atol from 4 to 6.
```python
assert_frames_equal(frame, frame_tv)
assert_frames_equal(frame, frame_ref)
assert_frames_equal(frame_crop, frame_ref)
assert_frames_equal(frame_crop, frame_tv)
```
Pointing out that here we get bit-for-bit equality with TorchVision's crop.
```cpp
// Note that we ensure that the transforms come AFTER the format conversion.
// This means that the transforms are applied in the output pixel format and
// colorspace.
filters_ += "," + filters.str();
```
In another comment I'm suggesting that maybe we could leave the default of `filters_` to `"copy"`, because I didn't understand this comment at first. I think the comment is only clear if one knows that the default of `filters_` is the format conversion. If we keep `"copy"` as the default, then this becomes
```diff
-filters_ += "," + filters.str();
+filters_ = "format=rgb24," + filters.str();
```
and this comment is more obvious.
I don't have a strong opinion and if I'm not making any sense, please feel free to ignore (it's the end of the day!)
I forked this from #1003, which is about the public API for decoder-native transforms. When working on that I realized we didn't have proper resize transform tests, and when working on those tests, I realized we needed to change transform behavior. I want to keep #1003 about the public API, so this PR is about those tests and the behavior change.
Because I've flip-flopped on this point, I want to make this status explicit: all transforms are applied after the conversion to their output pixel format and colorspace. In other words, all transforms happen in the output pixel format and colorspace (which is now only RGB24). We want the output from TorchCodec's decoder-native transforms to be as close as possible to passing untransformed frames from TorchCodec to TorchVision transforms.
This PR does two things:
1. It adds proper resize transform tests. We test our resize against:
   - the `ffmpeg` CLI with an explicit `scale` filter. We expect bit-for-bit equality on CPU and Linux.
   - `torchvision.transforms.v2.functional.resize()`. We expect some difference here, but it's small: 99% of all values are within 1.
2. Previously, resizes were handled by calling `sws_scale()` directly instead of going through filtergraph. We remove that special handling here. When there are no transforms, we can still use `sws_scale()` for our format conversion. The reason we remove this special handling is that when calling `sws_scale()` once to do both a resize and colorspace conversion, it will do the colorspace conversion first. That does not match what we do when calling filtergraph. In order to get the same behavior as with filtergraph, we will have to call `sws_scale()` twice: once to do the resize, and another time to do the colorspace conversion (a rough sketch of this two-pass approach is included at the end). We will need to benchmark whether doing so is faster than calling filtergraph. I have created issue Resize transform optimization: use swscale when appropriate (#1018) to track.
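For reference, here is a rough sketch of what the two-pass `sws_scale()` approach mentioned above could look like. This is not code from this PR: the function name, the fixed bilinear flag, and the frame handling are illustrative, and all error handling is omitted.

```cpp
extern "C" {
#include <libavutil/frame.h>
#include <libswscale/swscale.h>
}

// Pass 1 resizes in the source pixel format; pass 2 converts the resized
// frame to RGB24, i.e. "resize first, then colorspace conversion".
AVFrame* resizeThenConvertToRGB24(const AVFrame* src, int dstW, int dstH) {
  // Pass 1: resize, keeping the source pixel format.
  AVFrame* resized = av_frame_alloc();
  resized->width = dstW;
  resized->height = dstH;
  resized->format = src->format;
  av_frame_get_buffer(resized, 0);
  SwsContext* resizeCtx = sws_getContext(
      src->width, src->height, static_cast<AVPixelFormat>(src->format),
      dstW, dstH, static_cast<AVPixelFormat>(src->format),
      SWS_BILINEAR, nullptr, nullptr, nullptr);
  sws_scale(resizeCtx, src->data, src->linesize, 0, src->height,
            resized->data, resized->linesize);
  sws_freeContext(resizeCtx);

  // Pass 2: convert the resized frame to RGB24.
  AVFrame* rgb = av_frame_alloc();
  rgb->width = dstW;
  rgb->height = dstH;
  rgb->format = AV_PIX_FMT_RGB24;
  av_frame_get_buffer(rgb, 0);
  SwsContext* convertCtx = sws_getContext(
      dstW, dstH, static_cast<AVPixelFormat>(resized->format),
      dstW, dstH, AV_PIX_FMT_RGB24,
      SWS_BILINEAR, nullptr, nullptr, nullptr);
  sws_scale(convertCtx, resized->data, resized->linesize, 0, dstH,
            rgb->data, rgb->linesize);
  sws_freeContext(convertCtx);
  av_frame_free(&resized);
  return rgb;
}
```

Whether something along these lines is actually faster than letting filtergraph do the same work is exactly what #1018 is meant to benchmark.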