
@bjarthur

using Float32 for Lanczos is ~1.7x faster (median 21.3 ms vs. 12.7 ms in the benchmark below) and uses half as much memory (30.53 MiB vs. 61.05 MiB) as the current Float64.

this PR currently performs the internal calculations in whatever precision the input has. i could also imagine specifying the type used for internal computations as a type parameter (e.g. struct Lanczos4OpenCV{T} <: AbstractLanczos end) to decouple it from the input.
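A minimal sketch of the type-parameter alternative, assuming the working precision is carried in the kernel type and defaults to Float64. AbstractLanczosSketch, Lanczos4OpenCVSketch, working_type and prepare_offset are hypothetical stand-ins, not Interpolations.jl API; the point is only that the offset is converted once, so everything downstream runs in T regardless of the caller's input type.

```julia
# Hypothetical stand-ins for Interpolations' AbstractLanczos / Lanczos4OpenCV.
abstract type AbstractLanczosSketch end

# The working precision T lives in the kernel type, not in the input.
struct Lanczos4OpenCVSketch{T<:AbstractFloat} <: AbstractLanczosSketch end

# Assumed default: keep today's Float64 behaviour when no T is given.
Lanczos4OpenCVSketch() = Lanczos4OpenCVSketch{Float64}()

working_type(::Lanczos4OpenCVSketch{T}) where {T} = T

# Convert the offset once; all downstream arithmetic then runs in T.
prepare_offset(k::Lanczos4OpenCVSketch, δx::Real) = convert(working_type(k), δx)

k32 = Lanczos4OpenCVSketch{Float32}()
println(typeof(prepare_offset(k32, 0.25)))                       # prints: Float32
println(typeof(prepare_offset(Lanczos4OpenCVSketch(), 0.25f0)))  # prints: Float64
```

With this layout the element type of the weights is fixed by the kernel, so a Float64 input can still be interpolated with Float32 weights if that trade-off is wanted.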

i'm also curious whether there is a cleverer way to convert l4_2d_cs to the working precision at compile time, so as not to incur a runtime penalty.
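Two possible approaches, sketched under assumptions. The table is rewritten here as a tuple of pairs named L4_CS (hypothetical; same values as l4_2d_cs) so the sketch needs no StaticArrays. Option 1 relies on constant folding, which may remove the conversion when T is known at compile time, but that should be confirmed with @code_llvm. Option 2 uses a @generated function, so the converted table is built once per T while compiling and spliced into the method as a literal.

```julia
# s45 = sqrt(2)/2; values match l4_2d_cs, stored as
# (coefficient of s0, coefficient of c0) pairs.
const S45 = 0.70710678118654752440084436210485
const L4_CS = ((1.0, 0.0), (-S45, -S45), (0.0, 1.0), (S45, -S45),
               (-1.0, 0.0), (S45, S45), (0.0, -1.0), (-S45, S45))

# Option 1: plain conversion. T and L4_CS are both known to the compiler,
# so this may be constant-folded (check with @code_llvm l4_cs(Float32)).
l4_cs(::Type{T}) where {T<:AbstractFloat} = map(p -> (T(p[1]), T(p[2])), L4_CS)

# Option 2: the generator runs the conversion once per T at compile time and
# returns the resulting tuple as a literal in the method body.
@generated function l4_cs_gen(::Type{T}) where {T<:AbstractFloat}
    tbl = l4_cs(T)   # executed by the generator, not at run time
    return :($tbl)
end

println(l4_cs_gen(Float32)[2])
```

Option 2 trades a little compile-time work per element type for a guarantee that no conversion runs when the weights are evaluated.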

let me know what you think and i'll add some tests and docs.

julia> using Interpolations, BenchmarkTools

julia> x=rand(1_000_000);

julia> @benchmark Interpolations._lanczos4_opencv.(x)
BenchmarkTools.Trial: 231 samples with 1 evaluation per sample.
 Range (min … max):  19.061 ms … 83.561 ms  ┊ GC (min … max): 5.34% … 74.95%
 Time  (median):     21.303 ms              ┊ GC (median):    2.90%
 Time  (mean ± σ):   21.654 ms ±  4.135 ms  ┊ GC (mean ± σ):  4.35% ±  5.10%

                       ▅█▄▃                                    
  ▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▃▃▃▃▅▇█████▄▄▄▃▃▃▃▂▂▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▂ ▃
  19.1 ms         Histogram: frequency by time        24.8 ms <

 Memory estimate: 61.05 MiB, allocs estimate: 4.

julia> x=rand(Float32, 1_000_000);

julia> @benchmark Interpolations._lanczos4_opencv.(x)
BenchmarkTools.Trial: 387 samples with 1 evaluation per sample.
 Range (min … max):  12.078 ms … 76.608 ms  ┊ GC (min … max): 0.00% … 83.87%
 Time  (median):     12.695 ms              ┊ GC (median):    3.05%
 Time  (mean ± σ):   12.928 ms ±  3.290 ms  ┊ GC (mean ± σ):  4.56% ±  4.80%

     ▂       ▄▅▇██▇▅▂                                          
  ▆▇▇██▄▆▆▄▇██████████▄█▄▇▄▆▁▄▁▇▄▄▁▄▄▄▇▄▆▁▁▄▁▁▁▁▆▁▁▁▁▁▁▄▄▁▁▁▄ ▇
  12.1 ms      Histogram: log(frequency) by time      14.5 ms <

 Memory estimate: 30.53 MiB, allocs estimate: 4.
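The halved memory estimate is consistent with the broadcast's output alone, assuming it is one array of 1_000_000 eight-weight tuples (one per input) and that this array dominates the allocation. A quick check of that arithmetic:

```julia
# Assumption: the allocation is dominated by the broadcast output,
# one NTuple{8,T} of weights per input element.
n = 1_000_000
bytes64 = n * sizeof(NTuple{8,Float64})   # 8 weights × 8 bytes each
bytes32 = n * sizeof(NTuple{8,Float32})   # 8 weights × 4 bytes each
mib(b) = b / 2^20
println(round(mib(bytes64); digits=2), " MiB vs ", round(mib(bytes32); digits=2), " MiB")
# prints: 61.04 MiB vs 30.52 MiB
```

This is close to the reported 61.05 MiB and 30.53 MiB, which suggests the memory saving comes from the smaller output rather than from the kernel's internals.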

@codecov

codecov bot commented Oct 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.01%. Comparing base (7a2d581) to head (740fa93).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #635      +/-   ##
==========================================
- Coverage   88.10%   88.01%   -0.10%     
==========================================
  Files          29       29              
  Lines        1908     1910       +2     
==========================================
  Hits         1681     1681              
- Misses        227      229       +2     


@bjarthur
Author

the second commit makes it slightly faster (median 11.5 ms vs. 12.7 ms for Float32):

julia> x=rand(Float32, 1_000_000);

julia> @benchmark Interpolations.value_weights.(Ref(Lanczos4OpenCV()), x)
BenchmarkTools.Trial: 429 samples with 1 evaluation per sample.
 Range (min … max):  10.796 ms … 75.222 ms  ┊ GC (min … max): 0.00% … 85.30%
 Time  (median):     11.479 ms              ┊ GC (median):    3.84%
 Time  (mean ± σ):   11.664 ms ±  3.107 ms  ┊ GC (mean ± σ):  5.43% ±  4.66%

               ▂█▆▃▄▄▅▇▃▁                                      
  ▃▃▃▃▃▃▁▃▃▃▄▅▆██████████▆▃▃▃▄▃▁▁▂▃▁▁▃▃▂▁▁▃▁▂▁▃▁▂▂▁▁▁▁▁▁▁▁▂▂▂ ▃
  10.8 ms         Histogram: frequency by time          13 ms <

 Memory estimate: 30.53 MiB, allocs estimate: 5.

bjarthur changed the title from "reduced precision for lanczos_opencv" to "WIP: reduced precision for lanczos_opencv" on Oct 27, 2025
bjarthur marked this pull request as draft on October 27, 2025 22:09
@bjarthur
Author

bjarthur commented Oct 27, 2025

the output is qualitatively different for Float32 compared to Float64. :(

specifically, when δx is 0f0 (a 32-bit float) in _lanczos4_opencv, s0 and c0 are exactly identical, so the numerator of cs[4] is zero and cs[4] becomes 0f0. this is not a problem when δx is 0.0 (a 64-bit float): at the higher precision, rounding of y0 happens to leave s0 and c0 1 eps apart, so the numerator of cs[4] is not zero.
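To look at the cancellation in isolation, here is a sketch that evaluates only the centre-tap numerator at δx = 0 in a chosen precision. It is a reconstruction from the description above (y0 = -(δx + 3)·π/4, centre row of l4_2d_cs = (s45, -s45)), not the PR's code, and centre_numerator is a hypothetical name. The exact values printed depend on the platform's sincos rounding, so no particular output is claimed here.

```julia
# Hypothetical reconstruction of the centre-tap numerator at δx = 0.
function centre_numerator(::Type{T}) where {T<:AbstractFloat}
    p_4 = T(π) / 4                  # π/4 rounded to T
    δx = zero(T)
    y0 = -(δx + 3) * p_4            # -3π/4, rounded differently in each precision
    s0, c0 = sincos(y0)
    s45 = sqrt(T(2)) / 2
    num = s45 * s0 - s45 * c0       # centre row of l4_2d_cs is (s45, -s45)
    return s0, c0, num
end

for T in (Float32, Float64)
    s0, c0, num = centre_numerator(T)
    println(T, ": s0 == c0 -> ", s0 == c0, ", numerator = ", num)
end
```

Either way the numerator is at most a few eps, so whether it ends up exactly zero is an accident of rounding rather than something the code can rely on.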

i don't know enough about lanczos resampling to fix this the right way. what i do know is that the problem goes away if opencv's lanczos implementation is followed more closely (see the last commit of this PR): specifically, the iszero(y) check is replaced with abs(y) <= 1e-6.
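A sketch of a type-generic weight function along those lines, assuming the tolerance check from the last commit and an OpenCV-style treatment of the centre tap: instead of evaluating 0/0, that tap gets a large sentinel (1e30 here, an assumption modelled on my reading of recent OpenCV releases), so normalisation sends it to ~1 and every other weight to ~0. lanczos4_opencv_sketch and the tuple table L4_CS are hypothetical names.

```julia
# s45 = sqrt(2)/2; same 8×2 table as l4_2d_cs, stored as a tuple of pairs
# (hypothetical layout so the sketch needs no StaticArrays).
const S45 = 0.70710678118654752440084436210485
const L4_CS = ((1.0, 0.0), (-S45, -S45), (0.0, 1.0), (S45, -S45),
               (-1.0, 0.0), (S45, S45), (0.0, -1.0), (-S45, S45))

# Hypothetical type-generic weights: all arithmetic stays in T.
function lanczos4_opencv_sketch(δx::T) where {T<:AbstractFloat}
    p_4 = T(π) / 4
    y0 = -(δx + 3) * p_4
    s0, c0 = sincos(y0)
    cs = ntuple(Val(8)) do i
        y = (δx + 4 - i) * p_4
        if abs(y) <= T(1e-6)
            # Tap sits (almost) exactly on the sample: large sentinel so that,
            # after normalisation, this weight is ~1 and the others are ~0.
            T(1e30)
        else
            (T(L4_CS[i][1]) * s0 + T(L4_CS[i][2]) * c0) / y^2
        end
    end
    total = sum(cs)
    return map(c -> c / total, cs)
end

w = lanczos4_opencv_sketch(0f0)
println(eltype(w), " ", w[4])   # prints: Float32 1.0
```

Because the result no longer depends on whether the centre numerator cancels exactly, Float32 and Float64 give qualitatively the same weights at δx = 0.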

so i'm putting this on the back burner for now. below is the output without the last commit.

@mkitti @timholy @mileslucas

julia> Interpolations.value_weights(Lanczos4OpenCV{Float64}(), 0.0)
(8.837979241208245e-19, -2.8122277740499295e-18, 7.95418131708742e-18, 1.0, -7.95418131708742e-18, 2.8122277740499295e-18, -8.837979241208245e-19, -7.80550006310146e-35)

julia> Interpolations.value_weights(Lanczos4OpenCV{Float32}(), 0f0)
(8.547569f6, -2.7198194f7, 7.692811f7, -0.0f0, -7.692811f7, 2.7198194f7, -8.547569f6, -0.0f0)
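For the tests this PR still needs, a loose sanity check on the weight tuple would catch this kind of regression. The helper name plausible_weights and its bound of 2 on any single weight are assumptions (a heuristic, not a property taken from the PR). Applied to the two tuples printed above, it accepts the Float64 result and rejects the Float32 one.

```julia
# Hypothetical heuristic: weights must be finite, sum to ~1, and no single
# weight may be far outside the kernel's range (loose bound of 2, chosen arbitrarily).
plausible_weights(w; atol = 1e-3) =
    all(isfinite, w) && isapprox(sum(w), 1; atol = atol) && maximum(abs, w) <= 2

# The two tuples printed above (without the last commit).
w64 = (8.837979241208245e-19, -2.8122277740499295e-18, 7.95418131708742e-18, 1.0,
       -7.95418131708742e-18, 2.8122277740499295e-18, -8.837979241208245e-19,
       -7.80550006310146e-35)
w32 = (8.547569f6, -2.7198194f7, 7.692811f7, -0.0f0, -7.692811f7, 2.7198194f7,
       -8.547569f6, -0.0f0)

println(plausible_weights(w64), " ", plausible_weights(w32))   # prints: true false
```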
