FastCUDASSIM.jl

Fast calculation of the Structural Similarity Index Measure (SSIM) and its gradients, on NVIDIA GPUs.

Quick start

julia> using Pkg; Pkg.add("CUDA"); Pkg.add(url = "https://github.com/LaurensDiels/FastCUDASSIM.jl")  # Currently not yet registered

julia> using FastCUDASSIM, CUDA

julia> img1 = CUDA.rand(Float32, 3, 512, 768); img2 = similar(img1);

julia> ssim(img1, img2)  # Actual value depends on RNG above
4.7802605f-6

julia> ssim_gradient(img1, img2)  # w.r.t. img1
3×512×768 CuArray{Float32, 3, CUDA.DeviceMemory}:
[:, :, 1] =
 -4.05717f-11 ...
 ...          ...

julia> using Zygote

julia> Zygote.gradient(x -> ssim(x, img2), img1)
(Float32[-4.057169f-11 ...],)

julia> using TestImages

julia> dssim(cu(testimage("cameraman.tif")), cu(testimage("mandril_gray.tif")))  # CuMatrix{Gray{N0f8}} inputs
0.8373802f0

Key points

  • Fast by avoiding global memory as much as reasonably possible. See the benchmarks.
  • Support for CUDA only: no other graphics APIs, no CPU fallback.
  • Reverse-mode autodiff integration via ChainRules.jl. Works out of the box with Zygote.jl.
  • The convolutions use zero padding and the usual Gaussian kernel with (conceptual) window size $11 \times 11$ and $\sigma = 1.5$, in Float32 precision.
  • Image intensities are assumed to lie in $[0, 1]$.
  • Images should use [channels x] height x width [[x batch size]] memory layout. See Input and output formats for more details.

Contents