FastCUDASSIM.jl
Fast calculation of the Structural Similarity Index Measure (SSIM) and its gradients, on NVIDIA GPUs.
Quick start
julia> using Pkg; Pkg.add("CUDA"); Pkg.add(url = "https://github.com/LaurensDiels/FastCUDASSIM.jl") # Currently not yet registered
julia> using FastCUDASSIM, CUDA
julia> img1 = CUDA.rand(Float32, 3, 512, 768); img2 = similar(img1);
julia> ssim(img1, img2) # Actual value depends on RNG above
4.7802605f-6
julia> ssim_gradient(img1, img2) # w.r.t. img1
3×512×768 CuArray{Float32, 3, CUDA.DeviceMemory}:
[:, :, 1] =
-4.05717f-11 ...
... ...
julia> using Zygote
julia> Zygote.gradient(x -> ssim(x, img2), img1)
(Float32[-4.057169f-11 ...],)
julia> using TestImages
julia> dssim(cu(testimage("cameraman.tif")), cu(testimage("mandril_gray.tif"))) # CuMatrix{Gray{N0f8}} inputs
0.8373802f0
Key points
- Fast by avoiding global memory as much as reasonably possible. See the benchmarks.
- Support for CUDA only: no other graphics APIs, no CPU fallback.
- Reverse-mode autodiff integration via ChainRules.jl. Works out of the box with Zygote.jl.
- The convolutions use zero padding and the usual Gaussian kernel with (conceptual) window size $11 \times 11$ and $\sigma = 1.5$, in Float32 precision.
- Image intensities are assumed to lie in $[0, 1]$.
- Images should use [channels x] height x width [[x batch size]] memory layout. See Input and output formats for more details.
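To make the windowing and layout conventions above concrete, here is a minimal CPU-only sketch in plain Julia. It is not the package's CUDA implementation. The helper names `gaussian_window` and `filter_zero_padded` are hypothetical, and the sketch assumes the window is a sampled Gaussian normalized to sum to 1, which is the common SSIM convention. The last lines show one way to move a height x width x channels array into the channels-first layout.

```julia
# Hypothetical helper: normalized n×n Gaussian window (assumed: sample, then normalize).
function gaussian_window(n::Int = 11, sigma::Float32 = 1.5f0)
    half = (n - 1) / 2
    g = [exp(-Float32(i - 1 - half)^2 / (2 * sigma^2)) for i in 1:n]
    g ./= sum(g)          # 1-D kernel sums to 1
    return g * g'         # separable kernel: outer product gives the 2-D window
end

# Hypothetical helper: filter a single-channel image, treating out-of-bounds pixels as zero.
function filter_zero_padded(img::AbstractMatrix{Float32}, w::AbstractMatrix{Float32})
    H, W = size(img)
    r = (size(w, 1) - 1) ÷ 2
    out = zeros(Float32, H, W)
    for x in 1:W, y in 1:H
        acc = 0f0
        for dx in -r:r, dy in -r:r
            yy, xx = y + dy, x + dx
            if 1 <= yy <= H && 1 <= xx <= W   # zero padding: skip pixels outside the image
                acc += w[dy + r + 1, dx + r + 1] * img[yy, xx]
            end
        end
        out[y, x] = acc
    end
    return out
end

w = gaussian_window()
blurred = filter_zero_padded(ones(Float32, 32, 32), w)

# Layout: convert height × width × channels to channels × height × width.
hwc = rand(Float32, 4, 6, 3)
chw = permutedims(hwc, (3, 1, 2))
```

With zero padding, the local means near the image border are pulled towards zero: in the constant image above, `blurred` stays close to 1 in the interior but is noticeably smaller at the corners.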