Learning to Refocus with Video Diffusion Models

Adobe Research, York University
SIGGRAPH Asia 2025

Abstract

Focus is a cornerstone of photography, yet autofocus systems often fail to capture the intended subject, and users frequently wish to adjust focus after capture. We introduce a novel method for realistic post-capture refocusing using video diffusion models. From a single defocused image, our approach generates a perceptually accurate focal stack, represented as a video sequence, enabling interactive refocusing and unlocking a range of downstream applications. We release a large-scale focal stack dataset acquired under diverse real-world smartphone conditions to support this work and future research. Our method consistently outperforms existing approaches in both perceptual quality and robustness across challenging scenarios, paving the way for more advanced focus-editing capabilities in everyday photography.
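Once a focal stack is available as a sequence of frames, interactive refocusing can be as simple as selecting the frame in which a clicked region is sharpest. The sketch below is not the paper's method; the Laplacian-variance focus measure and the window size are our own illustrative assumptions:

```python
import numpy as np

def laplacian_variance(patch):
    """Sharpness proxy: variance of a 4-neighbour Laplacian response."""
    p = patch.astype(np.float64)
    lap = (-4.0 * p[1:-1, 1:-1]
           + p[:-2, 1:-1] + p[2:, 1:-1]
           + p[1:-1, :-2] + p[1:-1, 2:])
    return lap.var()

def refocus_frame(stack, y, x, win=15):
    """Pick the focal-stack frame whose window around (y, x) is sharpest.

    stack: sequence of grayscale frames (H, W), one per focal position.
    Returns the index of the sharpest frame at the clicked location.
    """
    h, w = stack[0].shape
    half = win // 2
    y0, y1 = max(0, y - half), min(h, y + half + 1)
    x0, x1 = max(0, x - half), min(w, x + half + 1)
    scores = [laplacian_variance(f[y0:y1, x0:x1]) for f in stack]
    return int(np.argmax(scores))
```

A click handler in a viewer would call `refocus_frame` with the cursor position and display the returned frame.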




Qualitative Results: Generated Focal Stack Comparisons

Compare the generated focal stacks side by side. Select a scene from the menu below, then use the frame controls to step through the frames produced by each method. Hover over an image to inspect a magnified view of that region.

[Interactive comparison viewer: for each scene, the input image (focal position 01) and the focal stacks generated by NAF, InstructPix2Pix, and our method are shown alongside the ground-truth (GT) stack, with a hover-zoom window for close inspection.]
Video Models Have Powerful Priors for Refocusing!

Video diffusion models naturally understand focus transitions. We leverage these powerful priors to design our refocusing model.

"A focus pull of a bug on a leaf."

"A focus pull of a man holding a remote."



Dataset

Our dataset contains high-quality focal stacks captured with smartphones under diverse real-world conditions, spanning a wide range of scenes and lighting, providing a valuable resource for future research in computational photography and focus manipulation. Each focal stack consists of 9 focal positions, enabling comprehensive evaluation of refocusing algorithms.


Dataset Examples



BibTeX

@inproceedings{Tedla2025Refocus,
  title={{Learning to Refocus with Video Diffusion Models}},
  author={Tedla, SaiKiran and Zhang, Zhoutong and Zhang, Xuaner and Xin, Shumian},
  booktitle={Proceedings of the ACM SIGGRAPH Asia Conference},
  year={2025}
}