Saurabh Saxena - Homepage

I am a Staff Research Engineer at Google DeepMind, where I lead research at the intersection of computer vision and generative modeling. My work in generative AI explores the potential of diffusion models for creating controllable image and video content (aka world models). On the scene understanding front, my research is centered on methods for recovering intrinsic properties of a scene, such as its geometry and motion, as well as estimating camera parameters. Prior to this, I played a pivotal role in building and launching TensorFlow 2, leading foundational aspects like automatic differentiation, control flow, and eager execution.

Selected Publications

360Anything: Geometry-Free Lifting of Images and Videos to 360°

Ziyi Wu, Daniel Watson, Andrea Tagliasacchi, David J. Fleet, Marcus A. Brubaker, Saurabh Saxena

Paper Website

RoMo: Robust Motion Segmentation Improves Structure from Motion

Lily Goli*, Sara Sabour*, Mark Matthews, Marcus Brubaker, Dmitry Lagun, Alec Jacobson, David J. Fleet, Saurabh Saxena†, Andrea Tagliasacchi†

ICCV 2025

Paper Website

High-Resolution Frame Interpolation with Patch-based Cascaded Diffusion

Junhwa Hur, Charles Herrmann, Saurabh Saxena, Janne Kontkanen, Wei-Sheng Lai, Yichang Shih, Michael Rubinstein, David J. Fleet, Deqing Sun

AAAI 2025

Paper Website

Controlling space and time with diffusion models

Daniel Watson*, Saurabh Saxena*, Lala Li*, Andrea Tagliasacchi, David J Fleet

ICLR 2025

Paper Website

NeRFiller: Completing scenes via generative 3D inpainting

Ethan Weber, Aleksander Holynski, Varun Jampani, Saurabh Saxena, Noah Snavely, Abhishek Kar, Angjoo Kanazawa

CVPR 2024

Paper Website

Zero-shot metric depth with a field-of-view conditioned diffusion model

Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J Fleet

ECCV 2024 Workshop on Wild 3D: 3D Modeling, Reconstruction, and Generation in the Wild

Paper Website

The surprising effectiveness of diffusion models for optical flow and monocular depth estimation

Saurabh Saxena, Charles Herrmann, Junhwa Hur, Abhishek Kar, Mohammad Norouzi, Deqing Sun, David J Fleet

NeurIPS 2023 ORAL

Paper Website

A generalist framework for panoptic segmentation of images and videos

Ting Chen, Lala Li, Saurabh Saxena, Geoffrey Hinton, David J Fleet

ICCV 2023

Paper Website

A unified sequence interface for vision tasks

Ting Chen*, Saurabh Saxena*, Lala Li*, Tsung-Yi Lin, David J Fleet, Geoffrey E Hinton

NeurIPS 2022

Paper Website

Photorealistic text-to-image diffusion models with deep language understanding

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi

NeurIPS 2022 SPOTLIGHT

Paper Website

Pix2seq: A language modeling framework for object detection

Ting Chen, Saurabh Saxena, Lala Li, David J Fleet, Geoffrey Hinton

ICLR 2022

Paper Website

Large-scale evolution of image classifiers

Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc V Le, Alexey Kurakin

ICML 2017

Paper
