Saurabh Saxena

Research @ Google DeepMind

I am a Staff Research Engineer at Google DeepMind, where I lead research at the intersection of computer vision and generative modeling. My work in generative AI explores the potential of diffusion models for creating controllable image and video content (aka world models). On the scene understanding front, my research is centered on methods for recovering intrinsic properties of a scene, such as its geometry and motion, as well as estimating camera parameters. Prior to this, I played a pivotal role in building and launching TensorFlow 2, leading foundational aspects like automatic differentiation, control flow, and eager execution.

Selected Publications

RoMo: Robust Motion Segmentation Improves Structure from Motion

Lily Goli*, Sara Sabour*, Mark Matthews, Marcus Brubaker, Dmitry Lagun, Alec Jacobson, David J. Fleet, Saurabh Saxena, Andrea Tagliasacchi

ICCV 2025

High-Resolution Frame Interpolation with Patch-based Cascaded Diffusion

Junhwa Hur, Charles Herrmann, Saurabh Saxena, Janne Kontkanen, Wei-Sheng Lai, Yichang Shih, Michael Rubinstein, David J. Fleet, Deqing Sun

AAAI 2025

Controlling space and time with diffusion models

Daniel Watson*, Saurabh Saxena*, Lala Li*, Andrea Tagliasacchi, David J Fleet

ICLR 2025

Nerfiller: Completing scenes via generative 3d inpainting

Ethan Weber, Aleksander Holynski, Varun Jampani, Saurabh Saxena, Noah Snavely, Abhishek Kar, Angjoo Kanazawa

CVPR 2024

Zero-shot metric depth with a field-of-view conditioned diffusion model

Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J Fleet

ECCV 2024 Workshop on Wild 3D: 3D Modeling, Reconstruction, and Generation in the Wild

The surprising effectiveness of diffusion models for optical flow and monocular depth estimation

Saurabh Saxena, Charles Herrmann, Junhwa Hur, Abhishek Kar, Mohammad Norouzi, Deqing Sun, David J Fleet

NeurIPS 2023 ORAL

A generalist framework for panoptic segmentation of images and videos

Ting Chen, Lala Li, Saurabh Saxena, Geoffrey Hinton, David J Fleet

ICCV 2023

A unified sequence interface for vision tasks

Ting Chen*, Saurabh Saxena*, Lala Li*, Tsung-Yi Lin, David J Fleet, Geoffrey E Hinton

NeurIPS 2022

Photorealistic text-to-image diffusion models with deep language understanding

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi

NeurIPS 2022 SPOTLIGHT

Pix2seq: A language modeling framework for object detection

Ting Chen, Saurabh Saxena, Lala Li, David J Fleet, Geoffrey Hinton

ICLR 2022

Large-scale evolution of image classifiers

Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc V Le, Alexey Kurakin

ICML 2017