NVIDIA's Instant NeRF technology turns 2D photos into 3D scenes within seconds

NVIDIA has developed a new approach to inverse rendering that uses AI to reconstruct a 3D scene from a set of 2D photos taken at different angles, in just a few seconds. The method combines fast neural network training with rapid rendering, and is one of the first models of its kind to do so.

The company applied this approach to a relatively recent technology known as neural radiance fields (NeRF) and produced "the fastest NeRF technique to date", Instant NeRF, which achieves speedups of more than 1,000x in some cases. The model needs only seconds to train on a few dozen still photos, together with data about the camera angles from which they were taken, and can then render the resulting 3D scene within "tens of milliseconds".

David Luebke, NVIDIA's vice president of graphics research, stated:

If traditional 3D representations like polygonal meshes are akin to vector images, NeRFs are like bitmap images: they densely capture the way light radiates from an object or within a scene. In that sense, Instant NeRF could be as important to 3D as digital cameras and JPEG compression have been to 2D photography — vastly increasing the speed, ease and reach of 3D capture and sharing.
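Behind the bitmap analogy: a NeRF assigns every point in space a volume density and a color (which can vary with viewing direction), and an image is produced by sampling points along each camera ray and compositing them front to back. The minimal pure-Python sketch below shows that compositing step, using the standard quadrature from the original NeRF paper (Mildenhall et al., 2020). It is an illustration only, not Instant NeRF's implementation; the function name and the sample values are hypothetical.

```python
import math

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one camera ray, front to back.

    densities: volume density sigma_i at each sample point
    colors:    (r, g, b) color c_i emitted at each sample point
    deltas:    distance delta_i between consecutive samples
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_i: fraction of light that reaches sample i unabsorbed
    for sigma, color, delta in zip(densities, colors, deltas):
        # Probability that the ray terminates within this segment.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * color[k]
        # Light surviving past this segment continues to later samples.
        transmittance *= 1.0 - alpha
    return pixel
```

For example, composite_ray([0.0, 1000.0], [(0, 0, 1), (1, 0, 0)], [0.1, 0.1]) passes through an empty blue sample and then hits a nearly opaque red one, so the pixel comes out approximately (1, 0, 0): the empty sample contributes nothing, and the dense one absorbs almost all remaining light.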

NVIDIA says Instant NeRF could be used to create avatars or scenery for virtual worlds, to capture video conference participants and their environments in 3D, or to reconstruct scenes for 3D digital maps.

In a tribute to the early days of Polaroid instant photography, NVIDIA recreated a photo of Andy Warhol taking an instant photo and turned it into a 3D scene using Instant NeRF.

Describing the new technique, the company said:

While estimating the depth and appearance of an object based on a partial view is a natural skill for humans, it’s a demanding task for AI.

Creating a 3D scene with traditional methods takes hours or longer, depending on the complexity and resolution of the visualization. Bringing AI into the picture speeds things up. Early NeRF models rendered crisp scenes without artifacts in a few minutes, but still took hours to train.

Instant NeRF, however, cuts rendering time by several orders of magnitude. It relies on a technique developed by NVIDIA called multi-resolution hash grid encoding, which is optimized to run efficiently on NVIDIA GPUs. Using a new input encoding method, researchers can achieve high-quality results using a tiny neural network that runs rapidly.
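To give a rough idea of what a multi-resolution hash grid encoding does, here is a minimal pure-Python sketch modeled on the scheme described in NVIDIA's Instant NGP paper (Müller et al., 2022). It is not NVIDIA's CUDA code. The level count, table size, feature width and grid resolutions are hypothetical values kept small for readability, the tables are randomly initialized rather than trained, and every level is hashed (the paper indexes coarse levels directly when they fit in the table). Each level overlays a grid of a different resolution on the scene, looks up feature vectors at the eight corners of the cell containing a point, blends them by trilinear interpolation, and concatenates the results across levels.

```python
import math
import random

# Hypothetical hyperparameters, chosen small for illustration only.
NUM_LEVELS = 4          # L: number of resolution levels
TABLE_SIZE = 2 ** 10    # T: entries in each level's hash table
FEATURES = 2            # F: feature dimensions per table entry
N_MIN, N_MAX = 16, 256  # coarsest and finest grid resolutions

# Per-dimension primes for the spatial hash, as given in the Instant NGP paper.
PRIMES = (1, 2654435761, 805459861)

def level_resolution(level):
    """Grid resolution for a level, growing geometrically from N_MIN toward N_MAX."""
    growth = math.exp((math.log(N_MAX) - math.log(N_MIN)) / (NUM_LEVELS - 1))
    return math.floor(N_MIN * growth ** level)

def spatial_hash(corner):
    """XOR integer grid coordinates multiplied by large primes, then wrap into the table."""
    h = 0
    for coord, prime in zip(corner, PRIMES):
        h ^= coord * prime
    return h % TABLE_SIZE

def make_tables(seed=0):
    """One table of feature vectors per level (random here; trainable in practice)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1e-4, 1e-4) for _ in range(FEATURES)]
             for _ in range(TABLE_SIZE)]
            for _ in range(NUM_LEVELS)]

def encode(point, tables):
    """Map a point in [0, 1]^3 to a concatenated feature vector of length L * F."""
    encoded = []
    for level, table in enumerate(tables):
        res = level_resolution(level)
        scaled = [p * res for p in point]
        base = [math.floor(s) for s in scaled]
        frac = [s - b for s, b in zip(scaled, base)]
        feature = [0.0] * FEATURES
        # Trilinearly interpolate the feature vectors stored at the 8 cell corners.
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    offset = (dx, dy, dz)
                    corner = tuple(b + o for b, o in zip(base, offset))
                    weight = 1.0
                    for f, o in zip(frac, offset):
                        weight *= f if o else (1.0 - f)
                    entry = table[spatial_hash(corner)]
                    for i in range(FEATURES):
                        feature[i] += weight * entry[i]
        encoded.extend(feature)
    return encoded
```

Calling encode((0.3, 0.5, 0.7), make_tables()) returns a list of NUM_LEVELS * FEATURES = 8 numbers, which would then be fed into a small neural network. In the paper's setup, both the table entries and the network weights are optimized during training; because each query touches only a handful of table entries, the network that follows can stay tiny.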

The model was built using the NVIDIA CUDA Toolkit and the Tiny CUDA Neural Networks library. Because the neural network is lightweight, it can be trained and run on a single NVIDIA GPU, and it runs fastest on cards with NVIDIA Tensor Cores.