<!DOCTYPE html> <html> <head lang="en"> <meta charset="UTF-8"> <meta http-equiv="x-ua-compatible" content="ie=edge"> <title>mip-NeRF</title> <meta name="description" content=""> <meta name="viewport" content="width=device-width, initial-scale=1"> <!-- <base href="/"> --> <!--FACEBOOK--> <meta property="og:image" content="https://jonbarron.info/mipnerf/img/rays_square.png"> <meta property="og:image:type" content="image/png"> <meta property="og:image:width" content="682"> <meta property="og:image:height" content="682"> <meta property="og:type" content="website" /> <meta property="og:url" content="https://jonbarron.info/mipnerf/"/> <meta property="og:title" content="mip-NeRF" /> <meta property="og:description" content="Project page for Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields." /> <!--TWITTER--> <meta name="twitter:card" content="summary_large_image" /> <meta name="twitter:title" content="mip-NeRF" /> <meta name="twitter:description" content="Project page for Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields." 
/> <meta name="twitter:image" content="https://jonbarron.info/mipnerf/img/rays_square.png" /> <!-- <link rel="apple-touch-icon" href="apple-touch-icon.png"> --> <!-- <link rel="icon" type="image/png" href="img/seal_icon.png"> --> <!-- Place favicon.ico in the root directory --> <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css"> <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css"> <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/codemirror/5.8.0/codemirror.min.css"> <link rel="stylesheet" href="css/app.css"> <link rel="stylesheet" href="css/bootstrap.min.css"> <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script> <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/5.8.0/codemirror.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/1.5.3/clipboard.min.js"></script> <script src="js/app.js"></script> </head> <body> <div class="container" id="main"> <div class="row"> <h2 class="col-md-12 text-center"> <b>Mip-NeRF</b>: A Multiscale Representation <br> for Anti-Aliasing Neural Radiance Fields</br> <small> ICCV 2021 (Oral, Best Paper Honorable Mention) </small> </h2> </div> <div class="row"> <div class="col-md-12 text-center"> <ul class="list-inline"> <li> <a href="https://jonbarron.info/"> Jonathan T. Barron </a> </br>Google </li> <li> <a href="http://bmild.github.io/"> Ben Mildenhall </a> </br>Google </li> <li> <a href="http://matthewtancik.com/"> Matthew Tancik </a> </br>UC Berkeley </li><br> <li> <a href="https://phogzone.com/"> Peter Hedman </a> </br>Google </li> <li> <a href="http://www.ricardomartinbrualla.com/"> Ricardo Martin-Brualla </a> </br>Google </li> <li> <a href="https://pratulsrinivasan.github.io/"> Pratul P. 
Srinivasan </a> </br>Google </li> </ul> </div> </div> <div class="row"> <div class="col-md-4 col-md-offset-4 text-center"> <ul class="nav nav-pills nav-justified"> <li> <a href="https://arxiv.org/abs/2103.13415"> <image src="img/mip_paper_image.jpg" height="60px"> <h4><strong>Paper</strong></h4> </a> </li> <li> <a href="https://youtu.be/EpH175PY1A0"> <image src="img/youtube_icon.png" height="60px"> <h4><strong>Video</strong></h4> </a> </li> <li> <a href="https://github.com/google/mipnerf"> <image src="img/github.png" height="60px"> <h4><strong>Code</strong></h4> </a> </li> </ul> </div> </div> <div class="row"> <div class="col-md-8 col-md-offset-2"> <h3> Abstract </h3> <image src="img/rays.png" class="img-responsive" alt="overview"><br> <p class="text-justify"> The rendering procedure used by neural radiance fields (NeRF) samples a scene with a single ray per pixel and may therefore produce renderings that are excessively blurred or aliased when training or testing images observe scene content at different resolutions. The straightforward solution of supersampling by rendering with multiple rays per pixel is impractical for NeRF, because rendering each ray requires querying a multilayer perceptron hundreds of times. Our solution, which we call "mip-NeRF" (à la "mipmap"), extends NeRF to represent the scene at a continuously-valued scale. By efficiently rendering anti-aliased conical frustums instead of rays, mip-NeRF reduces objectionable aliasing artifacts and significantly improves NeRF's ability to represent fine details, while also being 7% faster than NeRF and half the size. Compared to NeRF, mip-NeRF reduces average error rates by 17% on the dataset presented with NeRF and by 60% on a challenging multiscale variant of that dataset that we present. mip-NeRF is also able to match the accuracy of a brute-force supersampled NeRF on our multiscale dataset while being 22x faster. 
</p> </div> </div> <div class="row"> <div class="col-md-8 col-md-offset-2"> <h3> Video </h3> <div class="text-center"> <div style="position:relative;padding-top:56.25%;"> <iframe src="https://www.youtube.com/embed/EpH175PY1A0" allowfullscreen style="position:absolute;top:0;left:0;width:100%;height:100%;"></iframe> </div> </div> </div> </div> <div class="row"> <div class="col-md-8 col-md-offset-2"> <h3> Integrated Positional Encoding </h3> <p class="text-justify"> Typical positional encoding (as used in Transformer networks and Neural Radiance Fields) maps a single point in space to a feature vector, where each element is generated by a sinusoid with an exponentially increasing frequency: </p> <p style="text-align:center;"> <image src="img/pe_seq_eqn_pad.png" height="50px" class="img-responsive"> </p> <video id="v0" width="100%" autoplay loop muted> <source src="img/pe_anim_horiz.mp4" type="video/mp4" /> </video> <p class="text-justify"> Here, we show how these feature vectors change as a function of a point moving in 1D space. <br><br> Our <em>integrated positional encoding</em> considers Gaussian <em>regions</em> of space, rather than infinitesimal points. This provides a natural way to input a "region" of space as query to a coordinate-based neural network, allowing the network to reason about sampling and aliasing. The expected value of each positional encoding component has a simple closed form: </p> <p style="text-align:center;"> <image src="img/ipe_eqn_under_pad.png" height="30px" class="img-responsive"> </p> <video id="v0" width="100%" autoplay loop muted> <source src="img/ipe_anim_horiz.mp4" type="video/mp4" /> </video> <p class="text-justify"> We can see that when considering a wider region, the higher frequency features automatically shrink toward zero, providing the network with lower-frequency inputs. As the region narrows, these features converge to the original positional encoding. 
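</p> <p class="text-justify"> The shrinking behavior described here can be illustrated with a minimal 1D sketch. It assumes the region is a Gaussian with mean <code>mu</code> and standard deviation <code>sigma</code>, and relies on the standard identity that the expected value of sin(a x) under that Gaussian is sin(a mu) exp(-a&sup2; sigma&sup2; / 2), with the same factor for cos. The function names <code>positional_encoding</code> and <code>integrated_positional_encoding</code> are hypothetical and chosen for illustration; this is not the code from the mip-NeRF repository, which handles 3D Gaussians. </p>
<pre>
```python
import math


def positional_encoding(x, num_freqs):
    """Encode a single point x with sin and cos at frequencies 2**l."""
    feats = []
    for l in range(num_freqs):
        freq = 2.0 ** l
        feats.append(math.sin(freq * x))
        feats.append(math.cos(freq * x))
    return feats


def integrated_positional_encoding(mu, sigma, num_freqs):
    """Expected encoding of x drawn from a 1D Gaussian N(mu, sigma**2).

    Illustrative assumption: a scalar Gaussian. Each sinusoid at frequency
    freq is scaled by exp(-0.5 * (freq * sigma)**2), so high frequencies
    are damped more strongly as the region widens.
    """
    feats = []
    for l in range(num_freqs):
        freq = 2.0 ** l
        damping = math.exp(-0.5 * (freq * sigma) ** 2)
        feats.append(math.sin(freq * mu) * damping)
        feats.append(math.cos(freq * mu) * damping)
    return feats
```
</pre>
<p class="text-justify"> Under these assumptions, calling <code>integrated_positional_encoding(0.3, 0.0, 4)</code> reproduces <code>positional_encoding(0.3, 4)</code>, while a wide region such as <code>sigma = 2.0</code> drives the highest-frequency features to nearly zero and leaves the lowest-frequency ones only partially attenuated.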
</p> </div> </div> <div class="row"> <div class="col-md-8 col-md-offset-2"> <h3> Mip-NeRF </h3> <p class="text-justify"> We use integrated positional encoding to train NeRF to generate anti-aliased renderings. Rather than casting an infinitesimal ray through each pixel, we instead cast a full 3D <em>cone</em>. For each queried point along a ray, we consider its associated 3D conical frustum. Two different cameras viewing the same point in space may result in vastly different conical frustums, as illustrated here in 2D: </p> <p style="text-align:center;"> <image src="img/scales_toy.png" class="img-responsive" alt="scales"> </p> <p class="text-justify"> In order to pass this information through the NeRF network, we fit a multivariate Gaussian to the conical frustum and use the integrated positional encoding described above to create the input feature vector to the network. </p> </div> </div> <div class="row"> <div class="col-md-8 col-md-offset-2"> <h3> Results </h3> <p class="text-justify"> We train NeRF and mip-NeRF on a dataset with images at four different resolutions. Normal NeRF (left) is not capable of learning to represent the same scene at multiple levels of detail, with blurring in close-up shots and aliasing in low resolution views, while mip-NeRF (right) both preserves sharp details in close-ups and correctly renders the zoomed-out images. 
</p> <br> <video id="v0" width="100%" autoplay loop muted controls> <source src="img/ship_sbs_path1.mp4" type="video/mp4" /> </video> <video id="v0" width="100%" autoplay loop muted controls> <source src="img/chair_sbs_path1.mp4" type="video/mp4" /> </video> <video id="v0" width="100%" autoplay loop muted controls> <source src="img/lego_sbs_path1.mp4" type="video/mp4" /> </video> <video id="v0" width="100%" autoplay loop muted controls> <source src="img/mic_sbs_path1.mp4" type="video/mp4" /> </video> <br><br> <p class="text-justify"> We can also manipulate the integrated positional encoding by using a larger or smaller radius than the true pixel footprint, exposing the continuous level of detail learned within a single network: </p> <video id="v0" width="100%" autoplay loop muted controls> <source src="img/lego_radii_manip_slider_200p.mp4" type="video/mp4" /> </video> </div> </div> <div class="row"> <div class="col-md-8 col-md-offset-2"> <h3> Related links </h3> <p class="text-justify"> <a href="https://en.wikipedia.org/wiki/Spatial_anti-aliasing">Wikipedia</a> provides an excellent introduction to spatial anti-aliasing techniques. </p> <p class="text-justify"> Mipmaps were introduced by Lance Williams in his paper "Pyramidal Parametrics" (<a href="https://software.intel.com/sites/default/files/m/7/2/c/p1-williams.pdf">Williams (1983)</a>). </p> <p class="text-justify"> <a href="https://dl.acm.org/doi/abs/10.1145/964965.808589">Amanatides (1984)</a> first proposed the idea of replacing rays with cones in computer graphics rendering. </p> <p class="text-justify"> The closely related concept of <em>ray differentials</em> (<a href="https://graphics.stanford.edu/papers/trd/">Igehy (1999)</a>) is used in most modern renderers to antialias textures and other material buffers during ray tracing. 
</p> <p class="text-justify"> Cone tracing has been used along with prefiltered voxel-based representations of scene geometry for speeding up indirect illumination calculations in <a href="https://research.nvidia.com/sites/default/files/publications/GIVoxels-pg2011-authors.pdf">Crassin et al. (2011)</a>. </p> <p class="text-justify"> Mip-NeRF was implemented on top of the <a href="https://github.com/google-research/google-research/tree/master/jaxnerf">JAXNeRF</a> codebase. </p> </div> </div> <div class="row"> <div class="col-md-8 col-md-offset-2"> <h3> Citation </h3> <div class="form-group col-md-10 col-md-offset-1"> <textarea id="bibtex" class="form-control" readonly> @article{barron2021mipnerf, title={Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields}, author={Jonathan T. Barron and Ben Mildenhall and Matthew Tancik and Peter Hedman and Ricardo Martin-Brualla and Pratul P. Srinivasan}, journal={ICCV}, year={2021} }</textarea> </div> </div> </div> <div class="row"> <div class="col-md-8 col-md-offset-2"> <h3> Acknowledgements </h3> <p class="text-justify"> We thank Janne Kontkanen and David Salesin for their comments on the text, Paul Debevec for constructive discussions, and Boyang Deng for JaxNeRF. <br> MT is funded by an NSF Graduate Fellowship. <br> The website template was borrowed from <a href="http://mgharbi.com/">Michaël Gharbi</a>. </p> </div> </div> </div> </body> </html>