<!DOCTYPE html> <html lang="en"> <head> <meta content="text/html; charset=utf-8" http-equiv="content-type"/> <title>Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View</title> <!--Generated on Sun Mar 16 15:49:58 2025 by LaTeXML (version 0.8.8) http://dlmf.nist.gov/LaTeXML/.--> <meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/> <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv-fonts.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/latexml_styles.css" rel="stylesheet" type="text/css"/> <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.3.3/html2canvas.min.js"></script> <script src="/static/browse/0.3.4/js/addons_new.js"></script> <script src="/static/browse/0.3.4/js/feedbackOverlay.js"></script> <base href="/html/2503.12553v1/"/></head> <body> <nav class="ltx_page_navbar"> <nav class="ltx_TOC"> <ol class="ltx_toclist"> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#S1" title="In Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">1 </span>Introduction</span></a></li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#S2" title="In Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">2 </span>Related Work</span></a></li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" 
href="https://arxiv.org/html/2503.12553v1#S3" title="In Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3 </span>Proposed Method</span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#S3.SS1" title="In 3 Proposed Method ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.1 </span>Prerequisites: Monocular 3D Reconstruction</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"> <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#S3.SS2" title="In 3 Proposed Method ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.2 </span>Our Method: Niagara</span></a> <ol class="ltx_toclist ltx_toclist_subsection"> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#S3.SS2.SSS1" title="In 3.2 Our Method: Niagara ‣ 3 Proposed Method ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.2.1 </span>Prior Information and Geometric Feature</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#S3.SS2.SSS2" title="In 3.2 Our Method: Niagara ‣ 3 Proposed Method ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.2.2 </span>Niagara Encoder</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" 
href="https://arxiv.org/html/2503.12553v1#S3.SS2.SSS3" title="In 3.2 Our Method: Niagara ‣ 3 Proposed Method ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.2.3 </span>Niagara Decoder</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#S3.SS2.SSS4" title="In 3.2 Our Method: Niagara ‣ 3 Proposed Method ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.2.4 </span>Training Loss</span></a></li> </ol> </li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#S4" title="In Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4 </span>Experimental Results</span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#S4.SS1" title="In 4 Experimental Results ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.1 </span>Experiment Setup</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#S4.SS2" title="In 4 Experimental Results ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.2 </span>Comparison Results with SoTA Methods</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#S4.SS3" title="In 4 Experimental 
Results ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.3 </span><span class="ltx_text ltx_font_bold">Ablation Study </span></span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#S5" title="In Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">5 </span>Conclusion</span></a></li> <li class="ltx_tocentry ltx_tocentry_appendix"><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#A1" title="In Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">A </span>Implementation Details</span></a></li> <li class="ltx_tocentry ltx_tocentry_appendix"><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#A2" title="In Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">B </span>More Qualitative Comparison</span></a></li> <li class="ltx_tocentry ltx_tocentry_appendix"><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#A3" title="In Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">C </span>KITTI Experiments</span></a></li> </ol></nav> </nav> <div class="ltx_page_main"> <div class="ltx_page_content"> <article class="ltx_document ltx_authors_1line ltx_pruned_first"> <h1 class="ltx_title ltx_title_document"> <span class="ltx_text ltx_font_bold" id="id15.id1">Niagara</span>: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View</h1> <div class="ltx_authors"> <span class="ltx_creator 
ltx_role_author"> <span class="ltx_personname">Xianzu Wu<sup class="ltx_sup" id="id16.13.id1"><span class="ltx_text ltx_font_italic" id="id16.13.id1.1">1,3,6,</span></sup><span class="ltx_note ltx_role_footnotemark" id="footnotex1"><sup class="ltx_note_mark">1</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">1</sup><span class="ltx_note_type">footnotemark: </span><span class="ltx_tag ltx_tag_note">1</span></span></span></span> Zhenxin Ai<sup class="ltx_sup" id="id17.14.id2"><span class="ltx_text ltx_font_italic" id="id17.14.id2.1">1,2,</span></sup><span class="ltx_note ltx_role_footnotemark" id="footnotex2"><sup class="ltx_note_mark">1</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">1</sup><span class="ltx_note_type">footnotemark: </span><span class="ltx_tag ltx_tag_note">1</span></span></span></span> Harry Yang<sup class="ltx_sup" id="id18.15.id3"><span class="ltx_text ltx_font_italic" id="id18.15.id3.1">3,6</span></sup> Sernam Lim<sup class="ltx_sup" id="id19.16.id4"><span class="ltx_text ltx_font_italic" id="id19.16.id4.1">4,6</span></sup> Jun Liu<sup class="ltx_sup" id="id20.17.id5"><span class="ltx_text ltx_font_italic" id="id20.17.id5.1">5</span></sup> Huan Wang<sup class="ltx_sup" id="id21.18.id6"><span class="ltx_text ltx_font_italic" id="id21.18.id6.1">1,</span></sup><span class="ltx_note ltx_role_footnotemark" id="footnotex3"><sup class="ltx_note_mark">2</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">2</sup><span class="ltx_note_type">footnotemark: </span><span class="ltx_tag ltx_tag_note">2</span></span></span></span> <br class="ltx_break"/> <sup class="ltx_sup" id="id22.19.id7">1</sup>Westlake University <sup class="ltx_sup" id="id23.20.id8">2</sup>Jiangxi University of Science and Technology <br class="ltx_break"/><sup class="ltx_sup" id="id24.21.id9">3</sup>The Hong Kong University of Science and Technology <sup 
class="ltx_sup" id="id25.22.id10">4</sup>University of Central Florida <br class="ltx_break"/><sup class="ltx_sup" id="id26.23.id11">5</sup>Lancaster University <sup class="ltx_sup" id="id27.24.id12">6</sup>Everlyn AI <br class="ltx_break"/><a class="ltx_ref ltx_url ltx_font_typewriter" href="https://ai-kunkun.github.io/Niagara_page" title="">https://ai-kunkun.github.io/Niagara_page</a> <br class="ltx_break"/> </span></span> </div> <div class="ltx_abstract"> <h6 class="ltx_title ltx_title_abstract">Abstract</h6> <p class="ltx_p" id="id28.id1">Recent advances in <span class="ltx_text ltx_font_italic" id="id28.id1.1">single-view</span> 3D scene reconstruction have highlighted the challenges of capturing fine geometric details and ensuring structural consistency, particularly in high-fidelity outdoor scene modeling. This paper presents <span class="ltx_text ltx_font_bold" id="id28.id1.2">Niagara</span>, a new <span class="ltx_text ltx_font_bold" id="id28.id1.3">single-view</span> 3D scene reconstruction framework that, for the first time, can faithfully reconstruct challenging outdoor scenes from a single input image. Our approach takes monocular depth and normal estimates as input, which substantially improves its ability to capture fine details and mitigates common issues such as geometric detail loss and deformation. Additionally, we introduce a geometric affine field (GAF) and 3D self-attention as geometric constraints, which combine the structural properties of explicit geometry with the adaptability of implicit feature fields, striking a balance between efficient rendering and high-fidelity reconstruction. 
Finally, our framework adopts a specialized encoder-decoder architecture, in which a depth-based 3D Gaussian decoder predicts the 3D Gaussian parameters used for novel view synthesis.</p> <p class="ltx_p" id="id29.id2">Extensive results and analyses suggest that Niagara surpasses prior SoTA approaches such as Flash3D in both single-view and dual-view settings, significantly enhancing geometric accuracy and visual fidelity, especially in outdoor scenes.</p> </div> <div class="ltx_logical-block" id="id14"> <div class="ltx_para" id="id14.p1"> <table class="ltx_tabular ltx_centering ltx_align_middle" id="id13.1"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="id13.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="id13.1.1.1" style="padding-top:-4.75pt;padding-bottom:-4.75pt;"><img alt="[Uncaptioned image]" class="ltx_graphics ltx_img_landscape" height="161" id="id13.1.1.1.g1" src="x1.png" width="598"/></td> </tr> </tbody> </table> </div> <figure class="ltx_figure ltx_align_center" id="S0.F1"> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S0.F1.8.1.1" style="font-size:90%;">Figure 1</span>: </span><span class="ltx_text" id="S0.F1.9.2" style="font-size:90%;"> <span class="ltx_text ltx_font_italic" id="S0.F1.9.2.1">Left:</span> This paper presents <span class="ltx_text ltx_font_italic" id="S0.F1.9.2.2">Niagara</span>, a new 3D scene reconstruction method from a single view. Unlike Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite>, the previous SoTA method in this line, which uses only depth maps as input, Niagara additionally exploits surface normals together with a novel geometric affine field (GAF). These inputs are processed with the proposed 3D self-attention to learn the 3D Gaussians of the scene. 
Niagara is the <span class="ltx_text ltx_font_italic" id="S0.F1.9.2.3">first</span> model that can effectively reconstruct challenging <span class="ltx_text ltx_font_italic" id="S0.F1.9.2.4">outdoor</span> scenes from a single view (as shown by the rendered novel views above). <span class="ltx_text ltx_font_italic" id="S0.F1.9.2.5">Right:</span> A quantitative comparison in PSNR and LPIPS on the RE10K dataset further confirms the merits of our method <span class="ltx_text ltx_font_italic" id="S0.F1.9.2.6">vs.</span> Flash3D.</span></figcaption> </figure> </div> <span class="ltx_note ltx_role_footnotetext" id="footnotex4"><sup class="ltx_note_mark">0</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">0</sup><span class="ltx_note_type">footnotetext: </span><span class="ltx_text" id="footnotex4.1" style="font-size:90%;">Equal contribution. Work done during the author’s research internship at ENCODE Lab, Westlake University. <sup class="ltx_sup" id="footnotex4.1.1"><span class="ltx_text ltx_font_italic" id="footnotex4.1.1.1">†</span></sup>Corresponding author: <span class="ltx_text ltx_font_typewriter" id="footnotex4.1.2">wanghuan@westlake.edu.cn</span>. 
</span></span></span></span> <section class="ltx_section" id="S1"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">1 </span>Introduction</h2> <div class="ltx_para" id="S1.p1"> <p class="ltx_p" id="S1.p1.1">3D scene reconstruction from images has long been a fundamental challenge in the field of computer vision <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib52" title="">52</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib34" title="">34</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib76" title="">76</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib16" title="">16</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib41" title="">41</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib57" title="">57</a>]</cite>, with extensive applications in various domains such as autonomous driving, drone surveying, game development, virtual reality, and building information modeling <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib62" title="">62</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib45" title="">45</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib37" title="">37</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib86" title="">86</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib32" title="">32</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib43" title="">43</a>]</cite>. 
Traditional methods predominantly rely on Multi-View Stereo (MVS) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib23" title="">23</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib29" title="">29</a>]</cite>, which estimates depth maps from multiple images and integrates them to create a comprehensive 3D model <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib23" title="">23</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib24" title="">24</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib51" title="">51</a>]</cite>. Recent advances, such as 3D Gaussian splatting <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib33" title="">33</a>]</cite>, neural radiance fields <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib42" title="">42</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib44" title="">44</a>]</cite> or light fields <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib4" title="">4</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib58" title="">58</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib67" title="">67</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib9" title="">9</a>]</cite>, and their derivative works <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib2" title="">2</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib3" title="">3</a>]</cite>, have effectively addressed previous challenges associated with inconsistent regions, including occlusions, specular reflections, transparent objects, and low-texture surfaces 
in multi-view scenes.</p> </div> <figure class="ltx_figure" id="S1.F2"> <table class="ltx_tabular ltx_centering ltx_align_middle" id="S1.F2.25"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S1.F2.5.5"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S1.F2.1.1.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.1.1.1.g1" src="x2.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.2.2.2"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="108" id="S1.F2.2.2.2.g1" src="x3.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.3.3.3"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="108" id="S1.F2.3.3.3.g1" src="x4.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.4.4.4"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.4.4.4.g1" src="x5.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.5.5.5"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.5.5.5.g1" src="x6.png" width="161"/></td> </tr> <tr class="ltx_tr" id="S1.F2.10.10"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S1.F2.6.6.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="108" id="S1.F2.6.6.1.g1" src="x7.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.7.7.2"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="108" id="S1.F2.7.7.2.g1" src="x8.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.8.8.3"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="108" id="S1.F2.8.8.3.g1" src="x9.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.9.9.4"><img alt="Refer to 
caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.9.9.4.g1" src="x10.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.10.10.5"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.10.10.5.g1" src="x11.png" width="161"/></td> </tr> <tr class="ltx_tr" id="S1.F2.15.15"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S1.F2.11.11.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.11.11.1.g1" src="x12.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.12.12.2"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.12.12.2.g1" src="x13.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.13.13.3"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.13.13.3.g1" src="x14.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.14.14.4"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.14.14.4.g1" src="x15.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.15.15.5"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.15.15.5.g1" src="x16.png" width="161"/></td> </tr> <tr class="ltx_tr" id="S1.F2.20.20"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S1.F2.16.16.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.16.16.1.g1" src="x17.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.17.17.2"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.17.17.2.g1" src="x18.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.18.18.3"><img alt="Refer to caption" 
class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.18.18.3.g1" src="x19.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.19.19.4"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.19.19.4.g1" src="x20.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.20.20.5"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.20.20.5.g1" src="x21.png" width="161"/></td> </tr> <tr class="ltx_tr" id="S1.F2.25.25"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S1.F2.21.21.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.21.21.1.g1" src="x22.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.22.22.2"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.22.22.2.g1" src="x23.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.23.23.3"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.23.23.3.g1" src="x24.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.24.24.4"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.24.24.4.g1" src="x25.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.25.25.5"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S1.F2.25.25.5.g1" src="x26.png" width="161"/></td> </tr> <tr class="ltx_tr" id="S1.F2.25.26.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S1.F2.25.26.1.1"><span class="ltx_text" id="S1.F2.25.26.1.1.1" style="font-size:90%;">(a) GT</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.25.26.1.2"><span class="ltx_text" id="S1.F2.25.26.1.2.1" style="font-size:90%;">(b) 
Flash3D</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.25.26.1.3"><span class="ltx_text" id="S1.F2.25.26.1.3.1" style="font-size:90%;">(c) Ours</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.25.26.1.4"><span class="ltx_text" id="S1.F2.25.26.1.4.1" style="font-size:90%;">(d) Depth</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S1.F2.25.26.1.5"><span class="ltx_text" id="S1.F2.25.26.1.5.1" style="font-size:90%;">(e) Normal</span></td> </tr> </tbody> </table> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S1.F2.28.1.1" style="font-size:90%;">Figure 2</span>: </span><span class="ltx_text ltx_font_bold" id="S1.F2.29.2" style="font-size:90%;">Motivation illustrations.<span class="ltx_text ltx_font_medium" id="S1.F2.29.2.1"> Flash3D faces geometric blurring and color distortion issues due to Gaussian interpolation errors and insufficient Gaussian representation solely from depth images. 
To resolve this, we incorporate normal images into our framework, significantly improving reconstruction results.</span></span></figcaption> </figure> <div class="ltx_para" id="S1.p2"> <p class="ltx_p" id="S1.p2.1">However, <span class="ltx_text ltx_font_italic" id="S1.p2.1.1">monocular</span> reconstruction faces significant challenges due to the absence of inherent depth information <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib50" title="">50</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib35" title="">35</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib27" title="">27</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib75" title="">75</a>]</cite> and the limitations associated with single-viewpoint data <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib85" title="">85</a>]</cite>. Recently, Szymanowicz <em class="ltx_emph ltx_font_italic" id="S1.p2.1.2">et al</em>. introduced Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite>, the <span class="ltx_text ltx_font_italic" id="S1.p2.1.3">first</span> monocular method, to the best of our knowledge, that realizes <span class="ltx_text ltx_font_italic" id="S1.p2.1.4">single-view</span> 3D reconstruction. 
Flash3D integrates depth estimation <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib8" title="">8</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib53" title="">53</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib78" title="">78</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib7" title="">7</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib35" title="">35</a>]</cite> with a feed-forward transformer architecture, resulting in notable advances in reconstructing complex scenes without the need for multi-view inputs.</p> </div> <div class="ltx_para" id="S1.p3"> <p class="ltx_p" id="S1.p3.1">Although Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite> has made significant strides in single-view 3D scene reconstruction, there is still notable room for improvement. In <a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#S1.F2" title="Figure 2 ‣ 1 Introduction ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Figure 2</span></a>, we illustrate several examples in which Flash3D performs poorly. The first two rows show that Flash3D renders <span class="ltx_text ltx_font_italic" id="S1.p3.1.1">blurred</span> results at corners and door edges, where the geometry changes abruptly. This blurring is caused by insufficient depth interpolation, resulting in a loss of geometric fidelity. 
Inadequate point cloud <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib83" title="">83</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib77" title="">77</a>]</cite> sampling exacerbates interpolation errors <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib49" title="">49</a>]</cite>, particularly in detail-rich regions such as the edges or boundaries of physical structures and the sky. Color distortion and overflow artifacts (see <a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#S1.F2" title="Figure 2 ‣ 1 Introduction ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Figure 2</span></a>, Rows 4 and 5) occur due to insufficient Gaussian representation.</p> </div> <div class="ltx_para" id="S1.p4"> <p class="ltx_p" id="S1.p4.1">To address these shortcomings, this paper proposes a new <span class="ltx_text ltx_framed ltx_framed_underline" id="S1.p4.1.1">n</span>ormal-<span class="ltx_text ltx_framed ltx_framed_underline" id="S1.p4.1.2">i</span>ntegr<span class="ltx_text ltx_framed ltx_framed_underline" id="S1.p4.1.3">a</span>ted <span class="ltx_text ltx_framed ltx_framed_underline" id="S1.p4.1.4">g</span>eometric <span class="ltx_text ltx_framed ltx_framed_underline" id="S1.p4.1.5">a</span>ffine field for 3D scene <span class="ltx_text ltx_framed ltx_framed_underline" id="S1.p4.1.6">r</span>econstruction from <span class="ltx_text ltx_framed ltx_framed_underline" id="S1.p4.1.7">a</span> single view, abbreviated as <span class="ltx_text ltx_font_italic" id="S1.p4.1.8">Niagara</span>. 
Niagara aims to improve geometric accuracy while preserving fine details, as shown in <a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#S0.F1" title="Figure 1 ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Figure 1</span></a>. Specifically, we integrate both depth and normal information <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib21" title="">21</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib30" title="">30</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib79" title="">79</a>]</cite>, improving depth cues and allowing the model to capture finer details. Furthermore, we introduce a <span class="ltx_text ltx_font_italic" id="S1.p4.1.9">geometric affine field</span> (GAF) with 3D self-attention as a geometric constraint, which enriches the geometric details in single-view image reconstruction and enhances the sensitivity of the model to geometric boundaries <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib74" title="">74</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib11" title="">11</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib6" title="">6</a>]</cite>. 
The output of GAF is then used to learn the parameters of 3D Gaussian splatting (3D-GS) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib31" title="">31</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib14" title="">14</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib30" title="">30</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib39" title="">39</a>]</cite>, which can be used to render novel views during testing.</p> </div> <div class="ltx_para" id="S1.p5"> <p class="ltx_p" id="S1.p5.1">Experimental results show that Niagara excels in a variety of challenging scenes, offering notable advantages in geometric accuracy and detail preservation over existing methods (a quick demonstration is shown in <a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#S1.F2" title="Figure 2 ‣ 1 Introduction ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Figure 2</span></a>, where our method outperforms Flash3D, the prior SoTA in this line).</p> </div> <div class="ltx_para" id="S1.p6"> <p class="ltx_p" id="S1.p6.1">Our contributions can be summarized as follows:</p> <ul class="ltx_itemize" id="S1.I1"> <li class="ltx_item" id="S1.I1.i1" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S1.I1.i1.p1"> <p class="ltx_p" id="S1.I1.i1.p1.1">This paper proposes Niagara, the <span class="ltx_text ltx_font_italic" id="S1.I1.i1.p1.1.1">first</span> effective <span class="ltx_text ltx_font_italic" id="S1.I1.i1.p1.1.2">single-view</span> 3D scene reconstruction framework that addresses key challenges for complex outdoor scenes, by integrating surface normals for improved global features.</p> </div> 
</li> <li class="ltx_item" id="S1.I1.i2" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S1.I1.i2.p1"> <p class="ltx_p" id="S1.I1.i2.p1.1">Niagara introduces several novel modules for accurate scene representation and learning: a geometric affine field and 3D self-attention for refining local geometry, and a depth-based Gaussian decoder for novel view rendering.</p> </div> </li> <li class="ltx_item" id="S1.I1.i3" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S1.I1.i3.p1"> <p class="ltx_p" id="S1.I1.i3.p1.1">Experiments demonstrate Niagara achieves state-of-the-art performance on the single-view 3D reconstruction benchmark, surpassing existing methods by nearly 1 dB PSNR, especially in challenging outdoor scenes.</p> </div> </li> </ul> </div> </section> <section class="ltx_section" id="S2"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">2 </span>Related Work</h2> <figure class="ltx_figure" id="S2.F3"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="417" id="S2.F3.g1" src="x27.png" width="830"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S2.F3.20.9.1" style="font-size:90%;">Figure 3</span>: </span><span class="ltx_text ltx_font_bold" id="S2.F3.16.8" style="font-size:90%;">Overview of Niagara. 
<span class="ltx_text ltx_font_medium" id="S2.F3.11.3.3"> First, two frozen pre-trained networks <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib46" title="">46</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib79" title="">79</a>]</cite> simultaneously estimate the metric depth map <math alttext="D" class="ltx_Math" display="inline" id="S2.F3.9.1.1.m1.1"><semantics id="S2.F3.9.1.1.m1.1b"><mi id="S2.F3.9.1.1.m1.1.1" xref="S2.F3.9.1.1.m1.1.1.cmml">D</mi><annotation-xml encoding="MathML-Content" id="S2.F3.9.1.1.m1.1c"><ci id="S2.F3.9.1.1.m1.1.1.cmml" xref="S2.F3.9.1.1.m1.1.1">𝐷</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.F3.9.1.1.m1.1d">D</annotation><annotation encoding="application/x-llamapun" id="S2.F3.9.1.1.m1.1e">italic_D</annotation></semantics></math> and the normal map <math alttext="N" class="ltx_Math" display="inline" id="S2.F3.10.2.2.m2.1"><semantics id="S2.F3.10.2.2.m2.1b"><mi id="S2.F3.10.2.2.m2.1.1" xref="S2.F3.10.2.2.m2.1.1.cmml">N</mi><annotation-xml encoding="MathML-Content" id="S2.F3.10.2.2.m2.1c"><ci id="S2.F3.10.2.2.m2.1.1.cmml" xref="S2.F3.10.2.2.m2.1.1">𝑁</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.F3.10.2.2.m2.1d">N</annotation><annotation encoding="application/x-llamapun" id="S2.F3.10.2.2.m2.1e">italic_N</annotation></semantics></math> of the input image <math alttext="I" class="ltx_Math" display="inline" id="S2.F3.11.3.3.m3.1"><semantics id="S2.F3.11.3.3.m3.1b"><mi id="S2.F3.11.3.3.m3.1.1" xref="S2.F3.11.3.3.m3.1.1.cmml">I</mi><annotation-xml encoding="MathML-Content" id="S2.F3.11.3.3.m3.1c"><ci id="S2.F3.11.3.3.m3.1.1.cmml" xref="S2.F3.11.3.3.m3.1.1">𝐼</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.F3.11.3.3.m3.1d">I</annotation><annotation encoding="application/x-llamapun" id="S2.F3.11.3.3.m3.1e">italic_I</annotation></semantics></math>. 
</span>GAF<span class="ltx_text ltx_font_medium" id="S2.F3.16.8.8"> (Geometric Affine Field) module and the Res block in the encoder-decoder are ResNet50-based and are combined with a 3D self-attention module. The Niagara decoder is similar to that of Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite>. It predicts shape and appearance parameters <math alttext="\hat{P}" class="ltx_Math" display="inline" id="S2.F3.12.4.4.m1.1"><semantics id="S2.F3.12.4.4.m1.1b"><mover accent="true" id="S2.F3.12.4.4.m1.1.1" xref="S2.F3.12.4.4.m1.1.1.cmml"><mi id="S2.F3.12.4.4.m1.1.1.2" xref="S2.F3.12.4.4.m1.1.1.2.cmml">P</mi><mo id="S2.F3.12.4.4.m1.1.1.1" xref="S2.F3.12.4.4.m1.1.1.1.cmml">^</mo></mover><annotation-xml encoding="MathML-Content" id="S2.F3.12.4.4.m1.1c"><apply id="S2.F3.12.4.4.m1.1.1.cmml" xref="S2.F3.12.4.4.m1.1.1"><ci id="S2.F3.12.4.4.m1.1.1.1.cmml" xref="S2.F3.12.4.4.m1.1.1.1">^</ci><ci id="S2.F3.12.4.4.m1.1.1.2.cmml" xref="S2.F3.12.4.4.m1.1.1.2">𝑃</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.F3.12.4.4.m1.1d">\hat{P}</annotation><annotation encoding="application/x-llamapun" id="S2.F3.12.4.4.m1.1e">over^ start_ARG italic_P end_ARG</annotation></semantics></math> for <math alttext="K" class="ltx_Math" display="inline" id="S2.F3.13.5.5.m2.1"><semantics id="S2.F3.13.5.5.m2.1b"><mi id="S2.F3.13.5.5.m2.1.1" xref="S2.F3.13.5.5.m2.1.1.cmml">K</mi><annotation-xml encoding="MathML-Content" id="S2.F3.13.5.5.m2.1c"><ci id="S2.F3.13.5.5.m2.1.1.cmml" xref="S2.F3.13.5.5.m2.1.1">𝐾</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.F3.13.5.5.m2.1d">K</annotation><annotation encoding="application/x-llamapun" id="S2.F3.13.5.5.m2.1e">italic_K</annotation></semantics></math> layers of Gaussian distributions at each pixel <math alttext="u" class="ltx_Math" display="inline" id="S2.F3.14.6.6.m3.1"><semantics id="S2.F3.14.6.6.m3.1b"><mi 
id="S2.F3.14.6.6.m3.1.1" xref="S2.F3.14.6.6.m3.1.1.cmml">u</mi><annotation-xml encoding="MathML-Content" id="S2.F3.14.6.6.m3.1c"><ci id="S2.F3.14.6.6.m3.1.1.cmml" xref="S2.F3.14.6.6.m3.1.1">𝑢</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.F3.14.6.6.m3.1d">u</annotation><annotation encoding="application/x-llamapun" id="S2.F3.14.6.6.m3.1e">italic_u</annotation></semantics></math>. By adding the predicted positive depth offsets <math alttext="\delta_{i}" class="ltx_Math" display="inline" id="S2.F3.15.7.7.m4.1"><semantics id="S2.F3.15.7.7.m4.1b"><msub id="S2.F3.15.7.7.m4.1.1" xref="S2.F3.15.7.7.m4.1.1.cmml"><mi id="S2.F3.15.7.7.m4.1.1.2" xref="S2.F3.15.7.7.m4.1.1.2.cmml">δ</mi><mi id="S2.F3.15.7.7.m4.1.1.3" xref="S2.F3.15.7.7.m4.1.1.3.cmml">i</mi></msub><annotation-xml encoding="MathML-Content" id="S2.F3.15.7.7.m4.1c"><apply id="S2.F3.15.7.7.m4.1.1.cmml" xref="S2.F3.15.7.7.m4.1.1"><csymbol cd="ambiguous" id="S2.F3.15.7.7.m4.1.1.1.cmml" xref="S2.F3.15.7.7.m4.1.1">subscript</csymbol><ci id="S2.F3.15.7.7.m4.1.1.2.cmml" xref="S2.F3.15.7.7.m4.1.1.2">𝛿</ci><ci id="S2.F3.15.7.7.m4.1.1.3.cmml" xref="S2.F3.15.7.7.m4.1.1.3">𝑖</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.F3.15.7.7.m4.1d">\delta_{i}</annotation><annotation encoding="application/x-llamapun" id="S2.F3.15.7.7.m4.1e">italic_δ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT</annotation></semantics></math> to the initial monocular depth <math alttext="D" class="ltx_Math" display="inline" id="S2.F3.16.8.8.m5.1"><semantics id="S2.F3.16.8.8.m5.1b"><mi id="S2.F3.16.8.8.m5.1.1" xref="S2.F3.16.8.8.m5.1.1.cmml">D</mi><annotation-xml encoding="MathML-Content" id="S2.F3.16.8.8.m5.1c"><ci id="S2.F3.16.8.8.m5.1.1.cmml" xref="S2.F3.16.8.8.m5.1.1">𝐷</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.F3.16.8.8.m5.1d">D</annotation><annotation encoding="application/x-llamapun" id="S2.F3.16.8.8.m5.1e">italic_D</annotation></semantics></math>, the depth for each 
Gaussian layer is obtained. At the same time, the estimated normal information is used to compute the mean vectors of each Gaussian layer. This approach ensures that the Gaussian slices are ordered by depth, effectively modeling occluded and unobserved surfaces and increasing the accuracy of 3D reconstruction from a single image. </span></span></figcaption> </figure> <div class="ltx_para" id="S2.p1"> <p class="ltx_p" id="S2.p1.1"><span class="ltx_text ltx_font_bold ltx_font_italic" id="S2.p1.1.1">Single-View<span class="ltx_text ltx_font_upright" id="S2.p1.1.1.1"> 3D Scene Reconstruction.</span></span> <span class="ltx_text ltx_font_italic" id="S2.p1.1.2">Single-view</span> 3D scene reconstruction typically involves per-pixel depth estimation <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib19" title="">19</a>]</cite>. However, independent pixel-wise depth computation can result in artifacts <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib19" title="">19</a>]</cite>, compromising global geometric continuity and concave boundary consistency, particularly when faced with complex shapes.</p> </div> <div class="ltx_para" id="S2.p2"> <p class="ltx_p" id="S2.p2.1">Neural networks are increasingly used for single-view reconstruction. Wiles <em class="ltx_emph ltx_font_italic" id="S2.p2.1.1">et al</em>. <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib73" title="">73</a>]</cite> introduce a novel end-to-end view synthesis method that leverages depth-guided 3D point clouds in combination with neural rendering. 
However, this method faces challenges, such as noticeable distortions and blurring, particularly when subjected to significant changes in viewpoint.</p> </div> <div class="ltx_para" id="S2.p3"> <p class="ltx_p" id="S2.p3.1">Li <em class="ltx_emph ltx_font_italic" id="S2.p3.1.1">et al</em>. <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib38" title="">38</a>]</cite> combine Neural Radiance Fields (NeRF) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib42" title="">42</a>]</cite> with Multi-Plane Images (MPI) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib17" title="">17</a>]</cite> for novel view synthesis, effectively capturing depth variations. Nonetheless, this method may struggle in complex scenes characterized by substantial depth or lighting variations.</p> </div> <div class="ltx_para" id="S2.p4"> <p class="ltx_p" id="S2.p4.1">Recently, Wofk <em class="ltx_emph ltx_font_italic" id="S2.p4.1.1">et al</em>. <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib75" title="">75</a>]</cite> developed a fast and efficient reconstruction technique that leverages convolutional neural networks in conjunction with spatial transformations. Despite its speed, the approach may sacrifice fine details and depth consistency, especially in intricate scenes with significant changes in viewpoint.</p> </div> <div class="ltx_para ltx_noindent" id="S2.p5"> <p class="ltx_p" id="S2.p5.1"><span class="ltx_text ltx_font_bold" id="S2.p5.1.1">Monocular Depth and Normal Estimation.</span> Our approach builds on monocular depth estimation, which predicts pixel-wise depth in images. 
Recent methods in this line build on deep neural networks and have shown remarkable advances <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib14" title="">14</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib12" title="">12</a>]</cite>, driven by large training datasets <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib85" title="">85</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib55" title="">55</a>]</cite>. Due to varying depth distributions under different RGB values, some methods discretize depth and cast its estimation as a classification task to enhance performance <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib26" title="">26</a>]</cite>.</p> </div> <div class="ltx_para" id="S2.p6"> <p class="ltx_p" id="S2.p6.1">Depth models in 3D reconstruction face two main challenges: adapting to diverse scenes and accurately predicting metric information under varying camera settings. Bhat <em class="ltx_emph ltx_font_italic" id="S2.p6.1.1">et al</em>. <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib5" title="">5</a>]</cite> introduce global depth distributions with conditional scene-based processing. Yin <em class="ltx_emph ltx_font_italic" id="S2.p6.1.2">et al</em>. <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib80" title="">80</a>]</cite> scale inputs and outputs to a standard space, remapping depth by focal length. Facil <em class="ltx_emph ltx_font_italic" id="S2.p6.1.3">et al</em>. 
<cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib20" title="">20</a>]</cite> employ camera intrinsic parameters as inputs to improve depth estimation, while Piccinelli <em class="ltx_emph ltx_font_italic" id="S2.p6.1.4">et al</em>. <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib46" title="">46</a>]</cite> leverage self-supervised learning with variational inference for camera-specific embeddings. Gui <em class="ltx_emph ltx_font_italic" id="S2.p6.1.5">et al</em>. <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib27" title="">27</a>]</cite> develop a fast monocular depth estimation method using flow matching to enhance efficiency and accuracy.</p> </div> <div class="ltx_para" id="S2.p7"> <p class="ltx_p" id="S2.p7.1">Surface normals are less prone to metric ambiguity, better capture geometric shapes, and are crucial in 3D reconstruction tasks. However, estimating fine details and avoiding directional bias remain challenging, especially in unstructured scenes. Bae <em class="ltx_emph ltx_font_italic" id="S2.p7.1.1">et al</em>. <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib1" title="">1</a>]</cite> utilize pixel ray directions but face difficulties in complex scenes. Long <em class="ltx_emph ltx_font_italic" id="S2.p7.1.2">et al</em>. <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib39" title="">39</a>]</cite> model the joint color-normal distribution for consistency, while Ye <em class="ltx_emph ltx_font_italic" id="S2.p7.1.3">et al</em>. 
<cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib79" title="">79</a>]</cite> refine initial estimates with semantic-guided diffusion to enhance sharpness and reduce randomness. In this work, we employ pre-trained models to generate the depth <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib46" title="">46</a>]</cite> and normal maps <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib79" title="">79</a>]</cite>.</p> </div> </section> <section class="ltx_section" id="S3"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">3 </span>Proposed Method</h2> <div class="ltx_para" id="S3.p1"> <p class="ltx_p" id="S3.p1.6">Given an input RGB image <math alttext="I\in\mathbb{R}^{3\times H\times W}" class="ltx_Math" display="inline" id="S3.p1.1.m1.1"><semantics id="S3.p1.1.m1.1a"><mrow id="S3.p1.1.m1.1.1" xref="S3.p1.1.m1.1.1.cmml"><mi id="S3.p1.1.m1.1.1.2" xref="S3.p1.1.m1.1.1.2.cmml">I</mi><mo id="S3.p1.1.m1.1.1.1" xref="S3.p1.1.m1.1.1.1.cmml">∈</mo><msup id="S3.p1.1.m1.1.1.3" xref="S3.p1.1.m1.1.1.3.cmml"><mi id="S3.p1.1.m1.1.1.3.2" xref="S3.p1.1.m1.1.1.3.2.cmml">ℝ</mi><mrow id="S3.p1.1.m1.1.1.3.3" xref="S3.p1.1.m1.1.1.3.3.cmml"><mn id="S3.p1.1.m1.1.1.3.3.2" xref="S3.p1.1.m1.1.1.3.3.2.cmml">3</mn><mo id="S3.p1.1.m1.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S3.p1.1.m1.1.1.3.3.1.cmml">×</mo><mi id="S3.p1.1.m1.1.1.3.3.3" xref="S3.p1.1.m1.1.1.3.3.3.cmml">H</mi><mo id="S3.p1.1.m1.1.1.3.3.1a" lspace="0.222em" rspace="0.222em" xref="S3.p1.1.m1.1.1.3.3.1.cmml">×</mo><mi id="S3.p1.1.m1.1.1.3.3.4" xref="S3.p1.1.m1.1.1.3.3.4.cmml">W</mi></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.p1.1.m1.1b"><apply id="S3.p1.1.m1.1.1.cmml" xref="S3.p1.1.m1.1.1"><in id="S3.p1.1.m1.1.1.1.cmml" xref="S3.p1.1.m1.1.1.1"></in><ci id="S3.p1.1.m1.1.1.2.cmml" 
xref="S3.p1.1.m1.1.1.2">𝐼</ci><apply id="S3.p1.1.m1.1.1.3.cmml" xref="S3.p1.1.m1.1.1.3"><csymbol cd="ambiguous" id="S3.p1.1.m1.1.1.3.1.cmml" xref="S3.p1.1.m1.1.1.3">superscript</csymbol><ci id="S3.p1.1.m1.1.1.3.2.cmml" xref="S3.p1.1.m1.1.1.3.2">ℝ</ci><apply id="S3.p1.1.m1.1.1.3.3.cmml" xref="S3.p1.1.m1.1.1.3.3"><times id="S3.p1.1.m1.1.1.3.3.1.cmml" xref="S3.p1.1.m1.1.1.3.3.1"></times><cn id="S3.p1.1.m1.1.1.3.3.2.cmml" type="integer" xref="S3.p1.1.m1.1.1.3.3.2">3</cn><ci id="S3.p1.1.m1.1.1.3.3.3.cmml" xref="S3.p1.1.m1.1.1.3.3.3">𝐻</ci><ci id="S3.p1.1.m1.1.1.3.3.4.cmml" xref="S3.p1.1.m1.1.1.3.3.4">𝑊</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.p1.1.m1.1c">I\in\mathbb{R}^{3\times H\times W}</annotation><annotation encoding="application/x-llamapun" id="S3.p1.1.m1.1d">italic_I ∈ blackboard_R start_POSTSUPERSCRIPT 3 × italic_H × italic_W end_POSTSUPERSCRIPT</annotation></semantics></math> that captures a scene, our goal is to train a neural network <math alttext="\Phi" class="ltx_Math" display="inline" id="S3.p1.2.m2.1"><semantics id="S3.p1.2.m2.1a"><mi id="S3.p1.2.m2.1.1" mathvariant="normal" xref="S3.p1.2.m2.1.1.cmml">Φ</mi><annotation-xml encoding="MathML-Content" id="S3.p1.2.m2.1b"><ci id="S3.p1.2.m2.1.1.cmml" xref="S3.p1.2.m2.1.1">Φ</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.p1.2.m2.1c">\Phi</annotation><annotation encoding="application/x-llamapun" id="S3.p1.2.m2.1d">roman_Φ</annotation></semantics></math> to take <math alttext="I" class="ltx_Math" display="inline" id="S3.p1.3.m3.1"><semantics id="S3.p1.3.m3.1a"><mi id="S3.p1.3.m3.1.1" xref="S3.p1.3.m3.1.1.cmml">I</mi><annotation-xml encoding="MathML-Content" id="S3.p1.3.m3.1b"><ci id="S3.p1.3.m3.1.1.cmml" xref="S3.p1.3.m3.1.1">𝐼</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.p1.3.m3.1c">I</annotation><annotation encoding="application/x-llamapun" id="S3.p1.3.m3.1d">italic_I</annotation></semantics></math> as input and 
generate a scene representation <math alttext="G=\Phi(I)" class="ltx_Math" display="inline" id="S3.p1.4.m4.1"><semantics id="S3.p1.4.m4.1a"><mrow id="S3.p1.4.m4.1.2" xref="S3.p1.4.m4.1.2.cmml"><mi id="S3.p1.4.m4.1.2.2" xref="S3.p1.4.m4.1.2.2.cmml">G</mi><mo id="S3.p1.4.m4.1.2.1" xref="S3.p1.4.m4.1.2.1.cmml">=</mo><mrow id="S3.p1.4.m4.1.2.3" xref="S3.p1.4.m4.1.2.3.cmml"><mi id="S3.p1.4.m4.1.2.3.2" mathvariant="normal" xref="S3.p1.4.m4.1.2.3.2.cmml">Φ</mi><mo id="S3.p1.4.m4.1.2.3.1" xref="S3.p1.4.m4.1.2.3.1.cmml"></mo><mrow id="S3.p1.4.m4.1.2.3.3.2" xref="S3.p1.4.m4.1.2.3.cmml"><mo id="S3.p1.4.m4.1.2.3.3.2.1" stretchy="false" xref="S3.p1.4.m4.1.2.3.cmml">(</mo><mi id="S3.p1.4.m4.1.1" xref="S3.p1.4.m4.1.1.cmml">I</mi><mo id="S3.p1.4.m4.1.2.3.3.2.2" stretchy="false" xref="S3.p1.4.m4.1.2.3.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.p1.4.m4.1b"><apply id="S3.p1.4.m4.1.2.cmml" xref="S3.p1.4.m4.1.2"><eq id="S3.p1.4.m4.1.2.1.cmml" xref="S3.p1.4.m4.1.2.1"></eq><ci id="S3.p1.4.m4.1.2.2.cmml" xref="S3.p1.4.m4.1.2.2">𝐺</ci><apply id="S3.p1.4.m4.1.2.3.cmml" xref="S3.p1.4.m4.1.2.3"><times id="S3.p1.4.m4.1.2.3.1.cmml" xref="S3.p1.4.m4.1.2.3.1"></times><ci id="S3.p1.4.m4.1.2.3.2.cmml" xref="S3.p1.4.m4.1.2.3.2">Φ</ci><ci id="S3.p1.4.m4.1.1.cmml" xref="S3.p1.4.m4.1.1">𝐼</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.p1.4.m4.1c">G=\Phi(I)</annotation><annotation encoding="application/x-llamapun" id="S3.p1.4.m4.1d">italic_G = roman_Φ ( italic_I )</annotation></semantics></math>, encapsulating both the 3D geometry and the photometric properties of the scene described by <math alttext="I" class="ltx_Math" display="inline" id="S3.p1.5.m5.1"><semantics id="S3.p1.5.m5.1a"><mi id="S3.p1.5.m5.1.1" xref="S3.p1.5.m5.1.1.cmml">I</mi><annotation-xml encoding="MathML-Content" id="S3.p1.5.m5.1b"><ci id="S3.p1.5.m5.1.1.cmml" xref="S3.p1.5.m5.1.1">𝐼</ci></annotation-xml><annotation encoding="application/x-tex" 
id="S3.p1.5.m5.1c">I</annotation><annotation encoding="application/x-llamapun" id="S3.p1.5.m5.1d">italic_I</annotation></semantics></math>. <math alttext="G" class="ltx_Math" display="inline" id="S3.p1.6.m6.1"><semantics id="S3.p1.6.m6.1a"><mi id="S3.p1.6.m6.1.1" xref="S3.p1.6.m6.1.1.cmml">G</mi><annotation-xml encoding="MathML-Content" id="S3.p1.6.m6.1b"><ci id="S3.p1.6.m6.1.1.cmml" xref="S3.p1.6.m6.1.1">𝐺</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.p1.6.m6.1c">G</annotation><annotation encoding="application/x-llamapun" id="S3.p1.6.m6.1d">italic_G</annotation></semantics></math> will be used to render novel views during testing. Next, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#S3.SS1" title="3.1 Prerequisites: Monocular 3D Reconstruction ‣ 3 Proposed Method ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Section</span> <span class="ltx_text ltx_ref_tag">3.1</span></a> outlines the basic concepts and framework on which we build. 
<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#S3.SS2" title="3.2 Our Method: Niagara ‣ 3 Proposed Method ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Section</span> <span class="ltx_text ltx_ref_tag">3.2</span></a> formally introduces the Niagara model, the decoder from Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite>, and the integration of monocular depth prediction as a prior, followed by a description of the 3D self-attention and geometric affine field that we use to learn better geometric features.</p> </div> <div class="ltx_para" id="S3.p2"> <p class="ltx_p" id="S3.p2.1">A detailed overview of our method is given in <a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#S2.F3" title="Figure 3 ‣ 2 Related Work ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Figure 3</span></a>. Our method combines depth and normal estimation <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib81" title="">81</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib68" title="">68</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib15" title="">15</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib65" title="">65</a>]</cite> with a layered Gaussian representation, allowing for robust handling of occlusions and unobserved surfaces. 
As we will show, the method significantly improves the geometric consistency and accuracy of single-view 3D scene reconstruction.</p> </div> <section class="ltx_subsection" id="S3.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">3.1 </span>Prerequisites: Monocular 3D Reconstruction</h3> <div class="ltx_para" id="S3.SS1.p1"> <p class="ltx_p" id="S3.SS1.p1.6"><span class="ltx_text ltx_font_bold" id="S3.SS1.p1.6.1">Scene Representation.</span> Similar to Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite>, our method represents scenes as a collection of 3D Gaussians <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib31" title="">31</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib14" title="">14</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib30" title="">30</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib39" title="">39</a>]</cite>. 
The <math alttext="i" class="ltx_Math" display="inline" id="S3.SS1.p1.1.m1.1"><semantics id="S3.SS1.p1.1.m1.1a"><mi id="S3.SS1.p1.1.m1.1.1" xref="S3.SS1.p1.1.m1.1.1.cmml">i</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.1.m1.1b"><ci id="S3.SS1.p1.1.m1.1.1.cmml" xref="S3.SS1.p1.1.m1.1.1">𝑖</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.1.m1.1c">i</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.1.m1.1d">italic_i</annotation></semantics></math>-th Gaussian is characterized by a set of parameters: opacity <math alttext="\sigma_{i}" class="ltx_Math" display="inline" id="S3.SS1.p1.2.m2.1"><semantics id="S3.SS1.p1.2.m2.1a"><msub id="S3.SS1.p1.2.m2.1.1" xref="S3.SS1.p1.2.m2.1.1.cmml"><mi id="S3.SS1.p1.2.m2.1.1.2" xref="S3.SS1.p1.2.m2.1.1.2.cmml">σ</mi><mi id="S3.SS1.p1.2.m2.1.1.3" xref="S3.SS1.p1.2.m2.1.1.3.cmml">i</mi></msub><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.2.m2.1b"><apply id="S3.SS1.p1.2.m2.1.1.cmml" xref="S3.SS1.p1.2.m2.1.1"><csymbol cd="ambiguous" id="S3.SS1.p1.2.m2.1.1.1.cmml" xref="S3.SS1.p1.2.m2.1.1">subscript</csymbol><ci id="S3.SS1.p1.2.m2.1.1.2.cmml" xref="S3.SS1.p1.2.m2.1.1.2">𝜎</ci><ci id="S3.SS1.p1.2.m2.1.1.3.cmml" xref="S3.SS1.p1.2.m2.1.1.3">𝑖</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.2.m2.1c">\sigma_{i}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.2.m2.1d">italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT</annotation></semantics></math> (range: <math alttext="[0,1)" class="ltx_Math" display="inline" id="S3.SS1.p1.3.m3.2"><semantics id="S3.SS1.p1.3.m3.2a"><mrow id="S3.SS1.p1.3.m3.2.3.2" xref="S3.SS1.p1.3.m3.2.3.1.cmml"><mo id="S3.SS1.p1.3.m3.2.3.2.1" stretchy="false" xref="S3.SS1.p1.3.m3.2.3.1.cmml">[</mo><mn id="S3.SS1.p1.3.m3.1.1" xref="S3.SS1.p1.3.m3.1.1.cmml">0</mn><mo id="S3.SS1.p1.3.m3.2.3.2.2" xref="S3.SS1.p1.3.m3.2.3.1.cmml">,</mo><mn id="S3.SS1.p1.3.m3.2.2" 
xref="S3.SS1.p1.3.m3.2.2.cmml">1</mn><mo id="S3.SS1.p1.3.m3.2.3.2.3" stretchy="false" xref="S3.SS1.p1.3.m3.2.3.1.cmml">)</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.3.m3.2b"><interval closure="closed-open" id="S3.SS1.p1.3.m3.2.3.1.cmml" xref="S3.SS1.p1.3.m3.2.3.2"><cn id="S3.SS1.p1.3.m3.1.1.cmml" type="integer" xref="S3.SS1.p1.3.m3.1.1">0</cn><cn id="S3.SS1.p1.3.m3.2.2.cmml" type="integer" xref="S3.SS1.p1.3.m3.2.2">1</cn></interval></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.3.m3.2c">[0,1)</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.3.m3.2d">[ 0 , 1 )</annotation></semantics></math>), mean position <math alttext="\mu_{i}\in\mathbb{R}^{3}" class="ltx_Math" display="inline" id="S3.SS1.p1.4.m4.1"><semantics id="S3.SS1.p1.4.m4.1a"><mrow id="S3.SS1.p1.4.m4.1.1" xref="S3.SS1.p1.4.m4.1.1.cmml"><msub id="S3.SS1.p1.4.m4.1.1.2" xref="S3.SS1.p1.4.m4.1.1.2.cmml"><mi id="S3.SS1.p1.4.m4.1.1.2.2" xref="S3.SS1.p1.4.m4.1.1.2.2.cmml">μ</mi><mi id="S3.SS1.p1.4.m4.1.1.2.3" xref="S3.SS1.p1.4.m4.1.1.2.3.cmml">i</mi></msub><mo id="S3.SS1.p1.4.m4.1.1.1" xref="S3.SS1.p1.4.m4.1.1.1.cmml">∈</mo><msup id="S3.SS1.p1.4.m4.1.1.3" xref="S3.SS1.p1.4.m4.1.1.3.cmml"><mi id="S3.SS1.p1.4.m4.1.1.3.2" xref="S3.SS1.p1.4.m4.1.1.3.2.cmml">ℝ</mi><mn id="S3.SS1.p1.4.m4.1.1.3.3" xref="S3.SS1.p1.4.m4.1.1.3.3.cmml">3</mn></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.4.m4.1b"><apply id="S3.SS1.p1.4.m4.1.1.cmml" xref="S3.SS1.p1.4.m4.1.1"><in id="S3.SS1.p1.4.m4.1.1.1.cmml" xref="S3.SS1.p1.4.m4.1.1.1"></in><apply id="S3.SS1.p1.4.m4.1.1.2.cmml" xref="S3.SS1.p1.4.m4.1.1.2"><csymbol cd="ambiguous" id="S3.SS1.p1.4.m4.1.1.2.1.cmml" xref="S3.SS1.p1.4.m4.1.1.2">subscript</csymbol><ci id="S3.SS1.p1.4.m4.1.1.2.2.cmml" xref="S3.SS1.p1.4.m4.1.1.2.2">𝜇</ci><ci id="S3.SS1.p1.4.m4.1.1.2.3.cmml" xref="S3.SS1.p1.4.m4.1.1.2.3">𝑖</ci></apply><apply id="S3.SS1.p1.4.m4.1.1.3.cmml" xref="S3.SS1.p1.4.m4.1.1.3"><csymbol cd="ambiguous" 
id="S3.SS1.p1.4.m4.1.1.3.1.cmml" xref="S3.SS1.p1.4.m4.1.1.3">superscript</csymbol><ci id="S3.SS1.p1.4.m4.1.1.3.2.cmml" xref="S3.SS1.p1.4.m4.1.1.3.2">ℝ</ci><cn id="S3.SS1.p1.4.m4.1.1.3.3.cmml" type="integer" xref="S3.SS1.p1.4.m4.1.1.3.3">3</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.4.m4.1c">\mu_{i}\in\mathbb{R}^{3}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.4.m4.1d">italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT</annotation></semantics></math> (indicating the center of the Gaussian), covariance matrix <math alttext="\Sigma_{i}\in\mathbb{R}^{3\times 3}" class="ltx_Math" display="inline" id="S3.SS1.p1.5.m5.1"><semantics id="S3.SS1.p1.5.m5.1a"><mrow id="S3.SS1.p1.5.m5.1.1" xref="S3.SS1.p1.5.m5.1.1.cmml"><msub id="S3.SS1.p1.5.m5.1.1.2" xref="S3.SS1.p1.5.m5.1.1.2.cmml"><mi id="S3.SS1.p1.5.m5.1.1.2.2" mathvariant="normal" xref="S3.SS1.p1.5.m5.1.1.2.2.cmml">Σ</mi><mi id="S3.SS1.p1.5.m5.1.1.2.3" xref="S3.SS1.p1.5.m5.1.1.2.3.cmml">i</mi></msub><mo id="S3.SS1.p1.5.m5.1.1.1" xref="S3.SS1.p1.5.m5.1.1.1.cmml">∈</mo><msup id="S3.SS1.p1.5.m5.1.1.3" xref="S3.SS1.p1.5.m5.1.1.3.cmml"><mi id="S3.SS1.p1.5.m5.1.1.3.2" xref="S3.SS1.p1.5.m5.1.1.3.2.cmml">ℝ</mi><mrow id="S3.SS1.p1.5.m5.1.1.3.3" xref="S3.SS1.p1.5.m5.1.1.3.3.cmml"><mn id="S3.SS1.p1.5.m5.1.1.3.3.2" xref="S3.SS1.p1.5.m5.1.1.3.3.2.cmml">3</mn><mo id="S3.SS1.p1.5.m5.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S3.SS1.p1.5.m5.1.1.3.3.1.cmml">×</mo><mn id="S3.SS1.p1.5.m5.1.1.3.3.3" xref="S3.SS1.p1.5.m5.1.1.3.3.3.cmml">3</mn></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.5.m5.1b"><apply id="S3.SS1.p1.5.m5.1.1.cmml" xref="S3.SS1.p1.5.m5.1.1"><in id="S3.SS1.p1.5.m5.1.1.1.cmml" xref="S3.SS1.p1.5.m5.1.1.1"></in><apply id="S3.SS1.p1.5.m5.1.1.2.cmml" xref="S3.SS1.p1.5.m5.1.1.2"><csymbol cd="ambiguous" id="S3.SS1.p1.5.m5.1.1.2.1.cmml" 
xref="S3.SS1.p1.5.m5.1.1.2">subscript</csymbol><ci id="S3.SS1.p1.5.m5.1.1.2.2.cmml" xref="S3.SS1.p1.5.m5.1.1.2.2">Σ</ci><ci id="S3.SS1.p1.5.m5.1.1.2.3.cmml" xref="S3.SS1.p1.5.m5.1.1.2.3">𝑖</ci></apply><apply id="S3.SS1.p1.5.m5.1.1.3.cmml" xref="S3.SS1.p1.5.m5.1.1.3"><csymbol cd="ambiguous" id="S3.SS1.p1.5.m5.1.1.3.1.cmml" xref="S3.SS1.p1.5.m5.1.1.3">superscript</csymbol><ci id="S3.SS1.p1.5.m5.1.1.3.2.cmml" xref="S3.SS1.p1.5.m5.1.1.3.2">ℝ</ci><apply id="S3.SS1.p1.5.m5.1.1.3.3.cmml" xref="S3.SS1.p1.5.m5.1.1.3.3"><times id="S3.SS1.p1.5.m5.1.1.3.3.1.cmml" xref="S3.SS1.p1.5.m5.1.1.3.3.1"></times><cn id="S3.SS1.p1.5.m5.1.1.3.3.2.cmml" type="integer" xref="S3.SS1.p1.5.m5.1.1.3.3.2">3</cn><cn id="S3.SS1.p1.5.m5.1.1.3.3.3.cmml" type="integer" xref="S3.SS1.p1.5.m5.1.1.3.3.3">3</cn></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.5.m5.1c">\Sigma_{i}\in\mathbb{R}^{3\times 3}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.5.m5.1d">roman_Σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 3 × 3 end_POSTSUPERSCRIPT</annotation></semantics></math> (describing the spread and orientation), and radiance function <math alttext="c_{i}:\mathbb{S}^{2}\rightarrow\mathbb{R}^{3}" class="ltx_Math" display="inline" id="S3.SS1.p1.6.m6.1"><semantics id="S3.SS1.p1.6.m6.1a"><mrow id="S3.SS1.p1.6.m6.1.1" xref="S3.SS1.p1.6.m6.1.1.cmml"><msub id="S3.SS1.p1.6.m6.1.1.2" xref="S3.SS1.p1.6.m6.1.1.2.cmml"><mi id="S3.SS1.p1.6.m6.1.1.2.2" xref="S3.SS1.p1.6.m6.1.1.2.2.cmml">c</mi><mi id="S3.SS1.p1.6.m6.1.1.2.3" xref="S3.SS1.p1.6.m6.1.1.2.3.cmml">i</mi></msub><mo id="S3.SS1.p1.6.m6.1.1.1" lspace="0.278em" rspace="0.278em" xref="S3.SS1.p1.6.m6.1.1.1.cmml">:</mo><mrow id="S3.SS1.p1.6.m6.1.1.3" xref="S3.SS1.p1.6.m6.1.1.3.cmml"><msup id="S3.SS1.p1.6.m6.1.1.3.2" xref="S3.SS1.p1.6.m6.1.1.3.2.cmml"><mi id="S3.SS1.p1.6.m6.1.1.3.2.2" xref="S3.SS1.p1.6.m6.1.1.3.2.2.cmml">𝕊</mi><mn id="S3.SS1.p1.6.m6.1.1.3.2.3" 
xref="S3.SS1.p1.6.m6.1.1.3.2.3.cmml">2</mn></msup><mo id="S3.SS1.p1.6.m6.1.1.3.1" stretchy="false" xref="S3.SS1.p1.6.m6.1.1.3.1.cmml">→</mo><msup id="S3.SS1.p1.6.m6.1.1.3.3" xref="S3.SS1.p1.6.m6.1.1.3.3.cmml"><mi id="S3.SS1.p1.6.m6.1.1.3.3.2" xref="S3.SS1.p1.6.m6.1.1.3.3.2.cmml">ℝ</mi><mn id="S3.SS1.p1.6.m6.1.1.3.3.3" xref="S3.SS1.p1.6.m6.1.1.3.3.3.cmml">3</mn></msup></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.6.m6.1b"><apply id="S3.SS1.p1.6.m6.1.1.cmml" xref="S3.SS1.p1.6.m6.1.1"><ci id="S3.SS1.p1.6.m6.1.1.1.cmml" xref="S3.SS1.p1.6.m6.1.1.1">:</ci><apply id="S3.SS1.p1.6.m6.1.1.2.cmml" xref="S3.SS1.p1.6.m6.1.1.2"><csymbol cd="ambiguous" id="S3.SS1.p1.6.m6.1.1.2.1.cmml" xref="S3.SS1.p1.6.m6.1.1.2">subscript</csymbol><ci id="S3.SS1.p1.6.m6.1.1.2.2.cmml" xref="S3.SS1.p1.6.m6.1.1.2.2">𝑐</ci><ci id="S3.SS1.p1.6.m6.1.1.2.3.cmml" xref="S3.SS1.p1.6.m6.1.1.2.3">𝑖</ci></apply><apply id="S3.SS1.p1.6.m6.1.1.3.cmml" xref="S3.SS1.p1.6.m6.1.1.3"><ci id="S3.SS1.p1.6.m6.1.1.3.1.cmml" xref="S3.SS1.p1.6.m6.1.1.3.1">→</ci><apply id="S3.SS1.p1.6.m6.1.1.3.2.cmml" xref="S3.SS1.p1.6.m6.1.1.3.2"><csymbol cd="ambiguous" id="S3.SS1.p1.6.m6.1.1.3.2.1.cmml" xref="S3.SS1.p1.6.m6.1.1.3.2">superscript</csymbol><ci id="S3.SS1.p1.6.m6.1.1.3.2.2.cmml" xref="S3.SS1.p1.6.m6.1.1.3.2.2">𝕊</ci><cn id="S3.SS1.p1.6.m6.1.1.3.2.3.cmml" type="integer" xref="S3.SS1.p1.6.m6.1.1.3.2.3">2</cn></apply><apply id="S3.SS1.p1.6.m6.1.1.3.3.cmml" xref="S3.SS1.p1.6.m6.1.1.3.3"><csymbol cd="ambiguous" id="S3.SS1.p1.6.m6.1.1.3.3.1.cmml" xref="S3.SS1.p1.6.m6.1.1.3.3">superscript</csymbol><ci id="S3.SS1.p1.6.m6.1.1.3.3.2.cmml" xref="S3.SS1.p1.6.m6.1.1.3.3.2">ℝ</ci><cn id="S3.SS1.p1.6.m6.1.1.3.3.3.cmml" type="integer" xref="S3.SS1.p1.6.m6.1.1.3.3.3">3</cn></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.6.m6.1c">c_{i}:\mathbb{S}^{2}\rightarrow\mathbb{R}^{3}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.6.m6.1d">italic_c 
start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT : blackboard_S start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT → blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT</annotation></semantics></math> (which defines the direction-dependent color).</p> </div> <div class="ltx_para" id="S3.SS1.p2"> <p class="ltx_p" id="S3.SS1.p2.8">For each pixel, we predict the corresponding Gaussian parameters: opacity <math alttext="\sigma" class="ltx_Math" display="inline" id="S3.SS1.p2.1.m1.1"><semantics id="S3.SS1.p2.1.m1.1a"><mi id="S3.SS1.p2.1.m1.1.1" xref="S3.SS1.p2.1.m1.1.1.cmml">σ</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p2.1.m1.1b"><ci id="S3.SS1.p2.1.m1.1.1.cmml" xref="S3.SS1.p2.1.m1.1.1">𝜎</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p2.1.m1.1c">\sigma</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p2.1.m1.1d">italic_σ</annotation></semantics></math>, depth <math alttext="d\in\mathbb{R}^{+}" class="ltx_Math" display="inline" id="S3.SS1.p2.2.m2.1"><semantics id="S3.SS1.p2.2.m2.1a"><mrow id="S3.SS1.p2.2.m2.1.1" xref="S3.SS1.p2.2.m2.1.1.cmml"><mi id="S3.SS1.p2.2.m2.1.1.2" xref="S3.SS1.p2.2.m2.1.1.2.cmml">d</mi><mo id="S3.SS1.p2.2.m2.1.1.1" xref="S3.SS1.p2.2.m2.1.1.1.cmml">∈</mo><msup id="S3.SS1.p2.2.m2.1.1.3" xref="S3.SS1.p2.2.m2.1.1.3.cmml"><mi id="S3.SS1.p2.2.m2.1.1.3.2" xref="S3.SS1.p2.2.m2.1.1.3.2.cmml">ℝ</mi><mo id="S3.SS1.p2.2.m2.1.1.3.3" xref="S3.SS1.p2.2.m2.1.1.3.3.cmml">+</mo></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p2.2.m2.1b"><apply id="S3.SS1.p2.2.m2.1.1.cmml" xref="S3.SS1.p2.2.m2.1.1"><in id="S3.SS1.p2.2.m2.1.1.1.cmml" xref="S3.SS1.p2.2.m2.1.1.1"></in><ci id="S3.SS1.p2.2.m2.1.1.2.cmml" xref="S3.SS1.p2.2.m2.1.1.2">𝑑</ci><apply id="S3.SS1.p2.2.m2.1.1.3.cmml" xref="S3.SS1.p2.2.m2.1.1.3"><csymbol cd="ambiguous" id="S3.SS1.p2.2.m2.1.1.3.1.cmml" xref="S3.SS1.p2.2.m2.1.1.3">superscript</csymbol><ci id="S3.SS1.p2.2.m2.1.1.3.2.cmml" xref="S3.SS1.p2.2.m2.1.1.3.2">ℝ</ci><plus 
id="S3.SS1.p2.2.m2.1.1.3.3.cmml" xref="S3.SS1.p2.2.m2.1.1.3.3"></plus></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p2.2.m2.1c">d\in\mathbb{R}^{+}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p2.2.m2.1d">italic_d ∈ blackboard_R start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT</annotation></semantics></math>, position offset <math alttext="\Delta\in\mathbb{R}^{3}" class="ltx_Math" display="inline" id="S3.SS1.p2.3.m3.1"><semantics id="S3.SS1.p2.3.m3.1a"><mrow id="S3.SS1.p2.3.m3.1.1" xref="S3.SS1.p2.3.m3.1.1.cmml"><mi id="S3.SS1.p2.3.m3.1.1.2" mathvariant="normal" xref="S3.SS1.p2.3.m3.1.1.2.cmml">Δ</mi><mo id="S3.SS1.p2.3.m3.1.1.1" xref="S3.SS1.p2.3.m3.1.1.1.cmml">∈</mo><msup id="S3.SS1.p2.3.m3.1.1.3" xref="S3.SS1.p2.3.m3.1.1.3.cmml"><mi id="S3.SS1.p2.3.m3.1.1.3.2" xref="S3.SS1.p2.3.m3.1.1.3.2.cmml">ℝ</mi><mn id="S3.SS1.p2.3.m3.1.1.3.3" xref="S3.SS1.p2.3.m3.1.1.3.3.cmml">3</mn></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p2.3.m3.1b"><apply id="S3.SS1.p2.3.m3.1.1.cmml" xref="S3.SS1.p2.3.m3.1.1"><in id="S3.SS1.p2.3.m3.1.1.1.cmml" xref="S3.SS1.p2.3.m3.1.1.1"></in><ci id="S3.SS1.p2.3.m3.1.1.2.cmml" xref="S3.SS1.p2.3.m3.1.1.2">Δ</ci><apply id="S3.SS1.p2.3.m3.1.1.3.cmml" xref="S3.SS1.p2.3.m3.1.1.3"><csymbol cd="ambiguous" id="S3.SS1.p2.3.m3.1.1.3.1.cmml" xref="S3.SS1.p2.3.m3.1.1.3">superscript</csymbol><ci id="S3.SS1.p2.3.m3.1.1.3.2.cmml" xref="S3.SS1.p2.3.m3.1.1.3.2">ℝ</ci><cn id="S3.SS1.p2.3.m3.1.1.3.3.cmml" type="integer" xref="S3.SS1.p2.3.m3.1.1.3.3">3</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p2.3.m3.1c">\Delta\in\mathbb{R}^{3}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p2.3.m3.1d">roman_Δ ∈ blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT</annotation></semantics></math>, covariance <math alttext="\Sigma\in\mathbb{R}^{3\times 3}" class="ltx_Math" display="inline" id="S3.SS1.p2.4.m4.1"><semantics 
id="S3.SS1.p2.4.m4.1a"><mrow id="S3.SS1.p2.4.m4.1.1" xref="S3.SS1.p2.4.m4.1.1.cmml"><mi id="S3.SS1.p2.4.m4.1.1.2" mathvariant="normal" xref="S3.SS1.p2.4.m4.1.1.2.cmml">Σ</mi><mo id="S3.SS1.p2.4.m4.1.1.1" xref="S3.SS1.p2.4.m4.1.1.1.cmml">∈</mo><msup id="S3.SS1.p2.4.m4.1.1.3" xref="S3.SS1.p2.4.m4.1.1.3.cmml"><mi id="S3.SS1.p2.4.m4.1.1.3.2" xref="S3.SS1.p2.4.m4.1.1.3.2.cmml">ℝ</mi><mrow id="S3.SS1.p2.4.m4.1.1.3.3" xref="S3.SS1.p2.4.m4.1.1.3.3.cmml"><mn id="S3.SS1.p2.4.m4.1.1.3.3.2" xref="S3.SS1.p2.4.m4.1.1.3.3.2.cmml">3</mn><mo id="S3.SS1.p2.4.m4.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S3.SS1.p2.4.m4.1.1.3.3.1.cmml">×</mo><mn id="S3.SS1.p2.4.m4.1.1.3.3.3" xref="S3.SS1.p2.4.m4.1.1.3.3.3.cmml">3</mn></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p2.4.m4.1b"><apply id="S3.SS1.p2.4.m4.1.1.cmml" xref="S3.SS1.p2.4.m4.1.1"><in id="S3.SS1.p2.4.m4.1.1.1.cmml" xref="S3.SS1.p2.4.m4.1.1.1"></in><ci id="S3.SS1.p2.4.m4.1.1.2.cmml" xref="S3.SS1.p2.4.m4.1.1.2">Σ</ci><apply id="S3.SS1.p2.4.m4.1.1.3.cmml" xref="S3.SS1.p2.4.m4.1.1.3"><csymbol cd="ambiguous" id="S3.SS1.p2.4.m4.1.1.3.1.cmml" xref="S3.SS1.p2.4.m4.1.1.3">superscript</csymbol><ci id="S3.SS1.p2.4.m4.1.1.3.2.cmml" xref="S3.SS1.p2.4.m4.1.1.3.2">ℝ</ci><apply id="S3.SS1.p2.4.m4.1.1.3.3.cmml" xref="S3.SS1.p2.4.m4.1.1.3.3"><times id="S3.SS1.p2.4.m4.1.1.3.3.1.cmml" xref="S3.SS1.p2.4.m4.1.1.3.3.1"></times><cn id="S3.SS1.p2.4.m4.1.1.3.3.2.cmml" type="integer" xref="S3.SS1.p2.4.m4.1.1.3.3.2">3</cn><cn id="S3.SS1.p2.4.m4.1.1.3.3.3.cmml" type="integer" xref="S3.SS1.p2.4.m4.1.1.3.3.3">3</cn></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p2.4.m4.1c">\Sigma\in\mathbb{R}^{3\times 3}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p2.4.m4.1d">roman_Σ ∈ blackboard_R start_POSTSUPERSCRIPT 3 × 3 end_POSTSUPERSCRIPT</annotation></semantics></math>, a normal vector <math alttext="\gamma\in\mathbb{R}^{3}" class="ltx_Math" display="inline" 
id="S3.SS1.p2.5.m5.1"><semantics id="S3.SS1.p2.5.m5.1a"><mrow id="S3.SS1.p2.5.m5.1.1" xref="S3.SS1.p2.5.m5.1.1.cmml"><mi id="S3.SS1.p2.5.m5.1.1.2" xref="S3.SS1.p2.5.m5.1.1.2.cmml">γ</mi><mo id="S3.SS1.p2.5.m5.1.1.1" xref="S3.SS1.p2.5.m5.1.1.1.cmml">∈</mo><msup id="S3.SS1.p2.5.m5.1.1.3" xref="S3.SS1.p2.5.m5.1.1.3.cmml"><mi id="S3.SS1.p2.5.m5.1.1.3.2" xref="S3.SS1.p2.5.m5.1.1.3.2.cmml">ℝ</mi><mn id="S3.SS1.p2.5.m5.1.1.3.3" xref="S3.SS1.p2.5.m5.1.1.3.3.cmml">3</mn></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p2.5.m5.1b"><apply id="S3.SS1.p2.5.m5.1.1.cmml" xref="S3.SS1.p2.5.m5.1.1"><in id="S3.SS1.p2.5.m5.1.1.1.cmml" xref="S3.SS1.p2.5.m5.1.1.1"></in><ci id="S3.SS1.p2.5.m5.1.1.2.cmml" xref="S3.SS1.p2.5.m5.1.1.2">𝛾</ci><apply id="S3.SS1.p2.5.m5.1.1.3.cmml" xref="S3.SS1.p2.5.m5.1.1.3"><csymbol cd="ambiguous" id="S3.SS1.p2.5.m5.1.1.3.1.cmml" xref="S3.SS1.p2.5.m5.1.1.3">superscript</csymbol><ci id="S3.SS1.p2.5.m5.1.1.3.2.cmml" xref="S3.SS1.p2.5.m5.1.1.3.2">ℝ</ci><cn id="S3.SS1.p2.5.m5.1.1.3.3.cmml" type="integer" xref="S3.SS1.p2.5.m5.1.1.3.3">3</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p2.5.m5.1c">\gamma\in\mathbb{R}^{3}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p2.5.m5.1d">italic_γ ∈ blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT</annotation></semantics></math>, color model parameters <math alttext="c\in\mathbb{R}^{3(L+1)^{2}}" class="ltx_Math" display="inline" id="S3.SS1.p2.6.m6.1"><semantics id="S3.SS1.p2.6.m6.1a"><mrow id="S3.SS1.p2.6.m6.1.2" xref="S3.SS1.p2.6.m6.1.2.cmml"><mi id="S3.SS1.p2.6.m6.1.2.2" xref="S3.SS1.p2.6.m6.1.2.2.cmml">c</mi><mo id="S3.SS1.p2.6.m6.1.2.1" xref="S3.SS1.p2.6.m6.1.2.1.cmml">∈</mo><msup id="S3.SS1.p2.6.m6.1.2.3" xref="S3.SS1.p2.6.m6.1.2.3.cmml"><mi id="S3.SS1.p2.6.m6.1.2.3.2" xref="S3.SS1.p2.6.m6.1.2.3.2.cmml">ℝ</mi><mrow id="S3.SS1.p2.6.m6.1.1.1" xref="S3.SS1.p2.6.m6.1.1.1.cmml"><mn id="S3.SS1.p2.6.m6.1.1.1.3" 
xref="S3.SS1.p2.6.m6.1.1.1.3.cmml">3</mn><mo id="S3.SS1.p2.6.m6.1.1.1.2" xref="S3.SS1.p2.6.m6.1.1.1.2.cmml"></mo><msup id="S3.SS1.p2.6.m6.1.1.1.1" xref="S3.SS1.p2.6.m6.1.1.1.1.cmml"><mrow id="S3.SS1.p2.6.m6.1.1.1.1.1.1" xref="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.cmml"><mo id="S3.SS1.p2.6.m6.1.1.1.1.1.1.2" stretchy="false" xref="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.cmml">(</mo><mrow id="S3.SS1.p2.6.m6.1.1.1.1.1.1.1" xref="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.cmml"><mi id="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.2" xref="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.2.cmml">L</mi><mo id="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.1" xref="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.1.cmml">+</mo><mn id="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.3" xref="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.3.cmml">1</mn></mrow><mo id="S3.SS1.p2.6.m6.1.1.1.1.1.1.3" stretchy="false" xref="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.cmml">)</mo></mrow><mn id="S3.SS1.p2.6.m6.1.1.1.1.3" xref="S3.SS1.p2.6.m6.1.1.1.1.3.cmml">2</mn></msup></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p2.6.m6.1b"><apply id="S3.SS1.p2.6.m6.1.2.cmml" xref="S3.SS1.p2.6.m6.1.2"><in id="S3.SS1.p2.6.m6.1.2.1.cmml" xref="S3.SS1.p2.6.m6.1.2.1"></in><ci id="S3.SS1.p2.6.m6.1.2.2.cmml" xref="S3.SS1.p2.6.m6.1.2.2">𝑐</ci><apply id="S3.SS1.p2.6.m6.1.2.3.cmml" xref="S3.SS1.p2.6.m6.1.2.3"><csymbol cd="ambiguous" id="S3.SS1.p2.6.m6.1.2.3.1.cmml" xref="S3.SS1.p2.6.m6.1.2.3">superscript</csymbol><ci id="S3.SS1.p2.6.m6.1.2.3.2.cmml" xref="S3.SS1.p2.6.m6.1.2.3.2">ℝ</ci><apply id="S3.SS1.p2.6.m6.1.1.1.cmml" xref="S3.SS1.p2.6.m6.1.1.1"><times id="S3.SS1.p2.6.m6.1.1.1.2.cmml" xref="S3.SS1.p2.6.m6.1.1.1.2"></times><cn id="S3.SS1.p2.6.m6.1.1.1.3.cmml" type="integer" xref="S3.SS1.p2.6.m6.1.1.1.3">3</cn><apply id="S3.SS1.p2.6.m6.1.1.1.1.cmml" xref="S3.SS1.p2.6.m6.1.1.1.1"><csymbol cd="ambiguous" id="S3.SS1.p2.6.m6.1.1.1.1.2.cmml" xref="S3.SS1.p2.6.m6.1.1.1.1">superscript</csymbol><apply id="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.cmml" xref="S3.SS1.p2.6.m6.1.1.1.1.1.1"><plus id="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.1.cmml" 
xref="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.1"></plus><ci id="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.2.cmml" xref="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.2">𝐿</ci><cn id="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.3.cmml" type="integer" xref="S3.SS1.p2.6.m6.1.1.1.1.1.1.1.3">1</cn></apply><cn id="S3.SS1.p2.6.m6.1.1.1.1.3.cmml" type="integer" xref="S3.SS1.p2.6.m6.1.1.1.1.3">2</cn></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p2.6.m6.1c">c\in\mathbb{R}^{3(L+1)^{2}}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p2.6.m6.1d">italic_c ∈ blackboard_R start_POSTSUPERSCRIPT 3 ( italic_L + 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT</annotation></semantics></math>, and <math alttext="\mu_{i}\in\mathbb{R}^{3}" class="ltx_Math" display="inline" id="S3.SS1.p2.7.m7.1"><semantics id="S3.SS1.p2.7.m7.1a"><mrow id="S3.SS1.p2.7.m7.1.1" xref="S3.SS1.p2.7.m7.1.1.cmml"><msub id="S3.SS1.p2.7.m7.1.1.2" xref="S3.SS1.p2.7.m7.1.1.2.cmml"><mi id="S3.SS1.p2.7.m7.1.1.2.2" xref="S3.SS1.p2.7.m7.1.1.2.2.cmml">μ</mi><mi id="S3.SS1.p2.7.m7.1.1.2.3" xref="S3.SS1.p2.7.m7.1.1.2.3.cmml">i</mi></msub><mo id="S3.SS1.p2.7.m7.1.1.1" xref="S3.SS1.p2.7.m7.1.1.1.cmml">∈</mo><msup id="S3.SS1.p2.7.m7.1.1.3" xref="S3.SS1.p2.7.m7.1.1.3.cmml"><mi id="S3.SS1.p2.7.m7.1.1.3.2" xref="S3.SS1.p2.7.m7.1.1.3.2.cmml">ℝ</mi><mn id="S3.SS1.p2.7.m7.1.1.3.3" xref="S3.SS1.p2.7.m7.1.1.3.3.cmml">3</mn></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p2.7.m7.1b"><apply id="S3.SS1.p2.7.m7.1.1.cmml" xref="S3.SS1.p2.7.m7.1.1"><in id="S3.SS1.p2.7.m7.1.1.1.cmml" xref="S3.SS1.p2.7.m7.1.1.1"></in><apply id="S3.SS1.p2.7.m7.1.1.2.cmml" xref="S3.SS1.p2.7.m7.1.1.2"><csymbol cd="ambiguous" id="S3.SS1.p2.7.m7.1.1.2.1.cmml" xref="S3.SS1.p2.7.m7.1.1.2">subscript</csymbol><ci id="S3.SS1.p2.7.m7.1.1.2.2.cmml" xref="S3.SS1.p2.7.m7.1.1.2.2">𝜇</ci><ci id="S3.SS1.p2.7.m7.1.1.2.3.cmml" xref="S3.SS1.p2.7.m7.1.1.2.3">𝑖</ci></apply><apply id="S3.SS1.p2.7.m7.1.1.3.cmml" 
xref="S3.SS1.p2.7.m7.1.1.3"><csymbol cd="ambiguous" id="S3.SS1.p2.7.m7.1.1.3.1.cmml" xref="S3.SS1.p2.7.m7.1.1.3">superscript</csymbol><ci id="S3.SS1.p2.7.m7.1.1.3.2.cmml" xref="S3.SS1.p2.7.m7.1.1.3.2">ℝ</ci><cn id="S3.SS1.p2.7.m7.1.1.3.3.cmml" type="integer" xref="S3.SS1.p2.7.m7.1.1.3.3">3</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p2.7.m7.1c">\mu_{i}\in\mathbb{R}^{3}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p2.7.m7.1d">italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT</annotation></semantics></math>, the mean. Here, <math alttext="L" class="ltx_Math" display="inline" id="S3.SS1.p2.8.m8.1"><semantics id="S3.SS1.p2.8.m8.1a"><mi id="S3.SS1.p2.8.m8.1.1" xref="S3.SS1.p2.8.m8.1.1.cmml">L</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p2.8.m8.1b"><ci id="S3.SS1.p2.8.m8.1.1.cmml" xref="S3.SS1.p2.8.m8.1.1">𝐿</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p2.8.m8.1c">L</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p2.8.m8.1d">italic_L</annotation></semantics></math> is the order of the spherical harmonics in the color model, which provides color and lighting information for each pixel. The unnormalized Gaussian function of each component is given by:</p> <table class="ltx_equation ltx_eqn_table" id="S3.E1"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="g_{i}(x)=\exp\left(-\frac{1}{2}(x-\mu_{i})^{\top}\Sigma_{i}^{-1}(x-\mu_{i})% \right)." 
class="ltx_Math" display="block" id="S3.E1.m1.3"><semantics id="S3.E1.m1.3a"><mrow id="S3.E1.m1.3.3.1" xref="S3.E1.m1.3.3.1.1.cmml"><mrow id="S3.E1.m1.3.3.1.1" xref="S3.E1.m1.3.3.1.1.cmml"><mrow id="S3.E1.m1.3.3.1.1.3" xref="S3.E1.m1.3.3.1.1.3.cmml"><msub id="S3.E1.m1.3.3.1.1.3.2" xref="S3.E1.m1.3.3.1.1.3.2.cmml"><mi id="S3.E1.m1.3.3.1.1.3.2.2" xref="S3.E1.m1.3.3.1.1.3.2.2.cmml">g</mi><mi id="S3.E1.m1.3.3.1.1.3.2.3" xref="S3.E1.m1.3.3.1.1.3.2.3.cmml">i</mi></msub><mo id="S3.E1.m1.3.3.1.1.3.1" xref="S3.E1.m1.3.3.1.1.3.1.cmml"></mo><mrow id="S3.E1.m1.3.3.1.1.3.3.2" xref="S3.E1.m1.3.3.1.1.3.cmml"><mo id="S3.E1.m1.3.3.1.1.3.3.2.1" stretchy="false" xref="S3.E1.m1.3.3.1.1.3.cmml">(</mo><mi id="S3.E1.m1.1.1" xref="S3.E1.m1.1.1.cmml">x</mi><mo id="S3.E1.m1.3.3.1.1.3.3.2.2" stretchy="false" xref="S3.E1.m1.3.3.1.1.3.cmml">)</mo></mrow></mrow><mo id="S3.E1.m1.3.3.1.1.2" xref="S3.E1.m1.3.3.1.1.2.cmml">=</mo><mrow id="S3.E1.m1.3.3.1.1.1.1" xref="S3.E1.m1.3.3.1.1.1.2.cmml"><mi id="S3.E1.m1.2.2" xref="S3.E1.m1.2.2.cmml">exp</mi><mo id="S3.E1.m1.3.3.1.1.1.1a" xref="S3.E1.m1.3.3.1.1.1.2.cmml"></mo><mrow id="S3.E1.m1.3.3.1.1.1.1.1" xref="S3.E1.m1.3.3.1.1.1.2.cmml"><mo id="S3.E1.m1.3.3.1.1.1.1.1.2" xref="S3.E1.m1.3.3.1.1.1.2.cmml">(</mo><mrow id="S3.E1.m1.3.3.1.1.1.1.1.1" xref="S3.E1.m1.3.3.1.1.1.1.1.1.cmml"><mo id="S3.E1.m1.3.3.1.1.1.1.1.1a" xref="S3.E1.m1.3.3.1.1.1.1.1.1.cmml">−</mo><mrow id="S3.E1.m1.3.3.1.1.1.1.1.1.2" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.cmml"><mfrac id="S3.E1.m1.3.3.1.1.1.1.1.1.2.4" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.4.cmml"><mn id="S3.E1.m1.3.3.1.1.1.1.1.1.2.4.2" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.4.2.cmml">1</mn><mn id="S3.E1.m1.3.3.1.1.1.1.1.1.2.4.3" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.4.3.cmml">2</mn></mfrac><mo id="S3.E1.m1.3.3.1.1.1.1.1.1.2.3" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.3.cmml"></mo><msup id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.cmml"><mrow id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.cmml"><mo 
id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.2" stretchy="false" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.cmml">(</mo><mrow id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.cmml"><mi id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.2" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.2.cmml">x</mi><mo id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.1" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.1.cmml">−</mo><msub id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.3" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.3.cmml"><mi id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.3.2" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.3.2.cmml">μ</mi><mi id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.3.3" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.3.3.cmml">i</mi></msub></mrow><mo id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.3" stretchy="false" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.cmml">)</mo></mrow><mo id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.3" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.3.cmml">⊤</mo></msup><mo id="S3.E1.m1.3.3.1.1.1.1.1.1.2.3a" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.3.cmml"></mo><msubsup id="S3.E1.m1.3.3.1.1.1.1.1.1.2.5" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.cmml"><mi id="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.2.2" mathvariant="normal" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.2.2.cmml">Σ</mi><mi id="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.2.3" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.2.3.cmml">i</mi><mrow id="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.3" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.3.cmml"><mo id="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.3a" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.3.cmml">−</mo><mn id="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.3.2" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.3.2.cmml">1</mn></mrow></msubsup><mo id="S3.E1.m1.3.3.1.1.1.1.1.1.2.3b" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.3.cmml"></mo><mrow id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.cmml"><mo id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.2" stretchy="false" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.cmml">(</mo><mrow id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.cmml"><mi 
id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.2" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.2.cmml">x</mi><mo id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.1" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.1.cmml">−</mo><msub id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.3" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.3.cmml"><mi id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.3.2" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.3.2.cmml">μ</mi><mi id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.3.3" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.3.3.cmml">i</mi></msub></mrow><mo id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.3" stretchy="false" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.cmml">)</mo></mrow></mrow></mrow><mo id="S3.E1.m1.3.3.1.1.1.1.1.3" xref="S3.E1.m1.3.3.1.1.1.2.cmml">)</mo></mrow></mrow></mrow><mo id="S3.E1.m1.3.3.1.2" lspace="0em" xref="S3.E1.m1.3.3.1.1.cmml">.</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.E1.m1.3b"><apply id="S3.E1.m1.3.3.1.1.cmml" xref="S3.E1.m1.3.3.1"><eq id="S3.E1.m1.3.3.1.1.2.cmml" xref="S3.E1.m1.3.3.1.1.2"></eq><apply id="S3.E1.m1.3.3.1.1.3.cmml" xref="S3.E1.m1.3.3.1.1.3"><times id="S3.E1.m1.3.3.1.1.3.1.cmml" xref="S3.E1.m1.3.3.1.1.3.1"></times><apply id="S3.E1.m1.3.3.1.1.3.2.cmml" xref="S3.E1.m1.3.3.1.1.3.2"><csymbol cd="ambiguous" id="S3.E1.m1.3.3.1.1.3.2.1.cmml" xref="S3.E1.m1.3.3.1.1.3.2">subscript</csymbol><ci id="S3.E1.m1.3.3.1.1.3.2.2.cmml" xref="S3.E1.m1.3.3.1.1.3.2.2">𝑔</ci><ci id="S3.E1.m1.3.3.1.1.3.2.3.cmml" xref="S3.E1.m1.3.3.1.1.3.2.3">𝑖</ci></apply><ci id="S3.E1.m1.1.1.cmml" xref="S3.E1.m1.1.1">𝑥</ci></apply><apply id="S3.E1.m1.3.3.1.1.1.2.cmml" xref="S3.E1.m1.3.3.1.1.1.1"><exp id="S3.E1.m1.2.2.cmml" xref="S3.E1.m1.2.2"></exp><apply id="S3.E1.m1.3.3.1.1.1.1.1.1.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1"><minus id="S3.E1.m1.3.3.1.1.1.1.1.1.3.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1"></minus><apply id="S3.E1.m1.3.3.1.1.1.1.1.1.2.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2"><times id="S3.E1.m1.3.3.1.1.1.1.1.1.2.3.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.3"></times><apply id="S3.E1.m1.3.3.1.1.1.1.1.1.2.4.cmml" 
xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.4"><divide id="S3.E1.m1.3.3.1.1.1.1.1.1.2.4.1.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.4"></divide><cn id="S3.E1.m1.3.3.1.1.1.1.1.1.2.4.2.cmml" type="integer" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.4.2">1</cn><cn id="S3.E1.m1.3.3.1.1.1.1.1.1.2.4.3.cmml" type="integer" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.4.3">2</cn></apply><apply id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.2.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1">superscript</csymbol><apply id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1"><minus id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.1"></minus><ci id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.2.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.2">𝑥</ci><apply id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.3.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.3.1.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.3">subscript</csymbol><ci id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.3.2.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.3.2">𝜇</ci><ci id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.3.3.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.1.1.1.3.3">𝑖</ci></apply></apply><csymbol cd="latexml" id="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.3.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.1.1.3">top</csymbol></apply><apply id="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.5"><csymbol cd="ambiguous" id="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.1.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.5">superscript</csymbol><apply id="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.2.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.5"><csymbol cd="ambiguous" id="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.2.1.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.5">subscript</csymbol><ci id="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.2.2.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.2.2">Σ</ci><ci id="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.2.3.cmml" 
xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.2.3">𝑖</ci></apply><apply id="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.3.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.3"><minus id="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.3.1.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.3"></minus><cn id="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.3.2.cmml" type="integer" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.5.3.2">1</cn></apply></apply><apply id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1"><minus id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.1.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.1"></minus><ci id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.2.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.2">𝑥</ci><apply id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.3.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.3"><csymbol cd="ambiguous" id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.3.1.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.3">subscript</csymbol><ci id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.3.2.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.3.2">𝜇</ci><ci id="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.3.3.cmml" xref="S3.E1.m1.3.3.1.1.1.1.1.1.2.2.1.1.3.3">𝑖</ci></apply></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E1.m1.3c">g_{i}(x)=\exp\left(-\frac{1}{2}(x-\mu_{i})^{\top}\Sigma_{i}^{-1}(x-\mu_{i})% \right).</annotation><annotation encoding="application/x-llamapun" id="S3.E1.m1.3d">italic_g start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_x ) = roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 end_ARG ( italic_x - italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_x - italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ) .</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation 
ltx_align_right">(1)</span></td> </tr></tbody> </table> </div> <div class="ltx_para ltx_noindent" id="S3.SS1.p3"> <p class="ltx_p" id="S3.SS1.p3.8">Spherical harmonics <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib47" title="">47</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib59" title="">59</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib84" title="">84</a>]</cite> are employed to model colors emitted by these Gaussians,</p> <table class="ltx_equation ltx_eqn_table" id="S3.E2"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="[c_{i}(\nu)]_{j}=\sum_{l=0}^{L}\sum_{m=-l}^{l}c_{ijop}Z_{op}(\nu)," class="ltx_Math" display="block" id="S3.E2.m1.3"><semantics id="S3.E2.m1.3a"><mrow id="S3.E2.m1.3.3.1" xref="S3.E2.m1.3.3.1.1.cmml"><mrow id="S3.E2.m1.3.3.1.1" xref="S3.E2.m1.3.3.1.1.cmml"><msub id="S3.E2.m1.3.3.1.1.1" xref="S3.E2.m1.3.3.1.1.1.cmml"><mrow id="S3.E2.m1.3.3.1.1.1.1.1" xref="S3.E2.m1.3.3.1.1.1.1.2.cmml"><mo id="S3.E2.m1.3.3.1.1.1.1.1.2" stretchy="false" xref="S3.E2.m1.3.3.1.1.1.1.2.1.cmml">[</mo><mrow id="S3.E2.m1.3.3.1.1.1.1.1.1" xref="S3.E2.m1.3.3.1.1.1.1.1.1.cmml"><msub id="S3.E2.m1.3.3.1.1.1.1.1.1.2" xref="S3.E2.m1.3.3.1.1.1.1.1.1.2.cmml"><mi id="S3.E2.m1.3.3.1.1.1.1.1.1.2.2" xref="S3.E2.m1.3.3.1.1.1.1.1.1.2.2.cmml">c</mi><mi id="S3.E2.m1.3.3.1.1.1.1.1.1.2.3" xref="S3.E2.m1.3.3.1.1.1.1.1.1.2.3.cmml">i</mi></msub><mo id="S3.E2.m1.3.3.1.1.1.1.1.1.1" xref="S3.E2.m1.3.3.1.1.1.1.1.1.1.cmml"></mo><mrow id="S3.E2.m1.3.3.1.1.1.1.1.1.3.2" xref="S3.E2.m1.3.3.1.1.1.1.1.1.cmml"><mo id="S3.E2.m1.3.3.1.1.1.1.1.1.3.2.1" stretchy="false" xref="S3.E2.m1.3.3.1.1.1.1.1.1.cmml">(</mo><mi id="S3.E2.m1.1.1" xref="S3.E2.m1.1.1.cmml">ν</mi><mo id="S3.E2.m1.3.3.1.1.1.1.1.1.3.2.2" stretchy="false" 
xref="S3.E2.m1.3.3.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.E2.m1.3.3.1.1.1.1.1.3" stretchy="false" xref="S3.E2.m1.3.3.1.1.1.1.2.1.cmml">]</mo></mrow><mi id="S3.E2.m1.3.3.1.1.1.3" xref="S3.E2.m1.3.3.1.1.1.3.cmml">j</mi></msub><mo id="S3.E2.m1.3.3.1.1.2" rspace="0.111em" xref="S3.E2.m1.3.3.1.1.2.cmml">=</mo><mrow id="S3.E2.m1.3.3.1.1.3" xref="S3.E2.m1.3.3.1.1.3.cmml"><munderover id="S3.E2.m1.3.3.1.1.3.1" xref="S3.E2.m1.3.3.1.1.3.1.cmml"><mo id="S3.E2.m1.3.3.1.1.3.1.2.2" movablelimits="false" rspace="0em" xref="S3.E2.m1.3.3.1.1.3.1.2.2.cmml">∑</mo><mrow id="S3.E2.m1.3.3.1.1.3.1.2.3" xref="S3.E2.m1.3.3.1.1.3.1.2.3.cmml"><mi id="S3.E2.m1.3.3.1.1.3.1.2.3.2" xref="S3.E2.m1.3.3.1.1.3.1.2.3.2.cmml">l</mi><mo id="S3.E2.m1.3.3.1.1.3.1.2.3.1" xref="S3.E2.m1.3.3.1.1.3.1.2.3.1.cmml">=</mo><mn id="S3.E2.m1.3.3.1.1.3.1.2.3.3" xref="S3.E2.m1.3.3.1.1.3.1.2.3.3.cmml">0</mn></mrow><mi id="S3.E2.m1.3.3.1.1.3.1.3" xref="S3.E2.m1.3.3.1.1.3.1.3.cmml">L</mi></munderover><mrow id="S3.E2.m1.3.3.1.1.3.2" xref="S3.E2.m1.3.3.1.1.3.2.cmml"><munderover id="S3.E2.m1.3.3.1.1.3.2.1" xref="S3.E2.m1.3.3.1.1.3.2.1.cmml"><mo id="S3.E2.m1.3.3.1.1.3.2.1.2.2" movablelimits="false" xref="S3.E2.m1.3.3.1.1.3.2.1.2.2.cmml">∑</mo><mrow id="S3.E2.m1.3.3.1.1.3.2.1.2.3" xref="S3.E2.m1.3.3.1.1.3.2.1.2.3.cmml"><mi id="S3.E2.m1.3.3.1.1.3.2.1.2.3.2" xref="S3.E2.m1.3.3.1.1.3.2.1.2.3.2.cmml">m</mi><mo id="S3.E2.m1.3.3.1.1.3.2.1.2.3.1" xref="S3.E2.m1.3.3.1.1.3.2.1.2.3.1.cmml">=</mo><mrow id="S3.E2.m1.3.3.1.1.3.2.1.2.3.3" xref="S3.E2.m1.3.3.1.1.3.2.1.2.3.3.cmml"><mo id="S3.E2.m1.3.3.1.1.3.2.1.2.3.3a" xref="S3.E2.m1.3.3.1.1.3.2.1.2.3.3.cmml">−</mo><mi id="S3.E2.m1.3.3.1.1.3.2.1.2.3.3.2" xref="S3.E2.m1.3.3.1.1.3.2.1.2.3.3.2.cmml">l</mi></mrow></mrow><mi id="S3.E2.m1.3.3.1.1.3.2.1.3" xref="S3.E2.m1.3.3.1.1.3.2.1.3.cmml">l</mi></munderover><mrow id="S3.E2.m1.3.3.1.1.3.2.2" xref="S3.E2.m1.3.3.1.1.3.2.2.cmml"><msub id="S3.E2.m1.3.3.1.1.3.2.2.2" xref="S3.E2.m1.3.3.1.1.3.2.2.2.cmml"><mi id="S3.E2.m1.3.3.1.1.3.2.2.2.2" 
xref="S3.E2.m1.3.3.1.1.3.2.2.2.2.cmml">c</mi><mrow id="S3.E2.m1.3.3.1.1.3.2.2.2.3" xref="S3.E2.m1.3.3.1.1.3.2.2.2.3.cmml"><mi id="S3.E2.m1.3.3.1.1.3.2.2.2.3.2" xref="S3.E2.m1.3.3.1.1.3.2.2.2.3.2.cmml">i</mi><mo id="S3.E2.m1.3.3.1.1.3.2.2.2.3.1" xref="S3.E2.m1.3.3.1.1.3.2.2.2.3.1.cmml"></mo><mi id="S3.E2.m1.3.3.1.1.3.2.2.2.3.3" xref="S3.E2.m1.3.3.1.1.3.2.2.2.3.3.cmml">j</mi><mo id="S3.E2.m1.3.3.1.1.3.2.2.2.3.1a" xref="S3.E2.m1.3.3.1.1.3.2.2.2.3.1.cmml"></mo><mi id="S3.E2.m1.3.3.1.1.3.2.2.2.3.4" xref="S3.E2.m1.3.3.1.1.3.2.2.2.3.4.cmml">o</mi><mo id="S3.E2.m1.3.3.1.1.3.2.2.2.3.1b" xref="S3.E2.m1.3.3.1.1.3.2.2.2.3.1.cmml"></mo><mi id="S3.E2.m1.3.3.1.1.3.2.2.2.3.5" xref="S3.E2.m1.3.3.1.1.3.2.2.2.3.5.cmml">p</mi></mrow></msub><mo id="S3.E2.m1.3.3.1.1.3.2.2.1" xref="S3.E2.m1.3.3.1.1.3.2.2.1.cmml"></mo><msub id="S3.E2.m1.3.3.1.1.3.2.2.3" xref="S3.E2.m1.3.3.1.1.3.2.2.3.cmml"><mi id="S3.E2.m1.3.3.1.1.3.2.2.3.2" xref="S3.E2.m1.3.3.1.1.3.2.2.3.2.cmml">Z</mi><mrow id="S3.E2.m1.3.3.1.1.3.2.2.3.3" xref="S3.E2.m1.3.3.1.1.3.2.2.3.3.cmml"><mi id="S3.E2.m1.3.3.1.1.3.2.2.3.3.2" xref="S3.E2.m1.3.3.1.1.3.2.2.3.3.2.cmml">o</mi><mo id="S3.E2.m1.3.3.1.1.3.2.2.3.3.1" xref="S3.E2.m1.3.3.1.1.3.2.2.3.3.1.cmml"></mo><mi id="S3.E2.m1.3.3.1.1.3.2.2.3.3.3" xref="S3.E2.m1.3.3.1.1.3.2.2.3.3.3.cmml">p</mi></mrow></msub><mo id="S3.E2.m1.3.3.1.1.3.2.2.1a" xref="S3.E2.m1.3.3.1.1.3.2.2.1.cmml"></mo><mrow id="S3.E2.m1.3.3.1.1.3.2.2.4.2" xref="S3.E2.m1.3.3.1.1.3.2.2.cmml"><mo id="S3.E2.m1.3.3.1.1.3.2.2.4.2.1" stretchy="false" xref="S3.E2.m1.3.3.1.1.3.2.2.cmml">(</mo><mi id="S3.E2.m1.2.2" xref="S3.E2.m1.2.2.cmml">ν</mi><mo id="S3.E2.m1.3.3.1.1.3.2.2.4.2.2" stretchy="false" xref="S3.E2.m1.3.3.1.1.3.2.2.cmml">)</mo></mrow></mrow></mrow></mrow></mrow><mo id="S3.E2.m1.3.3.1.2" xref="S3.E2.m1.3.3.1.1.cmml">,</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.E2.m1.3b"><apply id="S3.E2.m1.3.3.1.1.cmml" xref="S3.E2.m1.3.3.1"><eq id="S3.E2.m1.3.3.1.1.2.cmml" xref="S3.E2.m1.3.3.1.1.2"></eq><apply 
id="S3.E2.m1.3.3.1.1.1.cmml" xref="S3.E2.m1.3.3.1.1.1"><csymbol cd="ambiguous" id="S3.E2.m1.3.3.1.1.1.2.cmml" xref="S3.E2.m1.3.3.1.1.1">subscript</csymbol><apply id="S3.E2.m1.3.3.1.1.1.1.2.cmml" xref="S3.E2.m1.3.3.1.1.1.1.1"><csymbol cd="latexml" id="S3.E2.m1.3.3.1.1.1.1.2.1.cmml" xref="S3.E2.m1.3.3.1.1.1.1.1.2">delimited-[]</csymbol><apply id="S3.E2.m1.3.3.1.1.1.1.1.1.cmml" xref="S3.E2.m1.3.3.1.1.1.1.1.1"><times id="S3.E2.m1.3.3.1.1.1.1.1.1.1.cmml" xref="S3.E2.m1.3.3.1.1.1.1.1.1.1"></times><apply id="S3.E2.m1.3.3.1.1.1.1.1.1.2.cmml" xref="S3.E2.m1.3.3.1.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S3.E2.m1.3.3.1.1.1.1.1.1.2.1.cmml" xref="S3.E2.m1.3.3.1.1.1.1.1.1.2">subscript</csymbol><ci id="S3.E2.m1.3.3.1.1.1.1.1.1.2.2.cmml" xref="S3.E2.m1.3.3.1.1.1.1.1.1.2.2">𝑐</ci><ci id="S3.E2.m1.3.3.1.1.1.1.1.1.2.3.cmml" xref="S3.E2.m1.3.3.1.1.1.1.1.1.2.3">𝑖</ci></apply><ci id="S3.E2.m1.1.1.cmml" xref="S3.E2.m1.1.1">𝜈</ci></apply></apply><ci id="S3.E2.m1.3.3.1.1.1.3.cmml" xref="S3.E2.m1.3.3.1.1.1.3">𝑗</ci></apply><apply id="S3.E2.m1.3.3.1.1.3.cmml" xref="S3.E2.m1.3.3.1.1.3"><apply id="S3.E2.m1.3.3.1.1.3.1.cmml" xref="S3.E2.m1.3.3.1.1.3.1"><csymbol cd="ambiguous" id="S3.E2.m1.3.3.1.1.3.1.1.cmml" xref="S3.E2.m1.3.3.1.1.3.1">superscript</csymbol><apply id="S3.E2.m1.3.3.1.1.3.1.2.cmml" xref="S3.E2.m1.3.3.1.1.3.1"><csymbol cd="ambiguous" id="S3.E2.m1.3.3.1.1.3.1.2.1.cmml" xref="S3.E2.m1.3.3.1.1.3.1">subscript</csymbol><sum id="S3.E2.m1.3.3.1.1.3.1.2.2.cmml" xref="S3.E2.m1.3.3.1.1.3.1.2.2"></sum><apply id="S3.E2.m1.3.3.1.1.3.1.2.3.cmml" xref="S3.E2.m1.3.3.1.1.3.1.2.3"><eq id="S3.E2.m1.3.3.1.1.3.1.2.3.1.cmml" xref="S3.E2.m1.3.3.1.1.3.1.2.3.1"></eq><ci id="S3.E2.m1.3.3.1.1.3.1.2.3.2.cmml" xref="S3.E2.m1.3.3.1.1.3.1.2.3.2">𝑙</ci><cn id="S3.E2.m1.3.3.1.1.3.1.2.3.3.cmml" type="integer" xref="S3.E2.m1.3.3.1.1.3.1.2.3.3">0</cn></apply></apply><ci id="S3.E2.m1.3.3.1.1.3.1.3.cmml" xref="S3.E2.m1.3.3.1.1.3.1.3">𝐿</ci></apply><apply id="S3.E2.m1.3.3.1.1.3.2.cmml" 
xref="S3.E2.m1.3.3.1.1.3.2"><apply id="S3.E2.m1.3.3.1.1.3.2.1.cmml" xref="S3.E2.m1.3.3.1.1.3.2.1"><csymbol cd="ambiguous" id="S3.E2.m1.3.3.1.1.3.2.1.1.cmml" xref="S3.E2.m1.3.3.1.1.3.2.1">superscript</csymbol><apply id="S3.E2.m1.3.3.1.1.3.2.1.2.cmml" xref="S3.E2.m1.3.3.1.1.3.2.1"><csymbol cd="ambiguous" id="S3.E2.m1.3.3.1.1.3.2.1.2.1.cmml" xref="S3.E2.m1.3.3.1.1.3.2.1">subscript</csymbol><sum id="S3.E2.m1.3.3.1.1.3.2.1.2.2.cmml" xref="S3.E2.m1.3.3.1.1.3.2.1.2.2"></sum><apply id="S3.E2.m1.3.3.1.1.3.2.1.2.3.cmml" xref="S3.E2.m1.3.3.1.1.3.2.1.2.3"><eq id="S3.E2.m1.3.3.1.1.3.2.1.2.3.1.cmml" xref="S3.E2.m1.3.3.1.1.3.2.1.2.3.1"></eq><ci id="S3.E2.m1.3.3.1.1.3.2.1.2.3.2.cmml" xref="S3.E2.m1.3.3.1.1.3.2.1.2.3.2">𝑚</ci><apply id="S3.E2.m1.3.3.1.1.3.2.1.2.3.3.cmml" xref="S3.E2.m1.3.3.1.1.3.2.1.2.3.3"><minus id="S3.E2.m1.3.3.1.1.3.2.1.2.3.3.1.cmml" xref="S3.E2.m1.3.3.1.1.3.2.1.2.3.3"></minus><ci id="S3.E2.m1.3.3.1.1.3.2.1.2.3.3.2.cmml" xref="S3.E2.m1.3.3.1.1.3.2.1.2.3.3.2">𝑙</ci></apply></apply></apply><ci id="S3.E2.m1.3.3.1.1.3.2.1.3.cmml" xref="S3.E2.m1.3.3.1.1.3.2.1.3">𝑙</ci></apply><apply id="S3.E2.m1.3.3.1.1.3.2.2.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2"><times id="S3.E2.m1.3.3.1.1.3.2.2.1.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.1"></times><apply id="S3.E2.m1.3.3.1.1.3.2.2.2.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.2"><csymbol cd="ambiguous" id="S3.E2.m1.3.3.1.1.3.2.2.2.1.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.2">subscript</csymbol><ci id="S3.E2.m1.3.3.1.1.3.2.2.2.2.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.2.2">𝑐</ci><apply id="S3.E2.m1.3.3.1.1.3.2.2.2.3.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.2.3"><times id="S3.E2.m1.3.3.1.1.3.2.2.2.3.1.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.2.3.1"></times><ci id="S3.E2.m1.3.3.1.1.3.2.2.2.3.2.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.2.3.2">𝑖</ci><ci id="S3.E2.m1.3.3.1.1.3.2.2.2.3.3.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.2.3.3">𝑗</ci><ci id="S3.E2.m1.3.3.1.1.3.2.2.2.3.4.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.2.3.4">𝑜</ci><ci id="S3.E2.m1.3.3.1.1.3.2.2.2.3.5.cmml" 
xref="S3.E2.m1.3.3.1.1.3.2.2.2.3.5">𝑝</ci></apply></apply><apply id="S3.E2.m1.3.3.1.1.3.2.2.3.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.3"><csymbol cd="ambiguous" id="S3.E2.m1.3.3.1.1.3.2.2.3.1.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.3">subscript</csymbol><ci id="S3.E2.m1.3.3.1.1.3.2.2.3.2.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.3.2">𝑍</ci><apply id="S3.E2.m1.3.3.1.1.3.2.2.3.3.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.3.3"><times id="S3.E2.m1.3.3.1.1.3.2.2.3.3.1.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.3.3.1"></times><ci id="S3.E2.m1.3.3.1.1.3.2.2.3.3.2.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.3.3.2">𝑜</ci><ci id="S3.E2.m1.3.3.1.1.3.2.2.3.3.3.cmml" xref="S3.E2.m1.3.3.1.1.3.2.2.3.3.3">𝑝</ci></apply></apply><ci id="S3.E2.m1.2.2.cmml" xref="S3.E2.m1.2.2">𝜈</ci></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E2.m1.3c">[c_{i}(\nu)]_{j}=\sum_{l=0}^{L}\sum_{m=-l}^{l}c_{ijop}Z_{op}(\nu),</annotation><annotation encoding="application/x-llamapun" id="S3.E2.m1.3d">[ italic_c start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_ν ) ] start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_l = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_m = - italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT italic_i italic_j italic_o italic_p end_POSTSUBSCRIPT italic_Z start_POSTSUBSCRIPT italic_o italic_p end_POSTSUBSCRIPT ( italic_ν ) ,</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(2)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS1.p3.3">where <math alttext="\nu\in\mathbb{S}^{2}" class="ltx_Math" display="inline" id="S3.SS1.p3.1.m1.1"><semantics id="S3.SS1.p3.1.m1.1a"><mrow id="S3.SS1.p3.1.m1.1.1" xref="S3.SS1.p3.1.m1.1.1.cmml"><mi 
id="S3.SS1.p3.1.m1.1.1.2" xref="S3.SS1.p3.1.m1.1.1.2.cmml">ν</mi><mo id="S3.SS1.p3.1.m1.1.1.1" xref="S3.SS1.p3.1.m1.1.1.1.cmml">∈</mo><msup id="S3.SS1.p3.1.m1.1.1.3" xref="S3.SS1.p3.1.m1.1.1.3.cmml"><mi id="S3.SS1.p3.1.m1.1.1.3.2" xref="S3.SS1.p3.1.m1.1.1.3.2.cmml">𝕊</mi><mn id="S3.SS1.p3.1.m1.1.1.3.3" xref="S3.SS1.p3.1.m1.1.1.3.3.cmml">2</mn></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p3.1.m1.1b"><apply id="S3.SS1.p3.1.m1.1.1.cmml" xref="S3.SS1.p3.1.m1.1.1"><in id="S3.SS1.p3.1.m1.1.1.1.cmml" xref="S3.SS1.p3.1.m1.1.1.1"></in><ci id="S3.SS1.p3.1.m1.1.1.2.cmml" xref="S3.SS1.p3.1.m1.1.1.2">𝜈</ci><apply id="S3.SS1.p3.1.m1.1.1.3.cmml" xref="S3.SS1.p3.1.m1.1.1.3"><csymbol cd="ambiguous" id="S3.SS1.p3.1.m1.1.1.3.1.cmml" xref="S3.SS1.p3.1.m1.1.1.3">superscript</csymbol><ci id="S3.SS1.p3.1.m1.1.1.3.2.cmml" xref="S3.SS1.p3.1.m1.1.1.3.2">𝕊</ci><cn id="S3.SS1.p3.1.m1.1.1.3.3.cmml" type="integer" xref="S3.SS1.p3.1.m1.1.1.3.3">2</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p3.1.m1.1c">\nu\in\mathbb{S}^{2}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p3.1.m1.1d">italic_ν ∈ blackboard_S start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT</annotation></semantics></math> denotes the viewing direction, and <math alttext="Z_{op}" class="ltx_Math" display="inline" id="S3.SS1.p3.2.m2.1"><semantics id="S3.SS1.p3.2.m2.1a"><msub id="S3.SS1.p3.2.m2.1.1" xref="S3.SS1.p3.2.m2.1.1.cmml"><mi id="S3.SS1.p3.2.m2.1.1.2" xref="S3.SS1.p3.2.m2.1.1.2.cmml">Z</mi><mrow id="S3.SS1.p3.2.m2.1.1.3" xref="S3.SS1.p3.2.m2.1.1.3.cmml"><mi id="S3.SS1.p3.2.m2.1.1.3.2" xref="S3.SS1.p3.2.m2.1.1.3.2.cmml">o</mi><mo id="S3.SS1.p3.2.m2.1.1.3.1" xref="S3.SS1.p3.2.m2.1.1.3.1.cmml"></mo><mi id="S3.SS1.p3.2.m2.1.1.3.3" xref="S3.SS1.p3.2.m2.1.1.3.3.cmml">p</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.SS1.p3.2.m2.1b"><apply id="S3.SS1.p3.2.m2.1.1.cmml" xref="S3.SS1.p3.2.m2.1.1"><csymbol cd="ambiguous" 
id="S3.SS1.p3.2.m2.1.1.1.cmml" xref="S3.SS1.p3.2.m2.1.1">subscript</csymbol><ci id="S3.SS1.p3.2.m2.1.1.2.cmml" xref="S3.SS1.p3.2.m2.1.1.2">𝑍</ci><apply id="S3.SS1.p3.2.m2.1.1.3.cmml" xref="S3.SS1.p3.2.m2.1.1.3"><times id="S3.SS1.p3.2.m2.1.1.3.1.cmml" xref="S3.SS1.p3.2.m2.1.1.3.1"></times><ci id="S3.SS1.p3.2.m2.1.1.3.2.cmml" xref="S3.SS1.p3.2.m2.1.1.3.2">𝑜</ci><ci id="S3.SS1.p3.2.m2.1.1.3.3.cmml" xref="S3.SS1.p3.2.m2.1.1.3.3">𝑝</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p3.2.m2.1c">Z_{op}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p3.2.m2.1d">italic_Z start_POSTSUBSCRIPT italic_o italic_p end_POSTSUBSCRIPT</annotation></semantics></math> represents the spherical harmonics. The opacity at the location <math alttext="x\in\mathbb{R}^{3}" class="ltx_Math" display="inline" id="S3.SS1.p3.3.m3.1"><semantics id="S3.SS1.p3.3.m3.1a"><mrow id="S3.SS1.p3.3.m3.1.1" xref="S3.SS1.p3.3.m3.1.1.cmml"><mi id="S3.SS1.p3.3.m3.1.1.2" xref="S3.SS1.p3.3.m3.1.1.2.cmml">x</mi><mo id="S3.SS1.p3.3.m3.1.1.1" xref="S3.SS1.p3.3.m3.1.1.1.cmml">∈</mo><msup id="S3.SS1.p3.3.m3.1.1.3" xref="S3.SS1.p3.3.m3.1.1.3.cmml"><mi id="S3.SS1.p3.3.m3.1.1.3.2" xref="S3.SS1.p3.3.m3.1.1.3.2.cmml">ℝ</mi><mn id="S3.SS1.p3.3.m3.1.1.3.3" xref="S3.SS1.p3.3.m3.1.1.3.3.cmml">3</mn></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p3.3.m3.1b"><apply id="S3.SS1.p3.3.m3.1.1.cmml" xref="S3.SS1.p3.3.m3.1.1"><in id="S3.SS1.p3.3.m3.1.1.1.cmml" xref="S3.SS1.p3.3.m3.1.1.1"></in><ci id="S3.SS1.p3.3.m3.1.1.2.cmml" xref="S3.SS1.p3.3.m3.1.1.2">𝑥</ci><apply id="S3.SS1.p3.3.m3.1.1.3.cmml" xref="S3.SS1.p3.3.m3.1.1.3"><csymbol cd="ambiguous" id="S3.SS1.p3.3.m3.1.1.3.1.cmml" xref="S3.SS1.p3.3.m3.1.1.3">superscript</csymbol><ci id="S3.SS1.p3.3.m3.1.1.3.2.cmml" xref="S3.SS1.p3.3.m3.1.1.3.2">ℝ</ci><cn id="S3.SS1.p3.3.m3.1.1.3.3.cmml" type="integer" xref="S3.SS1.p3.3.m3.1.1.3.3">3</cn></apply></apply></annotation-xml><annotation 
encoding="application/x-tex" id="S3.SS1.p3.3.m3.1c">x\in\mathbb{R}^{3}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p3.3.m3.1d">italic_x ∈ blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT</annotation></semantics></math> can be derived by:</p> <table class="ltx_equation ltx_eqn_table" id="S3.E3"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\sigma(x)=\sum_{i=1}^{G}\sigma_{i}g_{i}(x)," class="ltx_Math" display="block" id="S3.E3.m1.3"><semantics id="S3.E3.m1.3a"><mrow id="S3.E3.m1.3.3.1" xref="S3.E3.m1.3.3.1.1.cmml"><mrow id="S3.E3.m1.3.3.1.1" xref="S3.E3.m1.3.3.1.1.cmml"><mrow id="S3.E3.m1.3.3.1.1.2" xref="S3.E3.m1.3.3.1.1.2.cmml"><mi id="S3.E3.m1.3.3.1.1.2.2" xref="S3.E3.m1.3.3.1.1.2.2.cmml">σ</mi><mo id="S3.E3.m1.3.3.1.1.2.1" xref="S3.E3.m1.3.3.1.1.2.1.cmml"></mo><mrow id="S3.E3.m1.3.3.1.1.2.3.2" xref="S3.E3.m1.3.3.1.1.2.cmml"><mo id="S3.E3.m1.3.3.1.1.2.3.2.1" stretchy="false" xref="S3.E3.m1.3.3.1.1.2.cmml">(</mo><mi id="S3.E3.m1.1.1" xref="S3.E3.m1.1.1.cmml">x</mi><mo id="S3.E3.m1.3.3.1.1.2.3.2.2" stretchy="false" xref="S3.E3.m1.3.3.1.1.2.cmml">)</mo></mrow></mrow><mo id="S3.E3.m1.3.3.1.1.1" rspace="0.111em" xref="S3.E3.m1.3.3.1.1.1.cmml">=</mo><mrow id="S3.E3.m1.3.3.1.1.3" xref="S3.E3.m1.3.3.1.1.3.cmml"><munderover id="S3.E3.m1.3.3.1.1.3.1" xref="S3.E3.m1.3.3.1.1.3.1.cmml"><mo id="S3.E3.m1.3.3.1.1.3.1.2.2" movablelimits="false" xref="S3.E3.m1.3.3.1.1.3.1.2.2.cmml">∑</mo><mrow id="S3.E3.m1.3.3.1.1.3.1.2.3" xref="S3.E3.m1.3.3.1.1.3.1.2.3.cmml"><mi id="S3.E3.m1.3.3.1.1.3.1.2.3.2" xref="S3.E3.m1.3.3.1.1.3.1.2.3.2.cmml">i</mi><mo id="S3.E3.m1.3.3.1.1.3.1.2.3.1" xref="S3.E3.m1.3.3.1.1.3.1.2.3.1.cmml">=</mo><mn id="S3.E3.m1.3.3.1.1.3.1.2.3.3" xref="S3.E3.m1.3.3.1.1.3.1.2.3.3.cmml">1</mn></mrow><mi id="S3.E3.m1.3.3.1.1.3.1.3" xref="S3.E3.m1.3.3.1.1.3.1.3.cmml">G</mi></munderover><mrow id="S3.E3.m1.3.3.1.1.3.2" 
xref="S3.E3.m1.3.3.1.1.3.2.cmml"><msub id="S3.E3.m1.3.3.1.1.3.2.2" xref="S3.E3.m1.3.3.1.1.3.2.2.cmml"><mi id="S3.E3.m1.3.3.1.1.3.2.2.2" xref="S3.E3.m1.3.3.1.1.3.2.2.2.cmml">σ</mi><mi id="S3.E3.m1.3.3.1.1.3.2.2.3" xref="S3.E3.m1.3.3.1.1.3.2.2.3.cmml">i</mi></msub><mo id="S3.E3.m1.3.3.1.1.3.2.1" xref="S3.E3.m1.3.3.1.1.3.2.1.cmml"></mo><msub id="S3.E3.m1.3.3.1.1.3.2.3" xref="S3.E3.m1.3.3.1.1.3.2.3.cmml"><mi id="S3.E3.m1.3.3.1.1.3.2.3.2" xref="S3.E3.m1.3.3.1.1.3.2.3.2.cmml">g</mi><mi id="S3.E3.m1.3.3.1.1.3.2.3.3" xref="S3.E3.m1.3.3.1.1.3.2.3.3.cmml">i</mi></msub><mo id="S3.E3.m1.3.3.1.1.3.2.1a" xref="S3.E3.m1.3.3.1.1.3.2.1.cmml"></mo><mrow id="S3.E3.m1.3.3.1.1.3.2.4.2" xref="S3.E3.m1.3.3.1.1.3.2.cmml"><mo id="S3.E3.m1.3.3.1.1.3.2.4.2.1" stretchy="false" xref="S3.E3.m1.3.3.1.1.3.2.cmml">(</mo><mi id="S3.E3.m1.2.2" xref="S3.E3.m1.2.2.cmml">x</mi><mo id="S3.E3.m1.3.3.1.1.3.2.4.2.2" stretchy="false" xref="S3.E3.m1.3.3.1.1.3.2.cmml">)</mo></mrow></mrow></mrow></mrow><mo id="S3.E3.m1.3.3.1.2" xref="S3.E3.m1.3.3.1.1.cmml">,</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.E3.m1.3b"><apply id="S3.E3.m1.3.3.1.1.cmml" xref="S3.E3.m1.3.3.1"><eq id="S3.E3.m1.3.3.1.1.1.cmml" xref="S3.E3.m1.3.3.1.1.1"></eq><apply id="S3.E3.m1.3.3.1.1.2.cmml" xref="S3.E3.m1.3.3.1.1.2"><times id="S3.E3.m1.3.3.1.1.2.1.cmml" xref="S3.E3.m1.3.3.1.1.2.1"></times><ci id="S3.E3.m1.3.3.1.1.2.2.cmml" xref="S3.E3.m1.3.3.1.1.2.2">𝜎</ci><ci id="S3.E3.m1.1.1.cmml" xref="S3.E3.m1.1.1">𝑥</ci></apply><apply id="S3.E3.m1.3.3.1.1.3.cmml" xref="S3.E3.m1.3.3.1.1.3"><apply id="S3.E3.m1.3.3.1.1.3.1.cmml" xref="S3.E3.m1.3.3.1.1.3.1"><csymbol cd="ambiguous" id="S3.E3.m1.3.3.1.1.3.1.1.cmml" xref="S3.E3.m1.3.3.1.1.3.1">superscript</csymbol><apply id="S3.E3.m1.3.3.1.1.3.1.2.cmml" xref="S3.E3.m1.3.3.1.1.3.1"><csymbol cd="ambiguous" id="S3.E3.m1.3.3.1.1.3.1.2.1.cmml" xref="S3.E3.m1.3.3.1.1.3.1">subscript</csymbol><sum id="S3.E3.m1.3.3.1.1.3.1.2.2.cmml" xref="S3.E3.m1.3.3.1.1.3.1.2.2"></sum><apply 
id="S3.E3.m1.3.3.1.1.3.1.2.3.cmml" xref="S3.E3.m1.3.3.1.1.3.1.2.3"><eq id="S3.E3.m1.3.3.1.1.3.1.2.3.1.cmml" xref="S3.E3.m1.3.3.1.1.3.1.2.3.1"></eq><ci id="S3.E3.m1.3.3.1.1.3.1.2.3.2.cmml" xref="S3.E3.m1.3.3.1.1.3.1.2.3.2">𝑖</ci><cn id="S3.E3.m1.3.3.1.1.3.1.2.3.3.cmml" type="integer" xref="S3.E3.m1.3.3.1.1.3.1.2.3.3">1</cn></apply></apply><ci id="S3.E3.m1.3.3.1.1.3.1.3.cmml" xref="S3.E3.m1.3.3.1.1.3.1.3">𝐺</ci></apply><apply id="S3.E3.m1.3.3.1.1.3.2.cmml" xref="S3.E3.m1.3.3.1.1.3.2"><times id="S3.E3.m1.3.3.1.1.3.2.1.cmml" xref="S3.E3.m1.3.3.1.1.3.2.1"></times><apply id="S3.E3.m1.3.3.1.1.3.2.2.cmml" xref="S3.E3.m1.3.3.1.1.3.2.2"><csymbol cd="ambiguous" id="S3.E3.m1.3.3.1.1.3.2.2.1.cmml" xref="S3.E3.m1.3.3.1.1.3.2.2">subscript</csymbol><ci id="S3.E3.m1.3.3.1.1.3.2.2.2.cmml" xref="S3.E3.m1.3.3.1.1.3.2.2.2">𝜎</ci><ci id="S3.E3.m1.3.3.1.1.3.2.2.3.cmml" xref="S3.E3.m1.3.3.1.1.3.2.2.3">𝑖</ci></apply><apply id="S3.E3.m1.3.3.1.1.3.2.3.cmml" xref="S3.E3.m1.3.3.1.1.3.2.3"><csymbol cd="ambiguous" id="S3.E3.m1.3.3.1.1.3.2.3.1.cmml" xref="S3.E3.m1.3.3.1.1.3.2.3">subscript</csymbol><ci id="S3.E3.m1.3.3.1.1.3.2.3.2.cmml" xref="S3.E3.m1.3.3.1.1.3.2.3.2">𝑔</ci><ci id="S3.E3.m1.3.3.1.1.3.2.3.3.cmml" xref="S3.E3.m1.3.3.1.1.3.2.3.3">𝑖</ci></apply><ci id="S3.E3.m1.2.2.cmml" xref="S3.E3.m1.2.2">𝑥</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E3.m1.3c">\sigma(x)=\sum_{i=1}^{G}\sigma_{i}g_{i}(x),</annotation><annotation encoding="application/x-llamapun" id="S3.E3.m1.3d">italic_σ ( italic_x ) = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_g start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_x ) ,</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation 
ltx_align_right">(3)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS1.p3.7">where <math alttext="G" class="ltx_Math" display="inline" id="S3.SS1.p3.4.m1.1"><semantics id="S3.SS1.p3.4.m1.1a"><mi id="S3.SS1.p3.4.m1.1.1" xref="S3.SS1.p3.4.m1.1.1.cmml">G</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p3.4.m1.1b"><ci id="S3.SS1.p3.4.m1.1.1.cmml" xref="S3.SS1.p3.4.m1.1.1">𝐺</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p3.4.m1.1c">G</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p3.4.m1.1d">italic_G</annotation></semantics></math> is the number of Gaussians. The radiance <math alttext="c(x,\nu)" class="ltx_Math" display="inline" id="S3.SS1.p3.5.m2.2"><semantics id="S3.SS1.p3.5.m2.2a"><mrow id="S3.SS1.p3.5.m2.2.3" xref="S3.SS1.p3.5.m2.2.3.cmml"><mi id="S3.SS1.p3.5.m2.2.3.2" xref="S3.SS1.p3.5.m2.2.3.2.cmml">c</mi><mo id="S3.SS1.p3.5.m2.2.3.1" xref="S3.SS1.p3.5.m2.2.3.1.cmml"></mo><mrow id="S3.SS1.p3.5.m2.2.3.3.2" xref="S3.SS1.p3.5.m2.2.3.3.1.cmml"><mo id="S3.SS1.p3.5.m2.2.3.3.2.1" stretchy="false" xref="S3.SS1.p3.5.m2.2.3.3.1.cmml">(</mo><mi id="S3.SS1.p3.5.m2.1.1" xref="S3.SS1.p3.5.m2.1.1.cmml">x</mi><mo id="S3.SS1.p3.5.m2.2.3.3.2.2" xref="S3.SS1.p3.5.m2.2.3.3.1.cmml">,</mo><mi id="S3.SS1.p3.5.m2.2.2" xref="S3.SS1.p3.5.m2.2.2.cmml">ν</mi><mo id="S3.SS1.p3.5.m2.2.3.3.2.3" stretchy="false" xref="S3.SS1.p3.5.m2.2.3.3.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p3.5.m2.2b"><apply id="S3.SS1.p3.5.m2.2.3.cmml" xref="S3.SS1.p3.5.m2.2.3"><times id="S3.SS1.p3.5.m2.2.3.1.cmml" xref="S3.SS1.p3.5.m2.2.3.1"></times><ci id="S3.SS1.p3.5.m2.2.3.2.cmml" xref="S3.SS1.p3.5.m2.2.3.2">𝑐</ci><interval closure="open" id="S3.SS1.p3.5.m2.2.3.3.1.cmml" xref="S3.SS1.p3.5.m2.2.3.3.2"><ci id="S3.SS1.p3.5.m2.1.1.cmml" xref="S3.SS1.p3.5.m2.1.1">𝑥</ci><ci id="S3.SS1.p3.5.m2.2.2.cmml" xref="S3.SS1.p3.5.m2.2.2">𝜈</ci></interval></apply></annotation-xml><annotation 
encoding="application/x-tex" id="S3.SS1.p3.5.m2.2c">c(x,\nu)</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p3.5.m2.2d">italic_c ( italic_x , italic_ν )</annotation></semantics></math> at location <math alttext="x" class="ltx_Math" display="inline" id="S3.SS1.p3.6.m3.1"><semantics id="S3.SS1.p3.6.m3.1a"><mi id="S3.SS1.p3.6.m3.1.1" xref="S3.SS1.p3.6.m3.1.1.cmml">x</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p3.6.m3.1b"><ci id="S3.SS1.p3.6.m3.1.1.cmml" xref="S3.SS1.p3.6.m3.1.1">𝑥</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p3.6.m3.1c">x</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p3.6.m3.1d">italic_x</annotation></semantics></math> from viewing direction <math alttext="\nu" class="ltx_Math" display="inline" id="S3.SS1.p3.7.m4.1"><semantics id="S3.SS1.p3.7.m4.1a"><mi id="S3.SS1.p3.7.m4.1.1" xref="S3.SS1.p3.7.m4.1.1.cmml">ν</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p3.7.m4.1b"><ci id="S3.SS1.p3.7.m4.1.1.cmml" xref="S3.SS1.p3.7.m4.1.1">𝜈</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p3.7.m4.1c">\nu</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p3.7.m4.1d">italic_ν</annotation></semantics></math> is given by:</p> <table class="ltx_equation ltx_eqn_table" id="S3.E4"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="c(x,\nu)=\sum_{i=1}^{G}\frac{c_{i}(\nu)\sigma_{i}g_{i}(x)}{\sigma(x)}." 
class="ltx_Math" display="block" id="S3.E4.m1.6"><semantics id="S3.E4.m1.6a"><mrow id="S3.E4.m1.6.6.1" xref="S3.E4.m1.6.6.1.1.cmml"><mrow id="S3.E4.m1.6.6.1.1" xref="S3.E4.m1.6.6.1.1.cmml"><mrow id="S3.E4.m1.6.6.1.1.2" xref="S3.E4.m1.6.6.1.1.2.cmml"><mi id="S3.E4.m1.6.6.1.1.2.2" xref="S3.E4.m1.6.6.1.1.2.2.cmml">c</mi><mo id="S3.E4.m1.6.6.1.1.2.1" xref="S3.E4.m1.6.6.1.1.2.1.cmml"></mo><mrow id="S3.E4.m1.6.6.1.1.2.3.2" xref="S3.E4.m1.6.6.1.1.2.3.1.cmml"><mo id="S3.E4.m1.6.6.1.1.2.3.2.1" stretchy="false" xref="S3.E4.m1.6.6.1.1.2.3.1.cmml">(</mo><mi id="S3.E4.m1.4.4" xref="S3.E4.m1.4.4.cmml">x</mi><mo id="S3.E4.m1.6.6.1.1.2.3.2.2" xref="S3.E4.m1.6.6.1.1.2.3.1.cmml">,</mo><mi id="S3.E4.m1.5.5" xref="S3.E4.m1.5.5.cmml">ν</mi><mo id="S3.E4.m1.6.6.1.1.2.3.2.3" stretchy="false" xref="S3.E4.m1.6.6.1.1.2.3.1.cmml">)</mo></mrow></mrow><mo id="S3.E4.m1.6.6.1.1.1" rspace="0.111em" xref="S3.E4.m1.6.6.1.1.1.cmml">=</mo><mrow id="S3.E4.m1.6.6.1.1.3" xref="S3.E4.m1.6.6.1.1.3.cmml"><munderover id="S3.E4.m1.6.6.1.1.3.1" xref="S3.E4.m1.6.6.1.1.3.1.cmml"><mo id="S3.E4.m1.6.6.1.1.3.1.2.2" movablelimits="false" xref="S3.E4.m1.6.6.1.1.3.1.2.2.cmml">∑</mo><mrow id="S3.E4.m1.6.6.1.1.3.1.2.3" xref="S3.E4.m1.6.6.1.1.3.1.2.3.cmml"><mi id="S3.E4.m1.6.6.1.1.3.1.2.3.2" xref="S3.E4.m1.6.6.1.1.3.1.2.3.2.cmml">i</mi><mo id="S3.E4.m1.6.6.1.1.3.1.2.3.1" xref="S3.E4.m1.6.6.1.1.3.1.2.3.1.cmml">=</mo><mn id="S3.E4.m1.6.6.1.1.3.1.2.3.3" xref="S3.E4.m1.6.6.1.1.3.1.2.3.3.cmml">1</mn></mrow><mi id="S3.E4.m1.6.6.1.1.3.1.3" xref="S3.E4.m1.6.6.1.1.3.1.3.cmml">G</mi></munderover><mfrac id="S3.E4.m1.3.3" xref="S3.E4.m1.3.3.cmml"><mrow id="S3.E4.m1.2.2.2" xref="S3.E4.m1.2.2.2.cmml"><msub id="S3.E4.m1.2.2.2.4" xref="S3.E4.m1.2.2.2.4.cmml"><mi id="S3.E4.m1.2.2.2.4.2" xref="S3.E4.m1.2.2.2.4.2.cmml">c</mi><mi id="S3.E4.m1.2.2.2.4.3" xref="S3.E4.m1.2.2.2.4.3.cmml">i</mi></msub><mo id="S3.E4.m1.2.2.2.3" xref="S3.E4.m1.2.2.2.3.cmml"></mo><mrow id="S3.E4.m1.2.2.2.5.2" xref="S3.E4.m1.2.2.2.cmml"><mo 
id="S3.E4.m1.2.2.2.5.2.1" stretchy="false" xref="S3.E4.m1.2.2.2.cmml">(</mo><mi id="S3.E4.m1.1.1.1.1" xref="S3.E4.m1.1.1.1.1.cmml">ν</mi><mo id="S3.E4.m1.2.2.2.5.2.2" stretchy="false" xref="S3.E4.m1.2.2.2.cmml">)</mo></mrow><mo id="S3.E4.m1.2.2.2.3a" xref="S3.E4.m1.2.2.2.3.cmml"></mo><msub id="S3.E4.m1.2.2.2.6" xref="S3.E4.m1.2.2.2.6.cmml"><mi id="S3.E4.m1.2.2.2.6.2" xref="S3.E4.m1.2.2.2.6.2.cmml">σ</mi><mi id="S3.E4.m1.2.2.2.6.3" xref="S3.E4.m1.2.2.2.6.3.cmml">i</mi></msub><mo id="S3.E4.m1.2.2.2.3b" xref="S3.E4.m1.2.2.2.3.cmml"></mo><msub id="S3.E4.m1.2.2.2.7" xref="S3.E4.m1.2.2.2.7.cmml"><mi id="S3.E4.m1.2.2.2.7.2" xref="S3.E4.m1.2.2.2.7.2.cmml">g</mi><mi id="S3.E4.m1.2.2.2.7.3" xref="S3.E4.m1.2.2.2.7.3.cmml">i</mi></msub><mo id="S3.E4.m1.2.2.2.3c" xref="S3.E4.m1.2.2.2.3.cmml"></mo><mrow id="S3.E4.m1.2.2.2.8.2" xref="S3.E4.m1.2.2.2.cmml"><mo id="S3.E4.m1.2.2.2.8.2.1" stretchy="false" xref="S3.E4.m1.2.2.2.cmml">(</mo><mi id="S3.E4.m1.2.2.2.2" xref="S3.E4.m1.2.2.2.2.cmml">x</mi><mo id="S3.E4.m1.2.2.2.8.2.2" stretchy="false" xref="S3.E4.m1.2.2.2.cmml">)</mo></mrow></mrow><mrow id="S3.E4.m1.3.3.3" xref="S3.E4.m1.3.3.3.cmml"><mi id="S3.E4.m1.3.3.3.3" xref="S3.E4.m1.3.3.3.3.cmml">σ</mi><mo id="S3.E4.m1.3.3.3.2" xref="S3.E4.m1.3.3.3.2.cmml"></mo><mrow id="S3.E4.m1.3.3.3.4.2" xref="S3.E4.m1.3.3.3.cmml"><mo id="S3.E4.m1.3.3.3.4.2.1" stretchy="false" xref="S3.E4.m1.3.3.3.cmml">(</mo><mi id="S3.E4.m1.3.3.3.1" xref="S3.E4.m1.3.3.3.1.cmml">x</mi><mo id="S3.E4.m1.3.3.3.4.2.2" stretchy="false" xref="S3.E4.m1.3.3.3.cmml">)</mo></mrow></mrow></mfrac></mrow></mrow><mo id="S3.E4.m1.6.6.1.2" lspace="0em" xref="S3.E4.m1.6.6.1.1.cmml">.</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.E4.m1.6b"><apply id="S3.E4.m1.6.6.1.1.cmml" xref="S3.E4.m1.6.6.1"><eq id="S3.E4.m1.6.6.1.1.1.cmml" xref="S3.E4.m1.6.6.1.1.1"></eq><apply id="S3.E4.m1.6.6.1.1.2.cmml" xref="S3.E4.m1.6.6.1.1.2"><times id="S3.E4.m1.6.6.1.1.2.1.cmml" xref="S3.E4.m1.6.6.1.1.2.1"></times><ci 
id="S3.E4.m1.6.6.1.1.2.2.cmml" xref="S3.E4.m1.6.6.1.1.2.2">𝑐</ci><interval closure="open" id="S3.E4.m1.6.6.1.1.2.3.1.cmml" xref="S3.E4.m1.6.6.1.1.2.3.2"><ci id="S3.E4.m1.4.4.cmml" xref="S3.E4.m1.4.4">𝑥</ci><ci id="S3.E4.m1.5.5.cmml" xref="S3.E4.m1.5.5">𝜈</ci></interval></apply><apply id="S3.E4.m1.6.6.1.1.3.cmml" xref="S3.E4.m1.6.6.1.1.3"><apply id="S3.E4.m1.6.6.1.1.3.1.cmml" xref="S3.E4.m1.6.6.1.1.3.1"><csymbol cd="ambiguous" id="S3.E4.m1.6.6.1.1.3.1.1.cmml" xref="S3.E4.m1.6.6.1.1.3.1">superscript</csymbol><apply id="S3.E4.m1.6.6.1.1.3.1.2.cmml" xref="S3.E4.m1.6.6.1.1.3.1"><csymbol cd="ambiguous" id="S3.E4.m1.6.6.1.1.3.1.2.1.cmml" xref="S3.E4.m1.6.6.1.1.3.1">subscript</csymbol><sum id="S3.E4.m1.6.6.1.1.3.1.2.2.cmml" xref="S3.E4.m1.6.6.1.1.3.1.2.2"></sum><apply id="S3.E4.m1.6.6.1.1.3.1.2.3.cmml" xref="S3.E4.m1.6.6.1.1.3.1.2.3"><eq id="S3.E4.m1.6.6.1.1.3.1.2.3.1.cmml" xref="S3.E4.m1.6.6.1.1.3.1.2.3.1"></eq><ci id="S3.E4.m1.6.6.1.1.3.1.2.3.2.cmml" xref="S3.E4.m1.6.6.1.1.3.1.2.3.2">𝑖</ci><cn id="S3.E4.m1.6.6.1.1.3.1.2.3.3.cmml" type="integer" xref="S3.E4.m1.6.6.1.1.3.1.2.3.3">1</cn></apply></apply><ci id="S3.E4.m1.6.6.1.1.3.1.3.cmml" xref="S3.E4.m1.6.6.1.1.3.1.3">𝐺</ci></apply><apply id="S3.E4.m1.3.3.cmml" xref="S3.E4.m1.3.3"><divide id="S3.E4.m1.3.3.4.cmml" xref="S3.E4.m1.3.3"></divide><apply id="S3.E4.m1.2.2.2.cmml" xref="S3.E4.m1.2.2.2"><times id="S3.E4.m1.2.2.2.3.cmml" xref="S3.E4.m1.2.2.2.3"></times><apply id="S3.E4.m1.2.2.2.4.cmml" xref="S3.E4.m1.2.2.2.4"><csymbol cd="ambiguous" id="S3.E4.m1.2.2.2.4.1.cmml" xref="S3.E4.m1.2.2.2.4">subscript</csymbol><ci id="S3.E4.m1.2.2.2.4.2.cmml" xref="S3.E4.m1.2.2.2.4.2">𝑐</ci><ci id="S3.E4.m1.2.2.2.4.3.cmml" xref="S3.E4.m1.2.2.2.4.3">𝑖</ci></apply><ci id="S3.E4.m1.1.1.1.1.cmml" xref="S3.E4.m1.1.1.1.1">𝜈</ci><apply id="S3.E4.m1.2.2.2.6.cmml" xref="S3.E4.m1.2.2.2.6"><csymbol cd="ambiguous" id="S3.E4.m1.2.2.2.6.1.cmml" xref="S3.E4.m1.2.2.2.6">subscript</csymbol><ci id="S3.E4.m1.2.2.2.6.2.cmml" xref="S3.E4.m1.2.2.2.6.2">𝜎</ci><ci 
id="S3.E4.m1.2.2.2.6.3.cmml" xref="S3.E4.m1.2.2.2.6.3">𝑖</ci></apply><apply id="S3.E4.m1.2.2.2.7.cmml" xref="S3.E4.m1.2.2.2.7"><csymbol cd="ambiguous" id="S3.E4.m1.2.2.2.7.1.cmml" xref="S3.E4.m1.2.2.2.7">subscript</csymbol><ci id="S3.E4.m1.2.2.2.7.2.cmml" xref="S3.E4.m1.2.2.2.7.2">𝑔</ci><ci id="S3.E4.m1.2.2.2.7.3.cmml" xref="S3.E4.m1.2.2.2.7.3">𝑖</ci></apply><ci id="S3.E4.m1.2.2.2.2.cmml" xref="S3.E4.m1.2.2.2.2">𝑥</ci></apply><apply id="S3.E4.m1.3.3.3.cmml" xref="S3.E4.m1.3.3.3"><times id="S3.E4.m1.3.3.3.2.cmml" xref="S3.E4.m1.3.3.3.2"></times><ci id="S3.E4.m1.3.3.3.3.cmml" xref="S3.E4.m1.3.3.3.3">𝜎</ci><ci id="S3.E4.m1.3.3.3.1.cmml" xref="S3.E4.m1.3.3.3.1">𝑥</ci></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E4.m1.6c">c(x,\nu)=\sum_{i=1}^{G}\frac{c_{i}(\nu)\sigma_{i}g_{i}(x)}{\sigma(x)}.</annotation><annotation encoding="application/x-llamapun" id="S3.E4.m1.6d">italic_c ( italic_x , italic_ν ) = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT divide start_ARG italic_c start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_ν ) italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_g start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_x ) end_ARG start_ARG italic_σ ( italic_x ) end_ARG .</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(4)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS1.p3.9">This representation defines both an opacity field and a view-dependent radiance field, with the radiance at each location computed as the opacity-weighted average of the per-Gaussian colors.</p> </div> <div class="ltx_para ltx_noindent" id="S3.SS1.p4"> <p class="ltx_p" id="S3.SS1.p4.1"><span class="ltx_text ltx_font_bold" id="S3.SS1.p4.1.1">Differentiable Rendering.</span> To render the radiance field into an image <math alttext="\hat{J}" 
class="ltx_Math" display="inline" id="S3.SS1.p4.1.m1.1"><semantics id="S3.SS1.p4.1.m1.1a"><mover accent="true" id="S3.SS1.p4.1.m1.1.1" xref="S3.SS1.p4.1.m1.1.1.cmml"><mi id="S3.SS1.p4.1.m1.1.1.2" xref="S3.SS1.p4.1.m1.1.1.2.cmml">J</mi><mo id="S3.SS1.p4.1.m1.1.1.1" xref="S3.SS1.p4.1.m1.1.1.1.cmml">^</mo></mover><annotation-xml encoding="MathML-Content" id="S3.SS1.p4.1.m1.1b"><apply id="S3.SS1.p4.1.m1.1.1.cmml" xref="S3.SS1.p4.1.m1.1.1"><ci id="S3.SS1.p4.1.m1.1.1.1.cmml" xref="S3.SS1.p4.1.m1.1.1.1">^</ci><ci id="S3.SS1.p4.1.m1.1.1.2.cmml" xref="S3.SS1.p4.1.m1.1.1.2">𝐽</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p4.1.m1.1c">\hat{J}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p4.1.m1.1d">over^ start_ARG italic_J end_ARG</annotation></semantics></math>, we integrate radiances along each line of sight using the emission-absorption equation <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib40" title="">40</a>]</cite> following 3D Gaussian Splatting <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib33" title="">33</a>]</cite>:</p> <table class="ltx_equation ltx_eqn_table" id="S3.E5"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\begin{split}\hat{J}&=\text{Rend}(\mathbf{G},\pi)\\ &=\int_{0}^{\infty}c(x_{t},\nu)\sigma(x_{t})\exp\left(-\int_{0}^{t}\sigma(x_{% \tau})d\tau\right)dt,\end{split}" class="ltx_Math" display="block" id="S3.E5.m1.43"><semantics id="S3.E5.m1.43a"><mtable columnspacing="0pt" displaystyle="true" id="S3.E5.m1.43.43.2" rowspacing="0pt"><mtr id="S3.E5.m1.43.43.2a"><mtd class="ltx_align_right" columnalign="right" id="S3.E5.m1.43.43.2b"><mover accent="true" id="S3.E5.m1.1.1.1.1.1.1" xref="S3.E5.m1.1.1.1.1.1.1.cmml"><mi id="S3.E5.m1.1.1.1.1.1.1.3" 
xref="S3.E5.m1.1.1.1.1.1.1.3.cmml">J</mi><mo id="S3.E5.m1.1.1.1.1.1.1.2" xref="S3.E5.m1.1.1.1.1.1.1.2.cmml">^</mo></mover></mtd><mtd class="ltx_align_left" columnalign="left" id="S3.E5.m1.43.43.2c"><mrow id="S3.E5.m1.8.8.8.8.7"><mi id="S3.E5.m1.8.8.8.8.7.8" xref="S3.E5.m1.42.42.1.1.1.cmml"></mi><mo id="S3.E5.m1.2.2.2.2.1.1" xref="S3.E5.m1.2.2.2.2.1.1.cmml">=</mo><mrow id="S3.E5.m1.8.8.8.8.7.9"><mtext id="S3.E5.m1.3.3.3.3.2.2" xref="S3.E5.m1.3.3.3.3.2.2a.cmml">Rend</mtext><mo id="S3.E5.m1.8.8.8.8.7.9.1" xref="S3.E5.m1.42.42.1.1.1.cmml"></mo><mrow id="S3.E5.m1.8.8.8.8.7.9.2"><mo id="S3.E5.m1.4.4.4.4.3.3" stretchy="false" xref="S3.E5.m1.42.42.1.1.1.cmml">(</mo><mi id="S3.E5.m1.5.5.5.5.4.4" xref="S3.E5.m1.5.5.5.5.4.4.cmml">𝐆</mi><mo id="S3.E5.m1.6.6.6.6.5.5" xref="S3.E5.m1.42.42.1.1.1.cmml">,</mo><mi id="S3.E5.m1.7.7.7.7.6.6" xref="S3.E5.m1.7.7.7.7.6.6.cmml">π</mi><mo id="S3.E5.m1.8.8.8.8.7.7" stretchy="false" xref="S3.E5.m1.42.42.1.1.1.cmml">)</mo></mrow></mrow></mrow></mtd></mtr><mtr id="S3.E5.m1.43.43.2d"><mtd id="S3.E5.m1.43.43.2e" xref="S3.E5.m1.42.42.1.1.1.cmml"></mtd><mtd class="ltx_align_left" columnalign="left" id="S3.E5.m1.43.43.2f"><mrow id="S3.E5.m1.43.43.2.42.34.34.34"><mrow id="S3.E5.m1.43.43.2.42.34.34.34.1"><mi id="S3.E5.m1.43.43.2.42.34.34.34.1.4" xref="S3.E5.m1.42.42.1.1.1.cmml"></mi><mo id="S3.E5.m1.9.9.9.1.1.1" rspace="0.111em" xref="S3.E5.m1.9.9.9.1.1.1.cmml">=</mo><mrow id="S3.E5.m1.43.43.2.42.34.34.34.1.3"><msubsup id="S3.E5.m1.43.43.2.42.34.34.34.1.3.4"><mo id="S3.E5.m1.10.10.10.2.2.2" xref="S3.E5.m1.10.10.10.2.2.2.cmml">∫</mo><mn id="S3.E5.m1.11.11.11.3.3.3.1" xref="S3.E5.m1.11.11.11.3.3.3.1.cmml">0</mn><mi id="S3.E5.m1.12.12.12.4.4.4.1" mathvariant="normal" xref="S3.E5.m1.12.12.12.4.4.4.1.cmml">∞</mi></msubsup><mrow id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3"><mi id="S3.E5.m1.13.13.13.5.5.5" xref="S3.E5.m1.13.13.13.5.5.5.cmml">c</mi><mo id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.4" xref="S3.E5.m1.42.42.1.1.1.cmml"></mo><mrow 
id="S3.E5.m1.43.43.2.42.34.34.34.1.1.1.1.1"><mo id="S3.E5.m1.14.14.14.6.6.6" stretchy="false" xref="S3.E5.m1.42.42.1.1.1.cmml">(</mo><msub id="S3.E5.m1.43.43.2.42.34.34.34.1.1.1.1.1.1"><mi id="S3.E5.m1.15.15.15.7.7.7" xref="S3.E5.m1.15.15.15.7.7.7.cmml">x</mi><mi id="S3.E5.m1.16.16.16.8.8.8.1" xref="S3.E5.m1.16.16.16.8.8.8.1.cmml">t</mi></msub><mo id="S3.E5.m1.17.17.17.9.9.9" xref="S3.E5.m1.42.42.1.1.1.cmml">,</mo><mi id="S3.E5.m1.18.18.18.10.10.10" xref="S3.E5.m1.18.18.18.10.10.10.cmml">ν</mi><mo id="S3.E5.m1.19.19.19.11.11.11" stretchy="false" xref="S3.E5.m1.42.42.1.1.1.cmml">)</mo></mrow><mo id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.4a" xref="S3.E5.m1.42.42.1.1.1.cmml"></mo><mi id="S3.E5.m1.20.20.20.12.12.12" xref="S3.E5.m1.20.20.20.12.12.12.cmml">σ</mi><mo id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.4b" xref="S3.E5.m1.42.42.1.1.1.cmml"></mo><mrow id="S3.E5.m1.43.43.2.42.34.34.34.1.2.2.2.1"><mo id="S3.E5.m1.21.21.21.13.13.13" stretchy="false" xref="S3.E5.m1.42.42.1.1.1.cmml">(</mo><msub id="S3.E5.m1.43.43.2.42.34.34.34.1.2.2.2.1.1"><mi id="S3.E5.m1.22.22.22.14.14.14" xref="S3.E5.m1.22.22.22.14.14.14.cmml">x</mi><mi id="S3.E5.m1.23.23.23.15.15.15.1" xref="S3.E5.m1.23.23.23.15.15.15.1.cmml">t</mi></msub><mo id="S3.E5.m1.24.24.24.16.16.16" stretchy="false" xref="S3.E5.m1.42.42.1.1.1.cmml">)</mo></mrow><mo id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.4c" lspace="0.167em" xref="S3.E5.m1.42.42.1.1.1.cmml"></mo><mrow id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.3.1"><mi id="S3.E5.m1.25.25.25.17.17.17" xref="S3.E5.m1.25.25.25.17.17.17.cmml">exp</mi><mo id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.3.1a" xref="S3.E5.m1.42.42.1.1.1.cmml"></mo><mrow id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.3.1.1"><mo id="S3.E5.m1.26.26.26.18.18.18" xref="S3.E5.m1.42.42.1.1.1.cmml">(</mo><mrow id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.3.1.1.1"><mo id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.3.1.1.1a" xref="S3.E5.m1.42.42.1.1.1.cmml">−</mo><mrow id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.3.1.1.1.1"><msubsup 
id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.3.1.1.1.1.2"><mo id="S3.E5.m1.28.28.28.20.20.20" xref="S3.E5.m1.28.28.28.20.20.20.cmml">∫</mo><mn id="S3.E5.m1.29.29.29.21.21.21.1" xref="S3.E5.m1.29.29.29.21.21.21.1.cmml">0</mn><mi id="S3.E5.m1.30.30.30.22.22.22.1" xref="S3.E5.m1.30.30.30.22.22.22.1.cmml">t</mi></msubsup><mrow id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.3.1.1.1.1.1"><mi id="S3.E5.m1.31.31.31.23.23.23" xref="S3.E5.m1.31.31.31.23.23.23.cmml">σ</mi><mo id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.3.1.1.1.1.1.2" xref="S3.E5.m1.42.42.1.1.1.cmml"></mo><mrow id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.3.1.1.1.1.1.1.1"><mo id="S3.E5.m1.32.32.32.24.24.24" stretchy="false" xref="S3.E5.m1.42.42.1.1.1.cmml">(</mo><msub id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.3.1.1.1.1.1.1.1.1"><mi id="S3.E5.m1.33.33.33.25.25.25" xref="S3.E5.m1.33.33.33.25.25.25.cmml">x</mi><mi id="S3.E5.m1.34.34.34.26.26.26.1" xref="S3.E5.m1.34.34.34.26.26.26.1.cmml">τ</mi></msub><mo id="S3.E5.m1.35.35.35.27.27.27" stretchy="false" xref="S3.E5.m1.42.42.1.1.1.cmml">)</mo></mrow><mo id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.3.1.1.1.1.1.2a" lspace="0em" xref="S3.E5.m1.42.42.1.1.1.cmml"></mo><mrow id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.3.1.1.1.1.1.3"><mo id="S3.E5.m1.36.36.36.28.28.28" rspace="0em" xref="S3.E5.m1.36.36.36.28.28.28.cmml">𝑑</mo><mi id="S3.E5.m1.37.37.37.29.29.29" xref="S3.E5.m1.37.37.37.29.29.29.cmml">τ</mi></mrow></mrow></mrow></mrow><mo id="S3.E5.m1.38.38.38.30.30.30" xref="S3.E5.m1.42.42.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.4d" lspace="0em" xref="S3.E5.m1.42.42.1.1.1.cmml"></mo><mrow id="S3.E5.m1.43.43.2.42.34.34.34.1.3.3.5"><mo id="S3.E5.m1.39.39.39.31.31.31" rspace="0em" xref="S3.E5.m1.39.39.39.31.31.31.cmml">𝑑</mo><mi id="S3.E5.m1.40.40.40.32.32.32" xref="S3.E5.m1.40.40.40.32.32.32.cmml">t</mi></mrow></mrow></mrow></mrow><mo id="S3.E5.m1.41.41.41.33.33.33" xref="S3.E5.m1.42.42.1.1.1.cmml">,</mo></mrow></mtd></mtr></mtable><annotation-xml encoding="MathML-Content" 
id="S3.E5.m1.43b"><apply id="S3.E5.m1.42.42.1.1.1.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><and id="S3.E5.m1.42.42.1.1.1a.cmml" xref="S3.E5.m1.8.8.8.8.7.8"></and><apply id="S3.E5.m1.42.42.1.1.1b.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><eq id="S3.E5.m1.2.2.2.2.1.1.cmml" xref="S3.E5.m1.2.2.2.2.1.1"></eq><apply id="S3.E5.m1.1.1.1.1.1.1.cmml" xref="S3.E5.m1.1.1.1.1.1.1"><ci id="S3.E5.m1.1.1.1.1.1.1.2.cmml" xref="S3.E5.m1.1.1.1.1.1.1.2">^</ci><ci id="S3.E5.m1.1.1.1.1.1.1.3.cmml" xref="S3.E5.m1.1.1.1.1.1.1.3">𝐽</ci></apply><apply id="S3.E5.m1.42.42.1.1.1.7.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><times id="S3.E5.m1.42.42.1.1.1.7.1.cmml" xref="S3.E5.m1.8.8.8.8.7.8"></times><ci id="S3.E5.m1.3.3.3.3.2.2a.cmml" xref="S3.E5.m1.3.3.3.3.2.2"><mtext id="S3.E5.m1.3.3.3.3.2.2.cmml" xref="S3.E5.m1.3.3.3.3.2.2">Rend</mtext></ci><interval closure="open" id="S3.E5.m1.42.42.1.1.1.7.3.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><ci id="S3.E5.m1.5.5.5.5.4.4.cmml" xref="S3.E5.m1.5.5.5.5.4.4">𝐆</ci><ci id="S3.E5.m1.7.7.7.7.6.6.cmml" xref="S3.E5.m1.7.7.7.7.6.6">𝜋</ci></interval></apply></apply><apply id="S3.E5.m1.42.42.1.1.1c.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><eq id="S3.E5.m1.9.9.9.1.1.1.cmml" xref="S3.E5.m1.9.9.9.1.1.1"></eq><share href="https://arxiv.org/html/2503.12553v1#S3.E5.m1.42.42.1.1.1.7.cmml" id="S3.E5.m1.42.42.1.1.1d.cmml" xref="S3.E5.m1.8.8.8.8.7.8"></share><apply id="S3.E5.m1.42.42.1.1.1.3.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><apply id="S3.E5.m1.42.42.1.1.1.3.4.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><csymbol cd="ambiguous" id="S3.E5.m1.42.42.1.1.1.3.4.1.cmml" xref="S3.E5.m1.8.8.8.8.7.8">superscript</csymbol><apply id="S3.E5.m1.42.42.1.1.1.3.4.2.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><csymbol cd="ambiguous" id="S3.E5.m1.42.42.1.1.1.3.4.2.1.cmml" xref="S3.E5.m1.8.8.8.8.7.8">subscript</csymbol><int id="S3.E5.m1.10.10.10.2.2.2.cmml" xref="S3.E5.m1.10.10.10.2.2.2"></int><cn id="S3.E5.m1.11.11.11.3.3.3.1.cmml" type="integer" xref="S3.E5.m1.11.11.11.3.3.3.1">0</cn></apply><infinity id="S3.E5.m1.12.12.12.4.4.4.1.cmml" 
xref="S3.E5.m1.12.12.12.4.4.4.1"></infinity></apply><apply id="S3.E5.m1.42.42.1.1.1.3.3.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><times id="S3.E5.m1.42.42.1.1.1.3.3.4.cmml" xref="S3.E5.m1.8.8.8.8.7.8"></times><ci id="S3.E5.m1.13.13.13.5.5.5.cmml" xref="S3.E5.m1.13.13.13.5.5.5">𝑐</ci><interval closure="open" id="S3.E5.m1.42.42.1.1.1.1.1.1.2.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><apply id="S3.E5.m1.42.42.1.1.1.1.1.1.1.1.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><csymbol cd="ambiguous" id="S3.E5.m1.42.42.1.1.1.1.1.1.1.1.1.cmml" xref="S3.E5.m1.8.8.8.8.7.8">subscript</csymbol><ci id="S3.E5.m1.15.15.15.7.7.7.cmml" xref="S3.E5.m1.15.15.15.7.7.7">𝑥</ci><ci id="S3.E5.m1.16.16.16.8.8.8.1.cmml" xref="S3.E5.m1.16.16.16.8.8.8.1">𝑡</ci></apply><ci id="S3.E5.m1.18.18.18.10.10.10.cmml" xref="S3.E5.m1.18.18.18.10.10.10">𝜈</ci></interval><ci id="S3.E5.m1.20.20.20.12.12.12.cmml" xref="S3.E5.m1.20.20.20.12.12.12">𝜎</ci><apply id="S3.E5.m1.42.42.1.1.1.2.2.2.1.1.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><csymbol cd="ambiguous" id="S3.E5.m1.42.42.1.1.1.2.2.2.1.1.1.cmml" xref="S3.E5.m1.8.8.8.8.7.8">subscript</csymbol><ci id="S3.E5.m1.22.22.22.14.14.14.cmml" xref="S3.E5.m1.22.22.22.14.14.14">𝑥</ci><ci id="S3.E5.m1.23.23.23.15.15.15.1.cmml" xref="S3.E5.m1.23.23.23.15.15.15.1">𝑡</ci></apply><apply id="S3.E5.m1.42.42.1.1.1.3.3.3.2.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><exp id="S3.E5.m1.25.25.25.17.17.17.cmml" xref="S3.E5.m1.25.25.25.17.17.17"></exp><apply id="S3.E5.m1.42.42.1.1.1.3.3.3.1.1.1.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><minus id="S3.E5.m1.27.27.27.19.19.19.cmml" xref="S3.E5.m1.8.8.8.8.7.8"></minus><apply id="S3.E5.m1.42.42.1.1.1.3.3.3.1.1.1.1.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><apply id="S3.E5.m1.42.42.1.1.1.3.3.3.1.1.1.1.2.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><csymbol cd="ambiguous" id="S3.E5.m1.42.42.1.1.1.3.3.3.1.1.1.1.2.1.cmml" xref="S3.E5.m1.8.8.8.8.7.8">superscript</csymbol><apply id="S3.E5.m1.42.42.1.1.1.3.3.3.1.1.1.1.2.2.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><csymbol cd="ambiguous" 
id="S3.E5.m1.42.42.1.1.1.3.3.3.1.1.1.1.2.2.1.cmml" xref="S3.E5.m1.8.8.8.8.7.8">subscript</csymbol><int id="S3.E5.m1.28.28.28.20.20.20.cmml" xref="S3.E5.m1.28.28.28.20.20.20"></int><cn id="S3.E5.m1.29.29.29.21.21.21.1.cmml" type="integer" xref="S3.E5.m1.29.29.29.21.21.21.1">0</cn></apply><ci id="S3.E5.m1.30.30.30.22.22.22.1.cmml" xref="S3.E5.m1.30.30.30.22.22.22.1">𝑡</ci></apply><apply id="S3.E5.m1.42.42.1.1.1.3.3.3.1.1.1.1.1.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><times id="S3.E5.m1.42.42.1.1.1.3.3.3.1.1.1.1.1.2.cmml" xref="S3.E5.m1.8.8.8.8.7.8"></times><ci id="S3.E5.m1.31.31.31.23.23.23.cmml" xref="S3.E5.m1.31.31.31.23.23.23">𝜎</ci><apply id="S3.E5.m1.42.42.1.1.1.3.3.3.1.1.1.1.1.1.1.1.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><csymbol cd="ambiguous" id="S3.E5.m1.42.42.1.1.1.3.3.3.1.1.1.1.1.1.1.1.1.cmml" xref="S3.E5.m1.8.8.8.8.7.8">subscript</csymbol><ci id="S3.E5.m1.33.33.33.25.25.25.cmml" xref="S3.E5.m1.33.33.33.25.25.25">𝑥</ci><ci id="S3.E5.m1.34.34.34.26.26.26.1.cmml" xref="S3.E5.m1.34.34.34.26.26.26.1">𝜏</ci></apply><apply id="S3.E5.m1.42.42.1.1.1.3.3.3.1.1.1.1.1.4.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><csymbol cd="latexml" id="S3.E5.m1.36.36.36.28.28.28.cmml" xref="S3.E5.m1.36.36.36.28.28.28">differential-d</csymbol><ci id="S3.E5.m1.37.37.37.29.29.29.cmml" xref="S3.E5.m1.37.37.37.29.29.29">𝜏</ci></apply></apply></apply></apply></apply><apply id="S3.E5.m1.42.42.1.1.1.3.3.7.cmml" xref="S3.E5.m1.8.8.8.8.7.8"><csymbol cd="latexml" id="S3.E5.m1.39.39.39.31.31.31.cmml" xref="S3.E5.m1.39.39.39.31.31.31">differential-d</csymbol><ci id="S3.E5.m1.40.40.40.32.32.32.cmml" xref="S3.E5.m1.40.40.40.32.32.32">𝑡</ci></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E5.m1.43c">\begin{split}\hat{J}&=\text{Rend}(\mathbf{G},\pi)\\ &=\int_{0}^{\infty}c(x_{t},\nu)\sigma(x_{t})\exp\left(-\int_{0}^{t}\sigma(x_{% \tau})d\tau\right)dt,\end{split}</annotation><annotation encoding="application/x-llamapun" id="S3.E5.m1.43d">start_ROW start_CELL over^ 
start_ARG italic_J end_ARG end_CELL start_CELL = Rend ( bold_G , italic_π ) end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_c ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_ν ) italic_σ ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) roman_exp ( - ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT italic_σ ( italic_x start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ) italic_d italic_τ ) italic_d italic_t , end_CELL end_ROW</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(5)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS1.p4.7">where <math alttext="\mathbf{G}" class="ltx_Math" display="inline" id="S3.SS1.p4.2.m1.1"><semantics id="S3.SS1.p4.2.m1.1a"><mi id="S3.SS1.p4.2.m1.1.1" xref="S3.SS1.p4.2.m1.1.1.cmml">𝐆</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p4.2.m1.1b"><ci id="S3.SS1.p4.2.m1.1.1.cmml" xref="S3.SS1.p4.2.m1.1.1">𝐆</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p4.2.m1.1c">\mathbf{G}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p4.2.m1.1d">bold_G</annotation></semantics></math> denotes the Gaussian mixture, and <math alttext="\pi" class="ltx_Math" display="inline" id="S3.SS1.p4.3.m2.1"><semantics id="S3.SS1.p4.3.m2.1a"><mi id="S3.SS1.p4.3.m2.1.1" xref="S3.SS1.p4.3.m2.1.1.cmml">π</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p4.3.m2.1b"><ci id="S3.SS1.p4.3.m2.1.1.cmml" xref="S3.SS1.p4.3.m2.1.1">𝜋</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p4.3.m2.1c">\pi</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p4.3.m2.1d">italic_π</annotation></semantics></math> denotes the 
viewpoint; <math alttext="x_{t}=x_{0}-t\nu" class="ltx_Math" display="inline" id="S3.SS1.p4.4.m3.1"><semantics id="S3.SS1.p4.4.m3.1a"><mrow id="S3.SS1.p4.4.m3.1.1" xref="S3.SS1.p4.4.m3.1.1.cmml"><msub id="S3.SS1.p4.4.m3.1.1.2" xref="S3.SS1.p4.4.m3.1.1.2.cmml"><mi id="S3.SS1.p4.4.m3.1.1.2.2" xref="S3.SS1.p4.4.m3.1.1.2.2.cmml">x</mi><mi id="S3.SS1.p4.4.m3.1.1.2.3" xref="S3.SS1.p4.4.m3.1.1.2.3.cmml">t</mi></msub><mo id="S3.SS1.p4.4.m3.1.1.1" xref="S3.SS1.p4.4.m3.1.1.1.cmml">=</mo><mrow id="S3.SS1.p4.4.m3.1.1.3" xref="S3.SS1.p4.4.m3.1.1.3.cmml"><msub id="S3.SS1.p4.4.m3.1.1.3.2" xref="S3.SS1.p4.4.m3.1.1.3.2.cmml"><mi id="S3.SS1.p4.4.m3.1.1.3.2.2" xref="S3.SS1.p4.4.m3.1.1.3.2.2.cmml">x</mi><mn id="S3.SS1.p4.4.m3.1.1.3.2.3" xref="S3.SS1.p4.4.m3.1.1.3.2.3.cmml">0</mn></msub><mo id="S3.SS1.p4.4.m3.1.1.3.1" xref="S3.SS1.p4.4.m3.1.1.3.1.cmml">−</mo><mrow id="S3.SS1.p4.4.m3.1.1.3.3" xref="S3.SS1.p4.4.m3.1.1.3.3.cmml"><mi id="S3.SS1.p4.4.m3.1.1.3.3.2" xref="S3.SS1.p4.4.m3.1.1.3.3.2.cmml">t</mi><mo id="S3.SS1.p4.4.m3.1.1.3.3.1" xref="S3.SS1.p4.4.m3.1.1.3.3.1.cmml"></mo><mi id="S3.SS1.p4.4.m3.1.1.3.3.3" xref="S3.SS1.p4.4.m3.1.1.3.3.3.cmml">ν</mi></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p4.4.m3.1b"><apply id="S3.SS1.p4.4.m3.1.1.cmml" xref="S3.SS1.p4.4.m3.1.1"><eq id="S3.SS1.p4.4.m3.1.1.1.cmml" xref="S3.SS1.p4.4.m3.1.1.1"></eq><apply id="S3.SS1.p4.4.m3.1.1.2.cmml" xref="S3.SS1.p4.4.m3.1.1.2"><csymbol cd="ambiguous" id="S3.SS1.p4.4.m3.1.1.2.1.cmml" xref="S3.SS1.p4.4.m3.1.1.2">subscript</csymbol><ci id="S3.SS1.p4.4.m3.1.1.2.2.cmml" xref="S3.SS1.p4.4.m3.1.1.2.2">𝑥</ci><ci id="S3.SS1.p4.4.m3.1.1.2.3.cmml" xref="S3.SS1.p4.4.m3.1.1.2.3">𝑡</ci></apply><apply id="S3.SS1.p4.4.m3.1.1.3.cmml" xref="S3.SS1.p4.4.m3.1.1.3"><minus id="S3.SS1.p4.4.m3.1.1.3.1.cmml" xref="S3.SS1.p4.4.m3.1.1.3.1"></minus><apply id="S3.SS1.p4.4.m3.1.1.3.2.cmml" xref="S3.SS1.p4.4.m3.1.1.3.2"><csymbol cd="ambiguous" id="S3.SS1.p4.4.m3.1.1.3.2.1.cmml" 
xref="S3.SS1.p4.4.m3.1.1.3.2">subscript</csymbol><ci id="S3.SS1.p4.4.m3.1.1.3.2.2.cmml" xref="S3.SS1.p4.4.m3.1.1.3.2.2">𝑥</ci><cn id="S3.SS1.p4.4.m3.1.1.3.2.3.cmml" type="integer" xref="S3.SS1.p4.4.m3.1.1.3.2.3">0</cn></apply><apply id="S3.SS1.p4.4.m3.1.1.3.3.cmml" xref="S3.SS1.p4.4.m3.1.1.3.3"><times id="S3.SS1.p4.4.m3.1.1.3.3.1.cmml" xref="S3.SS1.p4.4.m3.1.1.3.3.1"></times><ci id="S3.SS1.p4.4.m3.1.1.3.3.2.cmml" xref="S3.SS1.p4.4.m3.1.1.3.3.2">𝑡</ci><ci id="S3.SS1.p4.4.m3.1.1.3.3.3.cmml" xref="S3.SS1.p4.4.m3.1.1.3.3.3">𝜈</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p4.4.m3.1c">x_{t}=x_{0}-t\nu</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p4.4.m3.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT - italic_t italic_ν</annotation></semantics></math> is a point from the camera center <math alttext="x_{0}" class="ltx_Math" display="inline" id="S3.SS1.p4.5.m4.1"><semantics id="S3.SS1.p4.5.m4.1a"><msub id="S3.SS1.p4.5.m4.1.1" xref="S3.SS1.p4.5.m4.1.1.cmml"><mi id="S3.SS1.p4.5.m4.1.1.2" xref="S3.SS1.p4.5.m4.1.1.2.cmml">x</mi><mn id="S3.SS1.p4.5.m4.1.1.3" xref="S3.SS1.p4.5.m4.1.1.3.cmml">0</mn></msub><annotation-xml encoding="MathML-Content" id="S3.SS1.p4.5.m4.1b"><apply id="S3.SS1.p4.5.m4.1.1.cmml" xref="S3.SS1.p4.5.m4.1.1"><csymbol cd="ambiguous" id="S3.SS1.p4.5.m4.1.1.1.cmml" xref="S3.SS1.p4.5.m4.1.1">subscript</csymbol><ci id="S3.SS1.p4.5.m4.1.1.2.cmml" xref="S3.SS1.p4.5.m4.1.1.2">𝑥</ci><cn id="S3.SS1.p4.5.m4.1.1.3.cmml" type="integer" xref="S3.SS1.p4.5.m4.1.1.3">0</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p4.5.m4.1c">x_{0}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p4.5.m4.1d">italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT</annotation></semantics></math> with distance <math alttext="t" class="ltx_Math" display="inline" id="S3.SS1.p4.6.m5.1"><semantics 
id="S3.SS1.p4.6.m5.1a"><mi id="S3.SS1.p4.6.m5.1.1" xref="S3.SS1.p4.6.m5.1.1.cmml">t</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p4.6.m5.1b"><ci id="S3.SS1.p4.6.m5.1.1.cmml" xref="S3.SS1.p4.6.m5.1.1">𝑡</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p4.6.m5.1c">t</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p4.6.m5.1d">italic_t</annotation></semantics></math> along the direction <math alttext="\nu" class="ltx_Math" display="inline" id="S3.SS1.p4.7.m6.1"><semantics id="S3.SS1.p4.7.m6.1a"><mi id="S3.SS1.p4.7.m6.1.1" xref="S3.SS1.p4.7.m6.1.1.cmml">ν</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p4.7.m6.1b"><ci id="S3.SS1.p4.7.m6.1.1.cmml" xref="S3.SS1.p4.7.m6.1.1">𝜈</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p4.7.m6.1c">\nu</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p4.7.m6.1d">italic_ν</annotation></semantics></math>.</p> </div> <div class="ltx_para ltx_noindent" id="S3.SS1.p5"> <p class="ltx_p" id="S3.SS1.p5.2"><span class="ltx_text ltx_font_bold" id="S3.SS1.p5.2.1">Monocular Reconstruction.</span> Inspired by <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib61" title="">61</a>]</cite>, the output of the neural network <math alttext="\Phi(I)\in\mathbb{R}^{C\times H\times W}" class="ltx_Math" display="inline" id="S3.SS1.p5.1.m1.1"><semantics id="S3.SS1.p5.1.m1.1a"><mrow id="S3.SS1.p5.1.m1.1.2" xref="S3.SS1.p5.1.m1.1.2.cmml"><mrow id="S3.SS1.p5.1.m1.1.2.2" xref="S3.SS1.p5.1.m1.1.2.2.cmml"><mi id="S3.SS1.p5.1.m1.1.2.2.2" mathvariant="normal" xref="S3.SS1.p5.1.m1.1.2.2.2.cmml">Φ</mi><mo id="S3.SS1.p5.1.m1.1.2.2.1" xref="S3.SS1.p5.1.m1.1.2.2.1.cmml"></mo><mrow id="S3.SS1.p5.1.m1.1.2.2.3.2" xref="S3.SS1.p5.1.m1.1.2.2.cmml"><mo id="S3.SS1.p5.1.m1.1.2.2.3.2.1" stretchy="false" xref="S3.SS1.p5.1.m1.1.2.2.cmml">(</mo><mi id="S3.SS1.p5.1.m1.1.1" xref="S3.SS1.p5.1.m1.1.1.cmml">I</mi><mo 
id="S3.SS1.p5.1.m1.1.2.2.3.2.2" stretchy="false" xref="S3.SS1.p5.1.m1.1.2.2.cmml">)</mo></mrow></mrow><mo id="S3.SS1.p5.1.m1.1.2.1" xref="S3.SS1.p5.1.m1.1.2.1.cmml">∈</mo><msup id="S3.SS1.p5.1.m1.1.2.3" xref="S3.SS1.p5.1.m1.1.2.3.cmml"><mi id="S3.SS1.p5.1.m1.1.2.3.2" xref="S3.SS1.p5.1.m1.1.2.3.2.cmml">ℝ</mi><mrow id="S3.SS1.p5.1.m1.1.2.3.3" xref="S3.SS1.p5.1.m1.1.2.3.3.cmml"><mi id="S3.SS1.p5.1.m1.1.2.3.3.2" xref="S3.SS1.p5.1.m1.1.2.3.3.2.cmml">C</mi><mo id="S3.SS1.p5.1.m1.1.2.3.3.1" lspace="0.222em" rspace="0.222em" xref="S3.SS1.p5.1.m1.1.2.3.3.1.cmml">×</mo><mi id="S3.SS1.p5.1.m1.1.2.3.3.3" xref="S3.SS1.p5.1.m1.1.2.3.3.3.cmml">H</mi><mo id="S3.SS1.p5.1.m1.1.2.3.3.1a" lspace="0.222em" rspace="0.222em" xref="S3.SS1.p5.1.m1.1.2.3.3.1.cmml">×</mo><mi id="S3.SS1.p5.1.m1.1.2.3.3.4" xref="S3.SS1.p5.1.m1.1.2.3.3.4.cmml">W</mi></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p5.1.m1.1b"><apply id="S3.SS1.p5.1.m1.1.2.cmml" xref="S3.SS1.p5.1.m1.1.2"><in id="S3.SS1.p5.1.m1.1.2.1.cmml" xref="S3.SS1.p5.1.m1.1.2.1"></in><apply id="S3.SS1.p5.1.m1.1.2.2.cmml" xref="S3.SS1.p5.1.m1.1.2.2"><times id="S3.SS1.p5.1.m1.1.2.2.1.cmml" xref="S3.SS1.p5.1.m1.1.2.2.1"></times><ci id="S3.SS1.p5.1.m1.1.2.2.2.cmml" xref="S3.SS1.p5.1.m1.1.2.2.2">Φ</ci><ci id="S3.SS1.p5.1.m1.1.1.cmml" xref="S3.SS1.p5.1.m1.1.1">𝐼</ci></apply><apply id="S3.SS1.p5.1.m1.1.2.3.cmml" xref="S3.SS1.p5.1.m1.1.2.3"><csymbol cd="ambiguous" id="S3.SS1.p5.1.m1.1.2.3.1.cmml" xref="S3.SS1.p5.1.m1.1.2.3">superscript</csymbol><ci id="S3.SS1.p5.1.m1.1.2.3.2.cmml" xref="S3.SS1.p5.1.m1.1.2.3.2">ℝ</ci><apply id="S3.SS1.p5.1.m1.1.2.3.3.cmml" xref="S3.SS1.p5.1.m1.1.2.3.3"><times id="S3.SS1.p5.1.m1.1.2.3.3.1.cmml" xref="S3.SS1.p5.1.m1.1.2.3.3.1"></times><ci id="S3.SS1.p5.1.m1.1.2.3.3.2.cmml" xref="S3.SS1.p5.1.m1.1.2.3.3.2">𝐶</ci><ci id="S3.SS1.p5.1.m1.1.2.3.3.3.cmml" xref="S3.SS1.p5.1.m1.1.2.3.3.3">𝐻</ci><ci id="S3.SS1.p5.1.m1.1.2.3.3.4.cmml" 
xref="S3.SS1.p5.1.m1.1.2.3.3.4">𝑊</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p5.1.m1.1c">\Phi(I)\in\mathbb{R}^{C\times H\times W}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p5.1.m1.1d">roman_Φ ( italic_I ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_C × italic_H × italic_W end_POSTSUPERSCRIPT</annotation></semantics></math> specifies the parameters of a colored Gaussian for each pixel <math alttext="u=(u_{x},u_{y},1)" class="ltx_Math" display="inline" id="S3.SS1.p5.2.m2.3"><semantics id="S3.SS1.p5.2.m2.3a"><mrow id="S3.SS1.p5.2.m2.3.3" xref="S3.SS1.p5.2.m2.3.3.cmml"><mi id="S3.SS1.p5.2.m2.3.3.4" xref="S3.SS1.p5.2.m2.3.3.4.cmml">u</mi><mo id="S3.SS1.p5.2.m2.3.3.3" xref="S3.SS1.p5.2.m2.3.3.3.cmml">=</mo><mrow id="S3.SS1.p5.2.m2.3.3.2.2" xref="S3.SS1.p5.2.m2.3.3.2.3.cmml"><mo id="S3.SS1.p5.2.m2.3.3.2.2.3" stretchy="false" xref="S3.SS1.p5.2.m2.3.3.2.3.cmml">(</mo><msub id="S3.SS1.p5.2.m2.2.2.1.1.1" xref="S3.SS1.p5.2.m2.2.2.1.1.1.cmml"><mi id="S3.SS1.p5.2.m2.2.2.1.1.1.2" xref="S3.SS1.p5.2.m2.2.2.1.1.1.2.cmml">u</mi><mi id="S3.SS1.p5.2.m2.2.2.1.1.1.3" xref="S3.SS1.p5.2.m2.2.2.1.1.1.3.cmml">x</mi></msub><mo id="S3.SS1.p5.2.m2.3.3.2.2.4" xref="S3.SS1.p5.2.m2.3.3.2.3.cmml">,</mo><msub id="S3.SS1.p5.2.m2.3.3.2.2.2" xref="S3.SS1.p5.2.m2.3.3.2.2.2.cmml"><mi id="S3.SS1.p5.2.m2.3.3.2.2.2.2" xref="S3.SS1.p5.2.m2.3.3.2.2.2.2.cmml">u</mi><mi id="S3.SS1.p5.2.m2.3.3.2.2.2.3" xref="S3.SS1.p5.2.m2.3.3.2.2.2.3.cmml">y</mi></msub><mo id="S3.SS1.p5.2.m2.3.3.2.2.5" xref="S3.SS1.p5.2.m2.3.3.2.3.cmml">,</mo><mn id="S3.SS1.p5.2.m2.1.1" xref="S3.SS1.p5.2.m2.1.1.cmml">1</mn><mo id="S3.SS1.p5.2.m2.3.3.2.2.6" stretchy="false" xref="S3.SS1.p5.2.m2.3.3.2.3.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p5.2.m2.3b"><apply id="S3.SS1.p5.2.m2.3.3.cmml" xref="S3.SS1.p5.2.m2.3.3"><eq id="S3.SS1.p5.2.m2.3.3.3.cmml" xref="S3.SS1.p5.2.m2.3.3.3"></eq><ci id="S3.SS1.p5.2.m2.3.3.4.cmml" 
xref="S3.SS1.p5.2.m2.3.3.4">𝑢</ci><vector id="S3.SS1.p5.2.m2.3.3.2.3.cmml" xref="S3.SS1.p5.2.m2.3.3.2.2"><apply id="S3.SS1.p5.2.m2.2.2.1.1.1.cmml" xref="S3.SS1.p5.2.m2.2.2.1.1.1"><csymbol cd="ambiguous" id="S3.SS1.p5.2.m2.2.2.1.1.1.1.cmml" xref="S3.SS1.p5.2.m2.2.2.1.1.1">subscript</csymbol><ci id="S3.SS1.p5.2.m2.2.2.1.1.1.2.cmml" xref="S3.SS1.p5.2.m2.2.2.1.1.1.2">𝑢</ci><ci id="S3.SS1.p5.2.m2.2.2.1.1.1.3.cmml" xref="S3.SS1.p5.2.m2.2.2.1.1.1.3">𝑥</ci></apply><apply id="S3.SS1.p5.2.m2.3.3.2.2.2.cmml" xref="S3.SS1.p5.2.m2.3.3.2.2.2"><csymbol cd="ambiguous" id="S3.SS1.p5.2.m2.3.3.2.2.2.1.cmml" xref="S3.SS1.p5.2.m2.3.3.2.2.2">subscript</csymbol><ci id="S3.SS1.p5.2.m2.3.3.2.2.2.2.cmml" xref="S3.SS1.p5.2.m2.3.3.2.2.2.2">𝑢</ci><ci id="S3.SS1.p5.2.m2.3.3.2.2.2.3.cmml" xref="S3.SS1.p5.2.m2.3.3.2.2.2.3">𝑦</ci></apply><cn id="S3.SS1.p5.2.m2.1.1.cmml" type="integer" xref="S3.SS1.p5.2.m2.1.1">1</cn></vector></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p5.2.m2.3c">u=(u_{x},u_{y},1)</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p5.2.m2.3d">italic_u = ( italic_u start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT , italic_u start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT , 1 )</annotation></semantics></math>. 
The mean of each Gaussian is:</p> <table class="ltx_equation ltx_eqn_table" id="S3.E6"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\mu=K^{-1}ud+\Delta," class="ltx_Math" display="block" id="S3.E6.m1.1"><semantics id="S3.E6.m1.1a"><mrow id="S3.E6.m1.1.1.1" xref="S3.E6.m1.1.1.1.1.cmml"><mrow id="S3.E6.m1.1.1.1.1" xref="S3.E6.m1.1.1.1.1.cmml"><mi id="S3.E6.m1.1.1.1.1.2" xref="S3.E6.m1.1.1.1.1.2.cmml">μ</mi><mo id="S3.E6.m1.1.1.1.1.1" xref="S3.E6.m1.1.1.1.1.1.cmml">=</mo><mrow id="S3.E6.m1.1.1.1.1.3" xref="S3.E6.m1.1.1.1.1.3.cmml"><mrow id="S3.E6.m1.1.1.1.1.3.2" xref="S3.E6.m1.1.1.1.1.3.2.cmml"><msup id="S3.E6.m1.1.1.1.1.3.2.2" xref="S3.E6.m1.1.1.1.1.3.2.2.cmml"><mi id="S3.E6.m1.1.1.1.1.3.2.2.2" xref="S3.E6.m1.1.1.1.1.3.2.2.2.cmml">K</mi><mrow id="S3.E6.m1.1.1.1.1.3.2.2.3" xref="S3.E6.m1.1.1.1.1.3.2.2.3.cmml"><mo id="S3.E6.m1.1.1.1.1.3.2.2.3a" xref="S3.E6.m1.1.1.1.1.3.2.2.3.cmml">−</mo><mn id="S3.E6.m1.1.1.1.1.3.2.2.3.2" xref="S3.E6.m1.1.1.1.1.3.2.2.3.2.cmml">1</mn></mrow></msup><mo id="S3.E6.m1.1.1.1.1.3.2.1" xref="S3.E6.m1.1.1.1.1.3.2.1.cmml"></mo><mi id="S3.E6.m1.1.1.1.1.3.2.3" xref="S3.E6.m1.1.1.1.1.3.2.3.cmml">u</mi><mo id="S3.E6.m1.1.1.1.1.3.2.1a" xref="S3.E6.m1.1.1.1.1.3.2.1.cmml"></mo><mi id="S3.E6.m1.1.1.1.1.3.2.4" xref="S3.E6.m1.1.1.1.1.3.2.4.cmml">d</mi></mrow><mo id="S3.E6.m1.1.1.1.1.3.1" xref="S3.E6.m1.1.1.1.1.3.1.cmml">+</mo><mi id="S3.E6.m1.1.1.1.1.3.3" mathvariant="normal" xref="S3.E6.m1.1.1.1.1.3.3.cmml">Δ</mi></mrow></mrow><mo id="S3.E6.m1.1.1.1.2" xref="S3.E6.m1.1.1.1.1.cmml">,</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.E6.m1.1b"><apply id="S3.E6.m1.1.1.1.1.cmml" xref="S3.E6.m1.1.1.1"><eq id="S3.E6.m1.1.1.1.1.1.cmml" xref="S3.E6.m1.1.1.1.1.1"></eq><ci id="S3.E6.m1.1.1.1.1.2.cmml" xref="S3.E6.m1.1.1.1.1.2">𝜇</ci><apply id="S3.E6.m1.1.1.1.1.3.cmml" xref="S3.E6.m1.1.1.1.1.3"><plus 
id="S3.E6.m1.1.1.1.1.3.1.cmml" xref="S3.E6.m1.1.1.1.1.3.1"></plus><apply id="S3.E6.m1.1.1.1.1.3.2.cmml" xref="S3.E6.m1.1.1.1.1.3.2"><times id="S3.E6.m1.1.1.1.1.3.2.1.cmml" xref="S3.E6.m1.1.1.1.1.3.2.1"></times><apply id="S3.E6.m1.1.1.1.1.3.2.2.cmml" xref="S3.E6.m1.1.1.1.1.3.2.2"><csymbol cd="ambiguous" id="S3.E6.m1.1.1.1.1.3.2.2.1.cmml" xref="S3.E6.m1.1.1.1.1.3.2.2">superscript</csymbol><ci id="S3.E6.m1.1.1.1.1.3.2.2.2.cmml" xref="S3.E6.m1.1.1.1.1.3.2.2.2">𝐾</ci><apply id="S3.E6.m1.1.1.1.1.3.2.2.3.cmml" xref="S3.E6.m1.1.1.1.1.3.2.2.3"><minus id="S3.E6.m1.1.1.1.1.3.2.2.3.1.cmml" xref="S3.E6.m1.1.1.1.1.3.2.2.3"></minus><cn id="S3.E6.m1.1.1.1.1.3.2.2.3.2.cmml" type="integer" xref="S3.E6.m1.1.1.1.1.3.2.2.3.2">1</cn></apply></apply><ci id="S3.E6.m1.1.1.1.1.3.2.3.cmml" xref="S3.E6.m1.1.1.1.1.3.2.3">𝑢</ci><ci id="S3.E6.m1.1.1.1.1.3.2.4.cmml" xref="S3.E6.m1.1.1.1.1.3.2.4">𝑑</ci></apply><ci id="S3.E6.m1.1.1.1.1.3.3.cmml" xref="S3.E6.m1.1.1.1.1.3.3">Δ</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E6.m1.1c">\mu=K^{-1}ud+\Delta,</annotation><annotation encoding="application/x-llamapun" id="S3.E6.m1.1d">italic_μ = italic_K start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_u italic_d + roman_Δ ,</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(6)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS1.p5.4">where <math alttext="K=\text{diag}(f,f,1)\in\mathbb{R}^{3\times 3}" class="ltx_Math" display="inline" id="S3.SS1.p5.3.m1.3"><semantics id="S3.SS1.p5.3.m1.3a"><mrow id="S3.SS1.p5.3.m1.3.4" xref="S3.SS1.p5.3.m1.3.4.cmml"><mi id="S3.SS1.p5.3.m1.3.4.2" xref="S3.SS1.p5.3.m1.3.4.2.cmml">K</mi><mo id="S3.SS1.p5.3.m1.3.4.3" xref="S3.SS1.p5.3.m1.3.4.3.cmml">=</mo><mrow id="S3.SS1.p5.3.m1.3.4.4" xref="S3.SS1.p5.3.m1.3.4.4.cmml"><mtext 
id="S3.SS1.p5.3.m1.3.4.4.2" xref="S3.SS1.p5.3.m1.3.4.4.2a.cmml">diag</mtext><mo id="S3.SS1.p5.3.m1.3.4.4.1" xref="S3.SS1.p5.3.m1.3.4.4.1.cmml"></mo><mrow id="S3.SS1.p5.3.m1.3.4.4.3.2" xref="S3.SS1.p5.3.m1.3.4.4.3.1.cmml"><mo id="S3.SS1.p5.3.m1.3.4.4.3.2.1" stretchy="false" xref="S3.SS1.p5.3.m1.3.4.4.3.1.cmml">(</mo><mi id="S3.SS1.p5.3.m1.1.1" xref="S3.SS1.p5.3.m1.1.1.cmml">f</mi><mo id="S3.SS1.p5.3.m1.3.4.4.3.2.2" xref="S3.SS1.p5.3.m1.3.4.4.3.1.cmml">,</mo><mi id="S3.SS1.p5.3.m1.2.2" xref="S3.SS1.p5.3.m1.2.2.cmml">f</mi><mo id="S3.SS1.p5.3.m1.3.4.4.3.2.3" xref="S3.SS1.p5.3.m1.3.4.4.3.1.cmml">,</mo><mn id="S3.SS1.p5.3.m1.3.3" xref="S3.SS1.p5.3.m1.3.3.cmml">1</mn><mo id="S3.SS1.p5.3.m1.3.4.4.3.2.4" stretchy="false" xref="S3.SS1.p5.3.m1.3.4.4.3.1.cmml">)</mo></mrow></mrow><mo id="S3.SS1.p5.3.m1.3.4.5" xref="S3.SS1.p5.3.m1.3.4.5.cmml">∈</mo><msup id="S3.SS1.p5.3.m1.3.4.6" xref="S3.SS1.p5.3.m1.3.4.6.cmml"><mi id="S3.SS1.p5.3.m1.3.4.6.2" xref="S3.SS1.p5.3.m1.3.4.6.2.cmml">ℝ</mi><mrow id="S3.SS1.p5.3.m1.3.4.6.3" xref="S3.SS1.p5.3.m1.3.4.6.3.cmml"><mn id="S3.SS1.p5.3.m1.3.4.6.3.2" xref="S3.SS1.p5.3.m1.3.4.6.3.2.cmml">3</mn><mo id="S3.SS1.p5.3.m1.3.4.6.3.1" lspace="0.222em" rspace="0.222em" xref="S3.SS1.p5.3.m1.3.4.6.3.1.cmml">×</mo><mn id="S3.SS1.p5.3.m1.3.4.6.3.3" xref="S3.SS1.p5.3.m1.3.4.6.3.3.cmml">3</mn></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p5.3.m1.3b"><apply id="S3.SS1.p5.3.m1.3.4.cmml" xref="S3.SS1.p5.3.m1.3.4"><and id="S3.SS1.p5.3.m1.3.4a.cmml" xref="S3.SS1.p5.3.m1.3.4"></and><apply id="S3.SS1.p5.3.m1.3.4b.cmml" xref="S3.SS1.p5.3.m1.3.4"><eq id="S3.SS1.p5.3.m1.3.4.3.cmml" xref="S3.SS1.p5.3.m1.3.4.3"></eq><ci id="S3.SS1.p5.3.m1.3.4.2.cmml" xref="S3.SS1.p5.3.m1.3.4.2">𝐾</ci><apply id="S3.SS1.p5.3.m1.3.4.4.cmml" xref="S3.SS1.p5.3.m1.3.4.4"><times id="S3.SS1.p5.3.m1.3.4.4.1.cmml" xref="S3.SS1.p5.3.m1.3.4.4.1"></times><ci id="S3.SS1.p5.3.m1.3.4.4.2a.cmml" xref="S3.SS1.p5.3.m1.3.4.4.2"><mtext id="S3.SS1.p5.3.m1.3.4.4.2.cmml" 
xref="S3.SS1.p5.3.m1.3.4.4.2">diag</mtext></ci><vector id="S3.SS1.p5.3.m1.3.4.4.3.1.cmml" xref="S3.SS1.p5.3.m1.3.4.4.3.2"><ci id="S3.SS1.p5.3.m1.1.1.cmml" xref="S3.SS1.p5.3.m1.1.1">𝑓</ci><ci id="S3.SS1.p5.3.m1.2.2.cmml" xref="S3.SS1.p5.3.m1.2.2">𝑓</ci><cn id="S3.SS1.p5.3.m1.3.3.cmml" type="integer" xref="S3.SS1.p5.3.m1.3.3">1</cn></vector></apply></apply><apply id="S3.SS1.p5.3.m1.3.4c.cmml" xref="S3.SS1.p5.3.m1.3.4"><in id="S3.SS1.p5.3.m1.3.4.5.cmml" xref="S3.SS1.p5.3.m1.3.4.5"></in><share href="https://arxiv.org/html/2503.12553v1#S3.SS1.p5.3.m1.3.4.4.cmml" id="S3.SS1.p5.3.m1.3.4d.cmml" xref="S3.SS1.p5.3.m1.3.4"></share><apply id="S3.SS1.p5.3.m1.3.4.6.cmml" xref="S3.SS1.p5.3.m1.3.4.6"><csymbol cd="ambiguous" id="S3.SS1.p5.3.m1.3.4.6.1.cmml" xref="S3.SS1.p5.3.m1.3.4.6">superscript</csymbol><ci id="S3.SS1.p5.3.m1.3.4.6.2.cmml" xref="S3.SS1.p5.3.m1.3.4.6.2">ℝ</ci><apply id="S3.SS1.p5.3.m1.3.4.6.3.cmml" xref="S3.SS1.p5.3.m1.3.4.6.3"><times id="S3.SS1.p5.3.m1.3.4.6.3.1.cmml" xref="S3.SS1.p5.3.m1.3.4.6.3.1"></times><cn id="S3.SS1.p5.3.m1.3.4.6.3.2.cmml" type="integer" xref="S3.SS1.p5.3.m1.3.4.6.3.2">3</cn><cn id="S3.SS1.p5.3.m1.3.4.6.3.3.cmml" type="integer" xref="S3.SS1.p5.3.m1.3.4.6.3.3">3</cn></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p5.3.m1.3c">K=\text{diag}(f,f,1)\in\mathbb{R}^{3\times 3}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p5.3.m1.3d">italic_K = diag ( italic_f , italic_f , 1 ) ∈ blackboard_R start_POSTSUPERSCRIPT 3 × 3 end_POSTSUPERSCRIPT</annotation></semantics></math> is the camera calibration matrix with focal length <math alttext="f" class="ltx_Math" display="inline" id="S3.SS1.p5.4.m2.1"><semantics id="S3.SS1.p5.4.m2.1a"><mi id="S3.SS1.p5.4.m2.1.1" xref="S3.SS1.p5.4.m2.1.1.cmml">f</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p5.4.m2.1b"><ci id="S3.SS1.p5.4.m2.1.1.cmml" xref="S3.SS1.p5.4.m2.1.1">𝑓</ci></annotation-xml><annotation encoding="application/x-tex" 
id="S3.SS1.p5.4.m2.1c">f</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p5.4.m2.1d">italic_f</annotation></semantics></math>.</p> </div> <div class="ltx_para" id="S3.SS1.p6"> <p class="ltx_p" id="S3.SS1.p6.5">The neural network <math alttext="\Phi" class="ltx_Math" display="inline" id="S3.SS1.p6.1.m1.1"><semantics id="S3.SS1.p6.1.m1.1a"><mi id="S3.SS1.p6.1.m1.1.1" mathvariant="normal" xref="S3.SS1.p6.1.m1.1.1.cmml">Φ</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p6.1.m1.1b"><ci id="S3.SS1.p6.1.m1.1.1.cmml" xref="S3.SS1.p6.1.m1.1.1">Φ</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p6.1.m1.1c">\Phi</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p6.1.m1.1d">roman_Φ</annotation></semantics></math> is trained on triplets <math alttext="(I,J,\pi)" class="ltx_Math" display="inline" id="S3.SS1.p6.2.m2.3"><semantics id="S3.SS1.p6.2.m2.3a"><mrow id="S3.SS1.p6.2.m2.3.4.2" xref="S3.SS1.p6.2.m2.3.4.1.cmml"><mo id="S3.SS1.p6.2.m2.3.4.2.1" stretchy="false" xref="S3.SS1.p6.2.m2.3.4.1.cmml">(</mo><mi id="S3.SS1.p6.2.m2.1.1" xref="S3.SS1.p6.2.m2.1.1.cmml">I</mi><mo id="S3.SS1.p6.2.m2.3.4.2.2" xref="S3.SS1.p6.2.m2.3.4.1.cmml">,</mo><mi id="S3.SS1.p6.2.m2.2.2" xref="S3.SS1.p6.2.m2.2.2.cmml">J</mi><mo id="S3.SS1.p6.2.m2.3.4.2.3" xref="S3.SS1.p6.2.m2.3.4.1.cmml">,</mo><mi id="S3.SS1.p6.2.m2.3.3" xref="S3.SS1.p6.2.m2.3.3.cmml">π</mi><mo id="S3.SS1.p6.2.m2.3.4.2.4" stretchy="false" xref="S3.SS1.p6.2.m2.3.4.1.cmml">)</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p6.2.m2.3b"><vector id="S3.SS1.p6.2.m2.3.4.1.cmml" xref="S3.SS1.p6.2.m2.3.4.2"><ci id="S3.SS1.p6.2.m2.1.1.cmml" xref="S3.SS1.p6.2.m2.1.1">𝐼</ci><ci id="S3.SS1.p6.2.m2.2.2.cmml" xref="S3.SS1.p6.2.m2.2.2">𝐽</ci><ci id="S3.SS1.p6.2.m2.3.3.cmml" xref="S3.SS1.p6.2.m2.3.3">𝜋</ci></vector></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p6.2.m2.3c">(I,J,\pi)</annotation><annotation 
encoding="application/x-llamapun" id="S3.SS1.p6.2.m2.3d">( italic_I , italic_J , italic_π )</annotation></semantics></math>, where <math alttext="I" class="ltx_Math" display="inline" id="S3.SS1.p6.3.m3.1"><semantics id="S3.SS1.p6.3.m3.1a"><mi id="S3.SS1.p6.3.m3.1.1" xref="S3.SS1.p6.3.m3.1.1.cmml">I</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p6.3.m3.1b"><ci id="S3.SS1.p6.3.m3.1.1.cmml" xref="S3.SS1.p6.3.m3.1.1">𝐼</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p6.3.m3.1c">I</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p6.3.m3.1d">italic_I</annotation></semantics></math> is the input image, and <math alttext="J" class="ltx_Math" display="inline" id="S3.SS1.p6.4.m4.1"><semantics id="S3.SS1.p6.4.m4.1a"><mi id="S3.SS1.p6.4.m4.1.1" xref="S3.SS1.p6.4.m4.1.1.cmml">J</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p6.4.m4.1b"><ci id="S3.SS1.p6.4.m4.1.1.cmml" xref="S3.SS1.p6.4.m4.1.1">𝐽</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p6.4.m4.1c">J</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p6.4.m4.1d">italic_J</annotation></semantics></math> is the target ground-truth image at camera pose <math alttext="\pi" class="ltx_Math" display="inline" id="S3.SS1.p6.5.m5.1"><semantics id="S3.SS1.p6.5.m5.1a"><mi id="S3.SS1.p6.5.m5.1.1" xref="S3.SS1.p6.5.m5.1.1.cmml">π</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p6.5.m5.1b"><ci id="S3.SS1.p6.5.m5.1.1.cmml" xref="S3.SS1.p6.5.m5.1.1">𝜋</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p6.5.m5.1c">\pi</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p6.5.m5.1d">italic_π</annotation></semantics></math>.</p> </div> </section> <section class="ltx_subsection" id="S3.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">3.2 </span>Our Method: Niagara</h3> <div class="ltx_para" id="S3.SS2.p1"> <p class="ltx_p" id="S3.SS2.p1.1">Our 
Niagara method takes both the surface normals and the depth as input and transforms them into a geometric affine field. This field is then converted to 3D Gaussian parameters that can be used to render high-quality novel views.</p> </div> <section class="ltx_subsubsection" id="S3.SS2.SSS1"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection">3.2.1 </span>Prior Information and Geometric Feature</h4> <div class="ltx_para ltx_noindent" id="S3.SS2.SSS1.p1"> <p class="ltx_p" id="S3.SS2.SSS1.p1.4"><span class="ltx_text ltx_font_bold" id="S3.SS2.SSS1.p1.4.1">Normal-Integrated Depth Estimator.</span> To overcome the inaccurate geometry of previous methods (<span class="ltx_text ltx_font_italic" id="S3.SS2.SSS1.p1.4.2">e.g.</span>, Flash3D, shown in <a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#S1.F2" title="Figure 2 ‣ 1 Introduction ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Figure 2</span></a>), we incorporate <span class="ltx_text ltx_font_italic" id="S3.SS2.SSS1.p1.4.3">surface normals</span> into our framework. Including surface normals lets our model improve both the photometric precision and the lighting uniformity of rendered scenes. 
Specifically, we predict per-pixel normals and depth with a <span class="ltx_text ltx_font_italic" id="S3.SS2.SSS1.p1.4.4">pre-trained</span> normal estimator <math alttext="\Phi_{n}" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.1.m1.1"><semantics id="S3.SS2.SSS1.p1.1.m1.1a"><msub id="S3.SS2.SSS1.p1.1.m1.1.1" xref="S3.SS2.SSS1.p1.1.m1.1.1.cmml"><mi id="S3.SS2.SSS1.p1.1.m1.1.1.2" mathvariant="normal" xref="S3.SS2.SSS1.p1.1.m1.1.1.2.cmml">Φ</mi><mi id="S3.SS2.SSS1.p1.1.m1.1.1.3" xref="S3.SS2.SSS1.p1.1.m1.1.1.3.cmml">n</mi></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.1.m1.1b"><apply id="S3.SS2.SSS1.p1.1.m1.1.1.cmml" xref="S3.SS2.SSS1.p1.1.m1.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS1.p1.1.m1.1.1.1.cmml" xref="S3.SS2.SSS1.p1.1.m1.1.1">subscript</csymbol><ci id="S3.SS2.SSS1.p1.1.m1.1.1.2.cmml" xref="S3.SS2.SSS1.p1.1.m1.1.1.2">Φ</ci><ci id="S3.SS2.SSS1.p1.1.m1.1.1.3.cmml" xref="S3.SS2.SSS1.p1.1.m1.1.1.3">𝑛</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.1.m1.1c">\Phi_{n}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p1.1.m1.1d">roman_Φ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT</annotation></semantics></math> <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib79" title="">79</a>]</cite> and a <span class="ltx_text ltx_font_italic" id="S3.SS2.SSS1.p1.4.5">pre-trained</span> depth estimator <math alttext="\Psi" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.2.m2.1"><semantics id="S3.SS2.SSS1.p1.2.m2.1a"><mi id="S3.SS2.SSS1.p1.2.m2.1.1" mathvariant="normal" xref="S3.SS2.SSS1.p1.2.m2.1.1.cmml">Ψ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.2.m2.1b"><ci id="S3.SS2.SSS1.p1.2.m2.1.1.cmml" xref="S3.SS2.SSS1.p1.2.m2.1.1">Ψ</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.2.m2.1c">\Psi</annotation><annotation encoding="application/x-llamapun" 
id="S3.SS2.SSS1.p1.2.m2.1d">roman_Ψ</annotation></semantics></math> <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib46" title="">46</a>]</cite>, yielding a normal map <math alttext="N" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.3.m3.1"><semantics id="S3.SS2.SSS1.p1.3.m3.1a"><mi id="S3.SS2.SSS1.p1.3.m3.1.1" xref="S3.SS2.SSS1.p1.3.m3.1.1.cmml">N</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.3.m3.1b"><ci id="S3.SS2.SSS1.p1.3.m3.1.1.cmml" xref="S3.SS2.SSS1.p1.3.m3.1.1">𝑁</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.3.m3.1c">N</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p1.3.m3.1d">italic_N</annotation></semantics></math> and depth map <math alttext="D" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.4.m4.1"><semantics id="S3.SS2.SSS1.p1.4.m4.1a"><mi id="S3.SS2.SSS1.p1.4.m4.1.1" xref="S3.SS2.SSS1.p1.4.m4.1.1.cmml">D</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.4.m4.1b"><ci id="S3.SS2.SSS1.p1.4.m4.1.1.cmml" xref="S3.SS2.SSS1.p1.4.m4.1.1">𝐷</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.4.m4.1c">D</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p1.4.m4.1d">italic_D</annotation></semantics></math>:</p> <table class="ltx_equation ltx_eqn_table" id="S3.E7"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\begin{split}&N=\Phi_{n}(I),\\ &D=\Psi(I).\end{split}" class="ltx_Math" display="block" id="S3.E7.m1.18"><semantics id="S3.E7.m1.18a"><mtable columnspacing="0pt" displaystyle="true" id="S3.E7.m1.18.18.3" rowspacing="0pt"><mtr id="S3.E7.m1.18.18.3a"><mtd id="S3.E7.m1.18.18.3b"></mtd><mtd class="ltx_align_left" columnalign="left" id="S3.E7.m1.18.18.3c"><mrow id="S3.E7.m1.17.17.2.16.9.9.9"><mrow 
id="S3.E7.m1.17.17.2.16.9.9.9.1"><mi id="S3.E7.m1.1.1.1.1.1.1" xref="S3.E7.m1.1.1.1.1.1.1.cmml">N</mi><mo id="S3.E7.m1.2.2.2.2.2.2" xref="S3.E7.m1.2.2.2.2.2.2.cmml">=</mo><mrow id="S3.E7.m1.17.17.2.16.9.9.9.1.1"><msub id="S3.E7.m1.17.17.2.16.9.9.9.1.1.2"><mi id="S3.E7.m1.3.3.3.3.3.3" mathvariant="normal" xref="S3.E7.m1.3.3.3.3.3.3.cmml">Φ</mi><mi id="S3.E7.m1.4.4.4.4.4.4.1" xref="S3.E7.m1.4.4.4.4.4.4.1.cmml">n</mi></msub><mo id="S3.E7.m1.17.17.2.16.9.9.9.1.1.1"></mo><mrow id="S3.E7.m1.17.17.2.16.9.9.9.1.1.3"><mo id="S3.E7.m1.5.5.5.5.5.5" stretchy="false">(</mo><mi id="S3.E7.m1.6.6.6.6.6.6" xref="S3.E7.m1.6.6.6.6.6.6.cmml">I</mi><mo id="S3.E7.m1.7.7.7.7.7.7" stretchy="false">)</mo></mrow></mrow></mrow><mo id="S3.E7.m1.8.8.8.8.8.8">,</mo></mrow></mtd></mtr><mtr id="S3.E7.m1.18.18.3d"><mtd id="S3.E7.m1.18.18.3e"></mtd><mtd class="ltx_align_left" columnalign="left" id="S3.E7.m1.18.18.3f"><mrow id="S3.E7.m1.18.18.3.17.8.8.8"><mrow id="S3.E7.m1.18.18.3.17.8.8.8.1"><mi id="S3.E7.m1.9.9.9.1.1.1" xref="S3.E7.m1.9.9.9.1.1.1.cmml">D</mi><mo id="S3.E7.m1.10.10.10.2.2.2" xref="S3.E7.m1.10.10.10.2.2.2.cmml">=</mo><mrow id="S3.E7.m1.18.18.3.17.8.8.8.1.1"><mi id="S3.E7.m1.11.11.11.3.3.3" mathvariant="normal" xref="S3.E7.m1.11.11.11.3.3.3.cmml">Ψ</mi><mo id="S3.E7.m1.18.18.3.17.8.8.8.1.1.1"></mo><mrow id="S3.E7.m1.18.18.3.17.8.8.8.1.1.2"><mo id="S3.E7.m1.12.12.12.4.4.4" stretchy="false">(</mo><mi id="S3.E7.m1.13.13.13.5.5.5" xref="S3.E7.m1.13.13.13.5.5.5.cmml">I</mi><mo id="S3.E7.m1.14.14.14.6.6.6" stretchy="false">)</mo></mrow></mrow></mrow><mo id="S3.E7.m1.15.15.15.7.7.7" lspace="0em">.</mo></mrow></mtd></mtr></mtable><annotation-xml encoding="MathML-Content" id="S3.E7.m1.18b"><apply id="S3.E7.m1.16.16.1.1.1.3.cmml"><csymbol cd="ambiguous" id="S3.E7.m1.16.16.1.1.1.3a.cmml">formulae-sequence</csymbol><apply id="S3.E7.m1.16.16.1.1.1.1.1.cmml"><eq id="S3.E7.m1.2.2.2.2.2.2.cmml" xref="S3.E7.m1.2.2.2.2.2.2"></eq><ci id="S3.E7.m1.1.1.1.1.1.1.cmml" 
xref="S3.E7.m1.1.1.1.1.1.1">𝑁</ci><apply id="S3.E7.m1.16.16.1.1.1.1.1.3.cmml"><times id="S3.E7.m1.16.16.1.1.1.1.1.3.1.cmml"></times><apply id="S3.E7.m1.16.16.1.1.1.1.1.3.2.cmml"><csymbol cd="ambiguous" id="S3.E7.m1.16.16.1.1.1.1.1.3.2.1.cmml">subscript</csymbol><ci id="S3.E7.m1.3.3.3.3.3.3.cmml" xref="S3.E7.m1.3.3.3.3.3.3">Φ</ci><ci id="S3.E7.m1.4.4.4.4.4.4.1.cmml" xref="S3.E7.m1.4.4.4.4.4.4.1">𝑛</ci></apply><ci id="S3.E7.m1.6.6.6.6.6.6.cmml" xref="S3.E7.m1.6.6.6.6.6.6">𝐼</ci></apply></apply><apply id="S3.E7.m1.16.16.1.1.1.2.2.cmml"><eq id="S3.E7.m1.10.10.10.2.2.2.cmml" xref="S3.E7.m1.10.10.10.2.2.2"></eq><ci id="S3.E7.m1.9.9.9.1.1.1.cmml" xref="S3.E7.m1.9.9.9.1.1.1">𝐷</ci><apply id="S3.E7.m1.16.16.1.1.1.2.2.3.cmml"><times id="S3.E7.m1.16.16.1.1.1.2.2.3.1.cmml"></times><ci id="S3.E7.m1.11.11.11.3.3.3.cmml" xref="S3.E7.m1.11.11.11.3.3.3">Ψ</ci><ci id="S3.E7.m1.13.13.13.5.5.5.cmml" xref="S3.E7.m1.13.13.13.5.5.5">𝐼</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E7.m1.18c">\begin{split}&N=\Phi_{n}(I),\\ &D=\Psi(I).\end{split}</annotation><annotation encoding="application/x-llamapun" id="S3.E7.m1.18d">start_ROW start_CELL end_CELL start_CELL italic_N = roman_Φ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_I ) , end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL italic_D = roman_Ψ ( italic_I ) . end_CELL end_ROW</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(7)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS2.SSS1.p1.18"><span class="ltx_text ltx_font_bold" id="S3.SS2.SSS1.p1.18.1">Gaussian Splatting Geometric Feature.</span> The surface normals positively influence the Gaussian parameters predicted by our network. 
The overall output from our network can be represented as:</p> <table class="ltx_equation ltx_eqn_table" id="S3.E8"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="[\Phi(I,D,\gamma)]_{u}=(\sigma,\Delta,s,\theta,c)," class="ltx_Math" display="block" id="S3.E8.m1.9"><semantics id="S3.E8.m1.9a"><mrow id="S3.E8.m1.9.9.1" xref="S3.E8.m1.9.9.1.1.cmml"><mrow id="S3.E8.m1.9.9.1.1" xref="S3.E8.m1.9.9.1.1.cmml"><msub id="S3.E8.m1.9.9.1.1.1" xref="S3.E8.m1.9.9.1.1.1.cmml"><mrow id="S3.E8.m1.9.9.1.1.1.1.1" xref="S3.E8.m1.9.9.1.1.1.1.2.cmml"><mo id="S3.E8.m1.9.9.1.1.1.1.1.2" stretchy="false" xref="S3.E8.m1.9.9.1.1.1.1.2.1.cmml">[</mo><mrow id="S3.E8.m1.9.9.1.1.1.1.1.1" xref="S3.E8.m1.9.9.1.1.1.1.1.1.cmml"><mi id="S3.E8.m1.9.9.1.1.1.1.1.1.2" mathvariant="normal" xref="S3.E8.m1.9.9.1.1.1.1.1.1.2.cmml">Φ</mi><mo id="S3.E8.m1.9.9.1.1.1.1.1.1.1" xref="S3.E8.m1.9.9.1.1.1.1.1.1.1.cmml"></mo><mrow id="S3.E8.m1.9.9.1.1.1.1.1.1.3.2" xref="S3.E8.m1.9.9.1.1.1.1.1.1.3.1.cmml"><mo id="S3.E8.m1.9.9.1.1.1.1.1.1.3.2.1" stretchy="false" xref="S3.E8.m1.9.9.1.1.1.1.1.1.3.1.cmml">(</mo><mi id="S3.E8.m1.1.1" xref="S3.E8.m1.1.1.cmml">I</mi><mo id="S3.E8.m1.9.9.1.1.1.1.1.1.3.2.2" xref="S3.E8.m1.9.9.1.1.1.1.1.1.3.1.cmml">,</mo><mi id="S3.E8.m1.2.2" xref="S3.E8.m1.2.2.cmml">D</mi><mo id="S3.E8.m1.9.9.1.1.1.1.1.1.3.2.3" xref="S3.E8.m1.9.9.1.1.1.1.1.1.3.1.cmml">,</mo><mi id="S3.E8.m1.3.3" xref="S3.E8.m1.3.3.cmml">γ</mi><mo id="S3.E8.m1.9.9.1.1.1.1.1.1.3.2.4" stretchy="false" xref="S3.E8.m1.9.9.1.1.1.1.1.1.3.1.cmml">)</mo></mrow></mrow><mo id="S3.E8.m1.9.9.1.1.1.1.1.3" stretchy="false" xref="S3.E8.m1.9.9.1.1.1.1.2.1.cmml">]</mo></mrow><mi id="S3.E8.m1.9.9.1.1.1.3" xref="S3.E8.m1.9.9.1.1.1.3.cmml">u</mi></msub><mo id="S3.E8.m1.9.9.1.1.2" xref="S3.E8.m1.9.9.1.1.2.cmml">=</mo><mrow id="S3.E8.m1.9.9.1.1.3.2" xref="S3.E8.m1.9.9.1.1.3.1.cmml"><mo id="S3.E8.m1.9.9.1.1.3.2.1" 
stretchy="false" xref="S3.E8.m1.9.9.1.1.3.1.cmml">(</mo><mi id="S3.E8.m1.4.4" xref="S3.E8.m1.4.4.cmml">σ</mi><mo id="S3.E8.m1.9.9.1.1.3.2.2" xref="S3.E8.m1.9.9.1.1.3.1.cmml">,</mo><mi id="S3.E8.m1.5.5" mathvariant="normal" xref="S3.E8.m1.5.5.cmml">Δ</mi><mo id="S3.E8.m1.9.9.1.1.3.2.3" xref="S3.E8.m1.9.9.1.1.3.1.cmml">,</mo><mi id="S3.E8.m1.6.6" xref="S3.E8.m1.6.6.cmml">s</mi><mo id="S3.E8.m1.9.9.1.1.3.2.4" xref="S3.E8.m1.9.9.1.1.3.1.cmml">,</mo><mi id="S3.E8.m1.7.7" xref="S3.E8.m1.7.7.cmml">θ</mi><mo id="S3.E8.m1.9.9.1.1.3.2.5" xref="S3.E8.m1.9.9.1.1.3.1.cmml">,</mo><mi id="S3.E8.m1.8.8" xref="S3.E8.m1.8.8.cmml">c</mi><mo id="S3.E8.m1.9.9.1.1.3.2.6" stretchy="false" xref="S3.E8.m1.9.9.1.1.3.1.cmml">)</mo></mrow></mrow><mo id="S3.E8.m1.9.9.1.2" xref="S3.E8.m1.9.9.1.1.cmml">,</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.E8.m1.9b"><apply id="S3.E8.m1.9.9.1.1.cmml" xref="S3.E8.m1.9.9.1"><eq id="S3.E8.m1.9.9.1.1.2.cmml" xref="S3.E8.m1.9.9.1.1.2"></eq><apply id="S3.E8.m1.9.9.1.1.1.cmml" xref="S3.E8.m1.9.9.1.1.1"><csymbol cd="ambiguous" id="S3.E8.m1.9.9.1.1.1.2.cmml" xref="S3.E8.m1.9.9.1.1.1">subscript</csymbol><apply id="S3.E8.m1.9.9.1.1.1.1.2.cmml" xref="S3.E8.m1.9.9.1.1.1.1.1"><csymbol cd="latexml" id="S3.E8.m1.9.9.1.1.1.1.2.1.cmml" xref="S3.E8.m1.9.9.1.1.1.1.1.2">delimited-[]</csymbol><apply id="S3.E8.m1.9.9.1.1.1.1.1.1.cmml" xref="S3.E8.m1.9.9.1.1.1.1.1.1"><times id="S3.E8.m1.9.9.1.1.1.1.1.1.1.cmml" xref="S3.E8.m1.9.9.1.1.1.1.1.1.1"></times><ci id="S3.E8.m1.9.9.1.1.1.1.1.1.2.cmml" xref="S3.E8.m1.9.9.1.1.1.1.1.1.2">Φ</ci><vector id="S3.E8.m1.9.9.1.1.1.1.1.1.3.1.cmml" xref="S3.E8.m1.9.9.1.1.1.1.1.1.3.2"><ci id="S3.E8.m1.1.1.cmml" xref="S3.E8.m1.1.1">𝐼</ci><ci id="S3.E8.m1.2.2.cmml" xref="S3.E8.m1.2.2">𝐷</ci><ci id="S3.E8.m1.3.3.cmml" xref="S3.E8.m1.3.3">𝛾</ci></vector></apply></apply><ci id="S3.E8.m1.9.9.1.1.1.3.cmml" xref="S3.E8.m1.9.9.1.1.1.3">𝑢</ci></apply><vector id="S3.E8.m1.9.9.1.1.3.1.cmml" xref="S3.E8.m1.9.9.1.1.3.2"><ci id="S3.E8.m1.4.4.cmml" 
xref="S3.E8.m1.4.4">𝜎</ci><ci id="S3.E8.m1.5.5.cmml" xref="S3.E8.m1.5.5">Δ</ci><ci id="S3.E8.m1.6.6.cmml" xref="S3.E8.m1.6.6">𝑠</ci><ci id="S3.E8.m1.7.7.cmml" xref="S3.E8.m1.7.7">𝜃</ci><ci id="S3.E8.m1.8.8.cmml" xref="S3.E8.m1.8.8">𝑐</ci></vector></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E8.m1.9c">[\Phi(I,D,\gamma)]_{u}=(\sigma,\Delta,s,\theta,c),</annotation><annotation encoding="application/x-llamapun" id="S3.E8.m1.9d">[ roman_Φ ( italic_I , italic_D , italic_γ ) ] start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT = ( italic_σ , roman_Δ , italic_s , italic_θ , italic_c ) ,</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(8)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS2.SSS1.p1.17">where additional parameters <math alttext="s" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.5.m1.1"><semantics id="S3.SS2.SSS1.p1.5.m1.1a"><mi id="S3.SS2.SSS1.p1.5.m1.1.1" xref="S3.SS2.SSS1.p1.5.m1.1.1.cmml">s</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.5.m1.1b"><ci id="S3.SS2.SSS1.p1.5.m1.1.1.cmml" xref="S3.SS2.SSS1.p1.5.m1.1.1">𝑠</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.5.m1.1c">s</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p1.5.m1.1d">italic_s</annotation></semantics></math>, <math alttext="\theta" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.6.m2.1"><semantics id="S3.SS2.SSS1.p1.6.m2.1a"><mi id="S3.SS2.SSS1.p1.6.m2.1.1" xref="S3.SS2.SSS1.p1.6.m2.1.1.cmml">θ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.6.m2.1b"><ci id="S3.SS2.SSS1.p1.6.m2.1.1.cmml" xref="S3.SS2.SSS1.p1.6.m2.1.1">𝜃</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.6.m2.1c">\theta</annotation><annotation encoding="application/x-llamapun" 
id="S3.SS2.SSS1.p1.6.m2.1d">italic_θ</annotation></semantics></math>, <math alttext="\gamma" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.7.m3.1"><semantics id="S3.SS2.SSS1.p1.7.m3.1a"><mi id="S3.SS2.SSS1.p1.7.m3.1.1" xref="S3.SS2.SSS1.p1.7.m3.1.1.cmml">γ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.7.m3.1b"><ci id="S3.SS2.SSS1.p1.7.m3.1.1.cmml" xref="S3.SS2.SSS1.p1.7.m3.1.1">𝛾</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.7.m3.1c">\gamma</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p1.7.m3.1d">italic_γ</annotation></semantics></math>, and <math alttext="c" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.8.m4.1"><semantics id="S3.SS2.SSS1.p1.8.m4.1a"><mi id="S3.SS2.SSS1.p1.8.m4.1.1" xref="S3.SS2.SSS1.p1.8.m4.1.1.cmml">c</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.8.m4.1b"><ci id="S3.SS2.SSS1.p1.8.m4.1.1.cmml" xref="S3.SS2.SSS1.p1.8.m4.1.1">𝑐</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.8.m4.1c">c</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p1.8.m4.1d">italic_c</annotation></semantics></math> denote shape, orientation, normal, and color, respectively. Specifically, <math alttext="\sigma" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.9.m5.1"><semantics id="S3.SS2.SSS1.p1.9.m5.1a"><mi id="S3.SS2.SSS1.p1.9.m5.1.1" xref="S3.SS2.SSS1.p1.9.m5.1.1.cmml">σ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.9.m5.1b"><ci id="S3.SS2.SSS1.p1.9.m5.1.1.cmml" xref="S3.SS2.SSS1.p1.9.m5.1.1">𝜎</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.9.m5.1c">\sigma</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p1.9.m5.1d">italic_σ</annotation></semantics></math> is the positive opacity, <math alttext="\Delta" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.10.m6.1"><semantics id="S3.SS2.SSS1.p1.10.m6.1a"><mi id="S3.SS2.SSS1.p1.10.m6.1.1" 
mathvariant="normal" xref="S3.SS2.SSS1.p1.10.m6.1.1.cmml">Δ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.10.m6.1b"><ci id="S3.SS2.SSS1.p1.10.m6.1.1.cmml" xref="S3.SS2.SSS1.p1.10.m6.1.1">Δ</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.10.m6.1c">\Delta</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p1.10.m6.1d">roman_Δ</annotation></semantics></math> is the 3D displacement vector, <math alttext="s" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.11.m7.1"><semantics id="S3.SS2.SSS1.p1.11.m7.1a"><mi id="S3.SS2.SSS1.p1.11.m7.1.1" xref="S3.SS2.SSS1.p1.11.m7.1.1.cmml">s</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.11.m7.1b"><ci id="S3.SS2.SSS1.p1.11.m7.1.1.cmml" xref="S3.SS2.SSS1.p1.11.m7.1.1">𝑠</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.11.m7.1c">s</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p1.11.m7.1d">italic_s</annotation></semantics></math> denotes 3D scale factors, <math alttext="\theta" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.12.m8.1"><semantics id="S3.SS2.SSS1.p1.12.m8.1a"><mi id="S3.SS2.SSS1.p1.12.m8.1.1" xref="S3.SS2.SSS1.p1.12.m8.1.1.cmml">θ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.12.m8.1b"><ci id="S3.SS2.SSS1.p1.12.m8.1.1.cmml" xref="S3.SS2.SSS1.p1.12.m8.1.1">𝜃</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.12.m8.1c">\theta</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p1.12.m8.1d">italic_θ</annotation></semantics></math> is a quaternion for rotation <math alttext="R(\theta)" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.13.m9.1"><semantics id="S3.SS2.SSS1.p1.13.m9.1a"><mrow id="S3.SS2.SSS1.p1.13.m9.1.2" xref="S3.SS2.SSS1.p1.13.m9.1.2.cmml"><mi id="S3.SS2.SSS1.p1.13.m9.1.2.2" xref="S3.SS2.SSS1.p1.13.m9.1.2.2.cmml">R</mi><mo id="S3.SS2.SSS1.p1.13.m9.1.2.1" 
xref="S3.SS2.SSS1.p1.13.m9.1.2.1.cmml"></mo><mrow id="S3.SS2.SSS1.p1.13.m9.1.2.3.2" xref="S3.SS2.SSS1.p1.13.m9.1.2.cmml"><mo id="S3.SS2.SSS1.p1.13.m9.1.2.3.2.1" stretchy="false" xref="S3.SS2.SSS1.p1.13.m9.1.2.cmml">(</mo><mi id="S3.SS2.SSS1.p1.13.m9.1.1" xref="S3.SS2.SSS1.p1.13.m9.1.1.cmml">θ</mi><mo id="S3.SS2.SSS1.p1.13.m9.1.2.3.2.2" stretchy="false" xref="S3.SS2.SSS1.p1.13.m9.1.2.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.13.m9.1b"><apply id="S3.SS2.SSS1.p1.13.m9.1.2.cmml" xref="S3.SS2.SSS1.p1.13.m9.1.2"><times id="S3.SS2.SSS1.p1.13.m9.1.2.1.cmml" xref="S3.SS2.SSS1.p1.13.m9.1.2.1"></times><ci id="S3.SS2.SSS1.p1.13.m9.1.2.2.cmml" xref="S3.SS2.SSS1.p1.13.m9.1.2.2">𝑅</ci><ci id="S3.SS2.SSS1.p1.13.m9.1.1.cmml" xref="S3.SS2.SSS1.p1.13.m9.1.1">𝜃</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.13.m9.1c">R(\theta)</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p1.13.m9.1d">italic_R ( italic_θ )</annotation></semantics></math>, <math alttext="\gamma" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.14.m10.1"><semantics id="S3.SS2.SSS1.p1.14.m10.1a"><mi id="S3.SS2.SSS1.p1.14.m10.1.1" xref="S3.SS2.SSS1.p1.14.m10.1.1.cmml">γ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.14.m10.1b"><ci id="S3.SS2.SSS1.p1.14.m10.1.1.cmml" xref="S3.SS2.SSS1.p1.14.m10.1.1">𝛾</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.14.m10.1c">\gamma</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p1.14.m10.1d">italic_γ</annotation></semantics></math> is the normal vector, and <math alttext="c" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.15.m11.1"><semantics id="S3.SS2.SSS1.p1.15.m11.1a"><mi id="S3.SS2.SSS1.p1.15.m11.1.1" xref="S3.SS2.SSS1.p1.15.m11.1.1.cmml">c</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.15.m11.1b"><ci id="S3.SS2.SSS1.p1.15.m11.1.1.cmml" 
xref="S3.SS2.SSS1.p1.15.m11.1.1">𝑐</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.15.m11.1c">c</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p1.15.m11.1d">italic_c</annotation></semantics></math> represents the color parameters. Incorporating normals improves the computation of the covariance matrix <math alttext="\Sigma" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.16.m12.1"><semantics id="S3.SS2.SSS1.p1.16.m12.1a"><mi id="S3.SS2.SSS1.p1.16.m12.1.1" mathvariant="normal" xref="S3.SS2.SSS1.p1.16.m12.1.1.cmml">Σ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.16.m12.1b"><ci id="S3.SS2.SSS1.p1.16.m12.1.1.cmml" xref="S3.SS2.SSS1.p1.16.m12.1.1">Σ</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.16.m12.1c">\Sigma</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p1.16.m12.1d">roman_Σ</annotation></semantics></math>, where <math alttext="\Sigma" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p1.17.m13.1"><semantics id="S3.SS2.SSS1.p1.17.m13.1a"><mi id="S3.SS2.SSS1.p1.17.m13.1.1" mathvariant="normal" xref="S3.SS2.SSS1.p1.17.m13.1.1.cmml">Σ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p1.17.m13.1b"><ci id="S3.SS2.SSS1.p1.17.m13.1.1.cmml" xref="S3.SS2.SSS1.p1.17.m13.1.1">Σ</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p1.17.m13.1c">\Sigma</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p1.17.m13.1d">roman_Σ</annotation></semantics></math> for each Gaussian is defined as</p> <table class="ltx_equation ltx_eqn_table" id="S3.E9"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\Sigma=R(\theta)^{\top}\,\text{diag}(s)\,R(\theta)." 
class="ltx_Math" display="block" id="S3.E9.m1.4"><semantics id="S3.E9.m1.4a"><mrow id="S3.E9.m1.4.4.1" xref="S3.E9.m1.4.4.1.1.cmml"><mrow id="S3.E9.m1.4.4.1.1" xref="S3.E9.m1.4.4.1.1.cmml"><mi id="S3.E9.m1.4.4.1.1.2" mathvariant="normal" xref="S3.E9.m1.4.4.1.1.2.cmml">Σ</mi><mo id="S3.E9.m1.4.4.1.1.1" xref="S3.E9.m1.4.4.1.1.1.cmml">=</mo><mrow id="S3.E9.m1.4.4.1.1.3" xref="S3.E9.m1.4.4.1.1.3.cmml"><mi id="S3.E9.m1.4.4.1.1.3.2" xref="S3.E9.m1.4.4.1.1.3.2.cmml">R</mi><mo id="S3.E9.m1.4.4.1.1.3.1" xref="S3.E9.m1.4.4.1.1.3.1.cmml"></mo><msup id="S3.E9.m1.4.4.1.1.3.3" xref="S3.E9.m1.4.4.1.1.3.3.cmml"><mrow id="S3.E9.m1.4.4.1.1.3.3.2.2" xref="S3.E9.m1.4.4.1.1.3.3.cmml"><mo id="S3.E9.m1.4.4.1.1.3.3.2.2.1" stretchy="false" xref="S3.E9.m1.4.4.1.1.3.3.cmml">(</mo><mi id="S3.E9.m1.1.1" xref="S3.E9.m1.1.1.cmml">θ</mi><mo id="S3.E9.m1.4.4.1.1.3.3.2.2.2" stretchy="false" xref="S3.E9.m1.4.4.1.1.3.3.cmml">)</mo></mrow><mo id="S3.E9.m1.4.4.1.1.3.3.3" xref="S3.E9.m1.4.4.1.1.3.3.3.cmml">⊤</mo></msup><mo id="S3.E9.m1.4.4.1.1.3.1a" xref="S3.E9.m1.4.4.1.1.3.1.cmml"></mo><mtext id="S3.E9.m1.4.4.1.1.3.4" xref="S3.E9.m1.4.4.1.1.3.4a.cmml">diag</mtext><mo id="S3.E9.m1.4.4.1.1.3.1b" xref="S3.E9.m1.4.4.1.1.3.1.cmml"></mo><mrow id="S3.E9.m1.4.4.1.1.3.5.2" xref="S3.E9.m1.4.4.1.1.3.cmml"><mo id="S3.E9.m1.4.4.1.1.3.5.2.1" stretchy="false" xref="S3.E9.m1.4.4.1.1.3.cmml">(</mo><mi id="S3.E9.m1.2.2" xref="S3.E9.m1.2.2.cmml">s</mi><mo id="S3.E9.m1.4.4.1.1.3.5.2.2" stretchy="false" xref="S3.E9.m1.4.4.1.1.3.cmml">)</mo></mrow><mo id="S3.E9.m1.4.4.1.1.3.1c" lspace="0.170em" xref="S3.E9.m1.4.4.1.1.3.1.cmml"></mo><mi id="S3.E9.m1.4.4.1.1.3.6" xref="S3.E9.m1.4.4.1.1.3.6.cmml">R</mi><mo id="S3.E9.m1.4.4.1.1.3.1d" xref="S3.E9.m1.4.4.1.1.3.1.cmml"></mo><mrow id="S3.E9.m1.4.4.1.1.3.7.2" xref="S3.E9.m1.4.4.1.1.3.cmml"><mo id="S3.E9.m1.4.4.1.1.3.7.2.1" stretchy="false" xref="S3.E9.m1.4.4.1.1.3.cmml">(</mo><mi id="S3.E9.m1.3.3" xref="S3.E9.m1.3.3.cmml">θ</mi><mo id="S3.E9.m1.4.4.1.1.3.7.2.2" stretchy="false" 
xref="S3.E9.m1.4.4.1.1.3.cmml">)</mo></mrow></mrow></mrow><mo id="S3.E9.m1.4.4.1.2" lspace="0em" xref="S3.E9.m1.4.4.1.1.cmml">.</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.E9.m1.4b"><apply id="S3.E9.m1.4.4.1.1.cmml" xref="S3.E9.m1.4.4.1"><eq id="S3.E9.m1.4.4.1.1.1.cmml" xref="S3.E9.m1.4.4.1.1.1"></eq><ci id="S3.E9.m1.4.4.1.1.2.cmml" xref="S3.E9.m1.4.4.1.1.2">Σ</ci><apply id="S3.E9.m1.4.4.1.1.3.cmml" xref="S3.E9.m1.4.4.1.1.3"><times id="S3.E9.m1.4.4.1.1.3.1.cmml" xref="S3.E9.m1.4.4.1.1.3.1"></times><ci id="S3.E9.m1.4.4.1.1.3.2.cmml" xref="S3.E9.m1.4.4.1.1.3.2">𝑅</ci><apply id="S3.E9.m1.4.4.1.1.3.3.cmml" xref="S3.E9.m1.4.4.1.1.3.3"><csymbol cd="ambiguous" id="S3.E9.m1.4.4.1.1.3.3.1.cmml" xref="S3.E9.m1.4.4.1.1.3.3">superscript</csymbol><ci id="S3.E9.m1.1.1.cmml" xref="S3.E9.m1.1.1">𝜃</ci><csymbol cd="latexml" id="S3.E9.m1.4.4.1.1.3.3.3.cmml" xref="S3.E9.m1.4.4.1.1.3.3.3">top</csymbol></apply><ci id="S3.E9.m1.4.4.1.1.3.4a.cmml" xref="S3.E9.m1.4.4.1.1.3.4"><mtext id="S3.E9.m1.4.4.1.1.3.4.cmml" xref="S3.E9.m1.4.4.1.1.3.4">diag</mtext></ci><ci id="S3.E9.m1.2.2.cmml" xref="S3.E9.m1.2.2">𝑠</ci><ci id="S3.E9.m1.4.4.1.1.3.6.cmml" xref="S3.E9.m1.4.4.1.1.3.6">𝑅</ci><ci id="S3.E9.m1.3.3.cmml" xref="S3.E9.m1.3.3">𝜃</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E9.m1.4c">\Sigma=R(\theta)^{\top}\,\text{diag}(s)\,R(\theta).</annotation><annotation encoding="application/x-llamapun" id="S3.E9.m1.4d">roman_Σ = italic_R ( italic_θ ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT diag ( italic_s ) italic_R ( italic_θ ) .</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(9)</span></td> </tr></tbody> </table> </div> <div class="ltx_para" id="S3.SS2.SSS1.p2"> <p class="ltx_p" id="S3.SS2.SSS1.p2.5">Here, <math alttext="R(\theta)" class="ltx_Math" 
display="inline" id="S3.SS2.SSS1.p2.1.m1.1"><semantics id="S3.SS2.SSS1.p2.1.m1.1a"><mrow id="S3.SS2.SSS1.p2.1.m1.1.2" xref="S3.SS2.SSS1.p2.1.m1.1.2.cmml"><mi id="S3.SS2.SSS1.p2.1.m1.1.2.2" xref="S3.SS2.SSS1.p2.1.m1.1.2.2.cmml">R</mi><mo id="S3.SS2.SSS1.p2.1.m1.1.2.1" xref="S3.SS2.SSS1.p2.1.m1.1.2.1.cmml"></mo><mrow id="S3.SS2.SSS1.p2.1.m1.1.2.3.2" xref="S3.SS2.SSS1.p2.1.m1.1.2.cmml"><mo id="S3.SS2.SSS1.p2.1.m1.1.2.3.2.1" stretchy="false" xref="S3.SS2.SSS1.p2.1.m1.1.2.cmml">(</mo><mi id="S3.SS2.SSS1.p2.1.m1.1.1" xref="S3.SS2.SSS1.p2.1.m1.1.1.cmml">θ</mi><mo id="S3.SS2.SSS1.p2.1.m1.1.2.3.2.2" stretchy="false" xref="S3.SS2.SSS1.p2.1.m1.1.2.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p2.1.m1.1b"><apply id="S3.SS2.SSS1.p2.1.m1.1.2.cmml" xref="S3.SS2.SSS1.p2.1.m1.1.2"><times id="S3.SS2.SSS1.p2.1.m1.1.2.1.cmml" xref="S3.SS2.SSS1.p2.1.m1.1.2.1"></times><ci id="S3.SS2.SSS1.p2.1.m1.1.2.2.cmml" xref="S3.SS2.SSS1.p2.1.m1.1.2.2">𝑅</ci><ci id="S3.SS2.SSS1.p2.1.m1.1.1.cmml" xref="S3.SS2.SSS1.p2.1.m1.1.1">𝜃</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p2.1.m1.1c">R(\theta)</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p2.1.m1.1d">italic_R ( italic_θ )</annotation></semantics></math> is a rotation matrix parameterized by <math alttext="\theta" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p2.2.m2.1"><semantics id="S3.SS2.SSS1.p2.2.m2.1a"><mi id="S3.SS2.SSS1.p2.2.m2.1.1" xref="S3.SS2.SSS1.p2.2.m2.1.1.cmml">θ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p2.2.m2.1b"><ci id="S3.SS2.SSS1.p2.2.m2.1.1.cmml" xref="S3.SS2.SSS1.p2.2.m2.1.1">𝜃</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p2.2.m2.1c">\theta</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p2.2.m2.1d">italic_θ</annotation></semantics></math>, <math alttext="\text{diag}(s)" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p2.3.m3.1"><semantics 
id="S3.SS2.SSS1.p2.3.m3.1a"><mrow id="S3.SS2.SSS1.p2.3.m3.1.2" xref="S3.SS2.SSS1.p2.3.m3.1.2.cmml"><mtext id="S3.SS2.SSS1.p2.3.m3.1.2.2" xref="S3.SS2.SSS1.p2.3.m3.1.2.2a.cmml">diag</mtext><mo id="S3.SS2.SSS1.p2.3.m3.1.2.1" xref="S3.SS2.SSS1.p2.3.m3.1.2.1.cmml"></mo><mrow id="S3.SS2.SSS1.p2.3.m3.1.2.3.2" xref="S3.SS2.SSS1.p2.3.m3.1.2.cmml"><mo id="S3.SS2.SSS1.p2.3.m3.1.2.3.2.1" stretchy="false" xref="S3.SS2.SSS1.p2.3.m3.1.2.cmml">(</mo><mi id="S3.SS2.SSS1.p2.3.m3.1.1" xref="S3.SS2.SSS1.p2.3.m3.1.1.cmml">s</mi><mo id="S3.SS2.SSS1.p2.3.m3.1.2.3.2.2" stretchy="false" xref="S3.SS2.SSS1.p2.3.m3.1.2.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p2.3.m3.1b"><apply id="S3.SS2.SSS1.p2.3.m3.1.2.cmml" xref="S3.SS2.SSS1.p2.3.m3.1.2"><times id="S3.SS2.SSS1.p2.3.m3.1.2.1.cmml" xref="S3.SS2.SSS1.p2.3.m3.1.2.1"></times><ci id="S3.SS2.SSS1.p2.3.m3.1.2.2a.cmml" xref="S3.SS2.SSS1.p2.3.m3.1.2.2"><mtext id="S3.SS2.SSS1.p2.3.m3.1.2.2.cmml" xref="S3.SS2.SSS1.p2.3.m3.1.2.2">diag</mtext></ci><ci id="S3.SS2.SSS1.p2.3.m3.1.1.cmml" xref="S3.SS2.SSS1.p2.3.m3.1.1">𝑠</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p2.3.m3.1c">\text{diag}(s)</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p2.3.m3.1d">diag ( italic_s )</annotation></semantics></math> represents a diagonal matrix of scales, and <math alttext="\lambda" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p2.4.m4.1"><semantics id="S3.SS2.SSS1.p2.4.m4.1a"><mi id="S3.SS2.SSS1.p2.4.m4.1.1" xref="S3.SS2.SSS1.p2.4.m4.1.1.cmml">λ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p2.4.m4.1b"><ci id="S3.SS2.SSS1.p2.4.m4.1.1.cmml" xref="S3.SS2.SSS1.p2.4.m4.1.1">𝜆</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p2.4.m4.1c">\lambda</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p2.4.m4.1d">italic_λ</annotation></semantics></math> is a weight factor that scales the contribution of 
the normal vector <math alttext="\gamma" class="ltx_Math" display="inline" id="S3.SS2.SSS1.p2.5.m5.1"><semantics id="S3.SS2.SSS1.p2.5.m5.1a"><mi id="S3.SS2.SSS1.p2.5.m5.1.1" xref="S3.SS2.SSS1.p2.5.m5.1.1.cmml">γ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS1.p2.5.m5.1b"><ci id="S3.SS2.SSS1.p2.5.m5.1.1.cmml" xref="S3.SS2.SSS1.p2.5.m5.1.1">𝛾</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS1.p2.5.m5.1c">\gamma</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS1.p2.5.m5.1d">italic_γ</annotation></semantics></math>. This formulation lets the model adjust the shape and spread of each Gaussian according to both the geometric and photometric properties of the scene, leading to more accurate rendering.</p> </div> </section> <section class="ltx_subsubsection" id="S3.SS2.SSS2"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection">3.2.2 </span>Niagara Encoder</h4> <div class="ltx_para ltx_noindent" id="S3.SS2.SSS2.p1"> <p class="ltx_p" id="S3.SS2.SSS2.p1.2"><span class="ltx_text ltx_font_bold" id="S3.SS2.SSS2.p1.2.1">3D Self-Attention.</span> In our framework, we use self-attention <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib66" title="">66</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib69" title="">69</a>]</cite>, which strengthens the geometric constraints by letting the geometric features at different locations take spatial location information into account. Consistent with the findings of MVDream <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib54" title="">54</a>]</cite>, we observe that simple temporal attention fails to learn multi-view consistency and that content drift persists even after fine-tuning on a 3D-rendered dataset. 
Therefore, we adopt a 3D attention mechanism similar to that of MVDream (see <a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#S2.F3" title="Figure 3 ‣ 2 Related Work ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Figure 3</span></a>), which, according to our findings, produces fairly consistent images even when the view gap is large. Specifically, given a tensor of shape <math alttext="[B,H,W,C]" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p1.1.m1.4"><semantics id="S3.SS2.SSS2.p1.1.m1.4a"><mrow id="S3.SS2.SSS2.p1.1.m1.4.5.2" xref="S3.SS2.SSS2.p1.1.m1.4.5.1.cmml"><mo id="S3.SS2.SSS2.p1.1.m1.4.5.2.1" stretchy="false" xref="S3.SS2.SSS2.p1.1.m1.4.5.1.cmml">[</mo><mi id="S3.SS2.SSS2.p1.1.m1.1.1" xref="S3.SS2.SSS2.p1.1.m1.1.1.cmml">B</mi><mo id="S3.SS2.SSS2.p1.1.m1.4.5.2.2" xref="S3.SS2.SSS2.p1.1.m1.4.5.1.cmml">,</mo><mi id="S3.SS2.SSS2.p1.1.m1.2.2" xref="S3.SS2.SSS2.p1.1.m1.2.2.cmml">H</mi><mo id="S3.SS2.SSS2.p1.1.m1.4.5.2.3" xref="S3.SS2.SSS2.p1.1.m1.4.5.1.cmml">,</mo><mi id="S3.SS2.SSS2.p1.1.m1.3.3" xref="S3.SS2.SSS2.p1.1.m1.3.3.cmml">W</mi><mo id="S3.SS2.SSS2.p1.1.m1.4.5.2.4" xref="S3.SS2.SSS2.p1.1.m1.4.5.1.cmml">,</mo><mi id="S3.SS2.SSS2.p1.1.m1.4.4" xref="S3.SS2.SSS2.p1.1.m1.4.4.cmml">C</mi><mo id="S3.SS2.SSS2.p1.1.m1.4.5.2.5" stretchy="false" xref="S3.SS2.SSS2.p1.1.m1.4.5.1.cmml">]</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p1.1.m1.4b"><list id="S3.SS2.SSS2.p1.1.m1.4.5.1.cmml" xref="S3.SS2.SSS2.p1.1.m1.4.5.2"><ci id="S3.SS2.SSS2.p1.1.m1.1.1.cmml" xref="S3.SS2.SSS2.p1.1.m1.1.1">𝐵</ci><ci id="S3.SS2.SSS2.p1.1.m1.2.2.cmml" xref="S3.SS2.SSS2.p1.1.m1.2.2">𝐻</ci><ci id="S3.SS2.SSS2.p1.1.m1.3.3.cmml" xref="S3.SS2.SSS2.p1.1.m1.3.3">𝑊</ci><ci id="S3.SS2.SSS2.p1.1.m1.4.4.cmml" xref="S3.SS2.SSS2.p1.1.m1.4.4">𝐶</ci></list></annotation-xml><annotation encoding="application/x-tex" 
id="S3.SS2.SSS2.p1.1.m1.4c">[B,H,W,C]</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p1.1.m1.4d">[ italic_B , italic_H , italic_W , italic_C ]</annotation></semantics></math>, we format it to <math alttext="[B,H\times W,C]" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p1.2.m2.3"><semantics id="S3.SS2.SSS2.p1.2.m2.3a"><mrow id="S3.SS2.SSS2.p1.2.m2.3.3.1" xref="S3.SS2.SSS2.p1.2.m2.3.3.2.cmml"><mo id="S3.SS2.SSS2.p1.2.m2.3.3.1.2" stretchy="false" xref="S3.SS2.SSS2.p1.2.m2.3.3.2.cmml">[</mo><mi id="S3.SS2.SSS2.p1.2.m2.1.1" xref="S3.SS2.SSS2.p1.2.m2.1.1.cmml">B</mi><mo id="S3.SS2.SSS2.p1.2.m2.3.3.1.3" xref="S3.SS2.SSS2.p1.2.m2.3.3.2.cmml">,</mo><mrow id="S3.SS2.SSS2.p1.2.m2.3.3.1.1" xref="S3.SS2.SSS2.p1.2.m2.3.3.1.1.cmml"><mi id="S3.SS2.SSS2.p1.2.m2.3.3.1.1.2" xref="S3.SS2.SSS2.p1.2.m2.3.3.1.1.2.cmml">H</mi><mo id="S3.SS2.SSS2.p1.2.m2.3.3.1.1.1" lspace="0.222em" rspace="0.222em" xref="S3.SS2.SSS2.p1.2.m2.3.3.1.1.1.cmml">×</mo><mi id="S3.SS2.SSS2.p1.2.m2.3.3.1.1.3" xref="S3.SS2.SSS2.p1.2.m2.3.3.1.1.3.cmml">W</mi></mrow><mo id="S3.SS2.SSS2.p1.2.m2.3.3.1.4" xref="S3.SS2.SSS2.p1.2.m2.3.3.2.cmml">,</mo><mi id="S3.SS2.SSS2.p1.2.m2.2.2" xref="S3.SS2.SSS2.p1.2.m2.2.2.cmml">C</mi><mo id="S3.SS2.SSS2.p1.2.m2.3.3.1.5" stretchy="false" xref="S3.SS2.SSS2.p1.2.m2.3.3.2.cmml">]</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p1.2.m2.3b"><list id="S3.SS2.SSS2.p1.2.m2.3.3.2.cmml" xref="S3.SS2.SSS2.p1.2.m2.3.3.1"><ci id="S3.SS2.SSS2.p1.2.m2.1.1.cmml" xref="S3.SS2.SSS2.p1.2.m2.1.1">𝐵</ci><apply id="S3.SS2.SSS2.p1.2.m2.3.3.1.1.cmml" xref="S3.SS2.SSS2.p1.2.m2.3.3.1.1"><times id="S3.SS2.SSS2.p1.2.m2.3.3.1.1.1.cmml" xref="S3.SS2.SSS2.p1.2.m2.3.3.1.1.1"></times><ci id="S3.SS2.SSS2.p1.2.m2.3.3.1.1.2.cmml" xref="S3.SS2.SSS2.p1.2.m2.3.3.1.1.2">𝐻</ci><ci id="S3.SS2.SSS2.p1.2.m2.3.3.1.1.3.cmml" xref="S3.SS2.SSS2.p1.2.m2.3.3.1.1.3">𝑊</ci></apply><ci id="S3.SS2.SSS2.p1.2.m2.2.2.cmml" 
xref="S3.SS2.SSS2.p1.2.m2.2.2">𝐶</ci></list></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p1.2.m2.3c">[B,H\times W,C]</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p1.2.m2.3d">[ italic_B , italic_H × italic_W , italic_C ]</annotation></semantics></math> for self-attention, where the second dimension is the sequence dimension representing the number of tokens. This way, we can also inherit all the module weights from the original 2D self-attention,</p> <table class="ltx_equation ltx_eqn_table" id="S3.E10"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\begin{split}&x=\mathrm{rearrange}\bigl{(}x,B\,C\,H\,W\rightarrow B\,(H\,W)\,C% \bigr{)},\\ &x=x+\mathrm{3DSelfAttn}(x),\\ &x=\mathrm{rearrange}\bigl{(}x,B\,(H\,W)\,C\rightarrow B\,C\,H\,W\bigr{)},\end% {split}" class="ltx_Math" display="block" id="S3.E10.m1.60"><semantics id="S3.E10.m1.60a"><mtable columnspacing="0pt" displaystyle="true" id="S3.E10.m1.60.60.4" rowspacing="0pt"><mtr id="S3.E10.m1.60.60.4a"><mtd id="S3.E10.m1.60.60.4b"></mtd><mtd class="ltx_align_left" columnalign="left" id="S3.E10.m1.60.60.4c"><mrow id="S3.E10.m1.58.58.2.57.20.20.20"><mrow id="S3.E10.m1.58.58.2.57.20.20.20.1"><mi id="S3.E10.m1.1.1.1.1.1.1" xref="S3.E10.m1.1.1.1.1.1.1.cmml">x</mi><mo id="S3.E10.m1.2.2.2.2.2.2" xref="S3.E10.m1.2.2.2.2.2.2.cmml">=</mo><mrow id="S3.E10.m1.58.58.2.57.20.20.20.1.1"><mi id="S3.E10.m1.3.3.3.3.3.3" xref="S3.E10.m1.3.3.3.3.3.3.cmml">rearrange</mi><mo id="S3.E10.m1.58.58.2.57.20.20.20.1.1.2"></mo><mrow id="S3.E10.m1.58.58.2.57.20.20.20.1.1.1.1"><mo id="S3.E10.m1.4.4.4.4.4.4" maxsize="120%" minsize="120%">(</mo><mrow id="S3.E10.m1.58.58.2.57.20.20.20.1.1.1.1.1"><mrow id="S3.E10.m1.58.58.2.57.20.20.20.1.1.1.1.1.1.1"><mi id="S3.E10.m1.5.5.5.5.5.5" xref="S3.E10.m1.5.5.5.5.5.5.cmml">x</mi><mo id="S3.E10.m1.6.6.6.6.6.6">,</mo><mrow 
id="S3.E10.m1.58.58.2.57.20.20.20.1.1.1.1.1.1.1.1"><mi id="S3.E10.m1.7.7.7.7.7.7" xref="S3.E10.m1.7.7.7.7.7.7.cmml">B</mi><mo id="S3.E10.m1.58.58.2.57.20.20.20.1.1.1.1.1.1.1.1.1" lspace="0.170em"></mo><mi id="S3.E10.m1.8.8.8.8.8.8" xref="S3.E10.m1.8.8.8.8.8.8.cmml">C</mi><mo id="S3.E10.m1.58.58.2.57.20.20.20.1.1.1.1.1.1.1.1.1a" lspace="0.170em"></mo><mi id="S3.E10.m1.9.9.9.9.9.9" xref="S3.E10.m1.9.9.9.9.9.9.cmml">H</mi><mo id="S3.E10.m1.58.58.2.57.20.20.20.1.1.1.1.1.1.1.1.1b" lspace="0.170em"></mo><mi id="S3.E10.m1.10.10.10.10.10.10" xref="S3.E10.m1.10.10.10.10.10.10.cmml">W</mi></mrow></mrow><mo id="S3.E10.m1.11.11.11.11.11.11" stretchy="false" xref="S3.E10.m1.11.11.11.11.11.11.cmml">→</mo><mrow id="S3.E10.m1.58.58.2.57.20.20.20.1.1.1.1.1.2"><mi id="S3.E10.m1.12.12.12.12.12.12" xref="S3.E10.m1.12.12.12.12.12.12.cmml">B</mi><mo id="S3.E10.m1.58.58.2.57.20.20.20.1.1.1.1.1.2.2" lspace="0.170em"></mo><mrow id="S3.E10.m1.58.58.2.57.20.20.20.1.1.1.1.1.2.1.1"><mo id="S3.E10.m1.13.13.13.13.13.13" stretchy="false">(</mo><mrow id="S3.E10.m1.58.58.2.57.20.20.20.1.1.1.1.1.2.1.1.1"><mi id="S3.E10.m1.14.14.14.14.14.14" xref="S3.E10.m1.14.14.14.14.14.14.cmml">H</mi><mo id="S3.E10.m1.58.58.2.57.20.20.20.1.1.1.1.1.2.1.1.1.1" lspace="0.170em"></mo><mi id="S3.E10.m1.15.15.15.15.15.15" xref="S3.E10.m1.15.15.15.15.15.15.cmml">W</mi></mrow><mo id="S3.E10.m1.16.16.16.16.16.16" stretchy="false">)</mo></mrow><mo id="S3.E10.m1.58.58.2.57.20.20.20.1.1.1.1.1.2.2a" lspace="0.170em"></mo><mi id="S3.E10.m1.17.17.17.17.17.17" xref="S3.E10.m1.17.17.17.17.17.17.cmml">C</mi></mrow></mrow><mo id="S3.E10.m1.18.18.18.18.18.18" maxsize="120%" minsize="120%">)</mo></mrow></mrow></mrow><mo id="S3.E10.m1.19.19.19.19.19.19">,</mo></mrow></mtd></mtr><mtr id="S3.E10.m1.60.60.4d"><mtd id="S3.E10.m1.60.60.4e"></mtd><mtd class="ltx_align_left" columnalign="left" id="S3.E10.m1.60.60.4f"><mrow id="S3.E10.m1.59.59.3.58.19.19.19"><mrow id="S3.E10.m1.59.59.3.58.19.19.19.1"><mi id="S3.E10.m1.20.20.20.1.1.1" 
xref="S3.E10.m1.20.20.20.1.1.1.cmml">x</mi><mo id="S3.E10.m1.21.21.21.2.2.2" xref="S3.E10.m1.21.21.21.2.2.2.cmml">=</mo><mrow id="S3.E10.m1.59.59.3.58.19.19.19.1.1"><mi id="S3.E10.m1.22.22.22.3.3.3" xref="S3.E10.m1.22.22.22.3.3.3.cmml">x</mi><mo id="S3.E10.m1.23.23.23.4.4.4" xref="S3.E10.m1.23.23.23.4.4.4.cmml">+</mo><mrow id="S3.E10.m1.59.59.3.58.19.19.19.1.1.1"><mn id="S3.E10.m1.24.24.24.5.5.5" xref="S3.E10.m1.24.24.24.5.5.5.cmml">3</mn><mo id="S3.E10.m1.59.59.3.58.19.19.19.1.1.1.1"></mo><mi id="S3.E10.m1.25.25.25.6.6.6" mathvariant="normal" xref="S3.E10.m1.25.25.25.6.6.6.cmml">D</mi><mo id="S3.E10.m1.59.59.3.58.19.19.19.1.1.1.1a"></mo><mi id="S3.E10.m1.26.26.26.7.7.7" mathvariant="normal" xref="S3.E10.m1.26.26.26.7.7.7.cmml">S</mi><mo id="S3.E10.m1.59.59.3.58.19.19.19.1.1.1.1b"></mo><mi id="S3.E10.m1.27.27.27.8.8.8" mathvariant="normal" xref="S3.E10.m1.27.27.27.8.8.8.cmml">e</mi><mo id="S3.E10.m1.59.59.3.58.19.19.19.1.1.1.1c"></mo><mi id="S3.E10.m1.28.28.28.9.9.9" mathvariant="normal" xref="S3.E10.m1.28.28.28.9.9.9.cmml">l</mi><mo id="S3.E10.m1.59.59.3.58.19.19.19.1.1.1.1d"></mo><mi id="S3.E10.m1.29.29.29.10.10.10" mathvariant="normal" xref="S3.E10.m1.29.29.29.10.10.10.cmml">f</mi><mo id="S3.E10.m1.59.59.3.58.19.19.19.1.1.1.1e"></mo><mi id="S3.E10.m1.30.30.30.11.11.11" mathvariant="normal" xref="S3.E10.m1.30.30.30.11.11.11.cmml">A</mi><mo id="S3.E10.m1.59.59.3.58.19.19.19.1.1.1.1f"></mo><mi id="S3.E10.m1.31.31.31.12.12.12" mathvariant="normal" xref="S3.E10.m1.31.31.31.12.12.12.cmml">t</mi><mo id="S3.E10.m1.59.59.3.58.19.19.19.1.1.1.1g"></mo><mi id="S3.E10.m1.32.32.32.13.13.13" mathvariant="normal" xref="S3.E10.m1.32.32.32.13.13.13.cmml">t</mi><mo id="S3.E10.m1.59.59.3.58.19.19.19.1.1.1.1h"></mo><mi id="S3.E10.m1.33.33.33.14.14.14" mathvariant="normal" xref="S3.E10.m1.33.33.33.14.14.14.cmml">n</mi><mo id="S3.E10.m1.59.59.3.58.19.19.19.1.1.1.1i"></mo><mrow id="S3.E10.m1.59.59.3.58.19.19.19.1.1.1.2"><mo id="S3.E10.m1.34.34.34.15.15.15" stretchy="false">(</mo><mi 
id="S3.E10.m1.35.35.35.16.16.16" xref="S3.E10.m1.35.35.35.16.16.16.cmml">x</mi><mo id="S3.E10.m1.36.36.36.17.17.17" stretchy="false">)</mo></mrow></mrow></mrow></mrow><mo id="S3.E10.m1.37.37.37.18.18.18">,</mo></mrow></mtd></mtr><mtr id="S3.E10.m1.60.60.4g"><mtd id="S3.E10.m1.60.60.4h"></mtd><mtd class="ltx_align_left" columnalign="left" id="S3.E10.m1.60.60.4i"><mrow id="S3.E10.m1.60.60.4.59.20.20.20"><mrow id="S3.E10.m1.60.60.4.59.20.20.20.1"><mi id="S3.E10.m1.38.38.38.1.1.1" xref="S3.E10.m1.38.38.38.1.1.1.cmml">x</mi><mo id="S3.E10.m1.39.39.39.2.2.2" xref="S3.E10.m1.39.39.39.2.2.2.cmml">=</mo><mrow id="S3.E10.m1.60.60.4.59.20.20.20.1.1"><mi id="S3.E10.m1.40.40.40.3.3.3" xref="S3.E10.m1.40.40.40.3.3.3.cmml">rearrange</mi><mo id="S3.E10.m1.60.60.4.59.20.20.20.1.1.2"></mo><mrow id="S3.E10.m1.60.60.4.59.20.20.20.1.1.1.1"><mo id="S3.E10.m1.41.41.41.4.4.4" maxsize="120%" minsize="120%">(</mo><mrow id="S3.E10.m1.60.60.4.59.20.20.20.1.1.1.1.1"><mrow id="S3.E10.m1.60.60.4.59.20.20.20.1.1.1.1.1.1.1"><mi id="S3.E10.m1.42.42.42.5.5.5" xref="S3.E10.m1.42.42.42.5.5.5.cmml">x</mi><mo id="S3.E10.m1.43.43.43.6.6.6">,</mo><mrow id="S3.E10.m1.60.60.4.59.20.20.20.1.1.1.1.1.1.1.1"><mi id="S3.E10.m1.44.44.44.7.7.7" xref="S3.E10.m1.44.44.44.7.7.7.cmml">B</mi><mo id="S3.E10.m1.60.60.4.59.20.20.20.1.1.1.1.1.1.1.1.2" lspace="0.170em"></mo><mrow id="S3.E10.m1.60.60.4.59.20.20.20.1.1.1.1.1.1.1.1.1.1"><mo id="S3.E10.m1.45.45.45.8.8.8" stretchy="false">(</mo><mrow id="S3.E10.m1.60.60.4.59.20.20.20.1.1.1.1.1.1.1.1.1.1.1"><mi id="S3.E10.m1.46.46.46.9.9.9" xref="S3.E10.m1.46.46.46.9.9.9.cmml">H</mi><mo id="S3.E10.m1.60.60.4.59.20.20.20.1.1.1.1.1.1.1.1.1.1.1.1" lspace="0.170em"></mo><mi id="S3.E10.m1.47.47.47.10.10.10" xref="S3.E10.m1.47.47.47.10.10.10.cmml">W</mi></mrow><mo id="S3.E10.m1.48.48.48.11.11.11" stretchy="false">)</mo></mrow><mo id="S3.E10.m1.60.60.4.59.20.20.20.1.1.1.1.1.1.1.1.2a" lspace="0.170em"></mo><mi id="S3.E10.m1.49.49.49.12.12.12" 
xref="S3.E10.m1.49.49.49.12.12.12.cmml">C</mi></mrow></mrow><mo id="S3.E10.m1.50.50.50.13.13.13" stretchy="false" xref="S3.E10.m1.50.50.50.13.13.13.cmml">→</mo><mrow id="S3.E10.m1.60.60.4.59.20.20.20.1.1.1.1.1.2"><mi id="S3.E10.m1.51.51.51.14.14.14" xref="S3.E10.m1.51.51.51.14.14.14.cmml">B</mi><mo id="S3.E10.m1.60.60.4.59.20.20.20.1.1.1.1.1.2.1" lspace="0.170em"></mo><mi id="S3.E10.m1.52.52.52.15.15.15" xref="S3.E10.m1.52.52.52.15.15.15.cmml">C</mi><mo id="S3.E10.m1.60.60.4.59.20.20.20.1.1.1.1.1.2.1a" lspace="0.170em"></mo><mi id="S3.E10.m1.53.53.53.16.16.16" xref="S3.E10.m1.53.53.53.16.16.16.cmml">H</mi><mo id="S3.E10.m1.60.60.4.59.20.20.20.1.1.1.1.1.2.1b" lspace="0.170em"></mo><mi id="S3.E10.m1.54.54.54.17.17.17" xref="S3.E10.m1.54.54.54.17.17.17.cmml">W</mi></mrow></mrow><mo id="S3.E10.m1.55.55.55.18.18.18" maxsize="120%" minsize="120%">)</mo></mrow></mrow></mrow><mo id="S3.E10.m1.56.56.56.19.19.19">,</mo></mrow></mtd></mtr></mtable><annotation-xml encoding="MathML-Content" id="S3.E10.m1.60b"><apply id="S3.E10.m1.57.57.1.1.1.3.cmml"><csymbol cd="ambiguous" id="S3.E10.m1.57.57.1.1.1.3a.cmml">formulae-sequence</csymbol><apply id="S3.E10.m1.57.57.1.1.1.1.1.cmml"><eq id="S3.E10.m1.2.2.2.2.2.2.cmml" xref="S3.E10.m1.2.2.2.2.2.2"></eq><ci id="S3.E10.m1.1.1.1.1.1.1.cmml" xref="S3.E10.m1.1.1.1.1.1.1">𝑥</ci><apply id="S3.E10.m1.57.57.1.1.1.1.1.1.cmml"><times id="S3.E10.m1.57.57.1.1.1.1.1.1.2.cmml"></times><ci id="S3.E10.m1.3.3.3.3.3.3.cmml" xref="S3.E10.m1.3.3.3.3.3.3">rearrange</ci><apply id="S3.E10.m1.57.57.1.1.1.1.1.1.1.1.1.cmml"><ci id="S3.E10.m1.11.11.11.11.11.11.cmml" xref="S3.E10.m1.11.11.11.11.11.11">→</ci><list id="S3.E10.m1.57.57.1.1.1.1.1.1.1.1.1.1.2.cmml"><ci id="S3.E10.m1.5.5.5.5.5.5.cmml" xref="S3.E10.m1.5.5.5.5.5.5">𝑥</ci><apply id="S3.E10.m1.57.57.1.1.1.1.1.1.1.1.1.1.1.1.cmml"><times id="S3.E10.m1.57.57.1.1.1.1.1.1.1.1.1.1.1.1.1.cmml"></times><ci id="S3.E10.m1.7.7.7.7.7.7.cmml" xref="S3.E10.m1.7.7.7.7.7.7">𝐵</ci><ci id="S3.E10.m1.8.8.8.8.8.8.cmml" 
xref="S3.E10.m1.8.8.8.8.8.8">𝐶</ci><ci id="S3.E10.m1.9.9.9.9.9.9.cmml" xref="S3.E10.m1.9.9.9.9.9.9">𝐻</ci><ci id="S3.E10.m1.10.10.10.10.10.10.cmml" xref="S3.E10.m1.10.10.10.10.10.10">𝑊</ci></apply></list><apply id="S3.E10.m1.57.57.1.1.1.1.1.1.1.1.1.2.cmml"><times id="S3.E10.m1.57.57.1.1.1.1.1.1.1.1.1.2.2.cmml"></times><ci id="S3.E10.m1.12.12.12.12.12.12.cmml" xref="S3.E10.m1.12.12.12.12.12.12">𝐵</ci><apply id="S3.E10.m1.57.57.1.1.1.1.1.1.1.1.1.2.1.1.1.cmml"><times id="S3.E10.m1.57.57.1.1.1.1.1.1.1.1.1.2.1.1.1.1.cmml"></times><ci id="S3.E10.m1.14.14.14.14.14.14.cmml" xref="S3.E10.m1.14.14.14.14.14.14">𝐻</ci><ci id="S3.E10.m1.15.15.15.15.15.15.cmml" xref="S3.E10.m1.15.15.15.15.15.15">𝑊</ci></apply><ci id="S3.E10.m1.17.17.17.17.17.17.cmml" xref="S3.E10.m1.17.17.17.17.17.17">𝐶</ci></apply></apply></apply></apply><apply id="S3.E10.m1.57.57.1.1.1.2.2.3.cmml"><csymbol cd="ambiguous" id="S3.E10.m1.57.57.1.1.1.2.2.3a.cmml">formulae-sequence</csymbol><apply id="S3.E10.m1.57.57.1.1.1.2.2.1.1.cmml"><eq id="S3.E10.m1.21.21.21.2.2.2.cmml" xref="S3.E10.m1.21.21.21.2.2.2"></eq><ci id="S3.E10.m1.20.20.20.1.1.1.cmml" xref="S3.E10.m1.20.20.20.1.1.1">𝑥</ci><apply id="S3.E10.m1.57.57.1.1.1.2.2.1.1.3.cmml"><plus id="S3.E10.m1.23.23.23.4.4.4.cmml" xref="S3.E10.m1.23.23.23.4.4.4"></plus><ci id="S3.E10.m1.22.22.22.3.3.3.cmml" xref="S3.E10.m1.22.22.22.3.3.3">𝑥</ci><apply id="S3.E10.m1.57.57.1.1.1.2.2.1.1.3.3.cmml"><times id="S3.E10.m1.57.57.1.1.1.2.2.1.1.3.3.1.cmml"></times><cn id="S3.E10.m1.24.24.24.5.5.5.cmml" type="integer" xref="S3.E10.m1.24.24.24.5.5.5">3</cn><ci id="S3.E10.m1.25.25.25.6.6.6.cmml" xref="S3.E10.m1.25.25.25.6.6.6">D</ci><ci id="S3.E10.m1.26.26.26.7.7.7.cmml" xref="S3.E10.m1.26.26.26.7.7.7">S</ci><ci id="S3.E10.m1.27.27.27.8.8.8.cmml" xref="S3.E10.m1.27.27.27.8.8.8">e</ci><ci id="S3.E10.m1.28.28.28.9.9.9.cmml" xref="S3.E10.m1.28.28.28.9.9.9">l</ci><ci id="S3.E10.m1.29.29.29.10.10.10.cmml" xref="S3.E10.m1.29.29.29.10.10.10">f</ci><ci id="S3.E10.m1.30.30.30.11.11.11.cmml" 
xref="S3.E10.m1.30.30.30.11.11.11">A</ci><ci id="S3.E10.m1.31.31.31.12.12.12.cmml" xref="S3.E10.m1.31.31.31.12.12.12">t</ci><ci id="S3.E10.m1.32.32.32.13.13.13.cmml" xref="S3.E10.m1.32.32.32.13.13.13">t</ci><ci id="S3.E10.m1.33.33.33.14.14.14.cmml" xref="S3.E10.m1.33.33.33.14.14.14">n</ci><ci id="S3.E10.m1.35.35.35.16.16.16.cmml" xref="S3.E10.m1.35.35.35.16.16.16">𝑥</ci></apply></apply></apply><apply id="S3.E10.m1.57.57.1.1.1.2.2.2.2.cmml"><eq id="S3.E10.m1.39.39.39.2.2.2.cmml" xref="S3.E10.m1.39.39.39.2.2.2"></eq><ci id="S3.E10.m1.38.38.38.1.1.1.cmml" xref="S3.E10.m1.38.38.38.1.1.1">𝑥</ci><apply id="S3.E10.m1.57.57.1.1.1.2.2.2.2.1.cmml"><times id="S3.E10.m1.57.57.1.1.1.2.2.2.2.1.2.cmml"></times><ci id="S3.E10.m1.40.40.40.3.3.3.cmml" xref="S3.E10.m1.40.40.40.3.3.3">rearrange</ci><apply id="S3.E10.m1.57.57.1.1.1.2.2.2.2.1.1.1.1.cmml"><ci id="S3.E10.m1.50.50.50.13.13.13.cmml" xref="S3.E10.m1.50.50.50.13.13.13">→</ci><list id="S3.E10.m1.57.57.1.1.1.2.2.2.2.1.1.1.1.1.2.cmml"><ci id="S3.E10.m1.42.42.42.5.5.5.cmml" xref="S3.E10.m1.42.42.42.5.5.5">𝑥</ci><apply id="S3.E10.m1.57.57.1.1.1.2.2.2.2.1.1.1.1.1.1.1.cmml"><times id="S3.E10.m1.57.57.1.1.1.2.2.2.2.1.1.1.1.1.1.1.2.cmml"></times><ci id="S3.E10.m1.44.44.44.7.7.7.cmml" xref="S3.E10.m1.44.44.44.7.7.7">𝐵</ci><apply id="S3.E10.m1.57.57.1.1.1.2.2.2.2.1.1.1.1.1.1.1.1.1.1.cmml"><times id="S3.E10.m1.57.57.1.1.1.2.2.2.2.1.1.1.1.1.1.1.1.1.1.1.cmml"></times><ci id="S3.E10.m1.46.46.46.9.9.9.cmml" xref="S3.E10.m1.46.46.46.9.9.9">𝐻</ci><ci id="S3.E10.m1.47.47.47.10.10.10.cmml" xref="S3.E10.m1.47.47.47.10.10.10">𝑊</ci></apply><ci id="S3.E10.m1.49.49.49.12.12.12.cmml" xref="S3.E10.m1.49.49.49.12.12.12">𝐶</ci></apply></list><apply id="S3.E10.m1.57.57.1.1.1.2.2.2.2.1.1.1.1.3.cmml"><times id="S3.E10.m1.57.57.1.1.1.2.2.2.2.1.1.1.1.3.1.cmml"></times><ci id="S3.E10.m1.51.51.51.14.14.14.cmml" xref="S3.E10.m1.51.51.51.14.14.14">𝐵</ci><ci id="S3.E10.m1.52.52.52.15.15.15.cmml" xref="S3.E10.m1.52.52.52.15.15.15">𝐶</ci><ci 
id="S3.E10.m1.53.53.53.16.16.16.cmml" xref="S3.E10.m1.53.53.53.16.16.16">𝐻</ci><ci id="S3.E10.m1.54.54.54.17.17.17.cmml" xref="S3.E10.m1.54.54.54.17.17.17">𝑊</ci></apply></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E10.m1.60c">\begin{split}&x=\mathrm{rearrange}\bigl{(}x,B\,C\,H\,W\rightarrow B\,(H\,W)\,C% \bigr{)},\\ &x=x+\mathrm{3DSelfAttn}(x),\\ &x=\mathrm{rearrange}\bigl{(}x,B\,(H\,W)\,C\rightarrow B\,C\,H\,W\bigr{)},\end% {split}</annotation><annotation encoding="application/x-llamapun" id="S3.E10.m1.60d">start_ROW start_CELL end_CELL start_CELL italic_x = roman_rearrange ( italic_x , italic_B italic_C italic_H italic_W → italic_B ( italic_H italic_W ) italic_C ) , end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL italic_x = italic_x + 3 roman_D roman_S roman_e roman_l roman_f roman_A roman_t roman_t roman_n ( italic_x ) , end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL italic_x = roman_rearrange ( italic_x , italic_B ( italic_H italic_W ) italic_C → italic_B italic_C italic_H italic_W ) , end_CELL end_ROW</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(10)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS2.SSS2.p1.7">where <math alttext="x" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p1.3.m1.1"><semantics id="S3.SS2.SSS2.p1.3.m1.1a"><mi id="S3.SS2.SSS2.p1.3.m1.1.1" xref="S3.SS2.SSS2.p1.3.m1.1.1.cmml">x</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p1.3.m1.1b"><ci id="S3.SS2.SSS2.p1.3.m1.1.1.cmml" xref="S3.SS2.SSS2.p1.3.m1.1.1">𝑥</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p1.3.m1.1c">x</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p1.3.m1.1d">italic_x</annotation></semantics></math> is the feature, <math 
alttext="B" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p1.4.m2.1"><semantics id="S3.SS2.SSS2.p1.4.m2.1a"><mi id="S3.SS2.SSS2.p1.4.m2.1.1" xref="S3.SS2.SSS2.p1.4.m2.1.1.cmml">B</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p1.4.m2.1b"><ci id="S3.SS2.SSS2.p1.4.m2.1.1.cmml" xref="S3.SS2.SSS2.p1.4.m2.1.1">𝐵</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p1.4.m2.1c">B</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p1.4.m2.1d">italic_B</annotation></semantics></math>, <math alttext="H" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p1.5.m3.1"><semantics id="S3.SS2.SSS2.p1.5.m3.1a"><mi id="S3.SS2.SSS2.p1.5.m3.1.1" xref="S3.SS2.SSS2.p1.5.m3.1.1.cmml">H</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p1.5.m3.1b"><ci id="S3.SS2.SSS2.p1.5.m3.1.1.cmml" xref="S3.SS2.SSS2.p1.5.m3.1.1">𝐻</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p1.5.m3.1c">H</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p1.5.m3.1d">italic_H</annotation></semantics></math>, <math alttext="W" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p1.6.m4.1"><semantics id="S3.SS2.SSS2.p1.6.m4.1a"><mi id="S3.SS2.SSS2.p1.6.m4.1.1" xref="S3.SS2.SSS2.p1.6.m4.1.1.cmml">W</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p1.6.m4.1b"><ci id="S3.SS2.SSS2.p1.6.m4.1.1.cmml" xref="S3.SS2.SSS2.p1.6.m4.1.1">𝑊</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p1.6.m4.1c">W</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p1.6.m4.1d">italic_W</annotation></semantics></math>, and <math alttext="C" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p1.7.m5.1"><semantics id="S3.SS2.SSS2.p1.7.m5.1a"><mi id="S3.SS2.SSS2.p1.7.m5.1.1" xref="S3.SS2.SSS2.p1.7.m5.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p1.7.m5.1b"><ci id="S3.SS2.SSS2.p1.7.m5.1.1.cmml" 
xref="S3.SS2.SSS2.p1.7.m5.1.1">𝐶</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p1.7.m5.1c">C</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p1.7.m5.1d">italic_C</annotation></semantics></math> denote the batch size, height, width, and number of channels, respectively.</p> </div> <div class="ltx_para ltx_noindent" id="S3.SS2.SSS2.p2"> <p class="ltx_p" id="S3.SS2.SSS2.p2.22"><span class="ltx_text ltx_font_bold" id="S3.SS2.SSS2.p2.22.1">Geometric Affine Field.</span> We use a hybrid representation that, similar to TensoRF <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib13" title="">13</a>]</cite>, combines explicit geometry, in the form of a point cloud <math alttext="P\in\mathbb{R}^{N_{P}\times 3}" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.1.m1.1"><semantics id="S3.SS2.SSS2.p2.1.m1.1a"><mrow id="S3.SS2.SSS2.p2.1.m1.1.1" xref="S3.SS2.SSS2.p2.1.m1.1.1.cmml"><mi id="S3.SS2.SSS2.p2.1.m1.1.1.2" xref="S3.SS2.SSS2.p2.1.m1.1.1.2.cmml">P</mi><mo id="S3.SS2.SSS2.p2.1.m1.1.1.1" xref="S3.SS2.SSS2.p2.1.m1.1.1.1.cmml">∈</mo><msup id="S3.SS2.SSS2.p2.1.m1.1.1.3" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.cmml"><mi id="S3.SS2.SSS2.p2.1.m1.1.1.3.2" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.2.cmml">ℝ</mi><mrow id="S3.SS2.SSS2.p2.1.m1.1.1.3.3" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.3.cmml"><msub id="S3.SS2.SSS2.p2.1.m1.1.1.3.3.2" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.3.2.cmml"><mi id="S3.SS2.SSS2.p2.1.m1.1.1.3.3.2.2" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.3.2.2.cmml">N</mi><mi id="S3.SS2.SSS2.p2.1.m1.1.1.3.3.2.3" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.3.2.3.cmml">P</mi></msub><mo id="S3.SS2.SSS2.p2.1.m1.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.3.1.cmml">×</mo><mn id="S3.SS2.SSS2.p2.1.m1.1.1.3.3.3" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.3.3.cmml">3</mn></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.1.m1.1b"><apply 
id="S3.SS2.SSS2.p2.1.m1.1.1.cmml" xref="S3.SS2.SSS2.p2.1.m1.1.1"><in id="S3.SS2.SSS2.p2.1.m1.1.1.1.cmml" xref="S3.SS2.SSS2.p2.1.m1.1.1.1"></in><ci id="S3.SS2.SSS2.p2.1.m1.1.1.2.cmml" xref="S3.SS2.SSS2.p2.1.m1.1.1.2">𝑃</ci><apply id="S3.SS2.SSS2.p2.1.m1.1.1.3.cmml" xref="S3.SS2.SSS2.p2.1.m1.1.1.3"><csymbol cd="ambiguous" id="S3.SS2.SSS2.p2.1.m1.1.1.3.1.cmml" xref="S3.SS2.SSS2.p2.1.m1.1.1.3">superscript</csymbol><ci id="S3.SS2.SSS2.p2.1.m1.1.1.3.2.cmml" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.2">ℝ</ci><apply id="S3.SS2.SSS2.p2.1.m1.1.1.3.3.cmml" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.3"><times id="S3.SS2.SSS2.p2.1.m1.1.1.3.3.1.cmml" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.3.1"></times><apply id="S3.SS2.SSS2.p2.1.m1.1.1.3.3.2.cmml" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.3.2"><csymbol cd="ambiguous" id="S3.SS2.SSS2.p2.1.m1.1.1.3.3.2.1.cmml" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.3.2">subscript</csymbol><ci id="S3.SS2.SSS2.p2.1.m1.1.1.3.3.2.2.cmml" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.3.2.2">𝑁</ci><ci id="S3.SS2.SSS2.p2.1.m1.1.1.3.3.2.3.cmml" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.3.2.3">𝑃</ci></apply><cn id="S3.SS2.SSS2.p2.1.m1.1.1.3.3.3.cmml" type="integer" xref="S3.SS2.SSS2.p2.1.m1.1.1.3.3.3">3</cn></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.1.m1.1c">P\in\mathbb{R}^{N_{P}\times 3}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.1.m1.1d">italic_P ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT × 3 end_POSTSUPERSCRIPT</annotation></semantics></math>, which consists of <math alttext="N_{P}" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.2.m2.1"><semantics id="S3.SS2.SSS2.p2.2.m2.1a"><msub id="S3.SS2.SSS2.p2.2.m2.1.1" xref="S3.SS2.SSS2.p2.2.m2.1.1.cmml"><mi id="S3.SS2.SSS2.p2.2.m2.1.1.2" xref="S3.SS2.SSS2.p2.2.m2.1.1.2.cmml">N</mi><mi id="S3.SS2.SSS2.p2.2.m2.1.1.3" xref="S3.SS2.SSS2.p2.2.m2.1.1.3.cmml">P</mi></msub><annotation-xml encoding="MathML-Content" 
id="S3.SS2.SSS2.p2.2.m2.1b"><apply id="S3.SS2.SSS2.p2.2.m2.1.1.cmml" xref="S3.SS2.SSS2.p2.2.m2.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS2.p2.2.m2.1.1.1.cmml" xref="S3.SS2.SSS2.p2.2.m2.1.1">subscript</csymbol><ci id="S3.SS2.SSS2.p2.2.m2.1.1.2.cmml" xref="S3.SS2.SSS2.p2.2.m2.1.1.2">𝑁</ci><ci id="S3.SS2.SSS2.p2.2.m2.1.1.3.cmml" xref="S3.SS2.SSS2.p2.2.m2.1.1.3">𝑃</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.2.m2.1c">N_{P}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.2.m2.1d">italic_N start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT</annotation></semantics></math> 3D points defined by their <math alttext="x" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.3.m3.1"><semantics id="S3.SS2.SSS2.p2.3.m3.1a"><mi id="S3.SS2.SSS2.p2.3.m3.1.1" xref="S3.SS2.SSS2.p2.3.m3.1.1.cmml">x</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.3.m3.1b"><ci id="S3.SS2.SSS2.p2.3.m3.1.1.cmml" xref="S3.SS2.SSS2.p2.3.m3.1.1">𝑥</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.3.m3.1c">x</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.3.m3.1d">italic_x</annotation></semantics></math>, <math alttext="y" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.4.m4.1"><semantics id="S3.SS2.SSS2.p2.4.m4.1a"><mi id="S3.SS2.SSS2.p2.4.m4.1.1" xref="S3.SS2.SSS2.p2.4.m4.1.1.cmml">y</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.4.m4.1b"><ci id="S3.SS2.SSS2.p2.4.m4.1.1.cmml" xref="S3.SS2.SSS2.p2.4.m4.1.1">𝑦</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.4.m4.1c">y</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.4.m4.1d">italic_y</annotation></semantics></math>, and <math alttext="z" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.5.m5.1"><semantics id="S3.SS2.SSS2.p2.5.m5.1a"><mi id="S3.SS2.SSS2.p2.5.m5.1.1" xref="S3.SS2.SSS2.p2.5.m5.1.1.cmml">z</mi><annotation-xml 
encoding="MathML-Content" id="S3.SS2.SSS2.p2.5.m5.1b"><ci id="S3.SS2.SSS2.p2.5.m5.1.1.cmml" xref="S3.SS2.SSS2.p2.5.m5.1.1">𝑧</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.5.m5.1c">z</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.5.m5.1d">italic_z</annotation></semantics></math> coordinates, and an implicit feature field encoded by a local geometric affine tensor <math alttext="T\in\mathbb{R}^{3\times C\times H\times W}" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.6.m6.1"><semantics id="S3.SS2.SSS2.p2.6.m6.1a"><mrow id="S3.SS2.SSS2.p2.6.m6.1.1" xref="S3.SS2.SSS2.p2.6.m6.1.1.cmml"><mi id="S3.SS2.SSS2.p2.6.m6.1.1.2" xref="S3.SS2.SSS2.p2.6.m6.1.1.2.cmml">T</mi><mo id="S3.SS2.SSS2.p2.6.m6.1.1.1" xref="S3.SS2.SSS2.p2.6.m6.1.1.1.cmml">∈</mo><msup id="S3.SS2.SSS2.p2.6.m6.1.1.3" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.cmml"><mi id="S3.SS2.SSS2.p2.6.m6.1.1.3.2" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.2.cmml">ℝ</mi><mrow id="S3.SS2.SSS2.p2.6.m6.1.1.3.3" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.3.cmml"><mn id="S3.SS2.SSS2.p2.6.m6.1.1.3.3.2" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.3.2.cmml">3</mn><mo id="S3.SS2.SSS2.p2.6.m6.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.3.1.cmml">×</mo><mi id="S3.SS2.SSS2.p2.6.m6.1.1.3.3.3" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.3.3.cmml">C</mi><mo id="S3.SS2.SSS2.p2.6.m6.1.1.3.3.1a" lspace="0.222em" rspace="0.222em" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.3.1.cmml">×</mo><mi id="S3.SS2.SSS2.p2.6.m6.1.1.3.3.4" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.3.4.cmml">H</mi><mo id="S3.SS2.SSS2.p2.6.m6.1.1.3.3.1b" lspace="0.222em" rspace="0.222em" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.3.1.cmml">×</mo><mi id="S3.SS2.SSS2.p2.6.m6.1.1.3.3.5" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.3.5.cmml">W</mi></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.6.m6.1b"><apply id="S3.SS2.SSS2.p2.6.m6.1.1.cmml" xref="S3.SS2.SSS2.p2.6.m6.1.1"><in id="S3.SS2.SSS2.p2.6.m6.1.1.1.cmml" 
xref="S3.SS2.SSS2.p2.6.m6.1.1.1"></in><ci id="S3.SS2.SSS2.p2.6.m6.1.1.2.cmml" xref="S3.SS2.SSS2.p2.6.m6.1.1.2">𝑇</ci><apply id="S3.SS2.SSS2.p2.6.m6.1.1.3.cmml" xref="S3.SS2.SSS2.p2.6.m6.1.1.3"><csymbol cd="ambiguous" id="S3.SS2.SSS2.p2.6.m6.1.1.3.1.cmml" xref="S3.SS2.SSS2.p2.6.m6.1.1.3">superscript</csymbol><ci id="S3.SS2.SSS2.p2.6.m6.1.1.3.2.cmml" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.2">ℝ</ci><apply id="S3.SS2.SSS2.p2.6.m6.1.1.3.3.cmml" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.3"><times id="S3.SS2.SSS2.p2.6.m6.1.1.3.3.1.cmml" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.3.1"></times><cn id="S3.SS2.SSS2.p2.6.m6.1.1.3.3.2.cmml" type="integer" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.3.2">3</cn><ci id="S3.SS2.SSS2.p2.6.m6.1.1.3.3.3.cmml" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.3.3">𝐶</ci><ci id="S3.SS2.SSS2.p2.6.m6.1.1.3.3.4.cmml" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.3.4">𝐻</ci><ci id="S3.SS2.SSS2.p2.6.m6.1.1.3.3.5.cmml" xref="S3.SS2.SSS2.p2.6.m6.1.1.3.3.5">𝑊</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.6.m6.1c">T\in\mathbb{R}^{3\times C\times H\times W}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.6.m6.1d">italic_T ∈ blackboard_R start_POSTSUPERSCRIPT 3 × italic_C × italic_H × italic_W end_POSTSUPERSCRIPT</annotation></semantics></math>, where <math alttext="C" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.7.m7.1"><semantics id="S3.SS2.SSS2.p2.7.m7.1a"><mi id="S3.SS2.SSS2.p2.7.m7.1.1" xref="S3.SS2.SSS2.p2.7.m7.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.7.m7.1b"><ci id="S3.SS2.SSS2.p2.7.m7.1.1.cmml" xref="S3.SS2.SSS2.p2.7.m7.1.1">𝐶</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.7.m7.1c">C</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.7.m7.1d">italic_C</annotation></semantics></math> represents the number of feature channels and <math alttext="H" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.8.m8.1"><semantics 
id="S3.SS2.SSS2.p2.8.m8.1a"><mi id="S3.SS2.SSS2.p2.8.m8.1.1" xref="S3.SS2.SSS2.p2.8.m8.1.1.cmml">H</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.8.m8.1b"><ci id="S3.SS2.SSS2.p2.8.m8.1.1.cmml" xref="S3.SS2.SSS2.p2.8.m8.1.1">𝐻</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.8.m8.1c">H</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.8.m8.1d">italic_H</annotation></semantics></math> and <math alttext="W" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.9.m9.1"><semantics id="S3.SS2.SSS2.p2.9.m9.1a"><mi id="S3.SS2.SSS2.p2.9.m9.1.1" xref="S3.SS2.SSS2.p2.9.m9.1.1.cmml">W</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.9.m9.1b"><ci id="S3.SS2.SSS2.p2.9.m9.1.1.cmml" xref="S3.SS2.SSS2.p2.9.m9.1.1">𝑊</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.9.m9.1c">W</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.9.m9.1d">italic_W</annotation></semantics></math> denote the height and width of the feature maps. 
The local geometric affine <math alttext="T" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.10.m10.1"><semantics id="S3.SS2.SSS2.p2.10.m10.1a"><mi id="S3.SS2.SSS2.p2.10.m10.1.1" xref="S3.SS2.SSS2.p2.10.m10.1.1.cmml">T</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.10.m10.1b"><ci id="S3.SS2.SSS2.p2.10.m10.1.1.cmml" xref="S3.SS2.SSS2.p2.10.m10.1.1">𝑇</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.10.m10.1c">T</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.10.m10.1d">italic_T</annotation></semantics></math> is composed of three orthogonal geometric affines aligned with the coordinate axes: <math alttext="T_{xy}" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.11.m11.1"><semantics id="S3.SS2.SSS2.p2.11.m11.1a"><msub id="S3.SS2.SSS2.p2.11.m11.1.1" xref="S3.SS2.SSS2.p2.11.m11.1.1.cmml"><mi id="S3.SS2.SSS2.p2.11.m11.1.1.2" xref="S3.SS2.SSS2.p2.11.m11.1.1.2.cmml">T</mi><mrow id="S3.SS2.SSS2.p2.11.m11.1.1.3" xref="S3.SS2.SSS2.p2.11.m11.1.1.3.cmml"><mi id="S3.SS2.SSS2.p2.11.m11.1.1.3.2" xref="S3.SS2.SSS2.p2.11.m11.1.1.3.2.cmml">x</mi><mo id="S3.SS2.SSS2.p2.11.m11.1.1.3.1" xref="S3.SS2.SSS2.p2.11.m11.1.1.3.1.cmml"></mo><mi id="S3.SS2.SSS2.p2.11.m11.1.1.3.3" xref="S3.SS2.SSS2.p2.11.m11.1.1.3.3.cmml">y</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.11.m11.1b"><apply id="S3.SS2.SSS2.p2.11.m11.1.1.cmml" xref="S3.SS2.SSS2.p2.11.m11.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS2.p2.11.m11.1.1.1.cmml" xref="S3.SS2.SSS2.p2.11.m11.1.1">subscript</csymbol><ci id="S3.SS2.SSS2.p2.11.m11.1.1.2.cmml" xref="S3.SS2.SSS2.p2.11.m11.1.1.2">𝑇</ci><apply id="S3.SS2.SSS2.p2.11.m11.1.1.3.cmml" xref="S3.SS2.SSS2.p2.11.m11.1.1.3"><times id="S3.SS2.SSS2.p2.11.m11.1.1.3.1.cmml" xref="S3.SS2.SSS2.p2.11.m11.1.1.3.1"></times><ci id="S3.SS2.SSS2.p2.11.m11.1.1.3.2.cmml" xref="S3.SS2.SSS2.p2.11.m11.1.1.3.2">𝑥</ci><ci id="S3.SS2.SSS2.p2.11.m11.1.1.3.3.cmml" 
xref="S3.SS2.SSS2.p2.11.m11.1.1.3.3">𝑦</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.11.m11.1c">T_{xy}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.11.m11.1d">italic_T start_POSTSUBSCRIPT italic_x italic_y end_POSTSUBSCRIPT</annotation></semantics></math>, <math alttext="T_{xz}" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.12.m12.1"><semantics id="S3.SS2.SSS2.p2.12.m12.1a"><msub id="S3.SS2.SSS2.p2.12.m12.1.1" xref="S3.SS2.SSS2.p2.12.m12.1.1.cmml"><mi id="S3.SS2.SSS2.p2.12.m12.1.1.2" xref="S3.SS2.SSS2.p2.12.m12.1.1.2.cmml">T</mi><mrow id="S3.SS2.SSS2.p2.12.m12.1.1.3" xref="S3.SS2.SSS2.p2.12.m12.1.1.3.cmml"><mi id="S3.SS2.SSS2.p2.12.m12.1.1.3.2" xref="S3.SS2.SSS2.p2.12.m12.1.1.3.2.cmml">x</mi><mo id="S3.SS2.SSS2.p2.12.m12.1.1.3.1" xref="S3.SS2.SSS2.p2.12.m12.1.1.3.1.cmml"></mo><mi id="S3.SS2.SSS2.p2.12.m12.1.1.3.3" xref="S3.SS2.SSS2.p2.12.m12.1.1.3.3.cmml">z</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.12.m12.1b"><apply id="S3.SS2.SSS2.p2.12.m12.1.1.cmml" xref="S3.SS2.SSS2.p2.12.m12.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS2.p2.12.m12.1.1.1.cmml" xref="S3.SS2.SSS2.p2.12.m12.1.1">subscript</csymbol><ci id="S3.SS2.SSS2.p2.12.m12.1.1.2.cmml" xref="S3.SS2.SSS2.p2.12.m12.1.1.2">𝑇</ci><apply id="S3.SS2.SSS2.p2.12.m12.1.1.3.cmml" xref="S3.SS2.SSS2.p2.12.m12.1.1.3"><times id="S3.SS2.SSS2.p2.12.m12.1.1.3.1.cmml" xref="S3.SS2.SSS2.p2.12.m12.1.1.3.1"></times><ci id="S3.SS2.SSS2.p2.12.m12.1.1.3.2.cmml" xref="S3.SS2.SSS2.p2.12.m12.1.1.3.2">𝑥</ci><ci id="S3.SS2.SSS2.p2.12.m12.1.1.3.3.cmml" xref="S3.SS2.SSS2.p2.12.m12.1.1.3.3">𝑧</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.12.m12.1c">T_{xz}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.12.m12.1d">italic_T start_POSTSUBSCRIPT italic_x italic_z end_POSTSUBSCRIPT</annotation></semantics></math>, and <math alttext="T_{yz}" 
class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.13.m13.1"><semantics id="S3.SS2.SSS2.p2.13.m13.1a"><msub id="S3.SS2.SSS2.p2.13.m13.1.1" xref="S3.SS2.SSS2.p2.13.m13.1.1.cmml"><mi id="S3.SS2.SSS2.p2.13.m13.1.1.2" xref="S3.SS2.SSS2.p2.13.m13.1.1.2.cmml">T</mi><mrow id="S3.SS2.SSS2.p2.13.m13.1.1.3" xref="S3.SS2.SSS2.p2.13.m13.1.1.3.cmml"><mi id="S3.SS2.SSS2.p2.13.m13.1.1.3.2" xref="S3.SS2.SSS2.p2.13.m13.1.1.3.2.cmml">y</mi><mo id="S3.SS2.SSS2.p2.13.m13.1.1.3.1" xref="S3.SS2.SSS2.p2.13.m13.1.1.3.1.cmml"></mo><mi id="S3.SS2.SSS2.p2.13.m13.1.1.3.3" xref="S3.SS2.SSS2.p2.13.m13.1.1.3.3.cmml">z</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.13.m13.1b"><apply id="S3.SS2.SSS2.p2.13.m13.1.1.cmml" xref="S3.SS2.SSS2.p2.13.m13.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS2.p2.13.m13.1.1.1.cmml" xref="S3.SS2.SSS2.p2.13.m13.1.1">subscript</csymbol><ci id="S3.SS2.SSS2.p2.13.m13.1.1.2.cmml" xref="S3.SS2.SSS2.p2.13.m13.1.1.2">𝑇</ci><apply id="S3.SS2.SSS2.p2.13.m13.1.1.3.cmml" xref="S3.SS2.SSS2.p2.13.m13.1.1.3"><times id="S3.SS2.SSS2.p2.13.m13.1.1.3.1.cmml" xref="S3.SS2.SSS2.p2.13.m13.1.1.3.1"></times><ci id="S3.SS2.SSS2.p2.13.m13.1.1.3.2.cmml" xref="S3.SS2.SSS2.p2.13.m13.1.1.3.2">𝑦</ci><ci id="S3.SS2.SSS2.p2.13.m13.1.1.3.3.cmml" xref="S3.SS2.SSS2.p2.13.m13.1.1.3.3">𝑧</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.13.m13.1c">T_{yz}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.13.m13.1d">italic_T start_POSTSUBSCRIPT italic_y italic_z end_POSTSUBSCRIPT</annotation></semantics></math>, which correspond to projections onto the planes <math alttext="xy" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.14.m14.1"><semantics id="S3.SS2.SSS2.p2.14.m14.1a"><mrow id="S3.SS2.SSS2.p2.14.m14.1.1" xref="S3.SS2.SSS2.p2.14.m14.1.1.cmml"><mi id="S3.SS2.SSS2.p2.14.m14.1.1.2" xref="S3.SS2.SSS2.p2.14.m14.1.1.2.cmml">x</mi><mo id="S3.SS2.SSS2.p2.14.m14.1.1.1" 
xref="S3.SS2.SSS2.p2.14.m14.1.1.1.cmml"></mo><mi id="S3.SS2.SSS2.p2.14.m14.1.1.3" xref="S3.SS2.SSS2.p2.14.m14.1.1.3.cmml">y</mi></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.14.m14.1b"><apply id="S3.SS2.SSS2.p2.14.m14.1.1.cmml" xref="S3.SS2.SSS2.p2.14.m14.1.1"><times id="S3.SS2.SSS2.p2.14.m14.1.1.1.cmml" xref="S3.SS2.SSS2.p2.14.m14.1.1.1"></times><ci id="S3.SS2.SSS2.p2.14.m14.1.1.2.cmml" xref="S3.SS2.SSS2.p2.14.m14.1.1.2">𝑥</ci><ci id="S3.SS2.SSS2.p2.14.m14.1.1.3.cmml" xref="S3.SS2.SSS2.p2.14.m14.1.1.3">𝑦</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.14.m14.1c">xy</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.14.m14.1d">italic_x italic_y</annotation></semantics></math>, <math alttext="xz" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.15.m15.1"><semantics id="S3.SS2.SSS2.p2.15.m15.1a"><mrow id="S3.SS2.SSS2.p2.15.m15.1.1" xref="S3.SS2.SSS2.p2.15.m15.1.1.cmml"><mi id="S3.SS2.SSS2.p2.15.m15.1.1.2" xref="S3.SS2.SSS2.p2.15.m15.1.1.2.cmml">x</mi><mo id="S3.SS2.SSS2.p2.15.m15.1.1.1" xref="S3.SS2.SSS2.p2.15.m15.1.1.1.cmml"></mo><mi id="S3.SS2.SSS2.p2.15.m15.1.1.3" xref="S3.SS2.SSS2.p2.15.m15.1.1.3.cmml">z</mi></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.15.m15.1b"><apply id="S3.SS2.SSS2.p2.15.m15.1.1.cmml" xref="S3.SS2.SSS2.p2.15.m15.1.1"><times id="S3.SS2.SSS2.p2.15.m15.1.1.1.cmml" xref="S3.SS2.SSS2.p2.15.m15.1.1.1"></times><ci id="S3.SS2.SSS2.p2.15.m15.1.1.2.cmml" xref="S3.SS2.SSS2.p2.15.m15.1.1.2">𝑥</ci><ci id="S3.SS2.SSS2.p2.15.m15.1.1.3.cmml" xref="S3.SS2.SSS2.p2.15.m15.1.1.3">𝑧</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.15.m15.1c">xz</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.15.m15.1d">italic_x italic_z</annotation></semantics></math> and <math alttext="yz" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.16.m16.1"><semantics id="S3.SS2.SSS2.p2.16.m16.1a"><mrow 
id="S3.SS2.SSS2.p2.16.m16.1.1" xref="S3.SS2.SSS2.p2.16.m16.1.1.cmml"><mi id="S3.SS2.SSS2.p2.16.m16.1.1.2" xref="S3.SS2.SSS2.p2.16.m16.1.1.2.cmml">y</mi><mo id="S3.SS2.SSS2.p2.16.m16.1.1.1" xref="S3.SS2.SSS2.p2.16.m16.1.1.1.cmml"></mo><mi id="S3.SS2.SSS2.p2.16.m16.1.1.3" xref="S3.SS2.SSS2.p2.16.m16.1.1.3.cmml">z</mi></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.16.m16.1b"><apply id="S3.SS2.SSS2.p2.16.m16.1.1.cmml" xref="S3.SS2.SSS2.p2.16.m16.1.1"><times id="S3.SS2.SSS2.p2.16.m16.1.1.1.cmml" xref="S3.SS2.SSS2.p2.16.m16.1.1.1"></times><ci id="S3.SS2.SSS2.p2.16.m16.1.1.2.cmml" xref="S3.SS2.SSS2.p2.16.m16.1.1.2">𝑦</ci><ci id="S3.SS2.SSS2.p2.16.m16.1.1.3.cmml" xref="S3.SS2.SSS2.p2.16.m16.1.1.3">𝑧</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.16.m16.1c">yz</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.16.m16.1d">italic_y italic_z</annotation></semantics></math>, respectively. To retrieve the corresponding feature vector at any given 3D position <math alttext="x" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.17.m17.1"><semantics id="S3.SS2.SSS2.p2.17.m17.1a"><mi id="S3.SS2.SSS2.p2.17.m17.1.1" xref="S3.SS2.SSS2.p2.17.m17.1.1.cmml">x</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.17.m17.1b"><ci id="S3.SS2.SSS2.p2.17.m17.1.1.cmml" xref="S3.SS2.SSS2.p2.17.m17.1.1">𝑥</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.17.m17.1c">x</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.17.m17.1d">italic_x</annotation></semantics></math>, we first project <math alttext="x" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.18.m18.1"><semantics id="S3.SS2.SSS2.p2.18.m18.1a"><mi id="S3.SS2.SSS2.p2.18.m18.1.1" xref="S3.SS2.SSS2.p2.18.m18.1.1.cmml">x</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.18.m18.1b"><ci id="S3.SS2.SSS2.p2.18.m18.1.1.cmml" 
xref="S3.SS2.SSS2.p2.18.m18.1.1">𝑥</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.18.m18.1c">x</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.18.m18.1d">italic_x</annotation></semantics></math> onto each of the three geometric affines to obtain the projections <math alttext="p_{xy}" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.19.m19.1"><semantics id="S3.SS2.SSS2.p2.19.m19.1a"><msub id="S3.SS2.SSS2.p2.19.m19.1.1" xref="S3.SS2.SSS2.p2.19.m19.1.1.cmml"><mi id="S3.SS2.SSS2.p2.19.m19.1.1.2" xref="S3.SS2.SSS2.p2.19.m19.1.1.2.cmml">p</mi><mrow id="S3.SS2.SSS2.p2.19.m19.1.1.3" xref="S3.SS2.SSS2.p2.19.m19.1.1.3.cmml"><mi id="S3.SS2.SSS2.p2.19.m19.1.1.3.2" xref="S3.SS2.SSS2.p2.19.m19.1.1.3.2.cmml">x</mi><mo id="S3.SS2.SSS2.p2.19.m19.1.1.3.1" xref="S3.SS2.SSS2.p2.19.m19.1.1.3.1.cmml"></mo><mi id="S3.SS2.SSS2.p2.19.m19.1.1.3.3" xref="S3.SS2.SSS2.p2.19.m19.1.1.3.3.cmml">y</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.19.m19.1b"><apply id="S3.SS2.SSS2.p2.19.m19.1.1.cmml" xref="S3.SS2.SSS2.p2.19.m19.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS2.p2.19.m19.1.1.1.cmml" xref="S3.SS2.SSS2.p2.19.m19.1.1">subscript</csymbol><ci id="S3.SS2.SSS2.p2.19.m19.1.1.2.cmml" xref="S3.SS2.SSS2.p2.19.m19.1.1.2">𝑝</ci><apply id="S3.SS2.SSS2.p2.19.m19.1.1.3.cmml" xref="S3.SS2.SSS2.p2.19.m19.1.1.3"><times id="S3.SS2.SSS2.p2.19.m19.1.1.3.1.cmml" xref="S3.SS2.SSS2.p2.19.m19.1.1.3.1"></times><ci id="S3.SS2.SSS2.p2.19.m19.1.1.3.2.cmml" xref="S3.SS2.SSS2.p2.19.m19.1.1.3.2">𝑥</ci><ci id="S3.SS2.SSS2.p2.19.m19.1.1.3.3.cmml" xref="S3.SS2.SSS2.p2.19.m19.1.1.3.3">𝑦</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.19.m19.1c">p_{xy}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.19.m19.1d">italic_p start_POSTSUBSCRIPT italic_x italic_y end_POSTSUBSCRIPT</annotation></semantics></math>, <math alttext="p_{xz}" class="ltx_Math" 
display="inline" id="S3.SS2.SSS2.p2.20.m20.1"><semantics id="S3.SS2.SSS2.p2.20.m20.1a"><msub id="S3.SS2.SSS2.p2.20.m20.1.1" xref="S3.SS2.SSS2.p2.20.m20.1.1.cmml"><mi id="S3.SS2.SSS2.p2.20.m20.1.1.2" xref="S3.SS2.SSS2.p2.20.m20.1.1.2.cmml">p</mi><mrow id="S3.SS2.SSS2.p2.20.m20.1.1.3" xref="S3.SS2.SSS2.p2.20.m20.1.1.3.cmml"><mi id="S3.SS2.SSS2.p2.20.m20.1.1.3.2" xref="S3.SS2.SSS2.p2.20.m20.1.1.3.2.cmml">x</mi><mo id="S3.SS2.SSS2.p2.20.m20.1.1.3.1" xref="S3.SS2.SSS2.p2.20.m20.1.1.3.1.cmml"></mo><mi id="S3.SS2.SSS2.p2.20.m20.1.1.3.3" xref="S3.SS2.SSS2.p2.20.m20.1.1.3.3.cmml">z</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.20.m20.1b"><apply id="S3.SS2.SSS2.p2.20.m20.1.1.cmml" xref="S3.SS2.SSS2.p2.20.m20.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS2.p2.20.m20.1.1.1.cmml" xref="S3.SS2.SSS2.p2.20.m20.1.1">subscript</csymbol><ci id="S3.SS2.SSS2.p2.20.m20.1.1.2.cmml" xref="S3.SS2.SSS2.p2.20.m20.1.1.2">𝑝</ci><apply id="S3.SS2.SSS2.p2.20.m20.1.1.3.cmml" xref="S3.SS2.SSS2.p2.20.m20.1.1.3"><times id="S3.SS2.SSS2.p2.20.m20.1.1.3.1.cmml" xref="S3.SS2.SSS2.p2.20.m20.1.1.3.1"></times><ci id="S3.SS2.SSS2.p2.20.m20.1.1.3.2.cmml" xref="S3.SS2.SSS2.p2.20.m20.1.1.3.2">𝑥</ci><ci id="S3.SS2.SSS2.p2.20.m20.1.1.3.3.cmml" xref="S3.SS2.SSS2.p2.20.m20.1.1.3.3">𝑧</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.20.m20.1c">p_{xz}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.20.m20.1d">italic_p start_POSTSUBSCRIPT italic_x italic_z end_POSTSUBSCRIPT</annotation></semantics></math>, and <math alttext="p_{yz}" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.21.m21.1"><semantics id="S3.SS2.SSS2.p2.21.m21.1a"><msub id="S3.SS2.SSS2.p2.21.m21.1.1" xref="S3.SS2.SSS2.p2.21.m21.1.1.cmml"><mi id="S3.SS2.SSS2.p2.21.m21.1.1.2" xref="S3.SS2.SSS2.p2.21.m21.1.1.2.cmml">p</mi><mrow id="S3.SS2.SSS2.p2.21.m21.1.1.3" xref="S3.SS2.SSS2.p2.21.m21.1.1.3.cmml"><mi id="S3.SS2.SSS2.p2.21.m21.1.1.3.2" 
xref="S3.SS2.SSS2.p2.21.m21.1.1.3.2.cmml">y</mi><mo id="S3.SS2.SSS2.p2.21.m21.1.1.3.1" xref="S3.SS2.SSS2.p2.21.m21.1.1.3.1.cmml"></mo><mi id="S3.SS2.SSS2.p2.21.m21.1.1.3.3" xref="S3.SS2.SSS2.p2.21.m21.1.1.3.3.cmml">z</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.21.m21.1b"><apply id="S3.SS2.SSS2.p2.21.m21.1.1.cmml" xref="S3.SS2.SSS2.p2.21.m21.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS2.p2.21.m21.1.1.1.cmml" xref="S3.SS2.SSS2.p2.21.m21.1.1">subscript</csymbol><ci id="S3.SS2.SSS2.p2.21.m21.1.1.2.cmml" xref="S3.SS2.SSS2.p2.21.m21.1.1.2">𝑝</ci><apply id="S3.SS2.SSS2.p2.21.m21.1.1.3.cmml" xref="S3.SS2.SSS2.p2.21.m21.1.1.3"><times id="S3.SS2.SSS2.p2.21.m21.1.1.3.1.cmml" xref="S3.SS2.SSS2.p2.21.m21.1.1.3.1"></times><ci id="S3.SS2.SSS2.p2.21.m21.1.1.3.2.cmml" xref="S3.SS2.SSS2.p2.21.m21.1.1.3.2">𝑦</ci><ci id="S3.SS2.SSS2.p2.21.m21.1.1.3.3.cmml" xref="S3.SS2.SSS2.p2.21.m21.1.1.3.3">𝑧</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.21.m21.1c">p_{yz}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.21.m21.1d">italic_p start_POSTSUBSCRIPT italic_y italic_z end_POSTSUBSCRIPT</annotation></semantics></math>. 
The final feature vector <math alttext="f_{t}" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.22.m22.1"><semantics id="S3.SS2.SSS2.p2.22.m22.1a"><msub id="S3.SS2.SSS2.p2.22.m22.1.1" xref="S3.SS2.SSS2.p2.22.m22.1.1.cmml"><mi id="S3.SS2.SSS2.p2.22.m22.1.1.2" xref="S3.SS2.SSS2.p2.22.m22.1.1.2.cmml">f</mi><mi id="S3.SS2.SSS2.p2.22.m22.1.1.3" xref="S3.SS2.SSS2.p2.22.m22.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.22.m22.1b"><apply id="S3.SS2.SSS2.p2.22.m22.1.1.cmml" xref="S3.SS2.SSS2.p2.22.m22.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS2.p2.22.m22.1.1.1.cmml" xref="S3.SS2.SSS2.p2.22.m22.1.1">subscript</csymbol><ci id="S3.SS2.SSS2.p2.22.m22.1.1.2.cmml" xref="S3.SS2.SSS2.p2.22.m22.1.1.2">𝑓</ci><ci id="S3.SS2.SSS2.p2.22.m22.1.1.3.cmml" xref="S3.SS2.SSS2.p2.22.m22.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.22.m22.1c">f_{t}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.22.m22.1d">italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> is then computed by performing trilinear interpolation on each of these projections and concatenating the results, which can be represented with the following equation,</p> <table class="ltx_equation ltx_eqn_table" id="S3.E11"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\begin{split}f_{t}=\;\mathrm{Aff}(T_{xy},p_{xy})\oplus\mathrm{Aff}(T_{xz},p_{% xz})\oplus\mathrm{Aff}(T_{yz},p_{yz}),\end{split}" class="ltx_Math" display="block" id="S3.E11.m1.32"><semantics id="S3.E11.m1.32a"><mtable displaystyle="true" id="S3.E11.m1.32.32.2"><mtr id="S3.E11.m1.32.32.2a"><mtd class="ltx_align_right" columnalign="right" id="S3.E11.m1.32.32.2b"><mrow id="S3.E11.m1.32.32.2.31.31.31.31"><mrow id="S3.E11.m1.32.32.2.31.31.31.31.1"><msub 
id="S3.E11.m1.32.32.2.31.31.31.31.1.7"><mi id="S3.E11.m1.1.1.1.1.1.1" mathsize="90%" xref="S3.E11.m1.1.1.1.1.1.1.cmml">f</mi><mi id="S3.E11.m1.2.2.2.2.2.2.1" mathsize="90%" xref="S3.E11.m1.2.2.2.2.2.2.1.cmml">t</mi></msub><mo id="S3.E11.m1.3.3.3.3.3.3" mathsize="90%" rspace="0.558em" xref="S3.E11.m1.3.3.3.3.3.3.cmml">=</mo><mrow id="S3.E11.m1.32.32.2.31.31.31.31.1.6"><mrow id="S3.E11.m1.32.32.2.31.31.31.31.1.2.2"><mi id="S3.E11.m1.4.4.4.4.4.4" mathsize="90%" xref="S3.E11.m1.4.4.4.4.4.4.cmml">Aff</mi><mo id="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3" xref="S3.E11.m1.31.31.1.1.1.cmml"></mo><mrow id="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.2.2"><mo id="S3.E11.m1.5.5.5.5.5.5" maxsize="90%" minsize="90%" xref="S3.E11.m1.31.31.1.1.1.cmml">(</mo><msub id="S3.E11.m1.32.32.2.31.31.31.31.1.1.1.1.1.1"><mi id="S3.E11.m1.6.6.6.6.6.6" mathsize="90%" xref="S3.E11.m1.6.6.6.6.6.6.cmml">T</mi><mrow id="S3.E11.m1.7.7.7.7.7.7.1" xref="S3.E11.m1.7.7.7.7.7.7.1.cmml"><mi id="S3.E11.m1.7.7.7.7.7.7.1.2" mathsize="90%" xref="S3.E11.m1.7.7.7.7.7.7.1.2.cmml">x</mi><mo id="S3.E11.m1.7.7.7.7.7.7.1.1" xref="S3.E11.m1.7.7.7.7.7.7.1.1.cmml"></mo><mi id="S3.E11.m1.7.7.7.7.7.7.1.3" mathsize="90%" xref="S3.E11.m1.7.7.7.7.7.7.1.3.cmml">y</mi></mrow></msub><mo id="S3.E11.m1.8.8.8.8.8.8" mathsize="90%" xref="S3.E11.m1.31.31.1.1.1.cmml">,</mo><msub id="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.2.2.2"><mi id="S3.E11.m1.9.9.9.9.9.9" mathsize="90%" xref="S3.E11.m1.9.9.9.9.9.9.cmml">p</mi><mrow id="S3.E11.m1.10.10.10.10.10.10.1" xref="S3.E11.m1.10.10.10.10.10.10.1.cmml"><mi id="S3.E11.m1.10.10.10.10.10.10.1.2" mathsize="90%" xref="S3.E11.m1.10.10.10.10.10.10.1.2.cmml">x</mi><mo id="S3.E11.m1.10.10.10.10.10.10.1.1" xref="S3.E11.m1.10.10.10.10.10.10.1.1.cmml"></mo><mi id="S3.E11.m1.10.10.10.10.10.10.1.3" mathsize="90%" xref="S3.E11.m1.10.10.10.10.10.10.1.3.cmml">y</mi></mrow></msub><mo id="S3.E11.m1.11.11.11.11.11.11" maxsize="90%" minsize="90%" xref="S3.E11.m1.31.31.1.1.1.cmml">)</mo></mrow></mrow><mo 
id="S3.E11.m1.12.12.12.12.12.12" mathsize="90%" xref="S3.E11.m1.12.12.12.12.12.12.cmml">⊕</mo><mrow id="S3.E11.m1.32.32.2.31.31.31.31.1.4.4"><mi id="S3.E11.m1.13.13.13.13.13.13" mathsize="90%" xref="S3.E11.m1.13.13.13.13.13.13.cmml">Aff</mi><mo id="S3.E11.m1.32.32.2.31.31.31.31.1.4.4.3" xref="S3.E11.m1.31.31.1.1.1.cmml"></mo><mrow id="S3.E11.m1.32.32.2.31.31.31.31.1.4.4.2.2"><mo id="S3.E11.m1.14.14.14.14.14.14" maxsize="90%" minsize="90%" xref="S3.E11.m1.31.31.1.1.1.cmml">(</mo><msub id="S3.E11.m1.32.32.2.31.31.31.31.1.3.3.1.1.1"><mi id="S3.E11.m1.15.15.15.15.15.15" mathsize="90%" xref="S3.E11.m1.15.15.15.15.15.15.cmml">T</mi><mrow id="S3.E11.m1.16.16.16.16.16.16.1" xref="S3.E11.m1.16.16.16.16.16.16.1.cmml"><mi id="S3.E11.m1.16.16.16.16.16.16.1.2" mathsize="90%" xref="S3.E11.m1.16.16.16.16.16.16.1.2.cmml">x</mi><mo id="S3.E11.m1.16.16.16.16.16.16.1.1" xref="S3.E11.m1.16.16.16.16.16.16.1.1.cmml"></mo><mi id="S3.E11.m1.16.16.16.16.16.16.1.3" mathsize="90%" xref="S3.E11.m1.16.16.16.16.16.16.1.3.cmml">z</mi></mrow></msub><mo id="S3.E11.m1.17.17.17.17.17.17" mathsize="90%" xref="S3.E11.m1.31.31.1.1.1.cmml">,</mo><msub id="S3.E11.m1.32.32.2.31.31.31.31.1.4.4.2.2.2"><mi id="S3.E11.m1.18.18.18.18.18.18" mathsize="90%" xref="S3.E11.m1.18.18.18.18.18.18.cmml">p</mi><mrow id="S3.E11.m1.19.19.19.19.19.19.1" xref="S3.E11.m1.19.19.19.19.19.19.1.cmml"><mi id="S3.E11.m1.19.19.19.19.19.19.1.2" mathsize="90%" xref="S3.E11.m1.19.19.19.19.19.19.1.2.cmml">x</mi><mo id="S3.E11.m1.19.19.19.19.19.19.1.1" xref="S3.E11.m1.19.19.19.19.19.19.1.1.cmml"></mo><mi id="S3.E11.m1.19.19.19.19.19.19.1.3" mathsize="90%" xref="S3.E11.m1.19.19.19.19.19.19.1.3.cmml">z</mi></mrow></msub><mo id="S3.E11.m1.20.20.20.20.20.20" maxsize="90%" minsize="90%" xref="S3.E11.m1.31.31.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.E11.m1.12.12.12.12.12.12a" mathsize="90%" xref="S3.E11.m1.12.12.12.12.12.12.cmml">⊕</mo><mrow id="S3.E11.m1.32.32.2.31.31.31.31.1.6.6"><mi id="S3.E11.m1.22.22.22.22.22.22" mathsize="90%" 
xref="S3.E11.m1.22.22.22.22.22.22.cmml">Aff</mi><mo id="S3.E11.m1.32.32.2.31.31.31.31.1.6.6.3" xref="S3.E11.m1.31.31.1.1.1.cmml"></mo><mrow id="S3.E11.m1.32.32.2.31.31.31.31.1.6.6.2.2"><mo id="S3.E11.m1.23.23.23.23.23.23" maxsize="90%" minsize="90%" xref="S3.E11.m1.31.31.1.1.1.cmml">(</mo><msub id="S3.E11.m1.32.32.2.31.31.31.31.1.5.5.1.1.1"><mi id="S3.E11.m1.24.24.24.24.24.24" mathsize="90%" xref="S3.E11.m1.24.24.24.24.24.24.cmml">T</mi><mrow id="S3.E11.m1.25.25.25.25.25.25.1" xref="S3.E11.m1.25.25.25.25.25.25.1.cmml"><mi id="S3.E11.m1.25.25.25.25.25.25.1.2" mathsize="90%" xref="S3.E11.m1.25.25.25.25.25.25.1.2.cmml">y</mi><mo id="S3.E11.m1.25.25.25.25.25.25.1.1" xref="S3.E11.m1.25.25.25.25.25.25.1.1.cmml"></mo><mi id="S3.E11.m1.25.25.25.25.25.25.1.3" mathsize="90%" xref="S3.E11.m1.25.25.25.25.25.25.1.3.cmml">z</mi></mrow></msub><mo id="S3.E11.m1.26.26.26.26.26.26" mathsize="90%" xref="S3.E11.m1.31.31.1.1.1.cmml">,</mo><msub id="S3.E11.m1.32.32.2.31.31.31.31.1.6.6.2.2.2"><mi id="S3.E11.m1.27.27.27.27.27.27" mathsize="90%" xref="S3.E11.m1.27.27.27.27.27.27.cmml">p</mi><mrow id="S3.E11.m1.28.28.28.28.28.28.1" xref="S3.E11.m1.28.28.28.28.28.28.1.cmml"><mi id="S3.E11.m1.28.28.28.28.28.28.1.2" mathsize="90%" xref="S3.E11.m1.28.28.28.28.28.28.1.2.cmml">y</mi><mo id="S3.E11.m1.28.28.28.28.28.28.1.1" xref="S3.E11.m1.28.28.28.28.28.28.1.1.cmml"></mo><mi id="S3.E11.m1.28.28.28.28.28.28.1.3" mathsize="90%" xref="S3.E11.m1.28.28.28.28.28.28.1.3.cmml">z</mi></mrow></msub><mo id="S3.E11.m1.29.29.29.29.29.29" maxsize="90%" minsize="90%" xref="S3.E11.m1.31.31.1.1.1.cmml">)</mo></mrow></mrow></mrow></mrow><mo id="S3.E11.m1.30.30.30.30.30.30" mathsize="90%" xref="S3.E11.m1.31.31.1.1.1.cmml">,</mo></mrow></mtd></mtr></mtable><annotation-xml encoding="MathML-Content" id="S3.E11.m1.32b"><apply id="S3.E11.m1.31.31.1.1.1.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"><eq id="S3.E11.m1.3.3.3.3.3.3.cmml" xref="S3.E11.m1.3.3.3.3.3.3"></eq><apply id="S3.E11.m1.31.31.1.1.1.8.cmml" 
xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"><csymbol cd="ambiguous" id="S3.E11.m1.31.31.1.1.1.8.1.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3">subscript</csymbol><ci id="S3.E11.m1.1.1.1.1.1.1.cmml" xref="S3.E11.m1.1.1.1.1.1.1">𝑓</ci><ci id="S3.E11.m1.2.2.2.2.2.2.1.cmml" xref="S3.E11.m1.2.2.2.2.2.2.1">𝑡</ci></apply><apply id="S3.E11.m1.31.31.1.1.1.6.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"><csymbol cd="latexml" id="S3.E11.m1.12.12.12.12.12.12.cmml" xref="S3.E11.m1.12.12.12.12.12.12">direct-sum</csymbol><apply id="S3.E11.m1.31.31.1.1.1.2.2.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"><times id="S3.E11.m1.31.31.1.1.1.2.2.3.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"></times><ci id="S3.E11.m1.4.4.4.4.4.4.cmml" xref="S3.E11.m1.4.4.4.4.4.4">Aff</ci><interval closure="open" id="S3.E11.m1.31.31.1.1.1.2.2.2.3.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"><apply id="S3.E11.m1.31.31.1.1.1.1.1.1.1.1.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"><csymbol cd="ambiguous" id="S3.E11.m1.31.31.1.1.1.1.1.1.1.1.1.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3">subscript</csymbol><ci id="S3.E11.m1.6.6.6.6.6.6.cmml" xref="S3.E11.m1.6.6.6.6.6.6">𝑇</ci><apply id="S3.E11.m1.7.7.7.7.7.7.1.cmml" xref="S3.E11.m1.7.7.7.7.7.7.1"><times id="S3.E11.m1.7.7.7.7.7.7.1.1.cmml" xref="S3.E11.m1.7.7.7.7.7.7.1.1"></times><ci id="S3.E11.m1.7.7.7.7.7.7.1.2.cmml" xref="S3.E11.m1.7.7.7.7.7.7.1.2">𝑥</ci><ci id="S3.E11.m1.7.7.7.7.7.7.1.3.cmml" xref="S3.E11.m1.7.7.7.7.7.7.1.3">𝑦</ci></apply></apply><apply id="S3.E11.m1.31.31.1.1.1.2.2.2.2.2.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"><csymbol cd="ambiguous" id="S3.E11.m1.31.31.1.1.1.2.2.2.2.2.1.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3">subscript</csymbol><ci id="S3.E11.m1.9.9.9.9.9.9.cmml" xref="S3.E11.m1.9.9.9.9.9.9">𝑝</ci><apply id="S3.E11.m1.10.10.10.10.10.10.1.cmml" xref="S3.E11.m1.10.10.10.10.10.10.1"><times id="S3.E11.m1.10.10.10.10.10.10.1.1.cmml" 
xref="S3.E11.m1.10.10.10.10.10.10.1.1"></times><ci id="S3.E11.m1.10.10.10.10.10.10.1.2.cmml" xref="S3.E11.m1.10.10.10.10.10.10.1.2">𝑥</ci><ci id="S3.E11.m1.10.10.10.10.10.10.1.3.cmml" xref="S3.E11.m1.10.10.10.10.10.10.1.3">𝑦</ci></apply></apply></interval></apply><apply id="S3.E11.m1.31.31.1.1.1.4.4.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"><times id="S3.E11.m1.31.31.1.1.1.4.4.3.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"></times><ci id="S3.E11.m1.13.13.13.13.13.13.cmml" xref="S3.E11.m1.13.13.13.13.13.13">Aff</ci><interval closure="open" id="S3.E11.m1.31.31.1.1.1.4.4.2.3.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"><apply id="S3.E11.m1.31.31.1.1.1.3.3.1.1.1.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"><csymbol cd="ambiguous" id="S3.E11.m1.31.31.1.1.1.3.3.1.1.1.1.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3">subscript</csymbol><ci id="S3.E11.m1.15.15.15.15.15.15.cmml" xref="S3.E11.m1.15.15.15.15.15.15">𝑇</ci><apply id="S3.E11.m1.16.16.16.16.16.16.1.cmml" xref="S3.E11.m1.16.16.16.16.16.16.1"><times id="S3.E11.m1.16.16.16.16.16.16.1.1.cmml" xref="S3.E11.m1.16.16.16.16.16.16.1.1"></times><ci id="S3.E11.m1.16.16.16.16.16.16.1.2.cmml" xref="S3.E11.m1.16.16.16.16.16.16.1.2">𝑥</ci><ci id="S3.E11.m1.16.16.16.16.16.16.1.3.cmml" xref="S3.E11.m1.16.16.16.16.16.16.1.3">𝑧</ci></apply></apply><apply id="S3.E11.m1.31.31.1.1.1.4.4.2.2.2.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"><csymbol cd="ambiguous" id="S3.E11.m1.31.31.1.1.1.4.4.2.2.2.1.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3">subscript</csymbol><ci id="S3.E11.m1.18.18.18.18.18.18.cmml" xref="S3.E11.m1.18.18.18.18.18.18">𝑝</ci><apply id="S3.E11.m1.19.19.19.19.19.19.1.cmml" xref="S3.E11.m1.19.19.19.19.19.19.1"><times id="S3.E11.m1.19.19.19.19.19.19.1.1.cmml" xref="S3.E11.m1.19.19.19.19.19.19.1.1"></times><ci id="S3.E11.m1.19.19.19.19.19.19.1.2.cmml" xref="S3.E11.m1.19.19.19.19.19.19.1.2">𝑥</ci><ci id="S3.E11.m1.19.19.19.19.19.19.1.3.cmml" 
xref="S3.E11.m1.19.19.19.19.19.19.1.3">𝑧</ci></apply></apply></interval></apply><apply id="S3.E11.m1.31.31.1.1.1.6.6.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"><times id="S3.E11.m1.31.31.1.1.1.6.6.3.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"></times><ci id="S3.E11.m1.22.22.22.22.22.22.cmml" xref="S3.E11.m1.22.22.22.22.22.22">Aff</ci><interval closure="open" id="S3.E11.m1.31.31.1.1.1.6.6.2.3.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"><apply id="S3.E11.m1.31.31.1.1.1.5.5.1.1.1.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"><csymbol cd="ambiguous" id="S3.E11.m1.31.31.1.1.1.5.5.1.1.1.1.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3">subscript</csymbol><ci id="S3.E11.m1.24.24.24.24.24.24.cmml" xref="S3.E11.m1.24.24.24.24.24.24">𝑇</ci><apply id="S3.E11.m1.25.25.25.25.25.25.1.cmml" xref="S3.E11.m1.25.25.25.25.25.25.1"><times id="S3.E11.m1.25.25.25.25.25.25.1.1.cmml" xref="S3.E11.m1.25.25.25.25.25.25.1.1"></times><ci id="S3.E11.m1.25.25.25.25.25.25.1.2.cmml" xref="S3.E11.m1.25.25.25.25.25.25.1.2">𝑦</ci><ci id="S3.E11.m1.25.25.25.25.25.25.1.3.cmml" xref="S3.E11.m1.25.25.25.25.25.25.1.3">𝑧</ci></apply></apply><apply id="S3.E11.m1.31.31.1.1.1.6.6.2.2.2.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3"><csymbol cd="ambiguous" id="S3.E11.m1.31.31.1.1.1.6.6.2.2.2.1.cmml" xref="S3.E11.m1.32.32.2.31.31.31.31.1.2.2.3">subscript</csymbol><ci id="S3.E11.m1.27.27.27.27.27.27.cmml" xref="S3.E11.m1.27.27.27.27.27.27">𝑝</ci><apply id="S3.E11.m1.28.28.28.28.28.28.1.cmml" xref="S3.E11.m1.28.28.28.28.28.28.1"><times id="S3.E11.m1.28.28.28.28.28.28.1.1.cmml" xref="S3.E11.m1.28.28.28.28.28.28.1.1"></times><ci id="S3.E11.m1.28.28.28.28.28.28.1.2.cmml" xref="S3.E11.m1.28.28.28.28.28.28.1.2">𝑦</ci><ci id="S3.E11.m1.28.28.28.28.28.28.1.3.cmml" xref="S3.E11.m1.28.28.28.28.28.28.1.3">𝑧</ci></apply></apply></interval></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" 
id="S3.E11.m1.32c">\begin{split}f_{t}=\;\mathrm{Aff}(T_{xy},p_{xy})\oplus\mathrm{Aff}(T_{xz},p_{% xz})\oplus\mathrm{Aff}(T_{yz},p_{yz}),\end{split}</annotation><annotation encoding="application/x-llamapun" id="S3.E11.m1.32d">start_ROW start_CELL italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = roman_Aff ( italic_T start_POSTSUBSCRIPT italic_x italic_y end_POSTSUBSCRIPT , italic_p start_POSTSUBSCRIPT italic_x italic_y end_POSTSUBSCRIPT ) ⊕ roman_Aff ( italic_T start_POSTSUBSCRIPT italic_x italic_z end_POSTSUBSCRIPT , italic_p start_POSTSUBSCRIPT italic_x italic_z end_POSTSUBSCRIPT ) ⊕ roman_Aff ( italic_T start_POSTSUBSCRIPT italic_y italic_z end_POSTSUBSCRIPT , italic_p start_POSTSUBSCRIPT italic_y italic_z end_POSTSUBSCRIPT ) , end_CELL end_ROW</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(11)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS2.SSS2.p2.24">where <math alttext="\mathrm{Aff}(\cdot)" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.23.m1.1"><semantics id="S3.SS2.SSS2.p2.23.m1.1a"><mrow id="S3.SS2.SSS2.p2.23.m1.1.2" xref="S3.SS2.SSS2.p2.23.m1.1.2.cmml"><mi id="S3.SS2.SSS2.p2.23.m1.1.2.2" xref="S3.SS2.SSS2.p2.23.m1.1.2.2.cmml">Aff</mi><mo id="S3.SS2.SSS2.p2.23.m1.1.2.1" xref="S3.SS2.SSS2.p2.23.m1.1.2.1.cmml"></mo><mrow id="S3.SS2.SSS2.p2.23.m1.1.2.3.2" xref="S3.SS2.SSS2.p2.23.m1.1.2.cmml"><mo id="S3.SS2.SSS2.p2.23.m1.1.2.3.2.1" stretchy="false" xref="S3.SS2.SSS2.p2.23.m1.1.2.cmml">(</mo><mo id="S3.SS2.SSS2.p2.23.m1.1.1" lspace="0em" rspace="0em" xref="S3.SS2.SSS2.p2.23.m1.1.1.cmml">⋅</mo><mo id="S3.SS2.SSS2.p2.23.m1.1.2.3.2.2" stretchy="false" xref="S3.SS2.SSS2.p2.23.m1.1.2.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.23.m1.1b"><apply id="S3.SS2.SSS2.p2.23.m1.1.2.cmml" 
xref="S3.SS2.SSS2.p2.23.m1.1.2"><times id="S3.SS2.SSS2.p2.23.m1.1.2.1.cmml" xref="S3.SS2.SSS2.p2.23.m1.1.2.1"></times><ci id="S3.SS2.SSS2.p2.23.m1.1.2.2.cmml" xref="S3.SS2.SSS2.p2.23.m1.1.2.2">Aff</ci><ci id="S3.SS2.SSS2.p2.23.m1.1.1.cmml" xref="S3.SS2.SSS2.p2.23.m1.1.1">⋅</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.23.m1.1c">\mathrm{Aff}(\cdot)</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.23.m1.1d">roman_Aff ( ⋅ )</annotation></semantics></math> denotes the affine interpolation operation, and <math alttext="\oplus" class="ltx_Math" display="inline" id="S3.SS2.SSS2.p2.24.m2.1"><semantics id="S3.SS2.SSS2.p2.24.m2.1a"><mo id="S3.SS2.SSS2.p2.24.m2.1.1" xref="S3.SS2.SSS2.p2.24.m2.1.1.cmml">⊕</mo><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS2.p2.24.m2.1b"><csymbol cd="latexml" id="S3.SS2.SSS2.p2.24.m2.1.1.cmml" xref="S3.SS2.SSS2.p2.24.m2.1.1">direct-sum</csymbol></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS2.p2.24.m2.1c">\oplus</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS2.p2.24.m2.1d">⊕</annotation></semantics></math> indicates concatenation. 
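</p> <p class="ltx_p">The lookup in Eq. (11) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: each geometric affine is assumed to be stored as an array of shape (C, H, W), point coordinates are assumed to be normalized to [-1, 1], and Aff(·) is assumed to reduce to bilinear sampling on each 2D plane. The names bilinear_sample and triplane_feature, and the channel count C, are hypothetical.</p>
<pre class="ltx_verbatim">
```python
import numpy as np


def bilinear_sample(plane, uv):
    """Sample a (C, H, W) feature plane at a continuous point uv in [-1, 1]^2.

    Assumption: uv[0] indexes the width axis, uv[1] the height axis, and
    -1 / +1 map to the first / last cell centre along each axis.
    """
    _, H, W = plane.shape
    # Map normalized coordinates to continuous pixel indices.
    u = np.clip((uv[0] + 1.0) * 0.5 * (W - 1), 0.0, W - 1)
    v = np.clip((uv[1] + 1.0) * 0.5 * (H - 1), 0.0, H - 1)
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, W - 1), min(v0 + 1, H - 1)
    du, dv = u - u0, v - v0
    # Interpolate along the width axis, then along the height axis.
    top = (1.0 - du) * plane[:, v0, u0] + du * plane[:, v0, u1]
    bottom = (1.0 - du) * plane[:, v1, u0] + du * plane[:, v1, u1]
    return (1.0 - dv) * top + dv * bottom


def triplane_feature(T_xy, T_xz, T_yz, x):
    """Return f_t: the three per-plane features concatenated (length 3C)."""
    # Orthogonal projections of the 3D point onto the three planes.
    p_xy = (x[0], x[1])
    p_xz = (x[0], x[2])
    p_yz = (x[1], x[2])
    return np.concatenate([
        bilinear_sample(T_xy, p_xy),
        bilinear_sample(T_xz, p_xz),
        bilinear_sample(T_yz, p_yz),
    ])
```
</pre>
<p class="ltx_p">Under these assumptions, a query point yields a vector of length 3C, one block of C channels per plane. Batched implementations would typically vectorize the sampling over many points rather than loop over them.</p> <p class="ltx_p">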
This scheme leverages the spatial structure captured by both the point cloud and the geometric affines, which enriches the feature representation in 3D space and supports more accurate and robust modeling of complex geometries in tasks such as depth estimation and scene reconstruction.</p> </div> </section> <section class="ltx_subsubsection" id="S3.SS2.SSS3"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection">3.2.3 </span>Niagara Decoder</h4> <div class="ltx_para ltx_noindent" id="S3.SS2.SSS3.p1"> <p class="ltx_p" id="S3.SS2.SSS3.p1.4"><span class="ltx_text ltx_font_bold" id="S3.SS2.SSS3.p1.4.1">Single-View Gaussian Model.</span> Building upon the high-quality monocular depth predictor <math alttext="\Psi" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p1.1.m1.1"><semantics id="S3.SS2.SSS3.p1.1.m1.1a"><mi id="S3.SS2.SSS3.p1.1.m1.1.1" mathvariant="normal" xref="S3.SS2.SSS3.p1.1.m1.1.1.cmml">Ψ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p1.1.m1.1b"><ci id="S3.SS2.SSS3.p1.1.m1.1.1.cmml" xref="S3.SS2.SSS3.p1.1.m1.1.1">Ψ</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p1.1.m1.1c">\Psi</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p1.1.m1.1d">roman_Ψ</annotation></semantics></math> pre-trained in Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite>, our framework improves geometric consistency through iterative depth refinement modules. 
This depth predictor processes the input image <math alttext="I" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p1.2.m2.1"><semantics id="S3.SS2.SSS3.p1.2.m2.1a"><mi id="S3.SS2.SSS3.p1.2.m2.1.1" xref="S3.SS2.SSS3.p1.2.m2.1.1.cmml">I</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p1.2.m2.1b"><ci id="S3.SS2.SSS3.p1.2.m2.1.1.cmml" xref="S3.SS2.SSS3.p1.2.m2.1.1">𝐼</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p1.2.m2.1c">I</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p1.2.m2.1d">italic_I</annotation></semantics></math> to produce a detailed depth map <math alttext="D=\Psi_{d}(I)" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p1.3.m3.1"><semantics id="S3.SS2.SSS3.p1.3.m3.1a"><mrow id="S3.SS2.SSS3.p1.3.m3.1.2" xref="S3.SS2.SSS3.p1.3.m3.1.2.cmml"><mi id="S3.SS2.SSS3.p1.3.m3.1.2.2" xref="S3.SS2.SSS3.p1.3.m3.1.2.2.cmml">D</mi><mo id="S3.SS2.SSS3.p1.3.m3.1.2.1" xref="S3.SS2.SSS3.p1.3.m3.1.2.1.cmml">=</mo><mrow id="S3.SS2.SSS3.p1.3.m3.1.2.3" xref="S3.SS2.SSS3.p1.3.m3.1.2.3.cmml"><msub id="S3.SS2.SSS3.p1.3.m3.1.2.3.2" xref="S3.SS2.SSS3.p1.3.m3.1.2.3.2.cmml"><mi id="S3.SS2.SSS3.p1.3.m3.1.2.3.2.2" mathvariant="normal" xref="S3.SS2.SSS3.p1.3.m3.1.2.3.2.2.cmml">Ψ</mi><mi id="S3.SS2.SSS3.p1.3.m3.1.2.3.2.3" xref="S3.SS2.SSS3.p1.3.m3.1.2.3.2.3.cmml">d</mi></msub><mo id="S3.SS2.SSS3.p1.3.m3.1.2.3.1" xref="S3.SS2.SSS3.p1.3.m3.1.2.3.1.cmml"></mo><mrow id="S3.SS2.SSS3.p1.3.m3.1.2.3.3.2" xref="S3.SS2.SSS3.p1.3.m3.1.2.3.cmml"><mo id="S3.SS2.SSS3.p1.3.m3.1.2.3.3.2.1" stretchy="false" xref="S3.SS2.SSS3.p1.3.m3.1.2.3.cmml">(</mo><mi id="S3.SS2.SSS3.p1.3.m3.1.1" xref="S3.SS2.SSS3.p1.3.m3.1.1.cmml">I</mi><mo id="S3.SS2.SSS3.p1.3.m3.1.2.3.3.2.2" stretchy="false" xref="S3.SS2.SSS3.p1.3.m3.1.2.3.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p1.3.m3.1b"><apply id="S3.SS2.SSS3.p1.3.m3.1.2.cmml" xref="S3.SS2.SSS3.p1.3.m3.1.2"><eq id="S3.SS2.SSS3.p1.3.m3.1.2.1.cmml" 
xref="S3.SS2.SSS3.p1.3.m3.1.2.1"></eq><ci id="S3.SS2.SSS3.p1.3.m3.1.2.2.cmml" xref="S3.SS2.SSS3.p1.3.m3.1.2.2">𝐷</ci><apply id="S3.SS2.SSS3.p1.3.m3.1.2.3.cmml" xref="S3.SS2.SSS3.p1.3.m3.1.2.3"><times id="S3.SS2.SSS3.p1.3.m3.1.2.3.1.cmml" xref="S3.SS2.SSS3.p1.3.m3.1.2.3.1"></times><apply id="S3.SS2.SSS3.p1.3.m3.1.2.3.2.cmml" xref="S3.SS2.SSS3.p1.3.m3.1.2.3.2"><csymbol cd="ambiguous" id="S3.SS2.SSS3.p1.3.m3.1.2.3.2.1.cmml" xref="S3.SS2.SSS3.p1.3.m3.1.2.3.2">subscript</csymbol><ci id="S3.SS2.SSS3.p1.3.m3.1.2.3.2.2.cmml" xref="S3.SS2.SSS3.p1.3.m3.1.2.3.2.2">Ψ</ci><ci id="S3.SS2.SSS3.p1.3.m3.1.2.3.2.3.cmml" xref="S3.SS2.SSS3.p1.3.m3.1.2.3.2.3">𝑑</ci></apply><ci id="S3.SS2.SSS3.p1.3.m3.1.1.cmml" xref="S3.SS2.SSS3.p1.3.m3.1.1">𝐼</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p1.3.m3.1c">D=\Psi_{d}(I)</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p1.3.m3.1d">italic_D = roman_Ψ start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( italic_I )</annotation></semantics></math>, where <math alttext="D\in\mathbb{R}^{+}_{H\times W}" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p1.4.m4.1"><semantics id="S3.SS2.SSS3.p1.4.m4.1a"><mrow id="S3.SS2.SSS3.p1.4.m4.1.1" xref="S3.SS2.SSS3.p1.4.m4.1.1.cmml"><mi id="S3.SS2.SSS3.p1.4.m4.1.1.2" xref="S3.SS2.SSS3.p1.4.m4.1.1.2.cmml">D</mi><mo id="S3.SS2.SSS3.p1.4.m4.1.1.1" xref="S3.SS2.SSS3.p1.4.m4.1.1.1.cmml">∈</mo><msubsup id="S3.SS2.SSS3.p1.4.m4.1.1.3" xref="S3.SS2.SSS3.p1.4.m4.1.1.3.cmml"><mi id="S3.SS2.SSS3.p1.4.m4.1.1.3.2.2" xref="S3.SS2.SSS3.p1.4.m4.1.1.3.2.2.cmml">ℝ</mi><mrow id="S3.SS2.SSS3.p1.4.m4.1.1.3.3" xref="S3.SS2.SSS3.p1.4.m4.1.1.3.3.cmml"><mi id="S3.SS2.SSS3.p1.4.m4.1.1.3.3.2" xref="S3.SS2.SSS3.p1.4.m4.1.1.3.3.2.cmml">H</mi><mo id="S3.SS2.SSS3.p1.4.m4.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S3.SS2.SSS3.p1.4.m4.1.1.3.3.1.cmml">×</mo><mi id="S3.SS2.SSS3.p1.4.m4.1.1.3.3.3" xref="S3.SS2.SSS3.p1.4.m4.1.1.3.3.3.cmml">W</mi></mrow><mo 
id="S3.SS2.SSS3.p1.4.m4.1.1.3.2.3" xref="S3.SS2.SSS3.p1.4.m4.1.1.3.2.3.cmml">+</mo></msubsup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p1.4.m4.1b"><apply id="S3.SS2.SSS3.p1.4.m4.1.1.cmml" xref="S3.SS2.SSS3.p1.4.m4.1.1"><in id="S3.SS2.SSS3.p1.4.m4.1.1.1.cmml" xref="S3.SS2.SSS3.p1.4.m4.1.1.1"></in><ci id="S3.SS2.SSS3.p1.4.m4.1.1.2.cmml" xref="S3.SS2.SSS3.p1.4.m4.1.1.2">𝐷</ci><apply id="S3.SS2.SSS3.p1.4.m4.1.1.3.cmml" xref="S3.SS2.SSS3.p1.4.m4.1.1.3"><csymbol cd="ambiguous" id="S3.SS2.SSS3.p1.4.m4.1.1.3.1.cmml" xref="S3.SS2.SSS3.p1.4.m4.1.1.3">subscript</csymbol><apply id="S3.SS2.SSS3.p1.4.m4.1.1.3.2.cmml" xref="S3.SS2.SSS3.p1.4.m4.1.1.3"><csymbol cd="ambiguous" id="S3.SS2.SSS3.p1.4.m4.1.1.3.2.1.cmml" xref="S3.SS2.SSS3.p1.4.m4.1.1.3">superscript</csymbol><ci id="S3.SS2.SSS3.p1.4.m4.1.1.3.2.2.cmml" xref="S3.SS2.SSS3.p1.4.m4.1.1.3.2.2">ℝ</ci><plus id="S3.SS2.SSS3.p1.4.m4.1.1.3.2.3.cmml" xref="S3.SS2.SSS3.p1.4.m4.1.1.3.2.3"></plus></apply><apply id="S3.SS2.SSS3.p1.4.m4.1.1.3.3.cmml" xref="S3.SS2.SSS3.p1.4.m4.1.1.3.3"><times id="S3.SS2.SSS3.p1.4.m4.1.1.3.3.1.cmml" xref="S3.SS2.SSS3.p1.4.m4.1.1.3.3.1"></times><ci id="S3.SS2.SSS3.p1.4.m4.1.1.3.3.2.cmml" xref="S3.SS2.SSS3.p1.4.m4.1.1.3.3.2">𝐻</ci><ci id="S3.SS2.SSS3.p1.4.m4.1.1.3.3.3.cmml" xref="S3.SS2.SSS3.p1.4.m4.1.1.3.3.3">𝑊</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p1.4.m4.1c">D\in\mathbb{R}^{+}_{H\times W}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p1.4.m4.1d">italic_D ∈ blackboard_R start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_H × italic_W end_POSTSUBSCRIPT</annotation></semantics></math> provides depth values for each pixel in the image.</p> </div> <div class="ltx_para" id="S3.SS2.SSS3.p2"> <p class="ltx_p" id="S3.SS2.SSS3.p2.5">To enhance representation in complex scenes, our model employs a <span class="ltx_text ltx_font_italic" id="S3.SS2.SSS3.p2.5.1">depth-based 
Gaussian</span> prediction approach, predicting <math alttext="K>1" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p2.1.m1.1"><semantics id="S3.SS2.SSS3.p2.1.m1.1a"><mrow id="S3.SS2.SSS3.p2.1.m1.1.1" xref="S3.SS2.SSS3.p2.1.m1.1.1.cmml"><mi id="S3.SS2.SSS3.p2.1.m1.1.1.2" xref="S3.SS2.SSS3.p2.1.m1.1.1.2.cmml">K</mi><mo id="S3.SS2.SSS3.p2.1.m1.1.1.1" xref="S3.SS2.SSS3.p2.1.m1.1.1.1.cmml">></mo><mn id="S3.SS2.SSS3.p2.1.m1.1.1.3" xref="S3.SS2.SSS3.p2.1.m1.1.1.3.cmml">1</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p2.1.m1.1b"><apply id="S3.SS2.SSS3.p2.1.m1.1.1.cmml" xref="S3.SS2.SSS3.p2.1.m1.1.1"><gt id="S3.SS2.SSS3.p2.1.m1.1.1.1.cmml" xref="S3.SS2.SSS3.p2.1.m1.1.1.1"></gt><ci id="S3.SS2.SSS3.p2.1.m1.1.1.2.cmml" xref="S3.SS2.SSS3.p2.1.m1.1.1.2">𝐾</ci><cn id="S3.SS2.SSS3.p2.1.m1.1.1.3.cmml" type="integer" xref="S3.SS2.SSS3.p2.1.m1.1.1.3">1</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p2.1.m1.1c">K>1</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p2.1.m1.1d">italic_K > 1</annotation></semantics></math> distinct Gaussians per pixel to address the limitations of single Gaussian predictions, particularly in handling occluded regions. 
For an input image <math alttext="I" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p2.2.m2.1"><semantics id="S3.SS2.SSS3.p2.2.m2.1a"><mi id="S3.SS2.SSS3.p2.2.m2.1.1" xref="S3.SS2.SSS3.p2.2.m2.1.1.cmml">I</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p2.2.m2.1b"><ci id="S3.SS2.SSS3.p2.2.m2.1.1.cmml" xref="S3.SS2.SSS3.p2.2.m2.1.1">𝐼</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p2.2.m2.1c">I</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p2.2.m2.1d">italic_I</annotation></semantics></math> with its depth map <math alttext="D" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p2.3.m3.1"><semantics id="S3.SS2.SSS3.p2.3.m3.1a"><mi id="S3.SS2.SSS3.p2.3.m3.1.1" xref="S3.SS2.SSS3.p2.3.m3.1.1.cmml">D</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p2.3.m3.1b"><ci id="S3.SS2.SSS3.p2.3.m3.1.1.cmml" xref="S3.SS2.SSS3.p2.3.m3.1.1">𝐷</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p2.3.m3.1c">D</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p2.3.m3.1d">italic_D</annotation></semantics></math> and normal map <math alttext="N" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p2.4.m4.1"><semantics id="S3.SS2.SSS3.p2.4.m4.1a"><mi id="S3.SS2.SSS3.p2.4.m4.1.1" xref="S3.SS2.SSS3.p2.4.m4.1.1.cmml">N</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p2.4.m4.1b"><ci id="S3.SS2.SSS3.p2.4.m4.1.1.cmml" xref="S3.SS2.SSS3.p2.4.m4.1.1">𝑁</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p2.4.m4.1c">N</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p2.4.m4.1d">italic_N</annotation></semantics></math>, the network outputs a set of parameters <math alttext="\hat{P}" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p2.5.m5.1"><semantics id="S3.SS2.SSS3.p2.5.m5.1a"><mover accent="true" id="S3.SS2.SSS3.p2.5.m5.1.1" xref="S3.SS2.SSS3.p2.5.m5.1.1.cmml"><mi id="S3.SS2.SSS3.p2.5.m5.1.1.2" 
xref="S3.SS2.SSS3.p2.5.m5.1.1.2.cmml">P</mi><mo id="S3.SS2.SSS3.p2.5.m5.1.1.1" xref="S3.SS2.SSS3.p2.5.m5.1.1.1.cmml">^</mo></mover><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p2.5.m5.1b"><apply id="S3.SS2.SSS3.p2.5.m5.1.1.cmml" xref="S3.SS2.SSS3.p2.5.m5.1.1"><ci id="S3.SS2.SSS3.p2.5.m5.1.1.1.cmml" xref="S3.SS2.SSS3.p2.5.m5.1.1.1">^</ci><ci id="S3.SS2.SSS3.p2.5.m5.1.1.2.cmml" xref="S3.SS2.SSS3.p2.5.m5.1.1.2">𝑃</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p2.5.m5.1c">\hat{P}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p2.5.m5.1d">over^ start_ARG italic_P end_ARG</annotation></semantics></math> for each Gaussian:</p> <table class="ltx_equation ltx_eqn_table" id="S3.E12"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\hat{P}=\left\{(\sigma_{i},\delta_{i},\Delta_{i},\Sigma_{i},\gamma_{i},c_{i})% \right\}_{i=1}^{K}," class="ltx_Math" display="block" id="S3.E12.m1.1"><semantics id="S3.E12.m1.1a"><mrow id="S3.E12.m1.1.1.1" xref="S3.E12.m1.1.1.1.1.cmml"><mrow id="S3.E12.m1.1.1.1.1" xref="S3.E12.m1.1.1.1.1.cmml"><mover accent="true" id="S3.E12.m1.1.1.1.1.3" xref="S3.E12.m1.1.1.1.1.3.cmml"><mi id="S3.E12.m1.1.1.1.1.3.2" xref="S3.E12.m1.1.1.1.1.3.2.cmml">P</mi><mo id="S3.E12.m1.1.1.1.1.3.1" xref="S3.E12.m1.1.1.1.1.3.1.cmml">^</mo></mover><mo id="S3.E12.m1.1.1.1.1.2" xref="S3.E12.m1.1.1.1.1.2.cmml">=</mo><msubsup id="S3.E12.m1.1.1.1.1.1" xref="S3.E12.m1.1.1.1.1.1.cmml"><mrow id="S3.E12.m1.1.1.1.1.1.1.1.1" xref="S3.E12.m1.1.1.1.1.1.1.1.2.cmml"><mo id="S3.E12.m1.1.1.1.1.1.1.1.1.2" xref="S3.E12.m1.1.1.1.1.1.1.1.2.cmml">{</mo><mrow id="S3.E12.m1.1.1.1.1.1.1.1.1.1.6" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.7.cmml"><mo id="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.7" stretchy="false" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.7.cmml">(</mo><msub id="S3.E12.m1.1.1.1.1.1.1.1.1.1.1.1" 
xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.1.1.cmml"><mi id="S3.E12.m1.1.1.1.1.1.1.1.1.1.1.1.2" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.1.1.2.cmml">σ</mi><mi id="S3.E12.m1.1.1.1.1.1.1.1.1.1.1.1.3" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.1.1.3.cmml">i</mi></msub><mo id="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.8" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.7.cmml">,</mo><msub id="S3.E12.m1.1.1.1.1.1.1.1.1.1.2.2" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.2.2.cmml"><mi id="S3.E12.m1.1.1.1.1.1.1.1.1.1.2.2.2" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.2.2.2.cmml">δ</mi><mi id="S3.E12.m1.1.1.1.1.1.1.1.1.1.2.2.3" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.2.2.3.cmml">i</mi></msub><mo id="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.9" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.7.cmml">,</mo><msub id="S3.E12.m1.1.1.1.1.1.1.1.1.1.3.3" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.3.3.cmml"><mi id="S3.E12.m1.1.1.1.1.1.1.1.1.1.3.3.2" mathvariant="normal" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.3.3.2.cmml">Δ</mi><mi id="S3.E12.m1.1.1.1.1.1.1.1.1.1.3.3.3" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.3.3.3.cmml">i</mi></msub><mo id="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.10" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.7.cmml">,</mo><msub id="S3.E12.m1.1.1.1.1.1.1.1.1.1.4.4" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.4.4.cmml"><mi id="S3.E12.m1.1.1.1.1.1.1.1.1.1.4.4.2" mathvariant="normal" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.4.4.2.cmml">Σ</mi><mi id="S3.E12.m1.1.1.1.1.1.1.1.1.1.4.4.3" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.4.4.3.cmml">i</mi></msub><mo id="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.11" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.7.cmml">,</mo><msub id="S3.E12.m1.1.1.1.1.1.1.1.1.1.5.5" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.5.5.cmml"><mi id="S3.E12.m1.1.1.1.1.1.1.1.1.1.5.5.2" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.5.5.2.cmml">γ</mi><mi id="S3.E12.m1.1.1.1.1.1.1.1.1.1.5.5.3" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.5.5.3.cmml">i</mi></msub><mo id="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.12" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.7.cmml">,</mo><msub id="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.6" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.6.cmml"><mi id="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.6.2" 
xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.6.2.cmml">c</mi><mi id="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.6.3" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.6.3.cmml">i</mi></msub><mo id="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.13" stretchy="false" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.7.cmml">)</mo></mrow><mo id="S3.E12.m1.1.1.1.1.1.1.1.1.3" xref="S3.E12.m1.1.1.1.1.1.1.1.2.cmml">}</mo></mrow><mrow id="S3.E12.m1.1.1.1.1.1.1.3" xref="S3.E12.m1.1.1.1.1.1.1.3.cmml"><mi id="S3.E12.m1.1.1.1.1.1.1.3.2" xref="S3.E12.m1.1.1.1.1.1.1.3.2.cmml">i</mi><mo id="S3.E12.m1.1.1.1.1.1.1.3.1" xref="S3.E12.m1.1.1.1.1.1.1.3.1.cmml">=</mo><mn id="S3.E12.m1.1.1.1.1.1.1.3.3" xref="S3.E12.m1.1.1.1.1.1.1.3.3.cmml">1</mn></mrow><mi id="S3.E12.m1.1.1.1.1.1.3" xref="S3.E12.m1.1.1.1.1.1.3.cmml">K</mi></msubsup></mrow><mo id="S3.E12.m1.1.1.1.2" xref="S3.E12.m1.1.1.1.1.cmml">,</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.E12.m1.1b"><apply id="S3.E12.m1.1.1.1.1.cmml" xref="S3.E12.m1.1.1.1"><eq id="S3.E12.m1.1.1.1.1.2.cmml" xref="S3.E12.m1.1.1.1.1.2"></eq><apply id="S3.E12.m1.1.1.1.1.3.cmml" xref="S3.E12.m1.1.1.1.1.3"><ci id="S3.E12.m1.1.1.1.1.3.1.cmml" xref="S3.E12.m1.1.1.1.1.3.1">^</ci><ci id="S3.E12.m1.1.1.1.1.3.2.cmml" xref="S3.E12.m1.1.1.1.1.3.2">𝑃</ci></apply><apply id="S3.E12.m1.1.1.1.1.1.cmml" xref="S3.E12.m1.1.1.1.1.1"><csymbol cd="ambiguous" id="S3.E12.m1.1.1.1.1.1.2.cmml" xref="S3.E12.m1.1.1.1.1.1">superscript</csymbol><apply id="S3.E12.m1.1.1.1.1.1.1.cmml" xref="S3.E12.m1.1.1.1.1.1"><csymbol cd="ambiguous" id="S3.E12.m1.1.1.1.1.1.1.2.cmml" xref="S3.E12.m1.1.1.1.1.1">subscript</csymbol><set id="S3.E12.m1.1.1.1.1.1.1.1.2.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1"><vector id="S3.E12.m1.1.1.1.1.1.1.1.1.1.7.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.6"><apply id="S3.E12.m1.1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S3.E12.m1.1.1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.1.1">subscript</csymbol><ci id="S3.E12.m1.1.1.1.1.1.1.1.1.1.1.1.2.cmml" 
xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.1.1.2">𝜎</ci><ci id="S3.E12.m1.1.1.1.1.1.1.1.1.1.1.1.3.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.1.1.3">𝑖</ci></apply><apply id="S3.E12.m1.1.1.1.1.1.1.1.1.1.2.2.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.2.2"><csymbol cd="ambiguous" id="S3.E12.m1.1.1.1.1.1.1.1.1.1.2.2.1.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.2.2">subscript</csymbol><ci id="S3.E12.m1.1.1.1.1.1.1.1.1.1.2.2.2.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.2.2.2">𝛿</ci><ci id="S3.E12.m1.1.1.1.1.1.1.1.1.1.2.2.3.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.2.2.3">𝑖</ci></apply><apply id="S3.E12.m1.1.1.1.1.1.1.1.1.1.3.3.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.3.3"><csymbol cd="ambiguous" id="S3.E12.m1.1.1.1.1.1.1.1.1.1.3.3.1.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.3.3">subscript</csymbol><ci id="S3.E12.m1.1.1.1.1.1.1.1.1.1.3.3.2.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.3.3.2">Δ</ci><ci id="S3.E12.m1.1.1.1.1.1.1.1.1.1.3.3.3.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.3.3.3">𝑖</ci></apply><apply id="S3.E12.m1.1.1.1.1.1.1.1.1.1.4.4.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.4.4"><csymbol cd="ambiguous" id="S3.E12.m1.1.1.1.1.1.1.1.1.1.4.4.1.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.4.4">subscript</csymbol><ci id="S3.E12.m1.1.1.1.1.1.1.1.1.1.4.4.2.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.4.4.2">Σ</ci><ci id="S3.E12.m1.1.1.1.1.1.1.1.1.1.4.4.3.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.4.4.3">𝑖</ci></apply><apply id="S3.E12.m1.1.1.1.1.1.1.1.1.1.5.5.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.5.5"><csymbol cd="ambiguous" id="S3.E12.m1.1.1.1.1.1.1.1.1.1.5.5.1.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.5.5">subscript</csymbol><ci id="S3.E12.m1.1.1.1.1.1.1.1.1.1.5.5.2.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.5.5.2">𝛾</ci><ci id="S3.E12.m1.1.1.1.1.1.1.1.1.1.5.5.3.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.5.5.3">𝑖</ci></apply><apply id="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.6.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.6"><csymbol cd="ambiguous" id="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.6.1.cmml" 
xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.6">subscript</csymbol><ci id="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.6.2.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.6.2">𝑐</ci><ci id="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.6.3.cmml" xref="S3.E12.m1.1.1.1.1.1.1.1.1.1.6.6.3">𝑖</ci></apply></vector></set><apply id="S3.E12.m1.1.1.1.1.1.1.3.cmml" xref="S3.E12.m1.1.1.1.1.1.1.3"><eq id="S3.E12.m1.1.1.1.1.1.1.3.1.cmml" xref="S3.E12.m1.1.1.1.1.1.1.3.1"></eq><ci id="S3.E12.m1.1.1.1.1.1.1.3.2.cmml" xref="S3.E12.m1.1.1.1.1.1.1.3.2">𝑖</ci><cn id="S3.E12.m1.1.1.1.1.1.1.3.3.cmml" type="integer" xref="S3.E12.m1.1.1.1.1.1.1.3.3">1</cn></apply></apply><ci id="S3.E12.m1.1.1.1.1.1.3.cmml" xref="S3.E12.m1.1.1.1.1.1.3">𝐾</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E12.m1.1c">\hat{P}=\left\{(\sigma_{i},\delta_{i},\Delta_{i},\Sigma_{i},\gamma_{i},c_{i})% \right\}_{i=1}^{K},</annotation><annotation encoding="application/x-llamapun" id="S3.E12.m1.1d">over^ start_ARG italic_P end_ARG = { ( italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , roman_Δ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , roman_Σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_γ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT ,</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(12)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS2.SSS3.p2.6">where the depth of the <math alttext="i" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p2.6.m1.1"><semantics id="S3.SS2.SSS3.p2.6.m1.1a"><mi id="S3.SS2.SSS3.p2.6.m1.1.1" xref="S3.SS2.SSS3.p2.6.m1.1.1.cmml">i</mi><annotation-xml 
encoding="MathML-Content" id="S3.SS2.SSS3.p2.6.m1.1b"><ci id="S3.SS2.SSS3.p2.6.m1.1.1.cmml" xref="S3.SS2.SSS3.p2.6.m1.1.1">𝑖</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p2.6.m1.1c">i</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p2.6.m1.1d">italic_i</annotation></semantics></math>-th Gaussian is calculated as:</p> <table class="ltx_equation ltx_eqn_table" id="S3.E13"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="d_{i}=d+\sum_{j=1}^{i}\delta_{j}," class="ltx_Math" display="block" id="S3.E13.m1.1"><semantics id="S3.E13.m1.1a"><mrow id="S3.E13.m1.1.1.1" xref="S3.E13.m1.1.1.1.1.cmml"><mrow id="S3.E13.m1.1.1.1.1" xref="S3.E13.m1.1.1.1.1.cmml"><msub id="S3.E13.m1.1.1.1.1.2" xref="S3.E13.m1.1.1.1.1.2.cmml"><mi id="S3.E13.m1.1.1.1.1.2.2" xref="S3.E13.m1.1.1.1.1.2.2.cmml">d</mi><mi id="S3.E13.m1.1.1.1.1.2.3" xref="S3.E13.m1.1.1.1.1.2.3.cmml">i</mi></msub><mo id="S3.E13.m1.1.1.1.1.1" xref="S3.E13.m1.1.1.1.1.1.cmml">=</mo><mrow id="S3.E13.m1.1.1.1.1.3" xref="S3.E13.m1.1.1.1.1.3.cmml"><mi id="S3.E13.m1.1.1.1.1.3.2" xref="S3.E13.m1.1.1.1.1.3.2.cmml">d</mi><mo id="S3.E13.m1.1.1.1.1.3.1" rspace="0.055em" xref="S3.E13.m1.1.1.1.1.3.1.cmml">+</mo><mrow id="S3.E13.m1.1.1.1.1.3.3" xref="S3.E13.m1.1.1.1.1.3.3.cmml"><munderover id="S3.E13.m1.1.1.1.1.3.3.1" xref="S3.E13.m1.1.1.1.1.3.3.1.cmml"><mo id="S3.E13.m1.1.1.1.1.3.3.1.2.2" movablelimits="false" xref="S3.E13.m1.1.1.1.1.3.3.1.2.2.cmml">∑</mo><mrow id="S3.E13.m1.1.1.1.1.3.3.1.2.3" xref="S3.E13.m1.1.1.1.1.3.3.1.2.3.cmml"><mi id="S3.E13.m1.1.1.1.1.3.3.1.2.3.2" xref="S3.E13.m1.1.1.1.1.3.3.1.2.3.2.cmml">j</mi><mo id="S3.E13.m1.1.1.1.1.3.3.1.2.3.1" xref="S3.E13.m1.1.1.1.1.3.3.1.2.3.1.cmml">=</mo><mn id="S3.E13.m1.1.1.1.1.3.3.1.2.3.3" xref="S3.E13.m1.1.1.1.1.3.3.1.2.3.3.cmml">1</mn></mrow><mi id="S3.E13.m1.1.1.1.1.3.3.1.3" 
xref="S3.E13.m1.1.1.1.1.3.3.1.3.cmml">i</mi></munderover><msub id="S3.E13.m1.1.1.1.1.3.3.2" xref="S3.E13.m1.1.1.1.1.3.3.2.cmml"><mi id="S3.E13.m1.1.1.1.1.3.3.2.2" xref="S3.E13.m1.1.1.1.1.3.3.2.2.cmml">δ</mi><mi id="S3.E13.m1.1.1.1.1.3.3.2.3" xref="S3.E13.m1.1.1.1.1.3.3.2.3.cmml">j</mi></msub></mrow></mrow></mrow><mo id="S3.E13.m1.1.1.1.2" xref="S3.E13.m1.1.1.1.1.cmml">,</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.E13.m1.1b"><apply id="S3.E13.m1.1.1.1.1.cmml" xref="S3.E13.m1.1.1.1"><eq id="S3.E13.m1.1.1.1.1.1.cmml" xref="S3.E13.m1.1.1.1.1.1"></eq><apply id="S3.E13.m1.1.1.1.1.2.cmml" xref="S3.E13.m1.1.1.1.1.2"><csymbol cd="ambiguous" id="S3.E13.m1.1.1.1.1.2.1.cmml" xref="S3.E13.m1.1.1.1.1.2">subscript</csymbol><ci id="S3.E13.m1.1.1.1.1.2.2.cmml" xref="S3.E13.m1.1.1.1.1.2.2">𝑑</ci><ci id="S3.E13.m1.1.1.1.1.2.3.cmml" xref="S3.E13.m1.1.1.1.1.2.3">𝑖</ci></apply><apply id="S3.E13.m1.1.1.1.1.3.cmml" xref="S3.E13.m1.1.1.1.1.3"><plus id="S3.E13.m1.1.1.1.1.3.1.cmml" xref="S3.E13.m1.1.1.1.1.3.1"></plus><ci id="S3.E13.m1.1.1.1.1.3.2.cmml" xref="S3.E13.m1.1.1.1.1.3.2">𝑑</ci><apply id="S3.E13.m1.1.1.1.1.3.3.cmml" xref="S3.E13.m1.1.1.1.1.3.3"><apply id="S3.E13.m1.1.1.1.1.3.3.1.cmml" xref="S3.E13.m1.1.1.1.1.3.3.1"><csymbol cd="ambiguous" id="S3.E13.m1.1.1.1.1.3.3.1.1.cmml" xref="S3.E13.m1.1.1.1.1.3.3.1">superscript</csymbol><apply id="S3.E13.m1.1.1.1.1.3.3.1.2.cmml" xref="S3.E13.m1.1.1.1.1.3.3.1"><csymbol cd="ambiguous" id="S3.E13.m1.1.1.1.1.3.3.1.2.1.cmml" xref="S3.E13.m1.1.1.1.1.3.3.1">subscript</csymbol><sum id="S3.E13.m1.1.1.1.1.3.3.1.2.2.cmml" xref="S3.E13.m1.1.1.1.1.3.3.1.2.2"></sum><apply id="S3.E13.m1.1.1.1.1.3.3.1.2.3.cmml" xref="S3.E13.m1.1.1.1.1.3.3.1.2.3"><eq id="S3.E13.m1.1.1.1.1.3.3.1.2.3.1.cmml" xref="S3.E13.m1.1.1.1.1.3.3.1.2.3.1"></eq><ci id="S3.E13.m1.1.1.1.1.3.3.1.2.3.2.cmml" xref="S3.E13.m1.1.1.1.1.3.3.1.2.3.2">𝑗</ci><cn id="S3.E13.m1.1.1.1.1.3.3.1.2.3.3.cmml" type="integer" xref="S3.E13.m1.1.1.1.1.3.3.1.2.3.3">1</cn></apply></apply><ci 
id="S3.E13.m1.1.1.1.1.3.3.1.3.cmml" xref="S3.E13.m1.1.1.1.1.3.3.1.3">𝑖</ci></apply><apply id="S3.E13.m1.1.1.1.1.3.3.2.cmml" xref="S3.E13.m1.1.1.1.1.3.3.2"><csymbol cd="ambiguous" id="S3.E13.m1.1.1.1.1.3.3.2.1.cmml" xref="S3.E13.m1.1.1.1.1.3.3.2">subscript</csymbol><ci id="S3.E13.m1.1.1.1.1.3.3.2.2.cmml" xref="S3.E13.m1.1.1.1.1.3.3.2.2">𝛿</ci><ci id="S3.E13.m1.1.1.1.1.3.3.2.3.cmml" xref="S3.E13.m1.1.1.1.1.3.3.2.3">𝑗</ci></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E13.m1.1c">d_{i}=d+\sum_{j=1}^{i}\delta_{j},</annotation><annotation encoding="application/x-llamapun" id="S3.E13.m1.1d">italic_d start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_d + ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ,</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(13)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS2.SSS3.p2.10">where <math alttext="d=D(u)" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p2.7.m1.1"><semantics id="S3.SS2.SSS3.p2.7.m1.1a"><mrow id="S3.SS2.SSS3.p2.7.m1.1.2" xref="S3.SS2.SSS3.p2.7.m1.1.2.cmml"><mi id="S3.SS2.SSS3.p2.7.m1.1.2.2" xref="S3.SS2.SSS3.p2.7.m1.1.2.2.cmml">d</mi><mo id="S3.SS2.SSS3.p2.7.m1.1.2.1" xref="S3.SS2.SSS3.p2.7.m1.1.2.1.cmml">=</mo><mrow id="S3.SS2.SSS3.p2.7.m1.1.2.3" xref="S3.SS2.SSS3.p2.7.m1.1.2.3.cmml"><mi id="S3.SS2.SSS3.p2.7.m1.1.2.3.2" xref="S3.SS2.SSS3.p2.7.m1.1.2.3.2.cmml">D</mi><mo id="S3.SS2.SSS3.p2.7.m1.1.2.3.1" xref="S3.SS2.SSS3.p2.7.m1.1.2.3.1.cmml"></mo><mrow id="S3.SS2.SSS3.p2.7.m1.1.2.3.3.2" xref="S3.SS2.SSS3.p2.7.m1.1.2.3.cmml"><mo id="S3.SS2.SSS3.p2.7.m1.1.2.3.3.2.1" stretchy="false" xref="S3.SS2.SSS3.p2.7.m1.1.2.3.cmml">(</mo><mi id="S3.SS2.SSS3.p2.7.m1.1.1" 
xref="S3.SS2.SSS3.p2.7.m1.1.1.cmml">u</mi><mo id="S3.SS2.SSS3.p2.7.m1.1.2.3.3.2.2" stretchy="false" xref="S3.SS2.SSS3.p2.7.m1.1.2.3.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p2.7.m1.1b"><apply id="S3.SS2.SSS3.p2.7.m1.1.2.cmml" xref="S3.SS2.SSS3.p2.7.m1.1.2"><eq id="S3.SS2.SSS3.p2.7.m1.1.2.1.cmml" xref="S3.SS2.SSS3.p2.7.m1.1.2.1"></eq><ci id="S3.SS2.SSS3.p2.7.m1.1.2.2.cmml" xref="S3.SS2.SSS3.p2.7.m1.1.2.2">𝑑</ci><apply id="S3.SS2.SSS3.p2.7.m1.1.2.3.cmml" xref="S3.SS2.SSS3.p2.7.m1.1.2.3"><times id="S3.SS2.SSS3.p2.7.m1.1.2.3.1.cmml" xref="S3.SS2.SSS3.p2.7.m1.1.2.3.1"></times><ci id="S3.SS2.SSS3.p2.7.m1.1.2.3.2.cmml" xref="S3.SS2.SSS3.p2.7.m1.1.2.3.2">𝐷</ci><ci id="S3.SS2.SSS3.p2.7.m1.1.1.cmml" xref="S3.SS2.SSS3.p2.7.m1.1.1">𝑢</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p2.7.m1.1c">d=D(u)</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p2.7.m1.1d">italic_d = italic_D ( italic_u )</annotation></semantics></math> denotes the depth value read from the depth map, and <math alttext="\delta_{1}=0" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p2.8.m2.1"><semantics id="S3.SS2.SSS3.p2.8.m2.1a"><mrow id="S3.SS2.SSS3.p2.8.m2.1.1" xref="S3.SS2.SSS3.p2.8.m2.1.1.cmml"><msub id="S3.SS2.SSS3.p2.8.m2.1.1.2" xref="S3.SS2.SSS3.p2.8.m2.1.1.2.cmml"><mi id="S3.SS2.SSS3.p2.8.m2.1.1.2.2" xref="S3.SS2.SSS3.p2.8.m2.1.1.2.2.cmml">δ</mi><mn id="S3.SS2.SSS3.p2.8.m2.1.1.2.3" xref="S3.SS2.SSS3.p2.8.m2.1.1.2.3.cmml">1</mn></msub><mo id="S3.SS2.SSS3.p2.8.m2.1.1.1" xref="S3.SS2.SSS3.p2.8.m2.1.1.1.cmml">=</mo><mn id="S3.SS2.SSS3.p2.8.m2.1.1.3" xref="S3.SS2.SSS3.p2.8.m2.1.1.3.cmml">0</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p2.8.m2.1b"><apply id="S3.SS2.SSS3.p2.8.m2.1.1.cmml" xref="S3.SS2.SSS3.p2.8.m2.1.1"><eq id="S3.SS2.SSS3.p2.8.m2.1.1.1.cmml" xref="S3.SS2.SSS3.p2.8.m2.1.1.1"></eq><apply id="S3.SS2.SSS3.p2.8.m2.1.1.2.cmml" 
xref="S3.SS2.SSS3.p2.8.m2.1.1.2"><csymbol cd="ambiguous" id="S3.SS2.SSS3.p2.8.m2.1.1.2.1.cmml" xref="S3.SS2.SSS3.p2.8.m2.1.1.2">subscript</csymbol><ci id="S3.SS2.SSS3.p2.8.m2.1.1.2.2.cmml" xref="S3.SS2.SSS3.p2.8.m2.1.1.2.2">𝛿</ci><cn id="S3.SS2.SSS3.p2.8.m2.1.1.2.3.cmml" type="integer" xref="S3.SS2.SSS3.p2.8.m2.1.1.2.3">1</cn></apply><cn id="S3.SS2.SSS3.p2.8.m2.1.1.3.cmml" type="integer" xref="S3.SS2.SSS3.p2.8.m2.1.1.3">0</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p2.8.m2.1c">\delta_{1}=0</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p2.8.m2.1d">italic_δ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 0</annotation></semantics></math>. The mean <math alttext="\mu_{i}" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p2.9.m3.1"><semantics id="S3.SS2.SSS3.p2.9.m3.1a"><msub id="S3.SS2.SSS3.p2.9.m3.1.1" xref="S3.SS2.SSS3.p2.9.m3.1.1.cmml"><mi id="S3.SS2.SSS3.p2.9.m3.1.1.2" xref="S3.SS2.SSS3.p2.9.m3.1.1.2.cmml">μ</mi><mi id="S3.SS2.SSS3.p2.9.m3.1.1.3" xref="S3.SS2.SSS3.p2.9.m3.1.1.3.cmml">i</mi></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p2.9.m3.1b"><apply id="S3.SS2.SSS3.p2.9.m3.1.1.cmml" xref="S3.SS2.SSS3.p2.9.m3.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS3.p2.9.m3.1.1.1.cmml" xref="S3.SS2.SSS3.p2.9.m3.1.1">subscript</csymbol><ci id="S3.SS2.SSS3.p2.9.m3.1.1.2.cmml" xref="S3.SS2.SSS3.p2.9.m3.1.1.2">𝜇</ci><ci id="S3.SS2.SSS3.p2.9.m3.1.1.3.cmml" xref="S3.SS2.SSS3.p2.9.m3.1.1.3">𝑖</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p2.9.m3.1c">\mu_{i}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p2.9.m3.1d">italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT</annotation></semantics></math> for the <math alttext="i" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p2.10.m4.1"><semantics id="S3.SS2.SSS3.p2.10.m4.1a"><mi id="S3.SS2.SSS3.p2.10.m4.1.1" xref="S3.SS2.SSS3.p2.10.m4.1.1.cmml">i</mi><annotation-xml 
encoding="MathML-Content" id="S3.SS2.SSS3.p2.10.m4.1b"><ci id="S3.SS2.SSS3.p2.10.m4.1.1.cmml" xref="S3.SS2.SSS3.p2.10.m4.1.1">𝑖</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p2.10.m4.1c">i</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p2.10.m4.1d">italic_i</annotation></semantics></math>-th Gaussian is given by:</p> <table class="ltx_equation ltx_eqn_table" id="S3.E14"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\mu_{i}=\left(\frac{u_{x}d_{i}}{f},\frac{u_{y}d_{i}}{f},d_{i}\right)+\Delta_{i}," class="ltx_Math" display="block" id="S3.E14.m1.3"><semantics id="S3.E14.m1.3a"><mrow id="S3.E14.m1.3.3.1" xref="S3.E14.m1.3.3.1.1.cmml"><mrow id="S3.E14.m1.3.3.1.1" xref="S3.E14.m1.3.3.1.1.cmml"><msub id="S3.E14.m1.3.3.1.1.3" xref="S3.E14.m1.3.3.1.1.3.cmml"><mi id="S3.E14.m1.3.3.1.1.3.2" xref="S3.E14.m1.3.3.1.1.3.2.cmml">μ</mi><mi id="S3.E14.m1.3.3.1.1.3.3" xref="S3.E14.m1.3.3.1.1.3.3.cmml">i</mi></msub><mo id="S3.E14.m1.3.3.1.1.2" xref="S3.E14.m1.3.3.1.1.2.cmml">=</mo><mrow id="S3.E14.m1.3.3.1.1.1" xref="S3.E14.m1.3.3.1.1.1.cmml"><mrow id="S3.E14.m1.3.3.1.1.1.1.1" xref="S3.E14.m1.3.3.1.1.1.1.2.cmml"><mo id="S3.E14.m1.3.3.1.1.1.1.1.2" xref="S3.E14.m1.3.3.1.1.1.1.2.cmml">(</mo><mfrac id="S3.E14.m1.1.1" xref="S3.E14.m1.1.1.cmml"><mrow id="S3.E14.m1.1.1.2" xref="S3.E14.m1.1.1.2.cmml"><msub id="S3.E14.m1.1.1.2.2" xref="S3.E14.m1.1.1.2.2.cmml"><mi id="S3.E14.m1.1.1.2.2.2" xref="S3.E14.m1.1.1.2.2.2.cmml">u</mi><mi id="S3.E14.m1.1.1.2.2.3" xref="S3.E14.m1.1.1.2.2.3.cmml">x</mi></msub><mo id="S3.E14.m1.1.1.2.1" xref="S3.E14.m1.1.1.2.1.cmml"></mo><msub id="S3.E14.m1.1.1.2.3" xref="S3.E14.m1.1.1.2.3.cmml"><mi id="S3.E14.m1.1.1.2.3.2" xref="S3.E14.m1.1.1.2.3.2.cmml">d</mi><mi id="S3.E14.m1.1.1.2.3.3" xref="S3.E14.m1.1.1.2.3.3.cmml">i</mi></msub></mrow><mi id="S3.E14.m1.1.1.3" 
xref="S3.E14.m1.1.1.3.cmml">f</mi></mfrac><mo id="S3.E14.m1.3.3.1.1.1.1.1.3" xref="S3.E14.m1.3.3.1.1.1.1.2.cmml">,</mo><mfrac id="S3.E14.m1.2.2" xref="S3.E14.m1.2.2.cmml"><mrow id="S3.E14.m1.2.2.2" xref="S3.E14.m1.2.2.2.cmml"><msub id="S3.E14.m1.2.2.2.2" xref="S3.E14.m1.2.2.2.2.cmml"><mi id="S3.E14.m1.2.2.2.2.2" xref="S3.E14.m1.2.2.2.2.2.cmml">u</mi><mi id="S3.E14.m1.2.2.2.2.3" xref="S3.E14.m1.2.2.2.2.3.cmml">y</mi></msub><mo id="S3.E14.m1.2.2.2.1" xref="S3.E14.m1.2.2.2.1.cmml"></mo><msub id="S3.E14.m1.2.2.2.3" xref="S3.E14.m1.2.2.2.3.cmml"><mi id="S3.E14.m1.2.2.2.3.2" xref="S3.E14.m1.2.2.2.3.2.cmml">d</mi><mi id="S3.E14.m1.2.2.2.3.3" xref="S3.E14.m1.2.2.2.3.3.cmml">i</mi></msub></mrow><mi id="S3.E14.m1.2.2.3" xref="S3.E14.m1.2.2.3.cmml">f</mi></mfrac><mo id="S3.E14.m1.3.3.1.1.1.1.1.4" xref="S3.E14.m1.3.3.1.1.1.1.2.cmml">,</mo><msub id="S3.E14.m1.3.3.1.1.1.1.1.1" xref="S3.E14.m1.3.3.1.1.1.1.1.1.cmml"><mi id="S3.E14.m1.3.3.1.1.1.1.1.1.2" xref="S3.E14.m1.3.3.1.1.1.1.1.1.2.cmml">d</mi><mi id="S3.E14.m1.3.3.1.1.1.1.1.1.3" xref="S3.E14.m1.3.3.1.1.1.1.1.1.3.cmml">i</mi></msub><mo id="S3.E14.m1.3.3.1.1.1.1.1.5" xref="S3.E14.m1.3.3.1.1.1.1.2.cmml">)</mo></mrow><mo id="S3.E14.m1.3.3.1.1.1.2" xref="S3.E14.m1.3.3.1.1.1.2.cmml">+</mo><msub id="S3.E14.m1.3.3.1.1.1.3" xref="S3.E14.m1.3.3.1.1.1.3.cmml"><mi id="S3.E14.m1.3.3.1.1.1.3.2" mathvariant="normal" xref="S3.E14.m1.3.3.1.1.1.3.2.cmml">Δ</mi><mi id="S3.E14.m1.3.3.1.1.1.3.3" xref="S3.E14.m1.3.3.1.1.1.3.3.cmml">i</mi></msub></mrow></mrow><mo id="S3.E14.m1.3.3.1.2" xref="S3.E14.m1.3.3.1.1.cmml">,</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.E14.m1.3b"><apply id="S3.E14.m1.3.3.1.1.cmml" xref="S3.E14.m1.3.3.1"><eq id="S3.E14.m1.3.3.1.1.2.cmml" xref="S3.E14.m1.3.3.1.1.2"></eq><apply id="S3.E14.m1.3.3.1.1.3.cmml" xref="S3.E14.m1.3.3.1.1.3"><csymbol cd="ambiguous" id="S3.E14.m1.3.3.1.1.3.1.cmml" xref="S3.E14.m1.3.3.1.1.3">subscript</csymbol><ci id="S3.E14.m1.3.3.1.1.3.2.cmml" xref="S3.E14.m1.3.3.1.1.3.2">𝜇</ci><ci 
id="S3.E14.m1.3.3.1.1.3.3.cmml" xref="S3.E14.m1.3.3.1.1.3.3">𝑖</ci></apply><apply id="S3.E14.m1.3.3.1.1.1.cmml" xref="S3.E14.m1.3.3.1.1.1"><plus id="S3.E14.m1.3.3.1.1.1.2.cmml" xref="S3.E14.m1.3.3.1.1.1.2"></plus><vector id="S3.E14.m1.3.3.1.1.1.1.2.cmml" xref="S3.E14.m1.3.3.1.1.1.1.1"><apply id="S3.E14.m1.1.1.cmml" xref="S3.E14.m1.1.1"><divide id="S3.E14.m1.1.1.1.cmml" xref="S3.E14.m1.1.1"></divide><apply id="S3.E14.m1.1.1.2.cmml" xref="S3.E14.m1.1.1.2"><times id="S3.E14.m1.1.1.2.1.cmml" xref="S3.E14.m1.1.1.2.1"></times><apply id="S3.E14.m1.1.1.2.2.cmml" xref="S3.E14.m1.1.1.2.2"><csymbol cd="ambiguous" id="S3.E14.m1.1.1.2.2.1.cmml" xref="S3.E14.m1.1.1.2.2">subscript</csymbol><ci id="S3.E14.m1.1.1.2.2.2.cmml" xref="S3.E14.m1.1.1.2.2.2">𝑢</ci><ci id="S3.E14.m1.1.1.2.2.3.cmml" xref="S3.E14.m1.1.1.2.2.3">𝑥</ci></apply><apply id="S3.E14.m1.1.1.2.3.cmml" xref="S3.E14.m1.1.1.2.3"><csymbol cd="ambiguous" id="S3.E14.m1.1.1.2.3.1.cmml" xref="S3.E14.m1.1.1.2.3">subscript</csymbol><ci id="S3.E14.m1.1.1.2.3.2.cmml" xref="S3.E14.m1.1.1.2.3.2">𝑑</ci><ci id="S3.E14.m1.1.1.2.3.3.cmml" xref="S3.E14.m1.1.1.2.3.3">𝑖</ci></apply></apply><ci id="S3.E14.m1.1.1.3.cmml" xref="S3.E14.m1.1.1.3">𝑓</ci></apply><apply id="S3.E14.m1.2.2.cmml" xref="S3.E14.m1.2.2"><divide id="S3.E14.m1.2.2.1.cmml" xref="S3.E14.m1.2.2"></divide><apply id="S3.E14.m1.2.2.2.cmml" xref="S3.E14.m1.2.2.2"><times id="S3.E14.m1.2.2.2.1.cmml" xref="S3.E14.m1.2.2.2.1"></times><apply id="S3.E14.m1.2.2.2.2.cmml" xref="S3.E14.m1.2.2.2.2"><csymbol cd="ambiguous" id="S3.E14.m1.2.2.2.2.1.cmml" xref="S3.E14.m1.2.2.2.2">subscript</csymbol><ci id="S3.E14.m1.2.2.2.2.2.cmml" xref="S3.E14.m1.2.2.2.2.2">𝑢</ci><ci id="S3.E14.m1.2.2.2.2.3.cmml" xref="S3.E14.m1.2.2.2.2.3">𝑦</ci></apply><apply id="S3.E14.m1.2.2.2.3.cmml" xref="S3.E14.m1.2.2.2.3"><csymbol cd="ambiguous" id="S3.E14.m1.2.2.2.3.1.cmml" xref="S3.E14.m1.2.2.2.3">subscript</csymbol><ci id="S3.E14.m1.2.2.2.3.2.cmml" xref="S3.E14.m1.2.2.2.3.2">𝑑</ci><ci id="S3.E14.m1.2.2.2.3.3.cmml" 
xref="S3.E14.m1.2.2.2.3.3">𝑖</ci></apply></apply><ci id="S3.E14.m1.2.2.3.cmml" xref="S3.E14.m1.2.2.3">𝑓</ci></apply><apply id="S3.E14.m1.3.3.1.1.1.1.1.1.cmml" xref="S3.E14.m1.3.3.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S3.E14.m1.3.3.1.1.1.1.1.1.1.cmml" xref="S3.E14.m1.3.3.1.1.1.1.1.1">subscript</csymbol><ci id="S3.E14.m1.3.3.1.1.1.1.1.1.2.cmml" xref="S3.E14.m1.3.3.1.1.1.1.1.1.2">𝑑</ci><ci id="S3.E14.m1.3.3.1.1.1.1.1.1.3.cmml" xref="S3.E14.m1.3.3.1.1.1.1.1.1.3">𝑖</ci></apply></vector><apply id="S3.E14.m1.3.3.1.1.1.3.cmml" xref="S3.E14.m1.3.3.1.1.1.3"><csymbol cd="ambiguous" id="S3.E14.m1.3.3.1.1.1.3.1.cmml" xref="S3.E14.m1.3.3.1.1.1.3">subscript</csymbol><ci id="S3.E14.m1.3.3.1.1.1.3.2.cmml" xref="S3.E14.m1.3.3.1.1.1.3.2">Δ</ci><ci id="S3.E14.m1.3.3.1.1.1.3.3.cmml" xref="S3.E14.m1.3.3.1.1.1.3.3">𝑖</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E14.m1.3c">\mu_{i}=\left(\frac{u_{x}d_{i}}{f},\frac{u_{y}d_{i}}{f},d_{i}\right)+\Delta_{i},</annotation><annotation encoding="application/x-llamapun" id="S3.E14.m1.3d">italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ( divide start_ARG italic_u start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT italic_d start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG italic_f end_ARG , divide start_ARG italic_u start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT italic_d start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG italic_f end_ARG , italic_d start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) + roman_Δ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ,</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(14)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS2.SSS3.p2.13">where <math alttext="f" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p2.11.m1.1"><semantics 
id="S3.SS2.SSS3.p2.11.m1.1a"><mi id="S3.SS2.SSS3.p2.11.m1.1.1" xref="S3.SS2.SSS3.p2.11.m1.1.1.cmml">f</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p2.11.m1.1b"><ci id="S3.SS2.SSS3.p2.11.m1.1.1.cmml" xref="S3.SS2.SSS3.p2.11.m1.1.1">𝑓</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p2.11.m1.1c">f</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p2.11.m1.1d">italic_f</annotation></semantics></math> denotes the focal length of the camera. Integrating the offset <math alttext="\Delta_{i}" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p2.12.m2.1"><semantics id="S3.SS2.SSS3.p2.12.m2.1a"><msub id="S3.SS2.SSS3.p2.12.m2.1.1" xref="S3.SS2.SSS3.p2.12.m2.1.1.cmml"><mi id="S3.SS2.SSS3.p2.12.m2.1.1.2" mathvariant="normal" xref="S3.SS2.SSS3.p2.12.m2.1.1.2.cmml">Δ</mi><mi id="S3.SS2.SSS3.p2.12.m2.1.1.3" xref="S3.SS2.SSS3.p2.12.m2.1.1.3.cmml">i</mi></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p2.12.m2.1b"><apply id="S3.SS2.SSS3.p2.12.m2.1.1.cmml" xref="S3.SS2.SSS3.p2.12.m2.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS3.p2.12.m2.1.1.1.cmml" xref="S3.SS2.SSS3.p2.12.m2.1.1">subscript</csymbol><ci id="S3.SS2.SSS3.p2.12.m2.1.1.2.cmml" xref="S3.SS2.SSS3.p2.12.m2.1.1.2">Δ</ci><ci id="S3.SS2.SSS3.p2.12.m2.1.1.3.cmml" xref="S3.SS2.SSS3.p2.12.m2.1.1.3">𝑖</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p2.12.m2.1c">\Delta_{i}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p2.12.m2.1d">roman_Δ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT</annotation></semantics></math> into <math alttext="\mu_{i}" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p2.13.m3.1"><semantics id="S3.SS2.SSS3.p2.13.m3.1a"><msub id="S3.SS2.SSS3.p2.13.m3.1.1" xref="S3.SS2.SSS3.p2.13.m3.1.1.cmml"><mi id="S3.SS2.SSS3.p2.13.m3.1.1.2" xref="S3.SS2.SSS3.p2.13.m3.1.1.2.cmml">μ</mi><mi id="S3.SS2.SSS3.p2.13.m3.1.1.3" xref="S3.SS2.SSS3.p2.13.m3.1.1.3.cmml">i</mi></msub><annotation-xml 
encoding="MathML-Content" id="S3.SS2.SSS3.p2.13.m3.1b"><apply id="S3.SS2.SSS3.p2.13.m3.1.1.cmml" xref="S3.SS2.SSS3.p2.13.m3.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS3.p2.13.m3.1.1.1.cmml" xref="S3.SS2.SSS3.p2.13.m3.1.1">subscript</csymbol><ci id="S3.SS2.SSS3.p2.13.m3.1.1.2.cmml" xref="S3.SS2.SSS3.p2.13.m3.1.1.2">𝜇</ci><ci id="S3.SS2.SSS3.p2.13.m3.1.1.3.cmml" xref="S3.SS2.SSS3.p2.13.m3.1.1.3">𝑖</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p2.13.m3.1c">\mu_{i}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p2.13.m3.1d">italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT</annotation></semantics></math> enables the model to better handle occluded regions and depth ambiguities.</p> </div> <div class="ltx_para ltx_noindent" id="S3.SS2.SSS3.p3"> <p class="ltx_p" id="S3.SS2.SSS3.p3.3"><span class="ltx_text ltx_font_bold" id="S3.SS2.SSS3.p3.3.1">Output Gaussian Parameters.</span> Using an additional neural network <math alttext="\Phi(I,D)" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p3.1.m1.2"><semantics id="S3.SS2.SSS3.p3.1.m1.2a"><mrow id="S3.SS2.SSS3.p3.1.m1.2.3" xref="S3.SS2.SSS3.p3.1.m1.2.3.cmml"><mi id="S3.SS2.SSS3.p3.1.m1.2.3.2" mathvariant="normal" xref="S3.SS2.SSS3.p3.1.m1.2.3.2.cmml">Φ</mi><mo id="S3.SS2.SSS3.p3.1.m1.2.3.1" xref="S3.SS2.SSS3.p3.1.m1.2.3.1.cmml"></mo><mrow id="S3.SS2.SSS3.p3.1.m1.2.3.3.2" xref="S3.SS2.SSS3.p3.1.m1.2.3.3.1.cmml"><mo id="S3.SS2.SSS3.p3.1.m1.2.3.3.2.1" stretchy="false" xref="S3.SS2.SSS3.p3.1.m1.2.3.3.1.cmml">(</mo><mi id="S3.SS2.SSS3.p3.1.m1.1.1" xref="S3.SS2.SSS3.p3.1.m1.1.1.cmml">I</mi><mo id="S3.SS2.SSS3.p3.1.m1.2.3.3.2.2" xref="S3.SS2.SSS3.p3.1.m1.2.3.3.1.cmml">,</mo><mi id="S3.SS2.SSS3.p3.1.m1.2.2" xref="S3.SS2.SSS3.p3.1.m1.2.2.cmml">D</mi><mo id="S3.SS2.SSS3.p3.1.m1.2.3.3.2.3" stretchy="false" xref="S3.SS2.SSS3.p3.1.m1.2.3.3.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p3.1.m1.2b"><apply 
id="S3.SS2.SSS3.p3.1.m1.2.3.cmml" xref="S3.SS2.SSS3.p3.1.m1.2.3"><times id="S3.SS2.SSS3.p3.1.m1.2.3.1.cmml" xref="S3.SS2.SSS3.p3.1.m1.2.3.1"></times><ci id="S3.SS2.SSS3.p3.1.m1.2.3.2.cmml" xref="S3.SS2.SSS3.p3.1.m1.2.3.2">Φ</ci><interval closure="open" id="S3.SS2.SSS3.p3.1.m1.2.3.3.1.cmml" xref="S3.SS2.SSS3.p3.1.m1.2.3.3.2"><ci id="S3.SS2.SSS3.p3.1.m1.1.1.cmml" xref="S3.SS2.SSS3.p3.1.m1.1.1">𝐼</ci><ci id="S3.SS2.SSS3.p3.1.m1.2.2.cmml" xref="S3.SS2.SSS3.p3.1.m1.2.2">𝐷</ci></interval></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p3.1.m1.2c">\Phi(I,D)</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p3.1.m1.2d">roman_Φ ( italic_I , italic_D )</annotation></semantics></math> to process both the image and the depth map, we predict per-pixel Gaussian parameters for accurate scene modeling. For each pixel <math alttext="u" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p3.2.m2.1"><semantics id="S3.SS2.SSS3.p3.2.m2.1a"><mi id="S3.SS2.SSS3.p3.2.m2.1.1" xref="S3.SS2.SSS3.p3.2.m2.1.1.cmml">u</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p3.2.m2.1b"><ci id="S3.SS2.SSS3.p3.2.m2.1.1.cmml" xref="S3.SS2.SSS3.p3.2.m2.1.1">𝑢</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p3.2.m2.1c">u</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p3.2.m2.1d">italic_u</annotation></semantics></math>, the network output is given by Eq. 
(<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#S3.E8" title="Equation 8 ‣ 3.2.1 Prior Information and Geometric Feature ‣ 3.2 Our Method: Niagara ‣ 3 Proposed Method ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">8</span></a>), where the mean <math alttext="\mu" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p3.3.m3.1"><semantics id="S3.SS2.SSS3.p3.3.m3.1a"><mi id="S3.SS2.SSS3.p3.3.m3.1.1" xref="S3.SS2.SSS3.p3.3.m3.1.1.cmml">μ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p3.3.m3.1b"><ci id="S3.SS2.SSS3.p3.3.m3.1.1.cmml" xref="S3.SS2.SSS3.p3.3.m3.1.1">𝜇</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS3.p3.3.m3.1c">\mu</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p3.3.m3.1d">italic_μ</annotation></semantics></math> is given by</p> <table class="ltx_equation ltx_eqn_table" id="S3.E15"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\mu=\left(\frac{u_{x}d}{f},\frac{u_{y}d}{f},d\right)+\Delta." 
class="ltx_Math" display="block" id="S3.E15.m1.4"><semantics id="S3.E15.m1.4a"><mrow id="S3.E15.m1.4.4.1" xref="S3.E15.m1.4.4.1.1.cmml"><mrow id="S3.E15.m1.4.4.1.1" xref="S3.E15.m1.4.4.1.1.cmml"><mi id="S3.E15.m1.4.4.1.1.2" xref="S3.E15.m1.4.4.1.1.2.cmml">μ</mi><mo id="S3.E15.m1.4.4.1.1.1" xref="S3.E15.m1.4.4.1.1.1.cmml">=</mo><mrow id="S3.E15.m1.4.4.1.1.3" xref="S3.E15.m1.4.4.1.1.3.cmml"><mrow id="S3.E15.m1.4.4.1.1.3.2.2" xref="S3.E15.m1.4.4.1.1.3.2.1.cmml"><mo id="S3.E15.m1.4.4.1.1.3.2.2.1" xref="S3.E15.m1.4.4.1.1.3.2.1.cmml">(</mo><mfrac id="S3.E15.m1.1.1" xref="S3.E15.m1.1.1.cmml"><mrow id="S3.E15.m1.1.1.2" xref="S3.E15.m1.1.1.2.cmml"><msub id="S3.E15.m1.1.1.2.2" xref="S3.E15.m1.1.1.2.2.cmml"><mi id="S3.E15.m1.1.1.2.2.2" xref="S3.E15.m1.1.1.2.2.2.cmml">u</mi><mi id="S3.E15.m1.1.1.2.2.3" xref="S3.E15.m1.1.1.2.2.3.cmml">x</mi></msub><mo id="S3.E15.m1.1.1.2.1" xref="S3.E15.m1.1.1.2.1.cmml"></mo><mi id="S3.E15.m1.1.1.2.3" xref="S3.E15.m1.1.1.2.3.cmml">d</mi></mrow><mi id="S3.E15.m1.1.1.3" xref="S3.E15.m1.1.1.3.cmml">f</mi></mfrac><mo id="S3.E15.m1.4.4.1.1.3.2.2.2" xref="S3.E15.m1.4.4.1.1.3.2.1.cmml">,</mo><mfrac id="S3.E15.m1.2.2" xref="S3.E15.m1.2.2.cmml"><mrow id="S3.E15.m1.2.2.2" xref="S3.E15.m1.2.2.2.cmml"><msub id="S3.E15.m1.2.2.2.2" xref="S3.E15.m1.2.2.2.2.cmml"><mi id="S3.E15.m1.2.2.2.2.2" xref="S3.E15.m1.2.2.2.2.2.cmml">u</mi><mi id="S3.E15.m1.2.2.2.2.3" xref="S3.E15.m1.2.2.2.2.3.cmml">y</mi></msub><mo id="S3.E15.m1.2.2.2.1" xref="S3.E15.m1.2.2.2.1.cmml"></mo><mi id="S3.E15.m1.2.2.2.3" xref="S3.E15.m1.2.2.2.3.cmml">d</mi></mrow><mi id="S3.E15.m1.2.2.3" xref="S3.E15.m1.2.2.3.cmml">f</mi></mfrac><mo id="S3.E15.m1.4.4.1.1.3.2.2.3" xref="S3.E15.m1.4.4.1.1.3.2.1.cmml">,</mo><mi id="S3.E15.m1.3.3" xref="S3.E15.m1.3.3.cmml">d</mi><mo id="S3.E15.m1.4.4.1.1.3.2.2.4" xref="S3.E15.m1.4.4.1.1.3.2.1.cmml">)</mo></mrow><mo id="S3.E15.m1.4.4.1.1.3.1" xref="S3.E15.m1.4.4.1.1.3.1.cmml">+</mo><mi id="S3.E15.m1.4.4.1.1.3.3" mathvariant="normal" 
xref="S3.E15.m1.4.4.1.1.3.3.cmml">Δ</mi></mrow></mrow><mo id="S3.E15.m1.4.4.1.2" lspace="0em" xref="S3.E15.m1.4.4.1.1.cmml">.</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.E15.m1.4b"><apply id="S3.E15.m1.4.4.1.1.cmml" xref="S3.E15.m1.4.4.1"><eq id="S3.E15.m1.4.4.1.1.1.cmml" xref="S3.E15.m1.4.4.1.1.1"></eq><ci id="S3.E15.m1.4.4.1.1.2.cmml" xref="S3.E15.m1.4.4.1.1.2">𝜇</ci><apply id="S3.E15.m1.4.4.1.1.3.cmml" xref="S3.E15.m1.4.4.1.1.3"><plus id="S3.E15.m1.4.4.1.1.3.1.cmml" xref="S3.E15.m1.4.4.1.1.3.1"></plus><vector id="S3.E15.m1.4.4.1.1.3.2.1.cmml" xref="S3.E15.m1.4.4.1.1.3.2.2"><apply id="S3.E15.m1.1.1.cmml" xref="S3.E15.m1.1.1"><divide id="S3.E15.m1.1.1.1.cmml" xref="S3.E15.m1.1.1"></divide><apply id="S3.E15.m1.1.1.2.cmml" xref="S3.E15.m1.1.1.2"><times id="S3.E15.m1.1.1.2.1.cmml" xref="S3.E15.m1.1.1.2.1"></times><apply id="S3.E15.m1.1.1.2.2.cmml" xref="S3.E15.m1.1.1.2.2"><csymbol cd="ambiguous" id="S3.E15.m1.1.1.2.2.1.cmml" xref="S3.E15.m1.1.1.2.2">subscript</csymbol><ci id="S3.E15.m1.1.1.2.2.2.cmml" xref="S3.E15.m1.1.1.2.2.2">𝑢</ci><ci id="S3.E15.m1.1.1.2.2.3.cmml" xref="S3.E15.m1.1.1.2.2.3">𝑥</ci></apply><ci id="S3.E15.m1.1.1.2.3.cmml" xref="S3.E15.m1.1.1.2.3">𝑑</ci></apply><ci id="S3.E15.m1.1.1.3.cmml" xref="S3.E15.m1.1.1.3">𝑓</ci></apply><apply id="S3.E15.m1.2.2.cmml" xref="S3.E15.m1.2.2"><divide id="S3.E15.m1.2.2.1.cmml" xref="S3.E15.m1.2.2"></divide><apply id="S3.E15.m1.2.2.2.cmml" xref="S3.E15.m1.2.2.2"><times id="S3.E15.m1.2.2.2.1.cmml" xref="S3.E15.m1.2.2.2.1"></times><apply id="S3.E15.m1.2.2.2.2.cmml" xref="S3.E15.m1.2.2.2.2"><csymbol cd="ambiguous" id="S3.E15.m1.2.2.2.2.1.cmml" xref="S3.E15.m1.2.2.2.2">subscript</csymbol><ci id="S3.E15.m1.2.2.2.2.2.cmml" xref="S3.E15.m1.2.2.2.2.2">𝑢</ci><ci id="S3.E15.m1.2.2.2.2.3.cmml" xref="S3.E15.m1.2.2.2.2.3">𝑦</ci></apply><ci id="S3.E15.m1.2.2.2.3.cmml" xref="S3.E15.m1.2.2.2.3">𝑑</ci></apply><ci id="S3.E15.m1.2.2.3.cmml" xref="S3.E15.m1.2.2.3">𝑓</ci></apply><ci id="S3.E15.m1.3.3.cmml" 
xref="S3.E15.m1.3.3">𝑑</ci></vector><ci id="S3.E15.m1.4.4.1.1.3.3.cmml" xref="S3.E15.m1.4.4.1.1.3.3">Δ</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E15.m1.4c">\mu=\left(\frac{u_{x}d}{f},\frac{u_{y}d}{f},d\right)+\Delta.</annotation><annotation encoding="application/x-llamapun" id="S3.E15.m1.4d">italic_μ = ( divide start_ARG italic_u start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT italic_d end_ARG start_ARG italic_f end_ARG , divide start_ARG italic_u start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT italic_d end_ARG start_ARG italic_f end_ARG , italic_d ) + roman_Δ .</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(15)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS2.SSS3.p3.4">The U-Net <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib48" title="">48</a>]</cite>, which uses ResNet blocks <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib28" title="">28</a>]</cite> in both the encoder and the decoder, ultimately outputs a tensor <math alttext="\Phi_{\text{dec}}(\Phi_{\text{enc}}(I,D))\in\mathbb{R}^{(C-1)\times H\times W}" class="ltx_Math" display="inline" id="S3.SS2.SSS3.p3.4.m1.4"><semantics id="S3.SS2.SSS3.p3.4.m1.4a"><mrow id="S3.SS2.SSS3.p3.4.m1.4.4" xref="S3.SS2.SSS3.p3.4.m1.4.4.cmml"><mrow id="S3.SS2.SSS3.p3.4.m1.4.4.1" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.cmml"><msub id="S3.SS2.SSS3.p3.4.m1.4.4.1.3" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.3.cmml"><mi id="S3.SS2.SSS3.p3.4.m1.4.4.1.3.2" mathvariant="normal" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.3.2.cmml">Φ</mi><mtext id="S3.SS2.SSS3.p3.4.m1.4.4.1.3.3" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.3.3a.cmml">dec</mtext></msub><mo id="S3.SS2.SSS3.p3.4.m1.4.4.1.2" 
xref="S3.SS2.SSS3.p3.4.m1.4.4.1.2.cmml"></mo><mrow id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.cmml"><mo id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.2" stretchy="false" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.cmml">(</mo><mrow id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.cmml"><msub id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2.cmml"><mi id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2.2" mathvariant="normal" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2.2.cmml">Φ</mi><mtext id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2.3" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2.3a.cmml">enc</mtext></msub><mo id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.1" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.1.cmml"></mo><mrow id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.3.2" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.3.1.cmml"><mo id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.3.2.1" stretchy="false" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.3.1.cmml">(</mo><mi id="S3.SS2.SSS3.p3.4.m1.2.2" xref="S3.SS2.SSS3.p3.4.m1.2.2.cmml">I</mi><mo id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.3.2.2" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.3.1.cmml">,</mo><mi id="S3.SS2.SSS3.p3.4.m1.3.3" xref="S3.SS2.SSS3.p3.4.m1.3.3.cmml">D</mi><mo id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.3.2.3" stretchy="false" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.3.1.cmml">)</mo></mrow></mrow><mo id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.3" stretchy="false" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.SS2.SSS3.p3.4.m1.4.4.2" xref="S3.SS2.SSS3.p3.4.m1.4.4.2.cmml">∈</mo><msup id="S3.SS2.SSS3.p3.4.m1.4.4.3" xref="S3.SS2.SSS3.p3.4.m1.4.4.3.cmml"><mi id="S3.SS2.SSS3.p3.4.m1.4.4.3.2" xref="S3.SS2.SSS3.p3.4.m1.4.4.3.2.cmml">ℝ</mi><mrow id="S3.SS2.SSS3.p3.4.m1.1.1.1" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.cmml"><mrow id="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.cmml"><mo id="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.2" stretchy="false" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.cmml">(</mo><mrow 
id="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.cmml"><mi id="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.2" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.2.cmml">C</mi><mo id="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.1" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.1.cmml">−</mo><mn id="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.3" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.3.cmml">1</mn></mrow><mo id="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.3" rspace="0.055em" stretchy="false" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.cmml">)</mo></mrow><mo id="S3.SS2.SSS3.p3.4.m1.1.1.1.2" rspace="0.222em" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.2.cmml">×</mo><mi id="S3.SS2.SSS3.p3.4.m1.1.1.1.3" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.3.cmml">H</mi><mo id="S3.SS2.SSS3.p3.4.m1.1.1.1.2a" lspace="0.222em" rspace="0.222em" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.2.cmml">×</mo><mi id="S3.SS2.SSS3.p3.4.m1.1.1.1.4" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.4.cmml">W</mi></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS3.p3.4.m1.4b"><apply id="S3.SS2.SSS3.p3.4.m1.4.4.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4"><in id="S3.SS2.SSS3.p3.4.m1.4.4.2.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.2"></in><apply id="S3.SS2.SSS3.p3.4.m1.4.4.1.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.1"><times id="S3.SS2.SSS3.p3.4.m1.4.4.1.2.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.2"></times><apply id="S3.SS2.SSS3.p3.4.m1.4.4.1.3.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.3"><csymbol cd="ambiguous" id="S3.SS2.SSS3.p3.4.m1.4.4.1.3.1.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.3">subscript</csymbol><ci id="S3.SS2.SSS3.p3.4.m1.4.4.1.3.2.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.3.2">Φ</ci><ci id="S3.SS2.SSS3.p3.4.m1.4.4.1.3.3a.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.3.3"><mtext id="S3.SS2.SSS3.p3.4.m1.4.4.1.3.3.cmml" mathsize="70%" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.3.3">dec</mtext></ci></apply><apply id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1"><times id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.1.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.1"></times><apply 
id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2"><csymbol cd="ambiguous" id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2.1.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2">subscript</csymbol><ci id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2.2.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2.2">Φ</ci><ci id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2.3a.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2.3"><mtext id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2.3.cmml" mathsize="70%" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.2.3">enc</mtext></ci></apply><interval closure="open" id="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.3.1.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.1.1.1.1.3.2"><ci id="S3.SS2.SSS3.p3.4.m1.2.2.cmml" xref="S3.SS2.SSS3.p3.4.m1.2.2">𝐼</ci><ci id="S3.SS2.SSS3.p3.4.m1.3.3.cmml" xref="S3.SS2.SSS3.p3.4.m1.3.3">𝐷</ci></interval></apply></apply><apply id="S3.SS2.SSS3.p3.4.m1.4.4.3.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.3"><csymbol cd="ambiguous" id="S3.SS2.SSS3.p3.4.m1.4.4.3.1.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.3">superscript</csymbol><ci id="S3.SS2.SSS3.p3.4.m1.4.4.3.2.cmml" xref="S3.SS2.SSS3.p3.4.m1.4.4.3.2">ℝ</ci><apply id="S3.SS2.SSS3.p3.4.m1.1.1.1.cmml" xref="S3.SS2.SSS3.p3.4.m1.1.1.1"><times id="S3.SS2.SSS3.p3.4.m1.1.1.1.2.cmml" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.2"></times><apply id="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.cmml" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1"><minus id="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.1.cmml" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.1"></minus><ci id="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.2.cmml" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.2">𝐶</ci><cn id="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.3.cmml" type="integer" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.1.1.1.3">1</cn></apply><ci id="S3.SS2.SSS3.p3.4.m1.1.1.1.3.cmml" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.3">𝐻</ci><ci id="S3.SS2.SSS3.p3.4.m1.1.1.1.4.cmml" xref="S3.SS2.SSS3.p3.4.m1.1.1.1.4">𝑊</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" 
id="S3.SS2.SSS3.p3.4.m1.4c">\Phi_{\text{dec}}(\Phi_{\text{enc}}(I,D))\in\mathbb{R}^{(C-1)\times H\times W}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS3.p3.4.m1.4d">roman_Φ start_POSTSUBSCRIPT dec end_POSTSUBSCRIPT ( roman_Φ start_POSTSUBSCRIPT enc end_POSTSUBSCRIPT ( italic_I , italic_D ) ) ∈ blackboard_R start_POSTSUPERSCRIPT ( italic_C - 1 ) × italic_H × italic_W end_POSTSUPERSCRIPT</annotation></semantics></math>.</p> </div> </section> <section class="ltx_subsubsection" id="S3.SS2.SSS4"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection">3.2.4 </span>Training Loss</h4> <div class="ltx_para" id="S3.SS2.SSS4.p1"> <p class="ltx_p" id="S3.SS2.SSS4.p1.2">During the training of network <math alttext="\Phi" class="ltx_Math" display="inline" id="S3.SS2.SSS4.p1.1.m1.1"><semantics id="S3.SS2.SSS4.p1.1.m1.1a"><mi id="S3.SS2.SSS4.p1.1.m1.1.1" mathvariant="normal" xref="S3.SS2.SSS4.p1.1.m1.1.1.cmml">Φ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS4.p1.1.m1.1b"><ci id="S3.SS2.SSS4.p1.1.m1.1.1.cmml" xref="S3.SS2.SSS4.p1.1.m1.1.1">Φ</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS4.p1.1.m1.1c">\Phi</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS4.p1.1.m1.1d">roman_Φ</annotation></semantics></math> with data triplets <math alttext="(I,J,\pi)" class="ltx_Math" display="inline" id="S3.SS2.SSS4.p1.2.m2.3"><semantics id="S3.SS2.SSS4.p1.2.m2.3a"><mrow id="S3.SS2.SSS4.p1.2.m2.3.4.2" xref="S3.SS2.SSS4.p1.2.m2.3.4.1.cmml"><mo id="S3.SS2.SSS4.p1.2.m2.3.4.2.1" stretchy="false" xref="S3.SS2.SSS4.p1.2.m2.3.4.1.cmml">(</mo><mi id="S3.SS2.SSS4.p1.2.m2.1.1" xref="S3.SS2.SSS4.p1.2.m2.1.1.cmml">I</mi><mo id="S3.SS2.SSS4.p1.2.m2.3.4.2.2" xref="S3.SS2.SSS4.p1.2.m2.3.4.1.cmml">,</mo><mi id="S3.SS2.SSS4.p1.2.m2.2.2" xref="S3.SS2.SSS4.p1.2.m2.2.2.cmml">J</mi><mo id="S3.SS2.SSS4.p1.2.m2.3.4.2.3" xref="S3.SS2.SSS4.p1.2.m2.3.4.1.cmml">,</mo><mi 
id="S3.SS2.SSS4.p1.2.m2.3.3" xref="S3.SS2.SSS4.p1.2.m2.3.3.cmml">π</mi><mo id="S3.SS2.SSS4.p1.2.m2.3.4.2.4" stretchy="false" xref="S3.SS2.SSS4.p1.2.m2.3.4.1.cmml">)</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS4.p1.2.m2.3b"><vector id="S3.SS2.SSS4.p1.2.m2.3.4.1.cmml" xref="S3.SS2.SSS4.p1.2.m2.3.4.2"><ci id="S3.SS2.SSS4.p1.2.m2.1.1.cmml" xref="S3.SS2.SSS4.p1.2.m2.1.1">𝐼</ci><ci id="S3.SS2.SSS4.p1.2.m2.2.2.cmml" xref="S3.SS2.SSS4.p1.2.m2.2.2">𝐽</ci><ci id="S3.SS2.SSS4.p1.2.m2.3.3.cmml" xref="S3.SS2.SSS4.p1.2.m2.3.3">𝜋</ci></vector></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS4.p1.2.m2.3c">(I,J,\pi)</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS4.p1.2.m2.3d">( italic_I , italic_J , italic_π )</annotation></semantics></math>, the objective is to minimize the rendering loss,</p> <table class="ltx_equation ltx_eqn_table" id="S3.E16"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\begin{split}\mathcal{L}=&\sum_{pm\in PM}\alpha_{i}||pm(\hat{J},J)||+\\ &\lambda_{1}{||\Delta||^{2}_{2}}+\lambda_{2}{||\text{scale}(\mathbf{G})||}.% \end{split}" class="ltx_Math" display="block" id="S3.E16.m1.42"><semantics id="S3.E16.m1.42a"><mtable columnspacing="0pt" displaystyle="true" id="S3.E16.m1.42.42.3" rowspacing="0pt"><mtr id="S3.E16.m1.42.42.3a"><mtd class="ltx_align_right" columnalign="right" id="S3.E16.m1.42.42.3b"><mrow id="S3.E16.m1.2.2.2.2.2"><mi class="ltx_font_mathcaligraphic" id="S3.E16.m1.1.1.1.1.1.1" xref="S3.E16.m1.1.1.1.1.1.1.cmml">ℒ</mi><mo id="S3.E16.m1.2.2.2.2.2.2" xref="S3.E16.m1.2.2.2.2.2.2.cmml">=</mo><mi id="S3.E16.m1.2.2.2.2.2.3" xref="S3.E16.m1.40.40.1.1.1.cmml"></mi></mrow></mtd><mtd class="ltx_align_left" columnalign="left" id="S3.E16.m1.42.42.3c"><mrow id="S3.E16.m1.41.41.2.40.19.17"><mrow id="S3.E16.m1.41.41.2.40.19.17.17"><munder 
id="S3.E16.m1.41.41.2.40.19.17.17.2"><mo id="S3.E16.m1.3.3.3.3.1.1" movablelimits="false" xref="S3.E16.m1.3.3.3.3.1.1.cmml">∑</mo><mrow id="S3.E16.m1.4.4.4.4.2.2.1" xref="S3.E16.m1.4.4.4.4.2.2.1.cmml"><mrow id="S3.E16.m1.4.4.4.4.2.2.1.2" xref="S3.E16.m1.4.4.4.4.2.2.1.2.cmml"><mi id="S3.E16.m1.4.4.4.4.2.2.1.2.2" xref="S3.E16.m1.4.4.4.4.2.2.1.2.2.cmml">p</mi><mo id="S3.E16.m1.4.4.4.4.2.2.1.2.1" xref="S3.E16.m1.4.4.4.4.2.2.1.2.1.cmml"></mo><mi id="S3.E16.m1.4.4.4.4.2.2.1.2.3" xref="S3.E16.m1.4.4.4.4.2.2.1.2.3.cmml">m</mi></mrow><mo id="S3.E16.m1.4.4.4.4.2.2.1.1" xref="S3.E16.m1.4.4.4.4.2.2.1.1.cmml">∈</mo><mrow id="S3.E16.m1.4.4.4.4.2.2.1.3" xref="S3.E16.m1.4.4.4.4.2.2.1.3.cmml"><mi id="S3.E16.m1.4.4.4.4.2.2.1.3.2" xref="S3.E16.m1.4.4.4.4.2.2.1.3.2.cmml">P</mi><mo id="S3.E16.m1.4.4.4.4.2.2.1.3.1" xref="S3.E16.m1.4.4.4.4.2.2.1.3.1.cmml"></mo><mi id="S3.E16.m1.4.4.4.4.2.2.1.3.3" xref="S3.E16.m1.4.4.4.4.2.2.1.3.3.cmml">M</mi></mrow></mrow></munder><mrow id="S3.E16.m1.41.41.2.40.19.17.17.1"><msub id="S3.E16.m1.41.41.2.40.19.17.17.1.3"><mi id="S3.E16.m1.5.5.5.5.3.3" xref="S3.E16.m1.5.5.5.5.3.3.cmml">α</mi><mi id="S3.E16.m1.6.6.6.6.4.4.1" xref="S3.E16.m1.6.6.6.6.4.4.1.cmml">i</mi></msub><mo id="S3.E16.m1.41.41.2.40.19.17.17.1.2" xref="S3.E16.m1.40.40.1.1.1.cmml"></mo><mrow id="S3.E16.m1.41.41.2.40.19.17.17.1.1.1"><mo id="S3.E16.m1.7.7.7.7.5.5b" stretchy="false" xref="S3.E16.m1.40.40.1.1.1.cmml">‖</mo><mrow id="S3.E16.m1.41.41.2.40.19.17.17.1.1.1.1"><mi id="S3.E16.m1.9.9.9.9.7.7" xref="S3.E16.m1.9.9.9.9.7.7.cmml">p</mi><mo id="S3.E16.m1.41.41.2.40.19.17.17.1.1.1.1.1" xref="S3.E16.m1.40.40.1.1.1.cmml"></mo><mi id="S3.E16.m1.10.10.10.10.8.8" xref="S3.E16.m1.10.10.10.10.8.8.cmml">m</mi><mo id="S3.E16.m1.41.41.2.40.19.17.17.1.1.1.1.1a" xref="S3.E16.m1.40.40.1.1.1.cmml"></mo><mrow id="S3.E16.m1.41.41.2.40.19.17.17.1.1.1.1.2"><mo id="S3.E16.m1.11.11.11.11.9.9" stretchy="false" xref="S3.E16.m1.40.40.1.1.1.cmml">(</mo><mover accent="true" id="S3.E16.m1.12.12.12.12.10.10" 
xref="S3.E16.m1.12.12.12.12.10.10.cmml"><mi id="S3.E16.m1.12.12.12.12.10.10.2" xref="S3.E16.m1.12.12.12.12.10.10.2.cmml">J</mi><mo id="S3.E16.m1.12.12.12.12.10.10.1" xref="S3.E16.m1.12.12.12.12.10.10.1.cmml">^</mo></mover><mo id="S3.E16.m1.13.13.13.13.11.11" xref="S3.E16.m1.40.40.1.1.1.cmml">,</mo><mi id="S3.E16.m1.14.14.14.14.12.12" xref="S3.E16.m1.14.14.14.14.12.12.cmml">J</mi><mo id="S3.E16.m1.15.15.15.15.13.13" stretchy="false" xref="S3.E16.m1.40.40.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.E16.m1.16.16.16.16.14.14b" stretchy="false" xref="S3.E16.m1.40.40.1.1.1.cmml">‖</mo></mrow></mrow></mrow><mo id="S3.E16.m1.18.18.18.18.16.16" xref="S3.E16.m1.18.18.18.18.16.16.cmml">+</mo></mrow></mtd></mtr><mtr id="S3.E16.m1.42.42.3d"><mtd id="S3.E16.m1.42.42.3e" xref="S3.E16.m1.40.40.1.1.1.cmml"></mtd><mtd class="ltx_align_left" columnalign="left" id="S3.E16.m1.42.42.3f"><mrow id="S3.E16.m1.42.42.3.41.22.22.22"><mrow id="S3.E16.m1.42.42.3.41.22.22.22.1"><mrow id="S3.E16.m1.42.42.3.41.22.22.22.1.2"><msub id="S3.E16.m1.42.42.3.41.22.22.22.1.2.2"><mi id="S3.E16.m1.19.19.19.1.1.1" xref="S3.E16.m1.19.19.19.1.1.1.cmml">λ</mi><mn id="S3.E16.m1.20.20.20.2.2.2.1" xref="S3.E16.m1.20.20.20.2.2.2.1.cmml">1</mn></msub><mo id="S3.E16.m1.42.42.3.41.22.22.22.1.2.1" xref="S3.E16.m1.40.40.1.1.1.cmml"></mo><msubsup id="S3.E16.m1.42.42.3.41.22.22.22.1.2.3"><mrow id="S3.E16.m1.42.42.3.41.22.22.22.1.2.3.2.2"><mo id="S3.E16.m1.21.21.21.3.3.3b" stretchy="false" xref="S3.E16.m1.40.40.1.1.1.cmml">‖</mo><mi id="S3.E16.m1.23.23.23.5.5.5" mathvariant="normal" xref="S3.E16.m1.23.23.23.5.5.5.cmml">Δ</mi><mo id="S3.E16.m1.24.24.24.6.6.6b" stretchy="false" xref="S3.E16.m1.40.40.1.1.1.cmml">‖</mo></mrow><mn id="S3.E16.m1.27.27.27.9.9.9.1" xref="S3.E16.m1.27.27.27.9.9.9.1.cmml">2</mn><mn id="S3.E16.m1.26.26.26.8.8.8.1" xref="S3.E16.m1.26.26.26.8.8.8.1.cmml">2</mn></msubsup></mrow><mo id="S3.E16.m1.28.28.28.10.10.10" xref="S3.E16.m1.40.40.1.1.1.cmml">+</mo><mrow id="S3.E16.m1.42.42.3.41.22.22.22.1.1"><msub 
id="S3.E16.m1.42.42.3.41.22.22.22.1.1.3"><mi id="S3.E16.m1.29.29.29.11.11.11" xref="S3.E16.m1.29.29.29.11.11.11.cmml">λ</mi><mn id="S3.E16.m1.30.30.30.12.12.12.1" xref="S3.E16.m1.30.30.30.12.12.12.1.cmml">2</mn></msub><mo id="S3.E16.m1.42.42.3.41.22.22.22.1.1.2" xref="S3.E16.m1.40.40.1.1.1.cmml"></mo><mrow id="S3.E16.m1.42.42.3.41.22.22.22.1.1.1.1"><mo id="S3.E16.m1.31.31.31.13.13.13b" stretchy="false" xref="S3.E16.m1.40.40.1.1.1.cmml">‖</mo><mrow id="S3.E16.m1.42.42.3.41.22.22.22.1.1.1.1.1"><mtext id="S3.E16.m1.33.33.33.15.15.15" xref="S3.E16.m1.33.33.33.15.15.15a.cmml">scale</mtext><mo id="S3.E16.m1.42.42.3.41.22.22.22.1.1.1.1.1.1" xref="S3.E16.m1.40.40.1.1.1.cmml"></mo><mrow id="S3.E16.m1.42.42.3.41.22.22.22.1.1.1.1.1.2"><mo id="S3.E16.m1.34.34.34.16.16.16" stretchy="false" xref="S3.E16.m1.40.40.1.1.1.cmml">(</mo><mi id="S3.E16.m1.35.35.35.17.17.17" xref="S3.E16.m1.35.35.35.17.17.17.cmml">𝐆</mi><mo id="S3.E16.m1.36.36.36.18.18.18" stretchy="false" xref="S3.E16.m1.40.40.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.E16.m1.37.37.37.19.19.19b" stretchy="false" xref="S3.E16.m1.40.40.1.1.1.cmml">‖</mo></mrow></mrow></mrow><mo id="S3.E16.m1.39.39.39.21.21.21" lspace="0em" xref="S3.E16.m1.40.40.1.1.1.cmml">.</mo></mrow></mtd></mtr></mtable><annotation-xml encoding="MathML-Content" id="S3.E16.m1.42b"><apply id="S3.E16.m1.40.40.1.1.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><eq id="S3.E16.m1.2.2.2.2.2.2.cmml" xref="S3.E16.m1.2.2.2.2.2.2"></eq><ci id="S3.E16.m1.1.1.1.1.1.1.cmml" xref="S3.E16.m1.1.1.1.1.1.1">ℒ</ci><apply id="S3.E16.m1.40.40.1.1.1.2.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><plus id="S3.E16.m1.18.18.18.18.16.16.cmml" xref="S3.E16.m1.18.18.18.18.16.16"></plus><apply id="S3.E16.m1.40.40.1.1.1.1.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><apply id="S3.E16.m1.40.40.1.1.1.1.1.2.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><csymbol cd="ambiguous" id="S3.E16.m1.40.40.1.1.1.1.1.2.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3">subscript</csymbol><sum id="S3.E16.m1.3.3.3.3.1.1.cmml" 
xref="S3.E16.m1.3.3.3.3.1.1"></sum><apply id="S3.E16.m1.4.4.4.4.2.2.1.cmml" xref="S3.E16.m1.4.4.4.4.2.2.1"><in id="S3.E16.m1.4.4.4.4.2.2.1.1.cmml" xref="S3.E16.m1.4.4.4.4.2.2.1.1"></in><apply id="S3.E16.m1.4.4.4.4.2.2.1.2.cmml" xref="S3.E16.m1.4.4.4.4.2.2.1.2"><times id="S3.E16.m1.4.4.4.4.2.2.1.2.1.cmml" xref="S3.E16.m1.4.4.4.4.2.2.1.2.1"></times><ci id="S3.E16.m1.4.4.4.4.2.2.1.2.2.cmml" xref="S3.E16.m1.4.4.4.4.2.2.1.2.2">𝑝</ci><ci id="S3.E16.m1.4.4.4.4.2.2.1.2.3.cmml" xref="S3.E16.m1.4.4.4.4.2.2.1.2.3">𝑚</ci></apply><apply id="S3.E16.m1.4.4.4.4.2.2.1.3.cmml" xref="S3.E16.m1.4.4.4.4.2.2.1.3"><times id="S3.E16.m1.4.4.4.4.2.2.1.3.1.cmml" xref="S3.E16.m1.4.4.4.4.2.2.1.3.1"></times><ci id="S3.E16.m1.4.4.4.4.2.2.1.3.2.cmml" xref="S3.E16.m1.4.4.4.4.2.2.1.3.2">𝑃</ci><ci id="S3.E16.m1.4.4.4.4.2.2.1.3.3.cmml" xref="S3.E16.m1.4.4.4.4.2.2.1.3.3">𝑀</ci></apply></apply></apply><apply id="S3.E16.m1.40.40.1.1.1.1.1.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><times id="S3.E16.m1.40.40.1.1.1.1.1.1.2.cmml" xref="S3.E16.m1.2.2.2.2.2.3"></times><apply id="S3.E16.m1.40.40.1.1.1.1.1.1.3.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><csymbol cd="ambiguous" id="S3.E16.m1.40.40.1.1.1.1.1.1.3.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3">subscript</csymbol><ci id="S3.E16.m1.5.5.5.5.3.3.cmml" xref="S3.E16.m1.5.5.5.5.3.3">𝛼</ci><ci id="S3.E16.m1.6.6.6.6.4.4.1.cmml" xref="S3.E16.m1.6.6.6.6.4.4.1">𝑖</ci></apply><apply id="S3.E16.m1.40.40.1.1.1.1.1.1.1.2.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><csymbol cd="latexml" id="S3.E16.m1.40.40.1.1.1.1.1.1.1.2.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3">norm</csymbol><apply id="S3.E16.m1.40.40.1.1.1.1.1.1.1.1.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><times id="S3.E16.m1.40.40.1.1.1.1.1.1.1.1.1.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3"></times><ci id="S3.E16.m1.9.9.9.9.7.7.cmml" xref="S3.E16.m1.9.9.9.9.7.7">𝑝</ci><ci id="S3.E16.m1.10.10.10.10.8.8.cmml" xref="S3.E16.m1.10.10.10.10.8.8">𝑚</ci><interval closure="open" id="S3.E16.m1.40.40.1.1.1.1.1.1.1.1.1.4.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><apply 
id="S3.E16.m1.12.12.12.12.10.10.cmml" xref="S3.E16.m1.12.12.12.12.10.10"><ci id="S3.E16.m1.12.12.12.12.10.10.1.cmml" xref="S3.E16.m1.12.12.12.12.10.10.1">^</ci><ci id="S3.E16.m1.12.12.12.12.10.10.2.cmml" xref="S3.E16.m1.12.12.12.12.10.10.2">𝐽</ci></apply><ci id="S3.E16.m1.14.14.14.14.12.12.cmml" xref="S3.E16.m1.14.14.14.14.12.12">𝐽</ci></interval></apply></apply></apply></apply><apply id="S3.E16.m1.40.40.1.1.1.2.4.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><times id="S3.E16.m1.40.40.1.1.1.2.4.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3"></times><apply id="S3.E16.m1.40.40.1.1.1.2.4.2.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><csymbol cd="ambiguous" id="S3.E16.m1.40.40.1.1.1.2.4.2.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3">subscript</csymbol><ci id="S3.E16.m1.19.19.19.1.1.1.cmml" xref="S3.E16.m1.19.19.19.1.1.1">𝜆</ci><cn id="S3.E16.m1.20.20.20.2.2.2.1.cmml" type="integer" xref="S3.E16.m1.20.20.20.2.2.2.1">1</cn></apply><apply id="S3.E16.m1.40.40.1.1.1.2.4.3.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><csymbol cd="ambiguous" id="S3.E16.m1.40.40.1.1.1.2.4.3.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3">subscript</csymbol><apply id="S3.E16.m1.40.40.1.1.1.2.4.3.2.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><csymbol cd="ambiguous" id="S3.E16.m1.40.40.1.1.1.2.4.3.2.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3">superscript</csymbol><apply id="S3.E16.m1.40.40.1.1.1.2.4.3.2.2.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><csymbol cd="latexml" id="S3.E16.m1.40.40.1.1.1.2.4.3.2.2.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3">norm</csymbol><ci id="S3.E16.m1.23.23.23.5.5.5.cmml" xref="S3.E16.m1.23.23.23.5.5.5">Δ</ci></apply><cn id="S3.E16.m1.26.26.26.8.8.8.1.cmml" type="integer" xref="S3.E16.m1.26.26.26.8.8.8.1">2</cn></apply><cn id="S3.E16.m1.27.27.27.9.9.9.1.cmml" type="integer" xref="S3.E16.m1.27.27.27.9.9.9.1">2</cn></apply></apply><apply id="S3.E16.m1.40.40.1.1.1.2.2.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><times id="S3.E16.m1.40.40.1.1.1.2.2.2.cmml" xref="S3.E16.m1.2.2.2.2.2.3"></times><apply id="S3.E16.m1.40.40.1.1.1.2.2.3.cmml" 
xref="S3.E16.m1.2.2.2.2.2.3"><csymbol cd="ambiguous" id="S3.E16.m1.40.40.1.1.1.2.2.3.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3">subscript</csymbol><ci id="S3.E16.m1.29.29.29.11.11.11.cmml" xref="S3.E16.m1.29.29.29.11.11.11">𝜆</ci><cn id="S3.E16.m1.30.30.30.12.12.12.1.cmml" type="integer" xref="S3.E16.m1.30.30.30.12.12.12.1">2</cn></apply><apply id="S3.E16.m1.40.40.1.1.1.2.2.1.2.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><csymbol cd="latexml" id="S3.E16.m1.40.40.1.1.1.2.2.1.2.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3">norm</csymbol><apply id="S3.E16.m1.40.40.1.1.1.2.2.1.1.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3"><times id="S3.E16.m1.40.40.1.1.1.2.2.1.1.1.1.cmml" xref="S3.E16.m1.2.2.2.2.2.3"></times><ci id="S3.E16.m1.33.33.33.15.15.15a.cmml" xref="S3.E16.m1.33.33.33.15.15.15"><mtext id="S3.E16.m1.33.33.33.15.15.15.cmml" xref="S3.E16.m1.33.33.33.15.15.15">scale</mtext></ci><ci id="S3.E16.m1.35.35.35.17.17.17.cmml" xref="S3.E16.m1.35.35.35.17.17.17">𝐆</ci></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E16.m1.42c">\begin{split}\mathcal{L}=&\sum_{pm\in PM}\alpha_{i}||pm(\hat{J},J)||+\\ &\lambda_{1}{||\Delta||^{2}_{2}}+\lambda_{2}{||\text{scale}(\mathbf{G})||}.% \end{split}</annotation><annotation encoding="application/x-llamapun" id="S3.E16.m1.42d">start_ROW start_CELL caligraphic_L = end_CELL start_CELL ∑ start_POSTSUBSCRIPT italic_p italic_m ∈ italic_P italic_M end_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | | italic_p italic_m ( over^ start_ARG italic_J end_ARG , italic_J ) | | + end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT | | roman_Δ | | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT | | scale ( bold_G ) | | . 
end_CELL end_ROW</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(16)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS2.SSS4.p1.13">Here, <math alttext="\mathbf{G}" class="ltx_Math" display="inline" id="S3.SS2.SSS4.p1.3.m1.1"><semantics id="S3.SS2.SSS4.p1.3.m1.1a"><mi id="S3.SS2.SSS4.p1.3.m1.1.1" xref="S3.SS2.SSS4.p1.3.m1.1.1.cmml">𝐆</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS4.p1.3.m1.1b"><ci id="S3.SS2.SSS4.p1.3.m1.1.1.cmml" xref="S3.SS2.SSS4.p1.3.m1.1.1">𝐆</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS4.p1.3.m1.1c">\mathbf{G}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS4.p1.3.m1.1d">bold_G</annotation></semantics></math> represents the geometric feature map generated by the network from input data <math alttext="I" class="ltx_Math" display="inline" id="S3.SS2.SSS4.p1.4.m2.1"><semantics id="S3.SS2.SSS4.p1.4.m2.1a"><mi id="S3.SS2.SSS4.p1.4.m2.1.1" xref="S3.SS2.SSS4.p1.4.m2.1.1.cmml">I</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS4.p1.4.m2.1b"><ci id="S3.SS2.SSS4.p1.4.m2.1.1.cmml" xref="S3.SS2.SSS4.p1.4.m2.1.1">𝐼</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS4.p1.4.m2.1c">I</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS4.p1.4.m2.1d">italic_I</annotation></semantics></math>; <math alttext="\text{scale}(\mathbf{G})" class="ltx_Math" display="inline" id="S3.SS2.SSS4.p1.5.m3.1"><semantics id="S3.SS2.SSS4.p1.5.m3.1a"><mrow id="S3.SS2.SSS4.p1.5.m3.1.2" xref="S3.SS2.SSS4.p1.5.m3.1.2.cmml"><mtext id="S3.SS2.SSS4.p1.5.m3.1.2.2" xref="S3.SS2.SSS4.p1.5.m3.1.2.2a.cmml">scale</mtext><mo id="S3.SS2.SSS4.p1.5.m3.1.2.1" xref="S3.SS2.SSS4.p1.5.m3.1.2.1.cmml"></mo><mrow id="S3.SS2.SSS4.p1.5.m3.1.2.3.2" xref="S3.SS2.SSS4.p1.5.m3.1.2.cmml"><mo 
id="S3.SS2.SSS4.p1.5.m3.1.2.3.2.1" stretchy="false" xref="S3.SS2.SSS4.p1.5.m3.1.2.cmml">(</mo><mi id="S3.SS2.SSS4.p1.5.m3.1.1" xref="S3.SS2.SSS4.p1.5.m3.1.1.cmml">𝐆</mi><mo id="S3.SS2.SSS4.p1.5.m3.1.2.3.2.2" stretchy="false" xref="S3.SS2.SSS4.p1.5.m3.1.2.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS4.p1.5.m3.1b"><apply id="S3.SS2.SSS4.p1.5.m3.1.2.cmml" xref="S3.SS2.SSS4.p1.5.m3.1.2"><times id="S3.SS2.SSS4.p1.5.m3.1.2.1.cmml" xref="S3.SS2.SSS4.p1.5.m3.1.2.1"></times><ci id="S3.SS2.SSS4.p1.5.m3.1.2.2a.cmml" xref="S3.SS2.SSS4.p1.5.m3.1.2.2"><mtext id="S3.SS2.SSS4.p1.5.m3.1.2.2.cmml" xref="S3.SS2.SSS4.p1.5.m3.1.2.2">scale</mtext></ci><ci id="S3.SS2.SSS4.p1.5.m3.1.1.cmml" xref="S3.SS2.SSS4.p1.5.m3.1.1">𝐆</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS4.p1.5.m3.1c">\text{scale}(\mathbf{G})</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS4.p1.5.m3.1d">scale ( bold_G )</annotation></semantics></math> denotes Gaussian scales with thresholded <math alttext="L_{1}" class="ltx_Math" display="inline" id="S3.SS2.SSS4.p1.6.m4.1"><semantics id="S3.SS2.SSS4.p1.6.m4.1a"><msub id="S3.SS2.SSS4.p1.6.m4.1.1" xref="S3.SS2.SSS4.p1.6.m4.1.1.cmml"><mi id="S3.SS2.SSS4.p1.6.m4.1.1.2" xref="S3.SS2.SSS4.p1.6.m4.1.1.2.cmml">L</mi><mn id="S3.SS2.SSS4.p1.6.m4.1.1.3" xref="S3.SS2.SSS4.p1.6.m4.1.1.3.cmml">1</mn></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS4.p1.6.m4.1b"><apply id="S3.SS2.SSS4.p1.6.m4.1.1.cmml" xref="S3.SS2.SSS4.p1.6.m4.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS4.p1.6.m4.1.1.1.cmml" xref="S3.SS2.SSS4.p1.6.m4.1.1">subscript</csymbol><ci id="S3.SS2.SSS4.p1.6.m4.1.1.2.cmml" xref="S3.SS2.SSS4.p1.6.m4.1.1.2">𝐿</ci><cn id="S3.SS2.SSS4.p1.6.m4.1.1.3.cmml" type="integer" xref="S3.SS2.SSS4.p1.6.m4.1.1.3">1</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS4.p1.6.m4.1c">L_{1}</annotation><annotation encoding="application/x-llamapun" 
id="S3.SS2.SSS4.p1.6.m4.1d">italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT</annotation></semantics></math> regularization; <math alttext="PM" class="ltx_Math" display="inline" id="S3.SS2.SSS4.p1.7.m5.1"><semantics id="S3.SS2.SSS4.p1.7.m5.1a"><mrow id="S3.SS2.SSS4.p1.7.m5.1.1" xref="S3.SS2.SSS4.p1.7.m5.1.1.cmml"><mi id="S3.SS2.SSS4.p1.7.m5.1.1.2" xref="S3.SS2.SSS4.p1.7.m5.1.1.2.cmml">P</mi><mo id="S3.SS2.SSS4.p1.7.m5.1.1.1" xref="S3.SS2.SSS4.p1.7.m5.1.1.1.cmml"></mo><mi id="S3.SS2.SSS4.p1.7.m5.1.1.3" xref="S3.SS2.SSS4.p1.7.m5.1.1.3.cmml">M</mi></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS4.p1.7.m5.1b"><apply id="S3.SS2.SSS4.p1.7.m5.1.1.cmml" xref="S3.SS2.SSS4.p1.7.m5.1.1"><times id="S3.SS2.SSS4.p1.7.m5.1.1.1.cmml" xref="S3.SS2.SSS4.p1.7.m5.1.1.1"></times><ci id="S3.SS2.SSS4.p1.7.m5.1.1.2.cmml" xref="S3.SS2.SSS4.p1.7.m5.1.1.2">𝑃</ci><ci id="S3.SS2.SSS4.p1.7.m5.1.1.3.cmml" xref="S3.SS2.SSS4.p1.7.m5.1.1.3">𝑀</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS4.p1.7.m5.1c">PM</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS4.p1.7.m5.1d">italic_P italic_M</annotation></semantics></math> denotes the set of photometric measures, which encompasses the image-quality measures PSNR, SSIM, and LPIPS, with thresholded <math alttext="L_{1}" class="ltx_Math" display="inline" id="S3.SS2.SSS4.p1.8.m6.1"><semantics id="S3.SS2.SSS4.p1.8.m6.1a"><msub id="S3.SS2.SSS4.p1.8.m6.1.1" xref="S3.SS2.SSS4.p1.8.m6.1.1.cmml"><mi id="S3.SS2.SSS4.p1.8.m6.1.1.2" xref="S3.SS2.SSS4.p1.8.m6.1.1.2.cmml">L</mi><mn id="S3.SS2.SSS4.p1.8.m6.1.1.3" xref="S3.SS2.SSS4.p1.8.m6.1.1.3.cmml">1</mn></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS4.p1.8.m6.1b"><apply id="S3.SS2.SSS4.p1.8.m6.1.1.cmml" xref="S3.SS2.SSS4.p1.8.m6.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS4.p1.8.m6.1.1.1.cmml" xref="S3.SS2.SSS4.p1.8.m6.1.1">subscript</csymbol><ci id="S3.SS2.SSS4.p1.8.m6.1.1.2.cmml" xref="S3.SS2.SSS4.p1.8.m6.1.1.2">𝐿</ci><cn 
id="S3.SS2.SSS4.p1.8.m6.1.1.3.cmml" type="integer" xref="S3.SS2.SSS4.p1.8.m6.1.1.3">1</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS4.p1.8.m6.1c">L_{1}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS4.p1.8.m6.1d">italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT</annotation></semantics></math> regularization; <math alttext="\Delta" class="ltx_Math" display="inline" id="S3.SS2.SSS4.p1.9.m7.1"><semantics id="S3.SS2.SSS4.p1.9.m7.1a"><mi id="S3.SS2.SSS4.p1.9.m7.1.1" mathvariant="normal" xref="S3.SS2.SSS4.p1.9.m7.1.1.cmml">Δ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS4.p1.9.m7.1b"><ci id="S3.SS2.SSS4.p1.9.m7.1.1.cmml" xref="S3.SS2.SSS4.p1.9.m7.1.1">Δ</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS4.p1.9.m7.1c">\Delta</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS4.p1.9.m7.1d">roman_Δ</annotation></semantics></math> represents depth offsets with <math alttext="L_{2}" class="ltx_Math" display="inline" id="S3.SS2.SSS4.p1.10.m8.1"><semantics id="S3.SS2.SSS4.p1.10.m8.1a"><msub id="S3.SS2.SSS4.p1.10.m8.1.1" xref="S3.SS2.SSS4.p1.10.m8.1.1.cmml"><mi id="S3.SS2.SSS4.p1.10.m8.1.1.2" xref="S3.SS2.SSS4.p1.10.m8.1.1.2.cmml">L</mi><mn id="S3.SS2.SSS4.p1.10.m8.1.1.3" xref="S3.SS2.SSS4.p1.10.m8.1.1.3.cmml">2</mn></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS4.p1.10.m8.1b"><apply id="S3.SS2.SSS4.p1.10.m8.1.1.cmml" xref="S3.SS2.SSS4.p1.10.m8.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS4.p1.10.m8.1.1.1.cmml" xref="S3.SS2.SSS4.p1.10.m8.1.1">subscript</csymbol><ci id="S3.SS2.SSS4.p1.10.m8.1.1.2.cmml" xref="S3.SS2.SSS4.p1.10.m8.1.1.2">𝐿</ci><cn id="S3.SS2.SSS4.p1.10.m8.1.1.3.cmml" type="integer" xref="S3.SS2.SSS4.p1.10.m8.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS4.p1.10.m8.1c">L_{2}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS4.p1.10.m8.1d">italic_L 
start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT</annotation></semantics></math> regularization, and <math alttext="J" class="ltx_Math" display="inline" id="S3.SS2.SSS4.p1.11.m9.1"><semantics id="S3.SS2.SSS4.p1.11.m9.1a"><mi id="S3.SS2.SSS4.p1.11.m9.1.1" xref="S3.SS2.SSS4.p1.11.m9.1.1.cmml">J</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS4.p1.11.m9.1b"><ci id="S3.SS2.SSS4.p1.11.m9.1.1.cmml" xref="S3.SS2.SSS4.p1.11.m9.1.1">𝐽</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS4.p1.11.m9.1c">J</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS4.p1.11.m9.1d">italic_J</annotation></semantics></math> is the target image we aim to approximate. The <math alttext="L_{1}" class="ltx_Math" display="inline" id="S3.SS2.SSS4.p1.12.m10.1"><semantics id="S3.SS2.SSS4.p1.12.m10.1a"><msub id="S3.SS2.SSS4.p1.12.m10.1.1" xref="S3.SS2.SSS4.p1.12.m10.1.1.cmml"><mi id="S3.SS2.SSS4.p1.12.m10.1.1.2" xref="S3.SS2.SSS4.p1.12.m10.1.1.2.cmml">L</mi><mn id="S3.SS2.SSS4.p1.12.m10.1.1.3" xref="S3.SS2.SSS4.p1.12.m10.1.1.3.cmml">1</mn></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS4.p1.12.m10.1b"><apply id="S3.SS2.SSS4.p1.12.m10.1.1.cmml" xref="S3.SS2.SSS4.p1.12.m10.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS4.p1.12.m10.1.1.1.cmml" xref="S3.SS2.SSS4.p1.12.m10.1.1">subscript</csymbol><ci id="S3.SS2.SSS4.p1.12.m10.1.1.2.cmml" xref="S3.SS2.SSS4.p1.12.m10.1.1.2">𝐿</ci><cn id="S3.SS2.SSS4.p1.12.m10.1.1.3.cmml" type="integer" xref="S3.SS2.SSS4.p1.12.m10.1.1.3">1</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS4.p1.12.m10.1c">L_{1}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS4.p1.12.m10.1d">italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT</annotation></semantics></math> and <math alttext="L_{2}" class="ltx_Math" display="inline" id="S3.SS2.SSS4.p1.13.m11.1"><semantics id="S3.SS2.SSS4.p1.13.m11.1a"><msub id="S3.SS2.SSS4.p1.13.m11.1.1" xref="S3.SS2.SSS4.p1.13.m11.1.1.cmml"><mi 
id="S3.SS2.SSS4.p1.13.m11.1.1.2" xref="S3.SS2.SSS4.p1.13.m11.1.1.2.cmml">L</mi><mn id="S3.SS2.SSS4.p1.13.m11.1.1.3" xref="S3.SS2.SSS4.p1.13.m11.1.1.3.cmml">2</mn></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.SSS4.p1.13.m11.1b"><apply id="S3.SS2.SSS4.p1.13.m11.1.1.cmml" xref="S3.SS2.SSS4.p1.13.m11.1.1"><csymbol cd="ambiguous" id="S3.SS2.SSS4.p1.13.m11.1.1.1.cmml" xref="S3.SS2.SSS4.p1.13.m11.1.1">subscript</csymbol><ci id="S3.SS2.SSS4.p1.13.m11.1.1.2.cmml" xref="S3.SS2.SSS4.p1.13.m11.1.1.2">𝐿</ci><cn id="S3.SS2.SSS4.p1.13.m11.1.1.3.cmml" type="integer" xref="S3.SS2.SSS4.p1.13.m11.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.SSS4.p1.13.m11.1c">L_{2}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.SSS4.p1.13.m11.1d">italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT</annotation></semantics></math> loss measures correspond to the sum of absolute differences and the sum of squared differences (the squared Euclidean distance), respectively, between the synthesized and target images.</p> </div> </section> </section> </section> <section class="ltx_section" id="S4"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">4 </span>Experimental Results</h2> <section class="ltx_subsection" id="S4.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.1 </span>Experiment Setup</h3> <figure class="ltx_table" id="S4.T1"> <div class="ltx_inline-block ltx_transformed_outer" id="S4.T1.2" style="width:433.6pt;height:97.4pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="transform:translate(-143.9pt,32.3pt) scale(0.600996973667664,0.600996973667664) ;"> <table class="ltx_tabular ltx_guessed_headers ltx_align_middle" id="S4.T1.2.1"> <thead class="ltx_thead"> <tr class="ltx_tr" id="S4.T1.2.1.1.1"> <th class="ltx_td ltx_align_left ltx_th ltx_th_column ltx_th_row ltx_border_tt" id="S4.T1.2.1.1.1.1" rowspan="2" style="padding-left:9.0pt;padding-right:9.0pt;"><span class="ltx_text" 
id="S4.T1.2.1.1.1.1.1">Method</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" colspan="3" id="S4.T1.2.1.1.1.2" style="padding-left:9.0pt;padding-right:9.0pt;">5 frames</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" colspan="3" id="S4.T1.2.1.1.1.3" style="padding-left:9.0pt;padding-right:9.0pt;">10 frames</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" colspan="3" id="S4.T1.2.1.1.1.4" style="padding-left:9.0pt;padding-right:9.0pt;">u[-30, 30] frames</th> </tr> <tr class="ltx_tr" id="S4.T1.2.1.2.2"> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T1.2.1.2.2.1" style="padding-left:9.0pt;padding-right:9.0pt;">PSNR↑</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T1.2.1.2.2.2" style="padding-left:9.0pt;padding-right:9.0pt;">SSIM↑</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T1.2.1.2.2.3" style="padding-left:9.0pt;padding-right:9.0pt;">LPIPS↓</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T1.2.1.2.2.4" style="padding-left:9.0pt;padding-right:9.0pt;"> PSNR↑</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T1.2.1.2.2.5" style="padding-left:9.0pt;padding-right:9.0pt;">SSIM↑</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T1.2.1.2.2.6" style="padding-left:9.0pt;padding-right:9.0pt;">LPIPS↓</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T1.2.1.2.2.7" style="padding-left:9.0pt;padding-right:9.0pt;"> PSNR↑</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T1.2.1.2.2.8" style="padding-left:9.0pt;padding-right:9.0pt;">SSIM↑</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T1.2.1.2.2.9" style="padding-left:9.0pt;padding-right:9.0pt;">LPIPS↓</th> </tr> </thead> <tbody class="ltx_tbody"> <tr 
class="ltx_tr" id="S4.T1.2.1.3.1"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S4.T1.2.1.3.1.1" style="padding-left:9.0pt;padding-right:9.0pt;">Syn-Sin <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib73" title="">73</a>]</cite> </th> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T1.2.1.3.1.2" style="padding-left:9.0pt;padding-right:9.0pt;">-</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T1.2.1.3.1.3" style="padding-left:9.0pt;padding-right:9.0pt;">-</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T1.2.1.3.1.4" style="padding-left:9.0pt;padding-right:9.0pt;">-</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T1.2.1.3.1.5" style="padding-left:9.0pt;padding-right:9.0pt;"> -</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T1.2.1.3.1.6" style="padding-left:9.0pt;padding-right:9.0pt;">-</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T1.2.1.3.1.7" style="padding-left:9.0pt;padding-right:9.0pt;">-</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T1.2.1.3.1.8" style="padding-left:9.0pt;padding-right:9.0pt;"> 22.30</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T1.2.1.3.1.9" style="padding-left:9.0pt;padding-right:9.0pt;">0.740</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T1.2.1.3.1.10" style="padding-left:9.0pt;padding-right:9.0pt;">-</td> </tr> <tr class="ltx_tr" id="S4.T1.2.1.4.2"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S4.T1.2.1.4.2.1" style="padding-left:9.0pt;padding-right:9.0pt;">SV-MPI <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib63" title="">63</a>]</cite> </th> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.4.2.2" style="padding-left:9.0pt;padding-right:9.0pt;">27.10</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.4.2.3" 
style="padding-left:9.0pt;padding-right:9.0pt;">0.870</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.4.2.4" style="padding-left:9.0pt;padding-right:9.0pt;">-</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.4.2.5" style="padding-left:9.0pt;padding-right:9.0pt;"> 24.40</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.4.2.6" style="padding-left:9.0pt;padding-right:9.0pt;">0.812</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.4.2.7" style="padding-left:9.0pt;padding-right:9.0pt;">-</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.4.2.8" style="padding-left:9.0pt;padding-right:9.0pt;"> 23.52</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.4.2.9" style="padding-left:9.0pt;padding-right:9.0pt;">0.785</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.4.2.10" style="padding-left:9.0pt;padding-right:9.0pt;">-</td> </tr> <tr class="ltx_tr" id="S4.T1.2.1.5.3"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S4.T1.2.1.5.3.1" style="padding-left:9.0pt;padding-right:9.0pt;">BTS <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib74" title="">74</a>]</cite> </th> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.5.3.2" style="padding-left:9.0pt;padding-right:9.0pt;">-</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.5.3.3" style="padding-left:9.0pt;padding-right:9.0pt;">-</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.5.3.4" style="padding-left:9.0pt;padding-right:9.0pt;">-</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.5.3.5" style="padding-left:9.0pt;padding-right:9.0pt;"> -</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.5.3.6" style="padding-left:9.0pt;padding-right:9.0pt;">-</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.5.3.7" style="padding-left:9.0pt;padding-right:9.0pt;">-</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.5.3.8" style="padding-left:9.0pt;padding-right:9.0pt;"> 24.00</td> <td class="ltx_td 
ltx_align_center" id="S4.T1.2.1.5.3.9" style="padding-left:9.0pt;padding-right:9.0pt;">0.755</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.5.3.10" style="padding-left:9.0pt;padding-right:9.0pt;">0.194</td> </tr> <tr class="ltx_tr" id="S4.T1.2.1.6.4"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S4.T1.2.1.6.4.1" style="padding-left:9.0pt;padding-right:9.0pt;">Splatter Image <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib61" title="">61</a>]</cite> </th> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.6.4.2" style="padding-left:9.0pt;padding-right:9.0pt;">28.15</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.6.4.3" style="padding-left:9.0pt;padding-right:9.0pt;">0.894</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.6.4.4" style="padding-left:9.0pt;padding-right:9.0pt;">0.110</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.6.4.5" style="padding-left:9.0pt;padding-right:9.0pt;"> 25.34</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.6.4.6" style="padding-left:9.0pt;padding-right:9.0pt;">0.842</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.6.4.7" style="padding-left:9.0pt;padding-right:9.0pt;">0.144</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.6.4.8" style="padding-left:9.0pt;padding-right:9.0pt;"> 24.15</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.6.4.9" style="padding-left:9.0pt;padding-right:9.0pt;">0.810</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.6.4.10" style="padding-left:9.0pt;padding-right:9.0pt;">0.177</td> </tr> <tr class="ltx_tr" id="S4.T1.2.1.7.5"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S4.T1.2.1.7.5.1" style="padding-left:9.0pt;padding-right:9.0pt;">MINE <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib36" title="">36</a>]</cite> </th> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.7.5.2" 
style="padding-left:9.0pt;padding-right:9.0pt;">28.45</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.7.5.3" style="padding-left:9.0pt;padding-right:9.0pt;">0.897</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.7.5.4" style="padding-left:9.0pt;padding-right:9.0pt;">0.111</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.7.5.5" style="padding-left:9.0pt;padding-right:9.0pt;"> 25.89</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.7.5.6" style="padding-left:9.0pt;padding-right:9.0pt;">0.850</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.7.5.7" style="padding-left:9.0pt;padding-right:9.0pt;">0.150</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.7.5.8" style="padding-left:9.0pt;padding-right:9.0pt;"> 24.75</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.7.5.9" style="padding-left:9.0pt;padding-right:9.0pt;">0.820</td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.7.5.10" style="padding-left:9.0pt;padding-right:9.0pt;">0.179</td> </tr> <tr class="ltx_tr" id="S4.T1.2.1.8.6"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S4.T1.2.1.8.6.1" style="padding-left:9.0pt;padding-right:9.0pt;">Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite> </th> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.8.6.2" style="padding-left:9.0pt;padding-right:9.0pt;"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T1.2.1.8.6.2.1">28.46</span></td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.8.6.3" style="padding-left:9.0pt;padding-right:9.0pt;"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T1.2.1.8.6.3.1">0.899</span></td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.8.6.4" style="padding-left:9.0pt;padding-right:9.0pt;"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T1.2.1.8.6.4.1">0.100</span></td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.8.6.5" 
style="padding-left:9.0pt;padding-right:9.0pt;"> <span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T1.2.1.8.6.5.1">25.94</span> </td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.8.6.6" style="padding-left:9.0pt;padding-right:9.0pt;"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T1.2.1.8.6.6.1">0.857</span></td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.8.6.7" style="padding-left:9.0pt;padding-right:9.0pt;"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T1.2.1.8.6.7.1">0.133</span></td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.8.6.8" style="padding-left:9.0pt;padding-right:9.0pt;"> <span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T1.2.1.8.6.8.1">24.93</span> </td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.8.6.9" style="padding-left:9.0pt;padding-right:9.0pt;"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T1.2.1.8.6.9.1">0.833</span></td> <td class="ltx_td ltx_align_center" id="S4.T1.2.1.8.6.10" style="padding-left:9.0pt;padding-right:9.0pt;"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T1.2.1.8.6.10.1">0.160</span></td> </tr> <tr class="ltx_tr" id="S4.T1.2.1.9.7"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb ltx_border_t" id="S4.T1.2.1.9.7.1" style="padding-left:9.0pt;padding-right:9.0pt;">Ours</th> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T1.2.1.9.7.2" style="padding-left:9.0pt;padding-right:9.0pt;"><span class="ltx_text ltx_font_bold" id="S4.T1.2.1.9.7.2.1">29.00</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T1.2.1.9.7.3" style="padding-left:9.0pt;padding-right:9.0pt;"><span class="ltx_text ltx_font_bold" id="S4.T1.2.1.9.7.3.1">0.904</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T1.2.1.9.7.4" style="padding-left:9.0pt;padding-right:9.0pt;"><span class="ltx_text ltx_font_bold" id="S4.T1.2.1.9.7.4.1">0.099</span></td> <td 
class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T1.2.1.9.7.5" style="padding-left:9.0pt;padding-right:9.0pt;"> <span class="ltx_text ltx_font_bold" id="S4.T1.2.1.9.7.5.1">26.30</span> </td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T1.2.1.9.7.6" style="padding-left:9.0pt;padding-right:9.0pt;"><span class="ltx_text ltx_font_bold" id="S4.T1.2.1.9.7.6.1">0.862</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T1.2.1.9.7.7" style="padding-left:9.0pt;padding-right:9.0pt;"><span class="ltx_text ltx_font_bold" id="S4.T1.2.1.9.7.7.1">0.131</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T1.2.1.9.7.8" style="padding-left:9.0pt;padding-right:9.0pt;"> <span class="ltx_text ltx_font_bold" id="S4.T1.2.1.9.7.8.1">25.28</span> </td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T1.2.1.9.7.9" style="padding-left:9.0pt;padding-right:9.0pt;"><span class="ltx_text ltx_font_bold" id="S4.T1.2.1.9.7.9.1">0.836</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T1.2.1.9.7.10" style="padding-left:9.0pt;padding-right:9.0pt;"><span class="ltx_text ltx_font_bold" id="S4.T1.2.1.9.7.10.1">0.156</span></td> </tr> </tbody> </table> </span></div> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S4.T1.7.1.1" style="font-size:90%;">Table 1</span>: </span><span class="ltx_text ltx_font_bold" id="S4.T1.8.2" style="font-size:90%;">Novel view synthesis comparison on the RealEstate10K dataset <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib85" title="">85</a>]</cite>.<span class="ltx_text ltx_font_medium" id="S4.T1.8.2.1"> Following Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite>, we evaluate our method on the in-domain novel view 
synthesis task. Our model outperforms <span class="ltx_text ltx_font_italic" id="S4.T1.8.2.1.1">all</span> compared methods in every frame setting (5 frames, 10 frames, u[-30,30] frames) in terms of PSNR, SSIM, and LPIPS. (</span>Best<span class="ltx_text ltx_font_medium" id="S4.T1.8.2.2"> results are in bold, <span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T1.8.2.2.1">second best</span> underlined.) </span></span></figcaption> </figure> <div class="ltx_para" id="S4.SS1.p1"> <p class="ltx_p" id="S4.SS1.p1.1"><span class="ltx_text ltx_font_bold" id="S4.SS1.p1.1.1">Datasets.</span> We use the RealEstate10K (RE10K) dataset <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib85" title="">85</a>]</cite>, a large video dataset intended for 3D scene reconstruction and novel view synthesis. RE10K consists of real estate videos from YouTube, with 67,477 scenes for training and 7,289 scenes for testing. Its scale and diversity support a robust evaluation of model generalization. To ensure reliability, we randomly sample 3,205 frames, each drawn from within ±30 frames of the input frame. RE10K is an important resource for virtual reality, augmented reality, and SLAM research. Notably, we use only <span class="ltx_text ltx_font_bold" id="S4.SS1.p1.1.2">87.5%<span class="ltx_note ltx_role_footnote" id="footnote1"><sup class="ltx_note_mark">1</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">1</sup><span class="ltx_tag ltx_tag_note"><span class="ltx_text ltx_font_medium" id="footnote1.1.1.1">1</span></span><span class="ltx_text ltx_font_medium" id="footnote1.9">Some of the original RE10K URLs had expired by the time we downloaded the data, so we could only train on an incomplete subset in this work.</span></span></span></span></span> of the training data used by Flash3D, the prior SoTA method. 
Yet, as our results will show, the proposed method still achieves <span class="ltx_text ltx_font_italic" id="S4.SS1.p1.1.3">better</span> performance than Flash3D.</p> </div> <div class="ltx_para" id="S4.SS1.p2"> <p class="ltx_p" id="S4.SS1.p2.1">We also compare with Flash3D under the same conditions on the KITTI <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib25" title="">25</a>]</cite> dataset. The results, where our method also performs better, are deferred to the supplementary material together with a detailed discussion of the comparison setup.</p> </div> <figure class="ltx_figure" id="S4.F4"> <table class="ltx_tabular ltx_centering ltx_align_middle" id="S4.F4.16"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S4.F4.4.4"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S4.F4.1.1.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.1.1.1.g1" src="x28.png" width="198"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F4.2.2.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.2.2.2.g1" src="x29.png" width="198"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F4.3.3.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.3.3.3.g1" src="x30.png" width="198"/></td> <td class="ltx_td ltx_nopad_l ltx_align_center" id="S4.F4.4.4.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.4.4.4.g1" src="x31.png" width="198"/></td> </tr> <tr class="ltx_tr" id="S4.F4.8.8"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S4.F4.5.5.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" 
class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.5.5.1.g1" src="x32.png" width="198"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F4.6.6.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.6.6.2.g1" src="x33.png" width="198"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F4.7.7.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.7.7.3.g1" src="x34.png" width="198"/></td> <td class="ltx_td ltx_nopad_l ltx_align_center" id="S4.F4.8.8.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.8.8.4.g1" src="x35.png" width="198"/></td> </tr> <tr class="ltx_tr" id="S4.F4.12.12"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S4.F4.9.9.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.9.9.1.g1" src="x36.png" width="198"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F4.10.10.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.10.10.2.g1" src="x37.png" width="198"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F4.11.11.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.11.11.3.g1" src="x38.png" width="198"/></td> <td class="ltx_td ltx_nopad_l ltx_align_center" id="S4.F4.12.12.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.12.12.4.g1" src="x39.png" width="198"/></td> </tr> <tr class="ltx_tr" id="S4.F4.16.16"> <td 
class="ltx_td ltx_nopad_r ltx_align_center" id="S4.F4.13.13.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.13.13.1.g1" src="x40.png" width="198"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F4.14.14.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.14.14.2.g1" src="x41.png" width="198"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F4.15.15.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.15.15.3.g1" src="x42.png" width="198"/></td> <td class="ltx_td ltx_nopad_l ltx_align_center" id="S4.F4.16.16.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="132" id="S4.F4.16.16.4.g1" src="x43.png" width="198"/></td> </tr> <tr class="ltx_tr" id="S4.F4.16.17.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S4.F4.16.17.1.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text" id="S4.F4.16.17.1.1.1" style="font-size:90%;">(a) Input</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F4.16.17.1.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text" id="S4.F4.16.17.1.2.1" style="font-size:90%;">(b) Flash3D</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F4.16.17.1.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text" id="S4.F4.16.17.1.3.1" style="font-size:90%;">(c) Ours</span></td> <td class="ltx_td ltx_nopad_l ltx_align_center" id="S4.F4.16.17.1.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text" id="S4.F4.16.17.1.4.1" style="font-size:90%;">(d) GT</span></td> </tr> </tbody> </table> <figcaption class="ltx_caption 
ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S4.F4.19.1.1" style="font-size:90%;">Figure 4</span>: </span><span class="ltx_text ltx_font_bold" id="S4.F4.20.2" style="font-size:90%;">Qualitative comparison.<span class="ltx_text ltx_font_medium" id="S4.F4.20.2.1"> From top to bottom: Row 1 shows the 5 frames setup, Row 2 the 10 frames setup, and Rows 3 and 4 the u[-30, 30] frames setup. Our Niagara mitigates the color overflow issue and produces reconstructed images with improved geometric details.</span></span></figcaption> </figure> <div class="ltx_para ltx_noindent" id="S4.SS1.p3"> <p class="ltx_p" id="S4.SS1.p3.1"><span class="ltx_text ltx_font_bold" id="S4.SS1.p3.1.1">Metrics and Comparison Methods.</span> We evaluate the models on the MINE <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib36" title="">36</a>]</cite> test split and use PSNR, SSIM, and LPIPS <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib10" title="">10</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib70" title="">70</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib82" title="">82</a>]</cite> as the metrics. 
We compare our proposed method with several SoTA single-view reconstruction approaches, including <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib73" title="">73</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib63" title="">63</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib74" title="">74</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib36" title="">36</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite>, and an improved version of Splatter Image <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib71" title="">71</a>]</cite>. We also evaluate our method against the advanced <span class="ltx_text ltx_font_italic" id="S4.SS1.p3.1.2">dual-view</span> <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib87" title="">87</a>]</cite> and novel view synthesis approaches <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib22" title="">22</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib56" title="">56</a>]</cite>. 
As we will show, despite using only a single view, our method outperforms these dual-view methods in many cases.</p> </div> <div class="ltx_para ltx_noindent" id="S4.SS1.p4"> <p class="ltx_p" id="S4.SS1.p4.1"><span class="ltx_text ltx_font_bold" id="S4.SS1.p4.1.1">Training Details.</span> Niagara employs three pre-trained models: a UniDepth model <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib46" title="">46</a>]</cite>, a StableNormal model <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib79" title="">79</a>]</cite>, and a ResNet50 encoder <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib28" title="">28</a>]</cite>. Additionally, we use multiple depth offset decoders, normal offset decoders, and Gaussian decoders, together with geometric constraints. Given the large size of the RE10K dataset, we pre-extract depth maps with UniDepth and normal maps with StableNormal. 
The model is trained for 50,000 iterations with a batch size of 16 on a single A6000 GPU, which takes around 26 hours.</p> </div> <figure class="ltx_table" id="S4.T2"> <div class="ltx_inline-block ltx_align_center ltx_transformed_outer" id="S4.T2.2" style="width:433.6pt;height:86.4pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="transform:translate(-144.4pt,28.8pt) scale(0.600211387671227,0.600211387671227) ;"> <table class="ltx_tabular ltx_guessed_headers ltx_align_middle" id="S4.T2.2.1"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S4.T2.2.1.1.1"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_tt" id="S4.T2.2.1.1.1.1" rowspan="2" style="padding-left:17.0pt;padding-right:17.0pt;"> <span class="ltx_text" id="S4.T2.2.1.1.1.1.1">Method</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_tt" id="S4.T2.2.1.1.1.2" rowspan="2" style="padding-left:17.0pt;padding-right:17.0pt;"> <span class="ltx_text" id="S4.T2.2.1.1.1.2.1">Input Views</span></th> <td class="ltx_td ltx_align_center ltx_border_tt" colspan="3" id="S4.T2.2.1.1.1.3" style="padding-left:17.0pt;padding-right:17.0pt;"> RE10K Interpolation</td> <td class="ltx_td ltx_align_center ltx_border_tt" colspan="3" id="S4.T2.2.1.1.1.4" style="padding-left:17.0pt;padding-right:17.0pt;"> RE10K Extrapolation</td> </tr> <tr class="ltx_tr" id="S4.T2.2.1.2.2"> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.2.2.1" style="padding-left:17.0pt;padding-right:17.0pt;"> PSNR↑</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.2.2.2" style="padding-left:17.0pt;padding-right:17.0pt;"> SSIM↑</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.2.2.3" style="padding-left:17.0pt;padding-right:17.0pt;"> LPIPS↓</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.2.2.4" style="padding-left:17.0pt;padding-right:17.0pt;"> PSNR↑</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.2.2.5" 
style="padding-left:17.0pt;padding-right:17.0pt;"> SSIM↑</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.2.2.6" style="padding-left:17.0pt;padding-right:17.0pt;"> LPIPS↓</td> </tr> <tr class="ltx_tr" id="S4.T2.2.1.3.3"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S4.T2.2.1.3.3.1" style="padding-left:17.0pt;padding-right:17.0pt;"> Du <span class="ltx_text ltx_font_italic" id="S4.T2.2.1.3.3.1.1">et al.</span> <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib18" title="">18</a>]</cite></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t" id="S4.T2.2.1.3.3.2" style="padding-left:17.0pt;padding-right:17.0pt;"> 2</th> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.3.3.3" style="padding-left:17.0pt;padding-right:17.0pt;"> 24.78</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.3.3.4" style="padding-left:17.0pt;padding-right:17.0pt;"> 0.820</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.3.3.5" style="padding-left:17.0pt;padding-right:17.0pt;"> 0.213</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.3.3.6" style="padding-left:17.0pt;padding-right:17.0pt;"> 21.83</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.3.3.7" style="padding-left:17.0pt;padding-right:17.0pt;"> 0.790</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.3.3.8" style="padding-left:17.0pt;padding-right:17.0pt;"> 0.242</td> </tr> <tr class="ltx_tr" id="S4.T2.2.1.4.4"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S4.T2.2.1.4.4.1" style="padding-left:17.0pt;padding-right:17.0pt;"> pixelSplat <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib12" title="">12</a>]</cite></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_row" id="S4.T2.2.1.4.4.2" style="padding-left:17.0pt;padding-right:17.0pt;"> 2</th> <td 
class="ltx_td ltx_align_center" id="S4.T2.2.1.4.4.3" style="padding-left:17.0pt;padding-right:17.0pt;"> <span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T2.2.1.4.4.3.1">26.09</span></td> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.4.4.4" style="padding-left:17.0pt;padding-right:17.0pt;"> <span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T2.2.1.4.4.4.1">0.864</span></td> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.4.4.5" style="padding-left:17.0pt;padding-right:17.0pt;"> <span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T2.2.1.4.4.5.1">0.136</span></td> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.4.4.6" style="padding-left:17.0pt;padding-right:17.0pt;"> 21.84</td> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.4.4.7" style="padding-left:17.0pt;padding-right:17.0pt;"> 0.777</td> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.4.4.8" style="padding-left:17.0pt;padding-right:17.0pt;"> 0.216</td> </tr> <tr class="ltx_tr" id="S4.T2.2.1.5.5"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S4.T2.2.1.5.5.1" style="padding-left:17.0pt;padding-right:17.0pt;"> latentSplat <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib72" title="">72</a>]</cite></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_row" id="S4.T2.2.1.5.5.2" style="padding-left:17.0pt;padding-right:17.0pt;"> 2</th> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.5.5.3" style="padding-left:17.0pt;padding-right:17.0pt;"> 23.93</td> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.5.5.4" style="padding-left:17.0pt;padding-right:17.0pt;"> 0.812</td> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.5.5.5" style="padding-left:17.0pt;padding-right:17.0pt;"> 0.164</td> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.5.5.6" style="padding-left:17.0pt;padding-right:17.0pt;"> 22.62</td> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.5.5.7" 
style="padding-left:17.0pt;padding-right:17.0pt;"> 0.777</td> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.5.5.8" style="padding-left:17.0pt;padding-right:17.0pt;"> 0.196</td> </tr> <tr class="ltx_tr" id="S4.T2.2.1.6.6"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S4.T2.2.1.6.6.1" style="padding-left:17.0pt;padding-right:17.0pt;"> MVSplat <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib14" title="">14</a>]</cite></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_row" id="S4.T2.2.1.6.6.2" style="padding-left:17.0pt;padding-right:17.0pt;"> 2</th> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.6.6.3" style="padding-left:17.0pt;padding-right:17.0pt;"> <span class="ltx_text ltx_font_bold" id="S4.T2.2.1.6.6.3.1">26.39</span></td> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.6.6.4" style="padding-left:17.0pt;padding-right:17.0pt;"> <span class="ltx_text ltx_font_bold" id="S4.T2.2.1.6.6.4.1">0.869</span></td> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.6.6.5" style="padding-left:17.0pt;padding-right:17.0pt;"> <span class="ltx_text ltx_font_bold" id="S4.T2.2.1.6.6.5.1">0.128</span></td> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.6.6.6" style="padding-left:17.0pt;padding-right:17.0pt;"> 23.04</td> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.6.6.7" style="padding-left:17.0pt;padding-right:17.0pt;"> 0.813</td> <td class="ltx_td ltx_align_center" id="S4.T2.2.1.6.6.8" style="padding-left:17.0pt;padding-right:17.0pt;"> 0.185</td> </tr> <tr class="ltx_tr" id="S4.T2.2.1.7.7"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S4.T2.2.1.7.7.1" style="padding-left:17.0pt;padding-right:17.0pt;"> Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t" id="S4.T2.2.1.7.7.2" 
style="padding-left:17.0pt;padding-right:17.0pt;"> 1</th> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.7.7.3" style="padding-left:17.0pt;padding-right:17.0pt;"> 23.87</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.7.7.4" style="padding-left:17.0pt;padding-right:17.0pt;"> 0.811</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.7.7.5" style="padding-left:17.0pt;padding-right:17.0pt;"> 0.185</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.7.7.6" style="padding-left:17.0pt;padding-right:17.0pt;"> <span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T2.2.1.7.7.6.1">24.10</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.7.7.7" style="padding-left:17.0pt;padding-right:17.0pt;"> <span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T2.2.1.7.7.7.1">0.815</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T2.2.1.7.7.8" style="padding-left:17.0pt;padding-right:17.0pt;"> <span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T2.2.1.7.7.8.1">0.185</span></td> </tr> <tr class="ltx_tr" id="S4.T2.2.1.8.8"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb" id="S4.T2.2.1.8.8.1" style="padding-left:17.0pt;padding-right:17.0pt;"> Ours</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_bb" id="S4.T2.2.1.8.8.2" style="padding-left:17.0pt;padding-right:17.0pt;"> 1</th> <td class="ltx_td ltx_align_center ltx_border_bb" id="S4.T2.2.1.8.8.3" style="padding-left:17.0pt;padding-right:17.0pt;"> 25.24</td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S4.T2.2.1.8.8.4" style="padding-left:17.0pt;padding-right:17.0pt;"> 0.832</td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S4.T2.2.1.8.8.5" style="padding-left:17.0pt;padding-right:17.0pt;"> 0.162</td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S4.T2.2.1.8.8.6" style="padding-left:17.0pt;padding-right:17.0pt;"> <span class="ltx_text 
ltx_font_bold" id="S4.T2.2.1.8.8.6.1">25.16</span></td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S4.T2.2.1.8.8.7" style="padding-left:17.0pt;padding-right:17.0pt;"> <span class="ltx_text ltx_font_bold" id="S4.T2.2.1.8.8.7.1">0.831</span></td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S4.T2.2.1.8.8.8" style="padding-left:17.0pt;padding-right:17.0pt;"> <span class="ltx_text ltx_font_bold" id="S4.T2.2.1.8.8.8.1">0.162</span></td> </tr> </tbody> </table> </span></div> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S4.T2.6.1.1" style="font-size:90%;">Table 2</span>: </span><span class="ltx_text ltx_font_bold" id="S4.T2.7.2" style="font-size:90%;">Comparison of different methods on the RE10K interpolation and extrapolation datasets.<span class="ltx_text ltx_font_medium" id="S4.T2.7.2.1"> For all the methods, the view closest to the target is used as the source. (1) Flash3D and ours are the only two methods that achieve <span class="ltx_text ltx_font_italic" id="S4.T2.7.2.1.1">single-view</span> scene reconstruction. Ours consistently beats Flash3D in PSNR/SSIM/LPIPS. 
(2) Notably, although our method uses only a single view, it can beat some methods that use two input views, such as Du <span class="ltx_text ltx_font_italic" id="S4.T2.7.2.1.2">et al.</span> <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib18" title="">18</a>]</cite> and latentSplat <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib72" title="">72</a>]</cite>.</span></span></figcaption> </figure> <div class="ltx_para ltx_noindent" id="S4.SS1.p5"> <p class="ltx_p" id="S4.SS1.p5.1"><span class="ltx_text ltx_font_bold" id="S4.SS1.p5.1.1">Model Analyses and Ablation Studies.</span> We analyze the geometric structure underlying our reconstruction method in detail to understand its internal mechanisms. We also perform extensive ablation studies to assess the contribution of each design component to the overall performance of the model.</p> </div> </section> <section class="ltx_subsection" id="S4.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.2 </span>Comparison Results with SoTA Methods</h3> <div class="ltx_para ltx_noindent" id="S4.SS2.p1"> <p class="ltx_p" id="S4.SS2.p1.1"><span class="ltx_text ltx_font_bold" id="S4.SS2.p1.1.1">Qualitative Comparison.</span> <a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#S1.F2" title="Figure 2 ‣ 1 Introduction ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Figure 2</span></a> shows the results of our method with the corresponding depth maps and normal maps. The surfaces reconstructed by our method are smooth and consistent across multiple views. The generated outdoor scenes look natural, and indoor geometries are faithfully reconstructed. 
In contrast, Flash3D, due to inadequate learning of Gaussian kernel parameters, produces rainbow banding artifacts and lacks finer details in structures like tables, pillars, and walls.</p> </div> <div class="ltx_para" id="S4.SS2.p2"> <p class="ltx_p" id="S4.SS2.p2.1"><a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#S4.F4" title="Figure 4 ‣ 4.1 Experiment Setup ‣ 4 Experimental Results ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Figure 4</span></a> compares Flash3D and our method against the input and ground-truth (GT) views under the three frame setups. In the 5-frame setup (first row), Flash3D shows significant rendering failures in outdoor scenes, with a complete loss of architectural geometric coherence. The 10-frame setup (second row) reveals structural instability in Flash3D’s output, visible as anomalous edge blurring in columnar structures. In the u[-30,30] frames setup (third row), Flash3D shows severe color bleeding artifacts in open environments, notably non-physical chromatic diffusion at vegetation-architecture boundaries. A further example from the same setup (fourth row) highlights its limitations in reconstructing material boundaries, which leave geometric gaps at junctions between hard surfaces and transparent objects. Niagara maintains consistent structural integrity across all setups, with sharper surface details and far less color contamination. 
Additional comparative analyses are provided in the supplementary material.</p> </div> <figure class="ltx_table" id="S4.T3"> <div class="ltx_inline-block ltx_transformed_outer" id="S4.T3.2" style="width:496.9pt;height:92pt;vertical-align:-0.7pt;"><span class="ltx_transformed_inner" style="transform:translate(-94.3pt,17.3pt) scale(0.724939607792674,0.724939607792674) ;"> <table class="ltx_tabular ltx_guessed_headers ltx_align_middle" id="S4.T3.2.1"> <thead class="ltx_thead"> <tr class="ltx_tr" id="S4.T3.2.1.1.1"> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" id="S4.T3.2.1.1.1.1" rowspan="2"><span class="ltx_text" id="S4.T3.2.1.1.1.1.1">Setting</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" colspan="4" id="S4.T3.2.1.1.1.2">Module</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" colspan="3" id="S4.T3.2.1.1.1.3">5 frames</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" colspan="3" id="S4.T3.2.1.1.1.4">10 frames</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" colspan="3" id="S4.T3.2.1.1.1.5">u[-30,30] frames</th> </tr> <tr class="ltx_tr" id="S4.T3.2.1.2.2"> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T3.2.1.2.2.1">Baseline</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T3.2.1.2.2.2">Normal</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T3.2.1.2.2.3">GAF</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T3.2.1.2.2.4">3D Self-Attention</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T3.2.1.2.2.5">PSNR ↑</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T3.2.1.2.2.6">SSIM ↑</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T3.2.1.2.2.7">LPIPS ↓</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column 
ltx_border_t" id="S4.T3.2.1.2.2.8">PSNR ↑</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T3.2.1.2.2.9">SSIM ↑</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T3.2.1.2.2.10">LPIPS ↓</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T3.2.1.2.2.11">PSNR ↑</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T3.2.1.2.2.12">SSIM ↑</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T3.2.1.2.2.13">LPIPS ↓</th> </tr> </thead> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S4.T3.2.1.3.1"> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T3.2.1.3.1.1">a</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T3.2.1.3.1.2">✓</td> <td class="ltx_td ltx_border_t" id="S4.T3.2.1.3.1.3"></td> <td class="ltx_td ltx_border_t" id="S4.T3.2.1.3.1.4"></td> <td class="ltx_td ltx_border_t" id="S4.T3.2.1.3.1.5"></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T3.2.1.3.1.6">28.46</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T3.2.1.3.1.7">0.899</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T3.2.1.3.1.8">0.100</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T3.2.1.3.1.9">25.94</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T3.2.1.3.1.10">0.857</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T3.2.1.3.1.11">0.133</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T3.2.1.3.1.12">24.93</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T3.2.1.3.1.13">0.833</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T3.2.1.3.1.14">0.160</td> </tr> <tr class="ltx_tr" id="S4.T3.2.1.4.2"> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.4.2.1">b</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.4.2.2">✓</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.4.2.3">✓</td> <td class="ltx_td" id="S4.T3.2.1.4.2.4"></td> <td 
class="ltx_td" id="S4.T3.2.1.4.2.5"></td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.4.2.6">28.70</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.4.2.7">0.903</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.4.2.8">0.099</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.4.2.9">26.14</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.4.2.10">0.862</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.4.2.11">0.131</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.4.2.12">25.10</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.4.2.13">0.837</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.4.2.14">0.157</td> </tr> <tr class="ltx_tr" id="S4.T3.2.1.5.3"> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.5.3.1">c</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.5.3.2">✓</td> <td class="ltx_td" id="S4.T3.2.1.5.3.3"></td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.5.3.4">✓</td> <td class="ltx_td" id="S4.T3.2.1.5.3.5"></td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.5.3.6">28.52</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.5.3.7">0.903</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.5.3.8"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T3.2.1.5.3.8.1">0.098</span></td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.5.3.9">26.03</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.5.3.10">0.861</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.5.3.11">0.130</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.5.3.12">24.99</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.5.3.13">0.836</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.5.3.14">0.156</td> </tr> <tr class="ltx_tr" id="S4.T3.2.1.6.4"> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.6.4.1">d</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.6.4.2">✓</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.6.4.3">✓</td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.6.4.4">✓</td> <td 
class="ltx_td" id="S4.T3.2.1.6.4.5"></td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.6.4.6"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T3.2.1.6.4.6.1">28.76</span></td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.6.4.7"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T3.2.1.6.4.7.1">0.904</span></td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.6.4.8"><span class="ltx_text ltx_font_bold" id="S4.T3.2.1.6.4.8.1">0.095</span></td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.6.4.9"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T3.2.1.6.4.9.1">26.16</span></td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.6.4.10"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T3.2.1.6.4.10.1">0.862</span></td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.6.4.11"><span class="ltx_text ltx_font_bold" id="S4.T3.2.1.6.4.11.1">0.128</span></td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.6.4.12"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T3.2.1.6.4.12.1">25.16</span></td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.6.4.13"><span class="ltx_text ltx_font_bold" id="S4.T3.2.1.6.4.13.1">0.837</span></td> <td class="ltx_td ltx_align_center" id="S4.T3.2.1.6.4.14"><span class="ltx_text ltx_font_bold" id="S4.T3.2.1.6.4.14.1">0.154</span></td> </tr> <tr class="ltx_tr" id="S4.T3.2.1.7.5"> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T3.2.1.7.5.1">Ours</td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T3.2.1.7.5.2">✓</td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T3.2.1.7.5.3">✓</td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T3.2.1.7.5.4">✓</td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T3.2.1.7.5.5">✓</td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T3.2.1.7.5.6"><span class="ltx_text ltx_font_bold" 
id="S4.T3.2.1.7.5.6.1">29.00</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T3.2.1.7.5.7"><span class="ltx_text ltx_font_bold" id="S4.T3.2.1.7.5.7.1">0.904</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T3.2.1.7.5.8">0.099</td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T3.2.1.7.5.9"><span class="ltx_text ltx_font_bold" id="S4.T3.2.1.7.5.9.1">26.30</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T3.2.1.7.5.10"><span class="ltx_text ltx_font_bold" id="S4.T3.2.1.7.5.10.1">0.862</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T3.2.1.7.5.11"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T3.2.1.7.5.11.1">0.131</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T3.2.1.7.5.12"><span class="ltx_text ltx_font_bold" id="S4.T3.2.1.7.5.12.1">25.28</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T3.2.1.7.5.13"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T3.2.1.7.5.13.1">0.836</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S4.T3.2.1.7.5.14"><span class="ltx_text ltx_framed ltx_framed_underline" id="S4.T3.2.1.7.5.14.1">0.156</span></td> </tr> </tbody> </table> </span></div> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S4.T3.4.1.1" style="font-size:90%;">Table 3</span>: </span><span class="ltx_text ltx_font_bold" id="S4.T3.5.2" style="font-size:90%;">Ablation study with quantitative results.<span class="ltx_text ltx_font_medium" id="S4.T3.5.2.1"> The three major new components introduced in our method are the normal, GAF (Geometric Affine Field) and 3D self-attention, so here we examine their influence on quantitative performance.</span></span></figcaption> </figure> <div class="ltx_para ltx_noindent" id="S4.SS2.p3"> <p class="ltx_p" 
id="S4.SS2.p3.1"><span class="ltx_text ltx_font_bold" id="S4.SS2.p3.1.1">Quantitative Comparison.</span> <a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#S4.T1" title="Table 1 ‣ 4.1 Experiment Setup ‣ 4 Experimental Results ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Table 1</span></a> presents our in-domain evaluations on RE10K. We report metrics assessing zero-shot reconstruction quality and compare our results with those of other SoTA methods. Across the different comparison setups, our method consistently achieves superior performance on <span class="ltx_text ltx_font_italic" id="S4.SS2.p3.1.2">all</span> metrics: it attains higher PSNR/SSIM while significantly reducing LPIPS, indicating both high-quality reconstruction and accurate geometry.</p> </div> <figure class="ltx_figure" id="S4.F5"> <table class="ltx_tabular ltx_centering ltx_align_middle" id="S4.F5.20"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S4.F5.5.5"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S4.F5.1.1.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S4.F5.1.1.1.g1" src="x44.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F5.2.2.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S4.F5.2.2.2.g1" src="x45.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F5.3.3.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S4.F5.3.3.3.g1" src="x46.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F5.4.4.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" 
class="ltx_graphics ltx_img_landscape" height="107" id="S4.F5.4.4.4.g1" src="x47.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_align_center" id="S4.F5.5.5.5" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="106" id="S4.F5.5.5.5.g1" src="x48.png" width="161"/></td> </tr> <tr class="ltx_tr" id="S4.F5.10.10"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S4.F5.6.6.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="106" id="S4.F5.6.6.1.g1" src="x49.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F5.7.7.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="106" id="S4.F5.7.7.2.g1" src="x50.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F5.8.8.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="106" id="S4.F5.8.8.3.g1" src="x51.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F5.9.9.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="106" id="S4.F5.9.9.4.g1" src="x52.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_align_center" id="S4.F5.10.10.5" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="106" id="S4.F5.10.10.5.g1" src="x53.png" width="161"/></td> </tr> <tr class="ltx_tr" id="S4.F5.15.15"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S4.F5.11.11.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S4.F5.11.11.1.g1" src="x54.png" width="161"/></td> <td class="ltx_td 
ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F5.12.12.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S4.F5.12.12.2.g1" src="x55.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F5.13.13.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S4.F5.13.13.3.g1" src="x56.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F5.14.14.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="107" id="S4.F5.14.14.4.g1" src="x57.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_align_center" id="S4.F5.15.15.5" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="106" id="S4.F5.15.15.5.g1" src="x58.png" width="161"/></td> </tr> <tr class="ltx_tr" id="S4.F5.20.20"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S4.F5.16.16.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="106" id="S4.F5.16.16.1.g1" src="x59.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F5.17.17.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="106" id="S4.F5.17.17.2.g1" src="x60.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F5.18.18.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="106" id="S4.F5.18.18.3.g1" src="x61.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F5.19.19.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to 
caption" class="ltx_graphics ltx_img_landscape" height="106" id="S4.F5.19.19.4.g1" src="x62.png" width="161"/></td> <td class="ltx_td ltx_nopad_l ltx_align_center" id="S4.F5.20.20.5" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="106" id="S4.F5.20.20.5.g1" src="x63.png" width="161"/></td> </tr> <tr class="ltx_tr" id="S4.F5.20.21.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S4.F5.20.21.1.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text" id="S4.F5.20.21.1.1.1" style="font-size:90%;">(a) Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite></span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F5.20.21.1.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text" id="S4.F5.20.21.1.2.1" style="font-size:90%;">(b) w/o Normal</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F5.20.21.1.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text" id="S4.F5.20.21.1.3.1" style="font-size:90%;">(c) w/o GAF</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S4.F5.20.21.1.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text" id="S4.F5.20.21.1.4.1" style="font-size:90%;">(d) w/ GAF, Normal</span></td> <td class="ltx_td ltx_nopad_l ltx_align_center" id="S4.F5.20.21.1.5" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text" id="S4.F5.20.21.1.5.1" style="font-size:90%;">(e) Ours</span></td> </tr> </tbody> </table> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S4.F5.23.1.1" style="font-size:90%;">Figure 5</span>: </span><span class="ltx_text ltx_font_bold" id="S4.F5.24.2" style="font-size:90%;">Ablation study with visual results.<span class="ltx_text ltx_font_medium" 
id="S4.F5.24.2.1"> All models are built upon the “base” model, which uses only depth information. GAF denotes the Geometric Affine Field, and “Ours” adds 3D self-attention on top of (d).</span></span></figcaption> </figure> <div class="ltx_para" id="S4.SS2.p4"> <p class="ltx_p" id="S4.SS2.p4.1">To further evaluate the effectiveness of Niagara, we test interpolation under the pixelSplat <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib12" title="">12</a>]</cite> setup and extrapolation under the latentSplat <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib72" title="">72</a>]</cite> setup. Note that, unlike existing two-view methods, which typically interpolate between two source views, Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite> always extrapolates from a single view. <a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#S4.T2" title="Table 2 ‣ 4.1 Experiment Setup ‣ 4 Experimental Results ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Table 2</span></a> compares our method with these two-view approaches and Flash3D. Although Niagara does not surpass the two-view methods in interpolation, owing to the disadvantage of taking only one view as input, it still achieves a notable <span class="ltx_text ltx_font_bold" id="S4.SS2.p4.1.1">10.8%</span> LPIPS reduction over Flash3D. 
Additionally, Niagara excels in the extrapolation task, significantly outperforming <span class="ltx_text ltx_font_italic" id="S4.SS2.p4.1.2">all</span> prior two-view methods and demonstrating substantial improvement over Flash3D, with a <span class="ltx_text ltx_font_bold" id="S4.SS2.p4.1.3">10.8%</span> LPIPS reduction.</p> </div> </section> <section class="ltx_subsection" id="S4.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.3 </span><span class="ltx_text ltx_font_bold" id="S4.SS3.1.1">Ablation Study </span> </h3> <div class="ltx_para" id="S4.SS3.p1"> <p class="ltx_p" id="S4.SS3.p1.1"><span class="ltx_text ltx_font_bold" id="S4.SS3.p1.1.1">Quantitative ablation study.</span> In this ablation study, we evaluate the impact of critical new components in our method by comparing performance metrics presented in <a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#S4.T3" title="Table 3 ‣ 4.2 Comparison Results with SoTA Methods ‣ 4 Experimental Results ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Table 3</span></a>. The results demonstrate significant improvements across all metrics with the addition of normal and geometric constraints when contrasted with Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite> and highlight the essential role of these components in enhancing model performance.</p> </div> <div class="ltx_para" id="S4.SS3.p2"> <p class="ltx_p" id="S4.SS3.p2.1">Specifically, the incorporation of normal constraints significantly enhances the overall quality of scene reconstructions by enabling the model to better capture and express subtle surface variations with improved fidelity. 
In addition, geometric constraints encourage the accurate representation of fine geometrical details, ensuring that the model faithfully reproduces the intricate shapes and structures of objects. Together, these constraints work synergistically with depth information, substantially increasing the model’s capacity to learn and represent complex details and geometric structures. However, excessive geometric constraints impair the 3D self-attention module’s perceptual-imaginative capabilities, degrading LPIPS/SSIM.</p> </div> <div class="ltx_para ltx_noindent" id="S4.SS3.p3"> <p class="ltx_p" id="S4.SS3.p3.1"><span class="ltx_text ltx_font_bold" id="S4.SS3.p3.1.1">Qualitative ablation study.</span> We further conduct ablation studies with qualitative results. In <a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#S4.F5" title="Figure 5 ‣ 4.2 Comparison Results with SoTA Methods ‣ 4 Experimental Results ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Figure 5</span></a>, the 1st and 3rd rows display rendered images, while the 2nd and 4th rows present their corresponding Gaussian Splatting representations. The top two rows depict an outdoor scene, and the bottom two rows feature an indoor scene. Our ablation study shows that reconstruction quality progressively improves with the incremental integration of the normal, the geometric affine field, and the 3D self-attention module. <span class="ltx_text ltx_font_bold" id="S4.SS3.p3.1.2">(1)</span> On the outdoor scene, the sequential incorporation of these modules yields sharper outputs, enhanced color fidelity, and improved feature alignment. <span class="ltx_text ltx_font_bold" id="S4.SS3.p3.1.3">(2)</span> On the indoor scene (see the bottom two rows), the inclusion of these components results in more accurate geometric details, finer textures, and better-aligned features. 
These findings indicate that the synergistic combination of the normal module, geometric affine field, and 3D self-attention substantially elevates reconstruction fidelity. Specifically, our model achieves superior geometric precision and texture reproduction, thus enabling faithful reconstructions of complex real-world scenes.</p> </div> </section> </section> <section class="ltx_section" id="S5"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">5 </span>Conclusion</h2> <div class="ltx_para" id="S5.p1"> <p class="ltx_p" id="S5.p1.1">This paper introduces <span class="ltx_text ltx_font_italic" id="S5.p1.1.1">Niagara</span>, the <span class="ltx_text ltx_font_italic" id="S5.p1.1.2">first</span> comprehensive single-view 3D reconstruction framework tailored for complex <span class="ltx_text ltx_font_italic" id="S5.p1.1.3">outdoor</span> scenes. At its core, Niagara integrates surface normals into a depth-based reconstruction pipeline to capture finer geometric details, while incorporating a geometric affine field and a 3D self-attention module for robust spatial constraint enforcement. Empirically, on the RE10K benchmark, Niagara demonstrates encouraging performance, outperforming the prior SoTA approach Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite> with consistent PSNR, SSIM, and LPIPS improvements. In the RE10K interpolation and extrapolation settings, our single-view method surpasses Flash3D by over 1 dB in PSNR and even outperforms two-view methods under the extrapolation setup. 
This work establishes practical foundations for advancing <span class="ltx_text ltx_font_italic" id="S5.p1.1.4">single-view</span> 3D reconstruction for geometrically complex outdoor scenes.</p> </div> </section> <section class="ltx_bibliography" id="bib"> <h2 class="ltx_title ltx_title_bibliography">References</h2> <ul class="ltx_biblist"> <li class="ltx_bibitem" id="bib.bib1"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Bae and Davison [2024]</span> <span class="ltx_bibblock"> Gwangbin Bae and Andrew J Davison. </span> <span class="ltx_bibblock">Rethinking inductive biases for surface normal estimation. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib1.1.1">CVPR</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib2"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Barron et al. [2021]</span> <span class="ltx_bibblock"> Jonathan T Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P Srinivasan. </span> <span class="ltx_bibblock">Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib2.1.1">CVPR</em>, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib3"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Barron et al. [2022]</span> <span class="ltx_bibblock"> Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. </span> <span class="ltx_bibblock">Mip-nerf 360: Unbounded anti-aliased neural radiance fields. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib3.1.1">CVPR</em>, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib4"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Bemana et al. [2020]</span> <span class="ltx_bibblock"> Mojtaba Bemana, Karol Myszkowski, Hans-Peter Seidel, and Tobias Ritschel. 
</span> <span class="ltx_bibblock">X-fields: Implicit neural view-, light- and time-image interpolation. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib4.1.1">ACM TOG</em>, 39(6):1–15, 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib5"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Bhat et al. [2021]</span> <span class="ltx_bibblock"> Shariq Farooq Bhat, Ibraheem Alhashim, and Peter Wonka. </span> <span class="ltx_bibblock">Adabins: Depth estimation using adaptive bins. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib5.1.1">CVPR</em>, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib6"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Bhoi [2019]</span> <span class="ltx_bibblock"> Amlaan Bhoi. </span> <span class="ltx_bibblock">Monocular depth estimation: A survey. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib6.1.1">arXiv preprint arXiv:1901.09402</em>, 2019. </span> </li> <li class="ltx_bibitem" id="bib.bib7"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Birchfield and Tomasi [1999]</span> <span class="ltx_bibblock"> Stan Birchfield and Carlo Tomasi. </span> <span class="ltx_bibblock">Depth discontinuities by pixel-to-pixel stereo. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib7.1.1">IJCV</em>, 35, 1999. </span> </li> <li class="ltx_bibitem" id="bib.bib8"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Buxton and Buxton [1983]</span> <span class="ltx_bibblock"> BF Buxton and Hilary Buxton. </span> <span class="ltx_bibblock">Monocular depth perception from optical flow by space time signal processing. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib8.1.1">Proc. R. Soc. B.</em>, 218(1210):27–47, 1983. </span> </li> <li class="ltx_bibitem" id="bib.bib9"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Cao et al. 
[2023]</span> <span class="ltx_bibblock"> Junli Cao, Huan Wang, Pavlo Chemerys, Vladislav Shakhrai, Ju Hu, Yun Fu, Denys Makoviichuk, Sergey Tulyakov, and Jian Ren. </span> <span class="ltx_bibblock">Real-time neural light field on mobile devices. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib9.1.1">CVPR</em>, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib10"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Castleman [1996]</span> <span class="ltx_bibblock"> Kenneth R Castleman. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib10.1.1">Digital image processing</em>. </span> <span class="ltx_bibblock">Prentice Hall Press, 1996. </span> </li> <li class="ltx_bibitem" id="bib.bib11"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Chang et al. [2015]</span> <span class="ltx_bibblock"> Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. </span> <span class="ltx_bibblock">Shapenet: An information-rich 3d model repository. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib11.1.1">arXiv preprint arXiv:1512.03012</em>, 2015. </span> </li> <li class="ltx_bibitem" id="bib.bib12"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Charatan et al. [2024]</span> <span class="ltx_bibblock"> David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann. </span> <span class="ltx_bibblock">pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib12.1.1">CVPR</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib13"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Chen et al. [2022]</span> <span class="ltx_bibblock"> Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. 
</span> <span class="ltx_bibblock">Tensorf: Tensorial radiance fields. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib13.1.1">ECCV</em>, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib14"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Chen et al. [2025]</span> <span class="ltx_bibblock"> Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. </span> <span class="ltx_bibblock">Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib14.1.1">ECCV</em>, 2025. </span> </li> <li class="ltx_bibitem" id="bib.bib15"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Cheng et al. [2024]</span> <span class="ltx_bibblock"> Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, and Xuejin Chen. </span> <span class="ltx_bibblock">Gaussianpro: 3d gaussian splatting with progressive propagation. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib15.1.1">ICML</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib16"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Choy et al. [2016]</span> <span class="ltx_bibblock"> Christopher B Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, and Silvio Savarese. </span> <span class="ltx_bibblock">3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib16.1.1">ECCV</em>, 2016. </span> </li> <li class="ltx_bibitem" id="bib.bib17"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Dai et al. [2020]</span> <span class="ltx_bibblock"> Peng Dai, Yinda Zhang, Zhuwen Li, Shuaicheng Liu, and Bing Zeng. </span> <span class="ltx_bibblock">Neural point cloud rendering via multi-plane projection. 
</span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib17.1.1">CVPR</em>, 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib18"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Du et al. [2023]</span> <span class="ltx_bibblock"> Yilun Du, Cameron Smith, Ayush Tewari, and Vincent Sitzmann. </span> <span class="ltx_bibblock">Learning to render novel views from wide-baseline stereo pairs. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib18.1.1">ICCV</em>, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib19"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Duzceker et al. [2021]</span> <span class="ltx_bibblock"> Arda Duzceker, Silvano Galliani, Christoph Vogel, Pablo Speciale, Mihai Dusmanu, and Marc Pollefeys. </span> <span class="ltx_bibblock">Deepvideomvs: Multi-view stereo on video with recurrent spatio-temporal fusion. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib19.1.1">CVPR</em>, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib20"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Facil et al. [2019]</span> <span class="ltx_bibblock"> Jose M Facil, Benjamin Ummenhofer, Huizhong Zhou, Luis Montesano, Thomas Brox, and Javier Civera. </span> <span class="ltx_bibblock">Cam-convs: Camera-aware multi-scale convolutions for single-view depth. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib20.1.1">CVPR</em>, 2019. </span> </li> <li class="ltx_bibitem" id="bib.bib21"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Fan et al. [2020]</span> <span class="ltx_bibblock"> Rui Fan, Hengli Wang, Peide Cai, and Ming Liu. </span> <span class="ltx_bibblock">Sne-roadseg: Incorporating surface normal information into semantic segmentation for accurate freespace detection. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib21.1.1">ECCV</em>, 2020. 
</span> </li> <li class="ltx_bibitem" id="bib.bib22"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Flynn et al. [2016]</span> <span class="ltx_bibblock"> John Flynn, Ivan Neulander, James Philbin, and Noah Snavely. </span> <span class="ltx_bibblock">Deepstereo: Learning to predict new views from the world’s imagery. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib22.1.1">CVPR</em>, 2016. </span> </li> <li class="ltx_bibitem" id="bib.bib23"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Furukawa et al. [2015]</span> <span class="ltx_bibblock"> Yasutaka Furukawa, Carlos Hernández, et al. </span> <span class="ltx_bibblock">Multi-view stereo: A tutorial. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib23.1.1">FTCGV</em>, 9(1-2):1–148, 2015. </span> </li> <li class="ltx_bibitem" id="bib.bib24"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Galliani et al. [2015]</span> <span class="ltx_bibblock"> Silvano Galliani, Katrin Lasinger, and Konrad Schindler. </span> <span class="ltx_bibblock">Massively parallel multiview stereopsis by surface normal diffusion. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib24.1.1">ICCV</em>, 2015. </span> </li> <li class="ltx_bibitem" id="bib.bib25"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Geiger et al. [2012]</span> <span class="ltx_bibblock"> Andreas Geiger, Philip Lenz, and Raquel Urtasun. </span> <span class="ltx_bibblock">Are we ready for autonomous driving? the kitti vision benchmark suite. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib25.1.1">CVPR</em>, 2012. </span> </li> <li class="ltx_bibitem" id="bib.bib26"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Geiger et al. [2013]</span> <span class="ltx_bibblock"> Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. 
</span> <span class="ltx_bibblock">Vision meets robotics: The kitti dataset. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib26.1.1">Int. J. Rob. Res.</em>, 32(11):1231–1237, 2013. </span> </li> <li class="ltx_bibitem" id="bib.bib27"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Gui et al. [2024]</span> <span class="ltx_bibblock"> Ming Gui, Johannes S Fischer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, and Björn Ommer. </span> <span class="ltx_bibblock">Depthfm: Fast monocular depth estimation with flow matching. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib27.1.1">arXiv preprint arXiv:2403.13788</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib28"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">He et al. [2016]</span> <span class="ltx_bibblock"> Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. </span> <span class="ltx_bibblock">Deep residual learning for image recognition. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib28.1.1">CVPR</em>, 2016. </span> </li> <li class="ltx_bibitem" id="bib.bib29"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Hiep et al. [2009]</span> <span class="ltx_bibblock"> Vu Hoang Hiep, Renaud Keriven, Patrick Labatut, and Jean-Philippe Pons. </span> <span class="ltx_bibblock">Towards high-resolution large-scale multi-view stereo. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib29.1.1">CVPR</em>, 2009. </span> </li> <li class="ltx_bibitem" id="bib.bib30"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Hu et al. [2024]</span> <span class="ltx_bibblock"> Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, and Shaojie Shen. 
</span> <span class="ltx_bibblock">Metric3d v2: A versatile monocular geometric foundation model for zero-shot metric depth and surface normal estimation. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib30.1.1">arXiv preprint arXiv:2404.15506</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib31"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Huang et al. [2018]</span> <span class="ltx_bibblock"> Po-Han Huang, Kevin Matzen, Johannes Kopf, Narendra Ahuja, and Jia-Bin Huang. </span> <span class="ltx_bibblock">Deepmvs: Learning multi-view stereopsis. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib31.1.1">CVPR</em>, 2018. </span> </li> <li class="ltx_bibitem" id="bib.bib32"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Izadi et al. [2011]</span> <span class="ltx_bibblock"> Shahram Izadi, David Kim, Otmar Hilliges, David Molyneaux, Richard Newcombe, Pushmeet Kohli, Jamie Shotton, Steve Hodges, Dustin Freeman, Andrew Davison, et al. </span> <span class="ltx_bibblock">Kinectfusion: real-time 3d reconstruction and interaction using a moving depth camera. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib32.1.1">UIST</em>, 2011. </span> </li> <li class="ltx_bibitem" id="bib.bib33"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Kerbl et al. [2023]</span> <span class="ltx_bibblock"> Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. </span> <span class="ltx_bibblock">3d gaussian splatting for real-time radiance field rendering. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib33.1.1">ACM TOG</em>, 42(4):139:1–139:14, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib34"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Khoshelham and Elberink [2012]</span> <span class="ltx_bibblock"> Kourosh Khoshelham and Sander Oude Elberink. 
</span> <span class="ltx_bibblock">Accuracy and resolution of kinect depth data for indoor mapping applications. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib34.1.1">Sensors</em>, 12(2):1437–1454, 2012. </span> </li> <li class="ltx_bibitem" id="bib.bib35"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Li et al. [2015]</span> <span class="ltx_bibblock"> Bo Li, Chunhua Shen, Yuchao Dai, Anton Van Den Hengel, and Mingyi He. </span> <span class="ltx_bibblock">Depth and surface normal estimation from monocular images using regression on deep features and hierarchical crfs. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib35.1.1">CVPR</em>, 2015. </span> </li> <li class="ltx_bibitem" id="bib.bib36"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Li et al. [2021a]</span> <span class="ltx_bibblock"> Jiaxin Li, Zijian Feng, Qi She, Henghui Ding, Changhu Wang, and Gim Hee Lee. </span> <span class="ltx_bibblock">Mine: Towards continuous depth mpi with nerf for novel view synthesis. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib36.1.1">ICCV</em>, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib37"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Li et al. [2019]</span> <span class="ltx_bibblock"> Peiliang Li, Xiaozhi Chen, and Shaojie Shen. </span> <span class="ltx_bibblock">Stereo r-cnn based 3d object detection for autonomous driving. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib37.1.1">CVPR</em>, 2019. </span> </li> <li class="ltx_bibitem" id="bib.bib38"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Li et al. [2021b]</span> <span class="ltx_bibblock"> Zhengqi Li, Simon Niklaus, Noah Snavely, and Oliver Wang. </span> <span class="ltx_bibblock">Neural scene flow fields for space-time view synthesis of dynamic scenes. 
</span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib38.1.1">CVPR</em>, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib39"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Long et al. [2024]</span> <span class="ltx_bibblock"> Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, et al. </span> <span class="ltx_bibblock">Wonder3d: Single image to 3d using cross-domain diffusion. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib39.1.1">CVPR</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib40"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Max [1995]</span> <span class="ltx_bibblock"> Nelson Max. </span> <span class="ltx_bibblock">Optical models for direct volume rendering. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib40.1.1">TVCG</em>, 1(2):99–108, 1995. </span> </li> <li class="ltx_bibitem" id="bib.bib41"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Mescheder et al. [2019]</span> <span class="ltx_bibblock"> Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. </span> <span class="ltx_bibblock">Occupancy networks: Learning 3d reconstruction in function space. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib41.1.1">CVPR</em>, 2019. </span> </li> <li class="ltx_bibitem" id="bib.bib42"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Mildenhall et al. [2021]</span> <span class="ltx_bibblock"> Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. </span> <span class="ltx_bibblock">Nerf: Representing scenes as neural radiance fields for view synthesis. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib42.1.1">CACM</em>, 65(1):99–106, 2021. 
</span> </li> <li class="ltx_bibitem" id="bib.bib43"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Mossel and Kroeter [2016]</span> <span class="ltx_bibblock"> Annette Mossel and Manuel Kroeter. </span> <span class="ltx_bibblock">Streaming and exploration of dynamically changing dense 3d reconstructions in immersive virtual reality. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib43.1.1">ISMAR</em>, 2016. </span> </li> <li class="ltx_bibitem" id="bib.bib44"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Müller et al. [2022]</span> <span class="ltx_bibblock"> Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. </span> <span class="ltx_bibblock">Instant neural graphics primitives with a multiresolution hash encoding. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib44.1.1">ACM TOG</em>, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib45"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Nex and Remondino [2014]</span> <span class="ltx_bibblock"> Francesco Nex and Fabio Remondino. </span> <span class="ltx_bibblock">Uav for 3d mapping applications: a review. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib45.1.1">Applied geomatics</em>, 6:1–15, 2014. </span> </li> <li class="ltx_bibitem" id="bib.bib46"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Piccinelli et al. [2024]</span> <span class="ltx_bibblock"> Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, and Fisher Yu. </span> <span class="ltx_bibblock">Unidepth: Universal monocular metric depth estimation. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib46.1.1">CVPR</em>, 2024. 
</span> </li> <li class="ltx_bibitem" id="bib.bib47"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Ramamoorthi and Hanrahan [2001]</span> <span class="ltx_bibblock"> Ravi Ramamoorthi and Pat Hanrahan. </span> <span class="ltx_bibblock">A signal-processing framework for inverse rendering. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib47.1.1">SIGGRAPH</em>, 2001. </span> </li> <li class="ltx_bibitem" id="bib.bib48"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Ronneberger et al. [2015]</span> <span class="ltx_bibblock"> Olaf Ronneberger, Philipp Fischer, and Thomas Brox. </span> <span class="ltx_bibblock">U-net: Convolutional networks for biomedical image segmentation. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib48.1.1">MICCAI</em>, 2015. </span> </li> <li class="ltx_bibitem" id="bib.bib49"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Rusinkiewicz and Levoy [2000]</span> <span class="ltx_bibblock"> Szymon Rusinkiewicz and Marc Levoy. </span> <span class="ltx_bibblock">Qsplat: A multiresolution point rendering system for large meshes. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib49.1.1">SIGGRAPH</em>, 2000. </span> </li> <li class="ltx_bibitem" id="bib.bib50"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Saxena et al. [2008]</span> <span class="ltx_bibblock"> Ashutosh Saxena, Sung H Chung, and Andrew Y Ng. </span> <span class="ltx_bibblock">3-d depth reconstruction from a single still image. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib50.1.1">IJCV</em>, 76:53–69, 2008. </span> </li> <li class="ltx_bibitem" id="bib.bib51"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Schönberger et al. [2016]</span> <span class="ltx_bibblock"> Johannes L Schönberger, Enliang Zheng, Jan-Michael Frahm, and Marc Pollefeys. 
</span> <span class="ltx_bibblock">Pixelwise view selection for unstructured multi-view stereo. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib51.1.1">ECCV</em>, 2016. </span> </li> <li class="ltx_bibitem" id="bib.bib52"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Seitz et al. [2006]</span> <span class="ltx_bibblock"> Steven M Seitz, Brian Curless, James Diebel, Daniel Scharstein, and Richard Szeliski. </span> <span class="ltx_bibblock">A comparison and evaluation of multi-view stereo reconstruction algorithms. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib52.1.1">CVPR</em>, 2006. </span> </li> <li class="ltx_bibitem" id="bib.bib53"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Shao et al. [2023]</span> <span class="ltx_bibblock"> Shuwei Shao, Zhongcai Pei, Weihai Chen, Xingming Wu, and Zhengguo Li. </span> <span class="ltx_bibblock">Nddepth: Normal-distance assisted monocular depth estimation. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib53.1.1">ICCV</em>, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib54"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Shi et al. [2023]</span> <span class="ltx_bibblock"> Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, and Xiao Yang. </span> <span class="ltx_bibblock">Mvdream: Multi-view diffusion for 3d generation. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib54.1.1">arXiv preprint arXiv:2308.16512</em>, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib55"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Silberman et al. [2012]</span> <span class="ltx_bibblock"> Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. </span> <span class="ltx_bibblock">Indoor segmentation and support inference from rgbd images. 
</span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib55.1.1">ECCV</em>, 2012. </span> </li> <li class="ltx_bibitem" id="bib.bib56"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Sitzmann et al. [2019]</span> <span class="ltx_bibblock"> Vincent Sitzmann, Justus Thies, Felix Heide, Matthias Nießner, Gordon Wetzstein, and Michael Zollhofer. </span> <span class="ltx_bibblock">Deepvoxels: Learning persistent 3d feature embeddings. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib56.1.1">CVPR</em>, 2019. </span> </li> <li class="ltx_bibitem" id="bib.bib57"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Sitzmann et al. [2020]</span> <span class="ltx_bibblock"> Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein. </span> <span class="ltx_bibblock">Implicit neural representations with periodic activation functions. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib57.1.1">NeurIPS</em>, 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib58"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Sitzmann et al. [2021]</span> <span class="ltx_bibblock"> Vincent Sitzmann, Semon Rezchikov, William T Freeman, Joshua B Tenenbaum, and Fredo Durand. </span> <span class="ltx_bibblock">Light field networks: Neural scene representations with single-evaluation rendering. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib58.1.1">NeurIPS</em>, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib59"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Sloan et al. [2023]</span> <span class="ltx_bibblock"> Peter-Pike Sloan, Jan Kautz, and John Snyder. </span> <span class="ltx_bibblock">Precomputed radiance transfer for real-time rendering in dynamic, low-frequency lighting environments. 
</span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib59.1.1">SGP</em>, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib60"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Szymanowicz et al. [2024a]</span> <span class="ltx_bibblock"> Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, João F Henriques, Christian Rupprecht, and Andrea Vedaldi. </span> <span class="ltx_bibblock">Flash3d: Feed-forward generalisable 3d scene reconstruction from a single image. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib60.1.1">arXiv preprint arXiv:2406.04343</em>, 2024a. </span> </li> <li class="ltx_bibitem" id="bib.bib61"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Szymanowicz et al. [2024b]</span> <span class="ltx_bibblock"> Stanislaw Szymanowicz, Christian Rupprecht, and Andrea Vedaldi. </span> <span class="ltx_bibblock">Splatter image: Ultra-fast single-view 3d reconstruction. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib61.1.1">CVPR</em>, 2024b. </span> </li> <li class="ltx_bibitem" id="bib.bib62"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Teichmann et al. [2018]</span> <span class="ltx_bibblock"> Marvin Teichmann, Michael Weber, Marius Zoellner, Roberto Cipolla, and Raquel Urtasun. </span> <span class="ltx_bibblock">Multinet: Real-time joint semantic reasoning for autonomous driving. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib62.1.1">IEEE IV</em>, 2018. </span> </li> <li class="ltx_bibitem" id="bib.bib63"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Tucker and Snavely [2020]</span> <span class="ltx_bibblock"> Richard Tucker and Noah Snavely. </span> <span class="ltx_bibblock">Single-view view synthesis with multiplane images. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib63.1.1">CVPR</em>, 2020. 
</span> </li> <li class="ltx_bibitem" id="bib.bib64"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Tulsiani et al. [2018]</span> <span class="ltx_bibblock"> Shubham Tulsiani, Richard Tucker, and Noah Snavely. </span> <span class="ltx_bibblock">Layer-structured 3d scene inference via view synthesis. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib64.1.1">ECCV</em>, 2018. </span> </li> <li class="ltx_bibitem" id="bib.bib65"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Turkulainen et al. [2024]</span> <span class="ltx_bibblock"> Matias Turkulainen, Xuqian Ren, Iaroslav Melekhov, Otto Seiskari, Esa Rahtu, and Juho Kannala. </span> <span class="ltx_bibblock">Dn-splatter: Depth and normal priors for gaussian splatting and meshing. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib65.1.1">arXiv preprint arXiv:2403.17822</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib66"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Vaswani et al. [2017]</span> <span class="ltx_bibblock"> Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. </span> <span class="ltx_bibblock">Attention is all you need. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib66.1.1">NeurIPS</em>, 2017. </span> </li> <li class="ltx_bibitem" id="bib.bib67"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Wang et al. [2022a]</span> <span class="ltx_bibblock"> Huan Wang, Jian Ren, Zeng Huang, Kyle Olszewski, Menglei Chai, Yun Fu, and Sergey Tulyakov. </span> <span class="ltx_bibblock">R2l: Distilling neural radiance field to neural light field for efficient novel view synthesis. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib67.1.1">ECCV</em>, 2022a. 
</span> </li> <li class="ltx_bibitem" id="bib.bib68"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Wang et al. [2022b]</span> <span class="ltx_bibblock"> Jiepeng Wang, Peng Wang, Xiaoxiao Long, Christian Theobalt, Taku Komura, Lingjie Liu, and Wenping Wang. </span> <span class="ltx_bibblock">Neuris: Neural reconstruction of indoor scenes using normal priors. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib68.1.1">ECCV</em>, 2022b. </span> </li> <li class="ltx_bibitem" id="bib.bib69"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Wang et al. [2018]</span> <span class="ltx_bibblock"> Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. </span> <span class="ltx_bibblock">Non-local neural networks. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib69.1.1">CVPR</em>, 2018. </span> </li> <li class="ltx_bibitem" id="bib.bib70"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Wang et al. [2004]</span> <span class="ltx_bibblock"> Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. </span> <span class="ltx_bibblock">Image quality assessment: from error visibility to structural similarity. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib70.1.1">TIP</em>, 13(4):600–612, 2004. </span> </li> <li class="ltx_bibitem" id="bib.bib71"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Westover [1990]</span> <span class="ltx_bibblock"> Lee Westover. </span> <span class="ltx_bibblock">Footprint evaluation for volume rendering. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib71.1.1">SIGGRAPH</em>, 1990. </span> </li> <li class="ltx_bibitem" id="bib.bib72"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Wewer et al. [2024]</span> <span class="ltx_bibblock"> Christopher Wewer, Kevin Raj, Eddy Ilg, Bernt Schiele, and Jan Eric Lenssen. 
</span> <span class="ltx_bibblock">latentsplat: Autoencoding variational gaussians for fast generalizable 3d reconstruction. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib72.1.1">arXiv preprint arXiv:2403.16292</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib73"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Wiles et al. [2020]</span> <span class="ltx_bibblock"> Olivia Wiles, Georgia Gkioxari, Richard Szeliski, and Justin Johnson. </span> <span class="ltx_bibblock">Synsin: End-to-end view synthesis from a single image. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib73.1.1">CVPR</em>, 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib74"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Wimbauer et al. [2023]</span> <span class="ltx_bibblock"> Felix Wimbauer, Nan Yang, Christian Rupprecht, and Daniel Cremers. </span> <span class="ltx_bibblock">Behind the scenes: Density fields for single view reconstruction. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib74.1.1">CVPR</em>, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib75"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Wofk et al. [2019]</span> <span class="ltx_bibblock"> Diana Wofk, Fangchang Ma, Tien-Ju Yang, Sertac Karaman, and Vivienne Sze. </span> <span class="ltx_bibblock">Fastdepth: Fast monocular depth estimation on embedded systems. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib75.1.1">ICRA</em>, 2019. </span> </li> <li class="ltx_bibitem" id="bib.bib76"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Wu et al. [2016]</span> <span class="ltx_bibblock"> Jiajun Wu, Chengkai Zhang, Tianfan Xue, Bill Freeman, and Josh Tenenbaum. </span> <span class="ltx_bibblock">Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. 
</span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib76.1.1">NeurIPS</em>, 2016. </span> </li> <li class="ltx_bibitem" id="bib.bib77"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Wu et al. [2024]</span> <span class="ltx_bibblock"> Xianzu Wu, Xianfeng Wu, Tianyu Luan, Yajing Bai, Zhongyuan Lai, and Junsong Yuan. </span> <span class="ltx_bibblock">Fsc: Few-point shape completion. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib77.1.1">CVPR</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib78"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Yang et al. [2024]</span> <span class="ltx_bibblock"> Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. </span> <span class="ltx_bibblock">Depth anything: Unleashing the power of large-scale unlabeled data. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib78.1.1">CVPR</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib79"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Ye et al. [2024]</span> <span class="ltx_bibblock"> Chongjie Ye, Lingteng Qiu, Xiaodong Gu, Qi Zuo, Yushuang Wu, Zilong Dong, Liefeng Bo, Yuliang Xiu, and Xiaoguang Han. </span> <span class="ltx_bibblock">Stablenormal: Reducing diffusion variance for stable and sharp normal. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib79.1.1">arXiv preprint arXiv:2406.16864</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib80"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Yin et al. [2019]</span> <span class="ltx_bibblock"> Wei Yin, Yifan Liu, Chunhua Shen, and Youliang Yan. </span> <span class="ltx_bibblock">Enforcing geometric constraints of virtual normal for depth prediction. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib80.1.1">ICCV</em>, 2019. 
</span> </li> <li class="ltx_bibitem" id="bib.bib81"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Yu et al. [2022]</span> <span class="ltx_bibblock"> Zehao Yu, Songyou Peng, Michael Niemeyer, Torsten Sattler, and Andreas Geiger. </span> <span class="ltx_bibblock">Monosdf: Exploring monocular geometric cues for neural implicit surface reconstruction. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib81.1.1">NeurIPS</em>, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib82"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Zhang et al. [2018]</span> <span class="ltx_bibblock"> Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. </span> <span class="ltx_bibblock">The unreasonable effectiveness of deep features as a perceptual metric. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib82.1.1">CVPR</em>, 2018. </span> </li> <li class="ltx_bibitem" id="bib.bib83"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Zhang et al. [2023]</span> <span class="ltx_bibblock"> Yi Zhang, Xiaoyang Huang, Bingbing Ni, Teng Li, and Wenjun Zhang. </span> <span class="ltx_bibblock">Frequency-modulated point cloud rendering with easy editing. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib83.1.1">CVPR</em>, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib84"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Zhou et al. [2019]</span> <span class="ltx_bibblock"> Hao Zhou, Xiang Yu, and David W Jacobs. </span> <span class="ltx_bibblock">Glosh: Global-local spherical harmonics for intrinsic image decomposition. </span> <span class="ltx_bibblock">In <em class="ltx_emph ltx_font_italic" id="bib.bib84.1.1">CVPR</em>, 2019. </span> </li> <li class="ltx_bibitem" id="bib.bib85"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Zhou et al. 
[2018]</span> <span class="ltx_bibblock"> Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. </span> <span class="ltx_bibblock">Stereo magnification: Learning view synthesis using multiplane images. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib85.1.1">arXiv preprint arXiv:1805.09817</em>, 2018. </span> </li> <li class="ltx_bibitem" id="bib.bib86"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Zingoni et al. [2015]</span> <span class="ltx_bibblock"> Andrea Zingoni, Marco Diani, Giovanni Corsini, and A Masini. </span> <span class="ltx_bibblock">Real-time 3d reconstruction from images taken from an uav. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib86.1.1">ISPRS</em>, 40:313–319, 2015. </span> </li> <li class="ltx_bibitem" id="bib.bib87"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Zitnick and Kanade [2000]</span> <span class="ltx_bibblock"> C Lawrence Zitnick and Takeo Kanade. </span> <span class="ltx_bibblock">A cooperative algorithm for stereo matching and occlusion detection. </span> <span class="ltx_bibblock"><em class="ltx_emph ltx_font_italic" id="bib.bib87.1.1">TPAMI</em>, 22(7):675–684, 2000. 
</span> </li> </ul> </section> <section class="ltx_appendix" id="A1"> <h2 class="ltx_title ltx_title_appendix"> <span class="ltx_tag ltx_tag_appendix">Appendix A </span>Implementation Details</h2> <div class="ltx_para" id="A1.p1"> <p class="ltx_p" id="A1.p1.1">In this section, we describe the implementation details of our method, including the network architecture and training hyperparameters.</p> </div> <div class="ltx_para ltx_noindent" id="A1.p2"> <p class="ltx_p" id="A1.p2.1"><span class="ltx_text ltx_font_bold" id="A1.p2.1.1">Backbone Structure.</span> We utilize a ResNet-based architecture inspired by UniDepth2 <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib46" title="">46</a>]</cite>, specifically employing ResNet50 as the backbone. Custom modifications include 64 channels for feature extraction and maintaining resolution during the upsampling stage, drawing inspiration from the Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite> architecture. 
Pre-trained weights are used to enhance initial performance.</p> </div> <div class="ltx_para ltx_noindent" id="A1.p3"> <p class="ltx_p" id="A1.p3.1"><span class="ltx_text ltx_font_bold" id="A1.p3.1.1">Geometric Affine Field.</span> The geometric affine field has a plane size of 32 with 64 channels, enabling robust spatial feature representation for complex scene rendering.</p> </div> <div class="ltx_para ltx_noindent" id="A1.p4"> <p class="ltx_p" id="A1.p4.1"><span class="ltx_text ltx_font_bold" id="A1.p4.1.1">UniDepth Model.</span> The UniDepth model incorporates a Vision Transformer (ViT-L/14), which is known for its strong contextual representation capabilities.</p> </div> <div class="ltx_para ltx_noindent" id="A1.p5"> <p class="ltx_p" id="A1.p5.1"><span class="ltx_text ltx_font_bold" id="A1.p5.1.1">Depth Model.</span> The depth estimation model adopts a ResNet-style architecture with 50 layers. Decoder layers process feature maps with progressively increasing channel counts (32, 32, 64, 128, 256) to capture both fine and coarse details. The design includes pre-batch normalization and random background color augmentation within the Gaussian rendering pipeline, improving robustness across diverse scenarios.</p> </div> <div class="ltx_para ltx_noindent" id="A1.p6"> <p class="ltx_p" id="A1.p6.1"><span class="ltx_text ltx_font_bold" id="A1.p6.1.1">3D Self-Attention.</span> The 3D self-attention module employs 8 attention heads with a 64-dimensional hidden space per head (attn_heads: 8, attn_dim_head: 64), stacked across 2 transformer layers (attn_layers: 2). 
This design captures long-range spatiotemporal dependencies while maintaining computational efficiency.</p> </div> <div class="ltx_para ltx_noindent" id="A1.p7"> <p class="ltx_p" id="A1.p7.1"><span class="ltx_text ltx_font_bold" id="A1.p7.1.1">Multi-frame and Gaussian Handling.</span> The model processes frames (-1, 0, 1, 2) to estimate depth, utilizing two Gaussians per pixel to refine density and depth predictions. Gaussian rendering is employed to achieve high-precision visual outputs. Additionally, pose information is integrated with rendering, further improving performance.</p> </div> <div class="ltx_para ltx_noindent" id="A1.p8"> <p class="ltx_p" id="A1.p8.1"><span class="ltx_text ltx_font_bold" id="A1.p8.1.1">Hyperparameters.</span> The training pipeline for the RE10K dataset is optimized using a consistent set of hyperparameters across multiple GPUs. A batch size of 16 with 16 data loader workers ensures efficient input throughput. The learning rate is set to 0.0001, with mixed precision (16-bit) training to optimize computational resources. Training runs for one epoch with a step scheduler reducing the learning rate every 5,000 steps. Models are checkpointed every 5,000 iterations, with progress logged every 250 steps. To balance storage and tracking, a maximum of five checkpoints is retained. Exponential Moving Average (EMA) updates occur every 10 steps after an initial buffer of 100 steps. Depth accuracy is improved by scaling poses based on estimated depths.</p> </div> <div class="ltx_para ltx_noindent" id="A1.p9"> <p class="ltx_p" id="A1.p9.1"><span class="ltx_text ltx_font_bold" id="A1.p9.1.1">Loss Configuration.</span> The loss function integrates multiple components to achieve a balance between spatial precision and perceptual quality. Gaussian scales and offsets are weighted to regulate spatial representation accuracy and ensure fine detail preservation. 
A combination of PSNR, SSIM, and LPIPS ensures pixel-wise accuracy and perceptual consistency, with SSIM and LPIPS activated after sufficient training steps to refine both visual fidelity and structural coherence.</p> </div> <div class="ltx_para ltx_noindent" id="A1.p10"> <p class="ltx_p" id="A1.p10.1"><span class="ltx_text ltx_font_bold" id="A1.p10.1.1">Dataset Specification.</span> The RE10K dataset is used for training, adhering to its original split. Data preprocessing includes normalization, depth handling, and normal map processing. Additional features streamline dataset preparation, while comprehensive dilation and augmentation strategies address dataset variability, ensuring robust training.</p> </div> </section> <section class="ltx_appendix" id="A2"> <h2 class="ltx_title ltx_title_appendix"> <span class="ltx_tag ltx_tag_appendix">Appendix B </span>More Qualitative Comparison</h2> <figure class="ltx_figure" id="A2.F6"> <table class="ltx_tabular ltx_centering ltx_align_middle" id="A2.F6.45"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="A2.F6.45.46.1"> <td class="ltx_td ltx_align_center" colspan="6" id="A2.F6.45.46.1.1" style="padding-bottom:5.69054pt;padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text ltx_font_bold" id="A2.F6.45.46.1.1.1">Indoor:</span></td> </tr> <tr class="ltx_tr" id="A2.F6.5.5"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="A2.F6.1.1.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.1.1.1.g1" src="x64.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.2.2.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.2.2.2.g1" src="x65.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.3.3.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to 
caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.3.3.3.g1" src="x66.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.4.4.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.4.4.4.g1" src="x67.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.5.5.5" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.5.5.5.g1" src="x68.png" width="165"/></td> <td class="ltx_td ltx_nopad_l" id="A2.F6.5.5.6" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"></td> </tr> <tr class="ltx_tr" id="A2.F6.10.10"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="A2.F6.6.6.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.6.6.1.g1" src="x69.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.7.7.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.7.7.2.g1" src="x70.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.8.8.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.8.8.3.g1" src="x71.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.9.9.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.9.9.4.g1" src="x72.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.10.10.5" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics 
ltx_img_landscape" height="109" id="A2.F6.10.10.5.g1" src="x73.png" width="165"/></td> <td class="ltx_td ltx_nopad_l" id="A2.F6.10.10.6" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"></td> </tr> <tr class="ltx_tr" id="A2.F6.15.15"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="A2.F6.11.11.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.11.11.1.g1" src="x74.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.12.12.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.12.12.2.g1" src="x75.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.13.13.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.13.13.3.g1" src="x76.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.14.14.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.14.14.4.g1" src="x77.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.15.15.5" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.15.15.5.g1" src="x78.png" width="165"/></td> <td class="ltx_td ltx_nopad_l" id="A2.F6.15.15.6" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"></td> </tr> <tr class="ltx_tr" id="A2.F6.20.20"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="A2.F6.16.16.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.16.16.1.g1" src="x79.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r 
ltx_align_center" id="A2.F6.17.17.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.17.17.2.g1" src="x80.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.18.18.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.18.18.3.g1" src="x81.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.19.19.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.19.19.4.g1" src="x82.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.20.20.5" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.20.20.5.g1" src="x83.png" width="165"/></td> <td class="ltx_td ltx_nopad_l" id="A2.F6.20.20.6" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"></td> </tr> <tr class="ltx_tr" id="A2.F6.25.25"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="A2.F6.21.21.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.21.21.1.g1" src="x84.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.22.22.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.22.22.2.g1" src="x85.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.23.23.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.23.23.3.g1" src="x86.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r 
ltx_align_center" id="A2.F6.24.24.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.24.24.4.g1" src="x87.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.25.25.5" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.25.25.5.g1" src="x88.png" width="165"/></td> <td class="ltx_td ltx_nopad_l" id="A2.F6.25.25.6" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"></td> </tr> <tr class="ltx_tr" id="A2.F6.45.47.2"> <td class="ltx_td ltx_align_center" colspan="6" id="A2.F6.45.47.2.1" style="padding-bottom:5.69054pt;padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text ltx_font_bold" id="A2.F6.45.47.2.1.1">Outdoor:</span></td> </tr> <tr class="ltx_tr" id="A2.F6.30.30"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="A2.F6.26.26.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.26.26.1.g1" src="x89.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.27.27.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.27.27.2.g1" src="x90.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.28.28.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.28.28.3.g1" src="x91.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.29.29.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.29.29.4.g1" src="x92.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r 
ltx_align_center" id="A2.F6.30.30.5" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.30.30.5.g1" src="x93.png" width="165"/></td> <td class="ltx_td ltx_nopad_l" id="A2.F6.30.30.6" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"></td> </tr> <tr class="ltx_tr" id="A2.F6.35.35"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="A2.F6.31.31.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.31.31.1.g1" src="x94.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.32.32.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.32.32.2.g1" src="x95.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.33.33.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.33.33.3.g1" src="x96.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.34.34.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.34.34.4.g1" src="x97.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.35.35.5" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.35.35.5.g1" src="x98.png" width="165"/></td> <td class="ltx_td ltx_nopad_l" id="A2.F6.35.35.6" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"></td> </tr> <tr class="ltx_tr" id="A2.F6.40.40"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="A2.F6.36.36.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics 
ltx_img_landscape" height="109" id="A2.F6.36.36.1.g1" src="x99.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.37.37.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.37.37.2.g1" src="x100.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.38.38.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.38.38.3.g1" src="x101.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.39.39.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.39.39.4.g1" src="x102.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.40.40.5" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.40.40.5.g1" src="x103.png" width="165"/></td> <td class="ltx_td ltx_nopad_l" id="A2.F6.40.40.6" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"></td> </tr> <tr class="ltx_tr" id="A2.F6.45.45"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="A2.F6.41.41.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.41.41.1.g1" src="x104.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.42.42.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.42.42.2.g1" src="x105.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.43.43.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics 
ltx_img_landscape" height="109" id="A2.F6.43.43.3.g1" src="x106.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.44.44.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.44.44.4.g1" src="x107.png" width="165"/></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.45.45.5" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="109" id="A2.F6.45.45.5.g1" src="x108.png" width="165"/></td> <td class="ltx_td ltx_nopad_l" id="A2.F6.45.45.6" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"></td> </tr> <tr class="ltx_tr" id="A2.F6.45.48.3"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="A2.F6.45.48.3.1" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text" id="A2.F6.45.48.3.1.1" style="font-size:90%;">(a) Input</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.45.48.3.2" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text" id="A2.F6.45.48.3.2.1" style="font-size:90%;">(b) Depth</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.45.48.3.3" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text" id="A2.F6.45.48.3.3.1" style="font-size:90%;">(c) Normal</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.45.48.3.4" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text" id="A2.F6.45.48.3.4.1" style="font-size:90%;">(d) Flash3D</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="A2.F6.45.48.3.5" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"><span class="ltx_text" id="A2.F6.45.48.3.5.1" style="font-size:90%;">(e) Ours</span></td> <td class="ltx_td ltx_nopad_l" id="A2.F6.45.48.3.6" style="padding-top:-4.5pt;padding-bottom:-4.5pt;"></td> </tr> </tbody> 
</table> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="A2.F6.48.1.1" style="font-size:90%;">Figure 6</span>: </span><span class="ltx_text ltx_font_bold" id="A2.F6.49.2" style="font-size:90%;">More qualitative results.<span class="ltx_text ltx_font_medium" id="A2.F6.49.2.1"> Qualitative comparison of 3D scene reconstruction performance under varying illumination and geometric complexity. The upper panel presents indoor environments featuring intricate room layouts, while the lower panel demonstrates outdoor architectural structures with surrounding vegetation. </span></span></figcaption> </figure> <figure class="ltx_table" id="A2.T4"> <table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle" id="A2.T4.2"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="A2.T4.2.1.1"> <th class="ltx_td ltx_th ltx_th_row ltx_border_tt" id="A2.T4.2.1.1.1"></th> <td class="ltx_td ltx_align_center ltx_border_tt" colspan="4" id="A2.T4.2.1.1.2"><span class="ltx_text" id="A2.T4.2.1.1.2.1" style="font-size:90%;">KITTI</span></td> </tr> <tr class="ltx_tr" id="A2.T4.2.2.2"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="A2.T4.2.2.2.1"><span class="ltx_text" id="A2.T4.2.2.2.1.1" style="font-size:90%;">Method</span></th> <td class="ltx_td ltx_align_center" id="A2.T4.2.2.2.2"><span class="ltx_text" id="A2.T4.2.2.2.2.1" style="font-size:90%;">CD</span></td> <td class="ltx_td ltx_align_center" id="A2.T4.2.2.2.3"><span class="ltx_text" id="A2.T4.2.2.2.3.1" style="font-size:90%;"> PSNR↑</span></td> <td class="ltx_td ltx_align_center" id="A2.T4.2.2.2.4"><span class="ltx_text" id="A2.T4.2.2.2.4.1" style="font-size:90%;"> SSIM↑</span></td> <td class="ltx_td ltx_nopad_r ltx_align_center" id="A2.T4.2.2.2.5"><span class="ltx_text" id="A2.T4.2.2.2.5.1" style="font-size:90%;"> LPIPS↓</span></td> </tr> <tr class="ltx_tr" id="A2.T4.2.3.3"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="A2.T4.2.3.3.1"> <span 
class="ltx_text" id="A2.T4.2.3.3.1.1" style="font-size:90%;">LDI </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text" id="A2.T4.2.3.3.1.2.1" style="font-size:90%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib64" title="">64</a><span class="ltx_text" id="A2.T4.2.3.3.1.3.2" style="font-size:90%;">]</span></cite> </th> <td class="ltx_td ltx_align_center ltx_border_t" id="A2.T4.2.3.3.2"><span class="ltx_text" id="A2.T4.2.3.3.2.1" style="font-size:90%;">×</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="A2.T4.2.3.3.3"><span class="ltx_text" id="A2.T4.2.3.3.3.1" style="font-size:90%;"> 16.50</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="A2.T4.2.3.3.4"><span class="ltx_text" id="A2.T4.2.3.3.4.1" style="font-size:90%;"> 0.572</span></td> <td class="ltx_td ltx_nopad_r ltx_align_center ltx_border_t" id="A2.T4.2.3.3.5"><span class="ltx_text" id="A2.T4.2.3.3.5.1" style="font-size:90%;"> -</span></td> </tr> <tr class="ltx_tr" id="A2.T4.2.4.4"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="A2.T4.2.4.4.1"> <span class="ltx_text" id="A2.T4.2.4.4.1.1" style="font-size:90%;">SV-MPI </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text" id="A2.T4.2.4.4.1.2.1" style="font-size:90%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib63" title="">63</a><span class="ltx_text" id="A2.T4.2.4.4.1.3.2" style="font-size:90%;">]</span></cite> </th> <td class="ltx_td ltx_align_center" id="A2.T4.2.4.4.2"><span class="ltx_text" id="A2.T4.2.4.4.2.1" style="font-size:90%;">×</span></td> <td class="ltx_td ltx_align_center" id="A2.T4.2.4.4.3"><span class="ltx_text" id="A2.T4.2.4.4.3.1" style="font-size:90%;"> 19.50</span></td> <td class="ltx_td ltx_align_center" id="A2.T4.2.4.4.4"><span class="ltx_text" id="A2.T4.2.4.4.4.1" style="font-size:90%;"> 0.733</span></td> <td class="ltx_td ltx_nopad_r ltx_align_center" id="A2.T4.2.4.4.5"><span class="ltx_text" 
id="A2.T4.2.4.4.5.1" style="font-size:90%;"> -</span></td> </tr> <tr class="ltx_tr" id="A2.T4.2.5.5"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="A2.T4.2.5.5.1"> <span class="ltx_text" id="A2.T4.2.5.5.1.1" style="font-size:90%;">BTS </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text" id="A2.T4.2.5.5.1.2.1" style="font-size:90%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib74" title="">74</a><span class="ltx_text" id="A2.T4.2.5.5.1.3.2" style="font-size:90%;">]</span></cite> </th> <td class="ltx_td ltx_align_center" id="A2.T4.2.5.5.2"><span class="ltx_text" id="A2.T4.2.5.5.2.1" style="font-size:90%;">×</span></td> <td class="ltx_td ltx_align_center" id="A2.T4.2.5.5.3"><span class="ltx_text" id="A2.T4.2.5.5.3.1" style="font-size:90%;"> 20.10</span></td> <td class="ltx_td ltx_align_center" id="A2.T4.2.5.5.4"><span class="ltx_text" id="A2.T4.2.5.5.4.1" style="font-size:90%;"> 0.761</span></td> <td class="ltx_td ltx_nopad_r ltx_align_center" id="A2.T4.2.5.5.5"> <span class="ltx_text" id="A2.T4.2.5.5.5.1" style="font-size:90%;"> </span><span class="ltx_text ltx_framed ltx_framed_underline" id="A2.T4.2.5.5.5.2" style="font-size:90%;">0.144</span> </td> </tr> <tr class="ltx_tr" id="A2.T4.2.6.6"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="A2.T4.2.6.6.1"> <span class="ltx_text" id="A2.T4.2.6.6.1.1" style="font-size:90%;">MINE </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text" id="A2.T4.2.6.6.1.2.1" style="font-size:90%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib36" title="">36</a><span class="ltx_text" id="A2.T4.2.6.6.1.3.2" style="font-size:90%;">]</span></cite> </th> <td class="ltx_td ltx_align_center" id="A2.T4.2.6.6.2"><span class="ltx_text" id="A2.T4.2.6.6.2.1" style="font-size:90%;">×</span></td> <td class="ltx_td ltx_align_center" id="A2.T4.2.6.6.3"> <span class="ltx_text" id="A2.T4.2.6.6.3.1" style="font-size:90%;"> </span><span 
class="ltx_text ltx_font_bold" id="A2.T4.2.6.6.3.2" style="font-size:90%;">21.90</span> </td> <td class="ltx_td ltx_align_center" id="A2.T4.2.6.6.4"> <span class="ltx_text" id="A2.T4.2.6.6.4.1" style="font-size:90%;"> </span><span class="ltx_text ltx_font_bold" id="A2.T4.2.6.6.4.2" style="font-size:90%;">0.828</span> </td> <td class="ltx_td ltx_nopad_r ltx_align_center" id="A2.T4.2.6.6.5"> <span class="ltx_text" id="A2.T4.2.6.6.5.1" style="font-size:90%;"> </span><span class="ltx_text ltx_font_bold" id="A2.T4.2.6.6.5.2" style="font-size:90%;">0.112</span> </td> </tr> <tr class="ltx_tr" id="A2.T4.2.7.7"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="A2.T4.2.7.7.1"> <span class="ltx_text" id="A2.T4.2.7.7.1.1" style="font-size:90%;">Flash3D </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text" id="A2.T4.2.7.7.1.2.1" style="font-size:90%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a><span class="ltx_text" id="A2.T4.2.7.7.1.3.2" style="font-size:90%;">]</span></cite> </th> <td class="ltx_td ltx_align_center" id="A2.T4.2.7.7.2"><span class="ltx_text" id="A2.T4.2.7.7.2.1" style="font-size:90%;">✓</span></td> <td class="ltx_td ltx_align_center" id="A2.T4.2.7.7.3"><span class="ltx_text" id="A2.T4.2.7.7.3.1" style="font-size:90%;"> 20.98</span></td> <td class="ltx_td ltx_align_center" id="A2.T4.2.7.7.4"> <span class="ltx_text" id="A2.T4.2.7.7.4.1" style="font-size:90%;"> </span><span class="ltx_text ltx_framed ltx_framed_underline" id="A2.T4.2.7.7.4.2" style="font-size:90%;">0.784</span> </td> <td class="ltx_td ltx_nopad_r ltx_align_center" id="A2.T4.2.7.7.5"><span class="ltx_text" id="A2.T4.2.7.7.5.1" style="font-size:90%;"> 0.159</span></td> </tr> <tr class="ltx_tr" id="A2.T4.2.8.8"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb ltx_border_tt" id="A2.T4.2.8.8.1"><span class="ltx_text" id="A2.T4.2.8.8.1.1" style="font-size:90%;">Ours</span></th> <td class="ltx_td 
ltx_align_center ltx_border_bb ltx_border_tt" id="A2.T4.2.8.8.2"><span class="ltx_text" id="A2.T4.2.8.8.2.1" style="font-size:90%;">✓</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_tt" id="A2.T4.2.8.8.3"> <span class="ltx_text" id="A2.T4.2.8.8.3.1" style="font-size:90%;"> </span><span class="ltx_text ltx_framed ltx_framed_underline" id="A2.T4.2.8.8.3.2" style="font-size:90%;">21.24</span> </td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_tt" id="A2.T4.2.8.8.4"><span class="ltx_text" id="A2.T4.2.8.8.4.1" style="font-size:90%;"> 0.779</span></td> <td class="ltx_td ltx_nopad_r ltx_align_center ltx_border_bb ltx_border_tt" id="A2.T4.2.8.8.5"><span class="ltx_text" id="A2.T4.2.8.8.5.1" style="font-size:90%;"> 0.158</span></td> </tr> </tbody> </table> <figcaption class="ltx_caption ltx_centering" style="font-size:90%;"><span class="ltx_tag ltx_tag_table">Table 4: </span>Comparison on the KITTI <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib25" title="">25</a>]</cite> dataset. Cross-domain (CD) indicates that the method was not trained on the dataset being evaluated. (<span class="ltx_text ltx_font_bold" id="A2.T4.12.1">Best</span> results are in bold, <span class="ltx_text ltx_framed ltx_framed_underline" id="A2.T4.13.2">second best</span> underlined.)
</figcaption> </figure> <div class="ltx_para" id="A2.p1"> <p class="ltx_p" id="A2.p1.1">As demonstrated in <a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#A2.F6" title="Figure 6 ‣ Appendix B More Qualitative Comparison ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Figure 6</span></a>, we systematically evaluate Niagara against Flash3D for single-view 3D scene reconstruction under diverse illumination conditions and geometric complexity. The upper section (five rows) examines indoor environments with intricate layouts and challenging lighting, while the lower section (four rows) analyzes outdoor architectural structures surrounded by vegetation. Our method exhibits four principal advantages:</p> </div> <div class="ltx_para" id="A2.p2"> <ul class="ltx_itemize" id="A2.I1"> <li class="ltx_item" id="A2.I1.i1" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="A2.I1.i1.p1"> <p class="ltx_p" id="A2.I1.i1.p1.1"><span class="ltx_text ltx_font_bold" id="A2.I1.i1.p1.1.1">Superior texture preservation.</span> Niagara consistently restores fine-grained material details across scenes. In indoor environments (e.g., Row 4, Indoor), it maintains precise transitions between wall textures, preserves sharp doorframe edges under mixed lighting, and accurately reconstructs reflective floor surfaces. 
For outdoor scenes (e.g., Row 2, Outdoor), the method captures layered vegetation textures on building façades, whereas Flash3D oversimplifies these details into flat regions.</p> </div> </li> <li class="ltx_item" id="A2.I1.i2" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="A2.I1.i2.p1"> <p class="ltx_p" id="A2.I1.i2.p1.1"><span class="ltx_text ltx_font_bold" id="A2.I1.i2.p1.1.1">Improved geometric structure reconstruction.</span> Niagara demonstrates enhanced recovery of spatially coherent layouts. In complex indoor settings (e.g., Row 3, Indoor), Niagara reconstructs furniture arrangements with accurate depth ordering and wall connectivity, eliminating the fragmented geometries observed in Flash3D. For outdoor structures (e.g., Row 1, Outdoor), it preserves architectural proportions such as window alignments and roof slopes, while Flash3D introduces perspective distortions.</p> </div> </li> <li class="ltx_item" id="A2.I1.i3" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="A2.I1.i3.p1"> <p class="ltx_p" id="A2.I1.i3.p1.1"><span class="ltx_text ltx_font_bold" id="A2.I1.i3.p1.1.1">Reduction of color bleeding and artifacts.</span> Niagara achieves natural color separation under challenging lighting. Indoor results (e.g., Row 5, Indoor) show distinct material boundaries between wooden furniture and painted walls, even under strong ambient light. 
In outdoor cases (e.g., Row 3, Outdoor), Niagara prevents vegetation hues from bleeding into adjacent stone pathways, a common failure mode in Flash3D outputs.</p> </div> </li> <li class="ltx_item" id="A2.I1.i4" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="A2.I1.i4.p1"> <p class="ltx_p" id="A2.I1.i4.p1.1"><span class="ltx_text ltx_font_bold" id="A2.I1.i4.p1.1.1">Enhanced generalization to open-domain scenarios.</span> Niagara demonstrates robust generalization across diverse scene types. For intricate indoor spaces (e.g., multi-room layouts in Row 2, Indoor), Niagara maintains consistent scale across interconnected areas, unlike Flash3D, which struggles with occluded regions. In vegetation-heavy outdoor scenes (e.g., Row 4, Outdoor), it reconstructs overlapping foliage and architectural elements without oversimplifying their natural complexity, whereas Flash3D produces flattened geometry.</p> </div> </li> </ul> </div> <div class="ltx_para" id="A2.p3"> <p class="ltx_p" id="A2.p3.1">These advantages stem from Niagara’s integration of depth-normal geometric constraints with 3D self-attention, a combination designed to resolve ambiguities in complex scene reconstruction. The depth-normal constraints enforce surface continuity through locally adaptive normal priors, effectively mitigating topological errors in occluded regions (e.g., overlapping foliage, multi-room junctions). Concurrently, the 3D self-attention mechanism learns long-range structural dependencies across scales, dynamically harmonizing geometric coherence. 
This dual strategy overcomes the rigidity of prior-based methods like Flash3D, which rely on static scene assumptions and lack mechanisms to refine geometry and appearance.</p> </div> </section> <section class="ltx_appendix" id="A3"> <h2 class="ltx_title ltx_title_appendix"> <span class="ltx_tag ltx_tag_appendix">Appendix C </span>KITTI Experiments</h2> <div class="ltx_para" id="A3.p1"> <p class="ltx_p" id="A3.p1.1">We present experimental results on the KITTI <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib25" title="">25</a>]</cite> dataset, demonstrating our model’s enhanced capability for outdoor scene reconstruction compared to Flash3D <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.12553v1#bib.bib60" title="">60</a>]</cite>. Technical constraints prevented the direct reproduction of metrics reported in the original Flash3D publication, though we remain in communication with the authors to resolve methodological discrepancies.</p> </div> <div class="ltx_para" id="A3.p2"> <p class="ltx_p" id="A3.p2.1">As shown in <a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#A2.T4" title="Table 4 ‣ Appendix B More Qualitative Comparison ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Table 4</span></a>, under equivalent configurations our approach improves on Flash3D in PSNR (21.24 vs. 20.98) and marginally in LPIPS (0.158 vs. 0.159), while its SSIM is slightly lower (0.779 vs. 0.784). However, both methods deviate from standard evaluation protocols, particularly Flash3D’s open-source implementation (the setting used in <a class="ltx_ref ltx_refmacro_autoref" href="https://arxiv.org/html/2503.12553v1#A2.T4" title="Table 4 ‣ Appendix B More Qualitative Comparison ‣ Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View"><span class="ltx_text ltx_ref_tag">Table 4</span></a>). 
We attribute performance variations primarily to differences in KITTI dataset preprocessing methodologies.</p> </div> <div class="ltx_para" id="A3.p3"> <p class="ltx_p" id="A3.p3.1">While none of the results achieve optimal performance metrics, this analysis focuses on comparative configuration impacts rather than absolute performance evaluation.</p> </div> </section> </article> </div> <footer class="ltx_page_footer"> <div class="ltx_page_logo">Generated on Sun Mar 16 15:49:58 2025 by <a class="ltx_LaTeXML_logo" href="http://dlmf.nist.gov/LaTeXML/"><span style="letter-spacing:-0.2em; margin-right:0.1em;">L<span class="ltx_font_smallcaps" style="position:relative; bottom:2.2pt;">a</span>T<span class="ltx_font_smallcaps" style="font-size:120%;position:relative; bottom:-0.2ex;">e</span></span><span style="font-size:90%; position:relative; bottom:-0.2ex;">XML</span><img alt="Mascot Sammy" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAsAAAAOCAYAAAD5YeaVAAAAAXNSR0IArs4c6QAAAAZiS0dEAP8A/wD/oL2nkwAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9wKExQZLWTEaOUAAAAddEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIFRoZSBHSU1Q72QlbgAAAdpJREFUKM9tkL+L2nAARz9fPZNCKFapUn8kyI0e4iRHSR1Kb8ng0lJw6FYHFwv2LwhOpcWxTjeUunYqOmqd6hEoRDhtDWdA8ApRYsSUCDHNt5ul13vz4w0vWCgUnnEc975arX6ORqN3VqtVZbfbTQC4uEHANM3jSqXymFI6yWazP2KxWAXAL9zCUa1Wy2tXVxheKA9YNoR8Pt+aTqe4FVVVvz05O6MBhqUIBGk8Hn8HAOVy+T+XLJfLS4ZhTiRJgqIoVBRFIoric47jPnmeB1mW/9rr9ZpSSn3Lsmir1fJZlqWlUonKsvwWwD8ymc/nXwVBeLjf7xEKhdBut9Hr9WgmkyGEkJwsy5eHG5vN5g0AKIoCAEgkEkin0wQAfN9/cXPdheu6P33fBwB4ngcAcByHJpPJl+fn54mD3Gg0NrquXxeLRQAAwzAYj8cwTZPwPH9/sVg8PXweDAauqqr2cDjEer1GJBLBZDJBs9mE4zjwfZ85lAGg2+06hmGgXq+j3+/DsixYlgVN03a9Xu8jgCNCyIegIAgx13Vfd7vdu+FweG8YRkjXdWy329+dTgeSJD3ieZ7RNO0VAXAPwDEAO5VKndi2fWrb9jWl9Esul6PZbDY9Go1OZ7PZ9z/lyuD3OozU2wAAAABJRU5ErkJggg=="/></a> </div></footer> </div> </body> </html>