On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding

<!DOCTYPE html> <html lang="en"> <head> <meta content="text/html; charset=utf-8" http-equiv="content-type"/> <title>On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding</title> <!--Generated on Fri Oct 4 19:55:36 2024 by LaTeXML (version 0.8.8) http://dlmf.nist.gov/LaTeXML/.--> <meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/> <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv-fonts.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/latexml_styles.css" rel="stylesheet" type="text/css"/> <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.3.3/html2canvas.min.js"></script> <script src="/static/browse/0.3.4/js/addons_new.js"></script> <script src="/static/browse/0.3.4/js/feedbackOverlay.js"></script> <meta content=" Learned video compression, conditional coding, and conditional residual coding. 
" lang="en" name="keywords"/> <base href="/html/2410.03898v1/"/></head> <body> <nav class="ltx_page_navbar"> <nav class="ltx_TOC"> <ol class="ltx_toclist"> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S1" title="In On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">I </span><span class="ltx_text ltx_font_smallcaps">Introduction</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S2" title="In On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">II </span><span class="ltx_text ltx_font_smallcaps">Proposed Method</span></span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S2.SS1" title="In II Proposed Method ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-A</span> </span><span class="ltx_text ltx_font_italic">System Overview</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"> <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S2.SS2" title="In II Proposed Method ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-B</span> </span><span class="ltx_text ltx_font_italic">Inter-frame Coding</span></span></a> <ol class="ltx_toclist ltx_toclist_subsection"> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S2.SS2.SSS1" title="In II-B Inter-frame Coding ‣ II Proposed Method ‣ On the Rate-Distortion-Complexity Trade-offs of Neural 
Video Coding"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-B</span>1 </span>Conditional Coding</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S2.SS2.SSS2" title="In II-B Inter-frame Coding ‣ II Proposed Method ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-B</span>2 </span>Conditional Residual Coding</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S2.SS2.SSS3" title="In II-B Inter-frame Coding ‣ II Proposed Method ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-B</span>3 </span>Masked Conditional Residual Coding</span></a></li> </ol> </li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S3" title="In On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">III </span><span class="ltx_text ltx_font_smallcaps">Experiments</span></span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S3.SS1" title="In III Experiments ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">III-A</span> </span><span class="ltx_text ltx_font_italic">Training</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S3.SS2" title="In III Experiments ‣ On the Rate-Distortion-Complexity Trade-offs of 
Neural Video Coding"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">III-B</span> </span><span class="ltx_text ltx_font_italic">Evaluation</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S3.SS3" title="In III Experiments ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">III-C</span> </span><span class="ltx_text ltx_font_italic">Rate-Distortion Performance</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S3.SS4" title="In III Experiments ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">III-D</span> </span><span class="ltx_text ltx_font_italic">Complexity Analysis</span></span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S4" title="In On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">IV </span><span class="ltx_text ltx_font_smallcaps">Conclusion</span></span></a></li> </ol></nav> </nav> <div class="ltx_page_main"> <div class="ltx_page_content"> <article class="ltx_document ltx_authors_1line"> <h1 class="ltx_title ltx_title_document">On the Rate-Distortion-Complexity Trade-offs <br class="ltx_break"/>of Neural Video Coding</h1> <div class="ltx_authors"> <span class="ltx_creator ltx_role_author"> <span class="ltx_personname"> Yi-Hsin Chen<span class="ltx_text" id="id1.1.1" style="position:relative; bottom:0.0pt;"><sup class="ltx_sup" id="id1.1.1.1"><math alttext="1" class="ltx_Math" display="inline" id="id1.1.1.1.m1.1"><semantics id="id1.1.1.1.m1.1a"><mn 
id="id1.1.1.1.m1.1.1" mathsize="80%" xref="id1.1.1.1.m1.1.1.cmml">1</mn><annotation-xml encoding="MathML-Content" id="id1.1.1.1.m1.1b"><cn id="id1.1.1.1.m1.1.1.cmml" type="integer" xref="id1.1.1.1.m1.1.1">1</cn></annotation-xml><annotation encoding="application/x-tex" id="id1.1.1.1.m1.1c">1</annotation><annotation encoding="application/x-llamapun" id="id1.1.1.1.m1.1d">1</annotation></semantics></math></sup></span>  Kuan-Wei Ho<span class="ltx_text" id="id2.2.2" style="position:relative; bottom:0.0pt;"><sup class="ltx_sup" id="id2.2.2.1"><math alttext="1" class="ltx_Math" display="inline" id="id2.2.2.1.m1.1"><semantics id="id2.2.2.1.m1.1a"><mn id="id2.2.2.1.m1.1.1" mathsize="80%" xref="id2.2.2.1.m1.1.1.cmml">1</mn><annotation-xml encoding="MathML-Content" id="id2.2.2.1.m1.1b"><cn id="id2.2.2.1.m1.1.1.cmml" type="integer" xref="id2.2.2.1.m1.1.1">1</cn></annotation-xml><annotation encoding="application/x-tex" id="id2.2.2.1.m1.1c">1</annotation><annotation encoding="application/x-llamapun" id="id2.2.2.1.m1.1d">1</annotation></semantics></math></sup></span>  Martin Benjak<span class="ltx_text" id="id3.3.3" style="position:relative; bottom:0.0pt;"><sup class="ltx_sup" id="id3.3.3.1"><math alttext="2" class="ltx_Math" display="inline" id="id3.3.3.1.m1.1"><semantics id="id3.3.3.1.m1.1a"><mn id="id3.3.3.1.m1.1.1" mathsize="80%" xref="id3.3.3.1.m1.1.1.cmml">2</mn><annotation-xml encoding="MathML-Content" id="id3.3.3.1.m1.1b"><cn id="id3.3.3.1.m1.1.1.cmml" type="integer" xref="id3.3.3.1.m1.1.1">2</cn></annotation-xml><annotation encoding="application/x-tex" id="id3.3.3.1.m1.1c">2</annotation><annotation encoding="application/x-llamapun" id="id3.3.3.1.m1.1d">2</annotation></semantics></math></sup></span>  Jörn Ostermann<span class="ltx_text" id="id4.4.4" style="position:relative; bottom:0.0pt;"><sup class="ltx_sup" id="id4.4.4.1"><math alttext="2" class="ltx_Math" display="inline" id="id4.4.4.1.m1.1"><semantics id="id4.4.4.1.m1.1a"><mn id="id4.4.4.1.m1.1.1" mathsize="80%" 
xref="id4.4.4.1.m1.1.1.cmml">2</mn><annotation-xml encoding="MathML-Content" id="id4.4.4.1.m1.1b"><cn id="id4.4.4.1.m1.1.1.cmml" type="integer" xref="id4.4.4.1.m1.1.1">2</cn></annotation-xml><annotation encoding="application/x-tex" id="id4.4.4.1.m1.1c">2</annotation><annotation encoding="application/x-llamapun" id="id4.4.4.1.m1.1d">2</annotation></semantics></math></sup></span>  Wen-Hsiao Peng<span class="ltx_text" id="id5.5.5" style="position:relative; bottom:0.0pt;"><sup class="ltx_sup" id="id5.5.5.1"><math alttext="1" class="ltx_Math" display="inline" id="id5.5.5.1.m1.1"><semantics id="id5.5.5.1.m1.1a"><mn id="id5.5.5.1.m1.1.1" mathsize="80%" xref="id5.5.5.1.m1.1.1.cmml">1</mn><annotation-xml encoding="MathML-Content" id="id5.5.5.1.m1.1b"><cn id="id5.5.5.1.m1.1.1.cmml" type="integer" xref="id5.5.5.1.m1.1.1">1</cn></annotation-xml><annotation encoding="application/x-tex" id="id5.5.5.1.m1.1c">1</annotation><annotation encoding="application/x-llamapun" id="id5.5.5.1.m1.1d">1</annotation></semantics></math></sup></span> </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_affiliation"><span class="ltx_text" id="id6.6.1" style="position:relative; bottom:0.0pt;"><sup class="ltx_sup" id="id6.6.1.1"><math alttext="1" class="ltx_Math" display="inline" id="id6.6.1.1.m1.1"><semantics id="id6.6.1.1.m1.1a"><mn id="id6.6.1.1.m1.1.1" mathsize="80%" xref="id6.6.1.1.m1.1.1.cmml">1</mn><annotation-xml encoding="MathML-Content" id="id6.6.1.1.m1.1b"><cn id="id6.6.1.1.m1.1.1.cmml" type="integer" xref="id6.6.1.1.m1.1.1">1</cn></annotation-xml><annotation encoding="application/x-tex" id="id6.6.1.1.m1.1c">1</annotation><annotation encoding="application/x-llamapun" id="id6.6.1.1.m1.1d">1</annotation></semantics></math></sup></span>National Yang Ming Chiao Tung University, Taiwan  <span class="ltx_text" id="id7.7.2" style="position:relative; bottom:0.0pt;"><sup class="ltx_sup" id="id7.7.2.1"><math alttext="2" class="ltx_Math" display="inline" 
id="id7.7.2.1.m1.1"><semantics id="id7.7.2.1.m1.1a"><mn id="id7.7.2.1.m1.1.1" mathsize="80%" xref="id7.7.2.1.m1.1.1.cmml">2</mn><annotation-xml encoding="MathML-Content" id="id7.7.2.1.m1.1b"><cn id="id7.7.2.1.m1.1.1.cmml" type="integer" xref="id7.7.2.1.m1.1.1">2</cn></annotation-xml><annotation encoding="application/x-tex" id="id7.7.2.1.m1.1c">2</annotation><annotation encoding="application/x-llamapun" id="id7.7.2.1.m1.1d">2</annotation></semantics></math></sup></span>Leibniz Universität Hannover, Germany </span></span></span> </div> <div class="ltx_abstract"> <h6 class="ltx_title ltx_title_abstract">Abstract</h6> <p class="ltx_p" id="id8.id1">This paper investigates the rate-distortion-complexity trade-offs of modern neural video coding. In recent years, considerable research effort has been devoted to exploring the full potential of neural video coding, and conditional autoencoders have emerged as the mainstream approach to efficient neural video coding. The central theme of conditional autoencoders is to leverage both spatial and temporal information for better conditional coding. However, a recent study indicates that conditional coding may suffer from information bottlenecks and can potentially perform worse than traditional residual coding. To address this issue, recent conditional coding methods incorporate a large number of high-resolution features as the condition signal, leading to a considerable increase in multiply–accumulate operations, memory footprint, and model size. Taking DCVC as the common code base, we investigate how conditional residual coding (a recently proposed, emerging school of thought) and its variants may strike a better balance among rate, distortion, and complexity.</p> </div> <div class="ltx_keywords"> <h6 class="ltx_title ltx_title_keywords">Index Terms: </h6> Learned video compression, conditional coding, and conditional residual coding. 
</div> <span class="ltx_note ltx_role_footnote" id="footnote1"><sup class="ltx_note_mark">†</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">†</sup>This work is supported by the National Science and Technology Council, Taiwan (112-2634-F-A49-007-, 110-2221-E-A49-065-MY3, 111-2923-E-A49-007-MY3), and the National Center for High-performance Computing, Taiwan.</span></span></span> <section class="ltx_section" id="S1"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">I </span><span class="ltx_text ltx_font_smallcaps" id="S1.1.1">Introduction</span> </h2> <div class="ltx_para" id="S1.p1"> <p class="ltx_p" id="S1.p1.8">Learned video compression <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib1" title="">1</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib2" title="">2</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib3" title="">3</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib4" title="">4</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib5" title="">5</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib6" title="">6</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib7" title="">7</a>]</cite> holds the promise of revolutionizing the way high-efficiency video compression systems are developed. State-of-the-art methods <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib4" title="">4</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib5" title="">5</a>]</cite> have achieved coding performance comparable to that of VVC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib8" title="">8</a>]</cite> under the low-delay configuration. 
Some <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib6" title="">6</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib7" title="">7</a>]</cite> even outperform ECM <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib9" title="">9</a>]</cite>. Unlike traditional codecs <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib10" title="">10</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib11" title="">11</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib9" title="">9</a>]</cite>, which mostly adopt residual coding to encode the prediction residue <math alttext="x_{t}-x_{c}" class="ltx_Math" display="inline" id="S1.p1.1.m1.1"><semantics id="S1.p1.1.m1.1a"><mrow id="S1.p1.1.m1.1.1" xref="S1.p1.1.m1.1.1.cmml"><msub id="S1.p1.1.m1.1.1.2" xref="S1.p1.1.m1.1.1.2.cmml"><mi id="S1.p1.1.m1.1.1.2.2" xref="S1.p1.1.m1.1.1.2.2.cmml">x</mi><mi id="S1.p1.1.m1.1.1.2.3" xref="S1.p1.1.m1.1.1.2.3.cmml">t</mi></msub><mo id="S1.p1.1.m1.1.1.1" xref="S1.p1.1.m1.1.1.1.cmml">−</mo><msub id="S1.p1.1.m1.1.1.3" xref="S1.p1.1.m1.1.1.3.cmml"><mi id="S1.p1.1.m1.1.1.3.2" xref="S1.p1.1.m1.1.1.3.2.cmml">x</mi><mi id="S1.p1.1.m1.1.1.3.3" xref="S1.p1.1.m1.1.1.3.3.cmml">c</mi></msub></mrow><annotation-xml encoding="MathML-Content" id="S1.p1.1.m1.1b"><apply id="S1.p1.1.m1.1.1.cmml" xref="S1.p1.1.m1.1.1"><minus id="S1.p1.1.m1.1.1.1.cmml" xref="S1.p1.1.m1.1.1.1"></minus><apply id="S1.p1.1.m1.1.1.2.cmml" xref="S1.p1.1.m1.1.1.2"><csymbol cd="ambiguous" id="S1.p1.1.m1.1.1.2.1.cmml" xref="S1.p1.1.m1.1.1.2">subscript</csymbol><ci id="S1.p1.1.m1.1.1.2.2.cmml" xref="S1.p1.1.m1.1.1.2.2">𝑥</ci><ci id="S1.p1.1.m1.1.1.2.3.cmml" xref="S1.p1.1.m1.1.1.2.3">𝑡</ci></apply><apply id="S1.p1.1.m1.1.1.3.cmml" xref="S1.p1.1.m1.1.1.3"><csymbol cd="ambiguous" 
id="S1.p1.1.m1.1.1.3.1.cmml" xref="S1.p1.1.m1.1.1.3">subscript</csymbol><ci id="S1.p1.1.m1.1.1.3.2.cmml" xref="S1.p1.1.m1.1.1.3.2">𝑥</ci><ci id="S1.p1.1.m1.1.1.3.3.cmml" xref="S1.p1.1.m1.1.1.3.3">𝑐</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S1.p1.1.m1.1c">x_{t}-x_{c}</annotation><annotation encoding="application/x-llamapun" id="S1.p1.1.m1.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> between a target frame <math alttext="x_{t}" class="ltx_Math" display="inline" id="S1.p1.2.m2.1"><semantics id="S1.p1.2.m2.1a"><msub id="S1.p1.2.m2.1.1" xref="S1.p1.2.m2.1.1.cmml"><mi id="S1.p1.2.m2.1.1.2" xref="S1.p1.2.m2.1.1.2.cmml">x</mi><mi id="S1.p1.2.m2.1.1.3" xref="S1.p1.2.m2.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S1.p1.2.m2.1b"><apply id="S1.p1.2.m2.1.1.cmml" xref="S1.p1.2.m2.1.1"><csymbol cd="ambiguous" id="S1.p1.2.m2.1.1.1.cmml" xref="S1.p1.2.m2.1.1">subscript</csymbol><ci id="S1.p1.2.m2.1.1.2.cmml" xref="S1.p1.2.m2.1.1.2">𝑥</ci><ci id="S1.p1.2.m2.1.1.3.cmml" xref="S1.p1.2.m2.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S1.p1.2.m2.1c">x_{t}</annotation><annotation encoding="application/x-llamapun" id="S1.p1.2.m2.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> and its temporal predictor <math alttext="x_{c}" class="ltx_Math" display="inline" id="S1.p1.3.m3.1"><semantics id="S1.p1.3.m3.1a"><msub id="S1.p1.3.m3.1.1" xref="S1.p1.3.m3.1.1.cmml"><mi id="S1.p1.3.m3.1.1.2" xref="S1.p1.3.m3.1.1.2.cmml">x</mi><mi id="S1.p1.3.m3.1.1.3" xref="S1.p1.3.m3.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S1.p1.3.m3.1b"><apply id="S1.p1.3.m3.1.1.cmml" xref="S1.p1.3.m3.1.1"><csymbol cd="ambiguous" id="S1.p1.3.m3.1.1.1.cmml" xref="S1.p1.3.m3.1.1">subscript</csymbol><ci id="S1.p1.3.m3.1.1.2.cmml" 
xref="S1.p1.3.m3.1.1.2">𝑥</ci><ci id="S1.p1.3.m3.1.1.3.cmml" xref="S1.p1.3.m3.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S1.p1.3.m3.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S1.p1.3.m3.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math>, modern neural video codecs <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib1" title="">1</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib3" title="">3</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib4" title="">4</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib6" title="">6</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib7" title="">7</a>]</cite> attribute part of their success to the non-linear exploitation of the temporal predictor <math alttext="x_{c}" class="ltx_Math" display="inline" id="S1.p1.4.m4.1"><semantics id="S1.p1.4.m4.1a"><msub id="S1.p1.4.m4.1.1" xref="S1.p1.4.m4.1.1.cmml"><mi id="S1.p1.4.m4.1.1.2" xref="S1.p1.4.m4.1.1.2.cmml">x</mi><mi id="S1.p1.4.m4.1.1.3" xref="S1.p1.4.m4.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S1.p1.4.m4.1b"><apply id="S1.p1.4.m4.1.1.cmml" xref="S1.p1.4.m4.1.1"><csymbol cd="ambiguous" id="S1.p1.4.m4.1.1.1.cmml" xref="S1.p1.4.m4.1.1">subscript</csymbol><ci id="S1.p1.4.m4.1.1.2.cmml" xref="S1.p1.4.m4.1.1.2">𝑥</ci><ci id="S1.p1.4.m4.1.1.3.cmml" xref="S1.p1.4.m4.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S1.p1.4.m4.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S1.p1.4.m4.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math>. 
Specifically, <math alttext="x_{c}" class="ltx_Math" display="inline" id="S1.p1.5.m5.1"><semantics id="S1.p1.5.m5.1a"><msub id="S1.p1.5.m5.1.1" xref="S1.p1.5.m5.1.1.cmml"><mi id="S1.p1.5.m5.1.1.2" xref="S1.p1.5.m5.1.1.2.cmml">x</mi><mi id="S1.p1.5.m5.1.1.3" xref="S1.p1.5.m5.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S1.p1.5.m5.1b"><apply id="S1.p1.5.m5.1.1.cmml" xref="S1.p1.5.m5.1.1"><csymbol cd="ambiguous" id="S1.p1.5.m5.1.1.1.cmml" xref="S1.p1.5.m5.1.1">subscript</csymbol><ci id="S1.p1.5.m5.1.1.2.cmml" xref="S1.p1.5.m5.1.1.2">𝑥</ci><ci id="S1.p1.5.m5.1.1.3.cmml" xref="S1.p1.5.m5.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S1.p1.5.m5.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S1.p1.5.m5.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> is no longer used to compute a prediction residue in the pixel domain; instead, it conditions the learned inter-frame codec when coding the target frame <math alttext="x_{t}" class="ltx_Math" display="inline" id="S1.p1.6.m6.1"><semantics id="S1.p1.6.m6.1a"><msub id="S1.p1.6.m6.1.1" xref="S1.p1.6.m6.1.1.cmml"><mi id="S1.p1.6.m6.1.1.2" xref="S1.p1.6.m6.1.1.2.cmml">x</mi><mi id="S1.p1.6.m6.1.1.3" xref="S1.p1.6.m6.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S1.p1.6.m6.1b"><apply id="S1.p1.6.m6.1.1.cmml" xref="S1.p1.6.m6.1.1"><csymbol cd="ambiguous" id="S1.p1.6.m6.1.1.1.cmml" xref="S1.p1.6.m6.1.1">subscript</csymbol><ci id="S1.p1.6.m6.1.1.2.cmml" xref="S1.p1.6.m6.1.1.2">𝑥</ci><ci id="S1.p1.6.m6.1.1.3.cmml" xref="S1.p1.6.m6.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S1.p1.6.m6.1c">x_{t}</annotation><annotation encoding="application/x-llamapun" id="S1.p1.6.m6.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math>. This technique is widely known as conditional coding. 
Theoretically, the conditional entropy <math alttext="H(x_{t}|x_{c})" class="ltx_Math" display="inline" id="S1.p1.7.m7.1"><semantics id="S1.p1.7.m7.1a"><mrow id="S1.p1.7.m7.1.1" xref="S1.p1.7.m7.1.1.cmml"><mi id="S1.p1.7.m7.1.1.3" xref="S1.p1.7.m7.1.1.3.cmml">H</mi><mo id="S1.p1.7.m7.1.1.2" xref="S1.p1.7.m7.1.1.2.cmml">⁢</mo><mrow id="S1.p1.7.m7.1.1.1.1" xref="S1.p1.7.m7.1.1.1.1.1.cmml"><mo id="S1.p1.7.m7.1.1.1.1.2" stretchy="false" xref="S1.p1.7.m7.1.1.1.1.1.cmml">(</mo><mrow id="S1.p1.7.m7.1.1.1.1.1" xref="S1.p1.7.m7.1.1.1.1.1.cmml"><msub id="S1.p1.7.m7.1.1.1.1.1.2" xref="S1.p1.7.m7.1.1.1.1.1.2.cmml"><mi id="S1.p1.7.m7.1.1.1.1.1.2.2" xref="S1.p1.7.m7.1.1.1.1.1.2.2.cmml">x</mi><mi id="S1.p1.7.m7.1.1.1.1.1.2.3" xref="S1.p1.7.m7.1.1.1.1.1.2.3.cmml">t</mi></msub><mo fence="false" id="S1.p1.7.m7.1.1.1.1.1.1" xref="S1.p1.7.m7.1.1.1.1.1.1.cmml">|</mo><msub id="S1.p1.7.m7.1.1.1.1.1.3" xref="S1.p1.7.m7.1.1.1.1.1.3.cmml"><mi id="S1.p1.7.m7.1.1.1.1.1.3.2" xref="S1.p1.7.m7.1.1.1.1.1.3.2.cmml">x</mi><mi id="S1.p1.7.m7.1.1.1.1.1.3.3" xref="S1.p1.7.m7.1.1.1.1.1.3.3.cmml">c</mi></msub></mrow><mo id="S1.p1.7.m7.1.1.1.1.3" stretchy="false" xref="S1.p1.7.m7.1.1.1.1.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S1.p1.7.m7.1b"><apply id="S1.p1.7.m7.1.1.cmml" xref="S1.p1.7.m7.1.1"><times id="S1.p1.7.m7.1.1.2.cmml" xref="S1.p1.7.m7.1.1.2"></times><ci id="S1.p1.7.m7.1.1.3.cmml" xref="S1.p1.7.m7.1.1.3">𝐻</ci><apply id="S1.p1.7.m7.1.1.1.1.1.cmml" xref="S1.p1.7.m7.1.1.1.1"><csymbol cd="latexml" id="S1.p1.7.m7.1.1.1.1.1.1.cmml" xref="S1.p1.7.m7.1.1.1.1.1.1">conditional</csymbol><apply id="S1.p1.7.m7.1.1.1.1.1.2.cmml" xref="S1.p1.7.m7.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S1.p1.7.m7.1.1.1.1.1.2.1.cmml" xref="S1.p1.7.m7.1.1.1.1.1.2">subscript</csymbol><ci id="S1.p1.7.m7.1.1.1.1.1.2.2.cmml" xref="S1.p1.7.m7.1.1.1.1.1.2.2">𝑥</ci><ci id="S1.p1.7.m7.1.1.1.1.1.2.3.cmml" xref="S1.p1.7.m7.1.1.1.1.1.2.3">𝑡</ci></apply><apply id="S1.p1.7.m7.1.1.1.1.1.3.cmml" 
xref="S1.p1.7.m7.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S1.p1.7.m7.1.1.1.1.1.3.1.cmml" xref="S1.p1.7.m7.1.1.1.1.1.3">subscript</csymbol><ci id="S1.p1.7.m7.1.1.1.1.1.3.2.cmml" xref="S1.p1.7.m7.1.1.1.1.1.3.2">𝑥</ci><ci id="S1.p1.7.m7.1.1.1.1.1.3.3.cmml" xref="S1.p1.7.m7.1.1.1.1.1.3.3">𝑐</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S1.p1.7.m7.1c">H(x_{t}|x_{c})</annotation><annotation encoding="application/x-llamapun" id="S1.p1.7.m7.1d">italic_H ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT )</annotation></semantics></math> is less than or equal to the residual entropy <math alttext="H(x_{t}-x_{c})" class="ltx_Math" display="inline" id="S1.p1.8.m8.1"><semantics id="S1.p1.8.m8.1a"><mrow id="S1.p1.8.m8.1.1" xref="S1.p1.8.m8.1.1.cmml"><mi id="S1.p1.8.m8.1.1.3" xref="S1.p1.8.m8.1.1.3.cmml">H</mi><mo id="S1.p1.8.m8.1.1.2" xref="S1.p1.8.m8.1.1.2.cmml">⁢</mo><mrow id="S1.p1.8.m8.1.1.1.1" xref="S1.p1.8.m8.1.1.1.1.1.cmml"><mo id="S1.p1.8.m8.1.1.1.1.2" stretchy="false" xref="S1.p1.8.m8.1.1.1.1.1.cmml">(</mo><mrow id="S1.p1.8.m8.1.1.1.1.1" xref="S1.p1.8.m8.1.1.1.1.1.cmml"><msub id="S1.p1.8.m8.1.1.1.1.1.2" xref="S1.p1.8.m8.1.1.1.1.1.2.cmml"><mi id="S1.p1.8.m8.1.1.1.1.1.2.2" xref="S1.p1.8.m8.1.1.1.1.1.2.2.cmml">x</mi><mi id="S1.p1.8.m8.1.1.1.1.1.2.3" xref="S1.p1.8.m8.1.1.1.1.1.2.3.cmml">t</mi></msub><mo id="S1.p1.8.m8.1.1.1.1.1.1" xref="S1.p1.8.m8.1.1.1.1.1.1.cmml">−</mo><msub id="S1.p1.8.m8.1.1.1.1.1.3" xref="S1.p1.8.m8.1.1.1.1.1.3.cmml"><mi id="S1.p1.8.m8.1.1.1.1.1.3.2" xref="S1.p1.8.m8.1.1.1.1.1.3.2.cmml">x</mi><mi id="S1.p1.8.m8.1.1.1.1.1.3.3" xref="S1.p1.8.m8.1.1.1.1.1.3.3.cmml">c</mi></msub></mrow><mo id="S1.p1.8.m8.1.1.1.1.3" stretchy="false" xref="S1.p1.8.m8.1.1.1.1.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S1.p1.8.m8.1b"><apply id="S1.p1.8.m8.1.1.cmml" xref="S1.p1.8.m8.1.1"><times id="S1.p1.8.m8.1.1.2.cmml" 
xref="S1.p1.8.m8.1.1.2"></times><ci id="S1.p1.8.m8.1.1.3.cmml" xref="S1.p1.8.m8.1.1.3">𝐻</ci><apply id="S1.p1.8.m8.1.1.1.1.1.cmml" xref="S1.p1.8.m8.1.1.1.1"><minus id="S1.p1.8.m8.1.1.1.1.1.1.cmml" xref="S1.p1.8.m8.1.1.1.1.1.1"></minus><apply id="S1.p1.8.m8.1.1.1.1.1.2.cmml" xref="S1.p1.8.m8.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S1.p1.8.m8.1.1.1.1.1.2.1.cmml" xref="S1.p1.8.m8.1.1.1.1.1.2">subscript</csymbol><ci id="S1.p1.8.m8.1.1.1.1.1.2.2.cmml" xref="S1.p1.8.m8.1.1.1.1.1.2.2">𝑥</ci><ci id="S1.p1.8.m8.1.1.1.1.1.2.3.cmml" xref="S1.p1.8.m8.1.1.1.1.1.2.3">𝑡</ci></apply><apply id="S1.p1.8.m8.1.1.1.1.1.3.cmml" xref="S1.p1.8.m8.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S1.p1.8.m8.1.1.1.1.1.3.1.cmml" xref="S1.p1.8.m8.1.1.1.1.1.3">subscript</csymbol><ci id="S1.p1.8.m8.1.1.1.1.1.3.2.cmml" xref="S1.p1.8.m8.1.1.1.1.1.3.2">𝑥</ci><ci id="S1.p1.8.m8.1.1.1.1.1.3.3.cmml" xref="S1.p1.8.m8.1.1.1.1.1.3.3">𝑐</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S1.p1.8.m8.1c">H(x_{t}-x_{c})</annotation><annotation encoding="application/x-llamapun" id="S1.p1.8.m8.1d">italic_H ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT )</annotation></semantics></math> <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib12" title="">12</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib13" title="">13</a>]</cite>.</p> </div> <div class="ltx_para" id="S1.p2"> <p class="ltx_p" id="S1.p2.2">Although conditional coding shows promising coding performance, the studies in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib13" title="">13</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib14" title="">14</a>]</cite> indicate that it may suffer from an information bottleneck. 
That is, information from the temporal predictor <math alttext="x_{c}" class="ltx_Math" display="inline" id="S1.p2.1.m1.1"><semantics id="S1.p2.1.m1.1a"><msub id="S1.p2.1.m1.1.1" xref="S1.p2.1.m1.1.1.cmml"><mi id="S1.p2.1.m1.1.1.2" xref="S1.p2.1.m1.1.1.2.cmml">x</mi><mi id="S1.p2.1.m1.1.1.3" xref="S1.p2.1.m1.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S1.p2.1.m1.1b"><apply id="S1.p2.1.m1.1.1.cmml" xref="S1.p2.1.m1.1.1"><csymbol cd="ambiguous" id="S1.p2.1.m1.1.1.1.cmml" xref="S1.p2.1.m1.1.1">subscript</csymbol><ci id="S1.p2.1.m1.1.1.2.cmml" xref="S1.p2.1.m1.1.1.2">𝑥</ci><ci id="S1.p2.1.m1.1.1.3.cmml" xref="S1.p2.1.m1.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S1.p2.1.m1.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S1.p2.1.m1.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> may be lost during the feature extraction that forms the condition signal. As a result, conditional coding may in some cases perform worse than residual coding. 
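</p>
<p class="ltx_p">The entropy inequality between conditional and residual coding stated earlier, and the way an information bottleneck can erode it, can be written out with standard information-theoretic identities. The block below is our illustrative sketch for discrete sources, not a derivation reproduced from the cited works; f denotes a hypothetical, possibly lossy feature extractor that forms the condition signal from the temporal predictor.</p>

```latex
\begin{align}
H(x_t \mid x_c) &= H(x_t - x_c \mid x_c)
  && \text{$x_t \mapsto x_t - x_c$ is invertible once $x_c$ is known} \\
&\le H(x_t - x_c)
  && \text{conditioning cannot increase entropy} \\
H\big(x_t \mid f(x_c)\big) &\ge H(x_t \mid x_c)
  && \text{$f(x_c)$ is a function of $x_c$}
\end{align}
```

<p class="ltx_p">The first two lines recover the inequality between the conditional and the residual entropy; the third shows that a lossy condition can only raise the conditional entropy, so once f discards enough information about the predictor, conditional coding is no longer guaranteed to do at least as well as residual coding.</p>
<p class="ltx_p">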
To mitigate this information bottleneck, state-of-the-art conditional coding approaches <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib1" title="">1</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib3" title="">3</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib4" title="">4</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib6" title="">6</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib7" title="">7</a>]</cite> generally extract a large number of high-resolution features from <math alttext="x_{c}" class="ltx_Math" display="inline" id="S1.p2.2.m2.1"><semantics id="S1.p2.2.m2.1a"><msub id="S1.p2.2.m2.1.1" xref="S1.p2.2.m2.1.1.cmml"><mi id="S1.p2.2.m2.1.1.2" xref="S1.p2.2.m2.1.1.2.cmml">x</mi><mi id="S1.p2.2.m2.1.1.3" xref="S1.p2.2.m2.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S1.p2.2.m2.1b"><apply id="S1.p2.2.m2.1.1.cmml" xref="S1.p2.2.m2.1.1"><csymbol cd="ambiguous" id="S1.p2.2.m2.1.1.1.cmml" xref="S1.p2.2.m2.1.1">subscript</csymbol><ci id="S1.p2.2.m2.1.1.2.cmml" xref="S1.p2.2.m2.1.1.2">𝑥</ci><ci id="S1.p2.2.m2.1.1.3.cmml" xref="S1.p2.2.m2.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S1.p2.2.m2.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S1.p2.2.m2.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math>. 
However, this considerably increases the number of multiply–accumulate operations, the memory footprint, and the model size.</p> </div> <figure class="ltx_figure" id="S1.F1"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_square" height="533" id="S1.F1.g1" src="extracted/5902691/Figure/teaser/RD_complexity_BT709_piecewice_final_correct.png" width="598"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 1: </span>BD-rate versus complexity, measured in terms of encoding kMACs/pixel, decoding kMACs/pixel, model size, and the number of channels of the full-resolution condition signal for inter-frame coding. CC, CR, and MCR denote conditional coding, conditional residual coding, and masked conditional residual coding, respectively. The vertical axis shows the BD-rate in terms of PSNR-RGB. The anchor is conditional coding with 64 full-resolution feature maps as the condition signal. Positive and negative BD-rates indicate rate inflation and reduction, respectively. See Section <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S3" title="III Experiments ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">III</span></a> for more results and the evaluation setup. </figcaption> </figure> <figure class="ltx_figure" id="S1.F2"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="301" id="S1.F2.g1" src="extracted/5902691/Figure/coding_schemes_5.png" width="598"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 2: </span>System overview of (a) DCVC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib1" title="">1</a>]</cite> and its variants with (b) conditional residual coding and (c) masked conditional residual coding. 
In all three variants, we follow DCVC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib1" title="">1</a>]</cite> in incorporating a temporal prior into the entropy model of the inter-frame codec; for brevity, this prior is not depicted in the figure.</figcaption> </figure> <div class="ltx_para" id="S1.p3"> <p class="ltx_p" id="S1.p3.3">To remedy the information bottleneck, Brand <em class="ltx_emph ltx_font_italic" id="S1.p3.3.1">et al.</em> <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib14" title="">14</a>]</cite> recently introduced conditional residual coding, which encodes the prediction residue <math alttext="x_{t}-x_{c}" class="ltx_Math" display="inline" id="S1.p3.1.m1.1"><semantics id="S1.p3.1.m1.1a"><mrow id="S1.p3.1.m1.1.1" xref="S1.p3.1.m1.1.1.cmml"><msub id="S1.p3.1.m1.1.1.2" xref="S1.p3.1.m1.1.1.2.cmml"><mi id="S1.p3.1.m1.1.1.2.2" xref="S1.p3.1.m1.1.1.2.2.cmml">x</mi><mi id="S1.p3.1.m1.1.1.2.3" xref="S1.p3.1.m1.1.1.2.3.cmml">t</mi></msub><mo id="S1.p3.1.m1.1.1.1" xref="S1.p3.1.m1.1.1.1.cmml">−</mo><msub id="S1.p3.1.m1.1.1.3" xref="S1.p3.1.m1.1.1.3.cmml"><mi id="S1.p3.1.m1.1.1.3.2" xref="S1.p3.1.m1.1.1.3.2.cmml">x</mi><mi id="S1.p3.1.m1.1.1.3.3" xref="S1.p3.1.m1.1.1.3.3.cmml">c</mi></msub></mrow><annotation-xml encoding="MathML-Content" id="S1.p3.1.m1.1b"><apply id="S1.p3.1.m1.1.1.cmml" xref="S1.p3.1.m1.1.1"><minus id="S1.p3.1.m1.1.1.1.cmml" xref="S1.p3.1.m1.1.1.1"></minus><apply id="S1.p3.1.m1.1.1.2.cmml" xref="S1.p3.1.m1.1.1.2"><csymbol cd="ambiguous" id="S1.p3.1.m1.1.1.2.1.cmml" xref="S1.p3.1.m1.1.1.2">subscript</csymbol><ci id="S1.p3.1.m1.1.1.2.2.cmml" xref="S1.p3.1.m1.1.1.2.2">𝑥</ci><ci id="S1.p3.1.m1.1.1.2.3.cmml" xref="S1.p3.1.m1.1.1.2.3">𝑡</ci></apply><apply id="S1.p3.1.m1.1.1.3.cmml" xref="S1.p3.1.m1.1.1.3"><csymbol cd="ambiguous" id="S1.p3.1.m1.1.1.3.1.cmml" xref="S1.p3.1.m1.1.1.3">subscript</csymbol><ci
id="S1.p3.1.m1.1.1.3.2.cmml" xref="S1.p3.1.m1.1.1.3.2">𝑥</ci><ci id="S1.p3.1.m1.1.1.3.3.cmml" xref="S1.p3.1.m1.1.1.3.3">𝑐</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S1.p3.1.m1.1c">x_{t}-x_{c}</annotation><annotation encoding="application/x-llamapun" id="S1.p3.1.m1.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> with a conditional inter-frame codec. Brand <em class="ltx_emph ltx_font_italic" id="S1.p3.3.2">et al.</em> <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib14" title="">14</a>]</cite> show theoretically that the coding efficiency of conditional residual coding is at least as good as that of conditional coding, and that conditional residual coding is less susceptible to the information bottleneck. In principle, conditional residual coding could therefore achieve higher coding performance at lower complexity. However, this theory was validated only under a simplified setup with a simple learned video codec. 
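</p>
<p class="ltx_p">The two schemes differ only in what the inter-frame codec is asked to code and in how the decoder forms its output. The Python sketch below illustrates this signal flow on synthetic data. It is not DCVC and not the method of [14]: the learned conditional codec is replaced by a hypothetical stand-in, <code>toy_conditional_codec</code>, which ignores its condition and merely applies uniform scalar quantization, and the number of distinct quantized symbols serves only as a crude proxy for coding cost.</p>

```python
import numpy as np


def toy_conditional_codec(signal, condition, step=8.0):
    """Hypothetical stand-in for a learned conditional inter-frame codec.

    A real codec such as DCVC would exploit `condition` in its encoder,
    decoder and entropy model. This toy ignores it and only applies
    uniform scalar quantization, returning the integer symbols that would
    be entropy-coded together with their dequantized reconstruction.
    """
    del condition  # unused in this toy
    symbols = np.round(signal / step).astype(np.int64)
    return symbols, symbols * step


def conditional_coding(x_t, x_c, step=8.0):
    """CC: code the target frame x_t itself, with x_c as the condition."""
    return toy_conditional_codec(x_t, x_c, step)


def conditional_residual_coding(x_t, x_c, step=8.0):
    """CR: code the residue x_t - x_c (still conditioned on x_c);
    the decoder adds the predictor back to the decoded residue."""
    symbols, residue_hat = toy_conditional_codec(x_t - x_c, x_c, step)
    return symbols, x_c + residue_hat


rng = np.random.default_rng(0)
x_t = rng.uniform(0.0, 255.0, size=(64, 64))      # synthetic target frame
x_c = x_t + rng.normal(0.0, 2.0, size=x_t.shape)  # accurate temporal predictor

for name, scheme in [("CC", conditional_coding), ("CR", conditional_residual_coding)]:
    symbols, x_hat = scheme(x_t, x_c)
    print(f"{name}: {np.unique(symbols).size} distinct symbols, "
          f"max |x_t - x_hat| = {np.abs(x_t - x_hat).max():.2f}")
```

<p class="ltx_p">With a predictor this accurate, the residual symbols collapse to a few values around zero, whereas the conditional-coding symbols span the whole quantized intensity range; in both cases the reconstruction error stays within half a quantization step. A real conditional codec would also exploit x<sub>c</sub> when coding x<sub>t</sub> directly, a benefit this toy deliberately omits.</p>
<p class="ltx_p">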
Moreover, the results in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib14" title="">14</a>]</cite> rest on the assumptions that the temporal predictor is of good quality and that the entropy of the residue <math alttext="x_{t}-x_{c}" class="ltx_Math" display="inline" id="S1.p3.2.m2.1"><semantics id="S1.p3.2.m2.1a"><mrow id="S1.p3.2.m2.1.1" xref="S1.p3.2.m2.1.1.cmml"><msub id="S1.p3.2.m2.1.1.2" xref="S1.p3.2.m2.1.1.2.cmml"><mi id="S1.p3.2.m2.1.1.2.2" xref="S1.p3.2.m2.1.1.2.2.cmml">x</mi><mi id="S1.p3.2.m2.1.1.2.3" xref="S1.p3.2.m2.1.1.2.3.cmml">t</mi></msub><mo id="S1.p3.2.m2.1.1.1" xref="S1.p3.2.m2.1.1.1.cmml">−</mo><msub id="S1.p3.2.m2.1.1.3" xref="S1.p3.2.m2.1.1.3.cmml"><mi id="S1.p3.2.m2.1.1.3.2" xref="S1.p3.2.m2.1.1.3.2.cmml">x</mi><mi id="S1.p3.2.m2.1.1.3.3" xref="S1.p3.2.m2.1.1.3.3.cmml">c</mi></msub></mrow><annotation-xml encoding="MathML-Content" id="S1.p3.2.m2.1b"><apply id="S1.p3.2.m2.1.1.cmml" xref="S1.p3.2.m2.1.1"><minus id="S1.p3.2.m2.1.1.1.cmml" xref="S1.p3.2.m2.1.1.1"></minus><apply id="S1.p3.2.m2.1.1.2.cmml" xref="S1.p3.2.m2.1.1.2"><csymbol cd="ambiguous" id="S1.p3.2.m2.1.1.2.1.cmml" xref="S1.p3.2.m2.1.1.2">subscript</csymbol><ci id="S1.p3.2.m2.1.1.2.2.cmml" xref="S1.p3.2.m2.1.1.2.2">𝑥</ci><ci id="S1.p3.2.m2.1.1.2.3.cmml" xref="S1.p3.2.m2.1.1.2.3">𝑡</ci></apply><apply id="S1.p3.2.m2.1.1.3.cmml" xref="S1.p3.2.m2.1.1.3"><csymbol cd="ambiguous" id="S1.p3.2.m2.1.1.3.1.cmml" xref="S1.p3.2.m2.1.1.3">subscript</csymbol><ci id="S1.p3.2.m2.1.1.3.2.cmml" xref="S1.p3.2.m2.1.1.3.2">𝑥</ci><ci id="S1.p3.2.m2.1.1.3.3.cmml" xref="S1.p3.2.m2.1.1.3.3">𝑐</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S1.p3.2.m2.1c">x_{t}-x_{c}</annotation><annotation encoding="application/x-llamapun" id="S1.p3.2.m2.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> does not exceed the entropy 
of the target frame, i.e., <math alttext="H(x_{t}-x_{c})\leq H(x_{t})" class="ltx_Math" display="inline" id="S1.p3.3.m3.2"><semantics id="S1.p3.3.m3.2a"><mrow id="S1.p3.3.m3.2.2" xref="S1.p3.3.m3.2.2.cmml"><mrow id="S1.p3.3.m3.1.1.1" xref="S1.p3.3.m3.1.1.1.cmml"><mi id="S1.p3.3.m3.1.1.1.3" xref="S1.p3.3.m3.1.1.1.3.cmml">H</mi><mo id="S1.p3.3.m3.1.1.1.2" xref="S1.p3.3.m3.1.1.1.2.cmml">⁢</mo><mrow id="S1.p3.3.m3.1.1.1.1.1" xref="S1.p3.3.m3.1.1.1.1.1.1.cmml"><mo id="S1.p3.3.m3.1.1.1.1.1.2" stretchy="false" xref="S1.p3.3.m3.1.1.1.1.1.1.cmml">(</mo><mrow id="S1.p3.3.m3.1.1.1.1.1.1" xref="S1.p3.3.m3.1.1.1.1.1.1.cmml"><msub id="S1.p3.3.m3.1.1.1.1.1.1.2" xref="S1.p3.3.m3.1.1.1.1.1.1.2.cmml"><mi id="S1.p3.3.m3.1.1.1.1.1.1.2.2" xref="S1.p3.3.m3.1.1.1.1.1.1.2.2.cmml">x</mi><mi id="S1.p3.3.m3.1.1.1.1.1.1.2.3" xref="S1.p3.3.m3.1.1.1.1.1.1.2.3.cmml">t</mi></msub><mo id="S1.p3.3.m3.1.1.1.1.1.1.1" xref="S1.p3.3.m3.1.1.1.1.1.1.1.cmml">−</mo><msub id="S1.p3.3.m3.1.1.1.1.1.1.3" xref="S1.p3.3.m3.1.1.1.1.1.1.3.cmml"><mi id="S1.p3.3.m3.1.1.1.1.1.1.3.2" xref="S1.p3.3.m3.1.1.1.1.1.1.3.2.cmml">x</mi><mi id="S1.p3.3.m3.1.1.1.1.1.1.3.3" xref="S1.p3.3.m3.1.1.1.1.1.1.3.3.cmml">c</mi></msub></mrow><mo id="S1.p3.3.m3.1.1.1.1.1.3" stretchy="false" xref="S1.p3.3.m3.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S1.p3.3.m3.2.2.3" xref="S1.p3.3.m3.2.2.3.cmml">≤</mo><mrow id="S1.p3.3.m3.2.2.2" xref="S1.p3.3.m3.2.2.2.cmml"><mi id="S1.p3.3.m3.2.2.2.3" xref="S1.p3.3.m3.2.2.2.3.cmml">H</mi><mo id="S1.p3.3.m3.2.2.2.2" xref="S1.p3.3.m3.2.2.2.2.cmml">⁢</mo><mrow id="S1.p3.3.m3.2.2.2.1.1" xref="S1.p3.3.m3.2.2.2.1.1.1.cmml"><mo id="S1.p3.3.m3.2.2.2.1.1.2" stretchy="false" xref="S1.p3.3.m3.2.2.2.1.1.1.cmml">(</mo><msub id="S1.p3.3.m3.2.2.2.1.1.1" xref="S1.p3.3.m3.2.2.2.1.1.1.cmml"><mi id="S1.p3.3.m3.2.2.2.1.1.1.2" xref="S1.p3.3.m3.2.2.2.1.1.1.2.cmml">x</mi><mi id="S1.p3.3.m3.2.2.2.1.1.1.3" xref="S1.p3.3.m3.2.2.2.1.1.1.3.cmml">t</mi></msub><mo id="S1.p3.3.m3.2.2.2.1.1.3" stretchy="false" 
xref="S1.p3.3.m3.2.2.2.1.1.1.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S1.p3.3.m3.2b"><apply id="S1.p3.3.m3.2.2.cmml" xref="S1.p3.3.m3.2.2"><leq id="S1.p3.3.m3.2.2.3.cmml" xref="S1.p3.3.m3.2.2.3"></leq><apply id="S1.p3.3.m3.1.1.1.cmml" xref="S1.p3.3.m3.1.1.1"><times id="S1.p3.3.m3.1.1.1.2.cmml" xref="S1.p3.3.m3.1.1.1.2"></times><ci id="S1.p3.3.m3.1.1.1.3.cmml" xref="S1.p3.3.m3.1.1.1.3">𝐻</ci><apply id="S1.p3.3.m3.1.1.1.1.1.1.cmml" xref="S1.p3.3.m3.1.1.1.1.1"><minus id="S1.p3.3.m3.1.1.1.1.1.1.1.cmml" xref="S1.p3.3.m3.1.1.1.1.1.1.1"></minus><apply id="S1.p3.3.m3.1.1.1.1.1.1.2.cmml" xref="S1.p3.3.m3.1.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S1.p3.3.m3.1.1.1.1.1.1.2.1.cmml" xref="S1.p3.3.m3.1.1.1.1.1.1.2">subscript</csymbol><ci id="S1.p3.3.m3.1.1.1.1.1.1.2.2.cmml" xref="S1.p3.3.m3.1.1.1.1.1.1.2.2">𝑥</ci><ci id="S1.p3.3.m3.1.1.1.1.1.1.2.3.cmml" xref="S1.p3.3.m3.1.1.1.1.1.1.2.3">𝑡</ci></apply><apply id="S1.p3.3.m3.1.1.1.1.1.1.3.cmml" xref="S1.p3.3.m3.1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S1.p3.3.m3.1.1.1.1.1.1.3.1.cmml" xref="S1.p3.3.m3.1.1.1.1.1.1.3">subscript</csymbol><ci id="S1.p3.3.m3.1.1.1.1.1.1.3.2.cmml" xref="S1.p3.3.m3.1.1.1.1.1.1.3.2">𝑥</ci><ci id="S1.p3.3.m3.1.1.1.1.1.1.3.3.cmml" xref="S1.p3.3.m3.1.1.1.1.1.1.3.3">𝑐</ci></apply></apply></apply><apply id="S1.p3.3.m3.2.2.2.cmml" xref="S1.p3.3.m3.2.2.2"><times id="S1.p3.3.m3.2.2.2.2.cmml" xref="S1.p3.3.m3.2.2.2.2"></times><ci id="S1.p3.3.m3.2.2.2.3.cmml" xref="S1.p3.3.m3.2.2.2.3">𝐻</ci><apply id="S1.p3.3.m3.2.2.2.1.1.1.cmml" xref="S1.p3.3.m3.2.2.2.1.1"><csymbol cd="ambiguous" id="S1.p3.3.m3.2.2.2.1.1.1.1.cmml" xref="S1.p3.3.m3.2.2.2.1.1">subscript</csymbol><ci id="S1.p3.3.m3.2.2.2.1.1.1.2.cmml" xref="S1.p3.3.m3.2.2.2.1.1.1.2">𝑥</ci><ci id="S1.p3.3.m3.2.2.2.1.1.1.3.cmml" xref="S1.p3.3.m3.2.2.2.1.1.1.3">𝑡</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S1.p3.3.m3.2c">H(x_{t}-x_{c})\leq H(x_{t})</annotation><annotation 
encoding="application/x-llamapun" id="S1.p3.3.m3.2d">italic_H ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) ≤ italic_H ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT )</annotation></semantics></math>.</p> </div> <div class="ltx_para" id="S1.p4"> <p class="ltx_p" id="S1.p4.1">Considering that the assumption <math alttext="H(x_{t}-x_{c})\leq H(x_{t})" class="ltx_Math" display="inline" id="S1.p4.1.m1.2"><semantics id="S1.p4.1.m1.2a"><mrow id="S1.p4.1.m1.2.2" xref="S1.p4.1.m1.2.2.cmml"><mrow id="S1.p4.1.m1.1.1.1" xref="S1.p4.1.m1.1.1.1.cmml"><mi id="S1.p4.1.m1.1.1.1.3" xref="S1.p4.1.m1.1.1.1.3.cmml">H</mi><mo id="S1.p4.1.m1.1.1.1.2" xref="S1.p4.1.m1.1.1.1.2.cmml">⁢</mo><mrow id="S1.p4.1.m1.1.1.1.1.1" xref="S1.p4.1.m1.1.1.1.1.1.1.cmml"><mo id="S1.p4.1.m1.1.1.1.1.1.2" stretchy="false" xref="S1.p4.1.m1.1.1.1.1.1.1.cmml">(</mo><mrow id="S1.p4.1.m1.1.1.1.1.1.1" xref="S1.p4.1.m1.1.1.1.1.1.1.cmml"><msub id="S1.p4.1.m1.1.1.1.1.1.1.2" xref="S1.p4.1.m1.1.1.1.1.1.1.2.cmml"><mi id="S1.p4.1.m1.1.1.1.1.1.1.2.2" xref="S1.p4.1.m1.1.1.1.1.1.1.2.2.cmml">x</mi><mi id="S1.p4.1.m1.1.1.1.1.1.1.2.3" xref="S1.p4.1.m1.1.1.1.1.1.1.2.3.cmml">t</mi></msub><mo id="S1.p4.1.m1.1.1.1.1.1.1.1" xref="S1.p4.1.m1.1.1.1.1.1.1.1.cmml">−</mo><msub id="S1.p4.1.m1.1.1.1.1.1.1.3" xref="S1.p4.1.m1.1.1.1.1.1.1.3.cmml"><mi id="S1.p4.1.m1.1.1.1.1.1.1.3.2" xref="S1.p4.1.m1.1.1.1.1.1.1.3.2.cmml">x</mi><mi id="S1.p4.1.m1.1.1.1.1.1.1.3.3" xref="S1.p4.1.m1.1.1.1.1.1.1.3.3.cmml">c</mi></msub></mrow><mo id="S1.p4.1.m1.1.1.1.1.1.3" stretchy="false" xref="S1.p4.1.m1.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S1.p4.1.m1.2.2.3" xref="S1.p4.1.m1.2.2.3.cmml">≤</mo><mrow id="S1.p4.1.m1.2.2.2" xref="S1.p4.1.m1.2.2.2.cmml"><mi id="S1.p4.1.m1.2.2.2.3" xref="S1.p4.1.m1.2.2.2.3.cmml">H</mi><mo id="S1.p4.1.m1.2.2.2.2" xref="S1.p4.1.m1.2.2.2.2.cmml">⁢</mo><mrow id="S1.p4.1.m1.2.2.2.1.1" xref="S1.p4.1.m1.2.2.2.1.1.1.cmml"><mo 
id="S1.p4.1.m1.2.2.2.1.1.2" stretchy="false" xref="S1.p4.1.m1.2.2.2.1.1.1.cmml">(</mo><msub id="S1.p4.1.m1.2.2.2.1.1.1" xref="S1.p4.1.m1.2.2.2.1.1.1.cmml"><mi id="S1.p4.1.m1.2.2.2.1.1.1.2" xref="S1.p4.1.m1.2.2.2.1.1.1.2.cmml">x</mi><mi id="S1.p4.1.m1.2.2.2.1.1.1.3" xref="S1.p4.1.m1.2.2.2.1.1.1.3.cmml">t</mi></msub><mo id="S1.p4.1.m1.2.2.2.1.1.3" stretchy="false" xref="S1.p4.1.m1.2.2.2.1.1.1.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S1.p4.1.m1.2b"><apply id="S1.p4.1.m1.2.2.cmml" xref="S1.p4.1.m1.2.2"><leq id="S1.p4.1.m1.2.2.3.cmml" xref="S1.p4.1.m1.2.2.3"></leq><apply id="S1.p4.1.m1.1.1.1.cmml" xref="S1.p4.1.m1.1.1.1"><times id="S1.p4.1.m1.1.1.1.2.cmml" xref="S1.p4.1.m1.1.1.1.2"></times><ci id="S1.p4.1.m1.1.1.1.3.cmml" xref="S1.p4.1.m1.1.1.1.3">𝐻</ci><apply id="S1.p4.1.m1.1.1.1.1.1.1.cmml" xref="S1.p4.1.m1.1.1.1.1.1"><minus id="S1.p4.1.m1.1.1.1.1.1.1.1.cmml" xref="S1.p4.1.m1.1.1.1.1.1.1.1"></minus><apply id="S1.p4.1.m1.1.1.1.1.1.1.2.cmml" xref="S1.p4.1.m1.1.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S1.p4.1.m1.1.1.1.1.1.1.2.1.cmml" xref="S1.p4.1.m1.1.1.1.1.1.1.2">subscript</csymbol><ci id="S1.p4.1.m1.1.1.1.1.1.1.2.2.cmml" xref="S1.p4.1.m1.1.1.1.1.1.1.2.2">𝑥</ci><ci id="S1.p4.1.m1.1.1.1.1.1.1.2.3.cmml" xref="S1.p4.1.m1.1.1.1.1.1.1.2.3">𝑡</ci></apply><apply id="S1.p4.1.m1.1.1.1.1.1.1.3.cmml" xref="S1.p4.1.m1.1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S1.p4.1.m1.1.1.1.1.1.1.3.1.cmml" xref="S1.p4.1.m1.1.1.1.1.1.1.3">subscript</csymbol><ci id="S1.p4.1.m1.1.1.1.1.1.1.3.2.cmml" xref="S1.p4.1.m1.1.1.1.1.1.1.3.2">𝑥</ci><ci id="S1.p4.1.m1.1.1.1.1.1.1.3.3.cmml" xref="S1.p4.1.m1.1.1.1.1.1.1.3.3">𝑐</ci></apply></apply></apply><apply id="S1.p4.1.m1.2.2.2.cmml" xref="S1.p4.1.m1.2.2.2"><times id="S1.p4.1.m1.2.2.2.2.cmml" xref="S1.p4.1.m1.2.2.2.2"></times><ci id="S1.p4.1.m1.2.2.2.3.cmml" xref="S1.p4.1.m1.2.2.2.3">𝐻</ci><apply id="S1.p4.1.m1.2.2.2.1.1.1.cmml" xref="S1.p4.1.m1.2.2.2.1.1"><csymbol cd="ambiguous" id="S1.p4.1.m1.2.2.2.1.1.1.1.cmml" 
xref="S1.p4.1.m1.2.2.2.1.1">subscript</csymbol><ci id="S1.p4.1.m1.2.2.2.1.1.1.2.cmml" xref="S1.p4.1.m1.2.2.2.1.1.1.2">𝑥</ci><ci id="S1.p4.1.m1.2.2.2.1.1.1.3.cmml" xref="S1.p4.1.m1.2.2.2.1.1.1.3">𝑡</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S1.p4.1.m1.2c">H(x_{t}-x_{c})\leq H(x_{t})</annotation><annotation encoding="application/x-llamapun" id="S1.p4.1.m1.2d">italic_H ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) ≤ italic_H ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT )</annotation></semantics></math> of conditional residual coding may be violated in regions with disocclusion or unreliable motion estimates, Chen <em class="ltx_emph ltx_font_italic" id="S1.p4.1.1">et al.</em> <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib5" title="">5</a>]</cite> propose masked conditional residual coding, which improves on conditional residual coding by introducing a pixel-wise soft mask that adaptively switches between conditional coding and conditional residual coding. Although this approach shows promising results, the trade-offs between coding performance and complexity among conditional coding, conditional residual coding, and masked conditional residual coding are not fully examined in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib5" title="">5</a>]</cite>. 
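</p>
<p class="ltx_p">Both the entropy assumption and the way a pixel-wise mask relaxes it can be illustrated numerically. The Python sketch below builds a synthetic 8-bit frame whose temporal predictor is accurate to within ±1 everywhere except in a hypothetical disoccluded region, and compares zeroth-order empirical entropies (histogram-based, ignoring spatial dependencies and any conditioning). For illustration we assume that the masked residue takes the form x<sub>t</sub> − m ⊙ x<sub>c</sub>, so that m = 1 recovers conditional residual coding and m = 0 falls back to conditional coding of x<sub>t</sub>; the mask here is binary and set by hand from the known disocclusion, rather than soft and predicted by a learned mask generator.</p>

```python
import numpy as np


def empirical_entropy(values):
    """Zeroth-order empirical entropy (bits/sample) of integer values."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


rng = np.random.default_rng(0)
h, w = 128, 128
x_t = rng.integers(0, 256, size=(h, w))  # synthetic 8-bit target frame

# Temporal predictor: accurate to within +/-1 everywhere ...
x_c = np.clip(x_t + rng.integers(-1, 2, size=(h, w)), 0, 255)
# ... except in a hypothetical disoccluded region (right quarter), where
# the warped reference carries unrelated content.
disoccluded = np.zeros((h, w), dtype=bool)
disoccluded[:, 3 * w // 4:] = True
x_c[disoccluded] = rng.integers(0, 256, size=int(disoccluded.sum()))

# Hand-set binary mask (assumption): 1 -> residual coding, 0 -> code x_t.
m = np.where(disoccluded, 0, 1)

residue = x_t - x_c
masked_residue = x_t - m * x_c

print("disoccluded region: H(x_t) = %.2f, H(x_t - x_c) = %.2f bits"
      % (empirical_entropy(x_t[disoccluded]),
         empirical_entropy(residue[disoccluded])))
print("whole frame: H(x_t) = %.2f, H(x_t - x_c) = %.2f, H(x_t - m*x_c) = %.2f bits"
      % (empirical_entropy(x_t), empirical_entropy(residue),
         empirical_entropy(masked_residue)))
```

<p class="ltx_p">In the disoccluded region the residue has a higher empirical entropy than the frame itself, so the assumption fails locally even though it holds comfortably over the whole frame; setting the mask to zero there brings the whole-frame entropy below that of the unmasked residue.</p>
<p class="ltx_p">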
Moreover, the experiments in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib5" title="">5</a>]</cite> are conducted with a Transformer-based video codec rather than the more commonly used CNN-based codecs.</p> </div> <div class="ltx_para" id="S1.p5"> <p class="ltx_p" id="S1.p5.1">In this work, we use DCVC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib1" title="">1</a>]</cite> – a typical CNN-based conditional video codec – as a common codebase to explore how the recently proposed conditional residual coding and its variants can strike a better balance among rate, distortion, and complexity. As shown in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S1.F1" title="Figure 1 ‣ I Introduction ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">1</span></a>, both conditional residual coding and masked conditional residual coding achieve better coding performance than conditional coding at lower complexity. 
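</p>
<p class="ltx_p">Fig. 1 reports coding performance as BD-rate. For reference, the Python sketch below implements the widely used Bjøntegaard-delta rate calculation: fit the logarithm of the rate as a cubic polynomial of PSNR for each codec, integrate both fits over the overlapping PSNR interval, and convert the average log-rate difference into a percentage. It is a generic version of the metric, not necessarily the exact evaluation script behind Section III, and the rate-PSNR points in the usage lines are made up for illustration.</p>

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard-delta rate (%) of `test` relative to `anchor`.

    Negative values mean the test codec needs fewer bits for the same
    quality (rate reduction); positive values mean rate inflation.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Cubic fit of log-rate as a function of PSNR for each codec.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Integrate both fits over the overlapping PSNR interval.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    int_a = np.polyint(fit_a)
    int_t = np.polyint(fit_t)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_t = np.polyval(int_t, hi) - np.polyval(int_t, lo)

    # Average log-rate difference -> relative rate difference in percent.
    avg_diff = (area_t - area_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0


if __name__ == "__main__":
    rates = [0.05, 0.1, 0.2, 0.4]     # made-up anchor rates (bpp)
    psnrs = [30.0, 32.0, 34.0, 36.0]  # made-up PSNR-RGB values (dB)
    # A test codec needing 10% fewer bits at every quality level.
    print(bd_rate(rates, psnrs, [0.9 * r for r in rates], psnrs))  # about -10.0
```

<p class="ltx_p">Fitting log-rate against PSNR, rather than rate directly, makes the result a relative (percentage) difference, which is why a uniform 10% rate saving in the usage lines yields a BD-rate of about −10%.</p>
<p class="ltx_p">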
The main contributions of this work include (1) extending DCVC from conditional coding to conditional residual coding and masked conditional residual coding, and (2) conducting extensive experiments to study the rate-distortion-complexity trade-offs of these coding schemes under a nearly identical framework.</p> </div> <figure class="ltx_figure" id="S1.F3"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="294" id="S1.F3.g1" src="extracted/5902691/Figure/architecture/MaskGenerator_v3.png" width="598"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 3: </span>Network architecture of the mask generator.</figcaption> </figure> </section> <section class="ltx_section" id="S2"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">II </span><span class="ltx_text ltx_font_smallcaps" id="S2.1.1">Proposed Method</span> </h2> <figure class="ltx_table" id="S2.T1"> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">TABLE I: </span>Training procedure. MENet, FE, and 3x3 Conv denote the motion estimation network, the feature extractor, and the 3×3 convolutional layer (used only for conditional residual coding and masked conditional residual coding) in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S1.F2" title="Figure 2 ‣ I Introduction ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">2</span></a>, respectively. 
The inter-frame codec in this table includes <math alttext="\{G^{enc},G^{dec}\}" class="ltx_Math" display="inline" id="S2.T1.6.m1.2"><semantics id="S2.T1.6.m1.2b"><mrow id="S2.T1.6.m1.2.2.2" xref="S2.T1.6.m1.2.2.3.cmml"><mo id="S2.T1.6.m1.2.2.2.3" stretchy="false" xref="S2.T1.6.m1.2.2.3.cmml">{</mo><msup id="S2.T1.6.m1.1.1.1.1" xref="S2.T1.6.m1.1.1.1.1.cmml"><mi id="S2.T1.6.m1.1.1.1.1.2" xref="S2.T1.6.m1.1.1.1.1.2.cmml">G</mi><mrow id="S2.T1.6.m1.1.1.1.1.3" xref="S2.T1.6.m1.1.1.1.1.3.cmml"><mi id="S2.T1.6.m1.1.1.1.1.3.2" xref="S2.T1.6.m1.1.1.1.1.3.2.cmml">e</mi><mo id="S2.T1.6.m1.1.1.1.1.3.1" xref="S2.T1.6.m1.1.1.1.1.3.1.cmml">⁢</mo><mi id="S2.T1.6.m1.1.1.1.1.3.3" xref="S2.T1.6.m1.1.1.1.1.3.3.cmml">n</mi><mo id="S2.T1.6.m1.1.1.1.1.3.1b" xref="S2.T1.6.m1.1.1.1.1.3.1.cmml">⁢</mo><mi id="S2.T1.6.m1.1.1.1.1.3.4" xref="S2.T1.6.m1.1.1.1.1.3.4.cmml">c</mi></mrow></msup><mo id="S2.T1.6.m1.2.2.2.4" xref="S2.T1.6.m1.2.2.3.cmml">,</mo><msup id="S2.T1.6.m1.2.2.2.2" xref="S2.T1.6.m1.2.2.2.2.cmml"><mi id="S2.T1.6.m1.2.2.2.2.2" xref="S2.T1.6.m1.2.2.2.2.2.cmml">G</mi><mrow id="S2.T1.6.m1.2.2.2.2.3" xref="S2.T1.6.m1.2.2.2.2.3.cmml"><mi id="S2.T1.6.m1.2.2.2.2.3.2" xref="S2.T1.6.m1.2.2.2.2.3.2.cmml">d</mi><mo id="S2.T1.6.m1.2.2.2.2.3.1" xref="S2.T1.6.m1.2.2.2.2.3.1.cmml">⁢</mo><mi id="S2.T1.6.m1.2.2.2.2.3.3" xref="S2.T1.6.m1.2.2.2.2.3.3.cmml">e</mi><mo id="S2.T1.6.m1.2.2.2.2.3.1b" xref="S2.T1.6.m1.2.2.2.2.3.1.cmml">⁢</mo><mi id="S2.T1.6.m1.2.2.2.2.3.4" xref="S2.T1.6.m1.2.2.2.2.3.4.cmml">c</mi></mrow></msup><mo id="S2.T1.6.m1.2.2.2.5" stretchy="false" xref="S2.T1.6.m1.2.2.3.cmml">}</mo></mrow><annotation-xml encoding="MathML-Content" id="S2.T1.6.m1.2c"><set id="S2.T1.6.m1.2.2.3.cmml" xref="S2.T1.6.m1.2.2.2"><apply id="S2.T1.6.m1.1.1.1.1.cmml" xref="S2.T1.6.m1.1.1.1.1"><csymbol cd="ambiguous" id="S2.T1.6.m1.1.1.1.1.1.cmml" xref="S2.T1.6.m1.1.1.1.1">superscript</csymbol><ci id="S2.T1.6.m1.1.1.1.1.2.cmml" xref="S2.T1.6.m1.1.1.1.1.2">𝐺</ci><apply id="S2.T1.6.m1.1.1.1.1.3.cmml" 
xref="S2.T1.6.m1.1.1.1.1.3"><times id="S2.T1.6.m1.1.1.1.1.3.1.cmml" xref="S2.T1.6.m1.1.1.1.1.3.1"></times><ci id="S2.T1.6.m1.1.1.1.1.3.2.cmml" xref="S2.T1.6.m1.1.1.1.1.3.2">𝑒</ci><ci id="S2.T1.6.m1.1.1.1.1.3.3.cmml" xref="S2.T1.6.m1.1.1.1.1.3.3">𝑛</ci><ci id="S2.T1.6.m1.1.1.1.1.3.4.cmml" xref="S2.T1.6.m1.1.1.1.1.3.4">𝑐</ci></apply></apply><apply id="S2.T1.6.m1.2.2.2.2.cmml" xref="S2.T1.6.m1.2.2.2.2"><csymbol cd="ambiguous" id="S2.T1.6.m1.2.2.2.2.1.cmml" xref="S2.T1.6.m1.2.2.2.2">superscript</csymbol><ci id="S2.T1.6.m1.2.2.2.2.2.cmml" xref="S2.T1.6.m1.2.2.2.2.2">𝐺</ci><apply id="S2.T1.6.m1.2.2.2.2.3.cmml" xref="S2.T1.6.m1.2.2.2.2.3"><times id="S2.T1.6.m1.2.2.2.2.3.1.cmml" xref="S2.T1.6.m1.2.2.2.2.3.1"></times><ci id="S2.T1.6.m1.2.2.2.2.3.2.cmml" xref="S2.T1.6.m1.2.2.2.2.3.2">𝑑</ci><ci id="S2.T1.6.m1.2.2.2.2.3.3.cmml" xref="S2.T1.6.m1.2.2.2.2.3.3">𝑒</ci><ci id="S2.T1.6.m1.2.2.2.2.3.4.cmml" xref="S2.T1.6.m1.2.2.2.2.3.4">𝑐</ci></apply></apply></set></annotation-xml><annotation encoding="application/x-tex" id="S2.T1.6.m1.2d">\{G^{enc},G^{dec}\}</annotation><annotation encoding="application/x-llamapun" id="S2.T1.6.m1.2e">{ italic_G start_POSTSUPERSCRIPT italic_e italic_n italic_c end_POSTSUPERSCRIPT , italic_G start_POSTSUPERSCRIPT italic_d italic_e italic_c end_POSTSUPERSCRIPT }</annotation></semantics></math>, the refinement network, and the frame generator in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S1.F2" title="Figure 2 ‣ I Introduction ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">2</span></a>. EPA denotes the error-propagation-aware training of <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib15" title="">15</a>]</cite>. 
<math alttext="R^{motion}_{t}" class="ltx_Math" display="inline" id="S2.T1.7.m2.1"><semantics id="S2.T1.7.m2.1b"><msubsup id="S2.T1.7.m2.1.1" xref="S2.T1.7.m2.1.1.cmml"><mi id="S2.T1.7.m2.1.1.2.2" xref="S2.T1.7.m2.1.1.2.2.cmml">R</mi><mi id="S2.T1.7.m2.1.1.3" xref="S2.T1.7.m2.1.1.3.cmml">t</mi><mrow id="S2.T1.7.m2.1.1.2.3" xref="S2.T1.7.m2.1.1.2.3.cmml"><mi id="S2.T1.7.m2.1.1.2.3.2" xref="S2.T1.7.m2.1.1.2.3.2.cmml">m</mi><mo id="S2.T1.7.m2.1.1.2.3.1" xref="S2.T1.7.m2.1.1.2.3.1.cmml">⁢</mo><mi id="S2.T1.7.m2.1.1.2.3.3" xref="S2.T1.7.m2.1.1.2.3.3.cmml">o</mi><mo id="S2.T1.7.m2.1.1.2.3.1b" xref="S2.T1.7.m2.1.1.2.3.1.cmml">⁢</mo><mi id="S2.T1.7.m2.1.1.2.3.4" xref="S2.T1.7.m2.1.1.2.3.4.cmml">t</mi><mo id="S2.T1.7.m2.1.1.2.3.1c" xref="S2.T1.7.m2.1.1.2.3.1.cmml">⁢</mo><mi id="S2.T1.7.m2.1.1.2.3.5" xref="S2.T1.7.m2.1.1.2.3.5.cmml">i</mi><mo id="S2.T1.7.m2.1.1.2.3.1d" xref="S2.T1.7.m2.1.1.2.3.1.cmml">⁢</mo><mi id="S2.T1.7.m2.1.1.2.3.6" xref="S2.T1.7.m2.1.1.2.3.6.cmml">o</mi><mo id="S2.T1.7.m2.1.1.2.3.1e" xref="S2.T1.7.m2.1.1.2.3.1.cmml">⁢</mo><mi id="S2.T1.7.m2.1.1.2.3.7" xref="S2.T1.7.m2.1.1.2.3.7.cmml">n</mi></mrow></msubsup><annotation-xml encoding="MathML-Content" id="S2.T1.7.m2.1c"><apply id="S2.T1.7.m2.1.1.cmml" xref="S2.T1.7.m2.1.1"><csymbol cd="ambiguous" id="S2.T1.7.m2.1.1.1.cmml" xref="S2.T1.7.m2.1.1">subscript</csymbol><apply id="S2.T1.7.m2.1.1.2.cmml" xref="S2.T1.7.m2.1.1"><csymbol cd="ambiguous" id="S2.T1.7.m2.1.1.2.1.cmml" xref="S2.T1.7.m2.1.1">superscript</csymbol><ci id="S2.T1.7.m2.1.1.2.2.cmml" xref="S2.T1.7.m2.1.1.2.2">𝑅</ci><apply id="S2.T1.7.m2.1.1.2.3.cmml" xref="S2.T1.7.m2.1.1.2.3"><times id="S2.T1.7.m2.1.1.2.3.1.cmml" xref="S2.T1.7.m2.1.1.2.3.1"></times><ci id="S2.T1.7.m2.1.1.2.3.2.cmml" xref="S2.T1.7.m2.1.1.2.3.2">𝑚</ci><ci id="S2.T1.7.m2.1.1.2.3.3.cmml" xref="S2.T1.7.m2.1.1.2.3.3">𝑜</ci><ci id="S2.T1.7.m2.1.1.2.3.4.cmml" xref="S2.T1.7.m2.1.1.2.3.4">𝑡</ci><ci id="S2.T1.7.m2.1.1.2.3.5.cmml" xref="S2.T1.7.m2.1.1.2.3.5">𝑖</ci><ci 
id="S2.T1.7.m2.1.1.2.3.6.cmml" xref="S2.T1.7.m2.1.1.2.3.6">𝑜</ci><ci id="S2.T1.7.m2.1.1.2.3.7.cmml" xref="S2.T1.7.m2.1.1.2.3.7">𝑛</ci></apply></apply><ci id="S2.T1.7.m2.1.1.3.cmml" xref="S2.T1.7.m2.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.T1.7.m2.1d">R^{motion}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.T1.7.m2.1e">italic_R start_POSTSUPERSCRIPT italic_m italic_o italic_t italic_i italic_o italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> and <math alttext="R_{t}" class="ltx_Math" display="inline" id="S2.T1.8.m3.1"><semantics id="S2.T1.8.m3.1b"><msub id="S2.T1.8.m3.1.1" xref="S2.T1.8.m3.1.1.cmml"><mi id="S2.T1.8.m3.1.1.2" xref="S2.T1.8.m3.1.1.2.cmml">R</mi><mi id="S2.T1.8.m3.1.1.3" xref="S2.T1.8.m3.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.T1.8.m3.1c"><apply id="S2.T1.8.m3.1.1.cmml" xref="S2.T1.8.m3.1.1"><csymbol cd="ambiguous" id="S2.T1.8.m3.1.1.1.cmml" xref="S2.T1.8.m3.1.1">subscript</csymbol><ci id="S2.T1.8.m3.1.1.2.cmml" xref="S2.T1.8.m3.1.1.2">𝑅</ci><ci id="S2.T1.8.m3.1.1.3.cmml" xref="S2.T1.8.m3.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.T1.8.m3.1d">R_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.T1.8.m3.1e">italic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> denote the estimated motion bit rate and total bit rate, respectively. 
<math alttext="D" class="ltx_Math" display="inline" id="S2.T1.9.m4.1"><semantics id="S2.T1.9.m4.1b"><mi id="S2.T1.9.m4.1.1" xref="S2.T1.9.m4.1.1.cmml">D</mi><annotation-xml encoding="MathML-Content" id="S2.T1.9.m4.1c"><ci id="S2.T1.9.m4.1.1.cmml" xref="S2.T1.9.m4.1.1">𝐷</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.T1.9.m4.1d">D</annotation><annotation encoding="application/x-llamapun" id="S2.T1.9.m4.1e">italic_D</annotation></semantics></math> measures the mean-squared error between its two arguments. <math alttext="\lambda" class="ltx_Math" display="inline" id="S2.T1.10.m5.1"><semantics id="S2.T1.10.m5.1b"><mi id="S2.T1.10.m5.1.1" xref="S2.T1.10.m5.1.1.cmml">λ</mi><annotation-xml encoding="MathML-Content" id="S2.T1.10.m5.1c"><ci id="S2.T1.10.m5.1.1.cmml" xref="S2.T1.10.m5.1.1">𝜆</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.T1.10.m5.1d">\lambda</annotation><annotation encoding="application/x-llamapun" id="S2.T1.10.m5.1e">italic_λ</annotation></semantics></math> is a hyperparameter that controls the trade-off between rate and distortion.</figcaption> <table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle" id="S2.T1.17"> <thead class="ltx_thead"> <tr class="ltx_tr" id="S2.T1.17.8.1"> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S2.T1.17.8.1.1" rowspan="2" style="padding-left:2.8pt;padding-right:2.8pt;"><span class="ltx_text" id="S2.T1.17.8.1.1.1">Phase</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S2.T1.17.8.1.2" rowspan="2" style="padding-left:2.8pt;padding-right:2.8pt;"><span class="ltx_text" id="S2.T1.17.8.1.2.1">Step</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" colspan="5" id="S2.T1.17.8.1.3" style="padding-left:2.8pt;padding-right:2.8pt;">Training Modules</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S2.T1.17.8.1.4" rowspan="2" style="padding-left:2.8pt;padding-right:2.8pt;"><span class="ltx_text" 
id="S2.T1.17.8.1.4.1">Training Objective</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S2.T1.17.8.1.5" rowspan="2" style="padding-left:2.8pt;padding-right:2.8pt;"><span class="ltx_text" id="S2.T1.17.8.1.5.1"> <span class="ltx_tabular ltx_align_middle" id="S2.T1.17.8.1.5.1.1"> <span class="ltx_tr" id="S2.T1.17.8.1.5.1.1.1"> <span class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.17.8.1.5.1.1.1.1" style="padding-left:2.8pt;padding-right:2.8pt;"># of</span></span> <span class="ltx_tr" id="S2.T1.17.8.1.5.1.1.2"> <span class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.17.8.1.5.1.1.2.1" style="padding-left:2.8pt;padding-right:2.8pt;">Frames</span></span> </span></span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S2.T1.17.8.1.6" rowspan="2" style="padding-left:2.8pt;padding-right:2.8pt;"><span class="ltx_text" id="S2.T1.17.8.1.6.1">Epoch</span></th> </tr> <tr class="ltx_tr" id="S2.T1.17.9.2"> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S2.T1.17.9.2.1" style="padding-left:2.8pt;padding-right:2.8pt;">Inter-frame Codec</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S2.T1.17.9.2.2" style="padding-left:2.8pt;padding-right:2.8pt;">MENet</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S2.T1.17.9.2.3" style="padding-left:2.8pt;padding-right:2.8pt;">Motion Codec</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S2.T1.17.9.2.4" style="padding-left:2.8pt;padding-right:2.8pt;">FE &amp; 3x3 Conv</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S2.T1.17.9.2.5" style="padding-left:2.8pt;padding-right:2.8pt;">Mask Generator</th> </tr> </thead> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S2.T1.11.1"> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.11.1.2" style="padding-left:2.8pt;padding-right:2.8pt;">Motion Compensation</td> <td 
class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.11.1.3" style="padding-left:2.8pt;padding-right:2.8pt;">1</td> <td class="ltx_td ltx_border_t" id="S2.T1.11.1.4" style="padding-left:2.8pt;padding-right:2.8pt;"></td> <td class="ltx_td ltx_border_t" id="S2.T1.11.1.5" style="padding-left:2.8pt;padding-right:2.8pt;"></td> <td class="ltx_td ltx_border_t" id="S2.T1.11.1.6" style="padding-left:2.8pt;padding-right:2.8pt;"></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.11.1.7" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_border_t" id="S2.T1.11.1.8" style="padding-left:2.8pt;padding-right:2.8pt;"></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.11.1.1" style="padding-left:2.8pt;padding-right:2.8pt;"><math alttext="R^{motion}_{t}+\lambda\times D(x_{t},\ddot{x}_{c})" class="ltx_Math" display="inline" id="S2.T1.11.1.1.m1.2"><semantics id="S2.T1.11.1.1.m1.2a"><mrow id="S2.T1.11.1.1.m1.2.2" xref="S2.T1.11.1.1.m1.2.2.cmml"><msubsup id="S2.T1.11.1.1.m1.2.2.4" xref="S2.T1.11.1.1.m1.2.2.4.cmml"><mi id="S2.T1.11.1.1.m1.2.2.4.2.2" xref="S2.T1.11.1.1.m1.2.2.4.2.2.cmml">R</mi><mi id="S2.T1.11.1.1.m1.2.2.4.3" xref="S2.T1.11.1.1.m1.2.2.4.3.cmml">t</mi><mrow id="S2.T1.11.1.1.m1.2.2.4.2.3" xref="S2.T1.11.1.1.m1.2.2.4.2.3.cmml"><mi id="S2.T1.11.1.1.m1.2.2.4.2.3.2" xref="S2.T1.11.1.1.m1.2.2.4.2.3.2.cmml">m</mi><mo id="S2.T1.11.1.1.m1.2.2.4.2.3.1" xref="S2.T1.11.1.1.m1.2.2.4.2.3.1.cmml">⁢</mo><mi id="S2.T1.11.1.1.m1.2.2.4.2.3.3" xref="S2.T1.11.1.1.m1.2.2.4.2.3.3.cmml">o</mi><mo id="S2.T1.11.1.1.m1.2.2.4.2.3.1a" xref="S2.T1.11.1.1.m1.2.2.4.2.3.1.cmml">⁢</mo><mi id="S2.T1.11.1.1.m1.2.2.4.2.3.4" xref="S2.T1.11.1.1.m1.2.2.4.2.3.4.cmml">t</mi><mo id="S2.T1.11.1.1.m1.2.2.4.2.3.1b" xref="S2.T1.11.1.1.m1.2.2.4.2.3.1.cmml">⁢</mo><mi id="S2.T1.11.1.1.m1.2.2.4.2.3.5" xref="S2.T1.11.1.1.m1.2.2.4.2.3.5.cmml">i</mi><mo id="S2.T1.11.1.1.m1.2.2.4.2.3.1c" xref="S2.T1.11.1.1.m1.2.2.4.2.3.1.cmml">⁢</mo><mi id="S2.T1.11.1.1.m1.2.2.4.2.3.6" 
xref="S2.T1.11.1.1.m1.2.2.4.2.3.6.cmml">o</mi><mo id="S2.T1.11.1.1.m1.2.2.4.2.3.1d" xref="S2.T1.11.1.1.m1.2.2.4.2.3.1.cmml">⁢</mo><mi id="S2.T1.11.1.1.m1.2.2.4.2.3.7" xref="S2.T1.11.1.1.m1.2.2.4.2.3.7.cmml">n</mi></mrow></msubsup><mo id="S2.T1.11.1.1.m1.2.2.3" xref="S2.T1.11.1.1.m1.2.2.3.cmml">+</mo><mrow id="S2.T1.11.1.1.m1.2.2.2" xref="S2.T1.11.1.1.m1.2.2.2.cmml"><mrow id="S2.T1.11.1.1.m1.2.2.2.4" xref="S2.T1.11.1.1.m1.2.2.2.4.cmml"><mi id="S2.T1.11.1.1.m1.2.2.2.4.2" xref="S2.T1.11.1.1.m1.2.2.2.4.2.cmml">λ</mi><mo id="S2.T1.11.1.1.m1.2.2.2.4.1" lspace="0.222em" rspace="0.222em" xref="S2.T1.11.1.1.m1.2.2.2.4.1.cmml">×</mo><mi id="S2.T1.11.1.1.m1.2.2.2.4.3" xref="S2.T1.11.1.1.m1.2.2.2.4.3.cmml">D</mi></mrow><mo id="S2.T1.11.1.1.m1.2.2.2.3" xref="S2.T1.11.1.1.m1.2.2.2.3.cmml">⁢</mo><mrow id="S2.T1.11.1.1.m1.2.2.2.2.2" xref="S2.T1.11.1.1.m1.2.2.2.2.3.cmml"><mo id="S2.T1.11.1.1.m1.2.2.2.2.2.3" stretchy="false" xref="S2.T1.11.1.1.m1.2.2.2.2.3.cmml">(</mo><msub id="S2.T1.11.1.1.m1.1.1.1.1.1.1" xref="S2.T1.11.1.1.m1.1.1.1.1.1.1.cmml"><mi id="S2.T1.11.1.1.m1.1.1.1.1.1.1.2" xref="S2.T1.11.1.1.m1.1.1.1.1.1.1.2.cmml">x</mi><mi id="S2.T1.11.1.1.m1.1.1.1.1.1.1.3" xref="S2.T1.11.1.1.m1.1.1.1.1.1.1.3.cmml">t</mi></msub><mo id="S2.T1.11.1.1.m1.2.2.2.2.2.4" xref="S2.T1.11.1.1.m1.2.2.2.2.3.cmml">,</mo><msub id="S2.T1.11.1.1.m1.2.2.2.2.2.2" xref="S2.T1.11.1.1.m1.2.2.2.2.2.2.cmml"><mover accent="true" id="S2.T1.11.1.1.m1.2.2.2.2.2.2.2" xref="S2.T1.11.1.1.m1.2.2.2.2.2.2.2.cmml"><mi id="S2.T1.11.1.1.m1.2.2.2.2.2.2.2.2" xref="S2.T1.11.1.1.m1.2.2.2.2.2.2.2.2.cmml">x</mi><mo id="S2.T1.11.1.1.m1.2.2.2.2.2.2.2.1" xref="S2.T1.11.1.1.m1.2.2.2.2.2.2.2.1.cmml">¨</mo></mover><mi id="S2.T1.11.1.1.m1.2.2.2.2.2.2.3" xref="S2.T1.11.1.1.m1.2.2.2.2.2.2.3.cmml">c</mi></msub><mo id="S2.T1.11.1.1.m1.2.2.2.2.2.5" stretchy="false" xref="S2.T1.11.1.1.m1.2.2.2.2.3.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.T1.11.1.1.m1.2b"><apply id="S2.T1.11.1.1.m1.2.2.cmml" 
xref="S2.T1.11.1.1.m1.2.2"><plus id="S2.T1.11.1.1.m1.2.2.3.cmml" xref="S2.T1.11.1.1.m1.2.2.3"></plus><apply id="S2.T1.11.1.1.m1.2.2.4.cmml" xref="S2.T1.11.1.1.m1.2.2.4"><csymbol cd="ambiguous" id="S2.T1.11.1.1.m1.2.2.4.1.cmml" xref="S2.T1.11.1.1.m1.2.2.4">subscript</csymbol><apply id="S2.T1.11.1.1.m1.2.2.4.2.cmml" xref="S2.T1.11.1.1.m1.2.2.4"><csymbol cd="ambiguous" id="S2.T1.11.1.1.m1.2.2.4.2.1.cmml" xref="S2.T1.11.1.1.m1.2.2.4">superscript</csymbol><ci id="S2.T1.11.1.1.m1.2.2.4.2.2.cmml" xref="S2.T1.11.1.1.m1.2.2.4.2.2">𝑅</ci><apply id="S2.T1.11.1.1.m1.2.2.4.2.3.cmml" xref="S2.T1.11.1.1.m1.2.2.4.2.3"><times id="S2.T1.11.1.1.m1.2.2.4.2.3.1.cmml" xref="S2.T1.11.1.1.m1.2.2.4.2.3.1"></times><ci id="S2.T1.11.1.1.m1.2.2.4.2.3.2.cmml" xref="S2.T1.11.1.1.m1.2.2.4.2.3.2">𝑚</ci><ci id="S2.T1.11.1.1.m1.2.2.4.2.3.3.cmml" xref="S2.T1.11.1.1.m1.2.2.4.2.3.3">𝑜</ci><ci id="S2.T1.11.1.1.m1.2.2.4.2.3.4.cmml" xref="S2.T1.11.1.1.m1.2.2.4.2.3.4">𝑡</ci><ci id="S2.T1.11.1.1.m1.2.2.4.2.3.5.cmml" xref="S2.T1.11.1.1.m1.2.2.4.2.3.5">𝑖</ci><ci id="S2.T1.11.1.1.m1.2.2.4.2.3.6.cmml" xref="S2.T1.11.1.1.m1.2.2.4.2.3.6">𝑜</ci><ci id="S2.T1.11.1.1.m1.2.2.4.2.3.7.cmml" xref="S2.T1.11.1.1.m1.2.2.4.2.3.7">𝑛</ci></apply></apply><ci id="S2.T1.11.1.1.m1.2.2.4.3.cmml" xref="S2.T1.11.1.1.m1.2.2.4.3">𝑡</ci></apply><apply id="S2.T1.11.1.1.m1.2.2.2.cmml" xref="S2.T1.11.1.1.m1.2.2.2"><times id="S2.T1.11.1.1.m1.2.2.2.3.cmml" xref="S2.T1.11.1.1.m1.2.2.2.3"></times><apply id="S2.T1.11.1.1.m1.2.2.2.4.cmml" xref="S2.T1.11.1.1.m1.2.2.2.4"><times id="S2.T1.11.1.1.m1.2.2.2.4.1.cmml" xref="S2.T1.11.1.1.m1.2.2.2.4.1"></times><ci id="S2.T1.11.1.1.m1.2.2.2.4.2.cmml" xref="S2.T1.11.1.1.m1.2.2.2.4.2">𝜆</ci><ci id="S2.T1.11.1.1.m1.2.2.2.4.3.cmml" xref="S2.T1.11.1.1.m1.2.2.2.4.3">𝐷</ci></apply><interval closure="open" id="S2.T1.11.1.1.m1.2.2.2.2.3.cmml" xref="S2.T1.11.1.1.m1.2.2.2.2.2"><apply id="S2.T1.11.1.1.m1.1.1.1.1.1.1.cmml" xref="S2.T1.11.1.1.m1.1.1.1.1.1.1"><csymbol cd="ambiguous" 
id="S2.T1.11.1.1.m1.1.1.1.1.1.1.1.cmml" xref="S2.T1.11.1.1.m1.1.1.1.1.1.1">subscript</csymbol><ci id="S2.T1.11.1.1.m1.1.1.1.1.1.1.2.cmml" xref="S2.T1.11.1.1.m1.1.1.1.1.1.1.2">𝑥</ci><ci id="S2.T1.11.1.1.m1.1.1.1.1.1.1.3.cmml" xref="S2.T1.11.1.1.m1.1.1.1.1.1.1.3">𝑡</ci></apply><apply id="S2.T1.11.1.1.m1.2.2.2.2.2.2.cmml" xref="S2.T1.11.1.1.m1.2.2.2.2.2.2"><csymbol cd="ambiguous" id="S2.T1.11.1.1.m1.2.2.2.2.2.2.1.cmml" xref="S2.T1.11.1.1.m1.2.2.2.2.2.2">subscript</csymbol><apply id="S2.T1.11.1.1.m1.2.2.2.2.2.2.2.cmml" xref="S2.T1.11.1.1.m1.2.2.2.2.2.2.2"><ci id="S2.T1.11.1.1.m1.2.2.2.2.2.2.2.1.cmml" xref="S2.T1.11.1.1.m1.2.2.2.2.2.2.2.1">¨</ci><ci id="S2.T1.11.1.1.m1.2.2.2.2.2.2.2.2.cmml" xref="S2.T1.11.1.1.m1.2.2.2.2.2.2.2.2">𝑥</ci></apply><ci id="S2.T1.11.1.1.m1.2.2.2.2.2.2.3.cmml" xref="S2.T1.11.1.1.m1.2.2.2.2.2.2.3">𝑐</ci></apply></interval></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.T1.11.1.1.m1.2c">R^{motion}_{t}+\lambda\times D(x_{t},\ddot{x}_{c})</annotation><annotation encoding="application/x-llamapun" id="S2.T1.11.1.1.m1.2d">italic_R start_POSTSUPERSCRIPT italic_m italic_o italic_t italic_i italic_o italic_n end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_λ × italic_D ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over¨ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT )</annotation></semantics></math></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.11.1.9" style="padding-left:2.8pt;padding-right:2.8pt;">3</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.11.1.10" style="padding-left:2.8pt;padding-right:2.8pt;">2</td> </tr> <tr class="ltx_tr" id="S2.T1.12.2"> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.12.2.2" rowspan="2" style="padding-left:2.8pt;padding-right:2.8pt;"><span class="ltx_text" id="S2.T1.12.2.2.1">Inter-frame Coding</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.12.2.3" 
style="padding-left:2.8pt;padding-right:2.8pt;">2</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.12.2.4" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_border_t" id="S2.T1.12.2.5" style="padding-left:2.8pt;padding-right:2.8pt;"></td> <td class="ltx_td ltx_border_t" id="S2.T1.12.2.6" style="padding-left:2.8pt;padding-right:2.8pt;"></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.12.2.7" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.12.2.8" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.12.2.1" style="padding-left:2.8pt;padding-right:2.8pt;"><math alttext="R_{t}+\lambda\times D(x_{t},\hat{x}_{t})" class="ltx_Math" display="inline" id="S2.T1.12.2.1.m1.2"><semantics id="S2.T1.12.2.1.m1.2a"><mrow id="S2.T1.12.2.1.m1.2.2" xref="S2.T1.12.2.1.m1.2.2.cmml"><msub id="S2.T1.12.2.1.m1.2.2.4" xref="S2.T1.12.2.1.m1.2.2.4.cmml"><mi id="S2.T1.12.2.1.m1.2.2.4.2" xref="S2.T1.12.2.1.m1.2.2.4.2.cmml">R</mi><mi id="S2.T1.12.2.1.m1.2.2.4.3" xref="S2.T1.12.2.1.m1.2.2.4.3.cmml">t</mi></msub><mo id="S2.T1.12.2.1.m1.2.2.3" xref="S2.T1.12.2.1.m1.2.2.3.cmml">+</mo><mrow id="S2.T1.12.2.1.m1.2.2.2" xref="S2.T1.12.2.1.m1.2.2.2.cmml"><mrow id="S2.T1.12.2.1.m1.2.2.2.4" xref="S2.T1.12.2.1.m1.2.2.2.4.cmml"><mi id="S2.T1.12.2.1.m1.2.2.2.4.2" xref="S2.T1.12.2.1.m1.2.2.2.4.2.cmml">λ</mi><mo id="S2.T1.12.2.1.m1.2.2.2.4.1" lspace="0.222em" rspace="0.222em" xref="S2.T1.12.2.1.m1.2.2.2.4.1.cmml">×</mo><mi id="S2.T1.12.2.1.m1.2.2.2.4.3" xref="S2.T1.12.2.1.m1.2.2.2.4.3.cmml">D</mi></mrow><mo id="S2.T1.12.2.1.m1.2.2.2.3" xref="S2.T1.12.2.1.m1.2.2.2.3.cmml">⁢</mo><mrow id="S2.T1.12.2.1.m1.2.2.2.2.2" xref="S2.T1.12.2.1.m1.2.2.2.2.3.cmml"><mo id="S2.T1.12.2.1.m1.2.2.2.2.2.3" stretchy="false" xref="S2.T1.12.2.1.m1.2.2.2.2.3.cmml">(</mo><msub id="S2.T1.12.2.1.m1.1.1.1.1.1.1" xref="S2.T1.12.2.1.m1.1.1.1.1.1.1.cmml"><mi 
id="S2.T1.12.2.1.m1.1.1.1.1.1.1.2" xref="S2.T1.12.2.1.m1.1.1.1.1.1.1.2.cmml">x</mi><mi id="S2.T1.12.2.1.m1.1.1.1.1.1.1.3" xref="S2.T1.12.2.1.m1.1.1.1.1.1.1.3.cmml">t</mi></msub><mo id="S2.T1.12.2.1.m1.2.2.2.2.2.4" xref="S2.T1.12.2.1.m1.2.2.2.2.3.cmml">,</mo><msub id="S2.T1.12.2.1.m1.2.2.2.2.2.2" xref="S2.T1.12.2.1.m1.2.2.2.2.2.2.cmml"><mover accent="true" id="S2.T1.12.2.1.m1.2.2.2.2.2.2.2" xref="S2.T1.12.2.1.m1.2.2.2.2.2.2.2.cmml"><mi id="S2.T1.12.2.1.m1.2.2.2.2.2.2.2.2" xref="S2.T1.12.2.1.m1.2.2.2.2.2.2.2.2.cmml">x</mi><mo id="S2.T1.12.2.1.m1.2.2.2.2.2.2.2.1" xref="S2.T1.12.2.1.m1.2.2.2.2.2.2.2.1.cmml">^</mo></mover><mi id="S2.T1.12.2.1.m1.2.2.2.2.2.2.3" xref="S2.T1.12.2.1.m1.2.2.2.2.2.2.3.cmml">t</mi></msub><mo id="S2.T1.12.2.1.m1.2.2.2.2.2.5" stretchy="false" xref="S2.T1.12.2.1.m1.2.2.2.2.3.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.T1.12.2.1.m1.2b"><apply id="S2.T1.12.2.1.m1.2.2.cmml" xref="S2.T1.12.2.1.m1.2.2"><plus id="S2.T1.12.2.1.m1.2.2.3.cmml" xref="S2.T1.12.2.1.m1.2.2.3"></plus><apply id="S2.T1.12.2.1.m1.2.2.4.cmml" xref="S2.T1.12.2.1.m1.2.2.4"><csymbol cd="ambiguous" id="S2.T1.12.2.1.m1.2.2.4.1.cmml" xref="S2.T1.12.2.1.m1.2.2.4">subscript</csymbol><ci id="S2.T1.12.2.1.m1.2.2.4.2.cmml" xref="S2.T1.12.2.1.m1.2.2.4.2">𝑅</ci><ci id="S2.T1.12.2.1.m1.2.2.4.3.cmml" xref="S2.T1.12.2.1.m1.2.2.4.3">𝑡</ci></apply><apply id="S2.T1.12.2.1.m1.2.2.2.cmml" xref="S2.T1.12.2.1.m1.2.2.2"><times id="S2.T1.12.2.1.m1.2.2.2.3.cmml" xref="S2.T1.12.2.1.m1.2.2.2.3"></times><apply id="S2.T1.12.2.1.m1.2.2.2.4.cmml" xref="S2.T1.12.2.1.m1.2.2.2.4"><times id="S2.T1.12.2.1.m1.2.2.2.4.1.cmml" xref="S2.T1.12.2.1.m1.2.2.2.4.1"></times><ci id="S2.T1.12.2.1.m1.2.2.2.4.2.cmml" xref="S2.T1.12.2.1.m1.2.2.2.4.2">𝜆</ci><ci id="S2.T1.12.2.1.m1.2.2.2.4.3.cmml" xref="S2.T1.12.2.1.m1.2.2.2.4.3">𝐷</ci></apply><interval closure="open" id="S2.T1.12.2.1.m1.2.2.2.2.3.cmml" xref="S2.T1.12.2.1.m1.2.2.2.2.2"><apply id="S2.T1.12.2.1.m1.1.1.1.1.1.1.cmml" 
xref="S2.T1.12.2.1.m1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.T1.12.2.1.m1.1.1.1.1.1.1.1.cmml" xref="S2.T1.12.2.1.m1.1.1.1.1.1.1">subscript</csymbol><ci id="S2.T1.12.2.1.m1.1.1.1.1.1.1.2.cmml" xref="S2.T1.12.2.1.m1.1.1.1.1.1.1.2">𝑥</ci><ci id="S2.T1.12.2.1.m1.1.1.1.1.1.1.3.cmml" xref="S2.T1.12.2.1.m1.1.1.1.1.1.1.3">𝑡</ci></apply><apply id="S2.T1.12.2.1.m1.2.2.2.2.2.2.cmml" xref="S2.T1.12.2.1.m1.2.2.2.2.2.2"><csymbol cd="ambiguous" id="S2.T1.12.2.1.m1.2.2.2.2.2.2.1.cmml" xref="S2.T1.12.2.1.m1.2.2.2.2.2.2">subscript</csymbol><apply id="S2.T1.12.2.1.m1.2.2.2.2.2.2.2.cmml" xref="S2.T1.12.2.1.m1.2.2.2.2.2.2.2"><ci id="S2.T1.12.2.1.m1.2.2.2.2.2.2.2.1.cmml" xref="S2.T1.12.2.1.m1.2.2.2.2.2.2.2.1">^</ci><ci id="S2.T1.12.2.1.m1.2.2.2.2.2.2.2.2.cmml" xref="S2.T1.12.2.1.m1.2.2.2.2.2.2.2.2">𝑥</ci></apply><ci id="S2.T1.12.2.1.m1.2.2.2.2.2.2.3.cmml" xref="S2.T1.12.2.1.m1.2.2.2.2.2.2.3">𝑡</ci></apply></interval></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.T1.12.2.1.m1.2c">R_{t}+\lambda\times D(x_{t},\hat{x}_{t})</annotation><annotation encoding="application/x-llamapun" id="S2.T1.12.2.1.m1.2d">italic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_λ × italic_D ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT )</annotation></semantics></math></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.12.2.9" style="padding-left:2.8pt;padding-right:2.8pt;">3</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.12.2.10" style="padding-left:2.8pt;padding-right:2.8pt;">4</td> </tr> <tr class="ltx_tr" id="S2.T1.13.3"> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.13.3.2" style="padding-left:2.8pt;padding-right:2.8pt;">3</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.13.3.3" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_border_t" id="S2.T1.13.3.4" 
style="padding-left:2.8pt;padding-right:2.8pt;"></td> <td class="ltx_td ltx_border_t" id="S2.T1.13.3.5" style="padding-left:2.8pt;padding-right:2.8pt;"></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.13.3.6" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.13.3.7" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.13.3.1" style="padding-left:2.8pt;padding-right:2.8pt;"><math alttext="R_{t}+\lambda\times D(x_{t},\hat{x}_{t})" class="ltx_Math" display="inline" id="S2.T1.13.3.1.m1.2"><semantics id="S2.T1.13.3.1.m1.2a"><mrow id="S2.T1.13.3.1.m1.2.2" xref="S2.T1.13.3.1.m1.2.2.cmml"><msub id="S2.T1.13.3.1.m1.2.2.4" xref="S2.T1.13.3.1.m1.2.2.4.cmml"><mi id="S2.T1.13.3.1.m1.2.2.4.2" xref="S2.T1.13.3.1.m1.2.2.4.2.cmml">R</mi><mi id="S2.T1.13.3.1.m1.2.2.4.3" xref="S2.T1.13.3.1.m1.2.2.4.3.cmml">t</mi></msub><mo id="S2.T1.13.3.1.m1.2.2.3" xref="S2.T1.13.3.1.m1.2.2.3.cmml">+</mo><mrow id="S2.T1.13.3.1.m1.2.2.2" xref="S2.T1.13.3.1.m1.2.2.2.cmml"><mrow id="S2.T1.13.3.1.m1.2.2.2.4" xref="S2.T1.13.3.1.m1.2.2.2.4.cmml"><mi id="S2.T1.13.3.1.m1.2.2.2.4.2" xref="S2.T1.13.3.1.m1.2.2.2.4.2.cmml">λ</mi><mo id="S2.T1.13.3.1.m1.2.2.2.4.1" lspace="0.222em" rspace="0.222em" xref="S2.T1.13.3.1.m1.2.2.2.4.1.cmml">×</mo><mi id="S2.T1.13.3.1.m1.2.2.2.4.3" xref="S2.T1.13.3.1.m1.2.2.2.4.3.cmml">D</mi></mrow><mo id="S2.T1.13.3.1.m1.2.2.2.3" xref="S2.T1.13.3.1.m1.2.2.2.3.cmml">⁢</mo><mrow id="S2.T1.13.3.1.m1.2.2.2.2.2" xref="S2.T1.13.3.1.m1.2.2.2.2.3.cmml"><mo id="S2.T1.13.3.1.m1.2.2.2.2.2.3" stretchy="false" xref="S2.T1.13.3.1.m1.2.2.2.2.3.cmml">(</mo><msub id="S2.T1.13.3.1.m1.1.1.1.1.1.1" xref="S2.T1.13.3.1.m1.1.1.1.1.1.1.cmml"><mi id="S2.T1.13.3.1.m1.1.1.1.1.1.1.2" xref="S2.T1.13.3.1.m1.1.1.1.1.1.1.2.cmml">x</mi><mi id="S2.T1.13.3.1.m1.1.1.1.1.1.1.3" xref="S2.T1.13.3.1.m1.1.1.1.1.1.1.3.cmml">t</mi></msub><mo id="S2.T1.13.3.1.m1.2.2.2.2.2.4" 
xref="S2.T1.13.3.1.m1.2.2.2.2.3.cmml">,</mo><msub id="S2.T1.13.3.1.m1.2.2.2.2.2.2" xref="S2.T1.13.3.1.m1.2.2.2.2.2.2.cmml"><mover accent="true" id="S2.T1.13.3.1.m1.2.2.2.2.2.2.2" xref="S2.T1.13.3.1.m1.2.2.2.2.2.2.2.cmml"><mi id="S2.T1.13.3.1.m1.2.2.2.2.2.2.2.2" xref="S2.T1.13.3.1.m1.2.2.2.2.2.2.2.2.cmml">x</mi><mo id="S2.T1.13.3.1.m1.2.2.2.2.2.2.2.1" xref="S2.T1.13.3.1.m1.2.2.2.2.2.2.2.1.cmml">^</mo></mover><mi id="S2.T1.13.3.1.m1.2.2.2.2.2.2.3" xref="S2.T1.13.3.1.m1.2.2.2.2.2.2.3.cmml">t</mi></msub><mo id="S2.T1.13.3.1.m1.2.2.2.2.2.5" stretchy="false" xref="S2.T1.13.3.1.m1.2.2.2.2.3.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.T1.13.3.1.m1.2b"><apply id="S2.T1.13.3.1.m1.2.2.cmml" xref="S2.T1.13.3.1.m1.2.2"><plus id="S2.T1.13.3.1.m1.2.2.3.cmml" xref="S2.T1.13.3.1.m1.2.2.3"></plus><apply id="S2.T1.13.3.1.m1.2.2.4.cmml" xref="S2.T1.13.3.1.m1.2.2.4"><csymbol cd="ambiguous" id="S2.T1.13.3.1.m1.2.2.4.1.cmml" xref="S2.T1.13.3.1.m1.2.2.4">subscript</csymbol><ci id="S2.T1.13.3.1.m1.2.2.4.2.cmml" xref="S2.T1.13.3.1.m1.2.2.4.2">𝑅</ci><ci id="S2.T1.13.3.1.m1.2.2.4.3.cmml" xref="S2.T1.13.3.1.m1.2.2.4.3">𝑡</ci></apply><apply id="S2.T1.13.3.1.m1.2.2.2.cmml" xref="S2.T1.13.3.1.m1.2.2.2"><times id="S2.T1.13.3.1.m1.2.2.2.3.cmml" xref="S2.T1.13.3.1.m1.2.2.2.3"></times><apply id="S2.T1.13.3.1.m1.2.2.2.4.cmml" xref="S2.T1.13.3.1.m1.2.2.2.4"><times id="S2.T1.13.3.1.m1.2.2.2.4.1.cmml" xref="S2.T1.13.3.1.m1.2.2.2.4.1"></times><ci id="S2.T1.13.3.1.m1.2.2.2.4.2.cmml" xref="S2.T1.13.3.1.m1.2.2.2.4.2">𝜆</ci><ci id="S2.T1.13.3.1.m1.2.2.2.4.3.cmml" xref="S2.T1.13.3.1.m1.2.2.2.4.3">𝐷</ci></apply><interval closure="open" id="S2.T1.13.3.1.m1.2.2.2.2.3.cmml" xref="S2.T1.13.3.1.m1.2.2.2.2.2"><apply id="S2.T1.13.3.1.m1.1.1.1.1.1.1.cmml" xref="S2.T1.13.3.1.m1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.T1.13.3.1.m1.1.1.1.1.1.1.1.cmml" xref="S2.T1.13.3.1.m1.1.1.1.1.1.1">subscript</csymbol><ci id="S2.T1.13.3.1.m1.1.1.1.1.1.1.2.cmml" 
xref="S2.T1.13.3.1.m1.1.1.1.1.1.1.2">𝑥</ci><ci id="S2.T1.13.3.1.m1.1.1.1.1.1.1.3.cmml" xref="S2.T1.13.3.1.m1.1.1.1.1.1.1.3">𝑡</ci></apply><apply id="S2.T1.13.3.1.m1.2.2.2.2.2.2.cmml" xref="S2.T1.13.3.1.m1.2.2.2.2.2.2"><csymbol cd="ambiguous" id="S2.T1.13.3.1.m1.2.2.2.2.2.2.1.cmml" xref="S2.T1.13.3.1.m1.2.2.2.2.2.2">subscript</csymbol><apply id="S2.T1.13.3.1.m1.2.2.2.2.2.2.2.cmml" xref="S2.T1.13.3.1.m1.2.2.2.2.2.2.2"><ci id="S2.T1.13.3.1.m1.2.2.2.2.2.2.2.1.cmml" xref="S2.T1.13.3.1.m1.2.2.2.2.2.2.2.1">^</ci><ci id="S2.T1.13.3.1.m1.2.2.2.2.2.2.2.2.cmml" xref="S2.T1.13.3.1.m1.2.2.2.2.2.2.2.2">𝑥</ci></apply><ci id="S2.T1.13.3.1.m1.2.2.2.2.2.2.3.cmml" xref="S2.T1.13.3.1.m1.2.2.2.2.2.2.3">𝑡</ci></apply></interval></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.T1.13.3.1.m1.2c">R_{t}+\lambda\times D(x_{t},\hat{x}_{t})</annotation><annotation encoding="application/x-llamapun" id="S2.T1.13.3.1.m1.2d">italic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_λ × italic_D ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT )</annotation></semantics></math></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.13.3.8" style="padding-left:2.8pt;padding-right:2.8pt;">5</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.13.3.9" style="padding-left:2.8pt;padding-right:2.8pt;">4</td> </tr> <tr class="ltx_tr" id="S2.T1.14.4"> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.14.4.2" rowspan="2" style="padding-left:2.8pt;padding-right:2.8pt;"><span class="ltx_text" id="S2.T1.14.4.2.1">Finetuning</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.14.4.3" style="padding-left:2.8pt;padding-right:2.8pt;">4</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.14.4.4" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_border_t" id="S2.T1.14.4.5" 
style="padding-left:2.8pt;padding-right:2.8pt;"></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.14.4.6" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.14.4.7" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.14.4.8" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.14.4.1" style="padding-left:2.8pt;padding-right:2.8pt;"><math alttext="R_{t}+\lambda\times D(x_{t},\hat{x}_{t})" class="ltx_Math" display="inline" id="S2.T1.14.4.1.m1.2"><semantics id="S2.T1.14.4.1.m1.2a"><mrow id="S2.T1.14.4.1.m1.2.2" xref="S2.T1.14.4.1.m1.2.2.cmml"><msub id="S2.T1.14.4.1.m1.2.2.4" xref="S2.T1.14.4.1.m1.2.2.4.cmml"><mi id="S2.T1.14.4.1.m1.2.2.4.2" xref="S2.T1.14.4.1.m1.2.2.4.2.cmml">R</mi><mi id="S2.T1.14.4.1.m1.2.2.4.3" xref="S2.T1.14.4.1.m1.2.2.4.3.cmml">t</mi></msub><mo id="S2.T1.14.4.1.m1.2.2.3" xref="S2.T1.14.4.1.m1.2.2.3.cmml">+</mo><mrow id="S2.T1.14.4.1.m1.2.2.2" xref="S2.T1.14.4.1.m1.2.2.2.cmml"><mrow id="S2.T1.14.4.1.m1.2.2.2.4" xref="S2.T1.14.4.1.m1.2.2.2.4.cmml"><mi id="S2.T1.14.4.1.m1.2.2.2.4.2" xref="S2.T1.14.4.1.m1.2.2.2.4.2.cmml">λ</mi><mo id="S2.T1.14.4.1.m1.2.2.2.4.1" lspace="0.222em" rspace="0.222em" xref="S2.T1.14.4.1.m1.2.2.2.4.1.cmml">×</mo><mi id="S2.T1.14.4.1.m1.2.2.2.4.3" xref="S2.T1.14.4.1.m1.2.2.2.4.3.cmml">D</mi></mrow><mo id="S2.T1.14.4.1.m1.2.2.2.3" xref="S2.T1.14.4.1.m1.2.2.2.3.cmml">⁢</mo><mrow id="S2.T1.14.4.1.m1.2.2.2.2.2" xref="S2.T1.14.4.1.m1.2.2.2.2.3.cmml"><mo id="S2.T1.14.4.1.m1.2.2.2.2.2.3" stretchy="false" xref="S2.T1.14.4.1.m1.2.2.2.2.3.cmml">(</mo><msub id="S2.T1.14.4.1.m1.1.1.1.1.1.1" xref="S2.T1.14.4.1.m1.1.1.1.1.1.1.cmml"><mi id="S2.T1.14.4.1.m1.1.1.1.1.1.1.2" xref="S2.T1.14.4.1.m1.1.1.1.1.1.1.2.cmml">x</mi><mi id="S2.T1.14.4.1.m1.1.1.1.1.1.1.3" xref="S2.T1.14.4.1.m1.1.1.1.1.1.1.3.cmml">t</mi></msub><mo 
id="S2.T1.14.4.1.m1.2.2.2.2.2.4" xref="S2.T1.14.4.1.m1.2.2.2.2.3.cmml">,</mo><msub id="S2.T1.14.4.1.m1.2.2.2.2.2.2" xref="S2.T1.14.4.1.m1.2.2.2.2.2.2.cmml"><mover accent="true" id="S2.T1.14.4.1.m1.2.2.2.2.2.2.2" xref="S2.T1.14.4.1.m1.2.2.2.2.2.2.2.cmml"><mi id="S2.T1.14.4.1.m1.2.2.2.2.2.2.2.2" xref="S2.T1.14.4.1.m1.2.2.2.2.2.2.2.2.cmml">x</mi><mo id="S2.T1.14.4.1.m1.2.2.2.2.2.2.2.1" xref="S2.T1.14.4.1.m1.2.2.2.2.2.2.2.1.cmml">^</mo></mover><mi id="S2.T1.14.4.1.m1.2.2.2.2.2.2.3" xref="S2.T1.14.4.1.m1.2.2.2.2.2.2.3.cmml">t</mi></msub><mo id="S2.T1.14.4.1.m1.2.2.2.2.2.5" stretchy="false" xref="S2.T1.14.4.1.m1.2.2.2.2.3.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.T1.14.4.1.m1.2b"><apply id="S2.T1.14.4.1.m1.2.2.cmml" xref="S2.T1.14.4.1.m1.2.2"><plus id="S2.T1.14.4.1.m1.2.2.3.cmml" xref="S2.T1.14.4.1.m1.2.2.3"></plus><apply id="S2.T1.14.4.1.m1.2.2.4.cmml" xref="S2.T1.14.4.1.m1.2.2.4"><csymbol cd="ambiguous" id="S2.T1.14.4.1.m1.2.2.4.1.cmml" xref="S2.T1.14.4.1.m1.2.2.4">subscript</csymbol><ci id="S2.T1.14.4.1.m1.2.2.4.2.cmml" xref="S2.T1.14.4.1.m1.2.2.4.2">𝑅</ci><ci id="S2.T1.14.4.1.m1.2.2.4.3.cmml" xref="S2.T1.14.4.1.m1.2.2.4.3">𝑡</ci></apply><apply id="S2.T1.14.4.1.m1.2.2.2.cmml" xref="S2.T1.14.4.1.m1.2.2.2"><times id="S2.T1.14.4.1.m1.2.2.2.3.cmml" xref="S2.T1.14.4.1.m1.2.2.2.3"></times><apply id="S2.T1.14.4.1.m1.2.2.2.4.cmml" xref="S2.T1.14.4.1.m1.2.2.2.4"><times id="S2.T1.14.4.1.m1.2.2.2.4.1.cmml" xref="S2.T1.14.4.1.m1.2.2.2.4.1"></times><ci id="S2.T1.14.4.1.m1.2.2.2.4.2.cmml" xref="S2.T1.14.4.1.m1.2.2.2.4.2">𝜆</ci><ci id="S2.T1.14.4.1.m1.2.2.2.4.3.cmml" xref="S2.T1.14.4.1.m1.2.2.2.4.3">𝐷</ci></apply><interval closure="open" id="S2.T1.14.4.1.m1.2.2.2.2.3.cmml" xref="S2.T1.14.4.1.m1.2.2.2.2.2"><apply id="S2.T1.14.4.1.m1.1.1.1.1.1.1.cmml" xref="S2.T1.14.4.1.m1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.T1.14.4.1.m1.1.1.1.1.1.1.1.cmml" xref="S2.T1.14.4.1.m1.1.1.1.1.1.1">subscript</csymbol><ci id="S2.T1.14.4.1.m1.1.1.1.1.1.1.2.cmml" 
xref="S2.T1.14.4.1.m1.1.1.1.1.1.1.2">𝑥</ci><ci id="S2.T1.14.4.1.m1.1.1.1.1.1.1.3.cmml" xref="S2.T1.14.4.1.m1.1.1.1.1.1.1.3">𝑡</ci></apply><apply id="S2.T1.14.4.1.m1.2.2.2.2.2.2.cmml" xref="S2.T1.14.4.1.m1.2.2.2.2.2.2"><csymbol cd="ambiguous" id="S2.T1.14.4.1.m1.2.2.2.2.2.2.1.cmml" xref="S2.T1.14.4.1.m1.2.2.2.2.2.2">subscript</csymbol><apply id="S2.T1.14.4.1.m1.2.2.2.2.2.2.2.cmml" xref="S2.T1.14.4.1.m1.2.2.2.2.2.2.2"><ci id="S2.T1.14.4.1.m1.2.2.2.2.2.2.2.1.cmml" xref="S2.T1.14.4.1.m1.2.2.2.2.2.2.2.1">^</ci><ci id="S2.T1.14.4.1.m1.2.2.2.2.2.2.2.2.cmml" xref="S2.T1.14.4.1.m1.2.2.2.2.2.2.2.2">𝑥</ci></apply><ci id="S2.T1.14.4.1.m1.2.2.2.2.2.2.3.cmml" xref="S2.T1.14.4.1.m1.2.2.2.2.2.2.3">𝑡</ci></apply></interval></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.T1.14.4.1.m1.2c">R_{t}+\lambda\times D(x_{t},\hat{x}_{t})</annotation><annotation encoding="application/x-llamapun" id="S2.T1.14.4.1.m1.2d">italic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_λ × italic_D ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT )</annotation></semantics></math></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.14.4.9" style="padding-left:2.8pt;padding-right:2.8pt;">3</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.14.4.10" style="padding-left:2.8pt;padding-right:2.8pt;">3</td> </tr> <tr class="ltx_tr" id="S2.T1.15.5"> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.15.5.2" style="padding-left:2.8pt;padding-right:2.8pt;">5</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.15.5.3" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_border_t" id="S2.T1.15.5.4" style="padding-left:2.8pt;padding-right:2.8pt;"></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.15.5.5" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_t" 
id="S2.T1.15.5.6" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.15.5.7" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.15.5.1" style="padding-left:2.8pt;padding-right:2.8pt;"><math alttext="R_{t}+\lambda\times D(x_{t},\hat{x}_{t})" class="ltx_Math" display="inline" id="S2.T1.15.5.1.m1.2"><semantics id="S2.T1.15.5.1.m1.2a"><mrow id="S2.T1.15.5.1.m1.2.2" xref="S2.T1.15.5.1.m1.2.2.cmml"><msub id="S2.T1.15.5.1.m1.2.2.4" xref="S2.T1.15.5.1.m1.2.2.4.cmml"><mi id="S2.T1.15.5.1.m1.2.2.4.2" xref="S2.T1.15.5.1.m1.2.2.4.2.cmml">R</mi><mi id="S2.T1.15.5.1.m1.2.2.4.3" xref="S2.T1.15.5.1.m1.2.2.4.3.cmml">t</mi></msub><mo id="S2.T1.15.5.1.m1.2.2.3" xref="S2.T1.15.5.1.m1.2.2.3.cmml">+</mo><mrow id="S2.T1.15.5.1.m1.2.2.2" xref="S2.T1.15.5.1.m1.2.2.2.cmml"><mrow id="S2.T1.15.5.1.m1.2.2.2.4" xref="S2.T1.15.5.1.m1.2.2.2.4.cmml"><mi id="S2.T1.15.5.1.m1.2.2.2.4.2" xref="S2.T1.15.5.1.m1.2.2.2.4.2.cmml">λ</mi><mo id="S2.T1.15.5.1.m1.2.2.2.4.1" lspace="0.222em" rspace="0.222em" xref="S2.T1.15.5.1.m1.2.2.2.4.1.cmml">×</mo><mi id="S2.T1.15.5.1.m1.2.2.2.4.3" xref="S2.T1.15.5.1.m1.2.2.2.4.3.cmml">D</mi></mrow><mo id="S2.T1.15.5.1.m1.2.2.2.3" xref="S2.T1.15.5.1.m1.2.2.2.3.cmml">⁢</mo><mrow id="S2.T1.15.5.1.m1.2.2.2.2.2" xref="S2.T1.15.5.1.m1.2.2.2.2.3.cmml"><mo id="S2.T1.15.5.1.m1.2.2.2.2.2.3" stretchy="false" xref="S2.T1.15.5.1.m1.2.2.2.2.3.cmml">(</mo><msub id="S2.T1.15.5.1.m1.1.1.1.1.1.1" xref="S2.T1.15.5.1.m1.1.1.1.1.1.1.cmml"><mi id="S2.T1.15.5.1.m1.1.1.1.1.1.1.2" xref="S2.T1.15.5.1.m1.1.1.1.1.1.1.2.cmml">x</mi><mi id="S2.T1.15.5.1.m1.1.1.1.1.1.1.3" xref="S2.T1.15.5.1.m1.1.1.1.1.1.1.3.cmml">t</mi></msub><mo id="S2.T1.15.5.1.m1.2.2.2.2.2.4" xref="S2.T1.15.5.1.m1.2.2.2.2.3.cmml">,</mo><msub id="S2.T1.15.5.1.m1.2.2.2.2.2.2" xref="S2.T1.15.5.1.m1.2.2.2.2.2.2.cmml"><mover accent="true" id="S2.T1.15.5.1.m1.2.2.2.2.2.2.2" 
xref="S2.T1.15.5.1.m1.2.2.2.2.2.2.2.cmml"><mi id="S2.T1.15.5.1.m1.2.2.2.2.2.2.2.2" xref="S2.T1.15.5.1.m1.2.2.2.2.2.2.2.2.cmml">x</mi><mo id="S2.T1.15.5.1.m1.2.2.2.2.2.2.2.1" xref="S2.T1.15.5.1.m1.2.2.2.2.2.2.2.1.cmml">^</mo></mover><mi id="S2.T1.15.5.1.m1.2.2.2.2.2.2.3" xref="S2.T1.15.5.1.m1.2.2.2.2.2.2.3.cmml">t</mi></msub><mo id="S2.T1.15.5.1.m1.2.2.2.2.2.5" stretchy="false" xref="S2.T1.15.5.1.m1.2.2.2.2.3.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.T1.15.5.1.m1.2b"><apply id="S2.T1.15.5.1.m1.2.2.cmml" xref="S2.T1.15.5.1.m1.2.2"><plus id="S2.T1.15.5.1.m1.2.2.3.cmml" xref="S2.T1.15.5.1.m1.2.2.3"></plus><apply id="S2.T1.15.5.1.m1.2.2.4.cmml" xref="S2.T1.15.5.1.m1.2.2.4"><csymbol cd="ambiguous" id="S2.T1.15.5.1.m1.2.2.4.1.cmml" xref="S2.T1.15.5.1.m1.2.2.4">subscript</csymbol><ci id="S2.T1.15.5.1.m1.2.2.4.2.cmml" xref="S2.T1.15.5.1.m1.2.2.4.2">𝑅</ci><ci id="S2.T1.15.5.1.m1.2.2.4.3.cmml" xref="S2.T1.15.5.1.m1.2.2.4.3">𝑡</ci></apply><apply id="S2.T1.15.5.1.m1.2.2.2.cmml" xref="S2.T1.15.5.1.m1.2.2.2"><times id="S2.T1.15.5.1.m1.2.2.2.3.cmml" xref="S2.T1.15.5.1.m1.2.2.2.3"></times><apply id="S2.T1.15.5.1.m1.2.2.2.4.cmml" xref="S2.T1.15.5.1.m1.2.2.2.4"><times id="S2.T1.15.5.1.m1.2.2.2.4.1.cmml" xref="S2.T1.15.5.1.m1.2.2.2.4.1"></times><ci id="S2.T1.15.5.1.m1.2.2.2.4.2.cmml" xref="S2.T1.15.5.1.m1.2.2.2.4.2">𝜆</ci><ci id="S2.T1.15.5.1.m1.2.2.2.4.3.cmml" xref="S2.T1.15.5.1.m1.2.2.2.4.3">𝐷</ci></apply><interval closure="open" id="S2.T1.15.5.1.m1.2.2.2.2.3.cmml" xref="S2.T1.15.5.1.m1.2.2.2.2.2"><apply id="S2.T1.15.5.1.m1.1.1.1.1.1.1.cmml" xref="S2.T1.15.5.1.m1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.T1.15.5.1.m1.1.1.1.1.1.1.1.cmml" xref="S2.T1.15.5.1.m1.1.1.1.1.1.1">subscript</csymbol><ci id="S2.T1.15.5.1.m1.1.1.1.1.1.1.2.cmml" xref="S2.T1.15.5.1.m1.1.1.1.1.1.1.2">𝑥</ci><ci id="S2.T1.15.5.1.m1.1.1.1.1.1.1.3.cmml" xref="S2.T1.15.5.1.m1.1.1.1.1.1.1.3">𝑡</ci></apply><apply id="S2.T1.15.5.1.m1.2.2.2.2.2.2.cmml" 
xref="S2.T1.15.5.1.m1.2.2.2.2.2.2"><csymbol cd="ambiguous" id="S2.T1.15.5.1.m1.2.2.2.2.2.2.1.cmml" xref="S2.T1.15.5.1.m1.2.2.2.2.2.2">subscript</csymbol><apply id="S2.T1.15.5.1.m1.2.2.2.2.2.2.2.cmml" xref="S2.T1.15.5.1.m1.2.2.2.2.2.2.2"><ci id="S2.T1.15.5.1.m1.2.2.2.2.2.2.2.1.cmml" xref="S2.T1.15.5.1.m1.2.2.2.2.2.2.2.1">^</ci><ci id="S2.T1.15.5.1.m1.2.2.2.2.2.2.2.2.cmml" xref="S2.T1.15.5.1.m1.2.2.2.2.2.2.2.2">𝑥</ci></apply><ci id="S2.T1.15.5.1.m1.2.2.2.2.2.2.3.cmml" xref="S2.T1.15.5.1.m1.2.2.2.2.2.2.3">𝑡</ci></apply></interval></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.T1.15.5.1.m1.2c">R_{t}+\lambda\times D(x_{t},\hat{x}_{t})</annotation><annotation encoding="application/x-llamapun" id="S2.T1.15.5.1.m1.2d">italic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_λ × italic_D ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT )</annotation></semantics></math></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.15.5.8" style="padding-left:2.8pt;padding-right:2.8pt;">5</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.15.5.9" style="padding-left:2.8pt;padding-right:2.8pt;">3</td> </tr> <tr class="ltx_tr" id="S2.T1.16.6"> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_t" id="S2.T1.16.6.2" rowspan="2" style="padding-left:2.8pt;padding-right:2.8pt;"><span class="ltx_text" id="S2.T1.16.6.2.1">Finetuning with EPA</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.16.6.3" style="padding-left:2.8pt;padding-right:2.8pt;">6</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.16.6.4" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_border_t" id="S2.T1.16.6.5" style="padding-left:2.8pt;padding-right:2.8pt;"></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.16.6.6" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td 
ltx_align_center ltx_border_t" id="S2.T1.16.6.7" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.16.6.8" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.16.6.1" style="padding-left:2.8pt;padding-right:2.8pt;"><math alttext="R_{t}+\lambda\times D(x_{t},\hat{x}_{t})" class="ltx_Math" display="inline" id="S2.T1.16.6.1.m1.2"><semantics id="S2.T1.16.6.1.m1.2a"><mrow id="S2.T1.16.6.1.m1.2.2" xref="S2.T1.16.6.1.m1.2.2.cmml"><msub id="S2.T1.16.6.1.m1.2.2.4" xref="S2.T1.16.6.1.m1.2.2.4.cmml"><mi id="S2.T1.16.6.1.m1.2.2.4.2" xref="S2.T1.16.6.1.m1.2.2.4.2.cmml">R</mi><mi id="S2.T1.16.6.1.m1.2.2.4.3" xref="S2.T1.16.6.1.m1.2.2.4.3.cmml">t</mi></msub><mo id="S2.T1.16.6.1.m1.2.2.3" xref="S2.T1.16.6.1.m1.2.2.3.cmml">+</mo><mrow id="S2.T1.16.6.1.m1.2.2.2" xref="S2.T1.16.6.1.m1.2.2.2.cmml"><mrow id="S2.T1.16.6.1.m1.2.2.2.4" xref="S2.T1.16.6.1.m1.2.2.2.4.cmml"><mi id="S2.T1.16.6.1.m1.2.2.2.4.2" xref="S2.T1.16.6.1.m1.2.2.2.4.2.cmml">λ</mi><mo id="S2.T1.16.6.1.m1.2.2.2.4.1" lspace="0.222em" rspace="0.222em" xref="S2.T1.16.6.1.m1.2.2.2.4.1.cmml">×</mo><mi id="S2.T1.16.6.1.m1.2.2.2.4.3" xref="S2.T1.16.6.1.m1.2.2.2.4.3.cmml">D</mi></mrow><mo id="S2.T1.16.6.1.m1.2.2.2.3" xref="S2.T1.16.6.1.m1.2.2.2.3.cmml">⁢</mo><mrow id="S2.T1.16.6.1.m1.2.2.2.2.2" xref="S2.T1.16.6.1.m1.2.2.2.2.3.cmml"><mo id="S2.T1.16.6.1.m1.2.2.2.2.2.3" stretchy="false" xref="S2.T1.16.6.1.m1.2.2.2.2.3.cmml">(</mo><msub id="S2.T1.16.6.1.m1.1.1.1.1.1.1" xref="S2.T1.16.6.1.m1.1.1.1.1.1.1.cmml"><mi id="S2.T1.16.6.1.m1.1.1.1.1.1.1.2" xref="S2.T1.16.6.1.m1.1.1.1.1.1.1.2.cmml">x</mi><mi id="S2.T1.16.6.1.m1.1.1.1.1.1.1.3" xref="S2.T1.16.6.1.m1.1.1.1.1.1.1.3.cmml">t</mi></msub><mo id="S2.T1.16.6.1.m1.2.2.2.2.2.4" xref="S2.T1.16.6.1.m1.2.2.2.2.3.cmml">,</mo><msub id="S2.T1.16.6.1.m1.2.2.2.2.2.2" xref="S2.T1.16.6.1.m1.2.2.2.2.2.2.cmml"><mover accent="true" id="S2.T1.16.6.1.m1.2.2.2.2.2.2.2" 
xref="S2.T1.16.6.1.m1.2.2.2.2.2.2.2.cmml"><mi id="S2.T1.16.6.1.m1.2.2.2.2.2.2.2.2" xref="S2.T1.16.6.1.m1.2.2.2.2.2.2.2.2.cmml">x</mi><mo id="S2.T1.16.6.1.m1.2.2.2.2.2.2.2.1" xref="S2.T1.16.6.1.m1.2.2.2.2.2.2.2.1.cmml">^</mo></mover><mi id="S2.T1.16.6.1.m1.2.2.2.2.2.2.3" xref="S2.T1.16.6.1.m1.2.2.2.2.2.2.3.cmml">t</mi></msub><mo id="S2.T1.16.6.1.m1.2.2.2.2.2.5" stretchy="false" xref="S2.T1.16.6.1.m1.2.2.2.2.3.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.T1.16.6.1.m1.2b"><apply id="S2.T1.16.6.1.m1.2.2.cmml" xref="S2.T1.16.6.1.m1.2.2"><plus id="S2.T1.16.6.1.m1.2.2.3.cmml" xref="S2.T1.16.6.1.m1.2.2.3"></plus><apply id="S2.T1.16.6.1.m1.2.2.4.cmml" xref="S2.T1.16.6.1.m1.2.2.4"><csymbol cd="ambiguous" id="S2.T1.16.6.1.m1.2.2.4.1.cmml" xref="S2.T1.16.6.1.m1.2.2.4">subscript</csymbol><ci id="S2.T1.16.6.1.m1.2.2.4.2.cmml" xref="S2.T1.16.6.1.m1.2.2.4.2">𝑅</ci><ci id="S2.T1.16.6.1.m1.2.2.4.3.cmml" xref="S2.T1.16.6.1.m1.2.2.4.3">𝑡</ci></apply><apply id="S2.T1.16.6.1.m1.2.2.2.cmml" xref="S2.T1.16.6.1.m1.2.2.2"><times id="S2.T1.16.6.1.m1.2.2.2.3.cmml" xref="S2.T1.16.6.1.m1.2.2.2.3"></times><apply id="S2.T1.16.6.1.m1.2.2.2.4.cmml" xref="S2.T1.16.6.1.m1.2.2.2.4"><times id="S2.T1.16.6.1.m1.2.2.2.4.1.cmml" xref="S2.T1.16.6.1.m1.2.2.2.4.1"></times><ci id="S2.T1.16.6.1.m1.2.2.2.4.2.cmml" xref="S2.T1.16.6.1.m1.2.2.2.4.2">𝜆</ci><ci id="S2.T1.16.6.1.m1.2.2.2.4.3.cmml" xref="S2.T1.16.6.1.m1.2.2.2.4.3">𝐷</ci></apply><interval closure="open" id="S2.T1.16.6.1.m1.2.2.2.2.3.cmml" xref="S2.T1.16.6.1.m1.2.2.2.2.2"><apply id="S2.T1.16.6.1.m1.1.1.1.1.1.1.cmml" xref="S2.T1.16.6.1.m1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.T1.16.6.1.m1.1.1.1.1.1.1.1.cmml" xref="S2.T1.16.6.1.m1.1.1.1.1.1.1">subscript</csymbol><ci id="S2.T1.16.6.1.m1.1.1.1.1.1.1.2.cmml" xref="S2.T1.16.6.1.m1.1.1.1.1.1.1.2">𝑥</ci><ci id="S2.T1.16.6.1.m1.1.1.1.1.1.1.3.cmml" xref="S2.T1.16.6.1.m1.1.1.1.1.1.1.3">𝑡</ci></apply><apply id="S2.T1.16.6.1.m1.2.2.2.2.2.2.cmml" 
xref="S2.T1.16.6.1.m1.2.2.2.2.2.2"><csymbol cd="ambiguous" id="S2.T1.16.6.1.m1.2.2.2.2.2.2.1.cmml" xref="S2.T1.16.6.1.m1.2.2.2.2.2.2">subscript</csymbol><apply id="S2.T1.16.6.1.m1.2.2.2.2.2.2.2.cmml" xref="S2.T1.16.6.1.m1.2.2.2.2.2.2.2"><ci id="S2.T1.16.6.1.m1.2.2.2.2.2.2.2.1.cmml" xref="S2.T1.16.6.1.m1.2.2.2.2.2.2.2.1">^</ci><ci id="S2.T1.16.6.1.m1.2.2.2.2.2.2.2.2.cmml" xref="S2.T1.16.6.1.m1.2.2.2.2.2.2.2.2">𝑥</ci></apply><ci id="S2.T1.16.6.1.m1.2.2.2.2.2.2.3.cmml" xref="S2.T1.16.6.1.m1.2.2.2.2.2.2.3">𝑡</ci></apply></interval></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.T1.16.6.1.m1.2c">R_{t}+\lambda\times D(x_{t},\hat{x}_{t})</annotation><annotation encoding="application/x-llamapun" id="S2.T1.16.6.1.m1.2d">italic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_λ × italic_D ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT )</annotation></semantics></math></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.16.6.9" style="padding-left:2.8pt;padding-right:2.8pt;">5</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.16.6.10" style="padding-left:2.8pt;padding-right:2.8pt;">2</td> </tr> <tr class="ltx_tr" id="S2.T1.17.7"> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_t" id="S2.T1.17.7.2" style="padding-left:2.8pt;padding-right:2.8pt;">7</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_t" id="S2.T1.17.7.3" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_t" id="S2.T1.17.7.4" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_t" id="S2.T1.17.7.5" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_t" id="S2.T1.17.7.6" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td 
ltx_align_center ltx_border_b ltx_border_t" id="S2.T1.17.7.7" style="padding-left:2.8pt;padding-right:2.8pt;">v</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_t" id="S2.T1.17.7.1" style="padding-left:2.8pt;padding-right:2.8pt;"><math alttext="R_{t}+\lambda\times D(x_{t},\hat{x}_{t})" class="ltx_Math" display="inline" id="S2.T1.17.7.1.m1.2"><semantics id="S2.T1.17.7.1.m1.2a"><mrow id="S2.T1.17.7.1.m1.2.2" xref="S2.T1.17.7.1.m1.2.2.cmml"><msub id="S2.T1.17.7.1.m1.2.2.4" xref="S2.T1.17.7.1.m1.2.2.4.cmml"><mi id="S2.T1.17.7.1.m1.2.2.4.2" xref="S2.T1.17.7.1.m1.2.2.4.2.cmml">R</mi><mi id="S2.T1.17.7.1.m1.2.2.4.3" xref="S2.T1.17.7.1.m1.2.2.4.3.cmml">t</mi></msub><mo id="S2.T1.17.7.1.m1.2.2.3" xref="S2.T1.17.7.1.m1.2.2.3.cmml">+</mo><mrow id="S2.T1.17.7.1.m1.2.2.2" xref="S2.T1.17.7.1.m1.2.2.2.cmml"><mrow id="S2.T1.17.7.1.m1.2.2.2.4" xref="S2.T1.17.7.1.m1.2.2.2.4.cmml"><mi id="S2.T1.17.7.1.m1.2.2.2.4.2" xref="S2.T1.17.7.1.m1.2.2.2.4.2.cmml">λ</mi><mo id="S2.T1.17.7.1.m1.2.2.2.4.1" lspace="0.222em" rspace="0.222em" xref="S2.T1.17.7.1.m1.2.2.2.4.1.cmml">×</mo><mi id="S2.T1.17.7.1.m1.2.2.2.4.3" xref="S2.T1.17.7.1.m1.2.2.2.4.3.cmml">D</mi></mrow><mo id="S2.T1.17.7.1.m1.2.2.2.3" xref="S2.T1.17.7.1.m1.2.2.2.3.cmml">⁢</mo><mrow id="S2.T1.17.7.1.m1.2.2.2.2.2" xref="S2.T1.17.7.1.m1.2.2.2.2.3.cmml"><mo id="S2.T1.17.7.1.m1.2.2.2.2.2.3" stretchy="false" xref="S2.T1.17.7.1.m1.2.2.2.2.3.cmml">(</mo><msub id="S2.T1.17.7.1.m1.1.1.1.1.1.1" xref="S2.T1.17.7.1.m1.1.1.1.1.1.1.cmml"><mi id="S2.T1.17.7.1.m1.1.1.1.1.1.1.2" xref="S2.T1.17.7.1.m1.1.1.1.1.1.1.2.cmml">x</mi><mi id="S2.T1.17.7.1.m1.1.1.1.1.1.1.3" xref="S2.T1.17.7.1.m1.1.1.1.1.1.1.3.cmml">t</mi></msub><mo id="S2.T1.17.7.1.m1.2.2.2.2.2.4" xref="S2.T1.17.7.1.m1.2.2.2.2.3.cmml">,</mo><msub id="S2.T1.17.7.1.m1.2.2.2.2.2.2" xref="S2.T1.17.7.1.m1.2.2.2.2.2.2.cmml"><mover accent="true" id="S2.T1.17.7.1.m1.2.2.2.2.2.2.2" xref="S2.T1.17.7.1.m1.2.2.2.2.2.2.2.cmml"><mi id="S2.T1.17.7.1.m1.2.2.2.2.2.2.2.2" 
xref="S2.T1.17.7.1.m1.2.2.2.2.2.2.2.2.cmml">x</mi><mo id="S2.T1.17.7.1.m1.2.2.2.2.2.2.2.1" xref="S2.T1.17.7.1.m1.2.2.2.2.2.2.2.1.cmml">^</mo></mover><mi id="S2.T1.17.7.1.m1.2.2.2.2.2.2.3" xref="S2.T1.17.7.1.m1.2.2.2.2.2.2.3.cmml">t</mi></msub><mo id="S2.T1.17.7.1.m1.2.2.2.2.2.5" stretchy="false" xref="S2.T1.17.7.1.m1.2.2.2.2.3.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.T1.17.7.1.m1.2b"><apply id="S2.T1.17.7.1.m1.2.2.cmml" xref="S2.T1.17.7.1.m1.2.2"><plus id="S2.T1.17.7.1.m1.2.2.3.cmml" xref="S2.T1.17.7.1.m1.2.2.3"></plus><apply id="S2.T1.17.7.1.m1.2.2.4.cmml" xref="S2.T1.17.7.1.m1.2.2.4"><csymbol cd="ambiguous" id="S2.T1.17.7.1.m1.2.2.4.1.cmml" xref="S2.T1.17.7.1.m1.2.2.4">subscript</csymbol><ci id="S2.T1.17.7.1.m1.2.2.4.2.cmml" xref="S2.T1.17.7.1.m1.2.2.4.2">𝑅</ci><ci id="S2.T1.17.7.1.m1.2.2.4.3.cmml" xref="S2.T1.17.7.1.m1.2.2.4.3">𝑡</ci></apply><apply id="S2.T1.17.7.1.m1.2.2.2.cmml" xref="S2.T1.17.7.1.m1.2.2.2"><times id="S2.T1.17.7.1.m1.2.2.2.3.cmml" xref="S2.T1.17.7.1.m1.2.2.2.3"></times><apply id="S2.T1.17.7.1.m1.2.2.2.4.cmml" xref="S2.T1.17.7.1.m1.2.2.2.4"><times id="S2.T1.17.7.1.m1.2.2.2.4.1.cmml" xref="S2.T1.17.7.1.m1.2.2.2.4.1"></times><ci id="S2.T1.17.7.1.m1.2.2.2.4.2.cmml" xref="S2.T1.17.7.1.m1.2.2.2.4.2">𝜆</ci><ci id="S2.T1.17.7.1.m1.2.2.2.4.3.cmml" xref="S2.T1.17.7.1.m1.2.2.2.4.3">𝐷</ci></apply><interval closure="open" id="S2.T1.17.7.1.m1.2.2.2.2.3.cmml" xref="S2.T1.17.7.1.m1.2.2.2.2.2"><apply id="S2.T1.17.7.1.m1.1.1.1.1.1.1.cmml" xref="S2.T1.17.7.1.m1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.T1.17.7.1.m1.1.1.1.1.1.1.1.cmml" xref="S2.T1.17.7.1.m1.1.1.1.1.1.1">subscript</csymbol><ci id="S2.T1.17.7.1.m1.1.1.1.1.1.1.2.cmml" xref="S2.T1.17.7.1.m1.1.1.1.1.1.1.2">𝑥</ci><ci id="S2.T1.17.7.1.m1.1.1.1.1.1.1.3.cmml" xref="S2.T1.17.7.1.m1.1.1.1.1.1.1.3">𝑡</ci></apply><apply id="S2.T1.17.7.1.m1.2.2.2.2.2.2.cmml" xref="S2.T1.17.7.1.m1.2.2.2.2.2.2"><csymbol cd="ambiguous" id="S2.T1.17.7.1.m1.2.2.2.2.2.2.1.cmml" 
xref="S2.T1.17.7.1.m1.2.2.2.2.2.2">subscript</csymbol><apply id="S2.T1.17.7.1.m1.2.2.2.2.2.2.2.cmml" xref="S2.T1.17.7.1.m1.2.2.2.2.2.2.2"><ci id="S2.T1.17.7.1.m1.2.2.2.2.2.2.2.1.cmml" xref="S2.T1.17.7.1.m1.2.2.2.2.2.2.2.1">^</ci><ci id="S2.T1.17.7.1.m1.2.2.2.2.2.2.2.2.cmml" xref="S2.T1.17.7.1.m1.2.2.2.2.2.2.2.2">𝑥</ci></apply><ci id="S2.T1.17.7.1.m1.2.2.2.2.2.2.3.cmml" xref="S2.T1.17.7.1.m1.2.2.2.2.2.2.3">𝑡</ci></apply></interval></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.T1.17.7.1.m1.2c">R_{t}+\lambda\times D(x_{t},\hat{x}_{t})</annotation><annotation encoding="application/x-llamapun" id="S2.T1.17.7.1.m1.2d">italic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_λ × italic_D ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT )</annotation></semantics></math></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_t" id="S2.T1.17.7.8" style="padding-left:2.8pt;padding-right:2.8pt;">5</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_t" id="S2.T1.17.7.9" style="padding-left:2.8pt;padding-right:2.8pt;">2</td> </tr> </tbody> </table> </figure> <section class="ltx_subsection" id="S2.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S2.SS1.4.1.1">II-A</span> </span><span class="ltx_text ltx_font_italic" id="S2.SS1.5.2">System Overview</span> </h3> <div class="ltx_para" id="S2.SS1.p1"> <p class="ltx_p" id="S2.SS1.p1.14">Fig. 
<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S1.F2" title="Figure 2 ‣ I Introduction ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">2</span></a> illustrates the coding frameworks for DCVC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib1" title="">1</a>]</cite> and its variants that implement conditional residual coding and masked conditional residual coding, respectively. From a high-level perspective, DCVC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib1" title="">1</a>]</cite> consists of a motion-coding module (in green) and a conditional inter-frame coding module (in blue). To encode a video frame <math alttext="x_{t}\in\mathbb{R}^{3\times H\times W}" class="ltx_Math" display="inline" id="S2.SS1.p1.1.m1.1"><semantics id="S2.SS1.p1.1.m1.1a"><mrow id="S2.SS1.p1.1.m1.1.1" xref="S2.SS1.p1.1.m1.1.1.cmml"><msub id="S2.SS1.p1.1.m1.1.1.2" xref="S2.SS1.p1.1.m1.1.1.2.cmml"><mi id="S2.SS1.p1.1.m1.1.1.2.2" xref="S2.SS1.p1.1.m1.1.1.2.2.cmml">x</mi><mi id="S2.SS1.p1.1.m1.1.1.2.3" xref="S2.SS1.p1.1.m1.1.1.2.3.cmml">t</mi></msub><mo id="S2.SS1.p1.1.m1.1.1.1" xref="S2.SS1.p1.1.m1.1.1.1.cmml">∈</mo><msup id="S2.SS1.p1.1.m1.1.1.3" xref="S2.SS1.p1.1.m1.1.1.3.cmml"><mi id="S2.SS1.p1.1.m1.1.1.3.2" xref="S2.SS1.p1.1.m1.1.1.3.2.cmml">ℝ</mi><mrow id="S2.SS1.p1.1.m1.1.1.3.3" xref="S2.SS1.p1.1.m1.1.1.3.3.cmml"><mn id="S2.SS1.p1.1.m1.1.1.3.3.2" xref="S2.SS1.p1.1.m1.1.1.3.3.2.cmml">3</mn><mo id="S2.SS1.p1.1.m1.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S2.SS1.p1.1.m1.1.1.3.3.1.cmml">×</mo><mi id="S2.SS1.p1.1.m1.1.1.3.3.3" xref="S2.SS1.p1.1.m1.1.1.3.3.3.cmml">H</mi><mo id="S2.SS1.p1.1.m1.1.1.3.3.1a" lspace="0.222em" rspace="0.222em" xref="S2.SS1.p1.1.m1.1.1.3.3.1.cmml">×</mo><mi id="S2.SS1.p1.1.m1.1.1.3.3.4" 
xref="S2.SS1.p1.1.m1.1.1.3.3.4.cmml">W</mi></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.1.m1.1b"><apply id="S2.SS1.p1.1.m1.1.1.cmml" xref="S2.SS1.p1.1.m1.1.1"><in id="S2.SS1.p1.1.m1.1.1.1.cmml" xref="S2.SS1.p1.1.m1.1.1.1"></in><apply id="S2.SS1.p1.1.m1.1.1.2.cmml" xref="S2.SS1.p1.1.m1.1.1.2"><csymbol cd="ambiguous" id="S2.SS1.p1.1.m1.1.1.2.1.cmml" xref="S2.SS1.p1.1.m1.1.1.2">subscript</csymbol><ci id="S2.SS1.p1.1.m1.1.1.2.2.cmml" xref="S2.SS1.p1.1.m1.1.1.2.2">𝑥</ci><ci id="S2.SS1.p1.1.m1.1.1.2.3.cmml" xref="S2.SS1.p1.1.m1.1.1.2.3">𝑡</ci></apply><apply id="S2.SS1.p1.1.m1.1.1.3.cmml" xref="S2.SS1.p1.1.m1.1.1.3"><csymbol cd="ambiguous" id="S2.SS1.p1.1.m1.1.1.3.1.cmml" xref="S2.SS1.p1.1.m1.1.1.3">superscript</csymbol><ci id="S2.SS1.p1.1.m1.1.1.3.2.cmml" xref="S2.SS1.p1.1.m1.1.1.3.2">ℝ</ci><apply id="S2.SS1.p1.1.m1.1.1.3.3.cmml" xref="S2.SS1.p1.1.m1.1.1.3.3"><times id="S2.SS1.p1.1.m1.1.1.3.3.1.cmml" xref="S2.SS1.p1.1.m1.1.1.3.3.1"></times><cn id="S2.SS1.p1.1.m1.1.1.3.3.2.cmml" type="integer" xref="S2.SS1.p1.1.m1.1.1.3.3.2">3</cn><ci id="S2.SS1.p1.1.m1.1.1.3.3.3.cmml" xref="S2.SS1.p1.1.m1.1.1.3.3.3">𝐻</ci><ci id="S2.SS1.p1.1.m1.1.1.3.3.4.cmml" xref="S2.SS1.p1.1.m1.1.1.3.3.4">𝑊</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.1.m1.1c">x_{t}\in\mathbb{R}^{3\times H\times W}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.1.m1.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 3 × italic_H × italic_W end_POSTSUPERSCRIPT</annotation></semantics></math> of width <math alttext="W" class="ltx_Math" display="inline" id="S2.SS1.p1.2.m2.1"><semantics id="S2.SS1.p1.2.m2.1a"><mi id="S2.SS1.p1.2.m2.1.1" xref="S2.SS1.p1.2.m2.1.1.cmml">W</mi><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.2.m2.1b"><ci id="S2.SS1.p1.2.m2.1.1.cmml" xref="S2.SS1.p1.2.m2.1.1">𝑊</ci></annotation-xml><annotation encoding="application/x-tex" 
id="S2.SS1.p1.2.m2.1c">W</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.2.m2.1d">italic_W</annotation></semantics></math> and height <math alttext="H" class="ltx_Math" display="inline" id="S2.SS1.p1.3.m3.1"><semantics id="S2.SS1.p1.3.m3.1a"><mi id="S2.SS1.p1.3.m3.1.1" xref="S2.SS1.p1.3.m3.1.1.cmml">H</mi><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.3.m3.1b"><ci id="S2.SS1.p1.3.m3.1.1.cmml" xref="S2.SS1.p1.3.m3.1.1">𝐻</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.3.m3.1c">H</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.3.m3.1d">italic_H</annotation></semantics></math>, an optical flow map <math alttext="f_{t}\in\mathbb{R}^{2\times H\times W}" class="ltx_Math" display="inline" id="S2.SS1.p1.4.m4.1"><semantics id="S2.SS1.p1.4.m4.1a"><mrow id="S2.SS1.p1.4.m4.1.1" xref="S2.SS1.p1.4.m4.1.1.cmml"><msub id="S2.SS1.p1.4.m4.1.1.2" xref="S2.SS1.p1.4.m4.1.1.2.cmml"><mi id="S2.SS1.p1.4.m4.1.1.2.2" xref="S2.SS1.p1.4.m4.1.1.2.2.cmml">f</mi><mi id="S2.SS1.p1.4.m4.1.1.2.3" xref="S2.SS1.p1.4.m4.1.1.2.3.cmml">t</mi></msub><mo id="S2.SS1.p1.4.m4.1.1.1" xref="S2.SS1.p1.4.m4.1.1.1.cmml">∈</mo><msup id="S2.SS1.p1.4.m4.1.1.3" xref="S2.SS1.p1.4.m4.1.1.3.cmml"><mi id="S2.SS1.p1.4.m4.1.1.3.2" xref="S2.SS1.p1.4.m4.1.1.3.2.cmml">ℝ</mi><mrow id="S2.SS1.p1.4.m4.1.1.3.3" xref="S2.SS1.p1.4.m4.1.1.3.3.cmml"><mn id="S2.SS1.p1.4.m4.1.1.3.3.2" xref="S2.SS1.p1.4.m4.1.1.3.3.2.cmml">2</mn><mo id="S2.SS1.p1.4.m4.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S2.SS1.p1.4.m4.1.1.3.3.1.cmml">×</mo><mi id="S2.SS1.p1.4.m4.1.1.3.3.3" xref="S2.SS1.p1.4.m4.1.1.3.3.3.cmml">H</mi><mo id="S2.SS1.p1.4.m4.1.1.3.3.1a" lspace="0.222em" rspace="0.222em" xref="S2.SS1.p1.4.m4.1.1.3.3.1.cmml">×</mo><mi id="S2.SS1.p1.4.m4.1.1.3.3.4" xref="S2.SS1.p1.4.m4.1.1.3.3.4.cmml">W</mi></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.4.m4.1b"><apply id="S2.SS1.p1.4.m4.1.1.cmml" xref="S2.SS1.p1.4.m4.1.1"><in 
id="S2.SS1.p1.4.m4.1.1.1.cmml" xref="S2.SS1.p1.4.m4.1.1.1"></in><apply id="S2.SS1.p1.4.m4.1.1.2.cmml" xref="S2.SS1.p1.4.m4.1.1.2"><csymbol cd="ambiguous" id="S2.SS1.p1.4.m4.1.1.2.1.cmml" xref="S2.SS1.p1.4.m4.1.1.2">subscript</csymbol><ci id="S2.SS1.p1.4.m4.1.1.2.2.cmml" xref="S2.SS1.p1.4.m4.1.1.2.2">𝑓</ci><ci id="S2.SS1.p1.4.m4.1.1.2.3.cmml" xref="S2.SS1.p1.4.m4.1.1.2.3">𝑡</ci></apply><apply id="S2.SS1.p1.4.m4.1.1.3.cmml" xref="S2.SS1.p1.4.m4.1.1.3"><csymbol cd="ambiguous" id="S2.SS1.p1.4.m4.1.1.3.1.cmml" xref="S2.SS1.p1.4.m4.1.1.3">superscript</csymbol><ci id="S2.SS1.p1.4.m4.1.1.3.2.cmml" xref="S2.SS1.p1.4.m4.1.1.3.2">ℝ</ci><apply id="S2.SS1.p1.4.m4.1.1.3.3.cmml" xref="S2.SS1.p1.4.m4.1.1.3.3"><times id="S2.SS1.p1.4.m4.1.1.3.3.1.cmml" xref="S2.SS1.p1.4.m4.1.1.3.3.1"></times><cn id="S2.SS1.p1.4.m4.1.1.3.3.2.cmml" type="integer" xref="S2.SS1.p1.4.m4.1.1.3.3.2">2</cn><ci id="S2.SS1.p1.4.m4.1.1.3.3.3.cmml" xref="S2.SS1.p1.4.m4.1.1.3.3.3">𝐻</ci><ci id="S2.SS1.p1.4.m4.1.1.3.3.4.cmml" xref="S2.SS1.p1.4.m4.1.1.3.3.4">𝑊</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.4.m4.1c">f_{t}\in\mathbb{R}^{2\times H\times W}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.4.m4.1d">italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 2 × italic_H × italic_W end_POSTSUPERSCRIPT</annotation></semantics></math> characterizing the motion between <math alttext="x_{t}" class="ltx_Math" display="inline" id="S2.SS1.p1.5.m5.1"><semantics id="S2.SS1.p1.5.m5.1a"><msub id="S2.SS1.p1.5.m5.1.1" xref="S2.SS1.p1.5.m5.1.1.cmml"><mi id="S2.SS1.p1.5.m5.1.1.2" xref="S2.SS1.p1.5.m5.1.1.2.cmml">x</mi><mi id="S2.SS1.p1.5.m5.1.1.3" xref="S2.SS1.p1.5.m5.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.5.m5.1b"><apply id="S2.SS1.p1.5.m5.1.1.cmml" xref="S2.SS1.p1.5.m5.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.5.m5.1.1.1.cmml" 
xref="S2.SS1.p1.5.m5.1.1">subscript</csymbol><ci id="S2.SS1.p1.5.m5.1.1.2.cmml" xref="S2.SS1.p1.5.m5.1.1.2">𝑥</ci><ci id="S2.SS1.p1.5.m5.1.1.3.cmml" xref="S2.SS1.p1.5.m5.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.5.m5.1c">x_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.5.m5.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> and its reference frame <math alttext="\hat{x}_{t-1}\in\mathbb{R}^{3\times H\times W}" class="ltx_Math" display="inline" id="S2.SS1.p1.6.m6.1"><semantics id="S2.SS1.p1.6.m6.1a"><mrow id="S2.SS1.p1.6.m6.1.1" xref="S2.SS1.p1.6.m6.1.1.cmml"><msub id="S2.SS1.p1.6.m6.1.1.2" xref="S2.SS1.p1.6.m6.1.1.2.cmml"><mover accent="true" id="S2.SS1.p1.6.m6.1.1.2.2" xref="S2.SS1.p1.6.m6.1.1.2.2.cmml"><mi id="S2.SS1.p1.6.m6.1.1.2.2.2" xref="S2.SS1.p1.6.m6.1.1.2.2.2.cmml">x</mi><mo id="S2.SS1.p1.6.m6.1.1.2.2.1" xref="S2.SS1.p1.6.m6.1.1.2.2.1.cmml">^</mo></mover><mrow id="S2.SS1.p1.6.m6.1.1.2.3" xref="S2.SS1.p1.6.m6.1.1.2.3.cmml"><mi id="S2.SS1.p1.6.m6.1.1.2.3.2" xref="S2.SS1.p1.6.m6.1.1.2.3.2.cmml">t</mi><mo id="S2.SS1.p1.6.m6.1.1.2.3.1" xref="S2.SS1.p1.6.m6.1.1.2.3.1.cmml">−</mo><mn id="S2.SS1.p1.6.m6.1.1.2.3.3" xref="S2.SS1.p1.6.m6.1.1.2.3.3.cmml">1</mn></mrow></msub><mo id="S2.SS1.p1.6.m6.1.1.1" xref="S2.SS1.p1.6.m6.1.1.1.cmml">∈</mo><msup id="S2.SS1.p1.6.m6.1.1.3" xref="S2.SS1.p1.6.m6.1.1.3.cmml"><mi id="S2.SS1.p1.6.m6.1.1.3.2" xref="S2.SS1.p1.6.m6.1.1.3.2.cmml">ℝ</mi><mrow id="S2.SS1.p1.6.m6.1.1.3.3" xref="S2.SS1.p1.6.m6.1.1.3.3.cmml"><mn id="S2.SS1.p1.6.m6.1.1.3.3.2" xref="S2.SS1.p1.6.m6.1.1.3.3.2.cmml">3</mn><mo id="S2.SS1.p1.6.m6.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S2.SS1.p1.6.m6.1.1.3.3.1.cmml">×</mo><mi id="S2.SS1.p1.6.m6.1.1.3.3.3" xref="S2.SS1.p1.6.m6.1.1.3.3.3.cmml">H</mi><mo id="S2.SS1.p1.6.m6.1.1.3.3.1a" lspace="0.222em" rspace="0.222em" xref="S2.SS1.p1.6.m6.1.1.3.3.1.cmml">×</mo><mi id="S2.SS1.p1.6.m6.1.1.3.3.4" 
xref="S2.SS1.p1.6.m6.1.1.3.3.4.cmml">W</mi></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.6.m6.1b"><apply id="S2.SS1.p1.6.m6.1.1.cmml" xref="S2.SS1.p1.6.m6.1.1"><in id="S2.SS1.p1.6.m6.1.1.1.cmml" xref="S2.SS1.p1.6.m6.1.1.1"></in><apply id="S2.SS1.p1.6.m6.1.1.2.cmml" xref="S2.SS1.p1.6.m6.1.1.2"><csymbol cd="ambiguous" id="S2.SS1.p1.6.m6.1.1.2.1.cmml" xref="S2.SS1.p1.6.m6.1.1.2">subscript</csymbol><apply id="S2.SS1.p1.6.m6.1.1.2.2.cmml" xref="S2.SS1.p1.6.m6.1.1.2.2"><ci id="S2.SS1.p1.6.m6.1.1.2.2.1.cmml" xref="S2.SS1.p1.6.m6.1.1.2.2.1">^</ci><ci id="S2.SS1.p1.6.m6.1.1.2.2.2.cmml" xref="S2.SS1.p1.6.m6.1.1.2.2.2">𝑥</ci></apply><apply id="S2.SS1.p1.6.m6.1.1.2.3.cmml" xref="S2.SS1.p1.6.m6.1.1.2.3"><minus id="S2.SS1.p1.6.m6.1.1.2.3.1.cmml" xref="S2.SS1.p1.6.m6.1.1.2.3.1"></minus><ci id="S2.SS1.p1.6.m6.1.1.2.3.2.cmml" xref="S2.SS1.p1.6.m6.1.1.2.3.2">𝑡</ci><cn id="S2.SS1.p1.6.m6.1.1.2.3.3.cmml" type="integer" xref="S2.SS1.p1.6.m6.1.1.2.3.3">1</cn></apply></apply><apply id="S2.SS1.p1.6.m6.1.1.3.cmml" xref="S2.SS1.p1.6.m6.1.1.3"><csymbol cd="ambiguous" id="S2.SS1.p1.6.m6.1.1.3.1.cmml" xref="S2.SS1.p1.6.m6.1.1.3">superscript</csymbol><ci id="S2.SS1.p1.6.m6.1.1.3.2.cmml" xref="S2.SS1.p1.6.m6.1.1.3.2">ℝ</ci><apply id="S2.SS1.p1.6.m6.1.1.3.3.cmml" xref="S2.SS1.p1.6.m6.1.1.3.3"><times id="S2.SS1.p1.6.m6.1.1.3.3.1.cmml" xref="S2.SS1.p1.6.m6.1.1.3.3.1"></times><cn id="S2.SS1.p1.6.m6.1.1.3.3.2.cmml" type="integer" xref="S2.SS1.p1.6.m6.1.1.3.3.2">3</cn><ci id="S2.SS1.p1.6.m6.1.1.3.3.3.cmml" xref="S2.SS1.p1.6.m6.1.1.3.3.3">𝐻</ci><ci id="S2.SS1.p1.6.m6.1.1.3.3.4.cmml" xref="S2.SS1.p1.6.m6.1.1.3.3.4">𝑊</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.6.m6.1c">\hat{x}_{t-1}\in\mathbb{R}^{3\times H\times W}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.6.m6.1d">over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 3 × 
italic_H × italic_W end_POSTSUPERSCRIPT</annotation></semantics></math> is estimated by a motion estimation network and compressed by the motion codec <math alttext="\{F^{enc},F^{dec}\}" class="ltx_Math" display="inline" id="S2.SS1.p1.7.m7.2"><semantics id="S2.SS1.p1.7.m7.2a"><mrow id="S2.SS1.p1.7.m7.2.2.2" xref="S2.SS1.p1.7.m7.2.2.3.cmml"><mo id="S2.SS1.p1.7.m7.2.2.2.3" stretchy="false" xref="S2.SS1.p1.7.m7.2.2.3.cmml">{</mo><msup id="S2.SS1.p1.7.m7.1.1.1.1" xref="S2.SS1.p1.7.m7.1.1.1.1.cmml"><mi id="S2.SS1.p1.7.m7.1.1.1.1.2" xref="S2.SS1.p1.7.m7.1.1.1.1.2.cmml">F</mi><mrow id="S2.SS1.p1.7.m7.1.1.1.1.3" xref="S2.SS1.p1.7.m7.1.1.1.1.3.cmml"><mi id="S2.SS1.p1.7.m7.1.1.1.1.3.2" xref="S2.SS1.p1.7.m7.1.1.1.1.3.2.cmml">e</mi><mo id="S2.SS1.p1.7.m7.1.1.1.1.3.1" xref="S2.SS1.p1.7.m7.1.1.1.1.3.1.cmml">⁢</mo><mi id="S2.SS1.p1.7.m7.1.1.1.1.3.3" xref="S2.SS1.p1.7.m7.1.1.1.1.3.3.cmml">n</mi><mo id="S2.SS1.p1.7.m7.1.1.1.1.3.1a" xref="S2.SS1.p1.7.m7.1.1.1.1.3.1.cmml">⁢</mo><mi id="S2.SS1.p1.7.m7.1.1.1.1.3.4" xref="S2.SS1.p1.7.m7.1.1.1.1.3.4.cmml">c</mi></mrow></msup><mo id="S2.SS1.p1.7.m7.2.2.2.4" xref="S2.SS1.p1.7.m7.2.2.3.cmml">,</mo><msup id="S2.SS1.p1.7.m7.2.2.2.2" xref="S2.SS1.p1.7.m7.2.2.2.2.cmml"><mi id="S2.SS1.p1.7.m7.2.2.2.2.2" xref="S2.SS1.p1.7.m7.2.2.2.2.2.cmml">F</mi><mrow id="S2.SS1.p1.7.m7.2.2.2.2.3" xref="S2.SS1.p1.7.m7.2.2.2.2.3.cmml"><mi id="S2.SS1.p1.7.m7.2.2.2.2.3.2" xref="S2.SS1.p1.7.m7.2.2.2.2.3.2.cmml">d</mi><mo id="S2.SS1.p1.7.m7.2.2.2.2.3.1" xref="S2.SS1.p1.7.m7.2.2.2.2.3.1.cmml">⁢</mo><mi id="S2.SS1.p1.7.m7.2.2.2.2.3.3" xref="S2.SS1.p1.7.m7.2.2.2.2.3.3.cmml">e</mi><mo id="S2.SS1.p1.7.m7.2.2.2.2.3.1a" xref="S2.SS1.p1.7.m7.2.2.2.2.3.1.cmml">⁢</mo><mi id="S2.SS1.p1.7.m7.2.2.2.2.3.4" xref="S2.SS1.p1.7.m7.2.2.2.2.3.4.cmml">c</mi></mrow></msup><mo id="S2.SS1.p1.7.m7.2.2.2.5" stretchy="false" xref="S2.SS1.p1.7.m7.2.2.3.cmml">}</mo></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.7.m7.2b"><set id="S2.SS1.p1.7.m7.2.2.3.cmml" 
xref="S2.SS1.p1.7.m7.2.2.2"><apply id="S2.SS1.p1.7.m7.1.1.1.1.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.7.m7.1.1.1.1.1.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1">superscript</csymbol><ci id="S2.SS1.p1.7.m7.1.1.1.1.2.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1.2">𝐹</ci><apply id="S2.SS1.p1.7.m7.1.1.1.1.3.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1.3"><times id="S2.SS1.p1.7.m7.1.1.1.1.3.1.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1.3.1"></times><ci id="S2.SS1.p1.7.m7.1.1.1.1.3.2.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1.3.2">𝑒</ci><ci id="S2.SS1.p1.7.m7.1.1.1.1.3.3.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1.3.3">𝑛</ci><ci id="S2.SS1.p1.7.m7.1.1.1.1.3.4.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1.3.4">𝑐</ci></apply></apply><apply id="S2.SS1.p1.7.m7.2.2.2.2.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS1.p1.7.m7.2.2.2.2.1.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2">superscript</csymbol><ci id="S2.SS1.p1.7.m7.2.2.2.2.2.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2.2">𝐹</ci><apply id="S2.SS1.p1.7.m7.2.2.2.2.3.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2.3"><times id="S2.SS1.p1.7.m7.2.2.2.2.3.1.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2.3.1"></times><ci id="S2.SS1.p1.7.m7.2.2.2.2.3.2.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2.3.2">𝑑</ci><ci id="S2.SS1.p1.7.m7.2.2.2.2.3.3.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2.3.3">𝑒</ci><ci id="S2.SS1.p1.7.m7.2.2.2.2.3.4.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2.3.4">𝑐</ci></apply></apply></set></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.7.m7.2c">\{F^{enc},F^{dec}\}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.7.m7.2d">{ italic_F start_POSTSUPERSCRIPT italic_e italic_n italic_c end_POSTSUPERSCRIPT , italic_F start_POSTSUPERSCRIPT italic_d italic_e italic_c end_POSTSUPERSCRIPT }</annotation></semantics></math>. 
The decoded optical flow map <math alttext="\hat{f}_{t}" class="ltx_Math" display="inline" id="S2.SS1.p1.8.m8.1"><semantics id="S2.SS1.p1.8.m8.1a"><msub id="S2.SS1.p1.8.m8.1.1" xref="S2.SS1.p1.8.m8.1.1.cmml"><mover accent="true" id="S2.SS1.p1.8.m8.1.1.2" xref="S2.SS1.p1.8.m8.1.1.2.cmml"><mi id="S2.SS1.p1.8.m8.1.1.2.2" xref="S2.SS1.p1.8.m8.1.1.2.2.cmml">f</mi><mo id="S2.SS1.p1.8.m8.1.1.2.1" xref="S2.SS1.p1.8.m8.1.1.2.1.cmml">^</mo></mover><mi id="S2.SS1.p1.8.m8.1.1.3" xref="S2.SS1.p1.8.m8.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.8.m8.1b"><apply id="S2.SS1.p1.8.m8.1.1.cmml" xref="S2.SS1.p1.8.m8.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.8.m8.1.1.1.cmml" xref="S2.SS1.p1.8.m8.1.1">subscript</csymbol><apply id="S2.SS1.p1.8.m8.1.1.2.cmml" xref="S2.SS1.p1.8.m8.1.1.2"><ci id="S2.SS1.p1.8.m8.1.1.2.1.cmml" xref="S2.SS1.p1.8.m8.1.1.2.1">^</ci><ci id="S2.SS1.p1.8.m8.1.1.2.2.cmml" xref="S2.SS1.p1.8.m8.1.1.2.2">𝑓</ci></apply><ci id="S2.SS1.p1.8.m8.1.1.3.cmml" xref="S2.SS1.p1.8.m8.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.8.m8.1c">\hat{f}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.8.m8.1d">over^ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> is then utilized to warp the features <math alttext="\tilde{x}_{t-1}\in\mathbb{R}^{C\times H\times W}" class="ltx_Math" display="inline" id="S2.SS1.p1.9.m9.1"><semantics id="S2.SS1.p1.9.m9.1a"><mrow id="S2.SS1.p1.9.m9.1.1" xref="S2.SS1.p1.9.m9.1.1.cmml"><msub id="S2.SS1.p1.9.m9.1.1.2" xref="S2.SS1.p1.9.m9.1.1.2.cmml"><mover accent="true" id="S2.SS1.p1.9.m9.1.1.2.2" xref="S2.SS1.p1.9.m9.1.1.2.2.cmml"><mi id="S2.SS1.p1.9.m9.1.1.2.2.2" xref="S2.SS1.p1.9.m9.1.1.2.2.2.cmml">x</mi><mo id="S2.SS1.p1.9.m9.1.1.2.2.1" xref="S2.SS1.p1.9.m9.1.1.2.2.1.cmml">~</mo></mover><mrow id="S2.SS1.p1.9.m9.1.1.2.3" xref="S2.SS1.p1.9.m9.1.1.2.3.cmml"><mi id="S2.SS1.p1.9.m9.1.1.2.3.2" 
xref="S2.SS1.p1.9.m9.1.1.2.3.2.cmml">t</mi><mo id="S2.SS1.p1.9.m9.1.1.2.3.1" xref="S2.SS1.p1.9.m9.1.1.2.3.1.cmml">−</mo><mn id="S2.SS1.p1.9.m9.1.1.2.3.3" xref="S2.SS1.p1.9.m9.1.1.2.3.3.cmml">1</mn></mrow></msub><mo id="S2.SS1.p1.9.m9.1.1.1" xref="S2.SS1.p1.9.m9.1.1.1.cmml">∈</mo><msup id="S2.SS1.p1.9.m9.1.1.3" xref="S2.SS1.p1.9.m9.1.1.3.cmml"><mi id="S2.SS1.p1.9.m9.1.1.3.2" xref="S2.SS1.p1.9.m9.1.1.3.2.cmml">ℝ</mi><mrow id="S2.SS1.p1.9.m9.1.1.3.3" xref="S2.SS1.p1.9.m9.1.1.3.3.cmml"><mi id="S2.SS1.p1.9.m9.1.1.3.3.2" xref="S2.SS1.p1.9.m9.1.1.3.3.2.cmml">C</mi><mo id="S2.SS1.p1.9.m9.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S2.SS1.p1.9.m9.1.1.3.3.1.cmml">×</mo><mi id="S2.SS1.p1.9.m9.1.1.3.3.3" xref="S2.SS1.p1.9.m9.1.1.3.3.3.cmml">H</mi><mo id="S2.SS1.p1.9.m9.1.1.3.3.1a" lspace="0.222em" rspace="0.222em" xref="S2.SS1.p1.9.m9.1.1.3.3.1.cmml">×</mo><mi id="S2.SS1.p1.9.m9.1.1.3.3.4" xref="S2.SS1.p1.9.m9.1.1.3.3.4.cmml">W</mi></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.9.m9.1b"><apply id="S2.SS1.p1.9.m9.1.1.cmml" xref="S2.SS1.p1.9.m9.1.1"><in id="S2.SS1.p1.9.m9.1.1.1.cmml" xref="S2.SS1.p1.9.m9.1.1.1"></in><apply id="S2.SS1.p1.9.m9.1.1.2.cmml" xref="S2.SS1.p1.9.m9.1.1.2"><csymbol cd="ambiguous" id="S2.SS1.p1.9.m9.1.1.2.1.cmml" xref="S2.SS1.p1.9.m9.1.1.2">subscript</csymbol><apply id="S2.SS1.p1.9.m9.1.1.2.2.cmml" xref="S2.SS1.p1.9.m9.1.1.2.2"><ci id="S2.SS1.p1.9.m9.1.1.2.2.1.cmml" xref="S2.SS1.p1.9.m9.1.1.2.2.1">~</ci><ci id="S2.SS1.p1.9.m9.1.1.2.2.2.cmml" xref="S2.SS1.p1.9.m9.1.1.2.2.2">𝑥</ci></apply><apply id="S2.SS1.p1.9.m9.1.1.2.3.cmml" xref="S2.SS1.p1.9.m9.1.1.2.3"><minus id="S2.SS1.p1.9.m9.1.1.2.3.1.cmml" xref="S2.SS1.p1.9.m9.1.1.2.3.1"></minus><ci id="S2.SS1.p1.9.m9.1.1.2.3.2.cmml" xref="S2.SS1.p1.9.m9.1.1.2.3.2">𝑡</ci><cn id="S2.SS1.p1.9.m9.1.1.2.3.3.cmml" type="integer" xref="S2.SS1.p1.9.m9.1.1.2.3.3">1</cn></apply></apply><apply id="S2.SS1.p1.9.m9.1.1.3.cmml" xref="S2.SS1.p1.9.m9.1.1.3"><csymbol cd="ambiguous" 
id="S2.SS1.p1.9.m9.1.1.3.1.cmml" xref="S2.SS1.p1.9.m9.1.1.3">superscript</csymbol><ci id="S2.SS1.p1.9.m9.1.1.3.2.cmml" xref="S2.SS1.p1.9.m9.1.1.3.2">ℝ</ci><apply id="S2.SS1.p1.9.m9.1.1.3.3.cmml" xref="S2.SS1.p1.9.m9.1.1.3.3"><times id="S2.SS1.p1.9.m9.1.1.3.3.1.cmml" xref="S2.SS1.p1.9.m9.1.1.3.3.1"></times><ci id="S2.SS1.p1.9.m9.1.1.3.3.2.cmml" xref="S2.SS1.p1.9.m9.1.1.3.3.2">𝐶</ci><ci id="S2.SS1.p1.9.m9.1.1.3.3.3.cmml" xref="S2.SS1.p1.9.m9.1.1.3.3.3">𝐻</ci><ci id="S2.SS1.p1.9.m9.1.1.3.3.4.cmml" xref="S2.SS1.p1.9.m9.1.1.3.3.4">𝑊</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.9.m9.1c">\tilde{x}_{t-1}\in\mathbb{R}^{C\times H\times W}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.9.m9.1d">over~ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_C × italic_H × italic_W end_POSTSUPERSCRIPT</annotation></semantics></math> of <math alttext="\hat{x}_{t-1}" class="ltx_Math" display="inline" id="S2.SS1.p1.10.m10.1"><semantics id="S2.SS1.p1.10.m10.1a"><msub id="S2.SS1.p1.10.m10.1.1" xref="S2.SS1.p1.10.m10.1.1.cmml"><mover accent="true" id="S2.SS1.p1.10.m10.1.1.2" xref="S2.SS1.p1.10.m10.1.1.2.cmml"><mi id="S2.SS1.p1.10.m10.1.1.2.2" xref="S2.SS1.p1.10.m10.1.1.2.2.cmml">x</mi><mo id="S2.SS1.p1.10.m10.1.1.2.1" xref="S2.SS1.p1.10.m10.1.1.2.1.cmml">^</mo></mover><mrow id="S2.SS1.p1.10.m10.1.1.3" xref="S2.SS1.p1.10.m10.1.1.3.cmml"><mi id="S2.SS1.p1.10.m10.1.1.3.2" xref="S2.SS1.p1.10.m10.1.1.3.2.cmml">t</mi><mo id="S2.SS1.p1.10.m10.1.1.3.1" xref="S2.SS1.p1.10.m10.1.1.3.1.cmml">−</mo><mn id="S2.SS1.p1.10.m10.1.1.3.3" xref="S2.SS1.p1.10.m10.1.1.3.3.cmml">1</mn></mrow></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.10.m10.1b"><apply id="S2.SS1.p1.10.m10.1.1.cmml" xref="S2.SS1.p1.10.m10.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.10.m10.1.1.1.cmml" xref="S2.SS1.p1.10.m10.1.1">subscript</csymbol><apply 
id="S2.SS1.p1.10.m10.1.1.2.cmml" xref="S2.SS1.p1.10.m10.1.1.2"><ci id="S2.SS1.p1.10.m10.1.1.2.1.cmml" xref="S2.SS1.p1.10.m10.1.1.2.1">^</ci><ci id="S2.SS1.p1.10.m10.1.1.2.2.cmml" xref="S2.SS1.p1.10.m10.1.1.2.2">𝑥</ci></apply><apply id="S2.SS1.p1.10.m10.1.1.3.cmml" xref="S2.SS1.p1.10.m10.1.1.3"><minus id="S2.SS1.p1.10.m10.1.1.3.1.cmml" xref="S2.SS1.p1.10.m10.1.1.3.1"></minus><ci id="S2.SS1.p1.10.m10.1.1.3.2.cmml" xref="S2.SS1.p1.10.m10.1.1.3.2">𝑡</ci><cn id="S2.SS1.p1.10.m10.1.1.3.3.cmml" type="integer" xref="S2.SS1.p1.10.m10.1.1.3.3">1</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.10.m10.1c">\hat{x}_{t-1}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.10.m10.1d">over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT</annotation></semantics></math>, where <math alttext="C" class="ltx_Math" display="inline" id="S2.SS1.p1.11.m11.1"><semantics id="S2.SS1.p1.11.m11.1a"><mi id="S2.SS1.p1.11.m11.1.1" xref="S2.SS1.p1.11.m11.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.11.m11.1b"><ci id="S2.SS1.p1.11.m11.1.1.cmml" xref="S2.SS1.p1.11.m11.1.1">𝐶</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.11.m11.1c">C</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.11.m11.1d">italic_C</annotation></semantics></math> is the channel size, to obtain a temporal predictor <math alttext="x_{c}\in\mathbb{R}^{C\times H\times W}" class="ltx_Math" display="inline" id="S2.SS1.p1.12.m12.1"><semantics id="S2.SS1.p1.12.m12.1a"><mrow id="S2.SS1.p1.12.m12.1.1" xref="S2.SS1.p1.12.m12.1.1.cmml"><msub id="S2.SS1.p1.12.m12.1.1.2" xref="S2.SS1.p1.12.m12.1.1.2.cmml"><mi id="S2.SS1.p1.12.m12.1.1.2.2" xref="S2.SS1.p1.12.m12.1.1.2.2.cmml">x</mi><mi id="S2.SS1.p1.12.m12.1.1.2.3" xref="S2.SS1.p1.12.m12.1.1.2.3.cmml">c</mi></msub><mo id="S2.SS1.p1.12.m12.1.1.1" xref="S2.SS1.p1.12.m12.1.1.1.cmml">∈</mo><msup id="S2.SS1.p1.12.m12.1.1.3" 
xref="S2.SS1.p1.12.m12.1.1.3.cmml"><mi id="S2.SS1.p1.12.m12.1.1.3.2" xref="S2.SS1.p1.12.m12.1.1.3.2.cmml">ℝ</mi><mrow id="S2.SS1.p1.12.m12.1.1.3.3" xref="S2.SS1.p1.12.m12.1.1.3.3.cmml"><mi id="S2.SS1.p1.12.m12.1.1.3.3.2" xref="S2.SS1.p1.12.m12.1.1.3.3.2.cmml">C</mi><mo id="S2.SS1.p1.12.m12.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S2.SS1.p1.12.m12.1.1.3.3.1.cmml">×</mo><mi id="S2.SS1.p1.12.m12.1.1.3.3.3" xref="S2.SS1.p1.12.m12.1.1.3.3.3.cmml">H</mi><mo id="S2.SS1.p1.12.m12.1.1.3.3.1a" lspace="0.222em" rspace="0.222em" xref="S2.SS1.p1.12.m12.1.1.3.3.1.cmml">×</mo><mi id="S2.SS1.p1.12.m12.1.1.3.3.4" xref="S2.SS1.p1.12.m12.1.1.3.3.4.cmml">W</mi></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.12.m12.1b"><apply id="S2.SS1.p1.12.m12.1.1.cmml" xref="S2.SS1.p1.12.m12.1.1"><in id="S2.SS1.p1.12.m12.1.1.1.cmml" xref="S2.SS1.p1.12.m12.1.1.1"></in><apply id="S2.SS1.p1.12.m12.1.1.2.cmml" xref="S2.SS1.p1.12.m12.1.1.2"><csymbol cd="ambiguous" id="S2.SS1.p1.12.m12.1.1.2.1.cmml" xref="S2.SS1.p1.12.m12.1.1.2">subscript</csymbol><ci id="S2.SS1.p1.12.m12.1.1.2.2.cmml" xref="S2.SS1.p1.12.m12.1.1.2.2">𝑥</ci><ci id="S2.SS1.p1.12.m12.1.1.2.3.cmml" xref="S2.SS1.p1.12.m12.1.1.2.3">𝑐</ci></apply><apply id="S2.SS1.p1.12.m12.1.1.3.cmml" xref="S2.SS1.p1.12.m12.1.1.3"><csymbol cd="ambiguous" id="S2.SS1.p1.12.m12.1.1.3.1.cmml" xref="S2.SS1.p1.12.m12.1.1.3">superscript</csymbol><ci id="S2.SS1.p1.12.m12.1.1.3.2.cmml" xref="S2.SS1.p1.12.m12.1.1.3.2">ℝ</ci><apply id="S2.SS1.p1.12.m12.1.1.3.3.cmml" xref="S2.SS1.p1.12.m12.1.1.3.3"><times id="S2.SS1.p1.12.m12.1.1.3.3.1.cmml" xref="S2.SS1.p1.12.m12.1.1.3.3.1"></times><ci id="S2.SS1.p1.12.m12.1.1.3.3.2.cmml" xref="S2.SS1.p1.12.m12.1.1.3.3.2">𝐶</ci><ci id="S2.SS1.p1.12.m12.1.1.3.3.3.cmml" xref="S2.SS1.p1.12.m12.1.1.3.3.3">𝐻</ci><ci id="S2.SS1.p1.12.m12.1.1.3.3.4.cmml" xref="S2.SS1.p1.12.m12.1.1.3.3.4">𝑊</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" 
id="S2.SS1.p1.12.m12.1c">x_{c}\in\mathbb{R}^{C\times H\times W}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.12.m12.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_C × italic_H × italic_W end_POSTSUPERSCRIPT</annotation></semantics></math>. How <math alttext="x_{c}" class="ltx_Math" display="inline" id="S2.SS1.p1.13.m13.1"><semantics id="S2.SS1.p1.13.m13.1a"><msub id="S2.SS1.p1.13.m13.1.1" xref="S2.SS1.p1.13.m13.1.1.cmml"><mi id="S2.SS1.p1.13.m13.1.1.2" xref="S2.SS1.p1.13.m13.1.1.2.cmml">x</mi><mi id="S2.SS1.p1.13.m13.1.1.3" xref="S2.SS1.p1.13.m13.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.13.m13.1b"><apply id="S2.SS1.p1.13.m13.1.1.cmml" xref="S2.SS1.p1.13.m13.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.13.m13.1.1.1.cmml" xref="S2.SS1.p1.13.m13.1.1">subscript</csymbol><ci id="S2.SS1.p1.13.m13.1.1.2.cmml" xref="S2.SS1.p1.13.m13.1.1.2">𝑥</ci><ci id="S2.SS1.p1.13.m13.1.1.3.cmml" xref="S2.SS1.p1.13.m13.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.13.m13.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.13.m13.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> is used for inter-frame coding crucially affects the trade-off between compression performance and complexity. We address this issue by exploring and comparing three coding techniques, namely conditional coding, conditional residual coding, and masked conditional residual coding, all of which are built on the same coding components used in DCVC. 
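</p>
<p class="ltx_p">The warping step described above, in which the decoded optical flow map is applied to the reference-frame features to produce the temporal predictor x<sub>c</sub>, can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the DCVC implementation: we assume backward bilinear warping where flow channel 0 holds the horizontal displacement and channel 1 the vertical displacement (in pixels), with out-of-frame sampling positions clamped to the border. The function name <code>backward_warp</code> and the toy tensor sizes are hypothetical, and any feature-extraction or refinement network around the warp is omitted.</p>

```python
import numpy as np


def backward_warp(feat: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Warp reference features with a dense flow by backward bilinear sampling.

    feat: (C, H, W) features of the reconstructed reference frame.
    flow: (2, H, W) displacement in pixels; flow[0] is horizontal (x),
          flow[1] is vertical (y). Output pixel (y, x) samples feat at
          (y + flow[1, y, x], x + flow[0, y, x]).
    Sampling positions outside the frame are clamped to the border
    (an assumption of this sketch).
    """
    C, H, W = feat.shape
    if flow.shape != (2, H, W):
        raise ValueError("flow must have shape (2, H, W) matching feat")
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Continuous sampling coordinates, clamped to the valid range.
    sx = np.clip(xs + flow[0], 0, W - 1)
    sy = np.clip(ys + flow[1], 0, H - 1)
    # Integer corners of the bilinear cell and fractional weights.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = sx - x0
    wy = sy - y0
    # Advanced indexing gathers all C channels at once: each term is (C, H, W).
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# Toy example: C=4 feature channels on a 3x5 grid (sizes are arbitrary).
rng = np.random.default_rng(0)
x_tilde_prev = rng.standard_normal((4, 3, 5))  # stands in for the reference features
f_hat = np.zeros((2, 3, 5))
f_hat[0] = 1.0                                 # uniform 1-pixel horizontal motion
x_c = backward_warp(x_tilde_prev, f_hat)
print(x_c.shape)  # (4, 3, 5)
```

<p class="ltx_p">Under these assumptions, a zero flow returns the reference features unchanged, and a uniform one-pixel horizontal flow shifts them one column to the left while replicating the last column at the right border. Bilinear sampling is a common choice in learned codecs because the warped output is differentiable (almost everywhere) with respect to both the features and the flow, which end-to-end training relies on.</p>
<p class="ltx_p">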
Section <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S2.SS2" title="II-B Inter-frame Coding ‣ II Proposed Method ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">II-B</span></span></a> details how <math alttext="x_{c}" class="ltx_Math" display="inline" id="S2.SS1.p1.14.m14.1"><semantics id="S2.SS1.p1.14.m14.1a"><msub id="S2.SS1.p1.14.m14.1.1" xref="S2.SS1.p1.14.m14.1.1.cmml"><mi id="S2.SS1.p1.14.m14.1.1.2" xref="S2.SS1.p1.14.m14.1.1.2.cmml">x</mi><mi id="S2.SS1.p1.14.m14.1.1.3" xref="S2.SS1.p1.14.m14.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.14.m14.1b"><apply id="S2.SS1.p1.14.m14.1.1.cmml" xref="S2.SS1.p1.14.m14.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.14.m14.1.1.1.cmml" xref="S2.SS1.p1.14.m14.1.1">subscript</csymbol><ci id="S2.SS1.p1.14.m14.1.1.2.cmml" xref="S2.SS1.p1.14.m14.1.1.2">𝑥</ci><ci id="S2.SS1.p1.14.m14.1.1.3.cmml" xref="S2.SS1.p1.14.m14.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.14.m14.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.14.m14.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> is employed in each of these coding techniques.</p> </div> </section> <section class="ltx_subsection" id="S2.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S2.SS2.4.1.1">II-B</span> </span><span class="ltx_text ltx_font_italic" id="S2.SS2.5.2">Inter-frame Coding</span> </h3> <section class="ltx_subsubsection" id="S2.SS2.SSS1"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection"><span class="ltx_text" id="S2.SS2.SSS1.4.1.1">II-B</span>1 </span>Conditional Coding</h4> <div class="ltx_para" id="S2.SS2.SSS1.p1"> <p class="ltx_p" id="S2.SS2.SSS1.p1.13">DCVC <cite class="ltx_cite ltx_citemacro_cite">[<a 
class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib1" title="">1</a>]</cite> adopts conditional coding for inter-frame coding. As illustrated in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S1.F2" title="Figure 2 ‣ I Introduction ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">2</span></a> (a), the input frame <math alttext="x_{t}" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.1.m1.1"><semantics id="S2.SS2.SSS1.p1.1.m1.1a"><msub id="S2.SS2.SSS1.p1.1.m1.1.1" xref="S2.SS2.SSS1.p1.1.m1.1.1.cmml"><mi id="S2.SS2.SSS1.p1.1.m1.1.1.2" xref="S2.SS2.SSS1.p1.1.m1.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS1.p1.1.m1.1.1.3" xref="S2.SS2.SSS1.p1.1.m1.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.1.m1.1b"><apply id="S2.SS2.SSS1.p1.1.m1.1.1.cmml" xref="S2.SS2.SSS1.p1.1.m1.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS1.p1.1.m1.1.1.1.cmml" xref="S2.SS2.SSS1.p1.1.m1.1.1">subscript</csymbol><ci id="S2.SS2.SSS1.p1.1.m1.1.1.2.cmml" xref="S2.SS2.SSS1.p1.1.m1.1.1.2">𝑥</ci><ci id="S2.SS2.SSS1.p1.1.m1.1.1.3.cmml" xref="S2.SS2.SSS1.p1.1.m1.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.1.m1.1c">x_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.1.m1.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> is coded conditionally based on the temporal predictor <math alttext="x_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.2.m2.1"><semantics id="S2.SS2.SSS1.p1.2.m2.1a"><msub id="S2.SS2.SSS1.p1.2.m2.1.1" xref="S2.SS2.SSS1.p1.2.m2.1.1.cmml"><mi id="S2.SS2.SSS1.p1.2.m2.1.1.2" xref="S2.SS2.SSS1.p1.2.m2.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS1.p1.2.m2.1.1.3" xref="S2.SS2.SSS1.p1.2.m2.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.2.m2.1b"><apply id="S2.SS2.SSS1.p1.2.m2.1.1.cmml" 
xref="S2.SS2.SSS1.p1.2.m2.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS1.p1.2.m2.1.1.1.cmml" xref="S2.SS2.SSS1.p1.2.m2.1.1">subscript</csymbol><ci id="S2.SS2.SSS1.p1.2.m2.1.1.2.cmml" xref="S2.SS2.SSS1.p1.2.m2.1.1.2">𝑥</ci><ci id="S2.SS2.SSS1.p1.2.m2.1.1.3.cmml" xref="S2.SS2.SSS1.p1.2.m2.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.2.m2.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.2.m2.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math>. To mitigate the information bottleneck from using <math alttext="x_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.3.m3.1"><semantics id="S2.SS2.SSS1.p1.3.m3.1a"><msub id="S2.SS2.SSS1.p1.3.m3.1.1" xref="S2.SS2.SSS1.p1.3.m3.1.1.cmml"><mi id="S2.SS2.SSS1.p1.3.m3.1.1.2" xref="S2.SS2.SSS1.p1.3.m3.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS1.p1.3.m3.1.1.3" xref="S2.SS2.SSS1.p1.3.m3.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.3.m3.1b"><apply id="S2.SS2.SSS1.p1.3.m3.1.1.cmml" xref="S2.SS2.SSS1.p1.3.m3.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS1.p1.3.m3.1.1.1.cmml" xref="S2.SS2.SSS1.p1.3.m3.1.1">subscript</csymbol><ci id="S2.SS2.SSS1.p1.3.m3.1.1.2.cmml" xref="S2.SS2.SSS1.p1.3.m3.1.1.2">𝑥</ci><ci id="S2.SS2.SSS1.p1.3.m3.1.1.3.cmml" xref="S2.SS2.SSS1.p1.3.m3.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.3.m3.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.3.m3.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> as the condition signal, DCVC employs a two-pronged approach: (1) retaining <math alttext="x_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.4.m4.1"><semantics id="S2.SS2.SSS1.p1.4.m4.1a"><msub id="S2.SS2.SSS1.p1.4.m4.1.1" xref="S2.SS2.SSS1.p1.4.m4.1.1.cmml"><mi id="S2.SS2.SSS1.p1.4.m4.1.1.2" 
xref="S2.SS2.SSS1.p1.4.m4.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS1.p1.4.m4.1.1.3" xref="S2.SS2.SSS1.p1.4.m4.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.4.m4.1b"><apply id="S2.SS2.SSS1.p1.4.m4.1.1.cmml" xref="S2.SS2.SSS1.p1.4.m4.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS1.p1.4.m4.1.1.1.cmml" xref="S2.SS2.SSS1.p1.4.m4.1.1">subscript</csymbol><ci id="S2.SS2.SSS1.p1.4.m4.1.1.2.cmml" xref="S2.SS2.SSS1.p1.4.m4.1.1.2">𝑥</ci><ci id="S2.SS2.SSS1.p1.4.m4.1.1.3.cmml" xref="S2.SS2.SSS1.p1.4.m4.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.4.m4.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.4.m4.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> at full resolution and refining it as <math alttext="\dot{x}_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.5.m5.1"><semantics id="S2.SS2.SSS1.p1.5.m5.1a"><msub id="S2.SS2.SSS1.p1.5.m5.1.1" xref="S2.SS2.SSS1.p1.5.m5.1.1.cmml"><mover accent="true" id="S2.SS2.SSS1.p1.5.m5.1.1.2" xref="S2.SS2.SSS1.p1.5.m5.1.1.2.cmml"><mi id="S2.SS2.SSS1.p1.5.m5.1.1.2.2" xref="S2.SS2.SSS1.p1.5.m5.1.1.2.2.cmml">x</mi><mo id="S2.SS2.SSS1.p1.5.m5.1.1.2.1" xref="S2.SS2.SSS1.p1.5.m5.1.1.2.1.cmml">˙</mo></mover><mi id="S2.SS2.SSS1.p1.5.m5.1.1.3" xref="S2.SS2.SSS1.p1.5.m5.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.5.m5.1b"><apply id="S2.SS2.SSS1.p1.5.m5.1.1.cmml" xref="S2.SS2.SSS1.p1.5.m5.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS1.p1.5.m5.1.1.1.cmml" xref="S2.SS2.SSS1.p1.5.m5.1.1">subscript</csymbol><apply id="S2.SS2.SSS1.p1.5.m5.1.1.2.cmml" xref="S2.SS2.SSS1.p1.5.m5.1.1.2"><ci id="S2.SS2.SSS1.p1.5.m5.1.1.2.1.cmml" xref="S2.SS2.SSS1.p1.5.m5.1.1.2.1">˙</ci><ci id="S2.SS2.SSS1.p1.5.m5.1.1.2.2.cmml" xref="S2.SS2.SSS1.p1.5.m5.1.1.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS1.p1.5.m5.1.1.3.cmml" 
xref="S2.SS2.SSS1.p1.5.m5.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.5.m5.1c">\dot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.5.m5.1d">over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math>, and (2) choosing a large channel size <math alttext="C" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.6.m6.1"><semantics id="S2.SS2.SSS1.p1.6.m6.1a"><mi id="S2.SS2.SSS1.p1.6.m6.1.1" xref="S2.SS2.SSS1.p1.6.m6.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.6.m6.1b"><ci id="S2.SS2.SSS1.p1.6.m6.1.1.cmml" xref="S2.SS2.SSS1.p1.6.m6.1.1">𝐶</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.6.m6.1c">C</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.6.m6.1d">italic_C</annotation></semantics></math> (e.g. 64) for <math alttext="x_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.7.m7.1"><semantics id="S2.SS2.SSS1.p1.7.m7.1a"><msub id="S2.SS2.SSS1.p1.7.m7.1.1" xref="S2.SS2.SSS1.p1.7.m7.1.1.cmml"><mi id="S2.SS2.SSS1.p1.7.m7.1.1.2" xref="S2.SS2.SSS1.p1.7.m7.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS1.p1.7.m7.1.1.3" xref="S2.SS2.SSS1.p1.7.m7.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.7.m7.1b"><apply id="S2.SS2.SSS1.p1.7.m7.1.1.cmml" xref="S2.SS2.SSS1.p1.7.m7.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS1.p1.7.m7.1.1.1.cmml" xref="S2.SS2.SSS1.p1.7.m7.1.1">subscript</csymbol><ci id="S2.SS2.SSS1.p1.7.m7.1.1.2.cmml" xref="S2.SS2.SSS1.p1.7.m7.1.1.2">𝑥</ci><ci id="S2.SS2.SSS1.p1.7.m7.1.1.3.cmml" xref="S2.SS2.SSS1.p1.7.m7.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.7.m7.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.7.m7.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math>. 
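The cost of these two choices can be made concrete with a back-of-the-envelope calculation. The sketch below is a minimal illustration, not DCVC code: it assumes float32 storage and a 1080p frame, and the helpers feature_map_bytes and conditional_encoder_input are hypothetical names introduced only for this example. It compares the storage of a 3-channel frame with that of a 64-channel full-resolution predictor, and checks the channel-axis concatenation of x_t with the refined predictor described for Fig. 2 (a).

```python
import numpy as np

BYTES_PER_VALUE = 4  # assumption: features are stored as float32


def feature_map_bytes(channels: int, height: int, width: int,
                      bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Bytes needed to store one C x H x W feature map."""
    return channels * height * width * bytes_per_value


def conditional_encoder_input(x_t: np.ndarray, x_c_dot: np.ndarray) -> np.ndarray:
    """Hypothetical helper: stack the input frame (3 x H x W) and the refined
    temporal predictor (C x H x W) along the channel axis."""
    if x_t.shape[1:] != x_c_dot.shape[1:]:
        raise ValueError("x_t and x_c_dot must share the same spatial size")
    return np.concatenate([x_t, x_c_dot], axis=0)


if __name__ == "__main__":
    H, W = 1080, 1920  # assumption: a 1080p frame
    frame = feature_map_bytes(3, H, W)
    predictor = feature_map_bytes(64, H, W)
    print(f"3-channel frame:      {frame / 2**20:.1f} MiB")
    print(f"64-channel predictor: {predictor / 2**20:.1f} MiB "
          f"({predictor / frame:.1f}x the frame)")

    # Shape check on a small toy frame instead of a full 1080p tensor.
    x_t = np.zeros((3, 8, 8), dtype=np.float32)
    x_c_dot = np.zeros((64, 8, 8), dtype=np.float32)
    print(conditional_encoder_input(x_t, x_c_dot).shape)  # (67, 8, 8)
```

Under these assumptions the 64-channel predictor occupies about 506 MiB versus roughly 24 MiB for the frame itself, i.e. about 21 times more data that must be produced and read per frame; actual figures depend on numeric precision, resolution, and implementation.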
As shown in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S1.F2" title="Figure 2 ‣ I Introduction ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">2</span></a> (a), <math alttext="x_{t}" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.8.m8.1"><semantics id="S2.SS2.SSS1.p1.8.m8.1a"><msub id="S2.SS2.SSS1.p1.8.m8.1.1" xref="S2.SS2.SSS1.p1.8.m8.1.1.cmml"><mi id="S2.SS2.SSS1.p1.8.m8.1.1.2" xref="S2.SS2.SSS1.p1.8.m8.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS1.p1.8.m8.1.1.3" xref="S2.SS2.SSS1.p1.8.m8.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.8.m8.1b"><apply id="S2.SS2.SSS1.p1.8.m8.1.1.cmml" xref="S2.SS2.SSS1.p1.8.m8.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS1.p1.8.m8.1.1.1.cmml" xref="S2.SS2.SSS1.p1.8.m8.1.1">subscript</csymbol><ci id="S2.SS2.SSS1.p1.8.m8.1.1.2.cmml" xref="S2.SS2.SSS1.p1.8.m8.1.1.2">𝑥</ci><ci id="S2.SS2.SSS1.p1.8.m8.1.1.3.cmml" xref="S2.SS2.SSS1.p1.8.m8.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.8.m8.1c">x_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.8.m8.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> and <math alttext="\dot{x}_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.9.m9.1"><semantics id="S2.SS2.SSS1.p1.9.m9.1a"><msub id="S2.SS2.SSS1.p1.9.m9.1.1" xref="S2.SS2.SSS1.p1.9.m9.1.1.cmml"><mover accent="true" id="S2.SS2.SSS1.p1.9.m9.1.1.2" xref="S2.SS2.SSS1.p1.9.m9.1.1.2.cmml"><mi id="S2.SS2.SSS1.p1.9.m9.1.1.2.2" xref="S2.SS2.SSS1.p1.9.m9.1.1.2.2.cmml">x</mi><mo id="S2.SS2.SSS1.p1.9.m9.1.1.2.1" xref="S2.SS2.SSS1.p1.9.m9.1.1.2.1.cmml">˙</mo></mover><mi id="S2.SS2.SSS1.p1.9.m9.1.1.3" xref="S2.SS2.SSS1.p1.9.m9.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.9.m9.1b"><apply id="S2.SS2.SSS1.p1.9.m9.1.1.cmml" xref="S2.SS2.SSS1.p1.9.m9.1.1"><csymbol 
cd="ambiguous" id="S2.SS2.SSS1.p1.9.m9.1.1.1.cmml" xref="S2.SS2.SSS1.p1.9.m9.1.1">subscript</csymbol><apply id="S2.SS2.SSS1.p1.9.m9.1.1.2.cmml" xref="S2.SS2.SSS1.p1.9.m9.1.1.2"><ci id="S2.SS2.SSS1.p1.9.m9.1.1.2.1.cmml" xref="S2.SS2.SSS1.p1.9.m9.1.1.2.1">˙</ci><ci id="S2.SS2.SSS1.p1.9.m9.1.1.2.2.cmml" xref="S2.SS2.SSS1.p1.9.m9.1.1.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS1.p1.9.m9.1.1.3.cmml" xref="S2.SS2.SSS1.p1.9.m9.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.9.m9.1c">\dot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.9.m9.1d">over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> are concatenated as the input to the inter-frame codec, <math alttext="\{G^{enc},G^{dec}\}" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.10.m10.2"><semantics id="S2.SS2.SSS1.p1.10.m10.2a"><mrow id="S2.SS2.SSS1.p1.10.m10.2.2.2" xref="S2.SS2.SSS1.p1.10.m10.2.2.3.cmml"><mo id="S2.SS2.SSS1.p1.10.m10.2.2.2.3" stretchy="false" xref="S2.SS2.SSS1.p1.10.m10.2.2.3.cmml">{</mo><msup id="S2.SS2.SSS1.p1.10.m10.1.1.1.1" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1.cmml"><mi id="S2.SS2.SSS1.p1.10.m10.1.1.1.1.2" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1.2.cmml">G</mi><mrow id="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.cmml"><mi id="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.2" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.2.cmml">e</mi><mo id="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.1" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.1.cmml">⁢</mo><mi id="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.3" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.3.cmml">n</mi><mo id="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.1a" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.1.cmml">⁢</mo><mi id="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.4" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.4.cmml">c</mi></mrow></msup><mo id="S2.SS2.SSS1.p1.10.m10.2.2.2.4" xref="S2.SS2.SSS1.p1.10.m10.2.2.3.cmml">,</mo><msup id="S2.SS2.SSS1.p1.10.m10.2.2.2.2" 
xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2.cmml"><mi id="S2.SS2.SSS1.p1.10.m10.2.2.2.2.2" xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2.2.cmml">G</mi><mrow id="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3" xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.cmml"><mi id="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.2" xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.2.cmml">d</mi><mo id="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.1" xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.1.cmml">⁢</mo><mi id="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.3" xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.3.cmml">e</mi><mo id="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.1a" xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.1.cmml">⁢</mo><mi id="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.4" xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.4.cmml">c</mi></mrow></msup><mo id="S2.SS2.SSS1.p1.10.m10.2.2.2.5" stretchy="false" xref="S2.SS2.SSS1.p1.10.m10.2.2.3.cmml">}</mo></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.10.m10.2b"><set id="S2.SS2.SSS1.p1.10.m10.2.2.3.cmml" xref="S2.SS2.SSS1.p1.10.m10.2.2.2"><apply id="S2.SS2.SSS1.p1.10.m10.1.1.1.1.cmml" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS1.p1.10.m10.1.1.1.1.1.cmml" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1">superscript</csymbol><ci id="S2.SS2.SSS1.p1.10.m10.1.1.1.1.2.cmml" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1.2">𝐺</ci><apply id="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.cmml" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3"><times id="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.1.cmml" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.1"></times><ci id="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.2.cmml" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.2">𝑒</ci><ci id="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.3.cmml" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.3">𝑛</ci><ci id="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.4.cmml" xref="S2.SS2.SSS1.p1.10.m10.1.1.1.1.3.4">𝑐</ci></apply></apply><apply id="S2.SS2.SSS1.p1.10.m10.2.2.2.2.cmml" xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS2.SSS1.p1.10.m10.2.2.2.2.1.cmml" xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2">superscript</csymbol><ci 
id="S2.SS2.SSS1.p1.10.m10.2.2.2.2.2.cmml" xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2.2">𝐺</ci><apply id="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.cmml" xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3"><times id="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.1.cmml" xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.1"></times><ci id="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.2.cmml" xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.2">𝑑</ci><ci id="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.3.cmml" xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.3">𝑒</ci><ci id="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.4.cmml" xref="S2.SS2.SSS1.p1.10.m10.2.2.2.2.3.4">𝑐</ci></apply></apply></set></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.10.m10.2c">\{G^{enc},G^{dec}\}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.10.m10.2d">{ italic_G start_POSTSUPERSCRIPT italic_e italic_n italic_c end_POSTSUPERSCRIPT , italic_G start_POSTSUPERSCRIPT italic_d italic_e italic_c end_POSTSUPERSCRIPT }</annotation></semantics></math>. Likewise, on the decoder side, the decoded signal of dimension <math alttext="{64\times H\times W}" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.11.m11.1"><semantics id="S2.SS2.SSS1.p1.11.m11.1a"><mrow id="S2.SS2.SSS1.p1.11.m11.1.1" xref="S2.SS2.SSS1.p1.11.m11.1.1.cmml"><mn id="S2.SS2.SSS1.p1.11.m11.1.1.2" xref="S2.SS2.SSS1.p1.11.m11.1.1.2.cmml">64</mn><mo id="S2.SS2.SSS1.p1.11.m11.1.1.1" lspace="0.222em" rspace="0.222em" xref="S2.SS2.SSS1.p1.11.m11.1.1.1.cmml">×</mo><mi id="S2.SS2.SSS1.p1.11.m11.1.1.3" xref="S2.SS2.SSS1.p1.11.m11.1.1.3.cmml">H</mi><mo id="S2.SS2.SSS1.p1.11.m11.1.1.1a" lspace="0.222em" rspace="0.222em" xref="S2.SS2.SSS1.p1.11.m11.1.1.1.cmml">×</mo><mi id="S2.SS2.SSS1.p1.11.m11.1.1.4" xref="S2.SS2.SSS1.p1.11.m11.1.1.4.cmml">W</mi></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.11.m11.1b"><apply id="S2.SS2.SSS1.p1.11.m11.1.1.cmml" xref="S2.SS2.SSS1.p1.11.m11.1.1"><times id="S2.SS2.SSS1.p1.11.m11.1.1.1.cmml" xref="S2.SS2.SSS1.p1.11.m11.1.1.1"></times><cn 
id="S2.SS2.SSS1.p1.11.m11.1.1.2.cmml" type="integer" xref="S2.SS2.SSS1.p1.11.m11.1.1.2">64</cn><ci id="S2.SS2.SSS1.p1.11.m11.1.1.3.cmml" xref="S2.SS2.SSS1.p1.11.m11.1.1.3">𝐻</ci><ci id="S2.SS2.SSS1.p1.11.m11.1.1.4.cmml" xref="S2.SS2.SSS1.p1.11.m11.1.1.4">𝑊</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.11.m11.1c">{64\times H\times W}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.11.m11.1d">64 × italic_H × italic_W</annotation></semantics></math> is concatenated with <math alttext="\dot{x}_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.12.m12.1"><semantics id="S2.SS2.SSS1.p1.12.m12.1a"><msub id="S2.SS2.SSS1.p1.12.m12.1.1" xref="S2.SS2.SSS1.p1.12.m12.1.1.cmml"><mover accent="true" id="S2.SS2.SSS1.p1.12.m12.1.1.2" xref="S2.SS2.SSS1.p1.12.m12.1.1.2.cmml"><mi id="S2.SS2.SSS1.p1.12.m12.1.1.2.2" xref="S2.SS2.SSS1.p1.12.m12.1.1.2.2.cmml">x</mi><mo id="S2.SS2.SSS1.p1.12.m12.1.1.2.1" xref="S2.SS2.SSS1.p1.12.m12.1.1.2.1.cmml">˙</mo></mover><mi id="S2.SS2.SSS1.p1.12.m12.1.1.3" xref="S2.SS2.SSS1.p1.12.m12.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.12.m12.1b"><apply id="S2.SS2.SSS1.p1.12.m12.1.1.cmml" xref="S2.SS2.SSS1.p1.12.m12.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS1.p1.12.m12.1.1.1.cmml" xref="S2.SS2.SSS1.p1.12.m12.1.1">subscript</csymbol><apply id="S2.SS2.SSS1.p1.12.m12.1.1.2.cmml" xref="S2.SS2.SSS1.p1.12.m12.1.1.2"><ci id="S2.SS2.SSS1.p1.12.m12.1.1.2.1.cmml" xref="S2.SS2.SSS1.p1.12.m12.1.1.2.1">˙</ci><ci id="S2.SS2.SSS1.p1.12.m12.1.1.2.2.cmml" xref="S2.SS2.SSS1.p1.12.m12.1.1.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS1.p1.12.m12.1.1.3.cmml" xref="S2.SS2.SSS1.p1.12.m12.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.12.m12.1c">\dot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.12.m12.1d">over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c 
end_POSTSUBSCRIPT</annotation></semantics></math> and fed into the frame generator to reconstruct the input frame. DCVC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib1" title="">1</a>]</cite> is typical of modern learned video codecs that adopt conditional coding. While it shows very promising coding performance, the need to fetch a large number of full-resolution features to reconstruct each video frame incurs not only high computational complexity but also, more notably, high memory-access bandwidth. The latter makes decoding a memory-bound operation. This work aims to characterize the rate-distortion-complexity trade-off of conditional coding by varying the channel size <math alttext="C" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.13.m13.1"><semantics id="S2.SS2.SSS1.p1.13.m13.1a"><mi id="S2.SS2.SSS1.p1.13.m13.1.1" xref="S2.SS2.SSS1.p1.13.m13.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.13.m13.1b"><ci id="S2.SS2.SSS1.p1.13.m13.1.1.cmml" xref="S2.SS2.SSS1.p1.13.m13.1.1">𝐶</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.13.m13.1c">C</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.13.m13.1d">italic_C</annotation></semantics></math>.</p> </div> </section> <section class="ltx_subsubsection" id="S2.SS2.SSS2"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection"><span class="ltx_text" id="S2.SS2.SSS2.4.1.1">II-B</span>2 </span>Conditional Residual Coding</h4> <div class="ltx_para" id="S2.SS2.SSS2.p1"> <p class="ltx_p" id="S2.SS2.SSS2.p1.21">As shown in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib14" title="">14</a>]</cite>, when the temporal predictor <math alttext="x_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.1.m1.1"><semantics id="S2.SS2.SSS2.p1.1.m1.1a"><msub id="S2.SS2.SSS2.p1.1.m1.1.1" 
xref="S2.SS2.SSS2.p1.1.m1.1.1.cmml"><mi id="S2.SS2.SSS2.p1.1.m1.1.1.2" xref="S2.SS2.SSS2.p1.1.m1.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.1.m1.1.1.3" xref="S2.SS2.SSS2.p1.1.m1.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.1.m1.1b"><apply id="S2.SS2.SSS2.p1.1.m1.1.1.cmml" xref="S2.SS2.SSS2.p1.1.m1.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.1.m1.1.1.1.cmml" xref="S2.SS2.SSS2.p1.1.m1.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p1.1.m1.1.1.2.cmml" xref="S2.SS2.SSS2.p1.1.m1.1.1.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.1.m1.1.1.3.cmml" xref="S2.SS2.SSS2.p1.1.m1.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.1.m1.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.1.m1.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> is a good predictor of the coding frame <math alttext="x_{t}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.2.m2.1"><semantics id="S2.SS2.SSS2.p1.2.m2.1a"><msub id="S2.SS2.SSS2.p1.2.m2.1.1" xref="S2.SS2.SSS2.p1.2.m2.1.1.cmml"><mi id="S2.SS2.SSS2.p1.2.m2.1.1.2" xref="S2.SS2.SSS2.p1.2.m2.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.2.m2.1.1.3" xref="S2.SS2.SSS2.p1.2.m2.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.2.m2.1b"><apply id="S2.SS2.SSS2.p1.2.m2.1.1.cmml" xref="S2.SS2.SSS2.p1.2.m2.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.2.m2.1.1.1.cmml" xref="S2.SS2.SSS2.p1.2.m2.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p1.2.m2.1.1.2.cmml" xref="S2.SS2.SSS2.p1.2.m2.1.1.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.2.m2.1.1.3.cmml" xref="S2.SS2.SSS2.p1.2.m2.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.2.m2.1c">x_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.2.m2.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> such that <math 
alttext="H(x_{t}-x_{c})\leq H(x_{t})" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.3.m3.2"><semantics id="S2.SS2.SSS2.p1.3.m3.2a"><mrow id="S2.SS2.SSS2.p1.3.m3.2.2" xref="S2.SS2.SSS2.p1.3.m3.2.2.cmml"><mrow id="S2.SS2.SSS2.p1.3.m3.1.1.1" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.cmml"><mi id="S2.SS2.SSS2.p1.3.m3.1.1.1.3" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.3.cmml">H</mi><mo id="S2.SS2.SSS2.p1.3.m3.1.1.1.2" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.2.cmml">⁢</mo><mrow id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.cmml"><mo id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.2" stretchy="false" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.cmml">(</mo><mrow id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.cmml"><msub id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.2" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.2.cmml"><mi id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.2.2" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.2.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.2.3" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.2.3.cmml">t</mi></msub><mo id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.1" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.1.cmml">−</mo><msub id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.3" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.3.cmml"><mi id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.3.2" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.3.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.3.3" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.3.3.cmml">c</mi></msub></mrow><mo id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.3" stretchy="false" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S2.SS2.SSS2.p1.3.m3.2.2.3" xref="S2.SS2.SSS2.p1.3.m3.2.2.3.cmml">≤</mo><mrow id="S2.SS2.SSS2.p1.3.m3.2.2.2" xref="S2.SS2.SSS2.p1.3.m3.2.2.2.cmml"><mi id="S2.SS2.SSS2.p1.3.m3.2.2.2.3" xref="S2.SS2.SSS2.p1.3.m3.2.2.2.3.cmml">H</mi><mo id="S2.SS2.SSS2.p1.3.m3.2.2.2.2" xref="S2.SS2.SSS2.p1.3.m3.2.2.2.2.cmml">⁢</mo><mrow id="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1" xref="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.1.cmml"><mo id="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.2" 
stretchy="false" xref="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.1.cmml">(</mo><msub id="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.1" xref="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.1.cmml"><mi id="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.1.2" xref="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.1.3" xref="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.1.3.cmml">t</mi></msub><mo id="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.3" stretchy="false" xref="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.1.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.3.m3.2b"><apply id="S2.SS2.SSS2.p1.3.m3.2.2.cmml" xref="S2.SS2.SSS2.p1.3.m3.2.2"><leq id="S2.SS2.SSS2.p1.3.m3.2.2.3.cmml" xref="S2.SS2.SSS2.p1.3.m3.2.2.3"></leq><apply id="S2.SS2.SSS2.p1.3.m3.1.1.1.cmml" xref="S2.SS2.SSS2.p1.3.m3.1.1.1"><times id="S2.SS2.SSS2.p1.3.m3.1.1.1.2.cmml" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.2"></times><ci id="S2.SS2.SSS2.p1.3.m3.1.1.1.3.cmml" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.3">𝐻</ci><apply id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.cmml" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1"><minus id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.1.cmml" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.1"></minus><apply id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.2.cmml" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.2.1.cmml" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.2">subscript</csymbol><ci id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.2.2.cmml" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.2.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.2.3.cmml" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.2.3">𝑡</ci></apply><apply id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.3.cmml" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.3.1.cmml" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.3">subscript</csymbol><ci id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.3.2.cmml" xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.3.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.3.3.cmml" 
xref="S2.SS2.SSS2.p1.3.m3.1.1.1.1.1.1.3.3">𝑐</ci></apply></apply></apply><apply id="S2.SS2.SSS2.p1.3.m3.2.2.2.cmml" xref="S2.SS2.SSS2.p1.3.m3.2.2.2"><times id="S2.SS2.SSS2.p1.3.m3.2.2.2.2.cmml" xref="S2.SS2.SSS2.p1.3.m3.2.2.2.2"></times><ci id="S2.SS2.SSS2.p1.3.m3.2.2.2.3.cmml" xref="S2.SS2.SSS2.p1.3.m3.2.2.2.3">𝐻</ci><apply id="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.1.cmml" xref="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.1.1.cmml" xref="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.1.2.cmml" xref="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.1.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.1.3.cmml" xref="S2.SS2.SSS2.p1.3.m3.2.2.2.1.1.1.3">𝑡</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.3.m3.2c">H(x_{t}-x_{c})\leq H(x_{t})</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.3.m3.2d">italic_H ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) ≤ italic_H ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT )</annotation></semantics></math> holds true, encoding the prediction residue <math alttext="x_{t}-x_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.4.m4.1"><semantics id="S2.SS2.SSS2.p1.4.m4.1a"><mrow id="S2.SS2.SSS2.p1.4.m4.1.1" xref="S2.SS2.SSS2.p1.4.m4.1.1.cmml"><msub id="S2.SS2.SSS2.p1.4.m4.1.1.2" xref="S2.SS2.SSS2.p1.4.m4.1.1.2.cmml"><mi id="S2.SS2.SSS2.p1.4.m4.1.1.2.2" xref="S2.SS2.SSS2.p1.4.m4.1.1.2.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.4.m4.1.1.2.3" xref="S2.SS2.SSS2.p1.4.m4.1.1.2.3.cmml">t</mi></msub><mo id="S2.SS2.SSS2.p1.4.m4.1.1.1" xref="S2.SS2.SSS2.p1.4.m4.1.1.1.cmml">−</mo><msub id="S2.SS2.SSS2.p1.4.m4.1.1.3" xref="S2.SS2.SSS2.p1.4.m4.1.1.3.cmml"><mi id="S2.SS2.SSS2.p1.4.m4.1.1.3.2" xref="S2.SS2.SSS2.p1.4.m4.1.1.3.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.4.m4.1.1.3.3" 
xref="S2.SS2.SSS2.p1.4.m4.1.1.3.3.cmml">c</mi></msub></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.4.m4.1b"><apply id="S2.SS2.SSS2.p1.4.m4.1.1.cmml" xref="S2.SS2.SSS2.p1.4.m4.1.1"><minus id="S2.SS2.SSS2.p1.4.m4.1.1.1.cmml" xref="S2.SS2.SSS2.p1.4.m4.1.1.1"></minus><apply id="S2.SS2.SSS2.p1.4.m4.1.1.2.cmml" xref="S2.SS2.SSS2.p1.4.m4.1.1.2"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.4.m4.1.1.2.1.cmml" xref="S2.SS2.SSS2.p1.4.m4.1.1.2">subscript</csymbol><ci id="S2.SS2.SSS2.p1.4.m4.1.1.2.2.cmml" xref="S2.SS2.SSS2.p1.4.m4.1.1.2.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.4.m4.1.1.2.3.cmml" xref="S2.SS2.SSS2.p1.4.m4.1.1.2.3">𝑡</ci></apply><apply id="S2.SS2.SSS2.p1.4.m4.1.1.3.cmml" xref="S2.SS2.SSS2.p1.4.m4.1.1.3"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.4.m4.1.1.3.1.cmml" xref="S2.SS2.SSS2.p1.4.m4.1.1.3">subscript</csymbol><ci id="S2.SS2.SSS2.p1.4.m4.1.1.3.2.cmml" xref="S2.SS2.SSS2.p1.4.m4.1.1.3.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.4.m4.1.1.3.3.cmml" xref="S2.SS2.SSS2.p1.4.m4.1.1.3.3">𝑐</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.4.m4.1c">x_{t}-x_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.4.m4.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> with <math alttext="f(x_{c})" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.5.m5.1"><semantics id="S2.SS2.SSS2.p1.5.m5.1a"><mrow id="S2.SS2.SSS2.p1.5.m5.1.1" xref="S2.SS2.SSS2.p1.5.m5.1.1.cmml"><mi id="S2.SS2.SSS2.p1.5.m5.1.1.3" mathcolor="#000000" xref="S2.SS2.SSS2.p1.5.m5.1.1.3.cmml">f</mi><mo id="S2.SS2.SSS2.p1.5.m5.1.1.2" xref="S2.SS2.SSS2.p1.5.m5.1.1.2.cmml">⁢</mo><mrow id="S2.SS2.SSS2.p1.5.m5.1.1.1.1" xref="S2.SS2.SSS2.p1.5.m5.1.1.1.1.1.cmml"><mo id="S2.SS2.SSS2.p1.5.m5.1.1.1.1.2" mathcolor="#000000" stretchy="false" xref="S2.SS2.SSS2.p1.5.m5.1.1.1.1.1.cmml">(</mo><msub id="S2.SS2.SSS2.p1.5.m5.1.1.1.1.1" 
xref="S2.SS2.SSS2.p1.5.m5.1.1.1.1.1.cmml"><mi id="S2.SS2.SSS2.p1.5.m5.1.1.1.1.1.2" mathcolor="#000000" xref="S2.SS2.SSS2.p1.5.m5.1.1.1.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.5.m5.1.1.1.1.1.3" mathcolor="#000000" xref="S2.SS2.SSS2.p1.5.m5.1.1.1.1.1.3.cmml">c</mi></msub><mo id="S2.SS2.SSS2.p1.5.m5.1.1.1.1.3" mathcolor="#000000" stretchy="false" xref="S2.SS2.SSS2.p1.5.m5.1.1.1.1.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.5.m5.1b"><apply id="S2.SS2.SSS2.p1.5.m5.1.1.cmml" xref="S2.SS2.SSS2.p1.5.m5.1.1"><times id="S2.SS2.SSS2.p1.5.m5.1.1.2.cmml" xref="S2.SS2.SSS2.p1.5.m5.1.1.2"></times><ci id="S2.SS2.SSS2.p1.5.m5.1.1.3.cmml" xref="S2.SS2.SSS2.p1.5.m5.1.1.3">𝑓</ci><apply id="S2.SS2.SSS2.p1.5.m5.1.1.1.1.1.cmml" xref="S2.SS2.SSS2.p1.5.m5.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.5.m5.1.1.1.1.1.1.cmml" xref="S2.SS2.SSS2.p1.5.m5.1.1.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p1.5.m5.1.1.1.1.1.2.cmml" xref="S2.SS2.SSS2.p1.5.m5.1.1.1.1.1.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.5.m5.1.1.1.1.1.3.cmml" xref="S2.SS2.SSS2.p1.5.m5.1.1.1.1.1.3">𝑐</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.5.m5.1c">f(x_{c})</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.5.m5.1d">italic_f ( italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT )</annotation></semantics></math> as the condition signal is at least as efficient as encoding the target frame <math alttext="x_{t}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.6.m6.1"><semantics id="S2.SS2.SSS2.p1.6.m6.1a"><msub id="S2.SS2.SSS2.p1.6.m6.1.1" xref="S2.SS2.SSS2.p1.6.m6.1.1.cmml"><mi id="S2.SS2.SSS2.p1.6.m6.1.1.2" xref="S2.SS2.SSS2.p1.6.m6.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.6.m6.1.1.3" xref="S2.SS2.SSS2.p1.6.m6.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.6.m6.1b"><apply id="S2.SS2.SSS2.p1.6.m6.1.1.cmml" xref="S2.SS2.SSS2.p1.6.m6.1.1"><csymbol cd="ambiguous" 
id="S2.SS2.SSS2.p1.6.m6.1.1.1.cmml" xref="S2.SS2.SSS2.p1.6.m6.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p1.6.m6.1.1.2.cmml" xref="S2.SS2.SSS2.p1.6.m6.1.1.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.6.m6.1.1.3.cmml" xref="S2.SS2.SSS2.p1.6.m6.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.6.m6.1c">x_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.6.m6.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> with the same condition <math alttext="f(x_{c})" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.7.m7.1"><semantics id="S2.SS2.SSS2.p1.7.m7.1a"><mrow id="S2.SS2.SSS2.p1.7.m7.1.1" xref="S2.SS2.SSS2.p1.7.m7.1.1.cmml"><mi id="S2.SS2.SSS2.p1.7.m7.1.1.3" mathcolor="#000000" xref="S2.SS2.SSS2.p1.7.m7.1.1.3.cmml">f</mi><mo id="S2.SS2.SSS2.p1.7.m7.1.1.2" xref="S2.SS2.SSS2.p1.7.m7.1.1.2.cmml">⁢</mo><mrow id="S2.SS2.SSS2.p1.7.m7.1.1.1.1" xref="S2.SS2.SSS2.p1.7.m7.1.1.1.1.1.cmml"><mo id="S2.SS2.SSS2.p1.7.m7.1.1.1.1.2" mathcolor="#000000" stretchy="false" xref="S2.SS2.SSS2.p1.7.m7.1.1.1.1.1.cmml">(</mo><msub id="S2.SS2.SSS2.p1.7.m7.1.1.1.1.1" xref="S2.SS2.SSS2.p1.7.m7.1.1.1.1.1.cmml"><mi id="S2.SS2.SSS2.p1.7.m7.1.1.1.1.1.2" mathcolor="#000000" xref="S2.SS2.SSS2.p1.7.m7.1.1.1.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.7.m7.1.1.1.1.1.3" mathcolor="#000000" xref="S2.SS2.SSS2.p1.7.m7.1.1.1.1.1.3.cmml">c</mi></msub><mo id="S2.SS2.SSS2.p1.7.m7.1.1.1.1.3" mathcolor="#000000" stretchy="false" xref="S2.SS2.SSS2.p1.7.m7.1.1.1.1.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.7.m7.1b"><apply id="S2.SS2.SSS2.p1.7.m7.1.1.cmml" xref="S2.SS2.SSS2.p1.7.m7.1.1"><times id="S2.SS2.SSS2.p1.7.m7.1.1.2.cmml" xref="S2.SS2.SSS2.p1.7.m7.1.1.2"></times><ci id="S2.SS2.SSS2.p1.7.m7.1.1.3.cmml" xref="S2.SS2.SSS2.p1.7.m7.1.1.3">𝑓</ci><apply id="S2.SS2.SSS2.p1.7.m7.1.1.1.1.1.cmml" xref="S2.SS2.SSS2.p1.7.m7.1.1.1.1"><csymbol cd="ambiguous" 
id="S2.SS2.SSS2.p1.7.m7.1.1.1.1.1.1.cmml" xref="S2.SS2.SSS2.p1.7.m7.1.1.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p1.7.m7.1.1.1.1.1.2.cmml" xref="S2.SS2.SSS2.p1.7.m7.1.1.1.1.1.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.7.m7.1.1.1.1.1.3.cmml" xref="S2.SS2.SSS2.p1.7.m7.1.1.1.1.1.3">𝑐</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.7.m7.1c">f(x_{c})</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.7.m7.1d">italic_f ( italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT )</annotation></semantics></math>. Here <math alttext="f(x_{c})" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.8.m8.1"><semantics id="S2.SS2.SSS2.p1.8.m8.1a"><mrow id="S2.SS2.SSS2.p1.8.m8.1.1" xref="S2.SS2.SSS2.p1.8.m8.1.1.cmml"><mi id="S2.SS2.SSS2.p1.8.m8.1.1.3" xref="S2.SS2.SSS2.p1.8.m8.1.1.3.cmml">f</mi><mo id="S2.SS2.SSS2.p1.8.m8.1.1.2" xref="S2.SS2.SSS2.p1.8.m8.1.1.2.cmml">⁢</mo><mrow id="S2.SS2.SSS2.p1.8.m8.1.1.1.1" xref="S2.SS2.SSS2.p1.8.m8.1.1.1.1.1.cmml"><mo id="S2.SS2.SSS2.p1.8.m8.1.1.1.1.2" stretchy="false" xref="S2.SS2.SSS2.p1.8.m8.1.1.1.1.1.cmml">(</mo><msub id="S2.SS2.SSS2.p1.8.m8.1.1.1.1.1" xref="S2.SS2.SSS2.p1.8.m8.1.1.1.1.1.cmml"><mi id="S2.SS2.SSS2.p1.8.m8.1.1.1.1.1.2" xref="S2.SS2.SSS2.p1.8.m8.1.1.1.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.8.m8.1.1.1.1.1.3" xref="S2.SS2.SSS2.p1.8.m8.1.1.1.1.1.3.cmml">c</mi></msub><mo id="S2.SS2.SSS2.p1.8.m8.1.1.1.1.3" stretchy="false" xref="S2.SS2.SSS2.p1.8.m8.1.1.1.1.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.8.m8.1b"><apply id="S2.SS2.SSS2.p1.8.m8.1.1.cmml" xref="S2.SS2.SSS2.p1.8.m8.1.1"><times id="S2.SS2.SSS2.p1.8.m8.1.1.2.cmml" xref="S2.SS2.SSS2.p1.8.m8.1.1.2"></times><ci id="S2.SS2.SSS2.p1.8.m8.1.1.3.cmml" xref="S2.SS2.SSS2.p1.8.m8.1.1.3">𝑓</ci><apply id="S2.SS2.SSS2.p1.8.m8.1.1.1.1.1.cmml" xref="S2.SS2.SSS2.p1.8.m8.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.8.m8.1.1.1.1.1.1.cmml" 
xref="S2.SS2.SSS2.p1.8.m8.1.1.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p1.8.m8.1.1.1.1.1.2.cmml" xref="S2.SS2.SSS2.p1.8.m8.1.1.1.1.1.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.8.m8.1.1.1.1.1.3.cmml" xref="S2.SS2.SSS2.p1.8.m8.1.1.1.1.1.3">𝑐</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.8.m8.1c">f(x_{c})</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.8.m8.1d">italic_f ( italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT )</annotation></semantics></math> represents the processed version of the temporal predictor <math alttext="x_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.9.m9.1"><semantics id="S2.SS2.SSS2.p1.9.m9.1a"><msub id="S2.SS2.SSS2.p1.9.m9.1.1" xref="S2.SS2.SSS2.p1.9.m9.1.1.cmml"><mi id="S2.SS2.SSS2.p1.9.m9.1.1.2" xref="S2.SS2.SSS2.p1.9.m9.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.9.m9.1.1.3" xref="S2.SS2.SSS2.p1.9.m9.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.9.m9.1b"><apply id="S2.SS2.SSS2.p1.9.m9.1.1.cmml" xref="S2.SS2.SSS2.p1.9.m9.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.9.m9.1.1.1.cmml" xref="S2.SS2.SSS2.p1.9.m9.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p1.9.m9.1.1.2.cmml" xref="S2.SS2.SSS2.p1.9.m9.1.1.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.9.m9.1.1.3.cmml" xref="S2.SS2.SSS2.p1.9.m9.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.9.m9.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.9.m9.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math>. 
In this work, <math alttext="f(x_{c})=\dot{x_{c}}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.10.m10.1"><semantics id="S2.SS2.SSS2.p1.10.m10.1a"><mrow id="S2.SS2.SSS2.p1.10.m10.1.1" xref="S2.SS2.SSS2.p1.10.m10.1.1.cmml"><mrow id="S2.SS2.SSS2.p1.10.m10.1.1.1" xref="S2.SS2.SSS2.p1.10.m10.1.1.1.cmml"><mi id="S2.SS2.SSS2.p1.10.m10.1.1.1.3" xref="S2.SS2.SSS2.p1.10.m10.1.1.1.3.cmml">f</mi><mo id="S2.SS2.SSS2.p1.10.m10.1.1.1.2" xref="S2.SS2.SSS2.p1.10.m10.1.1.1.2.cmml">⁢</mo><mrow id="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1" xref="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.1.cmml"><mo id="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.2" stretchy="false" xref="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.1.cmml">(</mo><msub id="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.1" xref="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.1.cmml"><mi id="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.1.2" xref="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.1.3" xref="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.1.3.cmml">c</mi></msub><mo id="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.3" stretchy="false" xref="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S2.SS2.SSS2.p1.10.m10.1.1.2" xref="S2.SS2.SSS2.p1.10.m10.1.1.2.cmml">=</mo><mover accent="true" id="S2.SS2.SSS2.p1.10.m10.1.1.3" xref="S2.SS2.SSS2.p1.10.m10.1.1.3.cmml"><msub id="S2.SS2.SSS2.p1.10.m10.1.1.3.2" xref="S2.SS2.SSS2.p1.10.m10.1.1.3.2.cmml"><mi id="S2.SS2.SSS2.p1.10.m10.1.1.3.2.2" xref="S2.SS2.SSS2.p1.10.m10.1.1.3.2.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.10.m10.1.1.3.2.3" xref="S2.SS2.SSS2.p1.10.m10.1.1.3.2.3.cmml">c</mi></msub><mo id="S2.SS2.SSS2.p1.10.m10.1.1.3.1" xref="S2.SS2.SSS2.p1.10.m10.1.1.3.1.cmml">˙</mo></mover></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.10.m10.1b"><apply id="S2.SS2.SSS2.p1.10.m10.1.1.cmml" xref="S2.SS2.SSS2.p1.10.m10.1.1"><eq id="S2.SS2.SSS2.p1.10.m10.1.1.2.cmml" xref="S2.SS2.SSS2.p1.10.m10.1.1.2"></eq><apply id="S2.SS2.SSS2.p1.10.m10.1.1.1.cmml" xref="S2.SS2.SSS2.p1.10.m10.1.1.1"><times 
id="S2.SS2.SSS2.p1.10.m10.1.1.1.2.cmml" xref="S2.SS2.SSS2.p1.10.m10.1.1.1.2"></times><ci id="S2.SS2.SSS2.p1.10.m10.1.1.1.3.cmml" xref="S2.SS2.SSS2.p1.10.m10.1.1.1.3">𝑓</ci><apply id="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.1.cmml" xref="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.1.1.cmml" xref="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.1.2.cmml" xref="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.1.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.1.3.cmml" xref="S2.SS2.SSS2.p1.10.m10.1.1.1.1.1.1.3">𝑐</ci></apply></apply><apply id="S2.SS2.SSS2.p1.10.m10.1.1.3.cmml" xref="S2.SS2.SSS2.p1.10.m10.1.1.3"><ci id="S2.SS2.SSS2.p1.10.m10.1.1.3.1.cmml" xref="S2.SS2.SSS2.p1.10.m10.1.1.3.1">˙</ci><apply id="S2.SS2.SSS2.p1.10.m10.1.1.3.2.cmml" xref="S2.SS2.SSS2.p1.10.m10.1.1.3.2"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.10.m10.1.1.3.2.1.cmml" xref="S2.SS2.SSS2.p1.10.m10.1.1.3.2">subscript</csymbol><ci id="S2.SS2.SSS2.p1.10.m10.1.1.3.2.2.cmml" xref="S2.SS2.SSS2.p1.10.m10.1.1.3.2.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.10.m10.1.1.3.2.3.cmml" xref="S2.SS2.SSS2.p1.10.m10.1.1.3.2.3">𝑐</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.10.m10.1c">f(x_{c})=\dot{x_{c}}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.10.m10.1d">italic_f ( italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) = over˙ start_ARG italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT end_ARG</annotation></semantics></math>. 
However, in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib14" title="">14</a>]</cite>, the temporal predictor <math alttext="x_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.11.m11.1"><semantics id="S2.SS2.SSS2.p1.11.m11.1a"><msub id="S2.SS2.SSS2.p1.11.m11.1.1" xref="S2.SS2.SSS2.p1.11.m11.1.1.cmml"><mi id="S2.SS2.SSS2.p1.11.m11.1.1.2" xref="S2.SS2.SSS2.p1.11.m11.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.11.m11.1.1.3" xref="S2.SS2.SSS2.p1.11.m11.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.11.m11.1b"><apply id="S2.SS2.SSS2.p1.11.m11.1.1.cmml" xref="S2.SS2.SSS2.p1.11.m11.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.11.m11.1.1.1.cmml" xref="S2.SS2.SSS2.p1.11.m11.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p1.11.m11.1.1.2.cmml" xref="S2.SS2.SSS2.p1.11.m11.1.1.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.11.m11.1.1.3.cmml" xref="S2.SS2.SSS2.p1.11.m11.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.11.m11.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.11.m11.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> is a 3-channel signal, whereas <math alttext="x_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.12.m12.1"><semantics id="S2.SS2.SSS2.p1.12.m12.1a"><msub id="S2.SS2.SSS2.p1.12.m12.1.1" xref="S2.SS2.SSS2.p1.12.m12.1.1.cmml"><mi id="S2.SS2.SSS2.p1.12.m12.1.1.2" xref="S2.SS2.SSS2.p1.12.m12.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.12.m12.1.1.3" xref="S2.SS2.SSS2.p1.12.m12.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.12.m12.1b"><apply id="S2.SS2.SSS2.p1.12.m12.1.1.cmml" xref="S2.SS2.SSS2.p1.12.m12.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.12.m12.1.1.1.cmml" xref="S2.SS2.SSS2.p1.12.m12.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p1.12.m12.1.1.2.cmml" 
xref="S2.SS2.SSS2.p1.12.m12.1.1.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.12.m12.1.1.3.cmml" xref="S2.SS2.SSS2.p1.12.m12.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.12.m12.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.12.m12.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> in DCVC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib1" title="">1</a>]</cite> comprises C-channel feature maps. As depicted in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S1.F2" title="Figure 2 ‣ I Introduction ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">2</span></a> (b), to adapt DCVC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib1" title="">1</a>]</cite> to conditional residual coding, we use a 3×3 convolutional layer to transform <math alttext="x_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.13.m13.1"><semantics id="S2.SS2.SSS2.p1.13.m13.1a"><msub id="S2.SS2.SSS2.p1.13.m13.1.1" xref="S2.SS2.SSS2.p1.13.m13.1.1.cmml"><mi id="S2.SS2.SSS2.p1.13.m13.1.1.2" xref="S2.SS2.SSS2.p1.13.m13.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.13.m13.1.1.3" xref="S2.SS2.SSS2.p1.13.m13.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.13.m13.1b"><apply id="S2.SS2.SSS2.p1.13.m13.1.1.cmml" xref="S2.SS2.SSS2.p1.13.m13.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.13.m13.1.1.1.cmml" xref="S2.SS2.SSS2.p1.13.m13.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p1.13.m13.1.1.2.cmml" xref="S2.SS2.SSS2.p1.13.m13.1.1.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.13.m13.1.1.3.cmml" xref="S2.SS2.SSS2.p1.13.m13.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.13.m13.1c">x_{c}</annotation><annotation 
encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.13.m13.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> into <math alttext="\ddot{x}_{c}\in\mathbb{R}^{3\times H\times W}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.14.m14.1"><semantics id="S2.SS2.SSS2.p1.14.m14.1a"><mrow id="S2.SS2.SSS2.p1.14.m14.1.1" xref="S2.SS2.SSS2.p1.14.m14.1.1.cmml"><msub id="S2.SS2.SSS2.p1.14.m14.1.1.2" xref="S2.SS2.SSS2.p1.14.m14.1.1.2.cmml"><mover accent="true" id="S2.SS2.SSS2.p1.14.m14.1.1.2.2" xref="S2.SS2.SSS2.p1.14.m14.1.1.2.2.cmml"><mi id="S2.SS2.SSS2.p1.14.m14.1.1.2.2.2" xref="S2.SS2.SSS2.p1.14.m14.1.1.2.2.2.cmml">x</mi><mo id="S2.SS2.SSS2.p1.14.m14.1.1.2.2.1" xref="S2.SS2.SSS2.p1.14.m14.1.1.2.2.1.cmml">¨</mo></mover><mi id="S2.SS2.SSS2.p1.14.m14.1.1.2.3" xref="S2.SS2.SSS2.p1.14.m14.1.1.2.3.cmml">c</mi></msub><mo id="S2.SS2.SSS2.p1.14.m14.1.1.1" xref="S2.SS2.SSS2.p1.14.m14.1.1.1.cmml">∈</mo><msup id="S2.SS2.SSS2.p1.14.m14.1.1.3" xref="S2.SS2.SSS2.p1.14.m14.1.1.3.cmml"><mi id="S2.SS2.SSS2.p1.14.m14.1.1.3.2" xref="S2.SS2.SSS2.p1.14.m14.1.1.3.2.cmml">ℝ</mi><mrow id="S2.SS2.SSS2.p1.14.m14.1.1.3.3" xref="S2.SS2.SSS2.p1.14.m14.1.1.3.3.cmml"><mn id="S2.SS2.SSS2.p1.14.m14.1.1.3.3.2" xref="S2.SS2.SSS2.p1.14.m14.1.1.3.3.2.cmml">3</mn><mo id="S2.SS2.SSS2.p1.14.m14.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S2.SS2.SSS2.p1.14.m14.1.1.3.3.1.cmml">×</mo><mi id="S2.SS2.SSS2.p1.14.m14.1.1.3.3.3" xref="S2.SS2.SSS2.p1.14.m14.1.1.3.3.3.cmml">H</mi><mo id="S2.SS2.SSS2.p1.14.m14.1.1.3.3.1a" lspace="0.222em" rspace="0.222em" xref="S2.SS2.SSS2.p1.14.m14.1.1.3.3.1.cmml">×</mo><mi id="S2.SS2.SSS2.p1.14.m14.1.1.3.3.4" xref="S2.SS2.SSS2.p1.14.m14.1.1.3.3.4.cmml">W</mi></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.14.m14.1b"><apply id="S2.SS2.SSS2.p1.14.m14.1.1.cmml" xref="S2.SS2.SSS2.p1.14.m14.1.1"><in id="S2.SS2.SSS2.p1.14.m14.1.1.1.cmml" xref="S2.SS2.SSS2.p1.14.m14.1.1.1"></in><apply 
id="S2.SS2.SSS2.p1.14.m14.1.1.2.cmml" xref="S2.SS2.SSS2.p1.14.m14.1.1.2"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.14.m14.1.1.2.1.cmml" xref="S2.SS2.SSS2.p1.14.m14.1.1.2">subscript</csymbol><apply id="S2.SS2.SSS2.p1.14.m14.1.1.2.2.cmml" xref="S2.SS2.SSS2.p1.14.m14.1.1.2.2"><ci id="S2.SS2.SSS2.p1.14.m14.1.1.2.2.1.cmml" xref="S2.SS2.SSS2.p1.14.m14.1.1.2.2.1">¨</ci><ci id="S2.SS2.SSS2.p1.14.m14.1.1.2.2.2.cmml" xref="S2.SS2.SSS2.p1.14.m14.1.1.2.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS2.p1.14.m14.1.1.2.3.cmml" xref="S2.SS2.SSS2.p1.14.m14.1.1.2.3">𝑐</ci></apply><apply id="S2.SS2.SSS2.p1.14.m14.1.1.3.cmml" xref="S2.SS2.SSS2.p1.14.m14.1.1.3"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.14.m14.1.1.3.1.cmml" xref="S2.SS2.SSS2.p1.14.m14.1.1.3">superscript</csymbol><ci id="S2.SS2.SSS2.p1.14.m14.1.1.3.2.cmml" xref="S2.SS2.SSS2.p1.14.m14.1.1.3.2">ℝ</ci><apply id="S2.SS2.SSS2.p1.14.m14.1.1.3.3.cmml" xref="S2.SS2.SSS2.p1.14.m14.1.1.3.3"><times id="S2.SS2.SSS2.p1.14.m14.1.1.3.3.1.cmml" xref="S2.SS2.SSS2.p1.14.m14.1.1.3.3.1"></times><cn id="S2.SS2.SSS2.p1.14.m14.1.1.3.3.2.cmml" type="integer" xref="S2.SS2.SSS2.p1.14.m14.1.1.3.3.2">3</cn><ci id="S2.SS2.SSS2.p1.14.m14.1.1.3.3.3.cmml" xref="S2.SS2.SSS2.p1.14.m14.1.1.3.3.3">𝐻</ci><ci id="S2.SS2.SSS2.p1.14.m14.1.1.3.3.4.cmml" xref="S2.SS2.SSS2.p1.14.m14.1.1.3.3.4">𝑊</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.14.m14.1c">\ddot{x}_{c}\in\mathbb{R}^{3\times H\times W}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.14.m14.1d">over¨ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 3 × italic_H × italic_W end_POSTSUPERSCRIPT</annotation></semantics></math>, a 3-channel pixel-domain temporal predictor. 
Since <math alttext="\dot{x}_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.15.m15.1"><semantics id="S2.SS2.SSS2.p1.15.m15.1a"><msub id="S2.SS2.SSS2.p1.15.m15.1.1" xref="S2.SS2.SSS2.p1.15.m15.1.1.cmml"><mover accent="true" id="S2.SS2.SSS2.p1.15.m15.1.1.2" xref="S2.SS2.SSS2.p1.15.m15.1.1.2.cmml"><mi id="S2.SS2.SSS2.p1.15.m15.1.1.2.2" xref="S2.SS2.SSS2.p1.15.m15.1.1.2.2.cmml">x</mi><mo id="S2.SS2.SSS2.p1.15.m15.1.1.2.1" xref="S2.SS2.SSS2.p1.15.m15.1.1.2.1.cmml">˙</mo></mover><mi id="S2.SS2.SSS2.p1.15.m15.1.1.3" xref="S2.SS2.SSS2.p1.15.m15.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.15.m15.1b"><apply id="S2.SS2.SSS2.p1.15.m15.1.1.cmml" xref="S2.SS2.SSS2.p1.15.m15.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.15.m15.1.1.1.cmml" xref="S2.SS2.SSS2.p1.15.m15.1.1">subscript</csymbol><apply id="S2.SS2.SSS2.p1.15.m15.1.1.2.cmml" xref="S2.SS2.SSS2.p1.15.m15.1.1.2"><ci id="S2.SS2.SSS2.p1.15.m15.1.1.2.1.cmml" xref="S2.SS2.SSS2.p1.15.m15.1.1.2.1">˙</ci><ci id="S2.SS2.SSS2.p1.15.m15.1.1.2.2.cmml" xref="S2.SS2.SSS2.p1.15.m15.1.1.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS2.p1.15.m15.1.1.3.cmml" xref="S2.SS2.SSS2.p1.15.m15.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.15.m15.1c">\dot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.15.m15.1d">over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> is obtained from a refinement network that may introduce information loss, we use the original <math alttext="x_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.16.m16.1"><semantics id="S2.SS2.SSS2.p1.16.m16.1a"><msub id="S2.SS2.SSS2.p1.16.m16.1.1" xref="S2.SS2.SSS2.p1.16.m16.1.1.cmml"><mi id="S2.SS2.SSS2.p1.16.m16.1.1.2" xref="S2.SS2.SSS2.p1.16.m16.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.16.m16.1.1.3" xref="S2.SS2.SSS2.p1.16.m16.1.1.3.cmml">c</mi></msub><annotation-xml 
encoding="MathML-Content" id="S2.SS2.SSS2.p1.16.m16.1b"><apply id="S2.SS2.SSS2.p1.16.m16.1.1.cmml" xref="S2.SS2.SSS2.p1.16.m16.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.16.m16.1.1.1.cmml" xref="S2.SS2.SSS2.p1.16.m16.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p1.16.m16.1.1.2.cmml" xref="S2.SS2.SSS2.p1.16.m16.1.1.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.16.m16.1.1.3.cmml" xref="S2.SS2.SSS2.p1.16.m16.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.16.m16.1c">x_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.16.m16.1d">italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> instead of <math alttext="\dot{x}_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.17.m17.1"><semantics id="S2.SS2.SSS2.p1.17.m17.1a"><msub id="S2.SS2.SSS2.p1.17.m17.1.1" xref="S2.SS2.SSS2.p1.17.m17.1.1.cmml"><mover accent="true" id="S2.SS2.SSS2.p1.17.m17.1.1.2" xref="S2.SS2.SSS2.p1.17.m17.1.1.2.cmml"><mi id="S2.SS2.SSS2.p1.17.m17.1.1.2.2" xref="S2.SS2.SSS2.p1.17.m17.1.1.2.2.cmml">x</mi><mo id="S2.SS2.SSS2.p1.17.m17.1.1.2.1" xref="S2.SS2.SSS2.p1.17.m17.1.1.2.1.cmml">˙</mo></mover><mi id="S2.SS2.SSS2.p1.17.m17.1.1.3" xref="S2.SS2.SSS2.p1.17.m17.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.17.m17.1b"><apply id="S2.SS2.SSS2.p1.17.m17.1.1.cmml" xref="S2.SS2.SSS2.p1.17.m17.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.17.m17.1.1.1.cmml" xref="S2.SS2.SSS2.p1.17.m17.1.1">subscript</csymbol><apply id="S2.SS2.SSS2.p1.17.m17.1.1.2.cmml" xref="S2.SS2.SSS2.p1.17.m17.1.1.2"><ci id="S2.SS2.SSS2.p1.17.m17.1.1.2.1.cmml" xref="S2.SS2.SSS2.p1.17.m17.1.1.2.1">˙</ci><ci id="S2.SS2.SSS2.p1.17.m17.1.1.2.2.cmml" xref="S2.SS2.SSS2.p1.17.m17.1.1.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS2.p1.17.m17.1.1.3.cmml" xref="S2.SS2.SSS2.p1.17.m17.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" 
id="S2.SS2.SSS2.p1.17.m17.1c">\dot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.17.m17.1d">over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> to obtain <math alttext="\ddot{x}_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.18.m18.1"><semantics id="S2.SS2.SSS2.p1.18.m18.1a"><msub id="S2.SS2.SSS2.p1.18.m18.1.1" xref="S2.SS2.SSS2.p1.18.m18.1.1.cmml"><mover accent="true" id="S2.SS2.SSS2.p1.18.m18.1.1.2" xref="S2.SS2.SSS2.p1.18.m18.1.1.2.cmml"><mi id="S2.SS2.SSS2.p1.18.m18.1.1.2.2" xref="S2.SS2.SSS2.p1.18.m18.1.1.2.2.cmml">x</mi><mo id="S2.SS2.SSS2.p1.18.m18.1.1.2.1" xref="S2.SS2.SSS2.p1.18.m18.1.1.2.1.cmml">¨</mo></mover><mi id="S2.SS2.SSS2.p1.18.m18.1.1.3" xref="S2.SS2.SSS2.p1.18.m18.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.18.m18.1b"><apply id="S2.SS2.SSS2.p1.18.m18.1.1.cmml" xref="S2.SS2.SSS2.p1.18.m18.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.18.m18.1.1.1.cmml" xref="S2.SS2.SSS2.p1.18.m18.1.1">subscript</csymbol><apply id="S2.SS2.SSS2.p1.18.m18.1.1.2.cmml" xref="S2.SS2.SSS2.p1.18.m18.1.1.2"><ci id="S2.SS2.SSS2.p1.18.m18.1.1.2.1.cmml" xref="S2.SS2.SSS2.p1.18.m18.1.1.2.1">¨</ci><ci id="S2.SS2.SSS2.p1.18.m18.1.1.2.2.cmml" xref="S2.SS2.SSS2.p1.18.m18.1.1.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS2.p1.18.m18.1.1.3.cmml" xref="S2.SS2.SSS2.p1.18.m18.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.18.m18.1c">\ddot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.18.m18.1d">over¨ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math>. 
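The adapter described above can be sketched as follows. This is a minimal, hypothetical illustration in plain Python, not the authors' implementation: the names conv3x3, PixelDomainAdapter and residual, the zero padding, the random weights standing in for trained parameters, and the example sizes are all assumptions. It maps a C-channel feature-domain predictor x_c (shape C×H×W) to a 3-channel predictor x_c_ddot with a single stride-1 3×3 convolution, then forms the pixel-domain residual x_t − x_c_ddot.

```python
import random
from typing import List, Tuple

Plane = List[List[float]]
Tensor = List[Plane]  # layout: channels x height x width (hypothetical, no batch dim)


def shape(t: Tensor) -> Tuple[int, int, int]:
    return len(t), len(t[0]), len(t[0][0])


def conv3x3(x: Tensor, weight: List[List[Plane]], bias: List[float]) -> Tensor:
    """Stride-1, zero-padded 3x3 convolution (cross-correlation, as in most
    deep-learning frameworks). weight: out_ch x in_ch x 3 x 3; keeps H x W."""
    _, h, w = shape(x)
    out: Tensor = []
    for w_o, b_o in zip(weight, bias):
        if len(w_o) != len(x):
            raise ValueError("kernel in_ch does not match input channels")
        plane = [[b_o] * w for _ in range(h)]
        for x_i, k in zip(x, w_o):
            for r in range(h):
                for c in range(w):
                    acc = 0.0
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            rr, cc = r + dy, c + dx
                            if rr in range(h) and cc in range(w):  # zero padding outside
                                acc += k[dy + 1][dx + 1] * x_i[rr][cc]
                    plane[r][c] += acc
        out.append(plane)
    return out


class PixelDomainAdapter:
    """Hypothetical adapter: C-channel feature-domain x_c -> 3-channel x_c_ddot."""

    def __init__(self, in_ch: int, out_ch: int = 3, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Random weights stand in for trained parameters (assumption for illustration).
        self.weight = [[[[rng.gauss(0.0, 0.1) for _ in range(3)] for _ in range(3)]
                        for _ in range(in_ch)] for _ in range(out_ch)]
        self.bias = [0.0] * out_ch

    def __call__(self, x_c: Tensor) -> Tensor:
        return conv3x3(x_c, self.weight, self.bias)


def residual(x_t: Tensor, x_c_ddot: Tensor) -> Tensor:
    """Element-wise x_t - x_c_ddot; both operands must share the same 3 x H x W shape."""
    if shape(x_t) != shape(x_c_ddot):
        raise ValueError(f"shape mismatch: {shape(x_t)} vs {shape(x_c_ddot)}")
    return [[[a - b for a, b in zip(ra, rb)] for ra, rb in zip(pa, pb)]
            for pa, pb in zip(x_t, x_c_ddot)]


if __name__ == "__main__":
    C, H, W = 64, 4, 5  # arbitrary example sizes
    rng = random.Random(1)
    x_c = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
    x_t = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(3)]
    x_c_ddot = PixelDomainAdapter(in_ch=C)(x_c)
    r_t = residual(x_t, x_c_ddot)
    print(shape(x_c_ddot), shape(r_t))  # (3, 4, 5) (3, 4, 5)
```

The adapter input is the unrefined x_c, following the rationale given in the text. In an actual codec the residual would then be fed to a learned encoder together with the condition signal; that conditional encoder is omitted from this sketch.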
To encode the coding frame <math alttext="x_{t}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.19.m19.1"><semantics id="S2.SS2.SSS2.p1.19.m19.1a"><msub id="S2.SS2.SSS2.p1.19.m19.1.1" xref="S2.SS2.SSS2.p1.19.m19.1.1.cmml"><mi id="S2.SS2.SSS2.p1.19.m19.1.1.2" xref="S2.SS2.SSS2.p1.19.m19.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.19.m19.1.1.3" xref="S2.SS2.SSS2.p1.19.m19.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.19.m19.1b"><apply id="S2.SS2.SSS2.p1.19.m19.1.1.cmml" xref="S2.SS2.SSS2.p1.19.m19.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.19.m19.1.1.1.cmml" xref="S2.SS2.SSS2.p1.19.m19.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p1.19.m19.1.1.2.cmml" xref="S2.SS2.SSS2.p1.19.m19.1.1.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.19.m19.1.1.3.cmml" xref="S2.SS2.SSS2.p1.19.m19.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.19.m19.1c">x_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.19.m19.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math>, our conditional residual inter-frame codec encodes the residual signal <math alttext="x_{t}-\ddot{x_{c}}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.20.m20.1"><semantics id="S2.SS2.SSS2.p1.20.m20.1a"><mrow id="S2.SS2.SSS2.p1.20.m20.1.1" xref="S2.SS2.SSS2.p1.20.m20.1.1.cmml"><msub id="S2.SS2.SSS2.p1.20.m20.1.1.2" xref="S2.SS2.SSS2.p1.20.m20.1.1.2.cmml"><mi id="S2.SS2.SSS2.p1.20.m20.1.1.2.2" xref="S2.SS2.SSS2.p1.20.m20.1.1.2.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.20.m20.1.1.2.3" xref="S2.SS2.SSS2.p1.20.m20.1.1.2.3.cmml">t</mi></msub><mo id="S2.SS2.SSS2.p1.20.m20.1.1.1" xref="S2.SS2.SSS2.p1.20.m20.1.1.1.cmml">−</mo><mover accent="true" id="S2.SS2.SSS2.p1.20.m20.1.1.3" xref="S2.SS2.SSS2.p1.20.m20.1.1.3.cmml"><msub id="S2.SS2.SSS2.p1.20.m20.1.1.3.2" xref="S2.SS2.SSS2.p1.20.m20.1.1.3.2.cmml"><mi id="S2.SS2.SSS2.p1.20.m20.1.1.3.2.2" 
xref="S2.SS2.SSS2.p1.20.m20.1.1.3.2.2.cmml">x</mi><mi id="S2.SS2.SSS2.p1.20.m20.1.1.3.2.3" xref="S2.SS2.SSS2.p1.20.m20.1.1.3.2.3.cmml">c</mi></msub><mo id="S2.SS2.SSS2.p1.20.m20.1.1.3.1" xref="S2.SS2.SSS2.p1.20.m20.1.1.3.1.cmml">¨</mo></mover></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.20.m20.1b"><apply id="S2.SS2.SSS2.p1.20.m20.1.1.cmml" xref="S2.SS2.SSS2.p1.20.m20.1.1"><minus id="S2.SS2.SSS2.p1.20.m20.1.1.1.cmml" xref="S2.SS2.SSS2.p1.20.m20.1.1.1"></minus><apply id="S2.SS2.SSS2.p1.20.m20.1.1.2.cmml" xref="S2.SS2.SSS2.p1.20.m20.1.1.2"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.20.m20.1.1.2.1.cmml" xref="S2.SS2.SSS2.p1.20.m20.1.1.2">subscript</csymbol><ci id="S2.SS2.SSS2.p1.20.m20.1.1.2.2.cmml" xref="S2.SS2.SSS2.p1.20.m20.1.1.2.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.20.m20.1.1.2.3.cmml" xref="S2.SS2.SSS2.p1.20.m20.1.1.2.3">𝑡</ci></apply><apply id="S2.SS2.SSS2.p1.20.m20.1.1.3.cmml" xref="S2.SS2.SSS2.p1.20.m20.1.1.3"><ci id="S2.SS2.SSS2.p1.20.m20.1.1.3.1.cmml" xref="S2.SS2.SSS2.p1.20.m20.1.1.3.1">¨</ci><apply id="S2.SS2.SSS2.p1.20.m20.1.1.3.2.cmml" xref="S2.SS2.SSS2.p1.20.m20.1.1.3.2"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.20.m20.1.1.3.2.1.cmml" xref="S2.SS2.SSS2.p1.20.m20.1.1.3.2">subscript</csymbol><ci id="S2.SS2.SSS2.p1.20.m20.1.1.3.2.2.cmml" xref="S2.SS2.SSS2.p1.20.m20.1.1.3.2.2">𝑥</ci><ci id="S2.SS2.SSS2.p1.20.m20.1.1.3.2.3.cmml" xref="S2.SS2.SSS2.p1.20.m20.1.1.3.2.3">𝑐</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.20.m20.1c">x_{t}-\ddot{x_{c}}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.20.m20.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - over¨ start_ARG italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT end_ARG</annotation></semantics></math>, conditioned on <math alttext="\dot{x}_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p1.21.m21.1"><semantics id="S2.SS2.SSS2.p1.21.m21.1a"><msub id="S2.SS2.SSS2.p1.21.m21.1.1" 
xref="S2.SS2.SSS2.p1.21.m21.1.1.cmml"><mover accent="true" id="S2.SS2.SSS2.p1.21.m21.1.1.2" xref="S2.SS2.SSS2.p1.21.m21.1.1.2.cmml"><mi id="S2.SS2.SSS2.p1.21.m21.1.1.2.2" xref="S2.SS2.SSS2.p1.21.m21.1.1.2.2.cmml">x</mi><mo id="S2.SS2.SSS2.p1.21.m21.1.1.2.1" xref="S2.SS2.SSS2.p1.21.m21.1.1.2.1.cmml">˙</mo></mover><mi id="S2.SS2.SSS2.p1.21.m21.1.1.3" xref="S2.SS2.SSS2.p1.21.m21.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p1.21.m21.1b"><apply id="S2.SS2.SSS2.p1.21.m21.1.1.cmml" xref="S2.SS2.SSS2.p1.21.m21.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p1.21.m21.1.1.1.cmml" xref="S2.SS2.SSS2.p1.21.m21.1.1">subscript</csymbol><apply id="S2.SS2.SSS2.p1.21.m21.1.1.2.cmml" xref="S2.SS2.SSS2.p1.21.m21.1.1.2"><ci id="S2.SS2.SSS2.p1.21.m21.1.1.2.1.cmml" xref="S2.SS2.SSS2.p1.21.m21.1.1.2.1">˙</ci><ci id="S2.SS2.SSS2.p1.21.m21.1.1.2.2.cmml" xref="S2.SS2.SSS2.p1.21.m21.1.1.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS2.p1.21.m21.1.1.3.cmml" xref="S2.SS2.SSS2.p1.21.m21.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p1.21.m21.1c">\dot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p1.21.m21.1d">over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math>.</p> </div> </section> <section class="ltx_subsubsection" id="S2.SS2.SSS3"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection"><span class="ltx_text" id="S2.SS2.SSS3.4.1.1">II-B</span>3 </span>Masked Conditional Residual Coding</h4> <div class="ltx_para" id="S2.SS2.SSS3.p1"> <p class="ltx_p" id="S2.SS2.SSS3.p1.18">While <math alttext="H(x_{t}-\ddot{x}_{c})\leq H(x_{t})" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.1.m1.2"><semantics id="S2.SS2.SSS3.p1.1.m1.2a"><mrow id="S2.SS2.SSS3.p1.1.m1.2.2" xref="S2.SS2.SSS3.p1.1.m1.2.2.cmml"><mrow id="S2.SS2.SSS3.p1.1.m1.1.1.1" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.cmml"><mi 
id="S2.SS2.SSS3.p1.1.m1.1.1.1.3" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.3.cmml">H</mi><mo id="S2.SS2.SSS3.p1.1.m1.1.1.1.2" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.2.cmml">⁢</mo><mrow id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.cmml"><mo id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.2" stretchy="false" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.cmml">(</mo><mrow id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.cmml"><msub id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.2" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.2.cmml"><mi id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.2.2" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.2.2.cmml">x</mi><mi id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.2.3" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.2.3.cmml">t</mi></msub><mo id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.1" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.1.cmml">−</mo><msub id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.cmml"><mover accent="true" id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.2" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.2.cmml"><mi id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.2.2" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.2.2.cmml">x</mi><mo id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.2.1" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.2.1.cmml">¨</mo></mover><mi id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.3" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.3.cmml">c</mi></msub></mrow><mo id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.3" stretchy="false" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S2.SS2.SSS3.p1.1.m1.2.2.3" xref="S2.SS2.SSS3.p1.1.m1.2.2.3.cmml">≤</mo><mrow id="S2.SS2.SSS3.p1.1.m1.2.2.2" xref="S2.SS2.SSS3.p1.1.m1.2.2.2.cmml"><mi id="S2.SS2.SSS3.p1.1.m1.2.2.2.3" xref="S2.SS2.SSS3.p1.1.m1.2.2.2.3.cmml">H</mi><mo id="S2.SS2.SSS3.p1.1.m1.2.2.2.2" xref="S2.SS2.SSS3.p1.1.m1.2.2.2.2.cmml">⁢</mo><mrow id="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1" xref="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.1.cmml"><mo id="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.2" stretchy="false" 
xref="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.1.cmml">(</mo><msub id="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.1" xref="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.1.cmml"><mi id="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.1.2" xref="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.1.3" xref="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.1.3.cmml">t</mi></msub><mo id="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.3" stretchy="false" xref="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.1.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.1.m1.2b"><apply id="S2.SS2.SSS3.p1.1.m1.2.2.cmml" xref="S2.SS2.SSS3.p1.1.m1.2.2"><leq id="S2.SS2.SSS3.p1.1.m1.2.2.3.cmml" xref="S2.SS2.SSS3.p1.1.m1.2.2.3"></leq><apply id="S2.SS2.SSS3.p1.1.m1.1.1.1.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1"><times id="S2.SS2.SSS3.p1.1.m1.1.1.1.2.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.2"></times><ci id="S2.SS2.SSS3.p1.1.m1.1.1.1.3.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.3">𝐻</ci><apply id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1"><minus id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.1.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.1"></minus><apply id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.2.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.2.1.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.2">subscript</csymbol><ci id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.2.2.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.2.2">𝑥</ci><ci id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.2.3.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.2.3">𝑡</ci></apply><apply id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.1.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3">subscript</csymbol><apply id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.2.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.2"><ci id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.2.1.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.2.1">¨</ci><ci 
id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.2.2.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.3.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.1.3.3">𝑐</ci></apply></apply></apply><apply id="S2.SS2.SSS3.p1.1.m1.2.2.2.cmml" xref="S2.SS2.SSS3.p1.1.m1.2.2.2"><times id="S2.SS2.SSS3.p1.1.m1.2.2.2.2.cmml" xref="S2.SS2.SSS3.p1.1.m1.2.2.2.2"></times><ci id="S2.SS2.SSS3.p1.1.m1.2.2.2.3.cmml" xref="S2.SS2.SSS3.p1.1.m1.2.2.2.3">𝐻</ci><apply id="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.1.cmml" xref="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.1.1.cmml" xref="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1">subscript</csymbol><ci id="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.1.2.cmml" xref="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.1.2">𝑥</ci><ci id="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.1.3.cmml" xref="S2.SS2.SSS3.p1.1.m1.2.2.2.1.1.1.3">𝑡</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.1.m1.2c">H(x_{t}-\ddot{x}_{c})\leq H(x_{t})</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.1.m1.2d">italic_H ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - over¨ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) ≤ italic_H ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT )</annotation></semantics></math> generally holds true, it can be violated at the sub-picture level. For example, in regions with dis-occlusion or unreliable motion estimates, conditional coding is more efficient than conditional residual coding. 
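</p>
<p class="ltx_p">The following Python/NumPy sketch gives a rough numerical picture of how the inequality can hold for a whole frame yet fail inside one region. It uses the empirical (histogram) entropy of integer pixel values as a crude stand-in for H(·). The details are illustrative assumptions, not the authors' code or data. The helper <code>empirical_entropy</code> is hypothetical, the frame is synthetic uniform 8-bit noise, and the hypothetical predictor <code>x_c</code> stays within ±2 of the target in the left half but is unrelated to it in the right half, which mimics a disoccluded region.</p>

```python
import numpy as np


def empirical_entropy(values):
    """Empirical entropy (bits) of integer samples, from their histogram."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


rng = np.random.default_rng(0)
height, width = 64, 128  # toy frame size (assumption)

# Synthetic target frame: uniformly random 8-bit values, not real video content.
x_t = rng.integers(0, 256, size=(height, width))

# Hypothetical predictor: within +/-2 of the target in the left half,
# unrelated to the target in the right half (a stand-in for disocclusion).
x_c = x_t + rng.integers(-2, 3, size=(height, width))
x_c[:, width // 2:] = rng.integers(0, 256, size=(height, width - width // 2))

residual = x_t - x_c
right = np.s_[:, width // 2:]  # the "disoccluded" region

print("whole frame: H(x_t) = %.2f bits, H(x_t - x_c) = %.2f bits"
      % (empirical_entropy(x_t), empirical_entropy(residual)))
print("right half:  H(x_t) = %.2f bits, H(x_t - x_c) = %.2f bits"
      % (empirical_entropy(x_t[right]), empirical_entropy(residual[right])))
```

<p class="ltx_p">Over the whole frame, the residual should show a lower empirical entropy than <code>x_t</code>. Restricted to the right half, it should show a higher one, because subtracting an unrelated prediction widens the range of values. This is the regime in which coding the frame itself, as in conditional coding, is preferable.</p>
<p class="ltx_p">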
To exploit the strengths of both conditional coding and conditional residual coding, Chen <em class="ltx_emph ltx_font_italic" id="S2.SS2.SSS3.p1.18.1">et al.</em> <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib5" title="">5</a>]</cite> propose using a pixel-wise soft mask <math alttext="m~{}\in~{}[0,1]^{1\times H\times W}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.2.m2.2"><semantics id="S2.SS2.SSS3.p1.2.m2.2a"><mrow id="S2.SS2.SSS3.p1.2.m2.2.3" xref="S2.SS2.SSS3.p1.2.m2.2.3.cmml"><mi id="S2.SS2.SSS3.p1.2.m2.2.3.2" xref="S2.SS2.SSS3.p1.2.m2.2.3.2.cmml">m</mi><mo id="S2.SS2.SSS3.p1.2.m2.2.3.1" lspace="0.608em" rspace="0.608em" xref="S2.SS2.SSS3.p1.2.m2.2.3.1.cmml">∈</mo><msup id="S2.SS2.SSS3.p1.2.m2.2.3.3" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.cmml"><mrow id="S2.SS2.SSS3.p1.2.m2.2.3.3.2.2" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.2.1.cmml"><mo id="S2.SS2.SSS3.p1.2.m2.2.3.3.2.2.1" stretchy="false" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.2.1.cmml">[</mo><mn id="S2.SS2.SSS3.p1.2.m2.1.1" xref="S2.SS2.SSS3.p1.2.m2.1.1.cmml">0</mn><mo id="S2.SS2.SSS3.p1.2.m2.2.3.3.2.2.2" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.2.1.cmml">,</mo><mn id="S2.SS2.SSS3.p1.2.m2.2.2" xref="S2.SS2.SSS3.p1.2.m2.2.2.cmml">1</mn><mo id="S2.SS2.SSS3.p1.2.m2.2.3.3.2.2.3" stretchy="false" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.2.1.cmml">]</mo></mrow><mrow id="S2.SS2.SSS3.p1.2.m2.2.3.3.3" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.3.cmml"><mn id="S2.SS2.SSS3.p1.2.m2.2.3.3.3.2" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.3.2.cmml">1</mn><mo id="S2.SS2.SSS3.p1.2.m2.2.3.3.3.1" lspace="0.222em" rspace="0.222em" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.3.1.cmml">×</mo><mi id="S2.SS2.SSS3.p1.2.m2.2.3.3.3.3" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.3.3.cmml">H</mi><mo id="S2.SS2.SSS3.p1.2.m2.2.3.3.3.1a" lspace="0.222em" rspace="0.222em" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.3.1.cmml">×</mo><mi id="S2.SS2.SSS3.p1.2.m2.2.3.3.3.4" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.3.4.cmml">W</mi></mrow></msup></mrow><annotation-xml 
encoding="MathML-Content" id="S2.SS2.SSS3.p1.2.m2.2b"><apply id="S2.SS2.SSS3.p1.2.m2.2.3.cmml" xref="S2.SS2.SSS3.p1.2.m2.2.3"><in id="S2.SS2.SSS3.p1.2.m2.2.3.1.cmml" xref="S2.SS2.SSS3.p1.2.m2.2.3.1"></in><ci id="S2.SS2.SSS3.p1.2.m2.2.3.2.cmml" xref="S2.SS2.SSS3.p1.2.m2.2.3.2">𝑚</ci><apply id="S2.SS2.SSS3.p1.2.m2.2.3.3.cmml" xref="S2.SS2.SSS3.p1.2.m2.2.3.3"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.2.m2.2.3.3.1.cmml" xref="S2.SS2.SSS3.p1.2.m2.2.3.3">superscript</csymbol><interval closure="closed" id="S2.SS2.SSS3.p1.2.m2.2.3.3.2.1.cmml" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.2.2"><cn id="S2.SS2.SSS3.p1.2.m2.1.1.cmml" type="integer" xref="S2.SS2.SSS3.p1.2.m2.1.1">0</cn><cn id="S2.SS2.SSS3.p1.2.m2.2.2.cmml" type="integer" xref="S2.SS2.SSS3.p1.2.m2.2.2">1</cn></interval><apply id="S2.SS2.SSS3.p1.2.m2.2.3.3.3.cmml" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.3"><times id="S2.SS2.SSS3.p1.2.m2.2.3.3.3.1.cmml" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.3.1"></times><cn id="S2.SS2.SSS3.p1.2.m2.2.3.3.3.2.cmml" type="integer" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.3.2">1</cn><ci id="S2.SS2.SSS3.p1.2.m2.2.3.3.3.3.cmml" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.3.3">𝐻</ci><ci id="S2.SS2.SSS3.p1.2.m2.2.3.3.3.4.cmml" xref="S2.SS2.SSS3.p1.2.m2.2.3.3.3.4">𝑊</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.2.m2.2c">m~{}\in~{}[0,1]^{1\times H\times W}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.2.m2.2d">italic_m ∈ [ 0 , 1 ] start_POSTSUPERSCRIPT 1 × italic_H × italic_W end_POSTSUPERSCRIPT</annotation></semantics></math> to blend the target frame <math alttext="x_{t}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.3.m3.1"><semantics id="S2.SS2.SSS3.p1.3.m3.1a"><msub id="S2.SS2.SSS3.p1.3.m3.1.1" xref="S2.SS2.SSS3.p1.3.m3.1.1.cmml"><mi id="S2.SS2.SSS3.p1.3.m3.1.1.2" xref="S2.SS2.SSS3.p1.3.m3.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS3.p1.3.m3.1.1.3" xref="S2.SS2.SSS3.p1.3.m3.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" 
id="S2.SS2.SSS3.p1.3.m3.1b"><apply id="S2.SS2.SSS3.p1.3.m3.1.1.cmml" xref="S2.SS2.SSS3.p1.3.m3.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.3.m3.1.1.1.cmml" xref="S2.SS2.SSS3.p1.3.m3.1.1">subscript</csymbol><ci id="S2.SS2.SSS3.p1.3.m3.1.1.2.cmml" xref="S2.SS2.SSS3.p1.3.m3.1.1.2">𝑥</ci><ci id="S2.SS2.SSS3.p1.3.m3.1.1.3.cmml" xref="S2.SS2.SSS3.p1.3.m3.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.3.m3.1c">x_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.3.m3.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> and the residual frame <math alttext="x_{t}-\ddot{x}_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.4.m4.1"><semantics id="S2.SS2.SSS3.p1.4.m4.1a"><mrow id="S2.SS2.SSS3.p1.4.m4.1.1" xref="S2.SS2.SSS3.p1.4.m4.1.1.cmml"><msub id="S2.SS2.SSS3.p1.4.m4.1.1.2" xref="S2.SS2.SSS3.p1.4.m4.1.1.2.cmml"><mi id="S2.SS2.SSS3.p1.4.m4.1.1.2.2" xref="S2.SS2.SSS3.p1.4.m4.1.1.2.2.cmml">x</mi><mi id="S2.SS2.SSS3.p1.4.m4.1.1.2.3" xref="S2.SS2.SSS3.p1.4.m4.1.1.2.3.cmml">t</mi></msub><mo id="S2.SS2.SSS3.p1.4.m4.1.1.1" xref="S2.SS2.SSS3.p1.4.m4.1.1.1.cmml">−</mo><msub id="S2.SS2.SSS3.p1.4.m4.1.1.3" xref="S2.SS2.SSS3.p1.4.m4.1.1.3.cmml"><mover accent="true" id="S2.SS2.SSS3.p1.4.m4.1.1.3.2" xref="S2.SS2.SSS3.p1.4.m4.1.1.3.2.cmml"><mi id="S2.SS2.SSS3.p1.4.m4.1.1.3.2.2" xref="S2.SS2.SSS3.p1.4.m4.1.1.3.2.2.cmml">x</mi><mo id="S2.SS2.SSS3.p1.4.m4.1.1.3.2.1" xref="S2.SS2.SSS3.p1.4.m4.1.1.3.2.1.cmml">¨</mo></mover><mi id="S2.SS2.SSS3.p1.4.m4.1.1.3.3" xref="S2.SS2.SSS3.p1.4.m4.1.1.3.3.cmml">c</mi></msub></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.4.m4.1b"><apply id="S2.SS2.SSS3.p1.4.m4.1.1.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1"><minus id="S2.SS2.SSS3.p1.4.m4.1.1.1.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1.1"></minus><apply id="S2.SS2.SSS3.p1.4.m4.1.1.2.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1.2"><csymbol cd="ambiguous" 
id="S2.SS2.SSS3.p1.4.m4.1.1.2.1.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1.2">subscript</csymbol><ci id="S2.SS2.SSS3.p1.4.m4.1.1.2.2.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1.2.2">𝑥</ci><ci id="S2.SS2.SSS3.p1.4.m4.1.1.2.3.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1.2.3">𝑡</ci></apply><apply id="S2.SS2.SSS3.p1.4.m4.1.1.3.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1.3"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.4.m4.1.1.3.1.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1.3">subscript</csymbol><apply id="S2.SS2.SSS3.p1.4.m4.1.1.3.2.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1.3.2"><ci id="S2.SS2.SSS3.p1.4.m4.1.1.3.2.1.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1.3.2.1">¨</ci><ci id="S2.SS2.SSS3.p1.4.m4.1.1.3.2.2.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1.3.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS3.p1.4.m4.1.1.3.3.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1.3.3">𝑐</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.4.m4.1c">x_{t}-\ddot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.4.m4.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - over¨ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> as <math alttext="(1-m)\odot x_{t}+m\odot(x_{t}-\ddot{x}_{c})" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.5.m5.2"><semantics id="S2.SS2.SSS3.p1.5.m5.2a"><mrow id="S2.SS2.SSS3.p1.5.m5.2.2" xref="S2.SS2.SSS3.p1.5.m5.2.2.cmml"><mrow id="S2.SS2.SSS3.p1.5.m5.1.1.1" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.cmml"><mrow id="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.cmml"><mo id="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.2" stretchy="false" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.cmml">(</mo><mrow id="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.cmml"><mn id="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.2" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.2.cmml">1</mn><mo id="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.1" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.1.cmml">−</mo><mi id="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.3" 
xref="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.3.cmml">m</mi></mrow><mo id="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.3" rspace="0.055em" stretchy="false" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.cmml">)</mo></mrow><mo id="S2.SS2.SSS3.p1.5.m5.1.1.1.2" rspace="0.222em" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.2.cmml">⊙</mo><msub id="S2.SS2.SSS3.p1.5.m5.1.1.1.3" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.3.cmml"><mi id="S2.SS2.SSS3.p1.5.m5.1.1.1.3.2" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.3.2.cmml">x</mi><mi id="S2.SS2.SSS3.p1.5.m5.1.1.1.3.3" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.3.3.cmml">t</mi></msub></mrow><mo id="S2.SS2.SSS3.p1.5.m5.2.2.3" xref="S2.SS2.SSS3.p1.5.m5.2.2.3.cmml">+</mo><mrow id="S2.SS2.SSS3.p1.5.m5.2.2.2" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.cmml"><mi id="S2.SS2.SSS3.p1.5.m5.2.2.2.3" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.3.cmml">m</mi><mo id="S2.SS2.SSS3.p1.5.m5.2.2.2.2" lspace="0.222em" rspace="0.222em" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.2.cmml">⊙</mo><mrow id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.cmml"><mo id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.2" stretchy="false" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.cmml">(</mo><mrow id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.cmml"><msub id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.2" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.2.cmml"><mi id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.2.2" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.2.2.cmml">x</mi><mi id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.2.3" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.2.3.cmml">t</mi></msub><mo id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.1" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.1.cmml">−</mo><msub id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.cmml"><mover accent="true" id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.2" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.2.cmml"><mi id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.2.2" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.2.2.cmml">x</mi><mo id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.2.1" 
xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.2.1.cmml">¨</mo></mover><mi id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.3" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.3.cmml">c</mi></msub></mrow><mo id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.3" stretchy="false" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.5.m5.2b"><apply id="S2.SS2.SSS3.p1.5.m5.2.2.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2"><plus id="S2.SS2.SSS3.p1.5.m5.2.2.3.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.3"></plus><apply id="S2.SS2.SSS3.p1.5.m5.1.1.1.cmml" xref="S2.SS2.SSS3.p1.5.m5.1.1.1"><csymbol cd="latexml" id="S2.SS2.SSS3.p1.5.m5.1.1.1.2.cmml" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.2">direct-product</csymbol><apply id="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.cmml" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1"><minus id="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.1.cmml" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.1"></minus><cn id="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.2.cmml" type="integer" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.2">1</cn><ci id="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.3.cmml" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.1.1.1.3">𝑚</ci></apply><apply id="S2.SS2.SSS3.p1.5.m5.1.1.1.3.cmml" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.3"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.5.m5.1.1.1.3.1.cmml" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.3">subscript</csymbol><ci id="S2.SS2.SSS3.p1.5.m5.1.1.1.3.2.cmml" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.3.2">𝑥</ci><ci id="S2.SS2.SSS3.p1.5.m5.1.1.1.3.3.cmml" xref="S2.SS2.SSS3.p1.5.m5.1.1.1.3.3">𝑡</ci></apply></apply><apply id="S2.SS2.SSS3.p1.5.m5.2.2.2.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.2"><csymbol cd="latexml" id="S2.SS2.SSS3.p1.5.m5.2.2.2.2.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.2">direct-product</csymbol><ci id="S2.SS2.SSS3.p1.5.m5.2.2.2.3.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.3">𝑚</ci><apply id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1"><minus id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.1.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.1"></minus><apply 
id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.2.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.2"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.2.1.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.2">subscript</csymbol><ci id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.2.2.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.2.2">𝑥</ci><ci id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.2.3.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.2.3">𝑡</ci></apply><apply id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.1.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3">subscript</csymbol><apply id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.2.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.2"><ci id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.2.1.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.2.1">¨</ci><ci id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.2.2.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.3.cmml" xref="S2.SS2.SSS3.p1.5.m5.2.2.2.1.1.1.3.3">𝑐</ci></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.5.m5.2c">(1-m)\odot x_{t}+m\odot(x_{t}-\ddot{x}_{c})</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.5.m5.2d">( 1 - italic_m ) ⊙ italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_m ⊙ ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - over¨ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT )</annotation></semantics></math>, where <math alttext="\odot" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.6.m6.1"><semantics id="S2.SS2.SSS3.p1.6.m6.1a"><mo id="S2.SS2.SSS3.p1.6.m6.1.1" xref="S2.SS2.SSS3.p1.6.m6.1.1.cmml">⊙</mo><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.6.m6.1b"><csymbol cd="latexml" id="S2.SS2.SSS3.p1.6.m6.1.1.cmml" xref="S2.SS2.SSS3.p1.6.m6.1.1">direct-product</csymbol></annotation-xml><annotation 
encoding="application/x-tex" id="S2.SS2.SSS3.p1.6.m6.1c">\odot</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.6.m6.1d">⊙</annotation></semantics></math> denotes element-wise multiplication; this blended signal serves as the input to the conditional inter-frame codec. When <math alttext="\ddot{x}_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.7.m7.1"><semantics id="S2.SS2.SSS3.p1.7.m7.1a"><msub id="S2.SS2.SSS3.p1.7.m7.1.1" xref="S2.SS2.SSS3.p1.7.m7.1.1.cmml"><mover accent="true" id="S2.SS2.SSS3.p1.7.m7.1.1.2" xref="S2.SS2.SSS3.p1.7.m7.1.1.2.cmml"><mi id="S2.SS2.SSS3.p1.7.m7.1.1.2.2" xref="S2.SS2.SSS3.p1.7.m7.1.1.2.2.cmml">x</mi><mo id="S2.SS2.SSS3.p1.7.m7.1.1.2.1" xref="S2.SS2.SSS3.p1.7.m7.1.1.2.1.cmml">¨</mo></mover><mi id="S2.SS2.SSS3.p1.7.m7.1.1.3" xref="S2.SS2.SSS3.p1.7.m7.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.7.m7.1b"><apply id="S2.SS2.SSS3.p1.7.m7.1.1.cmml" xref="S2.SS2.SSS3.p1.7.m7.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.7.m7.1.1.1.cmml" xref="S2.SS2.SSS3.p1.7.m7.1.1">subscript</csymbol><apply id="S2.SS2.SSS3.p1.7.m7.1.1.2.cmml" xref="S2.SS2.SSS3.p1.7.m7.1.1.2"><ci id="S2.SS2.SSS3.p1.7.m7.1.1.2.1.cmml" xref="S2.SS2.SSS3.p1.7.m7.1.1.2.1">¨</ci><ci id="S2.SS2.SSS3.p1.7.m7.1.1.2.2.cmml" xref="S2.SS2.SSS3.p1.7.m7.1.1.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS3.p1.7.m7.1.1.3.cmml" xref="S2.SS2.SSS3.p1.7.m7.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.7.m7.1c">\ddot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.7.m7.1d">over¨ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> forms a good prediction of <math alttext="x_{t}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.8.m8.1"><semantics id="S2.SS2.SSS3.p1.8.m8.1a"><msub id="S2.SS2.SSS3.p1.8.m8.1.1" xref="S2.SS2.SSS3.p1.8.m8.1.1.cmml"><mi id="S2.SS2.SSS3.p1.8.m8.1.1.2" 
xref="S2.SS2.SSS3.p1.8.m8.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS3.p1.8.m8.1.1.3" xref="S2.SS2.SSS3.p1.8.m8.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.8.m8.1b"><apply id="S2.SS2.SSS3.p1.8.m8.1.1.cmml" xref="S2.SS2.SSS3.p1.8.m8.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.8.m8.1.1.1.cmml" xref="S2.SS2.SSS3.p1.8.m8.1.1">subscript</csymbol><ci id="S2.SS2.SSS3.p1.8.m8.1.1.2.cmml" xref="S2.SS2.SSS3.p1.8.m8.1.1.2">𝑥</ci><ci id="S2.SS2.SSS3.p1.8.m8.1.1.3.cmml" xref="S2.SS2.SSS3.p1.8.m8.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.8.m8.1c">x_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.8.m8.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> in some image regions, the corresponding mask values should ideally approach 1, so that conditional residual coding is effectively applied there. Conversely, when the prediction <math alttext="\ddot{x}_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.9.m9.1"><semantics id="S2.SS2.SSS3.p1.9.m9.1a"><msub id="S2.SS2.SSS3.p1.9.m9.1.1" xref="S2.SS2.SSS3.p1.9.m9.1.1.cmml"><mover accent="true" id="S2.SS2.SSS3.p1.9.m9.1.1.2" xref="S2.SS2.SSS3.p1.9.m9.1.1.2.cmml"><mi id="S2.SS2.SSS3.p1.9.m9.1.1.2.2" xref="S2.SS2.SSS3.p1.9.m9.1.1.2.2.cmml">x</mi><mo id="S2.SS2.SSS3.p1.9.m9.1.1.2.1" xref="S2.SS2.SSS3.p1.9.m9.1.1.2.1.cmml">¨</mo></mover><mi id="S2.SS2.SSS3.p1.9.m9.1.1.3" xref="S2.SS2.SSS3.p1.9.m9.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.9.m9.1b"><apply id="S2.SS2.SSS3.p1.9.m9.1.1.cmml" xref="S2.SS2.SSS3.p1.9.m9.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.9.m9.1.1.1.cmml" xref="S2.SS2.SSS3.p1.9.m9.1.1">subscript</csymbol><apply id="S2.SS2.SSS3.p1.9.m9.1.1.2.cmml" xref="S2.SS2.SSS3.p1.9.m9.1.1.2"><ci id="S2.SS2.SSS3.p1.9.m9.1.1.2.1.cmml" xref="S2.SS2.SSS3.p1.9.m9.1.1.2.1">¨</ci><ci id="S2.SS2.SSS3.p1.9.m9.1.1.2.2.cmml" 
xref="S2.SS2.SSS3.p1.9.m9.1.1.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS3.p1.9.m9.1.1.3.cmml" xref="S2.SS2.SSS3.p1.9.m9.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.9.m9.1c">\ddot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.9.m9.1d">over¨ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> is poor, the mask values should approach 0, and the coding mode reduces to conditional coding. As illustrated in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S1.F2" title="Figure 2 ‣ I Introduction ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">2</span></a> (c), we extend the conditional residual coding of Fig. <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S1.F2" title="Figure 2 ‣ I Introduction ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">2</span></a> (b) to masked conditional residual coding. 
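</p>
<p class="ltx_p">The following NumPy sketch checks the blending rule (1 − m) ⊙ x<sub>t</sub> + m ⊙ (x<sub>t</sub> − ẍ<sub>c</sub>) and its two limiting cases numerically. It is only an illustration. The function name <code>masked_residual_input</code>, the array sizes, and the random inputs are assumptions. The mask is drawn at random rather than produced by a learned mask generator. Shapes follow the text: m has shape 1×H×W and broadcasts over the C color channels, and <code>x_c</code> stands in for the temporal predictor ẍ<sub>c</sub>.</p>

```python
import numpy as np


def masked_residual_input(x_t, x_c, m):
    """Hypothetical helper: blend conditional coding (m=0) and
    conditional residual coding (m=1) pixel by pixel.

    x_t: target frame, shape (C, H, W)
    x_c: temporal predictor, shape (C, H, W)
    m:   soft mask in [0, 1], shape (1, H, W), broadcast over channels
    """
    return (1.0 - m) * x_t + m * (x_t - x_c)


rng = np.random.default_rng(0)
C, H, W = 3, 4, 4                # toy sizes (assumption)
x_t = rng.random((C, H, W))
x_c = rng.random((C, H, W))
m = rng.random((1, H, W))        # random stand-in for a predicted mask

blended = masked_residual_input(x_t, x_c, m)

# The blend simplifies algebraically to x_t - m * x_c.
print(np.allclose(blended, x_t - m * x_c))  # True
# m = 1 everywhere: the input of conditional residual coding.
print(np.allclose(masked_residual_input(x_t, x_c, np.ones((1, H, W))), x_t - x_c))  # True
# m = 0 everywhere: the input of conditional coding (the frame itself).
print(np.allclose(masked_residual_input(x_t, x_c, np.zeros((1, H, W))), x_t))  # True
```

<p class="ltx_p">Expanding the blend gives x<sub>t</sub> − m ⊙ ẍ<sub>c</sub>. In practice, only the masked predictor has to be subtracted from the target. Each mask value therefore sets how much of the temporal prediction is removed from the corresponding pixel before coding.</p>
<p class="ltx_p">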
Following <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib5" title="">5</a>]</cite>, we introduce a mask generator that predicts a pixel-wise soft mask <math alttext="m\in\mathbb{R}^{1\times H\times W}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.10.m10.1"><semantics id="S2.SS2.SSS3.p1.10.m10.1a"><mrow id="S2.SS2.SSS3.p1.10.m10.1.1" xref="S2.SS2.SSS3.p1.10.m10.1.1.cmml"><mi id="S2.SS2.SSS3.p1.10.m10.1.1.2" xref="S2.SS2.SSS3.p1.10.m10.1.1.2.cmml">m</mi><mo id="S2.SS2.SSS3.p1.10.m10.1.1.1" xref="S2.SS2.SSS3.p1.10.m10.1.1.1.cmml">∈</mo><msup id="S2.SS2.SSS3.p1.10.m10.1.1.3" xref="S2.SS2.SSS3.p1.10.m10.1.1.3.cmml"><mi id="S2.SS2.SSS3.p1.10.m10.1.1.3.2" xref="S2.SS2.SSS3.p1.10.m10.1.1.3.2.cmml">ℝ</mi><mrow id="S2.SS2.SSS3.p1.10.m10.1.1.3.3" xref="S2.SS2.SSS3.p1.10.m10.1.1.3.3.cmml"><mn id="S2.SS2.SSS3.p1.10.m10.1.1.3.3.2" xref="S2.SS2.SSS3.p1.10.m10.1.1.3.3.2.cmml">1</mn><mo id="S2.SS2.SSS3.p1.10.m10.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S2.SS2.SSS3.p1.10.m10.1.1.3.3.1.cmml">×</mo><mi id="S2.SS2.SSS3.p1.10.m10.1.1.3.3.3" xref="S2.SS2.SSS3.p1.10.m10.1.1.3.3.3.cmml">H</mi><mo id="S2.SS2.SSS3.p1.10.m10.1.1.3.3.1a" lspace="0.222em" rspace="0.222em" xref="S2.SS2.SSS3.p1.10.m10.1.1.3.3.1.cmml">×</mo><mi id="S2.SS2.SSS3.p1.10.m10.1.1.3.3.4" xref="S2.SS2.SSS3.p1.10.m10.1.1.3.3.4.cmml">W</mi></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.10.m10.1b"><apply id="S2.SS2.SSS3.p1.10.m10.1.1.cmml" xref="S2.SS2.SSS3.p1.10.m10.1.1"><in id="S2.SS2.SSS3.p1.10.m10.1.1.1.cmml" xref="S2.SS2.SSS3.p1.10.m10.1.1.1"></in><ci id="S2.SS2.SSS3.p1.10.m10.1.1.2.cmml" xref="S2.SS2.SSS3.p1.10.m10.1.1.2">𝑚</ci><apply id="S2.SS2.SSS3.p1.10.m10.1.1.3.cmml" xref="S2.SS2.SSS3.p1.10.m10.1.1.3"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.10.m10.1.1.3.1.cmml" xref="S2.SS2.SSS3.p1.10.m10.1.1.3">superscript</csymbol><ci id="S2.SS2.SSS3.p1.10.m10.1.1.3.2.cmml" 
xref="S2.SS2.SSS3.p1.10.m10.1.1.3.2">ℝ</ci><apply id="S2.SS2.SSS3.p1.10.m10.1.1.3.3.cmml" xref="S2.SS2.SSS3.p1.10.m10.1.1.3.3"><times id="S2.SS2.SSS3.p1.10.m10.1.1.3.3.1.cmml" xref="S2.SS2.SSS3.p1.10.m10.1.1.3.3.1"></times><cn id="S2.SS2.SSS3.p1.10.m10.1.1.3.3.2.cmml" type="integer" xref="S2.SS2.SSS3.p1.10.m10.1.1.3.3.2">1</cn><ci id="S2.SS2.SSS3.p1.10.m10.1.1.3.3.3.cmml" xref="S2.SS2.SSS3.p1.10.m10.1.1.3.3.3">𝐻</ci><ci id="S2.SS2.SSS3.p1.10.m10.1.1.3.3.4.cmml" xref="S2.SS2.SSS3.p1.10.m10.1.1.3.3.4">𝑊</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.10.m10.1c">m\in\mathbb{R}^{1\times H\times W}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.10.m10.1d">italic_m ∈ blackboard_R start_POSTSUPERSCRIPT 1 × italic_H × italic_W end_POSTSUPERSCRIPT</annotation></semantics></math>. The mask generator takes the decoded flow maps <math alttext="\hat{f}_{t}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.11.m11.1"><semantics id="S2.SS2.SSS3.p1.11.m11.1a"><msub id="S2.SS2.SSS3.p1.11.m11.1.1" xref="S2.SS2.SSS3.p1.11.m11.1.1.cmml"><mover accent="true" id="S2.SS2.SSS3.p1.11.m11.1.1.2" xref="S2.SS2.SSS3.p1.11.m11.1.1.2.cmml"><mi id="S2.SS2.SSS3.p1.11.m11.1.1.2.2" xref="S2.SS2.SSS3.p1.11.m11.1.1.2.2.cmml">f</mi><mo id="S2.SS2.SSS3.p1.11.m11.1.1.2.1" xref="S2.SS2.SSS3.p1.11.m11.1.1.2.1.cmml">^</mo></mover><mi id="S2.SS2.SSS3.p1.11.m11.1.1.3" xref="S2.SS2.SSS3.p1.11.m11.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.11.m11.1b"><apply id="S2.SS2.SSS3.p1.11.m11.1.1.cmml" xref="S2.SS2.SSS3.p1.11.m11.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.11.m11.1.1.1.cmml" xref="S2.SS2.SSS3.p1.11.m11.1.1">subscript</csymbol><apply id="S2.SS2.SSS3.p1.11.m11.1.1.2.cmml" xref="S2.SS2.SSS3.p1.11.m11.1.1.2"><ci id="S2.SS2.SSS3.p1.11.m11.1.1.2.1.cmml" xref="S2.SS2.SSS3.p1.11.m11.1.1.2.1">^</ci><ci id="S2.SS2.SSS3.p1.11.m11.1.1.2.2.cmml" 
xref="S2.SS2.SSS3.p1.11.m11.1.1.2.2">𝑓</ci></apply><ci id="S2.SS2.SSS3.p1.11.m11.1.1.3.cmml" xref="S2.SS2.SSS3.p1.11.m11.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.11.m11.1c">\hat{f}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.11.m11.1d">over^ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> and the pixel-domain temporal predictor <math alttext="\ddot{x}_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.12.m12.1"><semantics id="S2.SS2.SSS3.p1.12.m12.1a"><msub id="S2.SS2.SSS3.p1.12.m12.1.1" xref="S2.SS2.SSS3.p1.12.m12.1.1.cmml"><mover accent="true" id="S2.SS2.SSS3.p1.12.m12.1.1.2" xref="S2.SS2.SSS3.p1.12.m12.1.1.2.cmml"><mi id="S2.SS2.SSS3.p1.12.m12.1.1.2.2" xref="S2.SS2.SSS3.p1.12.m12.1.1.2.2.cmml">x</mi><mo id="S2.SS2.SSS3.p1.12.m12.1.1.2.1" xref="S2.SS2.SSS3.p1.12.m12.1.1.2.1.cmml">¨</mo></mover><mi id="S2.SS2.SSS3.p1.12.m12.1.1.3" xref="S2.SS2.SSS3.p1.12.m12.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.12.m12.1b"><apply id="S2.SS2.SSS3.p1.12.m12.1.1.cmml" xref="S2.SS2.SSS3.p1.12.m12.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.12.m12.1.1.1.cmml" xref="S2.SS2.SSS3.p1.12.m12.1.1">subscript</csymbol><apply id="S2.SS2.SSS3.p1.12.m12.1.1.2.cmml" xref="S2.SS2.SSS3.p1.12.m12.1.1.2"><ci id="S2.SS2.SSS3.p1.12.m12.1.1.2.1.cmml" xref="S2.SS2.SSS3.p1.12.m12.1.1.2.1">¨</ci><ci id="S2.SS2.SSS3.p1.12.m12.1.1.2.2.cmml" xref="S2.SS2.SSS3.p1.12.m12.1.1.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS3.p1.12.m12.1.1.3.cmml" xref="S2.SS2.SSS3.p1.12.m12.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.12.m12.1c">\ddot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.12.m12.1d">over¨ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> as inputs. 
The architecture of the mask generator is detailed in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S1.F3" title="Figure 3 ‣ I Introduction ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">3</span></a>. To transmit the coding frame <math alttext="x_{t}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.13.m13.1"><semantics id="S2.SS2.SSS3.p1.13.m13.1a"><msub id="S2.SS2.SSS3.p1.13.m13.1.1" xref="S2.SS2.SSS3.p1.13.m13.1.1.cmml"><mi id="S2.SS2.SSS3.p1.13.m13.1.1.2" xref="S2.SS2.SSS3.p1.13.m13.1.1.2.cmml">x</mi><mi id="S2.SS2.SSS3.p1.13.m13.1.1.3" xref="S2.SS2.SSS3.p1.13.m13.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.13.m13.1b"><apply id="S2.SS2.SSS3.p1.13.m13.1.1.cmml" xref="S2.SS2.SSS3.p1.13.m13.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.13.m13.1.1.1.cmml" xref="S2.SS2.SSS3.p1.13.m13.1.1">subscript</csymbol><ci id="S2.SS2.SSS3.p1.13.m13.1.1.2.cmml" xref="S2.SS2.SSS3.p1.13.m13.1.1.2">𝑥</ci><ci id="S2.SS2.SSS3.p1.13.m13.1.1.3.cmml" xref="S2.SS2.SSS3.p1.13.m13.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.13.m13.1c">x_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.13.m13.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math>, our masked conditional residual inter-frame codec encodes <math alttext="x_{t}-m\odot\ddot{x}_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.14.m14.1"><semantics id="S2.SS2.SSS3.p1.14.m14.1a"><mrow id="S2.SS2.SSS3.p1.14.m14.1.1" xref="S2.SS2.SSS3.p1.14.m14.1.1.cmml"><msub id="S2.SS2.SSS3.p1.14.m14.1.1.2" xref="S2.SS2.SSS3.p1.14.m14.1.1.2.cmml"><mi id="S2.SS2.SSS3.p1.14.m14.1.1.2.2" xref="S2.SS2.SSS3.p1.14.m14.1.1.2.2.cmml">x</mi><mi id="S2.SS2.SSS3.p1.14.m14.1.1.2.3" xref="S2.SS2.SSS3.p1.14.m14.1.1.2.3.cmml">t</mi></msub><mo id="S2.SS2.SSS3.p1.14.m14.1.1.1" 
xref="S2.SS2.SSS3.p1.14.m14.1.1.1.cmml">−</mo><mrow id="S2.SS2.SSS3.p1.14.m14.1.1.3" xref="S2.SS2.SSS3.p1.14.m14.1.1.3.cmml"><mi id="S2.SS2.SSS3.p1.14.m14.1.1.3.2" xref="S2.SS2.SSS3.p1.14.m14.1.1.3.2.cmml">m</mi><mo id="S2.SS2.SSS3.p1.14.m14.1.1.3.1" lspace="0.222em" rspace="0.222em" xref="S2.SS2.SSS3.p1.14.m14.1.1.3.1.cmml">⊙</mo><msub id="S2.SS2.SSS3.p1.14.m14.1.1.3.3" xref="S2.SS2.SSS3.p1.14.m14.1.1.3.3.cmml"><mover accent="true" id="S2.SS2.SSS3.p1.14.m14.1.1.3.3.2" xref="S2.SS2.SSS3.p1.14.m14.1.1.3.3.2.cmml"><mi id="S2.SS2.SSS3.p1.14.m14.1.1.3.3.2.2" xref="S2.SS2.SSS3.p1.14.m14.1.1.3.3.2.2.cmml">x</mi><mo id="S2.SS2.SSS3.p1.14.m14.1.1.3.3.2.1" xref="S2.SS2.SSS3.p1.14.m14.1.1.3.3.2.1.cmml">¨</mo></mover><mi id="S2.SS2.SSS3.p1.14.m14.1.1.3.3.3" xref="S2.SS2.SSS3.p1.14.m14.1.1.3.3.3.cmml">c</mi></msub></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.14.m14.1b"><apply id="S2.SS2.SSS3.p1.14.m14.1.1.cmml" xref="S2.SS2.SSS3.p1.14.m14.1.1"><minus id="S2.SS2.SSS3.p1.14.m14.1.1.1.cmml" xref="S2.SS2.SSS3.p1.14.m14.1.1.1"></minus><apply id="S2.SS2.SSS3.p1.14.m14.1.1.2.cmml" xref="S2.SS2.SSS3.p1.14.m14.1.1.2"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.14.m14.1.1.2.1.cmml" xref="S2.SS2.SSS3.p1.14.m14.1.1.2">subscript</csymbol><ci id="S2.SS2.SSS3.p1.14.m14.1.1.2.2.cmml" xref="S2.SS2.SSS3.p1.14.m14.1.1.2.2">𝑥</ci><ci id="S2.SS2.SSS3.p1.14.m14.1.1.2.3.cmml" xref="S2.SS2.SSS3.p1.14.m14.1.1.2.3">𝑡</ci></apply><apply id="S2.SS2.SSS3.p1.14.m14.1.1.3.cmml" xref="S2.SS2.SSS3.p1.14.m14.1.1.3"><csymbol cd="latexml" id="S2.SS2.SSS3.p1.14.m14.1.1.3.1.cmml" xref="S2.SS2.SSS3.p1.14.m14.1.1.3.1">direct-product</csymbol><ci id="S2.SS2.SSS3.p1.14.m14.1.1.3.2.cmml" xref="S2.SS2.SSS3.p1.14.m14.1.1.3.2">𝑚</ci><apply id="S2.SS2.SSS3.p1.14.m14.1.1.3.3.cmml" xref="S2.SS2.SSS3.p1.14.m14.1.1.3.3"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.14.m14.1.1.3.3.1.cmml" xref="S2.SS2.SSS3.p1.14.m14.1.1.3.3">subscript</csymbol><apply id="S2.SS2.SSS3.p1.14.m14.1.1.3.3.2.cmml" 
xref="S2.SS2.SSS3.p1.14.m14.1.1.3.3.2"><ci id="S2.SS2.SSS3.p1.14.m14.1.1.3.3.2.1.cmml" xref="S2.SS2.SSS3.p1.14.m14.1.1.3.3.2.1">¨</ci><ci id="S2.SS2.SSS3.p1.14.m14.1.1.3.3.2.2.cmml" xref="S2.SS2.SSS3.p1.14.m14.1.1.3.3.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS3.p1.14.m14.1.1.3.3.3.cmml" xref="S2.SS2.SSS3.p1.14.m14.1.1.3.3.3">𝑐</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.14.m14.1c">x_{t}-m\odot\ddot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.14.m14.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_m ⊙ over¨ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> (equivalent to <math alttext="(1-m)\odot x_{t}+m\odot(x_{t}-\ddot{x}_{c})" class="ltx_math_unparsed" display="inline" id="S2.SS2.SSS3.p1.15.m15.1"><semantics id="S2.SS2.SSS3.p1.15.m15.1a"><mrow id="S2.SS2.SSS3.p1.15.m15.1b"><mrow id="S2.SS2.SSS3.p1.15.m15.1.1"><mo id="S2.SS2.SSS3.p1.15.m15.1.1.1" stretchy="false">(</mo><mn id="S2.SS2.SSS3.p1.15.m15.1.1.2">1</mn><mo id="S2.SS2.SSS3.p1.15.m15.1.1.3">−</mo><mi id="S2.SS2.SSS3.p1.15.m15.1.1.4">m</mi><mo id="S2.SS2.SSS3.p1.15.m15.1.1.5" rspace="0.055em" stretchy="false">)</mo></mrow><mo id="S2.SS2.SSS3.p1.15.m15.1.2" rspace="0.222em">⊙</mo><msub id="S2.SS2.SSS3.p1.15.m15.1.3"><mi id="S2.SS2.SSS3.p1.15.m15.1.3.2">x</mi><mi id="S2.SS2.SSS3.p1.15.m15.1.3.3">t</mi></msub><mo id="S2.SS2.SSS3.p1.15.m15.1.4">+</mo><mi id="S2.SS2.SSS3.p1.15.m15.1.5">m</mi><mo id="S2.SS2.SSS3.p1.15.m15.1.6" lspace="0.222em" rspace="0.222em">⊙</mo><mrow id="S2.SS2.SSS3.p1.15.m15.1.7"><mo id="S2.SS2.SSS3.p1.15.m15.1.7.1" stretchy="false">(</mo><msub id="S2.SS2.SSS3.p1.15.m15.1.7.2"><mi id="S2.SS2.SSS3.p1.15.m15.1.7.2.2">x</mi><mi id="S2.SS2.SSS3.p1.15.m15.1.7.2.3">t</mi></msub><mo id="S2.SS2.SSS3.p1.15.m15.1.7.3">−</mo><msub id="S2.SS2.SSS3.p1.15.m15.1.7.4"><mover accent="true" id="S2.SS2.SSS3.p1.15.m15.1.7.4.2"><mi 
id="S2.SS2.SSS3.p1.15.m15.1.7.4.2.2">x</mi><mo id="S2.SS2.SSS3.p1.15.m15.1.7.4.2.1">¨</mo></mover><mi id="S2.SS2.SSS3.p1.15.m15.1.7.4.3">c</mi></msub><mo id="S2.SS2.SSS3.p1.15.m15.1.7.5" stretchy="false">)</mo></mrow></mrow><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.15.m15.1c">(1-m)\odot x_{t}+m\odot(x_{t}-\ddot{x}_{c})</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.15.m15.1d">( 1 - italic_m ) ⊙ italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_m ⊙ ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - over¨ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT )</annotation></semantics></math>), conditioned on <math alttext="\dot{x}_{c}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.16.m16.1"><semantics id="S2.SS2.SSS3.p1.16.m16.1a"><msub id="S2.SS2.SSS3.p1.16.m16.1.1" xref="S2.SS2.SSS3.p1.16.m16.1.1.cmml"><mover accent="true" id="S2.SS2.SSS3.p1.16.m16.1.1.2" xref="S2.SS2.SSS3.p1.16.m16.1.1.2.cmml"><mi id="S2.SS2.SSS3.p1.16.m16.1.1.2.2" xref="S2.SS2.SSS3.p1.16.m16.1.1.2.2.cmml">x</mi><mo id="S2.SS2.SSS3.p1.16.m16.1.1.2.1" xref="S2.SS2.SSS3.p1.16.m16.1.1.2.1.cmml">˙</mo></mover><mi id="S2.SS2.SSS3.p1.16.m16.1.1.3" xref="S2.SS2.SSS3.p1.16.m16.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.16.m16.1b"><apply id="S2.SS2.SSS3.p1.16.m16.1.1.cmml" xref="S2.SS2.SSS3.p1.16.m16.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.16.m16.1.1.1.cmml" xref="S2.SS2.SSS3.p1.16.m16.1.1">subscript</csymbol><apply id="S2.SS2.SSS3.p1.16.m16.1.1.2.cmml" xref="S2.SS2.SSS3.p1.16.m16.1.1.2"><ci id="S2.SS2.SSS3.p1.16.m16.1.1.2.1.cmml" xref="S2.SS2.SSS3.p1.16.m16.1.1.2.1">˙</ci><ci id="S2.SS2.SSS3.p1.16.m16.1.1.2.2.cmml" xref="S2.SS2.SSS3.p1.16.m16.1.1.2.2">𝑥</ci></apply><ci id="S2.SS2.SSS3.p1.16.m16.1.1.3.cmml" xref="S2.SS2.SSS3.p1.16.m16.1.1.3">𝑐</ci></apply></annotation-xml><annotation 
encoding="application/x-tex" id="S2.SS2.SSS3.p1.16.m16.1c">\dot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.16.m16.1d">over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math>, where the predicted <math alttext="m\in\mathbb{R}^{1\times H\times W}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.17.m17.1"><semantics id="S2.SS2.SSS3.p1.17.m17.1a"><mrow id="S2.SS2.SSS3.p1.17.m17.1.1" xref="S2.SS2.SSS3.p1.17.m17.1.1.cmml"><mi id="S2.SS2.SSS3.p1.17.m17.1.1.2" xref="S2.SS2.SSS3.p1.17.m17.1.1.2.cmml">m</mi><mo id="S2.SS2.SSS3.p1.17.m17.1.1.1" xref="S2.SS2.SSS3.p1.17.m17.1.1.1.cmml">∈</mo><msup id="S2.SS2.SSS3.p1.17.m17.1.1.3" xref="S2.SS2.SSS3.p1.17.m17.1.1.3.cmml"><mi id="S2.SS2.SSS3.p1.17.m17.1.1.3.2" xref="S2.SS2.SSS3.p1.17.m17.1.1.3.2.cmml">ℝ</mi><mrow id="S2.SS2.SSS3.p1.17.m17.1.1.3.3" xref="S2.SS2.SSS3.p1.17.m17.1.1.3.3.cmml"><mn id="S2.SS2.SSS3.p1.17.m17.1.1.3.3.2" xref="S2.SS2.SSS3.p1.17.m17.1.1.3.3.2.cmml">1</mn><mo id="S2.SS2.SSS3.p1.17.m17.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S2.SS2.SSS3.p1.17.m17.1.1.3.3.1.cmml">×</mo><mi id="S2.SS2.SSS3.p1.17.m17.1.1.3.3.3" xref="S2.SS2.SSS3.p1.17.m17.1.1.3.3.3.cmml">H</mi><mo id="S2.SS2.SSS3.p1.17.m17.1.1.3.3.1a" lspace="0.222em" rspace="0.222em" xref="S2.SS2.SSS3.p1.17.m17.1.1.3.3.1.cmml">×</mo><mi id="S2.SS2.SSS3.p1.17.m17.1.1.3.3.4" xref="S2.SS2.SSS3.p1.17.m17.1.1.3.3.4.cmml">W</mi></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.17.m17.1b"><apply id="S2.SS2.SSS3.p1.17.m17.1.1.cmml" xref="S2.SS2.SSS3.p1.17.m17.1.1"><in id="S2.SS2.SSS3.p1.17.m17.1.1.1.cmml" xref="S2.SS2.SSS3.p1.17.m17.1.1.1"></in><ci id="S2.SS2.SSS3.p1.17.m17.1.1.2.cmml" xref="S2.SS2.SSS3.p1.17.m17.1.1.2">𝑚</ci><apply id="S2.SS2.SSS3.p1.17.m17.1.1.3.cmml" xref="S2.SS2.SSS3.p1.17.m17.1.1.3"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.17.m17.1.1.3.1.cmml" xref="S2.SS2.SSS3.p1.17.m17.1.1.3">superscript</csymbol><ci 
id="S2.SS2.SSS3.p1.17.m17.1.1.3.2.cmml" xref="S2.SS2.SSS3.p1.17.m17.1.1.3.2">ℝ</ci><apply id="S2.SS2.SSS3.p1.17.m17.1.1.3.3.cmml" xref="S2.SS2.SSS3.p1.17.m17.1.1.3.3"><times id="S2.SS2.SSS3.p1.17.m17.1.1.3.3.1.cmml" xref="S2.SS2.SSS3.p1.17.m17.1.1.3.3.1"></times><cn id="S2.SS2.SSS3.p1.17.m17.1.1.3.3.2.cmml" type="integer" xref="S2.SS2.SSS3.p1.17.m17.1.1.3.3.2">1</cn><ci id="S2.SS2.SSS3.p1.17.m17.1.1.3.3.3.cmml" xref="S2.SS2.SSS3.p1.17.m17.1.1.3.3.3">𝐻</ci><ci id="S2.SS2.SSS3.p1.17.m17.1.1.3.3.4.cmml" xref="S2.SS2.SSS3.p1.17.m17.1.1.3.3.4">𝑊</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.17.m17.1c">m\in\mathbb{R}^{1\times H\times W}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.17.m17.1d">italic_m ∈ blackboard_R start_POSTSUPERSCRIPT 1 × italic_H × italic_W end_POSTSUPERSCRIPT</annotation></semantics></math> is replicated three times along the channel dimension to form a mask in <math alttext="\mathbb{R}^{3\times H\times W}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.18.m18.1"><semantics id="S2.SS2.SSS3.p1.18.m18.1a"><msup id="S2.SS2.SSS3.p1.18.m18.1.1" xref="S2.SS2.SSS3.p1.18.m18.1.1.cmml"><mi id="S2.SS2.SSS3.p1.18.m18.1.1.2" xref="S2.SS2.SSS3.p1.18.m18.1.1.2.cmml">ℝ</mi><mrow id="S2.SS2.SSS3.p1.18.m18.1.1.3" xref="S2.SS2.SSS3.p1.18.m18.1.1.3.cmml"><mn id="S2.SS2.SSS3.p1.18.m18.1.1.3.2" xref="S2.SS2.SSS3.p1.18.m18.1.1.3.2.cmml">3</mn><mo id="S2.SS2.SSS3.p1.18.m18.1.1.3.1" lspace="0.222em" rspace="0.222em" xref="S2.SS2.SSS3.p1.18.m18.1.1.3.1.cmml">×</mo><mi id="S2.SS2.SSS3.p1.18.m18.1.1.3.3" xref="S2.SS2.SSS3.p1.18.m18.1.1.3.3.cmml">H</mi><mo id="S2.SS2.SSS3.p1.18.m18.1.1.3.1a" lspace="0.222em" rspace="0.222em" xref="S2.SS2.SSS3.p1.18.m18.1.1.3.1.cmml">×</mo><mi id="S2.SS2.SSS3.p1.18.m18.1.1.3.4" xref="S2.SS2.SSS3.p1.18.m18.1.1.3.4.cmml">W</mi></mrow></msup><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.18.m18.1b"><apply id="S2.SS2.SSS3.p1.18.m18.1.1.cmml" xref="S2.SS2.SSS3.p1.18.m18.1.1"><csymbol 
cd="ambiguous" id="S2.SS2.SSS3.p1.18.m18.1.1.1.cmml" xref="S2.SS2.SSS3.p1.18.m18.1.1">superscript</csymbol><ci id="S2.SS2.SSS3.p1.18.m18.1.1.2.cmml" xref="S2.SS2.SSS3.p1.18.m18.1.1.2">ℝ</ci><apply id="S2.SS2.SSS3.p1.18.m18.1.1.3.cmml" xref="S2.SS2.SSS3.p1.18.m18.1.1.3"><times id="S2.SS2.SSS3.p1.18.m18.1.1.3.1.cmml" xref="S2.SS2.SSS3.p1.18.m18.1.1.3.1"></times><cn id="S2.SS2.SSS3.p1.18.m18.1.1.3.2.cmml" type="integer" xref="S2.SS2.SSS3.p1.18.m18.1.1.3.2">3</cn><ci id="S2.SS2.SSS3.p1.18.m18.1.1.3.3.cmml" xref="S2.SS2.SSS3.p1.18.m18.1.1.3.3">𝐻</ci><ci id="S2.SS2.SSS3.p1.18.m18.1.1.3.4.cmml" xref="S2.SS2.SSS3.p1.18.m18.1.1.3.4">𝑊</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.18.m18.1c">\mathbb{R}^{3\times H\times W}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.18.m18.1d">blackboard_R start_POSTSUPERSCRIPT 3 × italic_H × italic_W end_POSTSUPERSCRIPT</annotation></semantics></math> for element-wise multiplication.</p> </div> <figure class="ltx_figure" id="S2.F4"> <div class="ltx_flex_figure"> <div class="ltx_flex_cell ltx_flex_size_3"> <figure class="ltx_figure ltx_figure_panel ltx_align_center" id="S2.F4.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="165" id="S2.F4.1.g1" src="extracted/5902691/Figure/RD-curve/main_RD_C64_32_16_UVG_BT709_pw_correct.png" width="192"/> </figure> </div> <div class="ltx_flex_cell ltx_flex_size_3"> <figure class="ltx_figure ltx_figure_panel ltx_align_center" id="S2.F4.2"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="162" id="S2.F4.2.g1" src="extracted/5902691/Figure/RD-curve/main_RD_C64_32_16_HEVC_B_BT709_pw_correct.png" width="192"/> </figure> </div> <div class="ltx_flex_cell ltx_flex_size_3"> <figure class="ltx_figure ltx_figure_panel ltx_align_center" id="S2.F4.3"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="165" id="S2.F4.3.g1" 
src="extracted/5902691/Figure/RD-curve/main_RD_C64_32_16_HEVC_C_BT709_pw_correct.png" width="192"/> </figure> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_3"> <figure class="ltx_figure ltx_figure_panel ltx_align_center" id="S2.F4.4"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="165" id="S2.F4.4.g1" src="extracted/5902691/Figure/RD-curve/main_RD_C64_32_16_HEVC_E_BT709_pw_correct.png" width="192"/> </figure> </div> <div class="ltx_flex_cell ltx_flex_size_3"> <figure class="ltx_figure ltx_figure_panel ltx_align_center" id="S2.F4.5"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="165" id="S2.F4.5.g1" src="extracted/5902691/Figure/RD-curve/main_RD_C64_32_16_HEVC_RGB_pw_correct.png" width="192"/> </figure> </div> <div class="ltx_flex_cell ltx_flex_size_3"> <figure class="ltx_figure ltx_figure_panel ltx_align_center" id="S2.F4.6"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="165" id="S2.F4.6.g1" src="extracted/5902691/Figure/RD-curve/main_RD_C64_32_16_MCL_JCV_BT709_pw_correct.png" width="192"/> </figure> </div> </div> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 4: </span>Rate-distortion comparison of conditional coding, conditional residual coding, and masked conditional residual coding under varying bottleneck levels.</figcaption> </figure> </section> </section> </section> <section class="ltx_section" id="S3"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">III </span><span class="ltx_text ltx_font_smallcaps" id="S3.1.1">Experiments</span> </h2> <section class="ltx_subsection" id="S3.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S3.SS1.4.1.1">III-A</span> </span><span class="ltx_text ltx_font_italic" id="S3.SS1.5.2">Training</span> </h3> <div class="ltx_para" id="S3.SS1.p1"> <p class="ltx_p" id="S3.SS1.p1.4">We 
follow the common protocol of learned video compression and train our models on the Vimeo-90k dataset <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib16" title="">16</a>]</cite>, which contains 91,701 seven-frame sequences at a resolution of <math alttext="448\times 256" class="ltx_Math" display="inline" id="S3.SS1.p1.1.m1.1"><semantics id="S3.SS1.p1.1.m1.1a"><mrow id="S3.SS1.p1.1.m1.1.1" xref="S3.SS1.p1.1.m1.1.1.cmml"><mn id="S3.SS1.p1.1.m1.1.1.2" xref="S3.SS1.p1.1.m1.1.1.2.cmml">448</mn><mo id="S3.SS1.p1.1.m1.1.1.1" lspace="0.222em" rspace="0.222em" xref="S3.SS1.p1.1.m1.1.1.1.cmml">×</mo><mn id="S3.SS1.p1.1.m1.1.1.3" xref="S3.SS1.p1.1.m1.1.1.3.cmml">256</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.1.m1.1b"><apply id="S3.SS1.p1.1.m1.1.1.cmml" xref="S3.SS1.p1.1.m1.1.1"><times id="S3.SS1.p1.1.m1.1.1.1.cmml" xref="S3.SS1.p1.1.m1.1.1.1"></times><cn id="S3.SS1.p1.1.m1.1.1.2.cmml" type="integer" xref="S3.SS1.p1.1.m1.1.1.2">448</cn><cn id="S3.SS1.p1.1.m1.1.1.3.cmml" type="integer" xref="S3.SS1.p1.1.m1.1.1.3">256</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.1.m1.1c">448\times 256</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.1.m1.1d">448 × 256</annotation></semantics></math>. 
During training, we randomly crop the sequences into <math alttext="256\times 256" class="ltx_Math" display="inline" id="S3.SS1.p1.2.m2.1"><semantics id="S3.SS1.p1.2.m2.1a"><mrow id="S3.SS1.p1.2.m2.1.1" xref="S3.SS1.p1.2.m2.1.1.cmml"><mn id="S3.SS1.p1.2.m2.1.1.2" xref="S3.SS1.p1.2.m2.1.1.2.cmml">256</mn><mo id="S3.SS1.p1.2.m2.1.1.1" lspace="0.222em" rspace="0.222em" xref="S3.SS1.p1.2.m2.1.1.1.cmml">×</mo><mn id="S3.SS1.p1.2.m2.1.1.3" xref="S3.SS1.p1.2.m2.1.1.3.cmml">256</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.2.m2.1b"><apply id="S3.SS1.p1.2.m2.1.1.cmml" xref="S3.SS1.p1.2.m2.1.1"><times id="S3.SS1.p1.2.m2.1.1.1.cmml" xref="S3.SS1.p1.2.m2.1.1.1"></times><cn id="S3.SS1.p1.2.m2.1.1.2.cmml" type="integer" xref="S3.SS1.p1.2.m2.1.1.2">256</cn><cn id="S3.SS1.p1.2.m2.1.1.3.cmml" type="integer" xref="S3.SS1.p1.2.m2.1.1.3">256</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.2.m2.1c">256\times 256</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.2.m2.1d">256 × 256</annotation></semantics></math> patches. Our training details are summarized in Table <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S2.T1" title="TABLE I ‣ II Proposed Method ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">I</span></a>. <span class="ltx_text" id="S3.SS1.p1.3.1" style="color:#000000;">We initialize the conditional coding model with the pre-trained model from <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib17" title="">17</a>]</cite> and start its training from step 2. 
The conditional residual coding and masked conditional residual coding models are initialized from the conditional coding and conditional residual coding models, respectively, with the same <math alttext="C" class="ltx_Math" display="inline" id="S3.SS1.p1.3.1.m1.1"><semantics id="S3.SS1.p1.3.1.m1.1a"><mi id="S3.SS1.p1.3.1.m1.1.1" mathcolor="#000000" xref="S3.SS1.p1.3.1.m1.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.3.1.m1.1b"><ci id="S3.SS1.p1.3.1.m1.1.1.cmml" xref="S3.SS1.p1.3.1.m1.1.1">𝐶</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.3.1.m1.1c">C</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.3.1.m1.1d">italic_C</annotation></semantics></math> value, and their training starts from step 1.</span> Distortion is measured as the mean squared error (MSE) in the RGB domain, and a separate model is trained for each <math alttext="\lambda=\{256,512,1024,2048\}" class="ltx_Math" display="inline" id="S3.SS1.p1.4.m3.4"><semantics id="S3.SS1.p1.4.m3.4a"><mrow id="S3.SS1.p1.4.m3.4.5" xref="S3.SS1.p1.4.m3.4.5.cmml"><mi id="S3.SS1.p1.4.m3.4.5.2" xref="S3.SS1.p1.4.m3.4.5.2.cmml">λ</mi><mo id="S3.SS1.p1.4.m3.4.5.1" xref="S3.SS1.p1.4.m3.4.5.1.cmml">=</mo><mrow id="S3.SS1.p1.4.m3.4.5.3.2" xref="S3.SS1.p1.4.m3.4.5.3.1.cmml"><mo id="S3.SS1.p1.4.m3.4.5.3.2.1" stretchy="false" xref="S3.SS1.p1.4.m3.4.5.3.1.cmml">{</mo><mn id="S3.SS1.p1.4.m3.1.1" xref="S3.SS1.p1.4.m3.1.1.cmml">256</mn><mo id="S3.SS1.p1.4.m3.4.5.3.2.2" xref="S3.SS1.p1.4.m3.4.5.3.1.cmml">,</mo><mn id="S3.SS1.p1.4.m3.2.2" xref="S3.SS1.p1.4.m3.2.2.cmml">512</mn><mo id="S3.SS1.p1.4.m3.4.5.3.2.3" xref="S3.SS1.p1.4.m3.4.5.3.1.cmml">,</mo><mn id="S3.SS1.p1.4.m3.3.3" xref="S3.SS1.p1.4.m3.3.3.cmml">1024</mn><mo id="S3.SS1.p1.4.m3.4.5.3.2.4" xref="S3.SS1.p1.4.m3.4.5.3.1.cmml">,</mo><mn id="S3.SS1.p1.4.m3.4.4" xref="S3.SS1.p1.4.m3.4.4.cmml">2048</mn><mo id="S3.SS1.p1.4.m3.4.5.3.2.5" stretchy="false" xref="S3.SS1.p1.4.m3.4.5.3.1.cmml">}</mo></mrow></mrow><annotation-xml 
encoding="MathML-Content" id="S3.SS1.p1.4.m3.4b"><apply id="S3.SS1.p1.4.m3.4.5.cmml" xref="S3.SS1.p1.4.m3.4.5"><eq id="S3.SS1.p1.4.m3.4.5.1.cmml" xref="S3.SS1.p1.4.m3.4.5.1"></eq><ci id="S3.SS1.p1.4.m3.4.5.2.cmml" xref="S3.SS1.p1.4.m3.4.5.2">𝜆</ci><set id="S3.SS1.p1.4.m3.4.5.3.1.cmml" xref="S3.SS1.p1.4.m3.4.5.3.2"><cn id="S3.SS1.p1.4.m3.1.1.cmml" type="integer" xref="S3.SS1.p1.4.m3.1.1">256</cn><cn id="S3.SS1.p1.4.m3.2.2.cmml" type="integer" xref="S3.SS1.p1.4.m3.2.2">512</cn><cn id="S3.SS1.p1.4.m3.3.3.cmml" type="integer" xref="S3.SS1.p1.4.m3.3.3">1024</cn><cn id="S3.SS1.p1.4.m3.4.4.cmml" type="integer" xref="S3.SS1.p1.4.m3.4.4">2048</cn></set></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.4.m3.4c">\lambda=\{256,512,1024,2048\}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.4.m3.4d">italic_λ = { 256 , 512 , 1024 , 2048 }</annotation></semantics></math>.</p> </div> <figure class="ltx_table" id="S3.T2"> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_table">TABLE II: </span>BD-rate (%) comparison with the distortion measured in the PSNR-RGB domain. 
The anchor is conditional coding with <math alttext="C=64" class="ltx_Math" display="inline" id="S3.T2.2.m1.1"><semantics id="S3.T2.2.m1.1b"><mrow id="S3.T2.2.m1.1.1" xref="S3.T2.2.m1.1.1.cmml"><mi id="S3.T2.2.m1.1.1.2" xref="S3.T2.2.m1.1.1.2.cmml">C</mi><mo id="S3.T2.2.m1.1.1.1" xref="S3.T2.2.m1.1.1.1.cmml">=</mo><mn id="S3.T2.2.m1.1.1.3" xref="S3.T2.2.m1.1.1.3.cmml">64</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.T2.2.m1.1c"><apply id="S3.T2.2.m1.1.1.cmml" xref="S3.T2.2.m1.1.1"><eq id="S3.T2.2.m1.1.1.1.cmml" xref="S3.T2.2.m1.1.1.1"></eq><ci id="S3.T2.2.m1.1.1.2.cmml" xref="S3.T2.2.m1.1.1.2">𝐶</ci><cn id="S3.T2.2.m1.1.1.3.cmml" type="integer" xref="S3.T2.2.m1.1.1.3">64</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.T2.2.m1.1d">C=64</annotation><annotation encoding="application/x-llamapun" id="S3.T2.2.m1.1e">italic_C = 64</annotation></semantics></math>.</figcaption> <table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle" id="S3.T2.3"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S3.T2.3.1.1"> <th class="ltx_td ltx_th ltx_th_row ltx_border_r ltx_border_tt" id="S3.T2.3.1.1.1"></th> <td class="ltx_td ltx_align_center ltx_border_tt" id="S3.T2.3.1.1.2">UVG</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S3.T2.3.1.1.3">HEVC-B</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S3.T2.3.1.1.4">HEVC-C</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S3.T2.3.1.1.5">HEVC-E</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S3.T2.3.1.1.6">HEVC-RGB</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S3.T2.3.1.1.7">MCL-JCV</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S3.T2.3.1.1.8">Average</td> </tr> <tr class="ltx_tr" id="S3.T2.3.2.2"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S3.T2.3.2.2.1">Cond. 
(C=64)</th> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.2.2.2">0</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.2.2.3">0</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.2.2.4">0</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.2.2.5">0</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.2.2.6">0</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.3.2.2.7">0</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.2.2.8">0</td> </tr> <tr class="ltx_tr" id="S3.T2.3.3.3"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r" id="S3.T2.3.3.3.1">Cond. Res. (C=64)</th> <td class="ltx_td ltx_align_center" id="S3.T2.3.3.3.2">-1.66</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.3.3.3">-2.57</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.3.3.4">-3.07</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.3.3.5">-3.55</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.3.3.6">-3.14</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T2.3.3.3.7">-3.53</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.3.3.8">-2.92</td> </tr> <tr class="ltx_tr" id="S3.T2.3.4.4"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r" id="S3.T2.3.4.4.1">Masked Cond. Res. (C=64)</th> <td class="ltx_td ltx_align_center" id="S3.T2.3.4.4.2">-11.87</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.4.4.3">-7.66</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.4.4.4">-7.40</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.4.4.5">-13.93</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.4.4.6">-9.68</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T2.3.4.4.7">-9.92</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.4.4.8">-10.08</td> </tr> <tr class="ltx_tr" id="S3.T2.3.5.5"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S3.T2.3.5.5.1">Cond. 
(C=32)</th> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.5.5.2">6.32</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.5.5.3">7.57</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.5.5.4">16.18</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.5.5.5">15.29</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.5.5.6">4.12</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.3.5.5.7">5.95</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.5.5.8">9.24</td> </tr> <tr class="ltx_tr" id="S3.T2.3.6.6"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r" id="S3.T2.3.6.6.1">Cond. Res. (C=32)</th> <td class="ltx_td ltx_align_center" id="S3.T2.3.6.6.2">-1.82</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.6.6.3">-1.55</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.6.6.4">3.77</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.6.6.5">-3.49</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.6.6.6">-4.28</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T2.3.6.6.7">-2.58</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.6.6.8">-1.66</td> </tr> <tr class="ltx_tr" id="S3.T2.3.7.7"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r" id="S3.T2.3.7.7.1">Masked Cond. Res. (C=32)</th> <td class="ltx_td ltx_align_center" id="S3.T2.3.7.7.2">-6.08</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.7.7.3">-5.18</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.7.7.4">0.01</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.7.7.5">-10.73</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.7.7.6">-7.99</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T2.3.7.7.7">-5.71</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.7.7.8">-5.95</td> </tr> <tr class="ltx_tr" id="S3.T2.3.8.8"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S3.T2.3.8.8.1">Cond. 
(C=16)</th> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.8.8.2">5.72</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.8.8.3">9.73</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.8.8.4">19.79</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.8.8.5">14.08</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.8.8.6">3.76</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.3.8.8.7">10.31</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.3.8.8.8">10.57</td> </tr> <tr class="ltx_tr" id="S3.T2.3.9.9"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r" id="S3.T2.3.9.9.1">Cond. Res. (C=16)</th> <td class="ltx_td ltx_align_center" id="S3.T2.3.9.9.2">-0.04</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.9.9.3">-0.61</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.9.9.4">5.39</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.9.9.5">-0.77</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.9.9.6">-3.17</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T2.3.9.9.7">-0.30</td> <td class="ltx_td ltx_align_center" id="S3.T2.3.9.9.8">0.08</td> </tr> <tr class="ltx_tr" id="S3.T2.3.10.10"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_bb ltx_border_r" id="S3.T2.3.10.10.1">Masked Cond. Res. 
(C=16)</th> <td class="ltx_td ltx_align_center ltx_border_bb" id="S3.T2.3.10.10.2">-6.09</td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S3.T2.3.10.10.3">-5.44</td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S3.T2.3.10.10.4">-0.28</td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S3.T2.3.10.10.5">-10.47</td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S3.T2.3.10.10.6">-8.10</td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_r" id="S3.T2.3.10.10.7">-6.91</td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S3.T2.3.10.10.8">-6.22</td> </tr> </tbody> </table> </figure> </section> <section class="ltx_subsection" id="S3.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S3.SS2.4.1.1">III-B</span> </span><span class="ltx_text ltx_font_italic" id="S3.SS2.5.2">Evaluation</span> </h3> <div class="ltx_para" id="S3.SS2.p1"> <p class="ltx_p" id="S3.SS2.p1.1">We evaluate our models on UVG <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib18" title="">18</a>]</cite>, HEVC Class B <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib19" title="">19</a>]</cite>, HEVC Class C <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib19" title="">19</a>]</cite>, HEVC Class E <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib19" title="">19</a>]</cite>, HEVC-RGB <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib20" title="">20</a>]</cite> and MCL-JCV <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib21" title="">21</a>]</cite> datasets. 
Following <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib6" title="">6</a>]</cite>, we convert all test sequences from YUV420 to RGB444 (except HEVC-RGB, which is already in RGB444 format) using BT.709 <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib22" title="">22</a>]</cite>, and we report results for encoding 96 frames per sequence with a GOP size of 32. Reconstruction quality is measured by peak signal-to-noise ratio (PSNR) in the RGB domain, and bit rate in bits per pixel (bpp). Following the common test protocol of traditional codecs <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib10" title="">10</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib11" title="">11</a>, <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib9" title="">9</a>]</cite>, the BD-rate reported for each dataset is the average of the per-sequence BD-rates over its test sequences. BD-rates are computed with piecewise cubic interpolation <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib23" title="">23</a>]</cite>. 
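The BD-rate computation described above can be sketched in Python as follows. This is an illustrative, standard-library-only sketch, not the authors' evaluation code: it assumes the classic Bjontegaard formulation (log bit rate interpolated as a function of PSNR and averaged over the PSNR range shared by both curves), it uses a shape-preserving PCHIP interpolant as a stand-in for the piecewise cubic interpolation of [23], and all function names (bd_rate, pchip_integral, dataset_bd_rate) and RD points are hypothetical. Numbers produced by reference spreadsheets or other tools may differ slightly.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers: a sketch of Bjontegaard delta rate (BD-rate), not the paper's code.


def _end_slope(h0: float, h1: float, del0: float, del1: float) -> float:
    """One-sided three-point end derivative with PCHIP shape-preserving safeguards."""
    d = ((2 * h0 + h1) * del0 - h0 * del1) / (h0 + h1)
    if d * del0 <= 0:  # wrong sign (or flat data): clamp to zero
        return 0.0
    if del0 * del1 <= 0 and abs(d) > 3 * abs(del0):  # avoid overshoot near a turn
        return 3 * del0
    return d


def _pchip_slopes(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Knot derivatives of a monotone piecewise cubic Hermite (PCHIP) interpolant."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two points")
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    delta = [(y[i + 1] - y[i]) / h[i] for i in range(n - 1)]
    if n == 2:  # a single segment degenerates to a straight line
        return [delta[0], delta[0]]
    d = [0.0] * n
    for k in range(1, n - 1):
        if delta[k - 1] * delta[k] > 0:  # same sign: weighted harmonic mean of slopes
            w1 = 2 * h[k] + h[k - 1]
            w2 = h[k] + 2 * h[k - 1]
            d[k] = (w1 + w2) / (w1 / delta[k - 1] + w2 / delta[k])
        # otherwise a local extremum: keep d[k] = 0 to preserve monotonicity
    d[0] = _end_slope(h[0], h[1], delta[0], delta[1])
    d[-1] = _end_slope(h[-1], h[-2], delta[-1], delta[-2])
    return d


def _hermite(x0, x1, y0, y1, d0, d1, s):
    """Evaluate the cubic Hermite segment on [x0, x1] at position s."""
    h = x1 - x0
    t = (s - x0) / h
    return ((2 * t**3 - 3 * t**2 + 1) * y0 + (t**3 - 2 * t**2 + t) * h * d0
            + (-2 * t**3 + 3 * t**2) * y1 + (t**3 - t**2) * h * d1)


def pchip_integral(x, y, a, b):
    """Integral over [a, b] (inside [x[0], x[-1]]) of the PCHIP curve through (x, y)."""
    d = _pchip_slopes(x, y)
    total = 0.0
    for i in range(len(x) - 1):
        lo, hi = max(a, x[i]), min(b, x[i + 1])
        if hi <= lo:
            continue
        seg = (x[i], x[i + 1], y[i], y[i + 1], d[i], d[i + 1])
        mid = 0.5 * (lo + hi)
        # Simpson's rule is exact for cubics, so each piece is integrated exactly.
        total += (hi - lo) / 6 * (_hermite(*seg, lo) + 4 * _hermite(*seg, mid) + _hermite(*seg, hi))
    return total


def bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr):
    """BD-rate (%) of test vs. anchor: mean log-rate gap over the shared PSNR range."""
    def curve(rates, psnrs):
        pts = sorted(zip(psnrs, rates))  # PSNR values must be distinct
        return [p for p, _ in pts], [math.log(r) for _, r in pts]

    p1, r1 = curve(anchor_rate, anchor_psnr)
    p2, r2 = curve(test_rate, test_psnr)
    lo, hi = max(p1[0], p2[0]), min(p1[-1], p2[-1])
    if hi <= lo:
        raise ValueError("RD curves do not overlap in PSNR")
    avg_log_diff = (pchip_integral(p2, r2, lo, hi) - pchip_integral(p1, r1, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1) * 100


def dataset_bd_rate(per_sequence: Sequence[Tuple]) -> float:
    """Dataset-level BD-rate: plain average of per-sequence BD-rates."""
    values = [bd_rate(*seq) for seq in per_sequence]
    return sum(values) / len(values)


# Hypothetical RD points (bpp, PSNR-RGB in dB) for one sequence at four lambdas.
anchor_bpp = [0.02, 0.04, 0.08, 0.16]
psnr = [32.0, 34.0, 36.0, 38.0]
test_bpp = [0.9 * r for r in anchor_bpp]  # 10% fewer bits at every quality point
print(f"BD-rate: {bd_rate(anchor_bpp, psnr, test_bpp, psnr):.2f}%")
```

Because shifting all log rates by a constant shifts the PCHIP curve by the same constant, a test curve that uses 10% fewer bits than the anchor at every PSNR point yields a BD-rate of about -10%; the dataset-level figure is then the plain mean of such per-sequence values.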
Positive and negative BD-rate numbers indicate rate inflation and reduction, respectively.</p> </div> <figure class="ltx_figure" id="S3.F5"> <div class="ltx_inline-block ltx_transformed_outer" id="S3.F5.35" style="width:433.6pt;height:533.6pt;vertical-align:-0.6pt;"><span class="ltx_transformed_inner" style="transform:translate(-175.9pt,216.2pt) scale(0.552051909631215,0.552051909631215) ;"> <table class="ltx_tabular ltx_guessed_headers ltx_align_middle" id="S3.F5.35.35"> <thead class="ltx_thead"> <tr class="ltx_tr" id="S3.F5.35.35.36.1"> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_column ltx_th_row" id="S3.F5.35.35.36.1.1" style="width:5.7pt;"></th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_column ltx_th_row ltx_border_r" id="S3.F5.35.35.36.1.2" style="width:153.6pt;"></th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_column ltx_th_row ltx_border_r" id="S3.F5.35.35.36.1.3" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.35.35.36.1.3.1"> <span class="ltx_p" id="S3.F5.35.35.36.1.3.1.1"><span class="ltx_text" id="S3.F5.35.35.36.1.3.1.1.1" style="font-size:144%;">Cond. Res.</span></span> </span> </th> <th class="ltx_td ltx_align_center ltx_align_middle ltx_th ltx_th_column" colspan="3" id="S3.F5.35.35.36.1.4"><span class="ltx_text" id="S3.F5.35.35.36.1.4.1" style="font-size:144%;">Masked Cond. 
Res.</span></th> </tr> </thead> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S3.F5.5.5.5"> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row" id="S3.F5.5.5.5.6" style="width:5.7pt;"></th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row ltx_border_r" id="S3.F5.1.1.1.1" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.1.1.1.1.1"> <span class="ltx_p" id="S3.F5.1.1.1.1.1.1"><math alttext="x_{t}" class="ltx_Math" display="inline" id="S3.F5.1.1.1.1.1.1.m1.1"><semantics id="S3.F5.1.1.1.1.1.1.m1.1a"><msub id="S3.F5.1.1.1.1.1.1.m1.1.1" xref="S3.F5.1.1.1.1.1.1.m1.1.1.cmml"><mi id="S3.F5.1.1.1.1.1.1.m1.1.1.2" mathsize="144%" xref="S3.F5.1.1.1.1.1.1.m1.1.1.2.cmml">x</mi><mi id="S3.F5.1.1.1.1.1.1.m1.1.1.3" mathsize="144%" xref="S3.F5.1.1.1.1.1.1.m1.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S3.F5.1.1.1.1.1.1.m1.1b"><apply id="S3.F5.1.1.1.1.1.1.m1.1.1.cmml" xref="S3.F5.1.1.1.1.1.1.m1.1.1"><csymbol cd="ambiguous" id="S3.F5.1.1.1.1.1.1.m1.1.1.1.cmml" xref="S3.F5.1.1.1.1.1.1.m1.1.1">subscript</csymbol><ci id="S3.F5.1.1.1.1.1.1.m1.1.1.2.cmml" xref="S3.F5.1.1.1.1.1.1.m1.1.1.2">𝑥</ci><ci id="S3.F5.1.1.1.1.1.1.m1.1.1.3.cmml" xref="S3.F5.1.1.1.1.1.1.m1.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.F5.1.1.1.1.1.1.m1.1c">x_{t}</annotation><annotation encoding="application/x-llamapun" id="S3.F5.1.1.1.1.1.1.m1.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math></span> </span> </th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row ltx_border_r" id="S3.F5.2.2.2.2" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.2.2.2.2.1"> <span class="ltx_p" id="S3.F5.2.2.2.2.1.1"><math alttext="x_{t}-\ddot{x_{c}}" class="ltx_Math" display="inline" id="S3.F5.2.2.2.2.1.1.m1.1"><semantics id="S3.F5.2.2.2.2.1.1.m1.1a"><mrow id="S3.F5.2.2.2.2.1.1.m1.1.1" 
xref="S3.F5.2.2.2.2.1.1.m1.1.1.cmml"><msub id="S3.F5.2.2.2.2.1.1.m1.1.1.2" xref="S3.F5.2.2.2.2.1.1.m1.1.1.2.cmml"><mi id="S3.F5.2.2.2.2.1.1.m1.1.1.2.2" mathsize="144%" xref="S3.F5.2.2.2.2.1.1.m1.1.1.2.2.cmml">x</mi><mi id="S3.F5.2.2.2.2.1.1.m1.1.1.2.3" mathsize="144%" xref="S3.F5.2.2.2.2.1.1.m1.1.1.2.3.cmml">t</mi></msub><mo id="S3.F5.2.2.2.2.1.1.m1.1.1.1" mathsize="144%" xref="S3.F5.2.2.2.2.1.1.m1.1.1.1.cmml">−</mo><mover accent="true" id="S3.F5.2.2.2.2.1.1.m1.1.1.3" xref="S3.F5.2.2.2.2.1.1.m1.1.1.3.cmml"><msub id="S3.F5.2.2.2.2.1.1.m1.1.1.3.2" xref="S3.F5.2.2.2.2.1.1.m1.1.1.3.2.cmml"><mi id="S3.F5.2.2.2.2.1.1.m1.1.1.3.2.2" mathsize="144%" xref="S3.F5.2.2.2.2.1.1.m1.1.1.3.2.2.cmml">x</mi><mi id="S3.F5.2.2.2.2.1.1.m1.1.1.3.2.3" mathsize="144%" xref="S3.F5.2.2.2.2.1.1.m1.1.1.3.2.3.cmml">c</mi></msub><mo id="S3.F5.2.2.2.2.1.1.m1.1.1.3.1" mathsize="144%" xref="S3.F5.2.2.2.2.1.1.m1.1.1.3.1.cmml">¨</mo></mover></mrow><annotation-xml encoding="MathML-Content" id="S3.F5.2.2.2.2.1.1.m1.1b"><apply id="S3.F5.2.2.2.2.1.1.m1.1.1.cmml" xref="S3.F5.2.2.2.2.1.1.m1.1.1"><minus id="S3.F5.2.2.2.2.1.1.m1.1.1.1.cmml" xref="S3.F5.2.2.2.2.1.1.m1.1.1.1"></minus><apply id="S3.F5.2.2.2.2.1.1.m1.1.1.2.cmml" xref="S3.F5.2.2.2.2.1.1.m1.1.1.2"><csymbol cd="ambiguous" id="S3.F5.2.2.2.2.1.1.m1.1.1.2.1.cmml" xref="S3.F5.2.2.2.2.1.1.m1.1.1.2">subscript</csymbol><ci id="S3.F5.2.2.2.2.1.1.m1.1.1.2.2.cmml" xref="S3.F5.2.2.2.2.1.1.m1.1.1.2.2">𝑥</ci><ci id="S3.F5.2.2.2.2.1.1.m1.1.1.2.3.cmml" xref="S3.F5.2.2.2.2.1.1.m1.1.1.2.3">𝑡</ci></apply><apply id="S3.F5.2.2.2.2.1.1.m1.1.1.3.cmml" xref="S3.F5.2.2.2.2.1.1.m1.1.1.3"><ci id="S3.F5.2.2.2.2.1.1.m1.1.1.3.1.cmml" xref="S3.F5.2.2.2.2.1.1.m1.1.1.3.1">¨</ci><apply id="S3.F5.2.2.2.2.1.1.m1.1.1.3.2.cmml" xref="S3.F5.2.2.2.2.1.1.m1.1.1.3.2"><csymbol cd="ambiguous" id="S3.F5.2.2.2.2.1.1.m1.1.1.3.2.1.cmml" xref="S3.F5.2.2.2.2.1.1.m1.1.1.3.2">subscript</csymbol><ci id="S3.F5.2.2.2.2.1.1.m1.1.1.3.2.2.cmml" xref="S3.F5.2.2.2.2.1.1.m1.1.1.3.2.2">𝑥</ci><ci 
id="S3.F5.2.2.2.2.1.1.m1.1.1.3.2.3.cmml" xref="S3.F5.2.2.2.2.1.1.m1.1.1.3.2.3">𝑐</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.F5.2.2.2.2.1.1.m1.1c">x_{t}-\ddot{x_{c}}</annotation><annotation encoding="application/x-llamapun" id="S3.F5.2.2.2.2.1.1.m1.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - over¨ start_ARG italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT end_ARG</annotation></semantics></math></span> </span> </th> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.3.3.3.3" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.3.3.3.3.1"> <span class="ltx_p" id="S3.F5.3.3.3.3.1.1"><math alttext="x_{t}-m\odot\ddot{x_{c}}" class="ltx_Math" display="inline" id="S3.F5.3.3.3.3.1.1.m1.1"><semantics id="S3.F5.3.3.3.3.1.1.m1.1a"><mrow id="S3.F5.3.3.3.3.1.1.m1.1.1" xref="S3.F5.3.3.3.3.1.1.m1.1.1.cmml"><msub id="S3.F5.3.3.3.3.1.1.m1.1.1.2" xref="S3.F5.3.3.3.3.1.1.m1.1.1.2.cmml"><mi id="S3.F5.3.3.3.3.1.1.m1.1.1.2.2" mathsize="144%" xref="S3.F5.3.3.3.3.1.1.m1.1.1.2.2.cmml">x</mi><mi id="S3.F5.3.3.3.3.1.1.m1.1.1.2.3" mathsize="144%" xref="S3.F5.3.3.3.3.1.1.m1.1.1.2.3.cmml">t</mi></msub><mo id="S3.F5.3.3.3.3.1.1.m1.1.1.1" mathsize="144%" xref="S3.F5.3.3.3.3.1.1.m1.1.1.1.cmml">−</mo><mrow id="S3.F5.3.3.3.3.1.1.m1.1.1.3" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.cmml"><mi id="S3.F5.3.3.3.3.1.1.m1.1.1.3.2" mathsize="144%" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.2.cmml">m</mi><mo id="S3.F5.3.3.3.3.1.1.m1.1.1.3.1" lspace="0.222em" mathsize="144%" rspace="0.222em" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.1.cmml">⊙</mo><mover accent="true" id="S3.F5.3.3.3.3.1.1.m1.1.1.3.3" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.cmml"><msub id="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.2" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.2.cmml"><mi id="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.2.2" mathsize="144%" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.2.2.cmml">x</mi><mi id="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.2.3" mathsize="144%" 
xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.2.3.cmml">c</mi></msub><mo id="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.1" mathsize="144%" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.1.cmml">¨</mo></mover></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.F5.3.3.3.3.1.1.m1.1b"><apply id="S3.F5.3.3.3.3.1.1.m1.1.1.cmml" xref="S3.F5.3.3.3.3.1.1.m1.1.1"><minus id="S3.F5.3.3.3.3.1.1.m1.1.1.1.cmml" xref="S3.F5.3.3.3.3.1.1.m1.1.1.1"></minus><apply id="S3.F5.3.3.3.3.1.1.m1.1.1.2.cmml" xref="S3.F5.3.3.3.3.1.1.m1.1.1.2"><csymbol cd="ambiguous" id="S3.F5.3.3.3.3.1.1.m1.1.1.2.1.cmml" xref="S3.F5.3.3.3.3.1.1.m1.1.1.2">subscript</csymbol><ci id="S3.F5.3.3.3.3.1.1.m1.1.1.2.2.cmml" xref="S3.F5.3.3.3.3.1.1.m1.1.1.2.2">𝑥</ci><ci id="S3.F5.3.3.3.3.1.1.m1.1.1.2.3.cmml" xref="S3.F5.3.3.3.3.1.1.m1.1.1.2.3">𝑡</ci></apply><apply id="S3.F5.3.3.3.3.1.1.m1.1.1.3.cmml" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3"><csymbol cd="latexml" id="S3.F5.3.3.3.3.1.1.m1.1.1.3.1.cmml" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.1">direct-product</csymbol><ci id="S3.F5.3.3.3.3.1.1.m1.1.1.3.2.cmml" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.2">𝑚</ci><apply id="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.cmml" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.3"><ci id="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.1.cmml" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.1">¨</ci><apply id="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.2.cmml" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.2"><csymbol cd="ambiguous" id="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.2.1.cmml" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.2">subscript</csymbol><ci id="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.2.2.cmml" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.2.2">𝑥</ci><ci id="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.2.3.cmml" xref="S3.F5.3.3.3.3.1.1.m1.1.1.3.3.2.3">𝑐</ci></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.F5.3.3.3.3.1.1.m1.1c">x_{t}-m\odot\ddot{x_{c}}</annotation><annotation encoding="application/x-llamapun" id="S3.F5.3.3.3.3.1.1.m1.1d">italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_m ⊙ over¨ start_ARG italic_x start_POSTSUBSCRIPT italic_c 
end_POSTSUBSCRIPT end_ARG</annotation></semantics></math></span> </span> </td> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.4.4.4.4" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.4.4.4.4.1"> <span class="ltx_p" id="S3.F5.4.4.4.4.1.1"><math alttext="m\odot\ddot{x_{c}}" class="ltx_Math" display="inline" id="S3.F5.4.4.4.4.1.1.m1.1"><semantics id="S3.F5.4.4.4.4.1.1.m1.1a"><mrow id="S3.F5.4.4.4.4.1.1.m1.1.1" xref="S3.F5.4.4.4.4.1.1.m1.1.1.cmml"><mi id="S3.F5.4.4.4.4.1.1.m1.1.1.2" mathsize="144%" xref="S3.F5.4.4.4.4.1.1.m1.1.1.2.cmml">m</mi><mo id="S3.F5.4.4.4.4.1.1.m1.1.1.1" lspace="0.222em" mathsize="144%" rspace="0.222em" xref="S3.F5.4.4.4.4.1.1.m1.1.1.1.cmml">⊙</mo><mover accent="true" id="S3.F5.4.4.4.4.1.1.m1.1.1.3" xref="S3.F5.4.4.4.4.1.1.m1.1.1.3.cmml"><msub id="S3.F5.4.4.4.4.1.1.m1.1.1.3.2" xref="S3.F5.4.4.4.4.1.1.m1.1.1.3.2.cmml"><mi id="S3.F5.4.4.4.4.1.1.m1.1.1.3.2.2" mathsize="144%" xref="S3.F5.4.4.4.4.1.1.m1.1.1.3.2.2.cmml">x</mi><mi id="S3.F5.4.4.4.4.1.1.m1.1.1.3.2.3" mathsize="144%" xref="S3.F5.4.4.4.4.1.1.m1.1.1.3.2.3.cmml">c</mi></msub><mo id="S3.F5.4.4.4.4.1.1.m1.1.1.3.1" mathsize="144%" xref="S3.F5.4.4.4.4.1.1.m1.1.1.3.1.cmml">¨</mo></mover></mrow><annotation-xml encoding="MathML-Content" id="S3.F5.4.4.4.4.1.1.m1.1b"><apply id="S3.F5.4.4.4.4.1.1.m1.1.1.cmml" xref="S3.F5.4.4.4.4.1.1.m1.1.1"><csymbol cd="latexml" id="S3.F5.4.4.4.4.1.1.m1.1.1.1.cmml" xref="S3.F5.4.4.4.4.1.1.m1.1.1.1">direct-product</csymbol><ci id="S3.F5.4.4.4.4.1.1.m1.1.1.2.cmml" xref="S3.F5.4.4.4.4.1.1.m1.1.1.2">𝑚</ci><apply id="S3.F5.4.4.4.4.1.1.m1.1.1.3.cmml" xref="S3.F5.4.4.4.4.1.1.m1.1.1.3"><ci id="S3.F5.4.4.4.4.1.1.m1.1.1.3.1.cmml" xref="S3.F5.4.4.4.4.1.1.m1.1.1.3.1">¨</ci><apply id="S3.F5.4.4.4.4.1.1.m1.1.1.3.2.cmml" xref="S3.F5.4.4.4.4.1.1.m1.1.1.3.2"><csymbol cd="ambiguous" id="S3.F5.4.4.4.4.1.1.m1.1.1.3.2.1.cmml" xref="S3.F5.4.4.4.4.1.1.m1.1.1.3.2">subscript</csymbol><ci id="S3.F5.4.4.4.4.1.1.m1.1.1.3.2.2.cmml" 
xref="S3.F5.4.4.4.4.1.1.m1.1.1.3.2.2">𝑥</ci><ci id="S3.F5.4.4.4.4.1.1.m1.1.1.3.2.3.cmml" xref="S3.F5.4.4.4.4.1.1.m1.1.1.3.2.3">𝑐</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.F5.4.4.4.4.1.1.m1.1c">m\odot\ddot{x_{c}}</annotation><annotation encoding="application/x-llamapun" id="S3.F5.4.4.4.4.1.1.m1.1d">italic_m ⊙ over¨ start_ARG italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT end_ARG</annotation></semantics></math></span> </span> </td> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.5.5.5.5" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.5.5.5.5.1"> <span class="ltx_p" id="S3.F5.5.5.5.5.1.1"><math alttext="m" class="ltx_Math" display="inline" id="S3.F5.5.5.5.5.1.1.m1.1"><semantics id="S3.F5.5.5.5.5.1.1.m1.1a"><mi id="S3.F5.5.5.5.5.1.1.m1.1.1" mathsize="144%" xref="S3.F5.5.5.5.5.1.1.m1.1.1.cmml">m</mi><annotation-xml encoding="MathML-Content" id="S3.F5.5.5.5.5.1.1.m1.1b"><ci id="S3.F5.5.5.5.5.1.1.m1.1.1.cmml" xref="S3.F5.5.5.5.5.1.1.m1.1.1">𝑚</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.F5.5.5.5.5.1.1.m1.1c">m</annotation><annotation encoding="application/x-llamapun" id="S3.F5.5.5.5.5.1.1.m1.1d">italic_m</annotation></semantics></math></span> </span> </td> </tr> <tr class="ltx_tr" id="S3.F5.10.10.10"> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row" id="S3.F5.10.10.10.6" style="width:5.7pt;"> <div class="ltx_inline-block ltx_align_top ltx_transformed_outer" id="S3.F5.10.10.10.6.1" style="width:9.8pt;height:36pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:36.0pt;transform:translate(-13.08pt,-13.08pt) rotate(-90deg) ;"> <p class="ltx_p" id="S3.F5.10.10.10.6.1.1"><span class="ltx_text" id="S3.F5.10.10.10.6.1.1.1" style="font-size:144%;">C=64</span></p> </span></div> </th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row ltx_border_r" id="S3.F5.6.6.6.1" 
style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.6.6.6.1.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.6.6.6.1.1.g1" src="extracted/5902691/Figure/visualization/BasketballDrive_down_04/gt_frame_4_downsampled.png" width="180"/> </span> </th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row ltx_border_r" id="S3.F5.7.7.7.2" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.7.7.7.2.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.7.7.7.2.1.g1" src="extracted/5902691/Figure/visualization/BasketballDrive_down_04/CR64_res_brighter_frame_4_downsampled.png" width="180"/> </span> </th> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.8.8.8.3" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.8.8.8.3.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.8.8.8.3.1.g1" src="extracted/5902691/Figure/visualization/BasketballDrive_down_04/MCR64_res_brighter_frame_4_downsampled.png" width="180"/> </span> </td> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.9.9.9.4" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.9.9.9.4.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.9.9.9.4.1.g1" src="extracted/5902691/Figure/visualization/BasketballDrive_down_04/MCR64_maskxmc_frame_4_downsampled.png" width="180"/> </span> </td> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.10.10.10.5" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.10.10.10.5.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="122" id="S3.F5.10.10.10.5.1.g1" src="extracted/5902691/Figure/visualization/BasketballDrive/MCR64_mask_2_frame_4.png" width="249"/> </span> </td> </tr> <tr class="ltx_tr" id="S3.F5.15.15.15"> 
<th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row" id="S3.F5.15.15.15.6" style="width:5.7pt;"> <div class="ltx_inline-block ltx_align_top ltx_transformed_outer" id="S3.F5.15.15.15.6.1" style="width:9.8pt;height:36pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:36.0pt;transform:translate(-13.08pt,-13.08pt) rotate(-90deg) ;"> <p class="ltx_p" id="S3.F5.15.15.15.6.1.1"><span class="ltx_text" id="S3.F5.15.15.15.6.1.1.1" style="font-size:144%;">C=32</span></p> </span></div> </th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row ltx_border_r" id="S3.F5.11.11.11.1" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.11.11.11.1.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.11.11.11.1.1.g1" src="extracted/5902691/Figure/visualization/BasketballDrive_down_04/gt_frame_4_downsampled.png" width="180"/> </span> </th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row ltx_border_r" id="S3.F5.12.12.12.2" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.12.12.12.2.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.12.12.12.2.1.g1" src="extracted/5902691/Figure/visualization/BasketballDrive_down_04/CR32_res_brighter_frame_4_downsampled.png" width="180"/> </span> </th> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.13.13.13.3" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.13.13.13.3.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.13.13.13.3.1.g1" src="extracted/5902691/Figure/visualization/BasketballDrive_down_04/MCR32_res_brighter_frame_4_downsampled.png" width="180"/> </span> </td> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.14.14.14.4" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.14.14.14.4.1"><img 
alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.14.14.14.4.1.g1" src="extracted/5902691/Figure/visualization/BasketballDrive_down_04/MCR32_maskxmc_frame_4_downsampled.png" width="180"/> </span> </td> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.15.15.15.5" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.15.15.15.5.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="122" id="S3.F5.15.15.15.5.1.g1" src="extracted/5902691/Figure/visualization/BasketballDrive/MCR32_mask_2_frame_4.png" width="249"/> </span> </td> </tr> <tr class="ltx_tr" id="S3.F5.20.20.20"> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row" id="S3.F5.20.20.20.6" style="width:5.7pt;"> <div class="ltx_inline-block ltx_align_top ltx_transformed_outer" id="S3.F5.20.20.20.6.1" style="width:9.8pt;height:36pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:36.0pt;transform:translate(-13.08pt,-13.08pt) rotate(-90deg) ;"> <p class="ltx_p" id="S3.F5.20.20.20.6.1.1"><span class="ltx_text" id="S3.F5.20.20.20.6.1.1.1" style="font-size:144%;">C=16</span></p> </span></div> </th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row ltx_border_r" id="S3.F5.16.16.16.1" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.16.16.16.1.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.16.16.16.1.1.g1" src="extracted/5902691/Figure/visualization/BasketballDrive_down_04/gt_frame_4_downsampled.png" width="180"/> </span> </th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row ltx_border_r" id="S3.F5.17.17.17.2" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.17.17.17.2.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.17.17.17.2.1.g1" 
src="extracted/5902691/Figure/visualization/BasketballDrive_down_04/CR16_res_brighter_frame_4_downsampled.png" width="180"/> </span> </th> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.18.18.18.3" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.18.18.18.3.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.18.18.18.3.1.g1" src="extracted/5902691/Figure/visualization/BasketballDrive_down_04/MCR16_res_brighter_frame_4_downsampled.png" width="180"/> </span> </td> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.19.19.19.4" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.19.19.19.4.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.19.19.19.4.1.g1" src="extracted/5902691/Figure/visualization/BasketballDrive_down_04/MCR16_maskxmc_frame_4_downsampled.png" width="180"/> </span> </td> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.20.20.20.5" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.20.20.20.5.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="122" id="S3.F5.20.20.20.5.1.g1" src="extracted/5902691/Figure/visualization/BasketballDrive/MCR16_mask_2_frame_4.png" width="249"/> </span> </td> </tr> <tr class="ltx_tr" id="S3.F5.25.25.25"> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row" id="S3.F5.25.25.25.6" style="width:5.7pt;"> <div class="ltx_inline-block ltx_align_top ltx_transformed_outer" id="S3.F5.25.25.25.6.1" style="width:9.8pt;height:36pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:36.0pt;transform:translate(-13.08pt,-13.08pt) rotate(-90deg) ;"> <p class="ltx_p" id="S3.F5.25.25.25.6.1.1"><span class="ltx_text" id="S3.F5.25.25.25.6.1.1.1" style="font-size:144%;">C=64</span></p> </span></div> </th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th 
ltx_th_row ltx_border_r" id="S3.F5.21.21.21.1" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.21.21.21.1.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.21.21.21.1.1.g1" src="extracted/5902691/Figure/visualization/Jockey_down_04/gt_frame_11_downsampled.png" width="180"/> </span> </th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row ltx_border_r" id="S3.F5.22.22.22.2" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.22.22.22.2.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.22.22.22.2.1.g1" src="extracted/5902691/Figure/visualization/Jockey_down_04/CR64_res_brighter_frame_11_downsampled.png" width="180"/> </span> </th> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.23.23.23.3" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.23.23.23.3.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.23.23.23.3.1.g1" src="extracted/5902691/Figure/visualization/Jockey_down_04/MCR64_res_brighter_frame_11_downsampled.png" width="180"/> </span> </td> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.24.24.24.4" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.24.24.24.4.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.24.24.24.4.1.g1" src="extracted/5902691/Figure/visualization/Jockey_down_04/MCR64_maskxmc_frame_11_downsampled.png" width="180"/> </span> </td> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.25.25.25.5" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.25.25.25.5.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="122" id="S3.F5.25.25.25.5.1.g1" src="extracted/5902691/Figure/visualization/Jockey/MCR64_mask_2_frame_11.png" width="249"/> </span> </td> </tr> 
<tr class="ltx_tr" id="S3.F5.30.30.30"> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row" id="S3.F5.30.30.30.6" style="width:5.7pt;"> <div class="ltx_inline-block ltx_align_top ltx_transformed_outer" id="S3.F5.30.30.30.6.1" style="width:9.8pt;height:36pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:36.0pt;transform:translate(-13.08pt,-13.08pt) rotate(-90deg) ;"> <p class="ltx_p" id="S3.F5.30.30.30.6.1.1"><span class="ltx_text" id="S3.F5.30.30.30.6.1.1.1" style="font-size:144%;">C=32</span></p> </span></div> </th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row ltx_border_r" id="S3.F5.26.26.26.1" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.26.26.26.1.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.26.26.26.1.1.g1" src="extracted/5902691/Figure/visualization/Jockey_down_04/gt_frame_11_downsampled.png" width="180"/> </span> </th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row ltx_border_r" id="S3.F5.27.27.27.2" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.27.27.27.2.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.27.27.27.2.1.g1" src="extracted/5902691/Figure/visualization/Jockey_down_04/CR32_res_brighter_frame_11_downsampled.png" width="180"/> </span> </th> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.28.28.28.3" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.28.28.28.3.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.28.28.28.3.1.g1" src="extracted/5902691/Figure/visualization/Jockey_down_04/MCR32_res_brighter_frame_11_downsampled.png" width="180"/> </span> </td> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.29.29.29.4" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" 
id="S3.F5.29.29.29.4.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.29.29.29.4.1.g1" src="extracted/5902691/Figure/visualization/Jockey_down_04/MCR32_maskxmc_frame_11_downsampled.png" width="180"/> </span> </td> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.30.30.30.5" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.30.30.30.5.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="122" id="S3.F5.30.30.30.5.1.g1" src="extracted/5902691/Figure/visualization/Jockey/MCR32_mask_2_frame_11.png" width="249"/> </span> </td> </tr> <tr class="ltx_tr" id="S3.F5.35.35.35"> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row" id="S3.F5.35.35.35.6" style="width:5.7pt;"> <div class="ltx_inline-block ltx_align_top ltx_transformed_outer" id="S3.F5.35.35.35.6.1" style="width:9.8pt;height:36pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:36.0pt;transform:translate(-13.08pt,-13.08pt) rotate(-90deg) ;"> <p class="ltx_p" id="S3.F5.35.35.35.6.1.1"><span class="ltx_text" id="S3.F5.35.35.35.6.1.1.1" style="font-size:144%;">C=16</span></p> </span></div> </th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row ltx_border_r" id="S3.F5.31.31.31.1" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.31.31.31.1.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.31.31.31.1.1.g1" src="extracted/5902691/Figure/visualization/Jockey_down_04/gt_frame_11_downsampled.png" width="180"/> </span> </th> <th class="ltx_td ltx_align_justify ltx_align_middle ltx_th ltx_th_row ltx_border_r" id="S3.F5.32.32.32.2" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.32.32.32.2.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.32.32.32.2.1.g1" 
src="extracted/5902691/Figure/visualization/Jockey_down_04/CR16_res_brighter_frame_11_downsampled.png" width="180"/> </span> </th> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.33.33.33.3" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.33.33.33.3.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.33.33.33.3.1.g1" src="extracted/5902691/Figure/visualization/Jockey_down_04/MCR16_res_brighter_frame_11_downsampled.png" width="180"/> </span> </td> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.34.34.34.4" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.34.34.34.4.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="101" id="S3.F5.34.34.34.4.1.g1" src="extracted/5902691/Figure/visualization/Jockey_down_04/MCR16_maskxmc_frame_11_downsampled.png" width="180"/> </span> </td> <td class="ltx_td ltx_align_justify ltx_align_middle" id="S3.F5.35.35.35.5" style="width:153.6pt;"> <span class="ltx_inline-block ltx_align_top" id="S3.F5.35.35.35.5.1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="122" id="S3.F5.35.35.35.5.1.g1" src="extracted/5902691/Figure/visualization/Jockey/MCR16_mask_2_frame_11.png" width="249"/> </span> </td> </tr> </tbody> </table> </span></div> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">Figure 5: </span>Visualization of the input signals and masks for conditional residual coding and masked conditional residual coding under different bottleneck levels. From left to right, the second column shows the input signals for conditional residual coding. 
The third, fourth, and fifth columns visualize the input signals, masked predictions, and masks for masked conditional residual coding, respectively.</figcaption> </figure> <figure class="ltx_table" id="S3.T3"> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_table">TABLE III: </span>Comparison of the BD-rate and complexity, in terms of encoding/decoding MACs, model size, and the required channel size of the full-resolution condition signal <math alttext="\dot{x}_{c}" class="ltx_Math" display="inline" id="S3.T3.3.m1.1"><semantics id="S3.T3.3.m1.1b"><msub id="S3.T3.3.m1.1.1" xref="S3.T3.3.m1.1.1.cmml"><mover accent="true" id="S3.T3.3.m1.1.1.2" xref="S3.T3.3.m1.1.1.2.cmml"><mi id="S3.T3.3.m1.1.1.2.2" xref="S3.T3.3.m1.1.1.2.2.cmml">x</mi><mo id="S3.T3.3.m1.1.1.2.1" xref="S3.T3.3.m1.1.1.2.1.cmml">˙</mo></mover><mi id="S3.T3.3.m1.1.1.3" xref="S3.T3.3.m1.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S3.T3.3.m1.1c"><apply id="S3.T3.3.m1.1.1.cmml" xref="S3.T3.3.m1.1.1"><csymbol cd="ambiguous" id="S3.T3.3.m1.1.1.1.cmml" xref="S3.T3.3.m1.1.1">subscript</csymbol><apply id="S3.T3.3.m1.1.1.2.cmml" xref="S3.T3.3.m1.1.1.2"><ci id="S3.T3.3.m1.1.1.2.1.cmml" xref="S3.T3.3.m1.1.1.2.1">˙</ci><ci id="S3.T3.3.m1.1.1.2.2.cmml" xref="S3.T3.3.m1.1.1.2.2">𝑥</ci></apply><ci id="S3.T3.3.m1.1.1.3.cmml" xref="S3.T3.3.m1.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.3.m1.1d">\dot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S3.T3.3.m1.1e">over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> for inter-frame coding. 
The values in parentheses give the percentage change relative to conditional coding with <math alttext="C=64" class="ltx_Math" display="inline" id="S3.T3.4.m2.1"><semantics id="S3.T3.4.m2.1b"><mrow id="S3.T3.4.m2.1.1" xref="S3.T3.4.m2.1.1.cmml"><mi id="S3.T3.4.m2.1.1.2" xref="S3.T3.4.m2.1.1.2.cmml">C</mi><mo id="S3.T3.4.m2.1.1.1" xref="S3.T3.4.m2.1.1.1.cmml">=</mo><mn id="S3.T3.4.m2.1.1.3" xref="S3.T3.4.m2.1.1.3.cmml">64</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.T3.4.m2.1c"><apply id="S3.T3.4.m2.1.1.cmml" xref="S3.T3.4.m2.1.1"><eq id="S3.T3.4.m2.1.1.1.cmml" xref="S3.T3.4.m2.1.1.1"></eq><ci id="S3.T3.4.m2.1.1.2.cmml" xref="S3.T3.4.m2.1.1.2">𝐶</ci><cn id="S3.T3.4.m2.1.1.3.cmml" type="integer" xref="S3.T3.4.m2.1.1.3">64</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.4.m2.1d">C=64</annotation><annotation encoding="application/x-llamapun" id="S3.T3.4.m2.1e">italic_C = 64</annotation></semantics></math>.</figcaption> <table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle" id="S3.T3.5"> <thead class="ltx_thead"> <tr class="ltx_tr" id="S3.T3.5.1"> <th class="ltx_td ltx_th ltx_th_column ltx_border_r ltx_border_tt" id="S3.T3.5.1.2"></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_tt" id="S3.T3.5.1.3">BD-rate (%)</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" id="S3.T3.5.1.4">Encoding (kMACs/pixel)</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" id="S3.T3.5.1.5">Decoding (kMACs/pixel)</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" id="S3.T3.5.1.6">Model Size (M)</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_tt" id="S3.T3.5.1.1">Channel Size of <math alttext="\dot{x}_{c}" class="ltx_Math" display="inline" id="S3.T3.5.1.1.m1.1"><semantics id="S3.T3.5.1.1.m1.1a"><msub id="S3.T3.5.1.1.m1.1.1" xref="S3.T3.5.1.1.m1.1.1.cmml"><mover accent="true" 
id="S3.T3.5.1.1.m1.1.1.2" xref="S3.T3.5.1.1.m1.1.1.2.cmml"><mi id="S3.T3.5.1.1.m1.1.1.2.2" xref="S3.T3.5.1.1.m1.1.1.2.2.cmml">x</mi><mo id="S3.T3.5.1.1.m1.1.1.2.1" xref="S3.T3.5.1.1.m1.1.1.2.1.cmml">˙</mo></mover><mi id="S3.T3.5.1.1.m1.1.1.3" xref="S3.T3.5.1.1.m1.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S3.T3.5.1.1.m1.1b"><apply id="S3.T3.5.1.1.m1.1.1.cmml" xref="S3.T3.5.1.1.m1.1.1"><csymbol cd="ambiguous" id="S3.T3.5.1.1.m1.1.1.1.cmml" xref="S3.T3.5.1.1.m1.1.1">subscript</csymbol><apply id="S3.T3.5.1.1.m1.1.1.2.cmml" xref="S3.T3.5.1.1.m1.1.1.2"><ci id="S3.T3.5.1.1.m1.1.1.2.1.cmml" xref="S3.T3.5.1.1.m1.1.1.2.1">˙</ci><ci id="S3.T3.5.1.1.m1.1.1.2.2.cmml" xref="S3.T3.5.1.1.m1.1.1.2.2">𝑥</ci></apply><ci id="S3.T3.5.1.1.m1.1.1.3.cmml" xref="S3.T3.5.1.1.m1.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.5.1.1.m1.1c">\dot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S3.T3.5.1.1.m1.1d">over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> </th> </tr> </thead> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S3.T3.5.2.1"> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S3.T3.5.2.1.1">Cond. (C=64)</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T3.5.2.1.2">0</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.5.2.1.3">1153</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.5.2.1.4">762</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.5.2.1.5">7.944</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.5.2.1.6">64</td> </tr> <tr class="ltx_tr" id="S3.T3.5.3.2"> <td class="ltx_td ltx_align_left ltx_border_r" id="S3.T3.5.3.2.1">Cond. Res. 
(C=64)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T3.5.3.2.2">-2.92</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.3.2.3">1155 (+0.17%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.3.2.4">764 (+0.26%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.3.2.5">7.946 (+0.03%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.3.2.6">64</td> </tr> <tr class="ltx_tr" id="S3.T3.5.4.3"> <td class="ltx_td ltx_align_left ltx_border_r" id="S3.T3.5.4.3.1">Masked Cond. Res. (C=64)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T3.5.4.3.2">-10.08</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.4.3.3">1212 (+5.12%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.4.3.4">821 (+7.74%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.4.3.5">8.169 (+2.83%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.4.3.6">64</td> </tr> <tr class="ltx_tr" id="S3.T3.5.5.4"> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S3.T3.5.5.4.1">Cond. (C=32)</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T3.5.5.4.2">9.24</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.5.5.4.3">970 (-15.87%)</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.5.5.4.4">592 (-22.31%)</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.5.5.4.5">7.684 (-3.27%)</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.5.5.4.6">32</td> </tr> <tr class="ltx_tr" id="S3.T3.5.6.5"> <td class="ltx_td ltx_align_left ltx_border_r" id="S3.T3.5.6.5.1">Cond. Res. 
(C=32)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T3.5.6.5.2">-1.66</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.6.5.3">971 (-15.78%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.6.5.4">593 (-22.17%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.6.5.5">7.685 (-3.26%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.6.5.6">32</td> </tr> <tr class="ltx_tr" id="S3.T3.5.7.6"> <td class="ltx_td ltx_align_left ltx_border_r" id="S3.T3.5.7.6.1">Masked Cond. Res. (C=32)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T3.5.7.6.2">-5.95</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.7.6.3">1027 (-10.93%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.7.6.4">649 (-14.83%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.7.6.5">7.908 (-0.45%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.7.6.6">32</td> </tr> <tr class="ltx_tr" id="S3.T3.5.8.7"> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S3.T3.5.8.7.1">Cond. (C=16)</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T3.5.8.7.2">10.57</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.5.8.7.3">913 (-20.82%)</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.5.8.7.4">541 (-29.00%)</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.5.8.7.5">7.589 (-4.47%)</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.5.8.7.6">16</td> </tr> <tr class="ltx_tr" id="S3.T3.5.9.8"> <td class="ltx_td ltx_align_left ltx_border_r" id="S3.T3.5.9.8.1">Cond. Res. 
(C=16)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T3.5.9.8.2">0.08</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.9.8.3">913 (-20.82%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.9.8.4">541 (-29.00%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.9.8.5">7.589 (-4.47%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.5.9.8.6">16</td> </tr> <tr class="ltx_tr" id="S3.T3.5.10.9"> <td class="ltx_td ltx_align_left ltx_border_bb ltx_border_r" id="S3.T3.5.10.9.1">Masked Cond. Res. (C=16)</td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_r" id="S3.T3.5.10.9.2">-6.22</td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S3.T3.5.10.9.3">970 (-15.87%)</td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S3.T3.5.10.9.4">598 (-21.52%)</td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S3.T3.5.10.9.5">7.812 (-1.66%)</td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S3.T3.5.10.9.6">16</td> </tr> </tbody> </table> </figure> </section> <section class="ltx_subsection" id="S3.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S3.SS3.4.1.1">III-C</span> </span><span class="ltx_text ltx_font_italic" id="S3.SS3.5.2">Rate-Distortion Performance</span> </h3> <div class="ltx_para" id="S3.SS3.p1"> <p class="ltx_p" id="S3.SS3.p1.4">Fig. <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S2.F4" title="Figure 4 ‣ II-B3 Masked Conditional Residual Coding ‣ II-B Inter-frame Coding ‣ II Proposed Method ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">4</span></a> compares the rate-distortion performance of conditional coding, conditional residual coding, and masked conditional residual coding. 
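The three schemes differ mainly in the signal handed to the inter-frame encoder, as the column headers of Figure 5 show: conditional coding encodes x_t itself and uses the prediction only as a condition, conditional residual coding encodes x_t − ẍ_c, and masked conditional residual coding encodes x_t − m ⊙ ẍ_c. The NumPy sketch below illustrates only this signal formation, under stated assumptions: ẍ_c is treated as a pixel-domain prediction with the same shape as x_t, the mask m is taken as given in [0, 1] (in the codec it is produced by a network), and decoding is idealized as adding the (masked) prediction back, which simplifies the actual conditional decoder. The helper names `encoder_input` and `reconstruct` are hypothetical.

```python
from typing import Optional

import numpy as np


def encoder_input(x_t: np.ndarray, x_pred: np.ndarray, mode: str,
                  m: Optional[np.ndarray] = None) -> np.ndarray:
    """Signal fed to the inter-frame encoder (hypothetical helper).

    x_t:    current frame, shape (3, H, W)
    x_pred: prediction of x_t (stands in for the double-dot x_c), same shape
    m:      soft mask in [0, 1], broadcastable to x_t (masked mode only)
    """
    if mode == "cond":
        # Conditional coding: the frame itself is coded; x_pred is only a condition.
        return x_t
    if mode == "cond_res":
        # Conditional residual coding: x_t - x_pred.
        return x_t - x_pred
    if mode == "masked_cond_res":
        if m is None:
            raise ValueError("masked_cond_res requires a mask m")
        # Masked conditional residual coding: x_t - m * x_pred (element-wise product).
        return x_t - m * x_pred
    raise ValueError(f"unknown mode: {mode}")


def reconstruct(decoded: np.ndarray, x_pred: np.ndarray, mode: str,
                m: Optional[np.ndarray] = None) -> np.ndarray:
    """Idealized lossless inverse of encoder_input (a simplification)."""
    if mode == "cond":
        return decoded
    if mode == "cond_res":
        return decoded + x_pred
    if mode == "masked_cond_res":
        if m is None:
            raise ValueError("masked_cond_res requires a mask m")
        return decoded + m * x_pred
    raise ValueError(f"unknown mode: {mode}")


# Toy example: a 4x4 RGB frame and a slightly noisy prediction of it.
rng = np.random.default_rng(0)
x_t = rng.random((3, 4, 4))
x_pred = x_t + 0.05 * rng.standard_normal((3, 4, 4))

# Hypothetical mask: ignore the prediction on the left half, trust it on the right.
m = np.ones((1, 4, 4))
m[..., :2] = 0.0

r = encoder_input(x_t, x_pred, "masked_cond_res", m)
x_rec = reconstruct(r, x_pred, "masked_cond_res", m)
print(np.allclose(r[..., :2], x_t[..., :2]))             # m = 0: same input as conditional coding
print(np.allclose(r[..., 2:], (x_t - x_pred)[..., 2:]))  # m = 1: same input as conditional residual coding
print(np.allclose(x_rec, x_t))                           # idealized round trip recovers x_t
```

Where m = 0 the masked scheme falls back to coding x_t directly, as in conditional coding, and where m = 1 it reduces to conditional residual coding; intermediate mask values blend the two.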
Table <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S3.T2" title="TABLE II ‣ III-A Training ‣ III Experiments ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">II</span></a> reports their BD-rate savings with respect to conditional coding with <math alttext="C=64" class="ltx_Math" display="inline" id="S3.SS3.p1.1.m1.1"><semantics id="S3.SS3.p1.1.m1.1a"><mrow id="S3.SS3.p1.1.m1.1.1" xref="S3.SS3.p1.1.m1.1.1.cmml"><mi id="S3.SS3.p1.1.m1.1.1.2" xref="S3.SS3.p1.1.m1.1.1.2.cmml">C</mi><mo id="S3.SS3.p1.1.m1.1.1.1" xref="S3.SS3.p1.1.m1.1.1.1.cmml">=</mo><mn id="S3.SS3.p1.1.m1.1.1.3" xref="S3.SS3.p1.1.m1.1.1.3.cmml">64</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p1.1.m1.1b"><apply id="S3.SS3.p1.1.m1.1.1.cmml" xref="S3.SS3.p1.1.m1.1.1"><eq id="S3.SS3.p1.1.m1.1.1.1.cmml" xref="S3.SS3.p1.1.m1.1.1.1"></eq><ci id="S3.SS3.p1.1.m1.1.1.2.cmml" xref="S3.SS3.p1.1.m1.1.1.2">𝐶</ci><cn id="S3.SS3.p1.1.m1.1.1.3.cmml" type="integer" xref="S3.SS3.p1.1.m1.1.1.3">64</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p1.1.m1.1c">C=64</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p1.1.m1.1d">italic_C = 64</annotation></semantics></math>. 
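BD-rate summarizes the gap between two rate-distortion curves as the average bitrate change at equal quality, so a negative value means the tested scheme needs fewer bits than the anchor (here, conditional coding with C = 64). The exact BD-rate tool behind the reported numbers is not specified in this section; the sketch below assumes the classic Bjøntegaard procedure with PSNR as the quality measure: fit log-rate as a cubic polynomial of PSNR for each curve, integrate both fits over the overlapping PSNR range, and map the mean log-rate difference back to a percentage. Some evaluation tools use piecewise cubic interpolation instead, which can give slightly different values. The function name `bd_rate` and the rate points are illustrative toy values, not results from this paper.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Bjontegaard delta rate (%) of the test curve relative to the anchor.

    Each curve needs at least four (rate, PSNR) points for the cubic fit.
    Negative values mean the test codec saves bits at equal PSNR.
    """
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))

    # Fit log-rate as a cubic polynomial in PSNR for each curve.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Integrate both fits over the PSNR interval the two curves share.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    if hi <= lo:
        raise ValueError("RD curves do not overlap in PSNR")
    int_a = np.polyint(fit_a)
    int_t = np.polyint(fit_t)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_t = np.polyval(int_t, hi) - np.polyval(int_t, lo)

    # Mean log-rate difference, mapped back to a relative rate change in percent.
    avg_log_diff = (area_t - area_a) / (hi - lo)
    return float((np.exp(avg_log_diff) - 1.0) * 100.0)


# Toy curves: the test codec reaches every PSNR point with 10% fewer bits.
anchor_rates = [0.05, 0.10, 0.20, 0.40]  # bits per pixel (made-up values)
anchor_psnr = [30.0, 32.0, 34.0, 36.0]   # dB (made-up values)
test_rates = [0.9 * r for r in anchor_rates]

print(round(bd_rate(anchor_rates, anchor_psnr, test_rates, anchor_psnr), 2))  # -10.0
```

Fitting log-rate rather than rate makes a constant multiplicative saving appear as a constant vertical offset between the fitted curves, which is why a uniform 10% bit reduction yields a BD-rate of -10% in this toy case.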
We adjust the channel size <math alttext="C" class="ltx_Math" display="inline" id="S3.SS3.p1.2.m2.1"><semantics id="S3.SS3.p1.2.m2.1a"><mi id="S3.SS3.p1.2.m2.1.1" xref="S3.SS3.p1.2.m2.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S3.SS3.p1.2.m2.1b"><ci id="S3.SS3.p1.2.m2.1.1.cmml" xref="S3.SS3.p1.2.m2.1.1">𝐶</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p1.2.m2.1c">C</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p1.2.m2.1d">italic_C</annotation></semantics></math> of <math alttext="\tilde{x}_{t-1},x_{c}" class="ltx_Math" display="inline" id="S3.SS3.p1.3.m3.2"><semantics id="S3.SS3.p1.3.m3.2a"><mrow id="S3.SS3.p1.3.m3.2.2.2" xref="S3.SS3.p1.3.m3.2.2.3.cmml"><msub id="S3.SS3.p1.3.m3.1.1.1.1" xref="S3.SS3.p1.3.m3.1.1.1.1.cmml"><mover accent="true" id="S3.SS3.p1.3.m3.1.1.1.1.2" xref="S3.SS3.p1.3.m3.1.1.1.1.2.cmml"><mi id="S3.SS3.p1.3.m3.1.1.1.1.2.2" xref="S3.SS3.p1.3.m3.1.1.1.1.2.2.cmml">x</mi><mo id="S3.SS3.p1.3.m3.1.1.1.1.2.1" xref="S3.SS3.p1.3.m3.1.1.1.1.2.1.cmml">~</mo></mover><mrow id="S3.SS3.p1.3.m3.1.1.1.1.3" xref="S3.SS3.p1.3.m3.1.1.1.1.3.cmml"><mi id="S3.SS3.p1.3.m3.1.1.1.1.3.2" xref="S3.SS3.p1.3.m3.1.1.1.1.3.2.cmml">t</mi><mo id="S3.SS3.p1.3.m3.1.1.1.1.3.1" xref="S3.SS3.p1.3.m3.1.1.1.1.3.1.cmml">−</mo><mn id="S3.SS3.p1.3.m3.1.1.1.1.3.3" xref="S3.SS3.p1.3.m3.1.1.1.1.3.3.cmml">1</mn></mrow></msub><mo id="S3.SS3.p1.3.m3.2.2.2.3" xref="S3.SS3.p1.3.m3.2.2.3.cmml">,</mo><msub id="S3.SS3.p1.3.m3.2.2.2.2" xref="S3.SS3.p1.3.m3.2.2.2.2.cmml"><mi id="S3.SS3.p1.3.m3.2.2.2.2.2" xref="S3.SS3.p1.3.m3.2.2.2.2.2.cmml">x</mi><mi id="S3.SS3.p1.3.m3.2.2.2.2.3" xref="S3.SS3.p1.3.m3.2.2.2.2.3.cmml">c</mi></msub></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p1.3.m3.2b"><list id="S3.SS3.p1.3.m3.2.2.3.cmml" xref="S3.SS3.p1.3.m3.2.2.2"><apply id="S3.SS3.p1.3.m3.1.1.1.1.cmml" xref="S3.SS3.p1.3.m3.1.1.1.1"><csymbol cd="ambiguous" id="S3.SS3.p1.3.m3.1.1.1.1.1.cmml" 
xref="S3.SS3.p1.3.m3.1.1.1.1">subscript</csymbol><apply id="S3.SS3.p1.3.m3.1.1.1.1.2.cmml" xref="S3.SS3.p1.3.m3.1.1.1.1.2"><ci id="S3.SS3.p1.3.m3.1.1.1.1.2.1.cmml" xref="S3.SS3.p1.3.m3.1.1.1.1.2.1">~</ci><ci id="S3.SS3.p1.3.m3.1.1.1.1.2.2.cmml" xref="S3.SS3.p1.3.m3.1.1.1.1.2.2">𝑥</ci></apply><apply id="S3.SS3.p1.3.m3.1.1.1.1.3.cmml" xref="S3.SS3.p1.3.m3.1.1.1.1.3"><minus id="S3.SS3.p1.3.m3.1.1.1.1.3.1.cmml" xref="S3.SS3.p1.3.m3.1.1.1.1.3.1"></minus><ci id="S3.SS3.p1.3.m3.1.1.1.1.3.2.cmml" xref="S3.SS3.p1.3.m3.1.1.1.1.3.2">𝑡</ci><cn id="S3.SS3.p1.3.m3.1.1.1.1.3.3.cmml" type="integer" xref="S3.SS3.p1.3.m3.1.1.1.1.3.3">1</cn></apply></apply><apply id="S3.SS3.p1.3.m3.2.2.2.2.cmml" xref="S3.SS3.p1.3.m3.2.2.2.2"><csymbol cd="ambiguous" id="S3.SS3.p1.3.m3.2.2.2.2.1.cmml" xref="S3.SS3.p1.3.m3.2.2.2.2">subscript</csymbol><ci id="S3.SS3.p1.3.m3.2.2.2.2.2.cmml" xref="S3.SS3.p1.3.m3.2.2.2.2.2">𝑥</ci><ci id="S3.SS3.p1.3.m3.2.2.2.2.3.cmml" xref="S3.SS3.p1.3.m3.2.2.2.2.3">𝑐</ci></apply></list></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p1.3.m3.2c">\tilde{x}_{t-1},x_{c}</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p1.3.m3.2d">over~ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> and <math alttext="\dot{x}_{c}" class="ltx_Math" display="inline" id="S3.SS3.p1.4.m4.1"><semantics id="S3.SS3.p1.4.m4.1a"><msub id="S3.SS3.p1.4.m4.1.1" xref="S3.SS3.p1.4.m4.1.1.cmml"><mover accent="true" id="S3.SS3.p1.4.m4.1.1.2" xref="S3.SS3.p1.4.m4.1.1.2.cmml"><mi id="S3.SS3.p1.4.m4.1.1.2.2" xref="S3.SS3.p1.4.m4.1.1.2.2.cmml">x</mi><mo id="S3.SS3.p1.4.m4.1.1.2.1" xref="S3.SS3.p1.4.m4.1.1.2.1.cmml">˙</mo></mover><mi id="S3.SS3.p1.4.m4.1.1.3" xref="S3.SS3.p1.4.m4.1.1.3.cmml">c</mi></msub><annotation-xml encoding="MathML-Content" id="S3.SS3.p1.4.m4.1b"><apply id="S3.SS3.p1.4.m4.1.1.cmml" xref="S3.SS3.p1.4.m4.1.1"><csymbol cd="ambiguous" 
id="S3.SS3.p1.4.m4.1.1.1.cmml" xref="S3.SS3.p1.4.m4.1.1">subscript</csymbol><apply id="S3.SS3.p1.4.m4.1.1.2.cmml" xref="S3.SS3.p1.4.m4.1.1.2"><ci id="S3.SS3.p1.4.m4.1.1.2.1.cmml" xref="S3.SS3.p1.4.m4.1.1.2.1">˙</ci><ci id="S3.SS3.p1.4.m4.1.1.2.2.cmml" xref="S3.SS3.p1.4.m4.1.1.2.2">𝑥</ci></apply><ci id="S3.SS3.p1.4.m4.1.1.3.cmml" xref="S3.SS3.p1.4.m4.1.1.3">𝑐</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p1.4.m4.1c">\dot{x}_{c}</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p1.4.m4.1d">over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT</annotation></semantics></math> to investigate the impact of the information bottleneck on their coding performance. From Table <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S3.T2" title="TABLE II ‣ III-A Training ‣ III Experiments ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">II</span></a>, we make the following observations:</p> </div> <div class="ltx_para" id="S3.SS3.p2"> <p class="ltx_p" id="S3.SS3.p2.1">(1) Conditional residual coding consistently outperforms conditional coding at the same <math alttext="C" class="ltx_Math" display="inline" id="S3.SS3.p2.1.m1.1"><semantics id="S3.SS3.p2.1.m1.1a"><mi id="S3.SS3.p2.1.m1.1.1" xref="S3.SS3.p2.1.m1.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S3.SS3.p2.1.m1.1b"><ci id="S3.SS3.p2.1.m1.1.1.cmml" xref="S3.SS3.p2.1.m1.1.1">𝐶</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p2.1.m1.1c">C</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p2.1.m1.1d">italic_C</annotation></semantics></math> value across the different datasets. 
As expected, masked conditional residual coding achieves even higher BD-rate savings.</p> </div> <div class="ltx_para" id="S3.SS3.p3"> <p class="ltx_p" id="S3.SS3.p3.4">(2) Conditional coding is more sensitive to the information bottleneck than both conditional residual coding and masked conditional residual coding. As the information bottleneck becomes more severe (i.e., by reducing <math alttext="C" class="ltx_Math" display="inline" id="S3.SS3.p3.1.m1.1"><semantics id="S3.SS3.p3.1.m1.1a"><mi id="S3.SS3.p3.1.m1.1.1" xref="S3.SS3.p3.1.m1.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S3.SS3.p3.1.m1.1b"><ci id="S3.SS3.p3.1.m1.1.1.cmml" xref="S3.SS3.p3.1.m1.1.1">𝐶</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p3.1.m1.1c">C</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p3.1.m1.1d">italic_C</annotation></semantics></math> from 64 to 32 or to 16), the coding performance of conditional coding decreases significantly, with the BD-rate inflated by about 9% for <math alttext="C=32" class="ltx_Math" display="inline" id="S3.SS3.p3.2.m2.1"><semantics id="S3.SS3.p3.2.m2.1a"><mrow id="S3.SS3.p3.2.m2.1.1" xref="S3.SS3.p3.2.m2.1.1.cmml"><mi id="S3.SS3.p3.2.m2.1.1.2" xref="S3.SS3.p3.2.m2.1.1.2.cmml">C</mi><mo id="S3.SS3.p3.2.m2.1.1.1" xref="S3.SS3.p3.2.m2.1.1.1.cmml">=</mo><mn id="S3.SS3.p3.2.m2.1.1.3" xref="S3.SS3.p3.2.m2.1.1.3.cmml">32</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p3.2.m2.1b"><apply id="S3.SS3.p3.2.m2.1.1.cmml" xref="S3.SS3.p3.2.m2.1.1"><eq id="S3.SS3.p3.2.m2.1.1.1.cmml" xref="S3.SS3.p3.2.m2.1.1.1"></eq><ci id="S3.SS3.p3.2.m2.1.1.2.cmml" xref="S3.SS3.p3.2.m2.1.1.2">𝐶</ci><cn id="S3.SS3.p3.2.m2.1.1.3.cmml" type="integer" xref="S3.SS3.p3.2.m2.1.1.3">32</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p3.2.m2.1c">C=32</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p3.2.m2.1d">italic_C = 32</annotation></semantics></math> and 10% 
for <math alttext="C=16" class="ltx_Math" display="inline" id="S3.SS3.p3.3.m3.1"><semantics id="S3.SS3.p3.3.m3.1a"><mrow id="S3.SS3.p3.3.m3.1.1" xref="S3.SS3.p3.3.m3.1.1.cmml"><mi id="S3.SS3.p3.3.m3.1.1.2" xref="S3.SS3.p3.3.m3.1.1.2.cmml">C</mi><mo id="S3.SS3.p3.3.m3.1.1.1" xref="S3.SS3.p3.3.m3.1.1.1.cmml">=</mo><mn id="S3.SS3.p3.3.m3.1.1.3" xref="S3.SS3.p3.3.m3.1.1.3.cmml">16</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p3.3.m3.1b"><apply id="S3.SS3.p3.3.m3.1.1.cmml" xref="S3.SS3.p3.3.m3.1.1"><eq id="S3.SS3.p3.3.m3.1.1.1.cmml" xref="S3.SS3.p3.3.m3.1.1.1"></eq><ci id="S3.SS3.p3.3.m3.1.1.2.cmml" xref="S3.SS3.p3.3.m3.1.1.2">𝐶</ci><cn id="S3.SS3.p3.3.m3.1.1.3.cmml" type="integer" xref="S3.SS3.p3.3.m3.1.1.3">16</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p3.3.m3.1c">C=16</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p3.3.m3.1d">italic_C = 16</annotation></semantics></math>. In contrast, the BD-rate inflation of conditional residual coding and masked conditional residual coding is relatively modest when <math alttext="C" class="ltx_Math" display="inline" id="S3.SS3.p3.4.m4.1"><semantics id="S3.SS3.p3.4.m4.1a"><mi id="S3.SS3.p3.4.m4.1.1" xref="S3.SS3.p3.4.m4.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S3.SS3.p3.4.m4.1b"><ci id="S3.SS3.p3.4.m4.1.1.cmml" xref="S3.SS3.p3.4.m4.1.1">𝐶</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p3.4.m4.1c">C</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p3.4.m4.1d">italic_C</annotation></semantics></math> is reduced.</p> </div> <div class="ltx_para" id="S3.SS3.p4"> <p class="ltx_p" id="S3.SS3.p4.8">(3) Conditional residual coding and masked conditional residual coding can match or even surpass the coding performance of conditional coding while using a smaller <math alttext="C" class="ltx_Math" display="inline" id="S3.SS3.p4.1.m1.1"><semantics id="S3.SS3.p4.1.m1.1a"><mi 
id="S3.SS3.p4.1.m1.1.1" xref="S3.SS3.p4.1.m1.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.1.m1.1b"><ci id="S3.SS3.p4.1.m1.1.1.cmml" xref="S3.SS3.p4.1.m1.1.1">𝐶</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.1.m1.1c">C</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.1.m1.1d">italic_C</annotation></semantics></math> value. Specifically, conditional residual coding with <math alttext="C=16" class="ltx_Math" display="inline" id="S3.SS3.p4.2.m2.1"><semantics id="S3.SS3.p4.2.m2.1a"><mrow id="S3.SS3.p4.2.m2.1.1" xref="S3.SS3.p4.2.m2.1.1.cmml"><mi id="S3.SS3.p4.2.m2.1.1.2" xref="S3.SS3.p4.2.m2.1.1.2.cmml">C</mi><mo id="S3.SS3.p4.2.m2.1.1.1" xref="S3.SS3.p4.2.m2.1.1.1.cmml">=</mo><mn id="S3.SS3.p4.2.m2.1.1.3" xref="S3.SS3.p4.2.m2.1.1.3.cmml">16</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.2.m2.1b"><apply id="S3.SS3.p4.2.m2.1.1.cmml" xref="S3.SS3.p4.2.m2.1.1"><eq id="S3.SS3.p4.2.m2.1.1.1.cmml" xref="S3.SS3.p4.2.m2.1.1.1"></eq><ci id="S3.SS3.p4.2.m2.1.1.2.cmml" xref="S3.SS3.p4.2.m2.1.1.2">𝐶</ci><cn id="S3.SS3.p4.2.m2.1.1.3.cmml" type="integer" xref="S3.SS3.p4.2.m2.1.1.3">16</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.2.m2.1c">C=16</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.2.m2.1d">italic_C = 16</annotation></semantics></math> achieves comparable coding performance to conditional coding with <math alttext="C=64" class="ltx_Math" display="inline" id="S3.SS3.p4.3.m3.1"><semantics id="S3.SS3.p4.3.m3.1a"><mrow id="S3.SS3.p4.3.m3.1.1" xref="S3.SS3.p4.3.m3.1.1.cmml"><mi id="S3.SS3.p4.3.m3.1.1.2" xref="S3.SS3.p4.3.m3.1.1.2.cmml">C</mi><mo id="S3.SS3.p4.3.m3.1.1.1" xref="S3.SS3.p4.3.m3.1.1.1.cmml">=</mo><mn id="S3.SS3.p4.3.m3.1.1.3" xref="S3.SS3.p4.3.m3.1.1.3.cmml">64</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.3.m3.1b"><apply id="S3.SS3.p4.3.m3.1.1.cmml" xref="S3.SS3.p4.3.m3.1.1"><eq 
id="S3.SS3.p4.3.m3.1.1.1.cmml" xref="S3.SS3.p4.3.m3.1.1.1"></eq><ci id="S3.SS3.p4.3.m3.1.1.2.cmml" xref="S3.SS3.p4.3.m3.1.1.2">𝐶</ci><cn id="S3.SS3.p4.3.m3.1.1.3.cmml" type="integer" xref="S3.SS3.p4.3.m3.1.1.3">64</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.3.m3.1c">C=64</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.3.m3.1d">italic_C = 64</annotation></semantics></math>, while conditional residual coding with <math alttext="C=32" class="ltx_Math" display="inline" id="S3.SS3.p4.4.m4.1"><semantics id="S3.SS3.p4.4.m4.1a"><mrow id="S3.SS3.p4.4.m4.1.1" xref="S3.SS3.p4.4.m4.1.1.cmml"><mi id="S3.SS3.p4.4.m4.1.1.2" xref="S3.SS3.p4.4.m4.1.1.2.cmml">C</mi><mo id="S3.SS3.p4.4.m4.1.1.1" xref="S3.SS3.p4.4.m4.1.1.1.cmml">=</mo><mn id="S3.SS3.p4.4.m4.1.1.3" xref="S3.SS3.p4.4.m4.1.1.3.cmml">32</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.4.m4.1b"><apply id="S3.SS3.p4.4.m4.1.1.cmml" xref="S3.SS3.p4.4.m4.1.1"><eq id="S3.SS3.p4.4.m4.1.1.1.cmml" xref="S3.SS3.p4.4.m4.1.1.1"></eq><ci id="S3.SS3.p4.4.m4.1.1.2.cmml" xref="S3.SS3.p4.4.m4.1.1.2">𝐶</ci><cn id="S3.SS3.p4.4.m4.1.1.3.cmml" type="integer" xref="S3.SS3.p4.4.m4.1.1.3">32</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.4.m4.1c">C=32</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.4.m4.1d">italic_C = 32</annotation></semantics></math> and masked conditional residual coding with <math alttext="C=32" class="ltx_Math" display="inline" id="S3.SS3.p4.5.m5.1"><semantics id="S3.SS3.p4.5.m5.1a"><mrow id="S3.SS3.p4.5.m5.1.1" xref="S3.SS3.p4.5.m5.1.1.cmml"><mi id="S3.SS3.p4.5.m5.1.1.2" xref="S3.SS3.p4.5.m5.1.1.2.cmml">C</mi><mo id="S3.SS3.p4.5.m5.1.1.1" xref="S3.SS3.p4.5.m5.1.1.1.cmml">=</mo><mn id="S3.SS3.p4.5.m5.1.1.3" xref="S3.SS3.p4.5.m5.1.1.3.cmml">32</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.5.m5.1b"><apply id="S3.SS3.p4.5.m5.1.1.cmml" xref="S3.SS3.p4.5.m5.1.1"><eq 
id="S3.SS3.p4.5.m5.1.1.1.cmml" xref="S3.SS3.p4.5.m5.1.1.1"></eq><ci id="S3.SS3.p4.5.m5.1.1.2.cmml" xref="S3.SS3.p4.5.m5.1.1.2">𝐶</ci><cn id="S3.SS3.p4.5.m5.1.1.3.cmml" type="integer" xref="S3.SS3.p4.5.m5.1.1.3">32</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.5.m5.1c">C=32</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.5.m5.1d">italic_C = 32</annotation></semantics></math> and <math alttext="C=16" class="ltx_Math" display="inline" id="S3.SS3.p4.6.m6.1"><semantics id="S3.SS3.p4.6.m6.1a"><mrow id="S3.SS3.p4.6.m6.1.1" xref="S3.SS3.p4.6.m6.1.1.cmml"><mi id="S3.SS3.p4.6.m6.1.1.2" xref="S3.SS3.p4.6.m6.1.1.2.cmml">C</mi><mo id="S3.SS3.p4.6.m6.1.1.1" xref="S3.SS3.p4.6.m6.1.1.1.cmml">=</mo><mn id="S3.SS3.p4.6.m6.1.1.3" xref="S3.SS3.p4.6.m6.1.1.3.cmml">16</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.6.m6.1b"><apply id="S3.SS3.p4.6.m6.1.1.cmml" xref="S3.SS3.p4.6.m6.1.1"><eq id="S3.SS3.p4.6.m6.1.1.1.cmml" xref="S3.SS3.p4.6.m6.1.1.1"></eq><ci id="S3.SS3.p4.6.m6.1.1.2.cmml" xref="S3.SS3.p4.6.m6.1.1.2">𝐶</ci><cn id="S3.SS3.p4.6.m6.1.1.3.cmml" type="integer" xref="S3.SS3.p4.6.m6.1.1.3">16</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.6.m6.1c">C=16</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.6.m6.1d">italic_C = 16</annotation></semantics></math> show higher coding gains than conditional coding with <math alttext="C=64" class="ltx_Math" display="inline" id="S3.SS3.p4.7.m7.1"><semantics id="S3.SS3.p4.7.m7.1a"><mrow id="S3.SS3.p4.7.m7.1.1" xref="S3.SS3.p4.7.m7.1.1.cmml"><mi id="S3.SS3.p4.7.m7.1.1.2" xref="S3.SS3.p4.7.m7.1.1.2.cmml">C</mi><mo id="S3.SS3.p4.7.m7.1.1.1" xref="S3.SS3.p4.7.m7.1.1.1.cmml">=</mo><mn id="S3.SS3.p4.7.m7.1.1.3" xref="S3.SS3.p4.7.m7.1.1.3.cmml">64</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.7.m7.1b"><apply id="S3.SS3.p4.7.m7.1.1.cmml" xref="S3.SS3.p4.7.m7.1.1"><eq 
id="S3.SS3.p4.7.m7.1.1.1.cmml" xref="S3.SS3.p4.7.m7.1.1.1"></eq><ci id="S3.SS3.p4.7.m7.1.1.2.cmml" xref="S3.SS3.p4.7.m7.1.1.2">𝐶</ci><cn id="S3.SS3.p4.7.m7.1.1.3.cmml" type="integer" xref="S3.SS3.p4.7.m7.1.1.3">64</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.7.m7.1c">C=64</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.7.m7.1d">italic_C = 64</annotation></semantics></math>. A smaller <math alttext="C" class="ltx_Math" display="inline" id="S3.SS3.p4.8.m8.1"><semantics id="S3.SS3.p4.8.m8.1a"><mi id="S3.SS3.p4.8.m8.1.1" xref="S3.SS3.p4.8.m8.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.8.m8.1b"><ci id="S3.SS3.p4.8.m8.1.1.cmml" xref="S3.SS3.p4.8.m8.1.1">𝐶</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.8.m8.1c">C</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.8.m8.1d">italic_C</annotation></semantics></math> value generally implies lower complexity. An in-depth rate-distortion-complexity analysis is provided in Section <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S3.SS4" title="III-D Complexity Analysis ‣ III Experiments ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">III-D</span></span></a>.</p> </div> <div class="ltx_para" id="S3.SS3.p5"> <p class="ltx_p" id="S3.SS3.p5.4">To further investigate the inner workings of conditional residual coding and masked conditional residual coding under different bottleneck levels, Fig. <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S3.F5" title="Figure 5 ‣ III-B Evaluation ‣ III Experiments ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">5</span></a> visualizes their input signals and the masks. 
We see that when the bottleneck issue is mild (<math alttext="C=64" class="ltx_Math" display="inline" id="S3.SS3.p5.1.m1.1"><semantics id="S3.SS3.p5.1.m1.1a"><mrow id="S3.SS3.p5.1.m1.1.1" xref="S3.SS3.p5.1.m1.1.1.cmml"><mi id="S3.SS3.p5.1.m1.1.1.2" xref="S3.SS3.p5.1.m1.1.1.2.cmml">C</mi><mo id="S3.SS3.p5.1.m1.1.1.1" xref="S3.SS3.p5.1.m1.1.1.1.cmml">=</mo><mn id="S3.SS3.p5.1.m1.1.1.3" xref="S3.SS3.p5.1.m1.1.1.3.cmml">64</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p5.1.m1.1b"><apply id="S3.SS3.p5.1.m1.1.1.cmml" xref="S3.SS3.p5.1.m1.1.1"><eq id="S3.SS3.p5.1.m1.1.1.1.cmml" xref="S3.SS3.p5.1.m1.1.1.1"></eq><ci id="S3.SS3.p5.1.m1.1.1.2.cmml" xref="S3.SS3.p5.1.m1.1.1.2">𝐶</ci><cn id="S3.SS3.p5.1.m1.1.1.3.cmml" type="integer" xref="S3.SS3.p5.1.m1.1.1.3">64</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p5.1.m1.1c">C=64</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p5.1.m1.1d">italic_C = 64</annotation></semantics></math>), the input signals of both codecs retain more content information; in this sense, both codecs appear to operate in the conditional coding mode. 
In contrast, when <math alttext="C=32" class="ltx_Math" display="inline" id="S3.SS3.p5.2.m2.1"><semantics id="S3.SS3.p5.2.m2.1a"><mrow id="S3.SS3.p5.2.m2.1.1" xref="S3.SS3.p5.2.m2.1.1.cmml"><mi id="S3.SS3.p5.2.m2.1.1.2" xref="S3.SS3.p5.2.m2.1.1.2.cmml">C</mi><mo id="S3.SS3.p5.2.m2.1.1.1" xref="S3.SS3.p5.2.m2.1.1.1.cmml">=</mo><mn id="S3.SS3.p5.2.m2.1.1.3" xref="S3.SS3.p5.2.m2.1.1.3.cmml">32</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p5.2.m2.1b"><apply id="S3.SS3.p5.2.m2.1.1.cmml" xref="S3.SS3.p5.2.m2.1.1"><eq id="S3.SS3.p5.2.m2.1.1.1.cmml" xref="S3.SS3.p5.2.m2.1.1.1"></eq><ci id="S3.SS3.p5.2.m2.1.1.2.cmml" xref="S3.SS3.p5.2.m2.1.1.2">𝐶</ci><cn id="S3.SS3.p5.2.m2.1.1.3.cmml" type="integer" xref="S3.SS3.p5.2.m2.1.1.3">32</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p5.2.m2.1c">C=32</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p5.2.m2.1d">italic_C = 32</annotation></semantics></math> or <math alttext="16" class="ltx_Math" display="inline" id="S3.SS3.p5.3.m3.1"><semantics id="S3.SS3.p5.3.m3.1a"><mn id="S3.SS3.p5.3.m3.1.1" xref="S3.SS3.p5.3.m3.1.1.cmml">16</mn><annotation-xml encoding="MathML-Content" id="S3.SS3.p5.3.m3.1b"><cn id="S3.SS3.p5.3.m3.1.1.cmml" type="integer" xref="S3.SS3.p5.3.m3.1.1">16</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p5.3.m3.1c">16</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p5.3.m3.1d">16</annotation></semantics></math>, the input signals more closely resemble residual signals. This observation is in line with the findings in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib14" title="">14</a>]</cite>, which indicate that conditional residual coding and conditional coding perform comparably when the bottleneck issue is mild. 
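</p> </div> <div class="ltx_para"> <p class="ltx_p">The two operating modes observed in Fig. 5 can be made concrete with a small numerical sketch. It assumes the masked conditional residual formulation of <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#bib.bib5" title="">5</a>]</cite>, in which the conditional encoder receives <em>x</em><sub>t</sub> − <em>m</em> ⊙ <em>x</em><sub>c</sub> for a pixel-wise mask <em>m</em> with values in [0, 1], and the decoder adds <em>m</em> ⊙ <em>x</em><sub>c</sub> back to the decoded signal. The symbol <em>x</em><sub>t</sub> (the current frame), the function names, and the toy data are illustrative assumptions, not definitions taken from this section. Under this assumption, <em>m</em> = 0 reduces to conditional coding and <em>m</em> = 1 reduces to conditional residual coding.</p>

```python
import numpy as np


def masked_residual(x_t, x_c, m):
    """Encoder input under the assumed form x_t - m * x_c (element-wise)."""
    return x_t - m * x_c


def reconstruct(decoded_input, x_c, m):
    """Decoder side: add the masked predictor back to the decoded signal."""
    return decoded_input + m * x_c


rng = np.random.default_rng(0)
x_t = rng.random((3, 8, 8))                        # toy frame, channels x H x W
x_c = x_t + 0.05 * rng.standard_normal(x_t.shape)  # toy predictor close to x_t

# m = 0 everywhere: the encoder sees x_t itself (conditional coding).
cond_input = masked_residual(x_t, x_c, np.zeros((1, 8, 8)))
# m = 1 everywhere: the encoder sees x_t - x_c (conditional residual coding).
resid_input = masked_residual(x_t, x_c, np.ones((1, 8, 8)))
print(np.abs(cond_input).mean(), np.abs(resid_input).mean())

# Spatially varying mask: low values in a hypothetical unreliable-motion region.
m = np.ones((1, 8, 8))
m[:, :, :4] = 0.1
x_in = masked_residual(x_t, x_c, m)
# Without quantization the round trip is exact up to floating-point error.
assert np.allclose(reconstruct(x_in, x_c, m), x_t)
```

<p class="ltx_p">With a good predictor, the residual input carries far less energy than the frame itself; where the mask is small, the encoder input stays closer to <em>x</em><sub>t</sub>, i.e., the codec leans toward conditional coding.</p> </div> <div class="ltx_para"> <p class="ltx_p">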
Additionally, with masked conditional residual coding, the masks <math alttext="m" class="ltx_Math" display="inline" id="S3.SS3.p5.4.m4.1"><semantics id="S3.SS3.p5.4.m4.1a"><mi id="S3.SS3.p5.4.m4.1.1" xref="S3.SS3.p5.4.m4.1.1.cmml">m</mi><annotation-xml encoding="MathML-Content" id="S3.SS3.p5.4.m4.1b"><ci id="S3.SS3.p5.4.m4.1.1.cmml" xref="S3.SS3.p5.4.m4.1.1">𝑚</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p5.4.m4.1c">m</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p5.4.m4.1d">italic_m</annotation></semantics></math> take relatively low values in regions where motion estimates are unreliable, such as disoccluded regions, object boundaries, and regions with complex motion. In other words, conditional coding is favored in these regions.</p> </div> </section> <section class="ltx_subsection" id="S3.SS4"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S3.SS4.4.1.1">III-D</span> </span><span class="ltx_text ltx_font_italic" id="S3.SS4.5.2">Complexity Analysis</span> </h3> <div class="ltx_para" id="S3.SS4.p1"> <p class="ltx_p" id="S3.SS4.p1.5">Table <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S3.T3" title="TABLE III ‣ III-B Evaluation ‣ III Experiments ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">III</span></a> and Fig. <a class="ltx_ref" href="https://arxiv.org/html/2410.03898v1#S1.F1" title="Figure 1 ‣ I Introduction ‣ On the Rate-Distortion-Complexity Trade-offs of Neural Video Coding"><span class="ltx_text ltx_ref_tag">1</span></a> report the rate-distortion-complexity trade-offs of conditional coding, conditional residual coding, and masked conditional residual coding under different levels of the information bottleneck. 
As shown, conditional residual coding and masked conditional residual coding achieve higher coding performance while requiring lower complexity. For example, when comparing conditional coding with <math alttext="C=64" class="ltx_Math" display="inline" id="S3.SS4.p1.1.m1.1"><semantics id="S3.SS4.p1.1.m1.1a"><mrow id="S3.SS4.p1.1.m1.1.1" xref="S3.SS4.p1.1.m1.1.1.cmml"><mi id="S3.SS4.p1.1.m1.1.1.2" xref="S3.SS4.p1.1.m1.1.1.2.cmml">C</mi><mo id="S3.SS4.p1.1.m1.1.1.1" xref="S3.SS4.p1.1.m1.1.1.1.cmml">=</mo><mn id="S3.SS4.p1.1.m1.1.1.3" xref="S3.SS4.p1.1.m1.1.1.3.cmml">64</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS4.p1.1.m1.1b"><apply id="S3.SS4.p1.1.m1.1.1.cmml" xref="S3.SS4.p1.1.m1.1.1"><eq id="S3.SS4.p1.1.m1.1.1.1.cmml" xref="S3.SS4.p1.1.m1.1.1.1"></eq><ci id="S3.SS4.p1.1.m1.1.1.2.cmml" xref="S3.SS4.p1.1.m1.1.1.2">𝐶</ci><cn id="S3.SS4.p1.1.m1.1.1.3.cmml" type="integer" xref="S3.SS4.p1.1.m1.1.1.3">64</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS4.p1.1.m1.1c">C=64</annotation><annotation encoding="application/x-llamapun" id="S3.SS4.p1.1.m1.1d">italic_C = 64</annotation></semantics></math>, conditional residual coding with <math alttext="C=32" class="ltx_Math" display="inline" id="S3.SS4.p1.2.m2.1"><semantics id="S3.SS4.p1.2.m2.1a"><mrow id="S3.SS4.p1.2.m2.1.1" xref="S3.SS4.p1.2.m2.1.1.cmml"><mi id="S3.SS4.p1.2.m2.1.1.2" xref="S3.SS4.p1.2.m2.1.1.2.cmml">C</mi><mo id="S3.SS4.p1.2.m2.1.1.1" xref="S3.SS4.p1.2.m2.1.1.1.cmml">=</mo><mn id="S3.SS4.p1.2.m2.1.1.3" xref="S3.SS4.p1.2.m2.1.1.3.cmml">32</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS4.p1.2.m2.1b"><apply id="S3.SS4.p1.2.m2.1.1.cmml" xref="S3.SS4.p1.2.m2.1.1"><eq id="S3.SS4.p1.2.m2.1.1.1.cmml" xref="S3.SS4.p1.2.m2.1.1.1"></eq><ci id="S3.SS4.p1.2.m2.1.1.2.cmml" xref="S3.SS4.p1.2.m2.1.1.2">𝐶</ci><cn id="S3.SS4.p1.2.m2.1.1.3.cmml" type="integer" xref="S3.SS4.p1.2.m2.1.1.3">32</cn></apply></annotation-xml><annotation encoding="application/x-tex" 
id="S3.SS4.p1.2.m2.1c">C=32</annotation><annotation encoding="application/x-llamapun" id="S3.SS4.p1.2.m2.1d">italic_C = 32</annotation></semantics></math>, and masked conditional residual coding with <math alttext="C=16" class="ltx_Math" display="inline" id="S3.SS4.p1.3.m3.1"><semantics id="S3.SS4.p1.3.m3.1a"><mrow id="S3.SS4.p1.3.m3.1.1" xref="S3.SS4.p1.3.m3.1.1.cmml"><mi id="S3.SS4.p1.3.m3.1.1.2" xref="S3.SS4.p1.3.m3.1.1.2.cmml">C</mi><mo id="S3.SS4.p1.3.m3.1.1.1" xref="S3.SS4.p1.3.m3.1.1.1.cmml">=</mo><mn id="S3.SS4.p1.3.m3.1.1.3" xref="S3.SS4.p1.3.m3.1.1.3.cmml">16</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS4.p1.3.m3.1b"><apply id="S3.SS4.p1.3.m3.1.1.cmml" xref="S3.SS4.p1.3.m3.1.1"><eq id="S3.SS4.p1.3.m3.1.1.1.cmml" xref="S3.SS4.p1.3.m3.1.1.1"></eq><ci id="S3.SS4.p1.3.m3.1.1.2.cmml" xref="S3.SS4.p1.3.m3.1.1.2">𝐶</ci><cn id="S3.SS4.p1.3.m3.1.1.3.cmml" type="integer" xref="S3.SS4.p1.3.m3.1.1.3">16</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS4.p1.3.m3.1c">C=16</annotation><annotation encoding="application/x-llamapun" id="S3.SS4.p1.3.m3.1d">italic_C = 16</annotation></semantics></math>, conditional residual coding achieves a 1.66% BD-rate saving over conditional coding, while masked conditional residual coding achieves a 6.22% saving. Notably, in this comparison, both conditional residual coding and masked conditional residual coding require approximately 16% and 22% fewer encoding and decoding kMACs/pixel, respectively, than conditional coding. 
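</p> </div> <div class="ltx_para"> <p class="ltx_p">The percentage reductions in Table III translate into the retained fractions quoted in the conclusion with simple arithmetic. The Python check below takes the encoding and decoding kMACs/pixel entries of the Masked Cond. Res. (C=16) row, assumes (as this comparison implies) that their percentages are measured against conditional coding with <em>C</em> = 64, and inverts each "value (change %)" pair to recover the implied anchor complexity. The helper names are illustrative only.</p>

```python
# Encoding / decoding kMACs/pixel of the "Masked Cond. Res. (C=16)" row of
# Table III, with their percentage change relative to the anchor.
ENC_KMACS, ENC_PCT = 970, -15.87
DEC_KMACS, DEC_PCT = 598, -21.52


def implied_anchor(value, pct_change):
    """Invert a 'value (pct_change %)' table entry to recover the anchor."""
    return value / (1 + pct_change / 100)


def share_of_anchor(value, anchor):
    """Complexity expressed as a fraction of the anchor's complexity."""
    return value / anchor


enc_anchor = implied_anchor(ENC_KMACS, ENC_PCT)
dec_anchor = implied_anchor(DEC_KMACS, DEC_PCT)
print(f"anchor: {enc_anchor:.0f} (enc), {dec_anchor:.0f} (dec) kMACs/pixel")
print(f"retained: {share_of_anchor(ENC_KMACS, enc_anchor):.0%} (enc), "
      f"{share_of_anchor(DEC_KMACS, dec_anchor):.0%} (dec)")
```

<p class="ltx_p">This prints an implied anchor of roughly 1153 and 762 kMACs/pixel and retained fractions of 84% and 78%, the figures stated in Section IV.</p> </div> <div class="ltx_para"> <p class="ltx_p">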
In contrast, conditional coding with <math alttext="C=32" class="ltx_Math" display="inline" id="S3.SS4.p1.4.m4.1"><semantics id="S3.SS4.p1.4.m4.1a"><mrow id="S3.SS4.p1.4.m4.1.1" xref="S3.SS4.p1.4.m4.1.1.cmml"><mi id="S3.SS4.p1.4.m4.1.1.2" xref="S3.SS4.p1.4.m4.1.1.2.cmml">C</mi><mo id="S3.SS4.p1.4.m4.1.1.1" xref="S3.SS4.p1.4.m4.1.1.1.cmml">=</mo><mn id="S3.SS4.p1.4.m4.1.1.3" xref="S3.SS4.p1.4.m4.1.1.3.cmml">32</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS4.p1.4.m4.1b"><apply id="S3.SS4.p1.4.m4.1.1.cmml" xref="S3.SS4.p1.4.m4.1.1"><eq id="S3.SS4.p1.4.m4.1.1.1.cmml" xref="S3.SS4.p1.4.m4.1.1.1"></eq><ci id="S3.SS4.p1.4.m4.1.1.2.cmml" xref="S3.SS4.p1.4.m4.1.1.2">𝐶</ci><cn id="S3.SS4.p1.4.m4.1.1.3.cmml" type="integer" xref="S3.SS4.p1.4.m4.1.1.3">32</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS4.p1.4.m4.1c">C=32</annotation><annotation encoding="application/x-llamapun" id="S3.SS4.p1.4.m4.1d">italic_C = 32</annotation></semantics></math>, which operates at a similar complexity level (about 16% and 22% fewer encoding and decoding kMACs/pixel), incurs a BD-rate increase of about 9%. The channel size <math alttext="C" class="ltx_Math" display="inline" id="S3.SS4.p1.5.m5.1"><semantics id="S3.SS4.p1.5.m5.1a"><mi id="S3.SS4.p1.5.m5.1.1" xref="S3.SS4.p1.5.m5.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S3.SS4.p1.5.m5.1b"><ci id="S3.SS4.p1.5.m5.1.1.cmml" xref="S3.SS4.p1.5.m5.1.1">𝐶</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS4.p1.5.m5.1c">C</annotation><annotation encoding="application/x-llamapun" id="S3.SS4.p1.5.m5.1d">italic_C</annotation></semantics></math> also affects the memory footprint of these coding schemes. 
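</p> </div> <div class="ltx_para"> <p class="ltx_p">Why a smaller <em>C</em> lowers both computation and memory can be seen from a back-of-the-envelope count, which is not a profile of the codecs studied here. A stride-1 <em>k</em> × <em>k</em> convolution that maps <em>C</em> input channels to <em>C</em> output channels costs <em>k</em>²<em>C</em>² multiply-accumulates (MACs) per output pixel, whereas one <em>C</em> × <em>H</em> × <em>W</em> float32 feature map occupies 4<em>CHW</em> bytes. The 3 × 3 kernel, the square C-to-C layer, and the 1920 × 1080 feature resolution below are hypothetical choices.</p>

```python
def conv_kmacs_per_pixel(c_in, c_out, k=3):
    """kMACs per output pixel of a stride-1 k x k convolution (bias ignored)."""
    return k * k * c_in * c_out / 1000


def activation_mib(channels, height, width, bytes_per_value=4):
    """Size of one C x H x W feature map in MiB (float32 by default)."""
    return channels * height * width * bytes_per_value / 2**20


# Hypothetical C-to-C layer on a 1920 x 1080 feature map.
for c in (64, 32, 16):
    kmacs = conv_kmacs_per_pixel(c, c)
    mib = activation_mib(c, 1080, 1920)
    print(f"C={c:2d}: {kmacs:7.3f} kMACs/pixel, {mib:8.3f} MiB per feature map")
```

<p class="ltx_p">Halving <em>C</em> divides such a layer's MACs by four but its activation memory only by two. Since only the parts of the network that process x̃<sub>t−1</sub>, <em>x</em><sub>c</sub>, and ẋ<sub>c</sub> depend on <em>C</em>, this is one plausible reason why the end-to-end reductions in Table III are much smaller than these per-layer factors.</p> </div> <div class="ltx_para"> <p class="ltx_p">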
For similar coding performance, both conditional residual coding and masked conditional residual coding require fewer channels than conditional coding.</p> </div> </section> </section> <section class="ltx_section" id="S4"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">IV </span><span class="ltx_text ltx_font_smallcaps" id="S4.1.1">Conclusion</span> </h2> <div class="ltx_para" id="S4.p1"> <p class="ltx_p" id="S4.p1.1">This work explores the rate-distortion-complexity trade-offs of neural video codecs that adopt conditional coding, conditional residual coding, and masked conditional residual coding. Our major findings are that (1) conditional residual coding and masked conditional residual coding effectively mitigate the information bottleneck issue of conditional coding, and (2) at similar or even better coding performance, conditional residual coding and masked conditional residual coding require only about 84% and 78% of the encoding and decoding kMACs/pixel of conditional coding, respectively. These findings point toward more efficient neural video codecs that improve coding performance while reducing computational complexity.</p> </div> </section> <section class="ltx_bibliography" id="bib"> <h2 class="ltx_title ltx_title_bibliography">References</h2> <ul class="ltx_biblist"> <li class="ltx_bibitem" id="bib.bib1"> <span class="ltx_tag ltx_tag_bibitem">[1]</span> <span class="ltx_bibblock"> J. Li, B. Li, and Y. Lu, “Deep contextual video compression,” in <em class="ltx_emph ltx_font_italic" id="bib.bib1.1.1">Advances in Neural Information Processing Systems</em>, 2021, pp. 18 114–18 125. </span> </li> <li class="ltx_bibitem" id="bib.bib2"> <span class="ltx_tag ltx_tag_bibitem">[2]</span> <span class="ltx_bibblock"> Y.-H. Ho, C.-P. Chang, P.-Y. Chen, A. Gnutti, and W.-H. 
Peng, “CANF-VC: Conditional augmented normalizing flows for video compression,” in <em class="ltx_emph ltx_font_italic" id="bib.bib2.1.1">European Conference on Computer Vision</em>, 2022, pp. 207–223. </span> </li> <li class="ltx_bibitem" id="bib.bib3"> <span class="ltx_tag ltx_tag_bibitem">[3]</span> <span class="ltx_bibblock"> X. Sheng, J. Li, B. Li, L. Li, D. Liu, and Y. Lu, “Temporal context mining for learned video compression,” <em class="ltx_emph ltx_font_italic" id="bib.bib3.1.1">IEEE Transactions on Multimedia</em>, vol. 25, pp. 7311–7322, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib4"> <span class="ltx_tag ltx_tag_bibitem">[4]</span> <span class="ltx_bibblock"> J. Li, B. Li, and Y. Lu, “Hybrid spatial-temporal entropy modelling for neural video compression,” in <em class="ltx_emph ltx_font_italic" id="bib.bib4.1.1">Proceedings of the 30th ACM International Conference on Multimedia</em>, 2022, pp. 1503–1511. </span> </li> <li class="ltx_bibitem" id="bib.bib5"> <span class="ltx_tag ltx_tag_bibitem">[5]</span> <span class="ltx_bibblock"> Y.-H. Chen, H.-S. Xie, C.-W. Chen, Z.-L. Gao, M. Benjak, W.-H. Peng, and J. Ostermann, “MaskCRT: Masked conditional residual transformer for learned video compression,” <em class="ltx_emph ltx_font_italic" id="bib.bib5.1.1">IEEE Transactions on Circuits and Systems for Video Technology</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib6"> <span class="ltx_tag ltx_tag_bibitem">[6]</span> <span class="ltx_bibblock"> J. Li, B. Li, and Y. Lu, “Neural video compression with diverse contexts,” in <em class="ltx_emph ltx_font_italic" id="bib.bib6.1.1">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</em>, 2023, pp. 22 616–22 626. 
</span> </li> <li class="ltx_bibitem" id="bib.bib7"> <span class="ltx_tag ltx_tag_bibitem">[7]</span> <span class="ltx_bibblock"> ——, “Neural video compression with feature modulation,” in <em class="ltx_emph ltx_font_italic" id="bib.bib7.1.1">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</em>, June 2024, pp. 26 099–26 108. </span> </li> <li class="ltx_bibitem" id="bib.bib8"> <span class="ltx_tag ltx_tag_bibitem">[8]</span> <span class="ltx_bibblock"> B. Bross, Y.-K. Wang, Y. Ye, S. Liu, J. Chen, G. J. Sullivan, and J.-R. Ohm, “Overview of the versatile video coding (VVC) standard and its applications,” <em class="ltx_emph ltx_font_italic" id="bib.bib8.1.1">IEEE Transactions on Circuits and Systems for Video Technology</em>, vol. 31, no. 10, pp. 3736–3764, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib9"> <span class="ltx_tag ltx_tag_bibitem">[9]</span> <span class="ltx_bibblock"> “ECM,” https://vcgit.hhi.fraunhofer.de/ecm/ECM. </span> </li> <li class="ltx_bibitem" id="bib.bib10"> <span class="ltx_tag ltx_tag_bibitem">[10]</span> <span class="ltx_bibblock"> G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” <em class="ltx_emph ltx_font_italic" id="bib.bib10.1.1">IEEE Transactions on Circuits and Systems for Video Technology</em>, vol. 22, no. 12, pp. 1649–1668, 2012. </span> </li> <li class="ltx_bibitem" id="bib.bib11"> <span class="ltx_tag ltx_tag_bibitem">[11]</span> <span class="ltx_bibblock"> “VTM-17.0,” https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM, accessed: 2023-10-30. </span> </li> <li class="ltx_bibitem" id="bib.bib12"> <span class="ltx_tag ltx_tag_bibitem">[12]</span> <span class="ltx_bibblock"> T. Ladune, P. Philippe, W. Hamidouche, L. Zhang, and O. 
Déforges, “Optical flow and mode selection for learning-based video coding,” in <em class="ltx_emph ltx_font_italic" id="bib.bib12.1.1">IEEE 22nd International Workshop on Multimedia Signal Processing</em>, 2020, pp. 1–6. </span> </li> <li class="ltx_bibitem" id="bib.bib13"> <span class="ltx_tag ltx_tag_bibitem">[13]</span> <span class="ltx_bibblock"> F. Brand, J. Seiler, and A. Kaup, “On benefits and challenges of conditional interframe video coding in light of information theory,” in <em class="ltx_emph ltx_font_italic" id="bib.bib13.1.1">IEEE Picture Coding Symposium</em>, 2022, pp. 289–293. </span> </li> <li class="ltx_bibitem" id="bib.bib14"> <span class="ltx_tag ltx_tag_bibitem">[14]</span> <span class="ltx_bibblock"> ——, “Conditional residual coding: A remedy for bottleneck problems in conditional inter frame coding,” <em class="ltx_emph ltx_font_italic" id="bib.bib14.1.1">IEEE Transactions on Circuits and Systems for Video Technology</em>, vol. 34, no. 7, pp. 6445–6459, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib15"> <span class="ltx_tag ltx_tag_bibitem">[15]</span> <span class="ltx_bibblock"> G. Lu, C. Cai, X. Zhang, L. Chen, W. Ouyang, D. Xu, and Z. Gao, “Content adaptive and error propagation aware deep video compression,” in <em class="ltx_emph ltx_font_italic" id="bib.bib15.1.1">European Conference on Computer Vision</em>, 2020, pp. 456–472. </span> </li> <li class="ltx_bibitem" id="bib.bib16"> <span class="ltx_tag ltx_tag_bibitem">[16]</span> <span class="ltx_bibblock"> T. Xue, B. Chen, J. Wu, D. Wei, and W. T. Freeman, “Video enhancement with task-oriented flow,” <em class="ltx_emph ltx_font_italic" id="bib.bib16.1.1">International Journal of Computer Vision</em>, vol. 127, no. 8, pp. 1106–1125, 2019. </span> </li> <li class="ltx_bibitem" id="bib.bib17"> <span class="ltx_tag ltx_tag_bibitem">[17]</span> <span class="ltx_bibblock"> https://github.com/microsoft/DCVC/tree/main/DCVC. 
</span> </li> <li class="ltx_bibitem" id="bib.bib18"> <span class="ltx_tag ltx_tag_bibitem">[18]</span> <span class="ltx_bibblock"> A. Mercat, M. Viitanen, and J. Vanne, “UVG dataset: 50/120fps 4K sequences for video codec analysis and development,” in <em class="ltx_emph ltx_font_italic" id="bib.bib18.1.1">Proceedings of the ACM Multimedia Systems Conference</em>, 2020, pp. 297–302. </span> </li> <li class="ltx_bibitem" id="bib.bib19"> <span class="ltx_tag ltx_tag_bibitem">[19]</span> <span class="ltx_bibblock"> F. Bossen <em class="ltx_emph ltx_font_italic" id="bib.bib19.1.1">et al.</em>, “Common test conditions and software reference configurations,” <em class="ltx_emph ltx_font_italic" id="bib.bib19.2.2">JCTVC-L1100</em>, vol. 12, no. 7, 2013. </span> </li> <li class="ltx_bibitem" id="bib.bib20"> <span class="ltx_tag ltx_tag_bibitem">[20]</span> <span class="ltx_bibblock"> D. Flynn <em class="ltx_emph ltx_font_italic" id="bib.bib20.1.1">et al.</em>, “Common test conditions and software reference configurations for HEVC range extensions,” <em class="ltx_emph ltx_font_italic" id="bib.bib20.2.2">JCTVC-N1006</em>, 2013. </span> </li> <li class="ltx_bibitem" id="bib.bib21"> <span class="ltx_tag ltx_tag_bibitem">[21]</span> <span class="ltx_bibblock"> H. Wang, W. Gan, S. Hu, J. Y. Lin, L. Jin, L. Song, P. Wang, I. Katsavounidis, A. Aaron, and C.-C. J. Kuo, “MCL-JCV: A JND-based H.264/AVC video quality assessment dataset,” in <em class="ltx_emph ltx_font_italic" id="bib.bib21.1.1">IEEE International Conference on Image Processing</em>, 2016, pp. 1509–1513. </span> </li> <li class="ltx_bibitem" id="bib.bib22"> <span class="ltx_tag ltx_tag_bibitem">[22]</span> <span class="ltx_bibblock"> “FFmpeg,” https://www.ffmpeg.org/, accessed: 2022-05-18. 
</span> </li> <li class="ltx_bibitem" id="bib.bib23"> <span class="ltx_tag ltx_tag_bibitem">[23]</span> <span class="ltx_bibblock"> “HSTP-VID-WPOM: Working practices using objective metrics for evaluation of video coding efficiency experiments,” ITU-T Technical Paper, 2020. </span> </li> </ul> </section> </article> </div> <footer class="ltx_page_footer"> <div class="ltx_page_logo">Generated on Fri Oct 4 19:55:36 2024 by <a class="ltx_LaTeXML_logo" href="http://dlmf.nist.gov/LaTeXML/"><span style="letter-spacing:-0.2em; margin-right:0.1em;">L<span class="ltx_font_smallcaps" style="position:relative; bottom:2.2pt;">a</span>T<span class="ltx_font_smallcaps" style="font-size:120%;position:relative; bottom:-0.2ex;">e</span></span><span style="font-size:90%; position:relative; bottom:-0.2ex;">XML</span></a> </div></footer> </div> </body> </html>
