<!DOCTYPE html> <html lang="en"> <head> <meta content="text/html; charset=utf-8" http-equiv="content-type"/> <title>LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression</title> <!--Generated on Mon Mar 11 12:41:10 2024 by LaTeXML (version 0.8.7) http://dlmf.nist.gov/LaTeXML/.--> <meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/> <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv_0.7.4.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/latexml_styles.css" rel="stylesheet" type="text/css"/> <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.3.3/html2canvas.min.js"></script> <script src="/static/browse/0.3.4/js/addons.js"></script> <script src="/static/browse/0.3.4/js/feedbackOverlay.js"></script> <base href="/html/2402.00680v3/"/></head> <body> <nav class="ltx_page_navbar"> <nav class="ltx_TOC"> <ol class="ltx_toclist"> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S1" title="1 Introduction ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">1 </span>Introduction</span></a></li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S2" title="2 The Proposed LVC-LGMC Method ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">2 </span>The Proposed LVC-LGMC Method</span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" 
href="https://arxiv.org/html/2402.00680v3#S2.SS1" title="2.1 Flow-based Local Compensation ‣ 2 The Proposed LVC-LGMC Method ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">2.1 </span>Flow-based Local Compensation</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S2.SS2" title="2.2 Attention-based Global Compensation ‣ 2 The Proposed LVC-LGMC Method ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">2.2 </span>Attention-based Global Compensation</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S2.SS3" title="2.3 Joint Local and Global Motion Compensation for Learned Video Compression ‣ 2 The Proposed LVC-LGMC Method ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">2.3 </span>Joint Local and Global Motion Compensation for Learned Video Compression</span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S3" title="3 Experiments ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3 </span>Experiments</span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S3.SS1" title="3.1 Experimental Setup ‣ 3 Experiments ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.1 </span>Experimental Setup</span></a></li> <li 
class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S3.SS2" title="3.2 Rate Distortion Performance ‣ 3 Experiments ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.2 </span>Rate Distortion Performance</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"> <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S3.SS3" title="3.3 Analyses and Discussions ‣ 3 Experiments ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.3 </span>Analyses and Discussions</span></a> <ol class="ltx_toclist ltx_toclist_subsection"> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S3.SS3.SSS1" title="3.3.1 Bit Allocation ‣ 3.3 Analyses and Discussions ‣ 3 Experiments ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.3.1 </span>Bit Allocation</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S3.SS3.SSS2" title="3.3.2 Ablation Studies ‣ 3.3 Analyses and Discussions ‣ 3 Experiments ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.3.2 </span>Ablation Studies</span></a></li> </ol> </li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S4" title="4 Conclusion ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4 </span>Conclusion</span></a></li> </ol></nav> </nav> <div 
class="ltx_page_main"> <div class="ltx_page_content"><div class="section" id="target-section"><div id="license-tr">License: arXiv.org perpetual non-exclusive license</div><div id="watermark-tr">arXiv:2402.00680v3 [eess.IV] 11 Mar 2024</div></div> <article class="ltx_document ltx_authors_1line"> <h1 class="ltx_title ltx_title_document">LVC-LGMC: Joint Local and Global Motion Compensation <br class="ltx_break"/>for Learned Video Compression</h1> <div class="ltx_abstract"> <h6 class="ltx_title ltx_title_abstract">Abstract</h6> <p class="ltx_p" id="id1.id1">Existing learned video compression models employ a flow net or deformable convolutional networks (DCN) to estimate motion information. However, the limited receptive fields of the flow net and DCN inherently restrict their attention to <span class="ltx_text ltx_font_italic" id="id1.id1.1">local</span> contexts. <span class="ltx_text ltx_font_italic" id="id1.id1.2">Global</span> contexts, such as large-scale motions and global correlations among frames, are ignored, presenting a significant bottleneck for capturing accurate motions. To address this issue, we propose a joint <span class="ltx_text ltx_font_italic" id="id1.id1.3">local</span> and <span class="ltx_text ltx_font_italic" id="id1.id1.4">global</span> motion compensation module (LGMC) for learned video coding. More specifically, we adopt a flow net for <span class="ltx_text ltx_font_italic" id="id1.id1.5">local</span> motion compensation. To capture <span class="ltx_text ltx_font_italic" id="id1.id1.6">global</span> context, we employ cross attention in the feature domain for motion compensation. In addition, to avoid the quadratic complexity of vanilla cross attention, we divide the softmax operation in attention into two independent softmax operations, leading to linear complexity. 
To validate the effectiveness of our proposed LGMC, we integrate it with DCVC-TCM and obtain learned video compression with joint <span class="ltx_text ltx_font_italic" id="id1.id1.7">local</span> and <span class="ltx_text ltx_font_italic" id="id1.id1.8">global</span> motion compensation (LVC-LGMC). Extensive experiments demonstrate that our LVC-LGMC achieves significant rate-distortion performance improvements over the baseline DCVC-TCM.</p> </div> <div class="ltx_para" id="p1"> <p class="ltx_p" id="p1.1"><span class="ltx_text ltx_font_bold ltx_font_italic" id="p1.1.1">Index Terms<span class="ltx_text ltx_font_upright" id="p1.1.1.1">— </span></span> Motion estimation, neural video compression</p> </div> <section class="ltx_section" id="S1"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">1 </span>Introduction</h2> <div class="ltx_para" id="S1.p1"> <p class="ltx_p" id="S1.p1.1">The rapid development of social media and video applications has sharply increased the volume of video data, bringing challenges to video compression <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib1" title="">1</a>, <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib2" title="">2</a>]</cite>. In recent years, learned video compression has attracted considerable attention. 
Most learned video compression models <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib1" title="">1</a>, <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib2" title="">2</a>, <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib3" title="">3</a>, <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib4" title="">4</a>]</cite> are based on the predictive coding paradigm, which employs a flow net or Deformable Convolutional Networks (DCN) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib5" title="">5</a>]</cite> to predict the motion information between the reference frame and the current frame. Subsequently, a motion codec is employed to compress the motion information. Meanwhile, a residual codec <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib1" title="">1</a>]</cite> or a contextual codec <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib2" title="">2</a>]</cite> is used to compress the residuals or contexts. To enhance the compression performance, advanced entropy models <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib6" title="">6</a>]</cite> and flow coding techniques <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib7" title="">7</a>]</cite> have been investigated in recent years. Some models <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib8" title="">8</a>]</cite> even outperform Versatile Video Coding (VVC) under the Low Delay configuration. 
</p> </div> <div class="ltx_para" id="S1.p2"> <p class="ltx_p" id="S1.p2.1">Traditional video coding is block-based and can search over a predefined region for motion estimation. Flow-based or DCN-based motion estimation can only handle small motions due to the limited receptive field of Convolutional Neural Networks (CNNs), which can only capture <span class="ltx_text ltx_font_italic" id="S1.p2.1.1">local</span> redundancy. <span class="ltx_text ltx_font_italic" id="S1.p2.1.2">Global</span> redundancies exist even in scenarios with small motions, but they have not been sufficiently explored in learned video compression. </p> </div> <div class="ltx_para" id="S1.p3"> <p class="ltx_p" id="S1.p3.2">To address the above issues, we propose a novel mixed flow-attention based motion estimation and motion compensation module in feature space for capturing joint <span class="ltx_text ltx_font_italic" id="S1.p3.2.1">local</span> and <span class="ltx_text ltx_font_italic" id="S1.p3.2.2">global</span> redundancy. Our module is built upon the conditional coding paradigm <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib2" title="">2</a>, <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib9" title="">9</a>]</cite>. We integrate the proposed joint <span class="ltx_text ltx_font_italic" id="S1.p3.2.3">local</span> and <span class="ltx_text ltx_font_italic" id="S1.p3.2.4">global</span> module into DCVC-TCM to obtain the learned video compression model LVC-LGMC. In LVC-LGMC, multi-scale motion compensation is adopted. 
More specifically, when compressing the current P-frame <math alttext="\boldsymbol{x}_{t}" class="ltx_Math" display="inline" id="S1.p3.1.m1.1"><semantics id="S1.p3.1.m1.1a"><msub id="S1.p3.1.m1.1.1" xref="S1.p3.1.m1.1.1.cmml"><mi id="S1.p3.1.m1.1.1.2" xref="S1.p3.1.m1.1.1.2.cmml">𝒙</mi><mi id="S1.p3.1.m1.1.1.3" xref="S1.p3.1.m1.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S1.p3.1.m1.1b"><apply id="S1.p3.1.m1.1.1.cmml" xref="S1.p3.1.m1.1.1"><csymbol cd="ambiguous" id="S1.p3.1.m1.1.1.1.cmml" xref="S1.p3.1.m1.1.1">subscript</csymbol><ci id="S1.p3.1.m1.1.1.2.cmml" xref="S1.p3.1.m1.1.1.2">𝒙</ci><ci id="S1.p3.1.m1.1.1.3.cmml" xref="S1.p3.1.m1.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S1.p3.1.m1.1c">\boldsymbol{x}_{t}</annotation><annotation encoding="application/x-llamapun" id="S1.p3.1.m1.1d">bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math>, we use a flow net to estimate the motion offset. By warping the multi-scale features using the decoded estimated offset, the local redundancies between frames can be well captured. To achieve global motion estimation, we propose to employ cross attention between the propagated features and the middle features of the current frame for long-range modeling. The attention map contains the global similarity between frames without consuming additional bits for representation. However, the quadratic complexity of vanilla attention impedes the compression of high-resolution videos. To address this issue, we propose to divide the softmax in vanilla attention into two softmax operations <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib10" title="">10</a>]</cite>. The dot product of the features after the two independent softmax operations is treated as the similarity metric. 
This division of the softmax operation makes the complexity increase linearly with the resolution, making the operation more efficient. When compared with the baseline DCVC-TCM <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib9" title="">9</a>]</cite>, the proposed LVC-LGMC reduces the bit rate by <math alttext="10" class="ltx_Math" display="inline" id="S1.p3.2.m2.1"><semantics id="S1.p3.2.m2.1a"><mn id="S1.p3.2.m2.1.1" xref="S1.p3.2.m2.1.1.cmml">10</mn><annotation-xml encoding="MathML-Content" id="S1.p3.2.m2.1b"><cn id="S1.p3.2.m2.1.1.cmml" type="integer" xref="S1.p3.2.m2.1.1">10</cn></annotation-xml><annotation encoding="application/x-tex" id="S1.p3.2.m2.1c">10</annotation><annotation encoding="application/x-llamapun" id="S1.p3.2.m2.1d">10</annotation></semantics></math>% on the MCL-JCV test sequences <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib11" title="">11</a>]</cite>.</p> </div> <div class="ltx_para" id="S1.p4"> <p class="ltx_p" id="S1.p4.1">The contributions of this paper are summarized as follows: </p> <ul class="ltx_itemize" id="S1.I1"> <li class="ltx_item" id="S1.I1.i1" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S1.I1.i1.p1"> <p class="ltx_p" id="S1.I1.i1.p1.1">We propose a novel attention-based motion compensation module that handles large-scale movements and captures global redundancy between frames with linear complexity and without extra bits. To our knowledge, this is the first attempt to use cross attention for motion compensation.</p> </div> </li> <li class="ltx_item" id="S1.I1.i2" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S1.I1.i2.p1"> <p class="ltx_p" id="S1.I1.i2.p1.1">We combine the proposed attention-based motion compensation with flow-based motion compensation for joint local and global motion compensation. 
The proposed method significantly boosts the model performance.</p> </div> </li> </ul> </div> </section> <section class="ltx_section" id="S2"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">2 </span>The Proposed LVC-LGMC Method</h2> <div class="ltx_para" id="S2.p1"> <p class="ltx_p" id="S2.p1.1">The framework of the proposed LVC-LGMC is illustrated in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S2.F1" title="Figure 1 ‣ 2.2 Attention-based Global Compensation ‣ 2 The Proposed LVC-LGMC Method ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_tag">1</span></a>. We reproduce DCVC-TCM <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib9" title="">9</a>]</cite> and build LVC-LGMC on top of it. Following DCVC-TCM, we adopt temporally propagated multi-scale features for local compensation.</p> </div> <section class="ltx_subsection" id="S2.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">2.1 </span>Flow-based Local Compensation</h3> <div class="ltx_para" id="S2.SS1.p1"> <p class="ltx_p" id="S2.SS1.p1.10">When compressing the <math alttext="t" class="ltx_Math" display="inline" id="S2.SS1.p1.1.m1.1"><semantics id="S2.SS1.p1.1.m1.1a"><mi id="S2.SS1.p1.1.m1.1.1" xref="S2.SS1.p1.1.m1.1.1.cmml">t</mi><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.1.m1.1b"><ci id="S2.SS1.p1.1.m1.1.1.cmml" xref="S2.SS1.p1.1.m1.1.1">𝑡</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.1.m1.1c">t</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.1.m1.1d">italic_t</annotation></semantics></math>-th frame <math alttext="\boldsymbol{x}_{t}" class="ltx_Math" display="inline" id="S2.SS1.p1.2.m2.1"><semantics id="S2.SS1.p1.2.m2.1a"><msub id="S2.SS1.p1.2.m2.1.1" xref="S2.SS1.p1.2.m2.1.1.cmml"><mi id="S2.SS1.p1.2.m2.1.1.2" 
xref="S2.SS1.p1.2.m2.1.1.2.cmml">𝒙</mi><mi id="S2.SS1.p1.2.m2.1.1.3" xref="S2.SS1.p1.2.m2.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.2.m2.1b"><apply id="S2.SS1.p1.2.m2.1.1.cmml" xref="S2.SS1.p1.2.m2.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.2.m2.1.1.1.cmml" xref="S2.SS1.p1.2.m2.1.1">subscript</csymbol><ci id="S2.SS1.p1.2.m2.1.1.2.cmml" xref="S2.SS1.p1.2.m2.1.1.2">𝒙</ci><ci id="S2.SS1.p1.2.m2.1.1.3.cmml" xref="S2.SS1.p1.2.m2.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.2.m2.1c">\boldsymbol{x}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.2.m2.1d">bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math>, first, we conduct the local compensation <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib9" title="">9</a>]</cite>. In particular, multi-scale features <math alttext="\hat{\boldsymbol{f}}_{t}^{0},\hat{\boldsymbol{f}}_{t}^{1},\hat{\boldsymbol{f}}% _{t}^{2}" class="ltx_Math" display="inline" id="S2.SS1.p1.3.m3.3"><semantics id="S2.SS1.p1.3.m3.3a"><mrow id="S2.SS1.p1.3.m3.3.3.3" xref="S2.SS1.p1.3.m3.3.3.4.cmml"><msubsup id="S2.SS1.p1.3.m3.1.1.1.1" xref="S2.SS1.p1.3.m3.1.1.1.1.cmml"><mover accent="true" id="S2.SS1.p1.3.m3.1.1.1.1.2.2" xref="S2.SS1.p1.3.m3.1.1.1.1.2.2.cmml"><mi id="S2.SS1.p1.3.m3.1.1.1.1.2.2.2" xref="S2.SS1.p1.3.m3.1.1.1.1.2.2.2.cmml">𝒇</mi><mo id="S2.SS1.p1.3.m3.1.1.1.1.2.2.1" xref="S2.SS1.p1.3.m3.1.1.1.1.2.2.1.cmml">^</mo></mover><mi id="S2.SS1.p1.3.m3.1.1.1.1.2.3" xref="S2.SS1.p1.3.m3.1.1.1.1.2.3.cmml">t</mi><mn id="S2.SS1.p1.3.m3.1.1.1.1.3" xref="S2.SS1.p1.3.m3.1.1.1.1.3.cmml">0</mn></msubsup><mo id="S2.SS1.p1.3.m3.3.3.3.4" xref="S2.SS1.p1.3.m3.3.3.4.cmml">,</mo><msubsup id="S2.SS1.p1.3.m3.2.2.2.2" xref="S2.SS1.p1.3.m3.2.2.2.2.cmml"><mover accent="true" id="S2.SS1.p1.3.m3.2.2.2.2.2.2" xref="S2.SS1.p1.3.m3.2.2.2.2.2.2.cmml"><mi 
id="S2.SS1.p1.3.m3.2.2.2.2.2.2.2" xref="S2.SS1.p1.3.m3.2.2.2.2.2.2.2.cmml">𝒇</mi><mo id="S2.SS1.p1.3.m3.2.2.2.2.2.2.1" xref="S2.SS1.p1.3.m3.2.2.2.2.2.2.1.cmml">^</mo></mover><mi id="S2.SS1.p1.3.m3.2.2.2.2.2.3" xref="S2.SS1.p1.3.m3.2.2.2.2.2.3.cmml">t</mi><mn id="S2.SS1.p1.3.m3.2.2.2.2.3" xref="S2.SS1.p1.3.m3.2.2.2.2.3.cmml">1</mn></msubsup><mo id="S2.SS1.p1.3.m3.3.3.3.5" xref="S2.SS1.p1.3.m3.3.3.4.cmml">,</mo><msubsup id="S2.SS1.p1.3.m3.3.3.3.3" xref="S2.SS1.p1.3.m3.3.3.3.3.cmml"><mover accent="true" id="S2.SS1.p1.3.m3.3.3.3.3.2.2" xref="S2.SS1.p1.3.m3.3.3.3.3.2.2.cmml"><mi id="S2.SS1.p1.3.m3.3.3.3.3.2.2.2" xref="S2.SS1.p1.3.m3.3.3.3.3.2.2.2.cmml">𝒇</mi><mo id="S2.SS1.p1.3.m3.3.3.3.3.2.2.1" xref="S2.SS1.p1.3.m3.3.3.3.3.2.2.1.cmml">^</mo></mover><mi id="S2.SS1.p1.3.m3.3.3.3.3.2.3" xref="S2.SS1.p1.3.m3.3.3.3.3.2.3.cmml">t</mi><mn id="S2.SS1.p1.3.m3.3.3.3.3.3" xref="S2.SS1.p1.3.m3.3.3.3.3.3.cmml">2</mn></msubsup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.3.m3.3b"><list id="S2.SS1.p1.3.m3.3.3.4.cmml" xref="S2.SS1.p1.3.m3.3.3.3"><apply id="S2.SS1.p1.3.m3.1.1.1.1.cmml" xref="S2.SS1.p1.3.m3.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.3.m3.1.1.1.1.1.cmml" xref="S2.SS1.p1.3.m3.1.1.1.1">superscript</csymbol><apply id="S2.SS1.p1.3.m3.1.1.1.1.2.cmml" xref="S2.SS1.p1.3.m3.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.3.m3.1.1.1.1.2.1.cmml" xref="S2.SS1.p1.3.m3.1.1.1.1">subscript</csymbol><apply id="S2.SS1.p1.3.m3.1.1.1.1.2.2.cmml" xref="S2.SS1.p1.3.m3.1.1.1.1.2.2"><ci id="S2.SS1.p1.3.m3.1.1.1.1.2.2.1.cmml" xref="S2.SS1.p1.3.m3.1.1.1.1.2.2.1">^</ci><ci id="S2.SS1.p1.3.m3.1.1.1.1.2.2.2.cmml" xref="S2.SS1.p1.3.m3.1.1.1.1.2.2.2">𝒇</ci></apply><ci id="S2.SS1.p1.3.m3.1.1.1.1.2.3.cmml" xref="S2.SS1.p1.3.m3.1.1.1.1.2.3">𝑡</ci></apply><cn id="S2.SS1.p1.3.m3.1.1.1.1.3.cmml" type="integer" xref="S2.SS1.p1.3.m3.1.1.1.1.3">0</cn></apply><apply id="S2.SS1.p1.3.m3.2.2.2.2.cmml" xref="S2.SS1.p1.3.m3.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS1.p1.3.m3.2.2.2.2.1.cmml" 
xref="S2.SS1.p1.3.m3.2.2.2.2">superscript</csymbol><apply id="S2.SS1.p1.3.m3.2.2.2.2.2.cmml" xref="S2.SS1.p1.3.m3.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS1.p1.3.m3.2.2.2.2.2.1.cmml" xref="S2.SS1.p1.3.m3.2.2.2.2">subscript</csymbol><apply id="S2.SS1.p1.3.m3.2.2.2.2.2.2.cmml" xref="S2.SS1.p1.3.m3.2.2.2.2.2.2"><ci id="S2.SS1.p1.3.m3.2.2.2.2.2.2.1.cmml" xref="S2.SS1.p1.3.m3.2.2.2.2.2.2.1">^</ci><ci id="S2.SS1.p1.3.m3.2.2.2.2.2.2.2.cmml" xref="S2.SS1.p1.3.m3.2.2.2.2.2.2.2">𝒇</ci></apply><ci id="S2.SS1.p1.3.m3.2.2.2.2.2.3.cmml" xref="S2.SS1.p1.3.m3.2.2.2.2.2.3">𝑡</ci></apply><cn id="S2.SS1.p1.3.m3.2.2.2.2.3.cmml" type="integer" xref="S2.SS1.p1.3.m3.2.2.2.2.3">1</cn></apply><apply id="S2.SS1.p1.3.m3.3.3.3.3.cmml" xref="S2.SS1.p1.3.m3.3.3.3.3"><csymbol cd="ambiguous" id="S2.SS1.p1.3.m3.3.3.3.3.1.cmml" xref="S2.SS1.p1.3.m3.3.3.3.3">superscript</csymbol><apply id="S2.SS1.p1.3.m3.3.3.3.3.2.cmml" xref="S2.SS1.p1.3.m3.3.3.3.3"><csymbol cd="ambiguous" id="S2.SS1.p1.3.m3.3.3.3.3.2.1.cmml" xref="S2.SS1.p1.3.m3.3.3.3.3">subscript</csymbol><apply id="S2.SS1.p1.3.m3.3.3.3.3.2.2.cmml" xref="S2.SS1.p1.3.m3.3.3.3.3.2.2"><ci id="S2.SS1.p1.3.m3.3.3.3.3.2.2.1.cmml" xref="S2.SS1.p1.3.m3.3.3.3.3.2.2.1">^</ci><ci id="S2.SS1.p1.3.m3.3.3.3.3.2.2.2.cmml" xref="S2.SS1.p1.3.m3.3.3.3.3.2.2.2">𝒇</ci></apply><ci id="S2.SS1.p1.3.m3.3.3.3.3.2.3.cmml" xref="S2.SS1.p1.3.m3.3.3.3.3.2.3">𝑡</ci></apply><cn id="S2.SS1.p1.3.m3.3.3.3.3.3.cmml" type="integer" xref="S2.SS1.p1.3.m3.3.3.3.3.3">2</cn></apply></list></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.3.m3.3c">\hat{\boldsymbol{f}}_{t}^{0},\hat{\boldsymbol{f}}_{t}^{1},\hat{\boldsymbol{f}}% _{t}^{2}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.3.m3.3d">over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 
end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT</annotation></semantics></math> are extracted from propagated feature <math alttext="\hat{\boldsymbol{F}}_{t-1}" class="ltx_Math" display="inline" id="S2.SS1.p1.4.m4.1"><semantics id="S2.SS1.p1.4.m4.1a"><msub id="S2.SS1.p1.4.m4.1.1" xref="S2.SS1.p1.4.m4.1.1.cmml"><mover accent="true" id="S2.SS1.p1.4.m4.1.1.2" xref="S2.SS1.p1.4.m4.1.1.2.cmml"><mi id="S2.SS1.p1.4.m4.1.1.2.2" xref="S2.SS1.p1.4.m4.1.1.2.2.cmml">𝑭</mi><mo id="S2.SS1.p1.4.m4.1.1.2.1" xref="S2.SS1.p1.4.m4.1.1.2.1.cmml">^</mo></mover><mrow id="S2.SS1.p1.4.m4.1.1.3" xref="S2.SS1.p1.4.m4.1.1.3.cmml"><mi id="S2.SS1.p1.4.m4.1.1.3.2" xref="S2.SS1.p1.4.m4.1.1.3.2.cmml">t</mi><mo id="S2.SS1.p1.4.m4.1.1.3.1" xref="S2.SS1.p1.4.m4.1.1.3.1.cmml">−</mo><mn id="S2.SS1.p1.4.m4.1.1.3.3" xref="S2.SS1.p1.4.m4.1.1.3.3.cmml">1</mn></mrow></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.4.m4.1b"><apply id="S2.SS1.p1.4.m4.1.1.cmml" xref="S2.SS1.p1.4.m4.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.4.m4.1.1.1.cmml" xref="S2.SS1.p1.4.m4.1.1">subscript</csymbol><apply id="S2.SS1.p1.4.m4.1.1.2.cmml" xref="S2.SS1.p1.4.m4.1.1.2"><ci id="S2.SS1.p1.4.m4.1.1.2.1.cmml" xref="S2.SS1.p1.4.m4.1.1.2.1">^</ci><ci id="S2.SS1.p1.4.m4.1.1.2.2.cmml" xref="S2.SS1.p1.4.m4.1.1.2.2">𝑭</ci></apply><apply id="S2.SS1.p1.4.m4.1.1.3.cmml" xref="S2.SS1.p1.4.m4.1.1.3"><minus id="S2.SS1.p1.4.m4.1.1.3.1.cmml" xref="S2.SS1.p1.4.m4.1.1.3.1"></minus><ci id="S2.SS1.p1.4.m4.1.1.3.2.cmml" xref="S2.SS1.p1.4.m4.1.1.3.2">𝑡</ci><cn id="S2.SS1.p1.4.m4.1.1.3.3.cmml" type="integer" xref="S2.SS1.p1.4.m4.1.1.3.3">1</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.4.m4.1c">\hat{\boldsymbol{F}}_{t-1}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.4.m4.1d">over^ start_ARG bold_italic_F end_ARG start_POSTSUBSCRIPT italic_t - 1 
end_POSTSUBSCRIPT</annotation></semantics></math>. Motion vector <math alttext="\hat{\boldsymbol{v}}_{t}" class="ltx_Math" display="inline" id="S2.SS1.p1.5.m5.1"><semantics id="S2.SS1.p1.5.m5.1a"><msub id="S2.SS1.p1.5.m5.1.1" xref="S2.SS1.p1.5.m5.1.1.cmml"><mover accent="true" id="S2.SS1.p1.5.m5.1.1.2" xref="S2.SS1.p1.5.m5.1.1.2.cmml"><mi id="S2.SS1.p1.5.m5.1.1.2.2" xref="S2.SS1.p1.5.m5.1.1.2.2.cmml">𝒗</mi><mo id="S2.SS1.p1.5.m5.1.1.2.1" xref="S2.SS1.p1.5.m5.1.1.2.1.cmml">^</mo></mover><mi id="S2.SS1.p1.5.m5.1.1.3" xref="S2.SS1.p1.5.m5.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.5.m5.1b"><apply id="S2.SS1.p1.5.m5.1.1.cmml" xref="S2.SS1.p1.5.m5.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.5.m5.1.1.1.cmml" xref="S2.SS1.p1.5.m5.1.1">subscript</csymbol><apply id="S2.SS1.p1.5.m5.1.1.2.cmml" xref="S2.SS1.p1.5.m5.1.1.2"><ci id="S2.SS1.p1.5.m5.1.1.2.1.cmml" xref="S2.SS1.p1.5.m5.1.1.2.1">^</ci><ci id="S2.SS1.p1.5.m5.1.1.2.2.cmml" xref="S2.SS1.p1.5.m5.1.1.2.2">𝒗</ci></apply><ci id="S2.SS1.p1.5.m5.1.1.3.cmml" xref="S2.SS1.p1.5.m5.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.5.m5.1c">\hat{\boldsymbol{v}}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.5.m5.1d">over^ start_ARG bold_italic_v end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> is employed to warp the multi-scale features to multi-scale local contexts <math alttext="\hat{\boldsymbol{l}}_{t}^{0},\hat{\boldsymbol{l}}_{t}^{1},\hat{\boldsymbol{l}}% _{t}^{2}" class="ltx_Math" display="inline" id="S2.SS1.p1.6.m6.3"><semantics id="S2.SS1.p1.6.m6.3a"><mrow id="S2.SS1.p1.6.m6.3.3.3" xref="S2.SS1.p1.6.m6.3.3.4.cmml"><msubsup id="S2.SS1.p1.6.m6.1.1.1.1" xref="S2.SS1.p1.6.m6.1.1.1.1.cmml"><mover accent="true" id="S2.SS1.p1.6.m6.1.1.1.1.2.2" xref="S2.SS1.p1.6.m6.1.1.1.1.2.2.cmml"><mi id="S2.SS1.p1.6.m6.1.1.1.1.2.2.2" xref="S2.SS1.p1.6.m6.1.1.1.1.2.2.2.cmml">𝒍</mi><mo 
id="S2.SS1.p1.6.m6.1.1.1.1.2.2.1" xref="S2.SS1.p1.6.m6.1.1.1.1.2.2.1.cmml">^</mo></mover><mi id="S2.SS1.p1.6.m6.1.1.1.1.2.3" xref="S2.SS1.p1.6.m6.1.1.1.1.2.3.cmml">t</mi><mn id="S2.SS1.p1.6.m6.1.1.1.1.3" xref="S2.SS1.p1.6.m6.1.1.1.1.3.cmml">0</mn></msubsup><mo id="S2.SS1.p1.6.m6.3.3.3.4" xref="S2.SS1.p1.6.m6.3.3.4.cmml">,</mo><msubsup id="S2.SS1.p1.6.m6.2.2.2.2" xref="S2.SS1.p1.6.m6.2.2.2.2.cmml"><mover accent="true" id="S2.SS1.p1.6.m6.2.2.2.2.2.2" xref="S2.SS1.p1.6.m6.2.2.2.2.2.2.cmml"><mi id="S2.SS1.p1.6.m6.2.2.2.2.2.2.2" xref="S2.SS1.p1.6.m6.2.2.2.2.2.2.2.cmml">𝒍</mi><mo id="S2.SS1.p1.6.m6.2.2.2.2.2.2.1" xref="S2.SS1.p1.6.m6.2.2.2.2.2.2.1.cmml">^</mo></mover><mi id="S2.SS1.p1.6.m6.2.2.2.2.2.3" xref="S2.SS1.p1.6.m6.2.2.2.2.2.3.cmml">t</mi><mn id="S2.SS1.p1.6.m6.2.2.2.2.3" xref="S2.SS1.p1.6.m6.2.2.2.2.3.cmml">1</mn></msubsup><mo id="S2.SS1.p1.6.m6.3.3.3.5" xref="S2.SS1.p1.6.m6.3.3.4.cmml">,</mo><msubsup id="S2.SS1.p1.6.m6.3.3.3.3" xref="S2.SS1.p1.6.m6.3.3.3.3.cmml"><mover accent="true" id="S2.SS1.p1.6.m6.3.3.3.3.2.2" xref="S2.SS1.p1.6.m6.3.3.3.3.2.2.cmml"><mi id="S2.SS1.p1.6.m6.3.3.3.3.2.2.2" xref="S2.SS1.p1.6.m6.3.3.3.3.2.2.2.cmml">𝒍</mi><mo id="S2.SS1.p1.6.m6.3.3.3.3.2.2.1" xref="S2.SS1.p1.6.m6.3.3.3.3.2.2.1.cmml">^</mo></mover><mi id="S2.SS1.p1.6.m6.3.3.3.3.2.3" xref="S2.SS1.p1.6.m6.3.3.3.3.2.3.cmml">t</mi><mn id="S2.SS1.p1.6.m6.3.3.3.3.3" xref="S2.SS1.p1.6.m6.3.3.3.3.3.cmml">2</mn></msubsup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.6.m6.3b"><list id="S2.SS1.p1.6.m6.3.3.4.cmml" xref="S2.SS1.p1.6.m6.3.3.3"><apply id="S2.SS1.p1.6.m6.1.1.1.1.cmml" xref="S2.SS1.p1.6.m6.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.6.m6.1.1.1.1.1.cmml" xref="S2.SS1.p1.6.m6.1.1.1.1">superscript</csymbol><apply id="S2.SS1.p1.6.m6.1.1.1.1.2.cmml" xref="S2.SS1.p1.6.m6.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.6.m6.1.1.1.1.2.1.cmml" xref="S2.SS1.p1.6.m6.1.1.1.1">subscript</csymbol><apply id="S2.SS1.p1.6.m6.1.1.1.1.2.2.cmml" xref="S2.SS1.p1.6.m6.1.1.1.1.2.2"><ci 
id="S2.SS1.p1.6.m6.1.1.1.1.2.2.1.cmml" xref="S2.SS1.p1.6.m6.1.1.1.1.2.2.1">^</ci><ci id="S2.SS1.p1.6.m6.1.1.1.1.2.2.2.cmml" xref="S2.SS1.p1.6.m6.1.1.1.1.2.2.2">𝒍</ci></apply><ci id="S2.SS1.p1.6.m6.1.1.1.1.2.3.cmml" xref="S2.SS1.p1.6.m6.1.1.1.1.2.3">𝑡</ci></apply><cn id="S2.SS1.p1.6.m6.1.1.1.1.3.cmml" type="integer" xref="S2.SS1.p1.6.m6.1.1.1.1.3">0</cn></apply><apply id="S2.SS1.p1.6.m6.2.2.2.2.cmml" xref="S2.SS1.p1.6.m6.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS1.p1.6.m6.2.2.2.2.1.cmml" xref="S2.SS1.p1.6.m6.2.2.2.2">superscript</csymbol><apply id="S2.SS1.p1.6.m6.2.2.2.2.2.cmml" xref="S2.SS1.p1.6.m6.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS1.p1.6.m6.2.2.2.2.2.1.cmml" xref="S2.SS1.p1.6.m6.2.2.2.2">subscript</csymbol><apply id="S2.SS1.p1.6.m6.2.2.2.2.2.2.cmml" xref="S2.SS1.p1.6.m6.2.2.2.2.2.2"><ci id="S2.SS1.p1.6.m6.2.2.2.2.2.2.1.cmml" xref="S2.SS1.p1.6.m6.2.2.2.2.2.2.1">^</ci><ci id="S2.SS1.p1.6.m6.2.2.2.2.2.2.2.cmml" xref="S2.SS1.p1.6.m6.2.2.2.2.2.2.2">𝒍</ci></apply><ci id="S2.SS1.p1.6.m6.2.2.2.2.2.3.cmml" xref="S2.SS1.p1.6.m6.2.2.2.2.2.3">𝑡</ci></apply><cn id="S2.SS1.p1.6.m6.2.2.2.2.3.cmml" type="integer" xref="S2.SS1.p1.6.m6.2.2.2.2.3">1</cn></apply><apply id="S2.SS1.p1.6.m6.3.3.3.3.cmml" xref="S2.SS1.p1.6.m6.3.3.3.3"><csymbol cd="ambiguous" id="S2.SS1.p1.6.m6.3.3.3.3.1.cmml" xref="S2.SS1.p1.6.m6.3.3.3.3">superscript</csymbol><apply id="S2.SS1.p1.6.m6.3.3.3.3.2.cmml" xref="S2.SS1.p1.6.m6.3.3.3.3"><csymbol cd="ambiguous" id="S2.SS1.p1.6.m6.3.3.3.3.2.1.cmml" xref="S2.SS1.p1.6.m6.3.3.3.3">subscript</csymbol><apply id="S2.SS1.p1.6.m6.3.3.3.3.2.2.cmml" xref="S2.SS1.p1.6.m6.3.3.3.3.2.2"><ci id="S2.SS1.p1.6.m6.3.3.3.3.2.2.1.cmml" xref="S2.SS1.p1.6.m6.3.3.3.3.2.2.1">^</ci><ci id="S2.SS1.p1.6.m6.3.3.3.3.2.2.2.cmml" xref="S2.SS1.p1.6.m6.3.3.3.3.2.2.2">𝒍</ci></apply><ci id="S2.SS1.p1.6.m6.3.3.3.3.2.3.cmml" xref="S2.SS1.p1.6.m6.3.3.3.3.2.3">𝑡</ci></apply><cn id="S2.SS1.p1.6.m6.3.3.3.3.3.cmml" type="integer" 
xref="S2.SS1.p1.6.m6.3.3.3.3.3">2</cn></apply></list></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.6.m6.3c">\hat{\boldsymbol{l}}_{t}^{0},\hat{\boldsymbol{l}}_{t}^{1},\hat{\boldsymbol{l}}% _{t}^{2}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.6.m6.3d">over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT</annotation></semantics></math>. The <math alttext="\hat{\boldsymbol{l}}_{t}^{0},\hat{\boldsymbol{l}}_{t}^{1},\hat{\boldsymbol{l}}% _{t}^{2}" class="ltx_Math" display="inline" id="S2.SS1.p1.7.m7.3"><semantics id="S2.SS1.p1.7.m7.3a"><mrow id="S2.SS1.p1.7.m7.3.3.3" xref="S2.SS1.p1.7.m7.3.3.4.cmml"><msubsup id="S2.SS1.p1.7.m7.1.1.1.1" xref="S2.SS1.p1.7.m7.1.1.1.1.cmml"><mover accent="true" id="S2.SS1.p1.7.m7.1.1.1.1.2.2" xref="S2.SS1.p1.7.m7.1.1.1.1.2.2.cmml"><mi id="S2.SS1.p1.7.m7.1.1.1.1.2.2.2" xref="S2.SS1.p1.7.m7.1.1.1.1.2.2.2.cmml">𝒍</mi><mo id="S2.SS1.p1.7.m7.1.1.1.1.2.2.1" xref="S2.SS1.p1.7.m7.1.1.1.1.2.2.1.cmml">^</mo></mover><mi id="S2.SS1.p1.7.m7.1.1.1.1.2.3" xref="S2.SS1.p1.7.m7.1.1.1.1.2.3.cmml">t</mi><mn id="S2.SS1.p1.7.m7.1.1.1.1.3" xref="S2.SS1.p1.7.m7.1.1.1.1.3.cmml">0</mn></msubsup><mo id="S2.SS1.p1.7.m7.3.3.3.4" xref="S2.SS1.p1.7.m7.3.3.4.cmml">,</mo><msubsup id="S2.SS1.p1.7.m7.2.2.2.2" xref="S2.SS1.p1.7.m7.2.2.2.2.cmml"><mover accent="true" id="S2.SS1.p1.7.m7.2.2.2.2.2.2" xref="S2.SS1.p1.7.m7.2.2.2.2.2.2.cmml"><mi id="S2.SS1.p1.7.m7.2.2.2.2.2.2.2" xref="S2.SS1.p1.7.m7.2.2.2.2.2.2.2.cmml">𝒍</mi><mo id="S2.SS1.p1.7.m7.2.2.2.2.2.2.1" xref="S2.SS1.p1.7.m7.2.2.2.2.2.2.1.cmml">^</mo></mover><mi id="S2.SS1.p1.7.m7.2.2.2.2.2.3" xref="S2.SS1.p1.7.m7.2.2.2.2.2.3.cmml">t</mi><mn 
id="S2.SS1.p1.7.m7.2.2.2.2.3" xref="S2.SS1.p1.7.m7.2.2.2.2.3.cmml">1</mn></msubsup><mo id="S2.SS1.p1.7.m7.3.3.3.5" xref="S2.SS1.p1.7.m7.3.3.4.cmml">,</mo><msubsup id="S2.SS1.p1.7.m7.3.3.3.3" xref="S2.SS1.p1.7.m7.3.3.3.3.cmml"><mover accent="true" id="S2.SS1.p1.7.m7.3.3.3.3.2.2" xref="S2.SS1.p1.7.m7.3.3.3.3.2.2.cmml"><mi id="S2.SS1.p1.7.m7.3.3.3.3.2.2.2" xref="S2.SS1.p1.7.m7.3.3.3.3.2.2.2.cmml">𝒍</mi><mo id="S2.SS1.p1.7.m7.3.3.3.3.2.2.1" xref="S2.SS1.p1.7.m7.3.3.3.3.2.2.1.cmml">^</mo></mover><mi id="S2.SS1.p1.7.m7.3.3.3.3.2.3" xref="S2.SS1.p1.7.m7.3.3.3.3.2.3.cmml">t</mi><mn id="S2.SS1.p1.7.m7.3.3.3.3.3" xref="S2.SS1.p1.7.m7.3.3.3.3.3.cmml">2</mn></msubsup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.7.m7.3b"><list id="S2.SS1.p1.7.m7.3.3.4.cmml" xref="S2.SS1.p1.7.m7.3.3.3"><apply id="S2.SS1.p1.7.m7.1.1.1.1.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.7.m7.1.1.1.1.1.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1">superscript</csymbol><apply id="S2.SS1.p1.7.m7.1.1.1.1.2.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.7.m7.1.1.1.1.2.1.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1">subscript</csymbol><apply id="S2.SS1.p1.7.m7.1.1.1.1.2.2.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1.2.2"><ci id="S2.SS1.p1.7.m7.1.1.1.1.2.2.1.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1.2.2.1">^</ci><ci id="S2.SS1.p1.7.m7.1.1.1.1.2.2.2.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1.2.2.2">𝒍</ci></apply><ci id="S2.SS1.p1.7.m7.1.1.1.1.2.3.cmml" xref="S2.SS1.p1.7.m7.1.1.1.1.2.3">𝑡</ci></apply><cn id="S2.SS1.p1.7.m7.1.1.1.1.3.cmml" type="integer" xref="S2.SS1.p1.7.m7.1.1.1.1.3">0</cn></apply><apply id="S2.SS1.p1.7.m7.2.2.2.2.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS1.p1.7.m7.2.2.2.2.1.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2">superscript</csymbol><apply id="S2.SS1.p1.7.m7.2.2.2.2.2.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS1.p1.7.m7.2.2.2.2.2.1.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2">subscript</csymbol><apply 
id="S2.SS1.p1.7.m7.2.2.2.2.2.2.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2.2.2"><ci id="S2.SS1.p1.7.m7.2.2.2.2.2.2.1.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2.2.2.1">^</ci><ci id="S2.SS1.p1.7.m7.2.2.2.2.2.2.2.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2.2.2.2">𝒍</ci></apply><ci id="S2.SS1.p1.7.m7.2.2.2.2.2.3.cmml" xref="S2.SS1.p1.7.m7.2.2.2.2.2.3">𝑡</ci></apply><cn id="S2.SS1.p1.7.m7.2.2.2.2.3.cmml" type="integer" xref="S2.SS1.p1.7.m7.2.2.2.2.3">1</cn></apply><apply id="S2.SS1.p1.7.m7.3.3.3.3.cmml" xref="S2.SS1.p1.7.m7.3.3.3.3"><csymbol cd="ambiguous" id="S2.SS1.p1.7.m7.3.3.3.3.1.cmml" xref="S2.SS1.p1.7.m7.3.3.3.3">superscript</csymbol><apply id="S2.SS1.p1.7.m7.3.3.3.3.2.cmml" xref="S2.SS1.p1.7.m7.3.3.3.3"><csymbol cd="ambiguous" id="S2.SS1.p1.7.m7.3.3.3.3.2.1.cmml" xref="S2.SS1.p1.7.m7.3.3.3.3">subscript</csymbol><apply id="S2.SS1.p1.7.m7.3.3.3.3.2.2.cmml" xref="S2.SS1.p1.7.m7.3.3.3.3.2.2"><ci id="S2.SS1.p1.7.m7.3.3.3.3.2.2.1.cmml" xref="S2.SS1.p1.7.m7.3.3.3.3.2.2.1">^</ci><ci id="S2.SS1.p1.7.m7.3.3.3.3.2.2.2.cmml" xref="S2.SS1.p1.7.m7.3.3.3.3.2.2.2">𝒍</ci></apply><ci id="S2.SS1.p1.7.m7.3.3.3.3.2.3.cmml" xref="S2.SS1.p1.7.m7.3.3.3.3.2.3">𝑡</ci></apply><cn id="S2.SS1.p1.7.m7.3.3.3.3.3.cmml" type="integer" xref="S2.SS1.p1.7.m7.3.3.3.3.3">2</cn></apply></list></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.7.m7.3c">\hat{\boldsymbol{l}}_{t}^{0},\hat{\boldsymbol{l}}_{t}^{1},\hat{\boldsymbol{l}}% _{t}^{2}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.7.m7.3d">over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT</annotation></semantics></math> are concatenated with the current frame <math 
alttext="\boldsymbol{x}_{t}" class="ltx_Math" display="inline" id="S2.SS1.p1.8.m8.1"><semantics id="S2.SS1.p1.8.m8.1a"><msub id="S2.SS1.p1.8.m8.1.1" xref="S2.SS1.p1.8.m8.1.1.cmml"><mi id="S2.SS1.p1.8.m8.1.1.2" xref="S2.SS1.p1.8.m8.1.1.2.cmml">𝒙</mi><mi id="S2.SS1.p1.8.m8.1.1.3" xref="S2.SS1.p1.8.m8.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.8.m8.1b"><apply id="S2.SS1.p1.8.m8.1.1.cmml" xref="S2.SS1.p1.8.m8.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.8.m8.1.1.1.cmml" xref="S2.SS1.p1.8.m8.1.1">subscript</csymbol><ci id="S2.SS1.p1.8.m8.1.1.2.cmml" xref="S2.SS1.p1.8.m8.1.1.2">𝒙</ci><ci id="S2.SS1.p1.8.m8.1.1.3.cmml" xref="S2.SS1.p1.8.m8.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.8.m8.1c">\boldsymbol{x}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.8.m8.1d">bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math>, middle-feature <math alttext="\boldsymbol{y}^{1}_{t}" class="ltx_Math" display="inline" id="S2.SS1.p1.9.m9.1"><semantics id="S2.SS1.p1.9.m9.1a"><msubsup id="S2.SS1.p1.9.m9.1.1" xref="S2.SS1.p1.9.m9.1.1.cmml"><mi id="S2.SS1.p1.9.m9.1.1.2.2" xref="S2.SS1.p1.9.m9.1.1.2.2.cmml">𝒚</mi><mi id="S2.SS1.p1.9.m9.1.1.3" xref="S2.SS1.p1.9.m9.1.1.3.cmml">t</mi><mn id="S2.SS1.p1.9.m9.1.1.2.3" xref="S2.SS1.p1.9.m9.1.1.2.3.cmml">1</mn></msubsup><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.9.m9.1b"><apply id="S2.SS1.p1.9.m9.1.1.cmml" xref="S2.SS1.p1.9.m9.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.9.m9.1.1.1.cmml" xref="S2.SS1.p1.9.m9.1.1">subscript</csymbol><apply id="S2.SS1.p1.9.m9.1.1.2.cmml" xref="S2.SS1.p1.9.m9.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.9.m9.1.1.2.1.cmml" xref="S2.SS1.p1.9.m9.1.1">superscript</csymbol><ci id="S2.SS1.p1.9.m9.1.1.2.2.cmml" xref="S2.SS1.p1.9.m9.1.1.2.2">𝒚</ci><cn id="S2.SS1.p1.9.m9.1.1.2.3.cmml" type="integer" xref="S2.SS1.p1.9.m9.1.1.2.3">1</cn></apply><ci 
id="S2.SS1.p1.9.m9.1.1.3.cmml" xref="S2.SS1.p1.9.m9.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.9.m9.1c">\boldsymbol{y}^{1}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.9.m9.1d">bold_italic_y start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> and middle-feature <math alttext="\hat{\boldsymbol{y}}_{t}^{2}" class="ltx_Math" display="inline" id="S2.SS1.p1.10.m10.1"><semantics id="S2.SS1.p1.10.m10.1a"><msubsup id="S2.SS1.p1.10.m10.1.1" xref="S2.SS1.p1.10.m10.1.1.cmml"><mover accent="true" id="S2.SS1.p1.10.m10.1.1.2.2" xref="S2.SS1.p1.10.m10.1.1.2.2.cmml"><mi id="S2.SS1.p1.10.m10.1.1.2.2.2" xref="S2.SS1.p1.10.m10.1.1.2.2.2.cmml">𝒚</mi><mo id="S2.SS1.p1.10.m10.1.1.2.2.1" xref="S2.SS1.p1.10.m10.1.1.2.2.1.cmml">^</mo></mover><mi id="S2.SS1.p1.10.m10.1.1.2.3" xref="S2.SS1.p1.10.m10.1.1.2.3.cmml">t</mi><mn id="S2.SS1.p1.10.m10.1.1.3" xref="S2.SS1.p1.10.m10.1.1.3.cmml">2</mn></msubsup><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.10.m10.1b"><apply id="S2.SS1.p1.10.m10.1.1.cmml" xref="S2.SS1.p1.10.m10.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.10.m10.1.1.1.cmml" xref="S2.SS1.p1.10.m10.1.1">superscript</csymbol><apply id="S2.SS1.p1.10.m10.1.1.2.cmml" xref="S2.SS1.p1.10.m10.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.10.m10.1.1.2.1.cmml" xref="S2.SS1.p1.10.m10.1.1">subscript</csymbol><apply id="S2.SS1.p1.10.m10.1.1.2.2.cmml" xref="S2.SS1.p1.10.m10.1.1.2.2"><ci id="S2.SS1.p1.10.m10.1.1.2.2.1.cmml" xref="S2.SS1.p1.10.m10.1.1.2.2.1">^</ci><ci id="S2.SS1.p1.10.m10.1.1.2.2.2.cmml" xref="S2.SS1.p1.10.m10.1.1.2.2.2">𝒚</ci></apply><ci id="S2.SS1.p1.10.m10.1.1.2.3.cmml" xref="S2.SS1.p1.10.m10.1.1.2.3">𝑡</ci></apply><cn id="S2.SS1.p1.10.m10.1.1.3.cmml" type="integer" xref="S2.SS1.p1.10.m10.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" 
id="S2.SS1.p1.10.m10.1c">\hat{\boldsymbol{y}}_{t}^{2}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.10.m10.1d">over^ start_ARG bold_italic_y end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT</annotation></semantics></math>, respectively. In this way, the network can learn how to perform conditional coding. During decoding, the multi-scale contexts are likewise concatenated to reconstruct the frame. The overall process can be formulated as:</p> <table class="ltx_equation ltx_eqn_table" id="S2.E1"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\hat{\boldsymbol{x}}_{t}=D_{C}\left(\left\lfloor E_{C}(\boldsymbol{x}_{t}|\hat% {\boldsymbol{l}}_{t}^{0},\hat{\boldsymbol{l}}_{t}^{1},\hat{\boldsymbol{l}}_{t}% ^{2})\right\rceil|\hat{\boldsymbol{l}}_{t}^{0},\hat{\boldsymbol{l}}_{t}^{1},% \hat{\boldsymbol{l}}_{t}^{2}\right)," class="ltx_math_unparsed" display="block" id="S2.E1.m1.1"><semantics id="S2.E1.m1.1a"><mrow id="S2.E1.m1.1b"><msub id="S2.E1.m1.1.1"><mover accent="true" id="S2.E1.m1.1.1.2"><mi id="S2.E1.m1.1.1.2.2">𝒙</mi><mo id="S2.E1.m1.1.1.2.1">^</mo></mover><mi id="S2.E1.m1.1.1.3">t</mi></msub><mo id="S2.E1.m1.1.2">=</mo><msub id="S2.E1.m1.1.3"><mi id="S2.E1.m1.1.3.2">D</mi><mi id="S2.E1.m1.1.3.3">C</mi></msub><mrow id="S2.E1.m1.1.4"><mo id="S2.E1.m1.1.4.1">(</mo><mrow id="S2.E1.m1.1.4.2"><mo id="S2.E1.m1.1.4.2.1">⌊</mo><msub id="S2.E1.m1.1.4.2.2"><mi id="S2.E1.m1.1.4.2.2.2">E</mi><mi id="S2.E1.m1.1.4.2.2.3">C</mi></msub><mrow id="S2.E1.m1.1.4.2.3"><mo id="S2.E1.m1.1.4.2.3.1" stretchy="false">(</mo><msub id="S2.E1.m1.1.4.2.3.2"><mi id="S2.E1.m1.1.4.2.3.2.2">𝒙</mi><mi id="S2.E1.m1.1.4.2.3.2.3">t</mi></msub><mo fence="false" id="S2.E1.m1.1.4.2.3.3" rspace="0.167em" stretchy="false">|</mo><msubsup id="S2.E1.m1.1.4.2.3.4"><mover accent="true" id="S2.E1.m1.1.4.2.3.4.2.2"><mi 
id="S2.E1.m1.1.4.2.3.4.2.2.2">𝒍</mi><mo id="S2.E1.m1.1.4.2.3.4.2.2.1">^</mo></mover><mi id="S2.E1.m1.1.4.2.3.4.2.3">t</mi><mn id="S2.E1.m1.1.4.2.3.4.3">0</mn></msubsup><mo id="S2.E1.m1.1.4.2.3.5">,</mo><msubsup id="S2.E1.m1.1.4.2.3.6"><mover accent="true" id="S2.E1.m1.1.4.2.3.6.2.2"><mi id="S2.E1.m1.1.4.2.3.6.2.2.2">𝒍</mi><mo id="S2.E1.m1.1.4.2.3.6.2.2.1">^</mo></mover><mi id="S2.E1.m1.1.4.2.3.6.2.3">t</mi><mn id="S2.E1.m1.1.4.2.3.6.3">1</mn></msubsup><mo id="S2.E1.m1.1.4.2.3.7">,</mo><msubsup id="S2.E1.m1.1.4.2.3.8"><mover accent="true" id="S2.E1.m1.1.4.2.3.8.2.2"><mi id="S2.E1.m1.1.4.2.3.8.2.2.2">𝒍</mi><mo id="S2.E1.m1.1.4.2.3.8.2.2.1">^</mo></mover><mi id="S2.E1.m1.1.4.2.3.8.2.3">t</mi><mn id="S2.E1.m1.1.4.2.3.8.3">2</mn></msubsup><mo id="S2.E1.m1.1.4.2.3.9" stretchy="false">)</mo></mrow><mo id="S2.E1.m1.1.4.2.4">⌉</mo></mrow><mo fence="false" id="S2.E1.m1.1.4.3" rspace="0.167em" stretchy="false">|</mo><msubsup id="S2.E1.m1.1.4.4"><mover accent="true" id="S2.E1.m1.1.4.4.2.2"><mi id="S2.E1.m1.1.4.4.2.2.2">𝒍</mi><mo id="S2.E1.m1.1.4.4.2.2.1">^</mo></mover><mi id="S2.E1.m1.1.4.4.2.3">t</mi><mn id="S2.E1.m1.1.4.4.3">0</mn></msubsup><mo id="S2.E1.m1.1.4.5">,</mo><msubsup id="S2.E1.m1.1.4.6"><mover accent="true" id="S2.E1.m1.1.4.6.2.2"><mi id="S2.E1.m1.1.4.6.2.2.2">𝒍</mi><mo id="S2.E1.m1.1.4.6.2.2.1">^</mo></mover><mi id="S2.E1.m1.1.4.6.2.3">t</mi><mn id="S2.E1.m1.1.4.6.3">1</mn></msubsup><mo id="S2.E1.m1.1.4.7">,</mo><msubsup id="S2.E1.m1.1.4.8"><mover accent="true" id="S2.E1.m1.1.4.8.2.2"><mi id="S2.E1.m1.1.4.8.2.2.2">𝒍</mi><mo id="S2.E1.m1.1.4.8.2.2.1">^</mo></mover><mi id="S2.E1.m1.1.4.8.2.3">t</mi><mn id="S2.E1.m1.1.4.8.3">2</mn></msubsup><mo id="S2.E1.m1.1.4.9">)</mo></mrow><mo id="S2.E1.m1.1.5">,</mo></mrow><annotation encoding="application/x-tex" id="S2.E1.m1.1c">\hat{\boldsymbol{x}}_{t}=D_{C}\left(\left\lfloor E_{C}(\boldsymbol{x}_{t}|\hat% {\boldsymbol{l}}_{t}^{0},\hat{\boldsymbol{l}}_{t}^{1},\hat{\boldsymbol{l}}_{t}% 
^{2})\right\rceil|\hat{\boldsymbol{l}}_{t}^{0},\hat{\boldsymbol{l}}_{t}^{1},% \hat{\boldsymbol{l}}_{t}^{2}\right),</annotation><annotation encoding="application/x-llamapun" id="S2.E1.m1.1d">over^ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = italic_D start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( ⌊ italic_E start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) ⌉ | over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) ,</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(1)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S2.SS1.p1.12">where <math alttext="E_{C}" class="ltx_Math" display="inline" id="S2.SS1.p1.11.m1.1"><semantics id="S2.SS1.p1.11.m1.1a"><msub id="S2.SS1.p1.11.m1.1.1" xref="S2.SS1.p1.11.m1.1.1.cmml"><mi id="S2.SS1.p1.11.m1.1.1.2" xref="S2.SS1.p1.11.m1.1.1.2.cmml">E</mi><mi id="S2.SS1.p1.11.m1.1.1.3" xref="S2.SS1.p1.11.m1.1.1.3.cmml">C</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.11.m1.1b"><apply id="S2.SS1.p1.11.m1.1.1.cmml" 
xref="S2.SS1.p1.11.m1.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.11.m1.1.1.1.cmml" xref="S2.SS1.p1.11.m1.1.1">subscript</csymbol><ci id="S2.SS1.p1.11.m1.1.1.2.cmml" xref="S2.SS1.p1.11.m1.1.1.2">𝐸</ci><ci id="S2.SS1.p1.11.m1.1.1.3.cmml" xref="S2.SS1.p1.11.m1.1.1.3">𝐶</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.11.m1.1c">E_{C}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.11.m1.1d">italic_E start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT</annotation></semantics></math> and <math alttext="D_{C}" class="ltx_Math" display="inline" id="S2.SS1.p1.12.m2.1"><semantics id="S2.SS1.p1.12.m2.1a"><msub id="S2.SS1.p1.12.m2.1.1" xref="S2.SS1.p1.12.m2.1.1.cmml"><mi id="S2.SS1.p1.12.m2.1.1.2" xref="S2.SS1.p1.12.m2.1.1.2.cmml">D</mi><mi id="S2.SS1.p1.12.m2.1.1.3" xref="S2.SS1.p1.12.m2.1.1.3.cmml">C</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.12.m2.1b"><apply id="S2.SS1.p1.12.m2.1.1.cmml" xref="S2.SS1.p1.12.m2.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.12.m2.1.1.1.cmml" xref="S2.SS1.p1.12.m2.1.1">subscript</csymbol><ci id="S2.SS1.p1.12.m2.1.1.2.cmml" xref="S2.SS1.p1.12.m2.1.1.2">𝐷</ci><ci id="S2.SS1.p1.12.m2.1.1.3.cmml" xref="S2.SS1.p1.12.m2.1.1.3">𝐶</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.12.m2.1c">D_{C}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.12.m2.1d">italic_D start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT</annotation></semantics></math> are the contextual encoder and the contextual decoder.</p> </div> </section> <section class="ltx_subsection" id="S2.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">2.2 </span>Attention-based Global Compensation</h3> <figure class="ltx_figure" id="S2.F1"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="319" id="S2.F1.g1" src="x1.png" width="830"/> <figcaption class="ltx_caption ltx_centering"><span 
class="ltx_tag ltx_tag_figure"><span class="ltx_text ltx_font_bold" id="S2.F1.10.1.1">Fig. 1</span>: </span>Overall framework of the proposed LVC-LGMC. <math alttext="\boldsymbol{E}_{C}" class="ltx_Math" display="inline" id="S2.F1.5.m1.1"><semantics id="S2.F1.5.m1.1b"><msub id="S2.F1.5.m1.1.1" xref="S2.F1.5.m1.1.1.cmml"><mi id="S2.F1.5.m1.1.1.2" xref="S2.F1.5.m1.1.1.2.cmml">𝑬</mi><mi id="S2.F1.5.m1.1.1.3" xref="S2.F1.5.m1.1.1.3.cmml">C</mi></msub><annotation-xml encoding="MathML-Content" id="S2.F1.5.m1.1c"><apply id="S2.F1.5.m1.1.1.cmml" xref="S2.F1.5.m1.1.1"><csymbol cd="ambiguous" id="S2.F1.5.m1.1.1.1.cmml" xref="S2.F1.5.m1.1.1">subscript</csymbol><ci id="S2.F1.5.m1.1.1.2.cmml" xref="S2.F1.5.m1.1.1.2">𝑬</ci><ci id="S2.F1.5.m1.1.1.3.cmml" xref="S2.F1.5.m1.1.1.3">𝐶</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.F1.5.m1.1d">\boldsymbol{E}_{C}</annotation><annotation encoding="application/x-llamapun" id="S2.F1.5.m1.1e">bold_italic_E start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT</annotation></semantics></math> is the contextual encoder, and <math alttext="\boldsymbol{D}_{C}" class="ltx_Math" display="inline" id="S2.F1.6.m2.1"><semantics id="S2.F1.6.m2.1b"><msub id="S2.F1.6.m2.1.1" xref="S2.F1.6.m2.1.1.cmml"><mi id="S2.F1.6.m2.1.1.2" xref="S2.F1.6.m2.1.1.2.cmml">𝑫</mi><mi id="S2.F1.6.m2.1.1.3" xref="S2.F1.6.m2.1.1.3.cmml">C</mi></msub><annotation-xml encoding="MathML-Content" id="S2.F1.6.m2.1c"><apply id="S2.F1.6.m2.1.1.cmml" xref="S2.F1.6.m2.1.1"><csymbol cd="ambiguous" id="S2.F1.6.m2.1.1.1.cmml" xref="S2.F1.6.m2.1.1">subscript</csymbol><ci id="S2.F1.6.m2.1.1.2.cmml" xref="S2.F1.6.m2.1.1.2">𝑫</ci><ci id="S2.F1.6.m2.1.1.3.cmml" xref="S2.F1.6.m2.1.1.3">𝐶</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.F1.6.m2.1d">\boldsymbol{D}_{C}</annotation><annotation encoding="application/x-llamapun" id="S2.F1.6.m2.1e">bold_italic_D start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT</annotation></semantics></math> is the 
contextual decoder. <math alttext="\boldsymbol{E}_{M}" class="ltx_Math" display="inline" id="S2.F1.7.m3.1"><semantics id="S2.F1.7.m3.1b"><msub id="S2.F1.7.m3.1.1" xref="S2.F1.7.m3.1.1.cmml"><mi id="S2.F1.7.m3.1.1.2" xref="S2.F1.7.m3.1.1.2.cmml">𝑬</mi><mi id="S2.F1.7.m3.1.1.3" xref="S2.F1.7.m3.1.1.3.cmml">M</mi></msub><annotation-xml encoding="MathML-Content" id="S2.F1.7.m3.1c"><apply id="S2.F1.7.m3.1.1.cmml" xref="S2.F1.7.m3.1.1"><csymbol cd="ambiguous" id="S2.F1.7.m3.1.1.1.cmml" xref="S2.F1.7.m3.1.1">subscript</csymbol><ci id="S2.F1.7.m3.1.1.2.cmml" xref="S2.F1.7.m3.1.1.2">𝑬</ci><ci id="S2.F1.7.m3.1.1.3.cmml" xref="S2.F1.7.m3.1.1.3">𝑀</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.F1.7.m3.1d">\boldsymbol{E}_{M}</annotation><annotation encoding="application/x-llamapun" id="S2.F1.7.m3.1e">bold_italic_E start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT</annotation></semantics></math> is the MV encoder, and <math alttext="\boldsymbol{D}_{M}" class="ltx_Math" display="inline" id="S2.F1.8.m4.1"><semantics id="S2.F1.8.m4.1b"><msub id="S2.F1.8.m4.1.1" xref="S2.F1.8.m4.1.1.cmml"><mi id="S2.F1.8.m4.1.1.2" xref="S2.F1.8.m4.1.1.2.cmml">𝑫</mi><mi id="S2.F1.8.m4.1.1.3" xref="S2.F1.8.m4.1.1.3.cmml">M</mi></msub><annotation-xml encoding="MathML-Content" id="S2.F1.8.m4.1c"><apply id="S2.F1.8.m4.1.1.cmml" xref="S2.F1.8.m4.1.1"><csymbol cd="ambiguous" id="S2.F1.8.m4.1.1.1.cmml" xref="S2.F1.8.m4.1.1">subscript</csymbol><ci id="S2.F1.8.m4.1.1.2.cmml" xref="S2.F1.8.m4.1.1.2">𝑫</ci><ci id="S2.F1.8.m4.1.1.3.cmml" xref="S2.F1.8.m4.1.1.3">𝑀</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.F1.8.m4.1d">\boldsymbol{D}_{M}</annotation><annotation encoding="application/x-llamapun" id="S2.F1.8.m4.1e">bold_italic_D start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT</annotation></semantics></math> is the MV decoder. 
LGMC is the proposed joint local and global motion compensation module.</figcaption> </figure> <figure class="ltx_figure" id="S2.F2"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="284" id="S2.F2.g1" src="x2.png" width="830"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text ltx_font_bold" id="S2.F2.2.1.1">Fig. 2</span>: </span>Illustration of the joint local and global motion compensation module (LGMC) at the encoder side.</figcaption> </figure> <figure class="ltx_figure" id="S2.F3"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="284" id="S2.F3.g1" src="x3.png" width="830"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text ltx_font_bold" id="S2.F3.2.1.1">Fig. 3</span>: </span>Illustration of the joint local and global motion compensation module (LGMC) at the decoder side.</figcaption> </figure> <div class="ltx_para" id="S2.SS2.p1"> <p class="ltx_p" id="S2.SS2.p1.11">The <span class="ltx_text ltx_font_italic" id="S2.SS2.p1.11.1">vanilla</span> approach to global compensation is introduced first. To address the limitations of flow-based motion compensation, global motion compensation is used for learned video compression. Because CNNs have limited receptive fields, global attention is adopted for global motion compensation. 
Regarding the compression of <math alttext="\boldsymbol{x}_{t}" class="ltx_Math" display="inline" id="S2.SS2.p1.1.m1.1"><semantics id="S2.SS2.p1.1.m1.1a"><msub id="S2.SS2.p1.1.m1.1.1" xref="S2.SS2.p1.1.m1.1.1.cmml"><mi id="S2.SS2.p1.1.m1.1.1.2" xref="S2.SS2.p1.1.m1.1.1.2.cmml">𝒙</mi><mi id="S2.SS2.p1.1.m1.1.1.3" xref="S2.SS2.p1.1.m1.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.1.m1.1b"><apply id="S2.SS2.p1.1.m1.1.1.cmml" xref="S2.SS2.p1.1.m1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p1.1.m1.1.1.1.cmml" xref="S2.SS2.p1.1.m1.1.1">subscript</csymbol><ci id="S2.SS2.p1.1.m1.1.1.2.cmml" xref="S2.SS2.p1.1.m1.1.1.2">𝒙</ci><ci id="S2.SS2.p1.1.m1.1.1.3.cmml" xref="S2.SS2.p1.1.m1.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.1.m1.1c">\boldsymbol{x}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.1.m1.1d">bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math>, given multi-scale features <math alttext="\hat{\boldsymbol{f}}_{t}^{0},\hat{\boldsymbol{f}}_{t}^{1},\hat{\boldsymbol{f}}% _{t}^{2}" class="ltx_Math" display="inline" id="S2.SS2.p1.2.m2.3"><semantics id="S2.SS2.p1.2.m2.3a"><mrow id="S2.SS2.p1.2.m2.3.3.3" xref="S2.SS2.p1.2.m2.3.3.4.cmml"><msubsup id="S2.SS2.p1.2.m2.1.1.1.1" xref="S2.SS2.p1.2.m2.1.1.1.1.cmml"><mover accent="true" id="S2.SS2.p1.2.m2.1.1.1.1.2.2" xref="S2.SS2.p1.2.m2.1.1.1.1.2.2.cmml"><mi id="S2.SS2.p1.2.m2.1.1.1.1.2.2.2" xref="S2.SS2.p1.2.m2.1.1.1.1.2.2.2.cmml">𝒇</mi><mo id="S2.SS2.p1.2.m2.1.1.1.1.2.2.1" xref="S2.SS2.p1.2.m2.1.1.1.1.2.2.1.cmml">^</mo></mover><mi id="S2.SS2.p1.2.m2.1.1.1.1.2.3" xref="S2.SS2.p1.2.m2.1.1.1.1.2.3.cmml">t</mi><mn id="S2.SS2.p1.2.m2.1.1.1.1.3" xref="S2.SS2.p1.2.m2.1.1.1.1.3.cmml">0</mn></msubsup><mo id="S2.SS2.p1.2.m2.3.3.3.4" xref="S2.SS2.p1.2.m2.3.3.4.cmml">,</mo><msubsup id="S2.SS2.p1.2.m2.2.2.2.2" xref="S2.SS2.p1.2.m2.2.2.2.2.cmml"><mover accent="true" 
id="S2.SS2.p1.2.m2.2.2.2.2.2.2" xref="S2.SS2.p1.2.m2.2.2.2.2.2.2.cmml"><mi id="S2.SS2.p1.2.m2.2.2.2.2.2.2.2" xref="S2.SS2.p1.2.m2.2.2.2.2.2.2.2.cmml">𝒇</mi><mo id="S2.SS2.p1.2.m2.2.2.2.2.2.2.1" xref="S2.SS2.p1.2.m2.2.2.2.2.2.2.1.cmml">^</mo></mover><mi id="S2.SS2.p1.2.m2.2.2.2.2.2.3" xref="S2.SS2.p1.2.m2.2.2.2.2.2.3.cmml">t</mi><mn id="S2.SS2.p1.2.m2.2.2.2.2.3" xref="S2.SS2.p1.2.m2.2.2.2.2.3.cmml">1</mn></msubsup><mo id="S2.SS2.p1.2.m2.3.3.3.5" xref="S2.SS2.p1.2.m2.3.3.4.cmml">,</mo><msubsup id="S2.SS2.p1.2.m2.3.3.3.3" xref="S2.SS2.p1.2.m2.3.3.3.3.cmml"><mover accent="true" id="S2.SS2.p1.2.m2.3.3.3.3.2.2" xref="S2.SS2.p1.2.m2.3.3.3.3.2.2.cmml"><mi id="S2.SS2.p1.2.m2.3.3.3.3.2.2.2" xref="S2.SS2.p1.2.m2.3.3.3.3.2.2.2.cmml">𝒇</mi><mo id="S2.SS2.p1.2.m2.3.3.3.3.2.2.1" xref="S2.SS2.p1.2.m2.3.3.3.3.2.2.1.cmml">^</mo></mover><mi id="S2.SS2.p1.2.m2.3.3.3.3.2.3" xref="S2.SS2.p1.2.m2.3.3.3.3.2.3.cmml">t</mi><mn id="S2.SS2.p1.2.m2.3.3.3.3.3" xref="S2.SS2.p1.2.m2.3.3.3.3.3.cmml">2</mn></msubsup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.2.m2.3b"><list id="S2.SS2.p1.2.m2.3.3.4.cmml" xref="S2.SS2.p1.2.m2.3.3.3"><apply id="S2.SS2.p1.2.m2.1.1.1.1.cmml" xref="S2.SS2.p1.2.m2.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p1.2.m2.1.1.1.1.1.cmml" xref="S2.SS2.p1.2.m2.1.1.1.1">superscript</csymbol><apply id="S2.SS2.p1.2.m2.1.1.1.1.2.cmml" xref="S2.SS2.p1.2.m2.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p1.2.m2.1.1.1.1.2.1.cmml" xref="S2.SS2.p1.2.m2.1.1.1.1">subscript</csymbol><apply id="S2.SS2.p1.2.m2.1.1.1.1.2.2.cmml" xref="S2.SS2.p1.2.m2.1.1.1.1.2.2"><ci id="S2.SS2.p1.2.m2.1.1.1.1.2.2.1.cmml" xref="S2.SS2.p1.2.m2.1.1.1.1.2.2.1">^</ci><ci id="S2.SS2.p1.2.m2.1.1.1.1.2.2.2.cmml" xref="S2.SS2.p1.2.m2.1.1.1.1.2.2.2">𝒇</ci></apply><ci id="S2.SS2.p1.2.m2.1.1.1.1.2.3.cmml" xref="S2.SS2.p1.2.m2.1.1.1.1.2.3">𝑡</ci></apply><cn id="S2.SS2.p1.2.m2.1.1.1.1.3.cmml" type="integer" xref="S2.SS2.p1.2.m2.1.1.1.1.3">0</cn></apply><apply id="S2.SS2.p1.2.m2.2.2.2.2.cmml" 
xref="S2.SS2.p1.2.m2.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS2.p1.2.m2.2.2.2.2.1.cmml" xref="S2.SS2.p1.2.m2.2.2.2.2">superscript</csymbol><apply id="S2.SS2.p1.2.m2.2.2.2.2.2.cmml" xref="S2.SS2.p1.2.m2.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS2.p1.2.m2.2.2.2.2.2.1.cmml" xref="S2.SS2.p1.2.m2.2.2.2.2">subscript</csymbol><apply id="S2.SS2.p1.2.m2.2.2.2.2.2.2.cmml" xref="S2.SS2.p1.2.m2.2.2.2.2.2.2"><ci id="S2.SS2.p1.2.m2.2.2.2.2.2.2.1.cmml" xref="S2.SS2.p1.2.m2.2.2.2.2.2.2.1">^</ci><ci id="S2.SS2.p1.2.m2.2.2.2.2.2.2.2.cmml" xref="S2.SS2.p1.2.m2.2.2.2.2.2.2.2">𝒇</ci></apply><ci id="S2.SS2.p1.2.m2.2.2.2.2.2.3.cmml" xref="S2.SS2.p1.2.m2.2.2.2.2.2.3">𝑡</ci></apply><cn id="S2.SS2.p1.2.m2.2.2.2.2.3.cmml" type="integer" xref="S2.SS2.p1.2.m2.2.2.2.2.3">1</cn></apply><apply id="S2.SS2.p1.2.m2.3.3.3.3.cmml" xref="S2.SS2.p1.2.m2.3.3.3.3"><csymbol cd="ambiguous" id="S2.SS2.p1.2.m2.3.3.3.3.1.cmml" xref="S2.SS2.p1.2.m2.3.3.3.3">superscript</csymbol><apply id="S2.SS2.p1.2.m2.3.3.3.3.2.cmml" xref="S2.SS2.p1.2.m2.3.3.3.3"><csymbol cd="ambiguous" id="S2.SS2.p1.2.m2.3.3.3.3.2.1.cmml" xref="S2.SS2.p1.2.m2.3.3.3.3">subscript</csymbol><apply id="S2.SS2.p1.2.m2.3.3.3.3.2.2.cmml" xref="S2.SS2.p1.2.m2.3.3.3.3.2.2"><ci id="S2.SS2.p1.2.m2.3.3.3.3.2.2.1.cmml" xref="S2.SS2.p1.2.m2.3.3.3.3.2.2.1">^</ci><ci id="S2.SS2.p1.2.m2.3.3.3.3.2.2.2.cmml" xref="S2.SS2.p1.2.m2.3.3.3.3.2.2.2">𝒇</ci></apply><ci id="S2.SS2.p1.2.m2.3.3.3.3.2.3.cmml" xref="S2.SS2.p1.2.m2.3.3.3.3.2.3">𝑡</ci></apply><cn id="S2.SS2.p1.2.m2.3.3.3.3.3.cmml" type="integer" xref="S2.SS2.p1.2.m2.3.3.3.3.3">2</cn></apply></list></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.2.m2.3c">\hat{\boldsymbol{f}}_{t}^{0},\hat{\boldsymbol{f}}_{t}^{1},\hat{\boldsymbol{f}}% _{t}^{2}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.2.m2.3d">over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_f 
end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT</annotation></semantics></math>, current frame <math alttext="\boldsymbol{x}_{t}" class="ltx_Math" display="inline" id="S2.SS2.p1.3.m3.1"><semantics id="S2.SS2.p1.3.m3.1a"><msub id="S2.SS2.p1.3.m3.1.1" xref="S2.SS2.p1.3.m3.1.1.cmml"><mi id="S2.SS2.p1.3.m3.1.1.2" xref="S2.SS2.p1.3.m3.1.1.2.cmml">𝒙</mi><mi id="S2.SS2.p1.3.m3.1.1.3" xref="S2.SS2.p1.3.m3.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.3.m3.1b"><apply id="S2.SS2.p1.3.m3.1.1.cmml" xref="S2.SS2.p1.3.m3.1.1"><csymbol cd="ambiguous" id="S2.SS2.p1.3.m3.1.1.1.cmml" xref="S2.SS2.p1.3.m3.1.1">subscript</csymbol><ci id="S2.SS2.p1.3.m3.1.1.2.cmml" xref="S2.SS2.p1.3.m3.1.1.2">𝒙</ci><ci id="S2.SS2.p1.3.m3.1.1.3.cmml" xref="S2.SS2.p1.3.m3.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.3.m3.1c">\boldsymbol{x}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.3.m3.1d">bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math>, and middle features <math alttext="\boldsymbol{y}_{t}^{1},\boldsymbol{y}_{t}^{2}" class="ltx_Math" display="inline" id="S2.SS2.p1.4.m4.2"><semantics id="S2.SS2.p1.4.m4.2a"><mrow id="S2.SS2.p1.4.m4.2.2.2" xref="S2.SS2.p1.4.m4.2.2.3.cmml"><msubsup id="S2.SS2.p1.4.m4.1.1.1.1" xref="S2.SS2.p1.4.m4.1.1.1.1.cmml"><mi id="S2.SS2.p1.4.m4.1.1.1.1.2.2" xref="S2.SS2.p1.4.m4.1.1.1.1.2.2.cmml">𝒚</mi><mi id="S2.SS2.p1.4.m4.1.1.1.1.2.3" xref="S2.SS2.p1.4.m4.1.1.1.1.2.3.cmml">t</mi><mn id="S2.SS2.p1.4.m4.1.1.1.1.3" xref="S2.SS2.p1.4.m4.1.1.1.1.3.cmml">1</mn></msubsup><mo id="S2.SS2.p1.4.m4.2.2.2.3" xref="S2.SS2.p1.4.m4.2.2.3.cmml">,</mo><msubsup id="S2.SS2.p1.4.m4.2.2.2.2" xref="S2.SS2.p1.4.m4.2.2.2.2.cmml"><mi id="S2.SS2.p1.4.m4.2.2.2.2.2.2" 
xref="S2.SS2.p1.4.m4.2.2.2.2.2.2.cmml">𝒚</mi><mi id="S2.SS2.p1.4.m4.2.2.2.2.2.3" xref="S2.SS2.p1.4.m4.2.2.2.2.2.3.cmml">t</mi><mn id="S2.SS2.p1.4.m4.2.2.2.2.3" xref="S2.SS2.p1.4.m4.2.2.2.2.3.cmml">2</mn></msubsup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.4.m4.2b"><list id="S2.SS2.p1.4.m4.2.2.3.cmml" xref="S2.SS2.p1.4.m4.2.2.2"><apply id="S2.SS2.p1.4.m4.1.1.1.1.cmml" xref="S2.SS2.p1.4.m4.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p1.4.m4.1.1.1.1.1.cmml" xref="S2.SS2.p1.4.m4.1.1.1.1">superscript</csymbol><apply id="S2.SS2.p1.4.m4.1.1.1.1.2.cmml" xref="S2.SS2.p1.4.m4.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p1.4.m4.1.1.1.1.2.1.cmml" xref="S2.SS2.p1.4.m4.1.1.1.1">subscript</csymbol><ci id="S2.SS2.p1.4.m4.1.1.1.1.2.2.cmml" xref="S2.SS2.p1.4.m4.1.1.1.1.2.2">𝒚</ci><ci id="S2.SS2.p1.4.m4.1.1.1.1.2.3.cmml" xref="S2.SS2.p1.4.m4.1.1.1.1.2.3">𝑡</ci></apply><cn id="S2.SS2.p1.4.m4.1.1.1.1.3.cmml" type="integer" xref="S2.SS2.p1.4.m4.1.1.1.1.3">1</cn></apply><apply id="S2.SS2.p1.4.m4.2.2.2.2.cmml" xref="S2.SS2.p1.4.m4.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS2.p1.4.m4.2.2.2.2.1.cmml" xref="S2.SS2.p1.4.m4.2.2.2.2">superscript</csymbol><apply id="S2.SS2.p1.4.m4.2.2.2.2.2.cmml" xref="S2.SS2.p1.4.m4.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS2.p1.4.m4.2.2.2.2.2.1.cmml" xref="S2.SS2.p1.4.m4.2.2.2.2">subscript</csymbol><ci id="S2.SS2.p1.4.m4.2.2.2.2.2.2.cmml" xref="S2.SS2.p1.4.m4.2.2.2.2.2.2">𝒚</ci><ci id="S2.SS2.p1.4.m4.2.2.2.2.2.3.cmml" xref="S2.SS2.p1.4.m4.2.2.2.2.2.3">𝑡</ci></apply><cn id="S2.SS2.p1.4.m4.2.2.2.2.3.cmml" type="integer" xref="S2.SS2.p1.4.m4.2.2.2.2.3">2</cn></apply></list></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.4.m4.2c">\boldsymbol{y}_{t}^{1},\boldsymbol{y}_{t}^{2}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.4.m4.2d">bold_italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , bold_italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT 
start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT</annotation></semantics></math>, we take <math alttext="\boldsymbol{y}_{t}^{2}\in\mathbb{R}^{C\times\frac{L}{16}},\hat{\boldsymbol{f}}% _{t}^{1}\in\mathbb{R}^{C\times L}" class="ltx_Math" display="inline" id="S2.SS2.p1.5.m5.2"><semantics id="S2.SS2.p1.5.m5.2a"><mrow id="S2.SS2.p1.5.m5.2.2.2" xref="S2.SS2.p1.5.m5.2.2.3.cmml"><mrow id="S2.SS2.p1.5.m5.1.1.1.1" xref="S2.SS2.p1.5.m5.1.1.1.1.cmml"><msubsup id="S2.SS2.p1.5.m5.1.1.1.1.2" xref="S2.SS2.p1.5.m5.1.1.1.1.2.cmml"><mi id="S2.SS2.p1.5.m5.1.1.1.1.2.2.2" xref="S2.SS2.p1.5.m5.1.1.1.1.2.2.2.cmml">𝒚</mi><mi id="S2.SS2.p1.5.m5.1.1.1.1.2.2.3" xref="S2.SS2.p1.5.m5.1.1.1.1.2.2.3.cmml">t</mi><mn id="S2.SS2.p1.5.m5.1.1.1.1.2.3" xref="S2.SS2.p1.5.m5.1.1.1.1.2.3.cmml">2</mn></msubsup><mo id="S2.SS2.p1.5.m5.1.1.1.1.1" xref="S2.SS2.p1.5.m5.1.1.1.1.1.cmml">∈</mo><msup id="S2.SS2.p1.5.m5.1.1.1.1.3" xref="S2.SS2.p1.5.m5.1.1.1.1.3.cmml"><mi id="S2.SS2.p1.5.m5.1.1.1.1.3.2" xref="S2.SS2.p1.5.m5.1.1.1.1.3.2.cmml">ℝ</mi><mrow id="S2.SS2.p1.5.m5.1.1.1.1.3.3" xref="S2.SS2.p1.5.m5.1.1.1.1.3.3.cmml"><mi id="S2.SS2.p1.5.m5.1.1.1.1.3.3.2" xref="S2.SS2.p1.5.m5.1.1.1.1.3.3.2.cmml">C</mi><mo id="S2.SS2.p1.5.m5.1.1.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S2.SS2.p1.5.m5.1.1.1.1.3.3.1.cmml">×</mo><mfrac id="S2.SS2.p1.5.m5.1.1.1.1.3.3.3" xref="S2.SS2.p1.5.m5.1.1.1.1.3.3.3.cmml"><mi id="S2.SS2.p1.5.m5.1.1.1.1.3.3.3.2" xref="S2.SS2.p1.5.m5.1.1.1.1.3.3.3.2.cmml">L</mi><mn id="S2.SS2.p1.5.m5.1.1.1.1.3.3.3.3" xref="S2.SS2.p1.5.m5.1.1.1.1.3.3.3.3.cmml">16</mn></mfrac></mrow></msup></mrow><mo id="S2.SS2.p1.5.m5.2.2.2.3" xref="S2.SS2.p1.5.m5.2.2.3a.cmml">,</mo><mrow id="S2.SS2.p1.5.m5.2.2.2.2" xref="S2.SS2.p1.5.m5.2.2.2.2.cmml"><msubsup id="S2.SS2.p1.5.m5.2.2.2.2.2" xref="S2.SS2.p1.5.m5.2.2.2.2.2.cmml"><mover accent="true" id="S2.SS2.p1.5.m5.2.2.2.2.2.2.2" xref="S2.SS2.p1.5.m5.2.2.2.2.2.2.2.cmml"><mi id="S2.SS2.p1.5.m5.2.2.2.2.2.2.2.2" xref="S2.SS2.p1.5.m5.2.2.2.2.2.2.2.2.cmml">𝒇</mi><mo 
id="S2.SS2.p1.5.m5.2.2.2.2.2.2.2.1" xref="S2.SS2.p1.5.m5.2.2.2.2.2.2.2.1.cmml">^</mo></mover><mi id="S2.SS2.p1.5.m5.2.2.2.2.2.2.3" xref="S2.SS2.p1.5.m5.2.2.2.2.2.2.3.cmml">t</mi><mn id="S2.SS2.p1.5.m5.2.2.2.2.2.3" xref="S2.SS2.p1.5.m5.2.2.2.2.2.3.cmml">1</mn></msubsup><mo id="S2.SS2.p1.5.m5.2.2.2.2.1" xref="S2.SS2.p1.5.m5.2.2.2.2.1.cmml">∈</mo><msup id="S2.SS2.p1.5.m5.2.2.2.2.3" xref="S2.SS2.p1.5.m5.2.2.2.2.3.cmml"><mi id="S2.SS2.p1.5.m5.2.2.2.2.3.2" xref="S2.SS2.p1.5.m5.2.2.2.2.3.2.cmml">ℝ</mi><mrow id="S2.SS2.p1.5.m5.2.2.2.2.3.3" xref="S2.SS2.p1.5.m5.2.2.2.2.3.3.cmml"><mi id="S2.SS2.p1.5.m5.2.2.2.2.3.3.2" xref="S2.SS2.p1.5.m5.2.2.2.2.3.3.2.cmml">C</mi><mo id="S2.SS2.p1.5.m5.2.2.2.2.3.3.1" lspace="0.222em" rspace="0.222em" xref="S2.SS2.p1.5.m5.2.2.2.2.3.3.1.cmml">×</mo><mi id="S2.SS2.p1.5.m5.2.2.2.2.3.3.3" xref="S2.SS2.p1.5.m5.2.2.2.2.3.3.3.cmml">L</mi></mrow></msup></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.5.m5.2b"><apply id="S2.SS2.p1.5.m5.2.2.3.cmml" xref="S2.SS2.p1.5.m5.2.2.2"><csymbol cd="ambiguous" id="S2.SS2.p1.5.m5.2.2.3a.cmml" xref="S2.SS2.p1.5.m5.2.2.2.3">formulae-sequence</csymbol><apply id="S2.SS2.p1.5.m5.1.1.1.1.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1"><in id="S2.SS2.p1.5.m5.1.1.1.1.1.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1.1"></in><apply id="S2.SS2.p1.5.m5.1.1.1.1.2.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1.2"><csymbol cd="ambiguous" id="S2.SS2.p1.5.m5.1.1.1.1.2.1.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1.2">superscript</csymbol><apply id="S2.SS2.p1.5.m5.1.1.1.1.2.2.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1.2"><csymbol cd="ambiguous" id="S2.SS2.p1.5.m5.1.1.1.1.2.2.1.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1.2">subscript</csymbol><ci id="S2.SS2.p1.5.m5.1.1.1.1.2.2.2.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1.2.2.2">𝒚</ci><ci id="S2.SS2.p1.5.m5.1.1.1.1.2.2.3.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1.2.2.3">𝑡</ci></apply><cn id="S2.SS2.p1.5.m5.1.1.1.1.2.3.cmml" type="integer" xref="S2.SS2.p1.5.m5.1.1.1.1.2.3">2</cn></apply><apply id="S2.SS2.p1.5.m5.1.1.1.1.3.cmml" 
xref="S2.SS2.p1.5.m5.1.1.1.1.3"><csymbol cd="ambiguous" id="S2.SS2.p1.5.m5.1.1.1.1.3.1.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1.3">superscript</csymbol><ci id="S2.SS2.p1.5.m5.1.1.1.1.3.2.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1.3.2">ℝ</ci><apply id="S2.SS2.p1.5.m5.1.1.1.1.3.3.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1.3.3"><times id="S2.SS2.p1.5.m5.1.1.1.1.3.3.1.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1.3.3.1"></times><ci id="S2.SS2.p1.5.m5.1.1.1.1.3.3.2.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1.3.3.2">𝐶</ci><apply id="S2.SS2.p1.5.m5.1.1.1.1.3.3.3.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1.3.3.3"><divide id="S2.SS2.p1.5.m5.1.1.1.1.3.3.3.1.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1.3.3.3"></divide><ci id="S2.SS2.p1.5.m5.1.1.1.1.3.3.3.2.cmml" xref="S2.SS2.p1.5.m5.1.1.1.1.3.3.3.2">𝐿</ci><cn id="S2.SS2.p1.5.m5.1.1.1.1.3.3.3.3.cmml" type="integer" xref="S2.SS2.p1.5.m5.1.1.1.1.3.3.3.3">16</cn></apply></apply></apply></apply><apply id="S2.SS2.p1.5.m5.2.2.2.2.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2"><in id="S2.SS2.p1.5.m5.2.2.2.2.1.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.1"></in><apply id="S2.SS2.p1.5.m5.2.2.2.2.2.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS2.p1.5.m5.2.2.2.2.2.1.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.2">superscript</csymbol><apply id="S2.SS2.p1.5.m5.2.2.2.2.2.2.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS2.p1.5.m5.2.2.2.2.2.2.1.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.2">subscript</csymbol><apply id="S2.SS2.p1.5.m5.2.2.2.2.2.2.2.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.2.2.2"><ci id="S2.SS2.p1.5.m5.2.2.2.2.2.2.2.1.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.2.2.2.1">^</ci><ci id="S2.SS2.p1.5.m5.2.2.2.2.2.2.2.2.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.2.2.2.2">𝒇</ci></apply><ci id="S2.SS2.p1.5.m5.2.2.2.2.2.2.3.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.2.2.3">𝑡</ci></apply><cn id="S2.SS2.p1.5.m5.2.2.2.2.2.3.cmml" type="integer" xref="S2.SS2.p1.5.m5.2.2.2.2.2.3">1</cn></apply><apply id="S2.SS2.p1.5.m5.2.2.2.2.3.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.3"><csymbol cd="ambiguous" 
id="S2.SS2.p1.5.m5.2.2.2.2.3.1.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.3">superscript</csymbol><ci id="S2.SS2.p1.5.m5.2.2.2.2.3.2.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.3.2">ℝ</ci><apply id="S2.SS2.p1.5.m5.2.2.2.2.3.3.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.3.3"><times id="S2.SS2.p1.5.m5.2.2.2.2.3.3.1.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.3.3.1"></times><ci id="S2.SS2.p1.5.m5.2.2.2.2.3.3.2.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.3.3.2">𝐶</ci><ci id="S2.SS2.p1.5.m5.2.2.2.2.3.3.3.cmml" xref="S2.SS2.p1.5.m5.2.2.2.2.3.3.3">𝐿</ci></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.5.m5.2c">\boldsymbol{y}_{t}^{2}\in\mathbb{R}^{C\times\frac{L}{16}},\hat{\boldsymbol{f}}% _{t}^{1}\in\mathbb{R}^{C\times L}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.5.m5.2d">bold_italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_C × divide start_ARG italic_L end_ARG start_ARG 16 end_ARG end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_C × italic_L end_POSTSUPERSCRIPT</annotation></semantics></math> as an example. 
Here, <math alttext="C" class="ltx_Math" display="inline" id="S2.SS2.p1.6.m6.1"><semantics id="S2.SS2.p1.6.m6.1a"><mi id="S2.SS2.p1.6.m6.1.1" xref="S2.SS2.p1.6.m6.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.6.m6.1b"><ci id="S2.SS2.p1.6.m6.1.1.cmml" xref="S2.SS2.p1.6.m6.1.1">𝐶</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.6.m6.1c">C</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.6.m6.1d">italic_C</annotation></semantics></math> is the number of channels and <math alttext="L=H\cdot W" class="ltx_Math" display="inline" id="S2.SS2.p1.7.m7.1"><semantics id="S2.SS2.p1.7.m7.1a"><mrow id="S2.SS2.p1.7.m7.1.1" xref="S2.SS2.p1.7.m7.1.1.cmml"><mi id="S2.SS2.p1.7.m7.1.1.2" xref="S2.SS2.p1.7.m7.1.1.2.cmml">L</mi><mo id="S2.SS2.p1.7.m7.1.1.1" xref="S2.SS2.p1.7.m7.1.1.1.cmml">=</mo><mrow id="S2.SS2.p1.7.m7.1.1.3" xref="S2.SS2.p1.7.m7.1.1.3.cmml"><mi id="S2.SS2.p1.7.m7.1.1.3.2" xref="S2.SS2.p1.7.m7.1.1.3.2.cmml">H</mi><mo id="S2.SS2.p1.7.m7.1.1.3.1" lspace="0.222em" rspace="0.222em" xref="S2.SS2.p1.7.m7.1.1.3.1.cmml">⋅</mo><mi id="S2.SS2.p1.7.m7.1.1.3.3" xref="S2.SS2.p1.7.m7.1.1.3.3.cmml">W</mi></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.7.m7.1b"><apply id="S2.SS2.p1.7.m7.1.1.cmml" xref="S2.SS2.p1.7.m7.1.1"><eq id="S2.SS2.p1.7.m7.1.1.1.cmml" xref="S2.SS2.p1.7.m7.1.1.1"></eq><ci id="S2.SS2.p1.7.m7.1.1.2.cmml" xref="S2.SS2.p1.7.m7.1.1.2">𝐿</ci><apply id="S2.SS2.p1.7.m7.1.1.3.cmml" xref="S2.SS2.p1.7.m7.1.1.3"><ci id="S2.SS2.p1.7.m7.1.1.3.1.cmml" xref="S2.SS2.p1.7.m7.1.1.3.1">⋅</ci><ci id="S2.SS2.p1.7.m7.1.1.3.2.cmml" xref="S2.SS2.p1.7.m7.1.1.3.2">𝐻</ci><ci id="S2.SS2.p1.7.m7.1.1.3.3.cmml" xref="S2.SS2.p1.7.m7.1.1.3.3">𝑊</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.7.m7.1c">L=H\cdot W</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.7.m7.1d">italic_L = italic_H ⋅ italic_W</annotation></semantics></math>, where <math 
alttext="H" class="ltx_Math" display="inline" id="S2.SS2.p1.8.m8.1"><semantics id="S2.SS2.p1.8.m8.1a"><mi id="S2.SS2.p1.8.m8.1.1" xref="S2.SS2.p1.8.m8.1.1.cmml">H</mi><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.8.m8.1b"><ci id="S2.SS2.p1.8.m8.1.1.cmml" xref="S2.SS2.p1.8.m8.1.1">𝐻</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.8.m8.1c">H</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.8.m8.1d">italic_H</annotation></semantics></math> is the height and <math alttext="W" class="ltx_Math" display="inline" id="S2.SS2.p1.9.m9.1"><semantics id="S2.SS2.p1.9.m9.1a"><mi id="S2.SS2.p1.9.m9.1.1" xref="S2.SS2.p1.9.m9.1.1.cmml">W</mi><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.9.m9.1b"><ci id="S2.SS2.p1.9.m9.1.1.cmml" xref="S2.SS2.p1.9.m9.1.1">𝑊</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.9.m9.1c">W</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.9.m9.1d">italic_W</annotation></semantics></math> is the width of the frame. 
The <math alttext="\boldsymbol{y}_{t}^{2}" class="ltx_Math" display="inline" id="S2.SS2.p1.10.m10.1"><semantics id="S2.SS2.p1.10.m10.1a"><msubsup id="S2.SS2.p1.10.m10.1.1" xref="S2.SS2.p1.10.m10.1.1.cmml"><mi id="S2.SS2.p1.10.m10.1.1.2.2" xref="S2.SS2.p1.10.m10.1.1.2.2.cmml">𝒚</mi><mi id="S2.SS2.p1.10.m10.1.1.2.3" xref="S2.SS2.p1.10.m10.1.1.2.3.cmml">t</mi><mn id="S2.SS2.p1.10.m10.1.1.3" xref="S2.SS2.p1.10.m10.1.1.3.cmml">2</mn></msubsup><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.10.m10.1b"><apply id="S2.SS2.p1.10.m10.1.1.cmml" xref="S2.SS2.p1.10.m10.1.1"><csymbol cd="ambiguous" id="S2.SS2.p1.10.m10.1.1.1.cmml" xref="S2.SS2.p1.10.m10.1.1">superscript</csymbol><apply id="S2.SS2.p1.10.m10.1.1.2.cmml" xref="S2.SS2.p1.10.m10.1.1"><csymbol cd="ambiguous" id="S2.SS2.p1.10.m10.1.1.2.1.cmml" xref="S2.SS2.p1.10.m10.1.1">subscript</csymbol><ci id="S2.SS2.p1.10.m10.1.1.2.2.cmml" xref="S2.SS2.p1.10.m10.1.1.2.2">𝒚</ci><ci id="S2.SS2.p1.10.m10.1.1.2.3.cmml" xref="S2.SS2.p1.10.m10.1.1.2.3">𝑡</ci></apply><cn id="S2.SS2.p1.10.m10.1.1.3.cmml" type="integer" xref="S2.SS2.p1.10.m10.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.10.m10.1c">\boldsymbol{y}_{t}^{2}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.10.m10.1d">bold_italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT</annotation></semantics></math> and <math alttext="\hat{\boldsymbol{f}}_{t}^{2}" class="ltx_Math" display="inline" id="S2.SS2.p1.11.m11.1"><semantics id="S2.SS2.p1.11.m11.1a"><msubsup id="S2.SS2.p1.11.m11.1.1" xref="S2.SS2.p1.11.m11.1.1.cmml"><mover accent="true" id="S2.SS2.p1.11.m11.1.1.2.2" xref="S2.SS2.p1.11.m11.1.1.2.2.cmml"><mi id="S2.SS2.p1.11.m11.1.1.2.2.2" xref="S2.SS2.p1.11.m11.1.1.2.2.2.cmml">𝒇</mi><mo id="S2.SS2.p1.11.m11.1.1.2.2.1" xref="S2.SS2.p1.11.m11.1.1.2.2.1.cmml">^</mo></mover><mi id="S2.SS2.p1.11.m11.1.1.2.3" xref="S2.SS2.p1.11.m11.1.1.2.3.cmml">t</mi><mn 
id="S2.SS2.p1.11.m11.1.1.3" xref="S2.SS2.p1.11.m11.1.1.3.cmml">2</mn></msubsup><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.11.m11.1b"><apply id="S2.SS2.p1.11.m11.1.1.cmml" xref="S2.SS2.p1.11.m11.1.1"><csymbol cd="ambiguous" id="S2.SS2.p1.11.m11.1.1.1.cmml" xref="S2.SS2.p1.11.m11.1.1">superscript</csymbol><apply id="S2.SS2.p1.11.m11.1.1.2.cmml" xref="S2.SS2.p1.11.m11.1.1"><csymbol cd="ambiguous" id="S2.SS2.p1.11.m11.1.1.2.1.cmml" xref="S2.SS2.p1.11.m11.1.1">subscript</csymbol><apply id="S2.SS2.p1.11.m11.1.1.2.2.cmml" xref="S2.SS2.p1.11.m11.1.1.2.2"><ci id="S2.SS2.p1.11.m11.1.1.2.2.1.cmml" xref="S2.SS2.p1.11.m11.1.1.2.2.1">^</ci><ci id="S2.SS2.p1.11.m11.1.1.2.2.2.cmml" xref="S2.SS2.p1.11.m11.1.1.2.2.2">𝒇</ci></apply><ci id="S2.SS2.p1.11.m11.1.1.2.3.cmml" xref="S2.SS2.p1.11.m11.1.1.2.3">𝑡</ci></apply><cn id="S2.SS2.p1.11.m11.1.1.3.cmml" type="integer" xref="S2.SS2.p1.11.m11.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.11.m11.1c">\hat{\boldsymbol{f}}_{t}^{2}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.11.m11.1d">over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT</annotation></semantics></math> are first fed into a Depth Residual Bottleneck <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib12" title="">12</a>]</cite> for nonlinear embedding. 
The <span class="ltx_text ltx_font_italic" id="S2.SS2.p1.11.2">vanilla</span> approach adopts cross attention <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib13" title="">13</a>, <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib14" title="">14</a>]</cite>, which is formulated as:</p> <table class="ltx_equation ltx_eqn_table" id="S2.E2"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\hat{\boldsymbol{g}}_{t}^{2}=\underbrace{\textrm{softmax}\left(\boldsymbol{y}_% {t}^{2}\left(\hat{\boldsymbol{f}}_{t}^{2}\right)^{\top}\right)}_{\textrm{non-% negative}}\hat{\boldsymbol{f}}_{t}^{2}." class="ltx_Math" display="block" id="S2.E2.m1.2"><semantics id="S2.E2.m1.2a"><mrow id="S2.E2.m1.2.2.1" xref="S2.E2.m1.2.2.1.1.cmml"><mrow id="S2.E2.m1.2.2.1.1" xref="S2.E2.m1.2.2.1.1.cmml"><msubsup id="S2.E2.m1.2.2.1.1.2" xref="S2.E2.m1.2.2.1.1.2.cmml"><mover accent="true" id="S2.E2.m1.2.2.1.1.2.2.2" xref="S2.E2.m1.2.2.1.1.2.2.2.cmml"><mi id="S2.E2.m1.2.2.1.1.2.2.2.2" xref="S2.E2.m1.2.2.1.1.2.2.2.2.cmml">𝒈</mi><mo id="S2.E2.m1.2.2.1.1.2.2.2.1" xref="S2.E2.m1.2.2.1.1.2.2.2.1.cmml">^</mo></mover><mi id="S2.E2.m1.2.2.1.1.2.2.3" xref="S2.E2.m1.2.2.1.1.2.2.3.cmml">t</mi><mn id="S2.E2.m1.2.2.1.1.2.3" xref="S2.E2.m1.2.2.1.1.2.3.cmml">2</mn></msubsup><mo id="S2.E2.m1.2.2.1.1.1" xref="S2.E2.m1.2.2.1.1.1.cmml">=</mo><mrow id="S2.E2.m1.2.2.1.1.3" xref="S2.E2.m1.2.2.1.1.3.cmml"><munder id="S2.E2.m1.2.2.1.1.3.2" xref="S2.E2.m1.2.2.1.1.3.2.cmml"><munder accentunder="true" id="S2.E2.m1.1.1" xref="S2.E2.m1.1.1.cmml"><mrow id="S2.E2.m1.1.1.1" xref="S2.E2.m1.1.1.1.cmml"><mtext id="S2.E2.m1.1.1.1.3" xref="S2.E2.m1.1.1.1.3a.cmml">softmax</mtext><mo id="S2.E2.m1.1.1.1.2" xref="S2.E2.m1.1.1.1.2.cmml"></mo><mrow id="S2.E2.m1.1.1.1.1.1" xref="S2.E2.m1.1.1.1.1.1.1.cmml"><mo id="S2.E2.m1.1.1.1.1.1.2" 
xref="S2.E2.m1.1.1.1.1.1.1.cmml">(</mo><mrow id="S2.E2.m1.1.1.1.1.1.1" xref="S2.E2.m1.1.1.1.1.1.1.cmml"><msubsup id="S2.E2.m1.1.1.1.1.1.1.3" xref="S2.E2.m1.1.1.1.1.1.1.3.cmml"><mi id="S2.E2.m1.1.1.1.1.1.1.3.2.2" xref="S2.E2.m1.1.1.1.1.1.1.3.2.2.cmml">𝒚</mi><mi id="S2.E2.m1.1.1.1.1.1.1.3.2.3" xref="S2.E2.m1.1.1.1.1.1.1.3.2.3.cmml">t</mi><mn id="S2.E2.m1.1.1.1.1.1.1.3.3" xref="S2.E2.m1.1.1.1.1.1.1.3.3.cmml">2</mn></msubsup><mo id="S2.E2.m1.1.1.1.1.1.1.2" xref="S2.E2.m1.1.1.1.1.1.1.2.cmml"></mo><msup id="S2.E2.m1.1.1.1.1.1.1.1" xref="S2.E2.m1.1.1.1.1.1.1.1.cmml"><mrow id="S2.E2.m1.1.1.1.1.1.1.1.1.1" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.cmml"><mo id="S2.E2.m1.1.1.1.1.1.1.1.1.1.2" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.cmml">(</mo><msubsup id="S2.E2.m1.1.1.1.1.1.1.1.1.1.1" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.cmml"><mover accent="true" id="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.2" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.2.cmml"><mi id="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.2.2" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.2.2.cmml">𝒇</mi><mo id="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.2.1" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.2.1.cmml">^</mo></mover><mi id="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.3" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.3.cmml">t</mi><mn id="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.3" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.3.cmml">2</mn></msubsup><mo id="S2.E2.m1.1.1.1.1.1.1.1.1.1.3" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.cmml">)</mo></mrow><mo id="S2.E2.m1.1.1.1.1.1.1.1.3" xref="S2.E2.m1.1.1.1.1.1.1.1.3.cmml">⊤</mo></msup></mrow><mo id="S2.E2.m1.1.1.1.1.1.3" xref="S2.E2.m1.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S2.E2.m1.1.1.2" xref="S2.E2.m1.1.1.2.cmml">⏟</mo></munder><mtext id="S2.E2.m1.2.2.1.1.3.2.2" xref="S2.E2.m1.2.2.1.1.3.2.2a.cmml">non-negative</mtext></munder><mo id="S2.E2.m1.2.2.1.1.3.1" xref="S2.E2.m1.2.2.1.1.3.1.cmml"></mo><msubsup id="S2.E2.m1.2.2.1.1.3.3" xref="S2.E2.m1.2.2.1.1.3.3.cmml"><mover accent="true" id="S2.E2.m1.2.2.1.1.3.3.2.2" xref="S2.E2.m1.2.2.1.1.3.3.2.2.cmml"><mi 
id="S2.E2.m1.2.2.1.1.3.3.2.2.2" xref="S2.E2.m1.2.2.1.1.3.3.2.2.2.cmml">𝒇</mi><mo id="S2.E2.m1.2.2.1.1.3.3.2.2.1" xref="S2.E2.m1.2.2.1.1.3.3.2.2.1.cmml">^</mo></mover><mi id="S2.E2.m1.2.2.1.1.3.3.2.3" xref="S2.E2.m1.2.2.1.1.3.3.2.3.cmml">t</mi><mn id="S2.E2.m1.2.2.1.1.3.3.3" xref="S2.E2.m1.2.2.1.1.3.3.3.cmml">2</mn></msubsup></mrow></mrow><mo id="S2.E2.m1.2.2.1.2" lspace="0em" xref="S2.E2.m1.2.2.1.1.cmml">.</mo></mrow><annotation-xml encoding="MathML-Content" id="S2.E2.m1.2b"><apply id="S2.E2.m1.2.2.1.1.cmml" xref="S2.E2.m1.2.2.1"><eq id="S2.E2.m1.2.2.1.1.1.cmml" xref="S2.E2.m1.2.2.1.1.1"></eq><apply id="S2.E2.m1.2.2.1.1.2.cmml" xref="S2.E2.m1.2.2.1.1.2"><csymbol cd="ambiguous" id="S2.E2.m1.2.2.1.1.2.1.cmml" xref="S2.E2.m1.2.2.1.1.2">superscript</csymbol><apply id="S2.E2.m1.2.2.1.1.2.2.cmml" xref="S2.E2.m1.2.2.1.1.2"><csymbol cd="ambiguous" id="S2.E2.m1.2.2.1.1.2.2.1.cmml" xref="S2.E2.m1.2.2.1.1.2">subscript</csymbol><apply id="S2.E2.m1.2.2.1.1.2.2.2.cmml" xref="S2.E2.m1.2.2.1.1.2.2.2"><ci id="S2.E2.m1.2.2.1.1.2.2.2.1.cmml" xref="S2.E2.m1.2.2.1.1.2.2.2.1">^</ci><ci id="S2.E2.m1.2.2.1.1.2.2.2.2.cmml" xref="S2.E2.m1.2.2.1.1.2.2.2.2">𝒈</ci></apply><ci id="S2.E2.m1.2.2.1.1.2.2.3.cmml" xref="S2.E2.m1.2.2.1.1.2.2.3">𝑡</ci></apply><cn id="S2.E2.m1.2.2.1.1.2.3.cmml" type="integer" xref="S2.E2.m1.2.2.1.1.2.3">2</cn></apply><apply id="S2.E2.m1.2.2.1.1.3.cmml" xref="S2.E2.m1.2.2.1.1.3"><times id="S2.E2.m1.2.2.1.1.3.1.cmml" xref="S2.E2.m1.2.2.1.1.3.1"></times><apply id="S2.E2.m1.2.2.1.1.3.2.cmml" xref="S2.E2.m1.2.2.1.1.3.2"><csymbol cd="ambiguous" id="S2.E2.m1.2.2.1.1.3.2.1.cmml" xref="S2.E2.m1.2.2.1.1.3.2">subscript</csymbol><apply id="S2.E2.m1.1.1.cmml" xref="S2.E2.m1.1.1"><ci id="S2.E2.m1.1.1.2.cmml" xref="S2.E2.m1.1.1.2">⏟</ci><apply id="S2.E2.m1.1.1.1.cmml" xref="S2.E2.m1.1.1.1"><times id="S2.E2.m1.1.1.1.2.cmml" xref="S2.E2.m1.1.1.1.2"></times><ci id="S2.E2.m1.1.1.1.3a.cmml" xref="S2.E2.m1.1.1.1.3"><mtext id="S2.E2.m1.1.1.1.3.cmml" 
xref="S2.E2.m1.1.1.1.3">softmax</mtext></ci><apply id="S2.E2.m1.1.1.1.1.1.1.cmml" xref="S2.E2.m1.1.1.1.1.1"><times id="S2.E2.m1.1.1.1.1.1.1.2.cmml" xref="S2.E2.m1.1.1.1.1.1.1.2"></times><apply id="S2.E2.m1.1.1.1.1.1.1.3.cmml" xref="S2.E2.m1.1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S2.E2.m1.1.1.1.1.1.1.3.1.cmml" xref="S2.E2.m1.1.1.1.1.1.1.3">superscript</csymbol><apply id="S2.E2.m1.1.1.1.1.1.1.3.2.cmml" xref="S2.E2.m1.1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S2.E2.m1.1.1.1.1.1.1.3.2.1.cmml" xref="S2.E2.m1.1.1.1.1.1.1.3">subscript</csymbol><ci id="S2.E2.m1.1.1.1.1.1.1.3.2.2.cmml" xref="S2.E2.m1.1.1.1.1.1.1.3.2.2">𝒚</ci><ci id="S2.E2.m1.1.1.1.1.1.1.3.2.3.cmml" xref="S2.E2.m1.1.1.1.1.1.1.3.2.3">𝑡</ci></apply><cn id="S2.E2.m1.1.1.1.1.1.1.3.3.cmml" type="integer" xref="S2.E2.m1.1.1.1.1.1.1.3.3">2</cn></apply><apply id="S2.E2.m1.1.1.1.1.1.1.1.cmml" xref="S2.E2.m1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.E2.m1.1.1.1.1.1.1.1.2.cmml" xref="S2.E2.m1.1.1.1.1.1.1.1">superscript</csymbol><apply id="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1">superscript</csymbol><apply id="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.cmml" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.1.cmml" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1">subscript</csymbol><apply id="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.2.cmml" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.2"><ci id="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.2.1.cmml" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.2.1">^</ci><ci id="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.2.2.cmml" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.2.2">𝒇</ci></apply><ci id="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.3.cmml" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.2.3">𝑡</ci></apply><cn id="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.3.cmml" type="integer" xref="S2.E2.m1.1.1.1.1.1.1.1.1.1.1.3">2</cn></apply><csymbol cd="latexml" id="S2.E2.m1.1.1.1.1.1.1.1.3.cmml" 
xref="S2.E2.m1.1.1.1.1.1.1.1.3">top</csymbol></apply></apply></apply></apply><ci id="S2.E2.m1.2.2.1.1.3.2.2a.cmml" xref="S2.E2.m1.2.2.1.1.3.2.2"><mtext id="S2.E2.m1.2.2.1.1.3.2.2.cmml" mathsize="70%" xref="S2.E2.m1.2.2.1.1.3.2.2">non-negative</mtext></ci></apply><apply id="S2.E2.m1.2.2.1.1.3.3.cmml" xref="S2.E2.m1.2.2.1.1.3.3"><csymbol cd="ambiguous" id="S2.E2.m1.2.2.1.1.3.3.1.cmml" xref="S2.E2.m1.2.2.1.1.3.3">superscript</csymbol><apply id="S2.E2.m1.2.2.1.1.3.3.2.cmml" xref="S2.E2.m1.2.2.1.1.3.3"><csymbol cd="ambiguous" id="S2.E2.m1.2.2.1.1.3.3.2.1.cmml" xref="S2.E2.m1.2.2.1.1.3.3">subscript</csymbol><apply id="S2.E2.m1.2.2.1.1.3.3.2.2.cmml" xref="S2.E2.m1.2.2.1.1.3.3.2.2"><ci id="S2.E2.m1.2.2.1.1.3.3.2.2.1.cmml" xref="S2.E2.m1.2.2.1.1.3.3.2.2.1">^</ci><ci id="S2.E2.m1.2.2.1.1.3.3.2.2.2.cmml" xref="S2.E2.m1.2.2.1.1.3.3.2.2.2">𝒇</ci></apply><ci id="S2.E2.m1.2.2.1.1.3.3.2.3.cmml" xref="S2.E2.m1.2.2.1.1.3.3.2.3">𝑡</ci></apply><cn id="S2.E2.m1.2.2.1.1.3.3.3.cmml" type="integer" xref="S2.E2.m1.2.2.1.1.3.3.3">2</cn></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.E2.m1.2c">\hat{\boldsymbol{g}}_{t}^{2}=\underbrace{\textrm{softmax}\left(\boldsymbol{y}_% {t}^{2}\left(\hat{\boldsymbol{f}}_{t}^{2}\right)^{\top}\right)}_{\textrm{non-% negative}}\hat{\boldsymbol{f}}_{t}^{2}.</annotation><annotation encoding="application/x-llamapun" id="S2.E2.m1.2d">over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = under⏟ start_ARG softmax ( bold_italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) end_ARG start_POSTSUBSCRIPT non-negative end_POSTSUBSCRIPT over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT 
start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT .</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(2)</span></td> </tr></tbody> </table> </div> <figure class="ltx_figure" id="S2.F4"> <div class="ltx_flex_figure"> <div class="ltx_flex_cell ltx_flex_size_3"> <figure class="ltx_figure ltx_flex_size_3 ltx_align_center" id="S2.F4.sf1"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="170" id="S2.F4.sf1.g1" src="x4.png" width="198"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">(a) </span></figcaption> </figure> </div> <div class="ltx_flex_cell ltx_flex_size_3"> <figure class="ltx_figure ltx_flex_size_3 ltx_align_center" id="S2.F4.sf2"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="170" id="S2.F4.sf2.g1" src="x5.png" width="206"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">(b) </span></figcaption> </figure> </div> <div class="ltx_flex_cell ltx_flex_size_3"> <figure class="ltx_figure ltx_flex_size_3 ltx_align_center" id="S2.F4.sf3"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="170" id="S2.F4.sf3.g1" src="x6.png" width="198"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">(c) </span></figcaption> </figure> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_2"> <figure class="ltx_figure ltx_flex_size_2 ltx_align_center" id="S2.F4.sf4"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="170" id="S2.F4.sf4.g1" src="x7.png" width="191"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">(d) </span></figcaption> </figure> </div> <div class="ltx_flex_cell ltx_flex_size_2"> <figure class="ltx_figure ltx_flex_size_2 ltx_align_center" id="S2.F4.sf5"><img alt="Refer to caption" 
class="ltx_graphics ltx_img_square" height="170" id="S2.F4.sf5.g1" src="x8.png" width="191"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">(e) </span></figcaption> </figure> </div> </div> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text ltx_font_bold" id="S2.F4.2.1.1">Fig. 4</span>: </span>Rate-distortion performance of the proposed LVC-LGMC, DCVC-TCM <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib9" title="">9</a>]</cite>, DCVC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib2" title="">2</a>]</cite>, DVCPro <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib15" title="">15</a>]</cite>, HM-16.20, and x265. Distortion is measured in PSNR. Please zoom in for a better view. </figcaption> </figure> <div class="ltx_para" id="S2.SS2.p2"> <p class="ltx_p" id="S2.SS2.p2.2">The <math alttext="\textrm{softmax}\left(\boldsymbol{y}_{t}^{2}\left(\hat{\boldsymbol{f}}_{t}^{2}% \right)^{\top}\right)\in\mathbb{R}^{\frac{L}{16}\times\frac{L}{16}}\geq 0" class="ltx_Math" display="inline" id="S2.SS2.p2.1.m1.1"><semantics id="S2.SS2.p2.1.m1.1a"><mrow id="S2.SS2.p2.1.m1.1.1" xref="S2.SS2.p2.1.m1.1.1.cmml"><mrow id="S2.SS2.p2.1.m1.1.1.1" xref="S2.SS2.p2.1.m1.1.1.1.cmml"><mtext id="S2.SS2.p2.1.m1.1.1.1.3" xref="S2.SS2.p2.1.m1.1.1.1.3a.cmml">softmax</mtext><mo id="S2.SS2.p2.1.m1.1.1.1.2" xref="S2.SS2.p2.1.m1.1.1.1.2.cmml"></mo><mrow id="S2.SS2.p2.1.m1.1.1.1.1.1" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.cmml"><mo id="S2.SS2.p2.1.m1.1.1.1.1.1.2" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.cmml">(</mo><mrow id="S2.SS2.p2.1.m1.1.1.1.1.1.1" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.cmml"><msubsup id="S2.SS2.p2.1.m1.1.1.1.1.1.1.3" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.cmml"><mi id="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.2.2" 
xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.2.2.cmml">𝒚</mi><mi id="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.2.3" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.2.3.cmml">t</mi><mn id="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.3" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.3.cmml">2</mn></msubsup><mo id="S2.SS2.p2.1.m1.1.1.1.1.1.1.2" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.2.cmml"></mo><msup id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.cmml"><mrow id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.cmml"><mo id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.2" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.cmml">(</mo><msubsup id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.cmml"><mover accent="true" id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.2" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.2.cmml"><mi id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.2.2" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.2.2.cmml">𝒇</mi><mo id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.2.1" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.2.1.cmml">^</mo></mover><mi id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.3" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.3.cmml">t</mi><mn id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.3" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.3.cmml">2</mn></msubsup><mo id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.3" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.cmml">)</mo></mrow><mo id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.3" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.3.cmml">⊤</mo></msup></mrow><mo id="S2.SS2.p2.1.m1.1.1.1.1.1.3" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S2.SS2.p2.1.m1.1.1.3" xref="S2.SS2.p2.1.m1.1.1.3.cmml">∈</mo><msup id="S2.SS2.p2.1.m1.1.1.4" xref="S2.SS2.p2.1.m1.1.1.4.cmml"><mi id="S2.SS2.p2.1.m1.1.1.4.2" xref="S2.SS2.p2.1.m1.1.1.4.2.cmml">ℝ</mi><mrow id="S2.SS2.p2.1.m1.1.1.4.3" xref="S2.SS2.p2.1.m1.1.1.4.3.cmml"><mfrac id="S2.SS2.p2.1.m1.1.1.4.3.2" xref="S2.SS2.p2.1.m1.1.1.4.3.2.cmml"><mi id="S2.SS2.p2.1.m1.1.1.4.3.2.2" xref="S2.SS2.p2.1.m1.1.1.4.3.2.2.cmml">L</mi><mn 
id="S2.SS2.p2.1.m1.1.1.4.3.2.3" xref="S2.SS2.p2.1.m1.1.1.4.3.2.3.cmml">16</mn></mfrac><mo id="S2.SS2.p2.1.m1.1.1.4.3.1" lspace="0.222em" rspace="0.222em" xref="S2.SS2.p2.1.m1.1.1.4.3.1.cmml">×</mo><mfrac id="S2.SS2.p2.1.m1.1.1.4.3.3" xref="S2.SS2.p2.1.m1.1.1.4.3.3.cmml"><mi id="S2.SS2.p2.1.m1.1.1.4.3.3.2" xref="S2.SS2.p2.1.m1.1.1.4.3.3.2.cmml">L</mi><mn id="S2.SS2.p2.1.m1.1.1.4.3.3.3" xref="S2.SS2.p2.1.m1.1.1.4.3.3.3.cmml">16</mn></mfrac></mrow></msup><mo id="S2.SS2.p2.1.m1.1.1.5" xref="S2.SS2.p2.1.m1.1.1.5.cmml">≥</mo><mn id="S2.SS2.p2.1.m1.1.1.6" xref="S2.SS2.p2.1.m1.1.1.6.cmml">0</mn></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.p2.1.m1.1b"><apply id="S2.SS2.p2.1.m1.1.1.cmml" xref="S2.SS2.p2.1.m1.1.1"><and id="S2.SS2.p2.1.m1.1.1a.cmml" xref="S2.SS2.p2.1.m1.1.1"></and><apply id="S2.SS2.p2.1.m1.1.1b.cmml" xref="S2.SS2.p2.1.m1.1.1"><in id="S2.SS2.p2.1.m1.1.1.3.cmml" xref="S2.SS2.p2.1.m1.1.1.3"></in><apply id="S2.SS2.p2.1.m1.1.1.1.cmml" xref="S2.SS2.p2.1.m1.1.1.1"><times id="S2.SS2.p2.1.m1.1.1.1.2.cmml" xref="S2.SS2.p2.1.m1.1.1.1.2"></times><ci id="S2.SS2.p2.1.m1.1.1.1.3a.cmml" xref="S2.SS2.p2.1.m1.1.1.1.3"><mtext id="S2.SS2.p2.1.m1.1.1.1.3.cmml" xref="S2.SS2.p2.1.m1.1.1.1.3">softmax</mtext></ci><apply id="S2.SS2.p2.1.m1.1.1.1.1.1.1.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1"><times id="S2.SS2.p2.1.m1.1.1.1.1.1.1.2.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.2"></times><apply id="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.1.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.3">superscript</csymbol><apply id="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.2.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.2.1.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.3">subscript</csymbol><ci id="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.2.2.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.2.2">𝒚</ci><ci id="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.2.3.cmml" 
xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.2.3">𝑡</ci></apply><cn id="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.3.cmml" type="integer" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.3.3">2</cn></apply><apply id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.2.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1">superscript</csymbol><apply id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1">superscript</csymbol><apply id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.1.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1">subscript</csymbol><apply id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.2.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.2"><ci id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.2.1.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.2.1">^</ci><ci id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.2.2.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.2.2">𝒇</ci></apply><ci id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.3.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.2.3">𝑡</ci></apply><cn id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.3.cmml" type="integer" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.1.1.1.3">2</cn></apply><csymbol cd="latexml" id="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.3.cmml" xref="S2.SS2.p2.1.m1.1.1.1.1.1.1.1.3">top</csymbol></apply></apply></apply><apply id="S2.SS2.p2.1.m1.1.1.4.cmml" xref="S2.SS2.p2.1.m1.1.1.4"><csymbol cd="ambiguous" id="S2.SS2.p2.1.m1.1.1.4.1.cmml" xref="S2.SS2.p2.1.m1.1.1.4">superscript</csymbol><ci id="S2.SS2.p2.1.m1.1.1.4.2.cmml" xref="S2.SS2.p2.1.m1.1.1.4.2">ℝ</ci><apply id="S2.SS2.p2.1.m1.1.1.4.3.cmml" xref="S2.SS2.p2.1.m1.1.1.4.3"><times id="S2.SS2.p2.1.m1.1.1.4.3.1.cmml" xref="S2.SS2.p2.1.m1.1.1.4.3.1"></times><apply id="S2.SS2.p2.1.m1.1.1.4.3.2.cmml" 
xref="S2.SS2.p2.1.m1.1.1.4.3.2"><divide id="S2.SS2.p2.1.m1.1.1.4.3.2.1.cmml" xref="S2.SS2.p2.1.m1.1.1.4.3.2"></divide><ci id="S2.SS2.p2.1.m1.1.1.4.3.2.2.cmml" xref="S2.SS2.p2.1.m1.1.1.4.3.2.2">𝐿</ci><cn id="S2.SS2.p2.1.m1.1.1.4.3.2.3.cmml" type="integer" xref="S2.SS2.p2.1.m1.1.1.4.3.2.3">16</cn></apply><apply id="S2.SS2.p2.1.m1.1.1.4.3.3.cmml" xref="S2.SS2.p2.1.m1.1.1.4.3.3"><divide id="S2.SS2.p2.1.m1.1.1.4.3.3.1.cmml" xref="S2.SS2.p2.1.m1.1.1.4.3.3"></divide><ci id="S2.SS2.p2.1.m1.1.1.4.3.3.2.cmml" xref="S2.SS2.p2.1.m1.1.1.4.3.3.2">𝐿</ci><cn id="S2.SS2.p2.1.m1.1.1.4.3.3.3.cmml" type="integer" xref="S2.SS2.p2.1.m1.1.1.4.3.3.3">16</cn></apply></apply></apply></apply><apply id="S2.SS2.p2.1.m1.1.1c.cmml" xref="S2.SS2.p2.1.m1.1.1"><geq id="S2.SS2.p2.1.m1.1.1.5.cmml" xref="S2.SS2.p2.1.m1.1.1.5"></geq><share href="#S2.SS2.p2.1.m1.1.1.4.cmml" id="S2.SS2.p2.1.m1.1.1d.cmml" xref="S2.SS2.p2.1.m1.1.1"></share><cn id="S2.SS2.p2.1.m1.1.1.6.cmml" type="integer" xref="S2.SS2.p2.1.m1.1.1.6">0</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p2.1.m1.1c">\textrm{softmax}\left(\boldsymbol{y}_{t}^{2}\left(\hat{\boldsymbol{f}}_{t}^{2}% \right)^{\top}\right)\in\mathbb{R}^{\frac{L}{16}\times\frac{L}{16}}\geq 0</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p2.1.m1.1d">softmax ( bold_italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) ∈ blackboard_R start_POSTSUPERSCRIPT divide start_ARG italic_L end_ARG start_ARG 16 end_ARG × divide start_ARG italic_L end_ARG start_ARG 16 end_ARG end_POSTSUPERSCRIPT ≥ 0</annotation></semantics></math>, and it can be treated as a similarity metric. It computes the similarity between each symbol and all other symbols, so that global dependencies can be captured. 
The attention and <math alttext="\boldsymbol{y}_{t}^{2}" class="ltx_Math" display="inline" id="S2.SS2.p2.2.m2.1"><semantics id="S2.SS2.p2.2.m2.1a"><msubsup id="S2.SS2.p2.2.m2.1.1" xref="S2.SS2.p2.2.m2.1.1.cmml"><mi id="S2.SS2.p2.2.m2.1.1.2.2" xref="S2.SS2.p2.2.m2.1.1.2.2.cmml">𝒚</mi><mi id="S2.SS2.p2.2.m2.1.1.2.3" xref="S2.SS2.p2.2.m2.1.1.2.3.cmml">t</mi><mn id="S2.SS2.p2.2.m2.1.1.3" xref="S2.SS2.p2.2.m2.1.1.3.cmml">2</mn></msubsup><annotation-xml encoding="MathML-Content" id="S2.SS2.p2.2.m2.1b"><apply id="S2.SS2.p2.2.m2.1.1.cmml" xref="S2.SS2.p2.2.m2.1.1"><csymbol cd="ambiguous" id="S2.SS2.p2.2.m2.1.1.1.cmml" xref="S2.SS2.p2.2.m2.1.1">superscript</csymbol><apply id="S2.SS2.p2.2.m2.1.1.2.cmml" xref="S2.SS2.p2.2.m2.1.1"><csymbol cd="ambiguous" id="S2.SS2.p2.2.m2.1.1.2.1.cmml" xref="S2.SS2.p2.2.m2.1.1">subscript</csymbol><ci id="S2.SS2.p2.2.m2.1.1.2.2.cmml" xref="S2.SS2.p2.2.m2.1.1.2.2">𝒚</ci><ci id="S2.SS2.p2.2.m2.1.1.2.3.cmml" xref="S2.SS2.p2.2.m2.1.1.2.3">𝑡</ci></apply><cn id="S2.SS2.p2.2.m2.1.1.3.cmml" type="integer" xref="S2.SS2.p2.2.m2.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p2.2.m2.1c">\boldsymbol{y}_{t}^{2}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p2.2.m2.1d">bold_italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT</annotation></semantics></math> are then concatenated to conduct conditional coding on the basis of global dependency.</p> </div> <div class="ltx_para" id="S2.SS2.p3"> <p class="ltx_p" id="S2.SS2.p3.3">However, the computational complexity of Eqn. 
<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S2.E2" title="2 ‣ 2.2 Attention-based Global Compensation ‣ 2 The Proposed LVC-LGMC Method ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_tag">2</span></a> is <math alttext="O(L^{2})" class="ltx_Math" display="inline" id="S2.SS2.p3.1.m1.1"><semantics id="S2.SS2.p3.1.m1.1a"><mrow id="S2.SS2.p3.1.m1.1.1" xref="S2.SS2.p3.1.m1.1.1.cmml"><mi id="S2.SS2.p3.1.m1.1.1.3" xref="S2.SS2.p3.1.m1.1.1.3.cmml">O</mi><mo id="S2.SS2.p3.1.m1.1.1.2" xref="S2.SS2.p3.1.m1.1.1.2.cmml"></mo><mrow id="S2.SS2.p3.1.m1.1.1.1.1" xref="S2.SS2.p3.1.m1.1.1.1.1.1.cmml"><mo id="S2.SS2.p3.1.m1.1.1.1.1.2" stretchy="false" xref="S2.SS2.p3.1.m1.1.1.1.1.1.cmml">(</mo><msup id="S2.SS2.p3.1.m1.1.1.1.1.1" xref="S2.SS2.p3.1.m1.1.1.1.1.1.cmml"><mi id="S2.SS2.p3.1.m1.1.1.1.1.1.2" xref="S2.SS2.p3.1.m1.1.1.1.1.1.2.cmml">L</mi><mn id="S2.SS2.p3.1.m1.1.1.1.1.1.3" xref="S2.SS2.p3.1.m1.1.1.1.1.1.3.cmml">2</mn></msup><mo id="S2.SS2.p3.1.m1.1.1.1.1.3" stretchy="false" xref="S2.SS2.p3.1.m1.1.1.1.1.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.p3.1.m1.1b"><apply id="S2.SS2.p3.1.m1.1.1.cmml" xref="S2.SS2.p3.1.m1.1.1"><times id="S2.SS2.p3.1.m1.1.1.2.cmml" xref="S2.SS2.p3.1.m1.1.1.2"></times><ci id="S2.SS2.p3.1.m1.1.1.3.cmml" xref="S2.SS2.p3.1.m1.1.1.3">𝑂</ci><apply id="S2.SS2.p3.1.m1.1.1.1.1.1.cmml" xref="S2.SS2.p3.1.m1.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p3.1.m1.1.1.1.1.1.1.cmml" xref="S2.SS2.p3.1.m1.1.1.1.1">superscript</csymbol><ci id="S2.SS2.p3.1.m1.1.1.1.1.1.2.cmml" xref="S2.SS2.p3.1.m1.1.1.1.1.1.2">𝐿</ci><cn id="S2.SS2.p3.1.m1.1.1.1.1.1.3.cmml" type="integer" xref="S2.SS2.p3.1.m1.1.1.1.1.1.3">2</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p3.1.m1.1c">O(L^{2})</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p3.1.m1.1d">italic_O ( italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT 
)</annotation></semantics></math>. The quadratic complexity makes it hard to employ the <span class="ltx_text ltx_font_italic" id="S2.SS2.p3.3.1">vanilla</span> approach for high-resolution video coding. The quadratic cost is caused by the softmax operation, which fixes the order of matrix multiplication so that the full attention map must be computed first. To avoid the quadratic complexity, we employ efficient attention <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib10" title="">10</a>, <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib16" title="">16</a>]</cite>, which applies the softmax operation on <math alttext="\boldsymbol{y}_{t}^{2}" class="ltx_Math" display="inline" id="S2.SS2.p3.2.m2.1"><semantics id="S2.SS2.p3.2.m2.1a"><msubsup id="S2.SS2.p3.2.m2.1.1" xref="S2.SS2.p3.2.m2.1.1.cmml"><mi id="S2.SS2.p3.2.m2.1.1.2.2" xref="S2.SS2.p3.2.m2.1.1.2.2.cmml">𝒚</mi><mi id="S2.SS2.p3.2.m2.1.1.2.3" xref="S2.SS2.p3.2.m2.1.1.2.3.cmml">t</mi><mn id="S2.SS2.p3.2.m2.1.1.3" xref="S2.SS2.p3.2.m2.1.1.3.cmml">2</mn></msubsup><annotation-xml encoding="MathML-Content" id="S2.SS2.p3.2.m2.1b"><apply id="S2.SS2.p3.2.m2.1.1.cmml" xref="S2.SS2.p3.2.m2.1.1"><csymbol cd="ambiguous" id="S2.SS2.p3.2.m2.1.1.1.cmml" xref="S2.SS2.p3.2.m2.1.1">superscript</csymbol><apply id="S2.SS2.p3.2.m2.1.1.2.cmml" xref="S2.SS2.p3.2.m2.1.1"><csymbol cd="ambiguous" id="S2.SS2.p3.2.m2.1.1.2.1.cmml" xref="S2.SS2.p3.2.m2.1.1">subscript</csymbol><ci id="S2.SS2.p3.2.m2.1.1.2.2.cmml" xref="S2.SS2.p3.2.m2.1.1.2.2">𝒚</ci><ci id="S2.SS2.p3.2.m2.1.1.2.3.cmml" xref="S2.SS2.p3.2.m2.1.1.2.3">𝑡</ci></apply><cn id="S2.SS2.p3.2.m2.1.1.3.cmml" type="integer" xref="S2.SS2.p3.2.m2.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p3.2.m2.1c">\boldsymbol{y}_{t}^{2}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p3.2.m2.1d">bold_italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 
end_POSTSUPERSCRIPT</annotation></semantics></math> in row and the softmax operation on <math alttext="\hat{\boldsymbol{f}}_{t}^{2}" class="ltx_Math" display="inline" id="S2.SS2.p3.3.m3.1"><semantics id="S2.SS2.p3.3.m3.1a"><msubsup id="S2.SS2.p3.3.m3.1.1" xref="S2.SS2.p3.3.m3.1.1.cmml"><mover accent="true" id="S2.SS2.p3.3.m3.1.1.2.2" xref="S2.SS2.p3.3.m3.1.1.2.2.cmml"><mi id="S2.SS2.p3.3.m3.1.1.2.2.2" xref="S2.SS2.p3.3.m3.1.1.2.2.2.cmml">𝒇</mi><mo id="S2.SS2.p3.3.m3.1.1.2.2.1" xref="S2.SS2.p3.3.m3.1.1.2.2.1.cmml">^</mo></mover><mi id="S2.SS2.p3.3.m3.1.1.2.3" xref="S2.SS2.p3.3.m3.1.1.2.3.cmml">t</mi><mn id="S2.SS2.p3.3.m3.1.1.3" xref="S2.SS2.p3.3.m3.1.1.3.cmml">2</mn></msubsup><annotation-xml encoding="MathML-Content" id="S2.SS2.p3.3.m3.1b"><apply id="S2.SS2.p3.3.m3.1.1.cmml" xref="S2.SS2.p3.3.m3.1.1"><csymbol cd="ambiguous" id="S2.SS2.p3.3.m3.1.1.1.cmml" xref="S2.SS2.p3.3.m3.1.1">superscript</csymbol><apply id="S2.SS2.p3.3.m3.1.1.2.cmml" xref="S2.SS2.p3.3.m3.1.1"><csymbol cd="ambiguous" id="S2.SS2.p3.3.m3.1.1.2.1.cmml" xref="S2.SS2.p3.3.m3.1.1">subscript</csymbol><apply id="S2.SS2.p3.3.m3.1.1.2.2.cmml" xref="S2.SS2.p3.3.m3.1.1.2.2"><ci id="S2.SS2.p3.3.m3.1.1.2.2.1.cmml" xref="S2.SS2.p3.3.m3.1.1.2.2.1">^</ci><ci id="S2.SS2.p3.3.m3.1.1.2.2.2.cmml" xref="S2.SS2.p3.3.m3.1.1.2.2.2">𝒇</ci></apply><ci id="S2.SS2.p3.3.m3.1.1.2.3.cmml" xref="S2.SS2.p3.3.m3.1.1.2.3">𝑡</ci></apply><cn id="S2.SS2.p3.3.m3.1.1.3.cmml" type="integer" xref="S2.SS2.p3.3.m3.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p3.3.m3.1c">\hat{\boldsymbol{f}}_{t}^{2}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p3.3.m3.1d">over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT</annotation></semantics></math> in column,</p> <table class="ltx_equation ltx_eqn_table" id="S2.E3"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell 
ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\hat{\boldsymbol{g}}_{t}^{2}=\underbrace{\textrm{softmax}\left(\boldsymbol{y}_% {t}^{2}\right)\textrm{softmax}\left(\hat{\boldsymbol{f}}_{t}^{2}\right)^{\top}% }_{\textrm{non-negative}}\hat{\boldsymbol{f}}_{t}^{2}." class="ltx_Math" display="block" id="S2.E3.m1.3"><semantics id="S2.E3.m1.3a"><mrow id="S2.E3.m1.3.3.1" xref="S2.E3.m1.3.3.1.1.cmml"><mrow id="S2.E3.m1.3.3.1.1" xref="S2.E3.m1.3.3.1.1.cmml"><msubsup id="S2.E3.m1.3.3.1.1.2" xref="S2.E3.m1.3.3.1.1.2.cmml"><mover accent="true" id="S2.E3.m1.3.3.1.1.2.2.2" xref="S2.E3.m1.3.3.1.1.2.2.2.cmml"><mi id="S2.E3.m1.3.3.1.1.2.2.2.2" xref="S2.E3.m1.3.3.1.1.2.2.2.2.cmml">𝒈</mi><mo id="S2.E3.m1.3.3.1.1.2.2.2.1" xref="S2.E3.m1.3.3.1.1.2.2.2.1.cmml">^</mo></mover><mi id="S2.E3.m1.3.3.1.1.2.2.3" xref="S2.E3.m1.3.3.1.1.2.2.3.cmml">t</mi><mn id="S2.E3.m1.3.3.1.1.2.3" xref="S2.E3.m1.3.3.1.1.2.3.cmml">2</mn></msubsup><mo id="S2.E3.m1.3.3.1.1.1" xref="S2.E3.m1.3.3.1.1.1.cmml">=</mo><mrow id="S2.E3.m1.3.3.1.1.3" xref="S2.E3.m1.3.3.1.1.3.cmml"><munder id="S2.E3.m1.3.3.1.1.3.2" xref="S2.E3.m1.3.3.1.1.3.2.cmml"><munder accentunder="true" id="S2.E3.m1.2.2" xref="S2.E3.m1.2.2.cmml"><mrow id="S2.E3.m1.2.2.2" xref="S2.E3.m1.2.2.2.cmml"><mtext id="S2.E3.m1.2.2.2.4" xref="S2.E3.m1.2.2.2.4a.cmml">softmax</mtext><mo id="S2.E3.m1.2.2.2.3" xref="S2.E3.m1.2.2.2.3.cmml"></mo><mrow id="S2.E3.m1.1.1.1.1.1" xref="S2.E3.m1.1.1.1.1.1.1.cmml"><mo id="S2.E3.m1.1.1.1.1.1.2" xref="S2.E3.m1.1.1.1.1.1.1.cmml">(</mo><msubsup id="S2.E3.m1.1.1.1.1.1.1" xref="S2.E3.m1.1.1.1.1.1.1.cmml"><mi id="S2.E3.m1.1.1.1.1.1.1.2.2" xref="S2.E3.m1.1.1.1.1.1.1.2.2.cmml">𝒚</mi><mi id="S2.E3.m1.1.1.1.1.1.1.2.3" xref="S2.E3.m1.1.1.1.1.1.1.2.3.cmml">t</mi><mn id="S2.E3.m1.1.1.1.1.1.1.3" xref="S2.E3.m1.1.1.1.1.1.1.3.cmml">2</mn></msubsup><mo id="S2.E3.m1.1.1.1.1.1.3" xref="S2.E3.m1.1.1.1.1.1.1.cmml">)</mo></mrow><mo id="S2.E3.m1.2.2.2.3a" xref="S2.E3.m1.2.2.2.3.cmml"></mo><mtext 
id="S2.E3.m1.2.2.2.5" xref="S2.E3.m1.2.2.2.5a.cmml">softmax</mtext><mo id="S2.E3.m1.2.2.2.3b" xref="S2.E3.m1.2.2.2.3.cmml"></mo><msup id="S2.E3.m1.2.2.2.2" xref="S2.E3.m1.2.2.2.2.cmml"><mrow id="S2.E3.m1.2.2.2.2.1.1" xref="S2.E3.m1.2.2.2.2.1.1.1.cmml"><mo id="S2.E3.m1.2.2.2.2.1.1.2" xref="S2.E3.m1.2.2.2.2.1.1.1.cmml">(</mo><msubsup id="S2.E3.m1.2.2.2.2.1.1.1" xref="S2.E3.m1.2.2.2.2.1.1.1.cmml"><mover accent="true" id="S2.E3.m1.2.2.2.2.1.1.1.2.2" xref="S2.E3.m1.2.2.2.2.1.1.1.2.2.cmml"><mi id="S2.E3.m1.2.2.2.2.1.1.1.2.2.2" xref="S2.E3.m1.2.2.2.2.1.1.1.2.2.2.cmml">𝒇</mi><mo id="S2.E3.m1.2.2.2.2.1.1.1.2.2.1" xref="S2.E3.m1.2.2.2.2.1.1.1.2.2.1.cmml">^</mo></mover><mi id="S2.E3.m1.2.2.2.2.1.1.1.2.3" xref="S2.E3.m1.2.2.2.2.1.1.1.2.3.cmml">t</mi><mn id="S2.E3.m1.2.2.2.2.1.1.1.3" xref="S2.E3.m1.2.2.2.2.1.1.1.3.cmml">2</mn></msubsup><mo id="S2.E3.m1.2.2.2.2.1.1.3" xref="S2.E3.m1.2.2.2.2.1.1.1.cmml">)</mo></mrow><mo id="S2.E3.m1.2.2.2.2.3" xref="S2.E3.m1.2.2.2.2.3.cmml">⊤</mo></msup></mrow><mo id="S2.E3.m1.2.2.3" xref="S2.E3.m1.2.2.3.cmml">⏟</mo></munder><mtext id="S2.E3.m1.3.3.1.1.3.2.2" xref="S2.E3.m1.3.3.1.1.3.2.2a.cmml">non-negative</mtext></munder><mo id="S2.E3.m1.3.3.1.1.3.1" xref="S2.E3.m1.3.3.1.1.3.1.cmml"></mo><msubsup id="S2.E3.m1.3.3.1.1.3.3" xref="S2.E3.m1.3.3.1.1.3.3.cmml"><mover accent="true" id="S2.E3.m1.3.3.1.1.3.3.2.2" xref="S2.E3.m1.3.3.1.1.3.3.2.2.cmml"><mi id="S2.E3.m1.3.3.1.1.3.3.2.2.2" xref="S2.E3.m1.3.3.1.1.3.3.2.2.2.cmml">𝒇</mi><mo id="S2.E3.m1.3.3.1.1.3.3.2.2.1" xref="S2.E3.m1.3.3.1.1.3.3.2.2.1.cmml">^</mo></mover><mi id="S2.E3.m1.3.3.1.1.3.3.2.3" xref="S2.E3.m1.3.3.1.1.3.3.2.3.cmml">t</mi><mn id="S2.E3.m1.3.3.1.1.3.3.3" xref="S2.E3.m1.3.3.1.1.3.3.3.cmml">2</mn></msubsup></mrow></mrow><mo id="S2.E3.m1.3.3.1.2" lspace="0em" xref="S2.E3.m1.3.3.1.1.cmml">.</mo></mrow><annotation-xml encoding="MathML-Content" id="S2.E3.m1.3b"><apply id="S2.E3.m1.3.3.1.1.cmml" xref="S2.E3.m1.3.3.1"><eq id="S2.E3.m1.3.3.1.1.1.cmml" xref="S2.E3.m1.3.3.1.1.1"></eq><apply 
id="S2.E3.m1.3.3.1.1.2.cmml" xref="S2.E3.m1.3.3.1.1.2"><csymbol cd="ambiguous" id="S2.E3.m1.3.3.1.1.2.1.cmml" xref="S2.E3.m1.3.3.1.1.2">superscript</csymbol><apply id="S2.E3.m1.3.3.1.1.2.2.cmml" xref="S2.E3.m1.3.3.1.1.2"><csymbol cd="ambiguous" id="S2.E3.m1.3.3.1.1.2.2.1.cmml" xref="S2.E3.m1.3.3.1.1.2">subscript</csymbol><apply id="S2.E3.m1.3.3.1.1.2.2.2.cmml" xref="S2.E3.m1.3.3.1.1.2.2.2"><ci id="S2.E3.m1.3.3.1.1.2.2.2.1.cmml" xref="S2.E3.m1.3.3.1.1.2.2.2.1">^</ci><ci id="S2.E3.m1.3.3.1.1.2.2.2.2.cmml" xref="S2.E3.m1.3.3.1.1.2.2.2.2">𝒈</ci></apply><ci id="S2.E3.m1.3.3.1.1.2.2.3.cmml" xref="S2.E3.m1.3.3.1.1.2.2.3">𝑡</ci></apply><cn id="S2.E3.m1.3.3.1.1.2.3.cmml" type="integer" xref="S2.E3.m1.3.3.1.1.2.3">2</cn></apply><apply id="S2.E3.m1.3.3.1.1.3.cmml" xref="S2.E3.m1.3.3.1.1.3"><times id="S2.E3.m1.3.3.1.1.3.1.cmml" xref="S2.E3.m1.3.3.1.1.3.1"></times><apply id="S2.E3.m1.3.3.1.1.3.2.cmml" xref="S2.E3.m1.3.3.1.1.3.2"><csymbol cd="ambiguous" id="S2.E3.m1.3.3.1.1.3.2.1.cmml" xref="S2.E3.m1.3.3.1.1.3.2">subscript</csymbol><apply id="S2.E3.m1.2.2.cmml" xref="S2.E3.m1.2.2"><ci id="S2.E3.m1.2.2.3.cmml" xref="S2.E3.m1.2.2.3">⏟</ci><apply id="S2.E3.m1.2.2.2.cmml" xref="S2.E3.m1.2.2.2"><times id="S2.E3.m1.2.2.2.3.cmml" xref="S2.E3.m1.2.2.2.3"></times><ci id="S2.E3.m1.2.2.2.4a.cmml" xref="S2.E3.m1.2.2.2.4"><mtext id="S2.E3.m1.2.2.2.4.cmml" xref="S2.E3.m1.2.2.2.4">softmax</mtext></ci><apply id="S2.E3.m1.1.1.1.1.1.1.cmml" xref="S2.E3.m1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.E3.m1.1.1.1.1.1.1.1.cmml" xref="S2.E3.m1.1.1.1.1.1">superscript</csymbol><apply id="S2.E3.m1.1.1.1.1.1.1.2.cmml" xref="S2.E3.m1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.E3.m1.1.1.1.1.1.1.2.1.cmml" xref="S2.E3.m1.1.1.1.1.1">subscript</csymbol><ci id="S2.E3.m1.1.1.1.1.1.1.2.2.cmml" xref="S2.E3.m1.1.1.1.1.1.1.2.2">𝒚</ci><ci id="S2.E3.m1.1.1.1.1.1.1.2.3.cmml" xref="S2.E3.m1.1.1.1.1.1.1.2.3">𝑡</ci></apply><cn id="S2.E3.m1.1.1.1.1.1.1.3.cmml" type="integer" xref="S2.E3.m1.1.1.1.1.1.1.3">2</cn></apply><ci 
id="S2.E3.m1.2.2.2.5a.cmml" xref="S2.E3.m1.2.2.2.5"><mtext id="S2.E3.m1.2.2.2.5.cmml" xref="S2.E3.m1.2.2.2.5">softmax</mtext></ci><apply id="S2.E3.m1.2.2.2.2.cmml" xref="S2.E3.m1.2.2.2.2"><csymbol cd="ambiguous" id="S2.E3.m1.2.2.2.2.2.cmml" xref="S2.E3.m1.2.2.2.2">superscript</csymbol><apply id="S2.E3.m1.2.2.2.2.1.1.1.cmml" xref="S2.E3.m1.2.2.2.2.1.1"><csymbol cd="ambiguous" id="S2.E3.m1.2.2.2.2.1.1.1.1.cmml" xref="S2.E3.m1.2.2.2.2.1.1">superscript</csymbol><apply id="S2.E3.m1.2.2.2.2.1.1.1.2.cmml" xref="S2.E3.m1.2.2.2.2.1.1"><csymbol cd="ambiguous" id="S2.E3.m1.2.2.2.2.1.1.1.2.1.cmml" xref="S2.E3.m1.2.2.2.2.1.1">subscript</csymbol><apply id="S2.E3.m1.2.2.2.2.1.1.1.2.2.cmml" xref="S2.E3.m1.2.2.2.2.1.1.1.2.2"><ci id="S2.E3.m1.2.2.2.2.1.1.1.2.2.1.cmml" xref="S2.E3.m1.2.2.2.2.1.1.1.2.2.1">^</ci><ci id="S2.E3.m1.2.2.2.2.1.1.1.2.2.2.cmml" xref="S2.E3.m1.2.2.2.2.1.1.1.2.2.2">𝒇</ci></apply><ci id="S2.E3.m1.2.2.2.2.1.1.1.2.3.cmml" xref="S2.E3.m1.2.2.2.2.1.1.1.2.3">𝑡</ci></apply><cn id="S2.E3.m1.2.2.2.2.1.1.1.3.cmml" type="integer" xref="S2.E3.m1.2.2.2.2.1.1.1.3">2</cn></apply><csymbol cd="latexml" id="S2.E3.m1.2.2.2.2.3.cmml" xref="S2.E3.m1.2.2.2.2.3">top</csymbol></apply></apply></apply><ci id="S2.E3.m1.3.3.1.1.3.2.2a.cmml" xref="S2.E3.m1.3.3.1.1.3.2.2"><mtext id="S2.E3.m1.3.3.1.1.3.2.2.cmml" mathsize="70%" xref="S2.E3.m1.3.3.1.1.3.2.2">non-negative</mtext></ci></apply><apply id="S2.E3.m1.3.3.1.1.3.3.cmml" xref="S2.E3.m1.3.3.1.1.3.3"><csymbol cd="ambiguous" id="S2.E3.m1.3.3.1.1.3.3.1.cmml" xref="S2.E3.m1.3.3.1.1.3.3">superscript</csymbol><apply id="S2.E3.m1.3.3.1.1.3.3.2.cmml" xref="S2.E3.m1.3.3.1.1.3.3"><csymbol cd="ambiguous" id="S2.E3.m1.3.3.1.1.3.3.2.1.cmml" xref="S2.E3.m1.3.3.1.1.3.3">subscript</csymbol><apply id="S2.E3.m1.3.3.1.1.3.3.2.2.cmml" xref="S2.E3.m1.3.3.1.1.3.3.2.2"><ci id="S2.E3.m1.3.3.1.1.3.3.2.2.1.cmml" xref="S2.E3.m1.3.3.1.1.3.3.2.2.1">^</ci><ci id="S2.E3.m1.3.3.1.1.3.3.2.2.2.cmml" xref="S2.E3.m1.3.3.1.1.3.3.2.2.2">𝒇</ci></apply><ci 
id="S2.E3.m1.3.3.1.1.3.3.2.3.cmml" xref="S2.E3.m1.3.3.1.1.3.3.2.3">𝑡</ci></apply><cn id="S2.E3.m1.3.3.1.1.3.3.3.cmml" type="integer" xref="S2.E3.m1.3.3.1.1.3.3.3">2</cn></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.E3.m1.3c">\hat{\boldsymbol{g}}_{t}^{2}=\underbrace{\textrm{softmax}\left(\boldsymbol{y}_% {t}^{2}\right)\textrm{softmax}\left(\hat{\boldsymbol{f}}_{t}^{2}\right)^{\top}% }_{\textrm{non-negative}}\hat{\boldsymbol{f}}_{t}^{2}.</annotation><annotation encoding="application/x-llamapun" id="S2.E3.m1.3d">over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = under⏟ start_ARG softmax ( bold_italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) softmax ( over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_ARG start_POSTSUBSCRIPT non-negative end_POSTSUBSCRIPT over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT .</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(3)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S2.SS2.p3.6">Since <math alttext="\textrm{softmax}\left(\boldsymbol{y}_{t}^{2}\right)\textrm{softmax}\left(\hat{% \boldsymbol{f}}_{t}^{2}\right)^{\top}\geq 0" class="ltx_Math" display="inline" id="S2.SS2.p3.4.m1.2"><semantics id="S2.SS2.p3.4.m1.2a"><mrow id="S2.SS2.p3.4.m1.2.2" xref="S2.SS2.p3.4.m1.2.2.cmml"><mrow id="S2.SS2.p3.4.m1.2.2.2" xref="S2.SS2.p3.4.m1.2.2.2.cmml"><mtext id="S2.SS2.p3.4.m1.2.2.2.4" xref="S2.SS2.p3.4.m1.2.2.2.4a.cmml">softmax</mtext><mo id="S2.SS2.p3.4.m1.2.2.2.3" 
xref="S2.SS2.p3.4.m1.2.2.2.3.cmml"></mo><mrow id="S2.SS2.p3.4.m1.1.1.1.1.1" xref="S2.SS2.p3.4.m1.1.1.1.1.1.1.cmml"><mo id="S2.SS2.p3.4.m1.1.1.1.1.1.2" xref="S2.SS2.p3.4.m1.1.1.1.1.1.1.cmml">(</mo><msubsup id="S2.SS2.p3.4.m1.1.1.1.1.1.1" xref="S2.SS2.p3.4.m1.1.1.1.1.1.1.cmml"><mi id="S2.SS2.p3.4.m1.1.1.1.1.1.1.2.2" xref="S2.SS2.p3.4.m1.1.1.1.1.1.1.2.2.cmml">𝒚</mi><mi id="S2.SS2.p3.4.m1.1.1.1.1.1.1.2.3" xref="S2.SS2.p3.4.m1.1.1.1.1.1.1.2.3.cmml">t</mi><mn id="S2.SS2.p3.4.m1.1.1.1.1.1.1.3" xref="S2.SS2.p3.4.m1.1.1.1.1.1.1.3.cmml">2</mn></msubsup><mo id="S2.SS2.p3.4.m1.1.1.1.1.1.3" xref="S2.SS2.p3.4.m1.1.1.1.1.1.1.cmml">)</mo></mrow><mo id="S2.SS2.p3.4.m1.2.2.2.3a" xref="S2.SS2.p3.4.m1.2.2.2.3.cmml"></mo><mtext id="S2.SS2.p3.4.m1.2.2.2.5" xref="S2.SS2.p3.4.m1.2.2.2.5a.cmml">softmax</mtext><mo id="S2.SS2.p3.4.m1.2.2.2.3b" xref="S2.SS2.p3.4.m1.2.2.2.3.cmml"></mo><msup id="S2.SS2.p3.4.m1.2.2.2.2" xref="S2.SS2.p3.4.m1.2.2.2.2.cmml"><mrow id="S2.SS2.p3.4.m1.2.2.2.2.1.1" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.cmml"><mo id="S2.SS2.p3.4.m1.2.2.2.2.1.1.2" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.cmml">(</mo><msubsup id="S2.SS2.p3.4.m1.2.2.2.2.1.1.1" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.cmml"><mover accent="true" id="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.2" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.2.cmml"><mi id="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.2.2" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.2.2.cmml">𝒇</mi><mo id="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.2.1" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.2.1.cmml">^</mo></mover><mi id="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.3" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.3.cmml">t</mi><mn id="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.3" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.3.cmml">2</mn></msubsup><mo id="S2.SS2.p3.4.m1.2.2.2.2.1.1.3" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.cmml">)</mo></mrow><mo id="S2.SS2.p3.4.m1.2.2.2.2.3" xref="S2.SS2.p3.4.m1.2.2.2.2.3.cmml">⊤</mo></msup></mrow><mo id="S2.SS2.p3.4.m1.2.2.3" xref="S2.SS2.p3.4.m1.2.2.3.cmml">≥</mo><mn id="S2.SS2.p3.4.m1.2.2.4" 
xref="S2.SS2.p3.4.m1.2.2.4.cmml">0</mn></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.p3.4.m1.2b"><apply id="S2.SS2.p3.4.m1.2.2.cmml" xref="S2.SS2.p3.4.m1.2.2"><geq id="S2.SS2.p3.4.m1.2.2.3.cmml" xref="S2.SS2.p3.4.m1.2.2.3"></geq><apply id="S2.SS2.p3.4.m1.2.2.2.cmml" xref="S2.SS2.p3.4.m1.2.2.2"><times id="S2.SS2.p3.4.m1.2.2.2.3.cmml" xref="S2.SS2.p3.4.m1.2.2.2.3"></times><ci id="S2.SS2.p3.4.m1.2.2.2.4a.cmml" xref="S2.SS2.p3.4.m1.2.2.2.4"><mtext id="S2.SS2.p3.4.m1.2.2.2.4.cmml" xref="S2.SS2.p3.4.m1.2.2.2.4">softmax</mtext></ci><apply id="S2.SS2.p3.4.m1.1.1.1.1.1.1.cmml" xref="S2.SS2.p3.4.m1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p3.4.m1.1.1.1.1.1.1.1.cmml" xref="S2.SS2.p3.4.m1.1.1.1.1.1">superscript</csymbol><apply id="S2.SS2.p3.4.m1.1.1.1.1.1.1.2.cmml" xref="S2.SS2.p3.4.m1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p3.4.m1.1.1.1.1.1.1.2.1.cmml" xref="S2.SS2.p3.4.m1.1.1.1.1.1">subscript</csymbol><ci id="S2.SS2.p3.4.m1.1.1.1.1.1.1.2.2.cmml" xref="S2.SS2.p3.4.m1.1.1.1.1.1.1.2.2">𝒚</ci><ci id="S2.SS2.p3.4.m1.1.1.1.1.1.1.2.3.cmml" xref="S2.SS2.p3.4.m1.1.1.1.1.1.1.2.3">𝑡</ci></apply><cn id="S2.SS2.p3.4.m1.1.1.1.1.1.1.3.cmml" type="integer" xref="S2.SS2.p3.4.m1.1.1.1.1.1.1.3">2</cn></apply><ci id="S2.SS2.p3.4.m1.2.2.2.5a.cmml" xref="S2.SS2.p3.4.m1.2.2.2.5"><mtext id="S2.SS2.p3.4.m1.2.2.2.5.cmml" xref="S2.SS2.p3.4.m1.2.2.2.5">softmax</mtext></ci><apply id="S2.SS2.p3.4.m1.2.2.2.2.cmml" xref="S2.SS2.p3.4.m1.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS2.p3.4.m1.2.2.2.2.2.cmml" xref="S2.SS2.p3.4.m1.2.2.2.2">superscript</csymbol><apply id="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.cmml" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1"><csymbol cd="ambiguous" id="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.1.cmml" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1">superscript</csymbol><apply id="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.cmml" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1"><csymbol cd="ambiguous" id="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.1.cmml" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1">subscript</csymbol><apply 
id="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.2.cmml" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.2"><ci id="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.2.1.cmml" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.2.1">^</ci><ci id="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.2.2.cmml" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.2.2">𝒇</ci></apply><ci id="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.3.cmml" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.2.3">𝑡</ci></apply><cn id="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.3.cmml" type="integer" xref="S2.SS2.p3.4.m1.2.2.2.2.1.1.1.3">2</cn></apply><csymbol cd="latexml" id="S2.SS2.p3.4.m1.2.2.2.2.3.cmml" xref="S2.SS2.p3.4.m1.2.2.2.2.3">top</csymbol></apply></apply><cn id="S2.SS2.p3.4.m1.2.2.4.cmml" type="integer" xref="S2.SS2.p3.4.m1.2.2.4">0</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p3.4.m1.2c">\textrm{softmax}\left(\boldsymbol{y}_{t}^{2}\right)\textrm{softmax}\left(\hat{% \boldsymbol{f}}_{t}^{2}\right)^{\top}\geq 0</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p3.4.m1.2d">softmax ( bold_italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) softmax ( over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ≥ 0</annotation></semantics></math>, it can be used as a similarity metric. Larger values denote higher similarity. 
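</p> <p class="ltx_p">As a minimal sketch of Eqn. 3 (not the authors' implementation), the pure-Python code below evaluates the product from right to left, so that only a C-by-C matrix is ever formed, and compares it with the explicit route that builds the full similarity map first. The function names, the nested-list layout (N positions by C channels, with N playing the role of the sequence length in the text) and the toy values are assumptions made for illustration only.</p>
<pre class="ltx_verbatim">
```python
import math


def softmax(values):
    """Numerically stable softmax of a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def softmax_rows(m):
    # Normalise each position's feature vector over the channel axis.
    return [softmax(row) for row in m]


def softmax_cols(m):
    # Normalise each channel over the position axis.
    return transpose([softmax(col) for col in transpose(m)])


def efficient_attention(y, f):
    """softmax_rows(y) @ (softmax_cols(f)^T @ f), evaluated right to left."""
    context = matmul(transpose(softmax_cols(f)), f)  # C x C, cost O(C^2 N)
    return matmul(softmax_rows(y), context)          # N x C, cost O(C^2 N)


def explicit_attention(y, f):
    """Same product, but forming the N x N similarity map first."""
    similarity = matmul(softmax_rows(y), transpose(softmax_cols(f)))  # N x N
    return matmul(similarity, f)                                      # O(N^2 C)


# Toy input (hypothetical values): N = 4 positions, C = 2 channels.
y = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.4]]
f = [[0.5, -0.3], [0.2, 0.1], [-0.4, 0.6], [0.0, 0.2]]
g = efficient_attention(y, f)
print(len(g), len(g[0]))  # N rows, C columns
```
</pre>
<p class="ltx_p">Both routes give the same result up to floating-point rounding, because matrix multiplication is associative once each softmax is applied to a single factor. 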
In practice, <math alttext="\textrm{softmax}\left(\hat{\boldsymbol{f}}_{t}^{2}\right)^{\top}\hat{% \boldsymbol{f}}_{t}^{2}" class="ltx_Math" display="inline" id="S2.SS2.p3.5.m2.1"><semantics id="S2.SS2.p3.5.m2.1a"><mrow id="S2.SS2.p3.5.m2.1.1" xref="S2.SS2.p3.5.m2.1.1.cmml"><mtext id="S2.SS2.p3.5.m2.1.1.3" xref="S2.SS2.p3.5.m2.1.1.3a.cmml">softmax</mtext><mo id="S2.SS2.p3.5.m2.1.1.2" xref="S2.SS2.p3.5.m2.1.1.2.cmml"></mo><msup id="S2.SS2.p3.5.m2.1.1.1" xref="S2.SS2.p3.5.m2.1.1.1.cmml"><mrow id="S2.SS2.p3.5.m2.1.1.1.1.1" xref="S2.SS2.p3.5.m2.1.1.1.1.1.1.cmml"><mo id="S2.SS2.p3.5.m2.1.1.1.1.1.2" xref="S2.SS2.p3.5.m2.1.1.1.1.1.1.cmml">(</mo><msubsup id="S2.SS2.p3.5.m2.1.1.1.1.1.1" xref="S2.SS2.p3.5.m2.1.1.1.1.1.1.cmml"><mover accent="true" id="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.2" xref="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.2.cmml"><mi id="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.2.2" xref="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.2.2.cmml">𝒇</mi><mo id="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.2.1" xref="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.2.1.cmml">^</mo></mover><mi id="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.3" xref="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.3.cmml">t</mi><mn id="S2.SS2.p3.5.m2.1.1.1.1.1.1.3" xref="S2.SS2.p3.5.m2.1.1.1.1.1.1.3.cmml">2</mn></msubsup><mo id="S2.SS2.p3.5.m2.1.1.1.1.1.3" xref="S2.SS2.p3.5.m2.1.1.1.1.1.1.cmml">)</mo></mrow><mo id="S2.SS2.p3.5.m2.1.1.1.3" xref="S2.SS2.p3.5.m2.1.1.1.3.cmml">⊤</mo></msup><mo id="S2.SS2.p3.5.m2.1.1.2a" xref="S2.SS2.p3.5.m2.1.1.2.cmml"></mo><msubsup id="S2.SS2.p3.5.m2.1.1.4" xref="S2.SS2.p3.5.m2.1.1.4.cmml"><mover accent="true" id="S2.SS2.p3.5.m2.1.1.4.2.2" xref="S2.SS2.p3.5.m2.1.1.4.2.2.cmml"><mi id="S2.SS2.p3.5.m2.1.1.4.2.2.2" xref="S2.SS2.p3.5.m2.1.1.4.2.2.2.cmml">𝒇</mi><mo id="S2.SS2.p3.5.m2.1.1.4.2.2.1" xref="S2.SS2.p3.5.m2.1.1.4.2.2.1.cmml">^</mo></mover><mi id="S2.SS2.p3.5.m2.1.1.4.2.3" xref="S2.SS2.p3.5.m2.1.1.4.2.3.cmml">t</mi><mn id="S2.SS2.p3.5.m2.1.1.4.3" xref="S2.SS2.p3.5.m2.1.1.4.3.cmml">2</mn></msubsup></mrow><annotation-xml encoding="MathML-Content" 
id="S2.SS2.p3.5.m2.1b"><apply id="S2.SS2.p3.5.m2.1.1.cmml" xref="S2.SS2.p3.5.m2.1.1"><times id="S2.SS2.p3.5.m2.1.1.2.cmml" xref="S2.SS2.p3.5.m2.1.1.2"></times><ci id="S2.SS2.p3.5.m2.1.1.3a.cmml" xref="S2.SS2.p3.5.m2.1.1.3"><mtext id="S2.SS2.p3.5.m2.1.1.3.cmml" xref="S2.SS2.p3.5.m2.1.1.3">softmax</mtext></ci><apply id="S2.SS2.p3.5.m2.1.1.1.cmml" xref="S2.SS2.p3.5.m2.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p3.5.m2.1.1.1.2.cmml" xref="S2.SS2.p3.5.m2.1.1.1">superscript</csymbol><apply id="S2.SS2.p3.5.m2.1.1.1.1.1.1.cmml" xref="S2.SS2.p3.5.m2.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p3.5.m2.1.1.1.1.1.1.1.cmml" xref="S2.SS2.p3.5.m2.1.1.1.1.1">superscript</csymbol><apply id="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.cmml" xref="S2.SS2.p3.5.m2.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.1.cmml" xref="S2.SS2.p3.5.m2.1.1.1.1.1">subscript</csymbol><apply id="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.2.cmml" xref="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.2"><ci id="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.2.1.cmml" xref="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.2.1">^</ci><ci id="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.2.2.cmml" xref="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.2.2">𝒇</ci></apply><ci id="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.3.cmml" xref="S2.SS2.p3.5.m2.1.1.1.1.1.1.2.3">𝑡</ci></apply><cn id="S2.SS2.p3.5.m2.1.1.1.1.1.1.3.cmml" type="integer" xref="S2.SS2.p3.5.m2.1.1.1.1.1.1.3">2</cn></apply><csymbol cd="latexml" id="S2.SS2.p3.5.m2.1.1.1.3.cmml" xref="S2.SS2.p3.5.m2.1.1.1.3">top</csymbol></apply><apply id="S2.SS2.p3.5.m2.1.1.4.cmml" xref="S2.SS2.p3.5.m2.1.1.4"><csymbol cd="ambiguous" id="S2.SS2.p3.5.m2.1.1.4.1.cmml" xref="S2.SS2.p3.5.m2.1.1.4">superscript</csymbol><apply id="S2.SS2.p3.5.m2.1.1.4.2.cmml" xref="S2.SS2.p3.5.m2.1.1.4"><csymbol cd="ambiguous" id="S2.SS2.p3.5.m2.1.1.4.2.1.cmml" xref="S2.SS2.p3.5.m2.1.1.4">subscript</csymbol><apply id="S2.SS2.p3.5.m2.1.1.4.2.2.cmml" xref="S2.SS2.p3.5.m2.1.1.4.2.2"><ci id="S2.SS2.p3.5.m2.1.1.4.2.2.1.cmml" xref="S2.SS2.p3.5.m2.1.1.4.2.2.1">^</ci><ci id="S2.SS2.p3.5.m2.1.1.4.2.2.2.cmml" 
xref="S2.SS2.p3.5.m2.1.1.4.2.2.2">𝒇</ci></apply><ci id="S2.SS2.p3.5.m2.1.1.4.2.3.cmml" xref="S2.SS2.p3.5.m2.1.1.4.2.3">𝑡</ci></apply><cn id="S2.SS2.p3.5.m2.1.1.4.3.cmml" type="integer" xref="S2.SS2.p3.5.m2.1.1.4.3">2</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p3.5.m2.1c">\textrm{softmax}\left(\hat{\boldsymbol{f}}_{t}^{2}\right)^{\top}\hat{% \boldsymbol{f}}_{t}^{2}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p3.5.m2.1d">softmax ( over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over^ start_ARG bold_italic_f end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT</annotation></semantics></math> is computed first, resulting in the <math alttext="O(C^{2}L)" class="ltx_Math" display="inline" id="S2.SS2.p3.6.m3.1"><semantics id="S2.SS2.p3.6.m3.1a"><mrow id="S2.SS2.p3.6.m3.1.1" xref="S2.SS2.p3.6.m3.1.1.cmml"><mi id="S2.SS2.p3.6.m3.1.1.3" xref="S2.SS2.p3.6.m3.1.1.3.cmml">O</mi><mo id="S2.SS2.p3.6.m3.1.1.2" xref="S2.SS2.p3.6.m3.1.1.2.cmml"></mo><mrow id="S2.SS2.p3.6.m3.1.1.1.1" xref="S2.SS2.p3.6.m3.1.1.1.1.1.cmml"><mo id="S2.SS2.p3.6.m3.1.1.1.1.2" stretchy="false" xref="S2.SS2.p3.6.m3.1.1.1.1.1.cmml">(</mo><mrow id="S2.SS2.p3.6.m3.1.1.1.1.1" xref="S2.SS2.p3.6.m3.1.1.1.1.1.cmml"><msup id="S2.SS2.p3.6.m3.1.1.1.1.1.2" xref="S2.SS2.p3.6.m3.1.1.1.1.1.2.cmml"><mi id="S2.SS2.p3.6.m3.1.1.1.1.1.2.2" xref="S2.SS2.p3.6.m3.1.1.1.1.1.2.2.cmml">C</mi><mn id="S2.SS2.p3.6.m3.1.1.1.1.1.2.3" xref="S2.SS2.p3.6.m3.1.1.1.1.1.2.3.cmml">2</mn></msup><mo id="S2.SS2.p3.6.m3.1.1.1.1.1.1" xref="S2.SS2.p3.6.m3.1.1.1.1.1.1.cmml"></mo><mi id="S2.SS2.p3.6.m3.1.1.1.1.1.3" xref="S2.SS2.p3.6.m3.1.1.1.1.1.3.cmml">L</mi></mrow><mo id="S2.SS2.p3.6.m3.1.1.1.1.3" stretchy="false" xref="S2.SS2.p3.6.m3.1.1.1.1.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" 
id="S2.SS2.p3.6.m3.1b"><apply id="S2.SS2.p3.6.m3.1.1.cmml" xref="S2.SS2.p3.6.m3.1.1"><times id="S2.SS2.p3.6.m3.1.1.2.cmml" xref="S2.SS2.p3.6.m3.1.1.2"></times><ci id="S2.SS2.p3.6.m3.1.1.3.cmml" xref="S2.SS2.p3.6.m3.1.1.3">𝑂</ci><apply id="S2.SS2.p3.6.m3.1.1.1.1.1.cmml" xref="S2.SS2.p3.6.m3.1.1.1.1"><times id="S2.SS2.p3.6.m3.1.1.1.1.1.1.cmml" xref="S2.SS2.p3.6.m3.1.1.1.1.1.1"></times><apply id="S2.SS2.p3.6.m3.1.1.1.1.1.2.cmml" xref="S2.SS2.p3.6.m3.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S2.SS2.p3.6.m3.1.1.1.1.1.2.1.cmml" xref="S2.SS2.p3.6.m3.1.1.1.1.1.2">superscript</csymbol><ci id="S2.SS2.p3.6.m3.1.1.1.1.1.2.2.cmml" xref="S2.SS2.p3.6.m3.1.1.1.1.1.2.2">𝐶</ci><cn id="S2.SS2.p3.6.m3.1.1.1.1.1.2.3.cmml" type="integer" xref="S2.SS2.p3.6.m3.1.1.1.1.1.2.3">2</cn></apply><ci id="S2.SS2.p3.6.m3.1.1.1.1.1.3.cmml" xref="S2.SS2.p3.6.m3.1.1.1.1.1.3">𝐿</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p3.6.m3.1c">O(C^{2}L)</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p3.6.m3.1d">italic_O ( italic_C start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_L )</annotation></semantics></math> complexity, which is linear rather than quadratic in L.</p> </div> <div class="ltx_para" id="S2.SS2.p4"> <p class="ltx_p" id="S2.SS2.p4.1">During decompression, the attention-based global compensation is also conducted. When conducting global motion compensation, the overall process is</p> <table class="ltx_equation ltx_eqn_table" id="S2.E4"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\hat{\boldsymbol{x}}_{t}=D_{C}\left(\left\lfloor E_{C}(\boldsymbol{x}_{t}|\hat% {\boldsymbol{g}}_{t}^{0},\hat{\boldsymbol{g}}_{t}^{1},\hat{\boldsymbol{g}}_{t}% ^{2})\right\rceil|\hat{\boldsymbol{g}}_{t}^{0},\hat{\boldsymbol{g}}_{t}^{1},% \hat{\boldsymbol{g}}_{t}^{2}\right)." 
class="ltx_math_unparsed" display="block" id="S2.E4.m1.1"><semantics id="S2.E4.m1.1a"><mrow id="S2.E4.m1.1b"><msub id="S2.E4.m1.1.1"><mover accent="true" id="S2.E4.m1.1.1.2"><mi id="S2.E4.m1.1.1.2.2">𝒙</mi><mo id="S2.E4.m1.1.1.2.1">^</mo></mover><mi id="S2.E4.m1.1.1.3">t</mi></msub><mo id="S2.E4.m1.1.2">=</mo><msub id="S2.E4.m1.1.3"><mi id="S2.E4.m1.1.3.2">D</mi><mi id="S2.E4.m1.1.3.3">C</mi></msub><mrow id="S2.E4.m1.1.4"><mo id="S2.E4.m1.1.4.1">(</mo><mrow id="S2.E4.m1.1.4.2"><mo id="S2.E4.m1.1.4.2.1">⌊</mo><msub id="S2.E4.m1.1.4.2.2"><mi id="S2.E4.m1.1.4.2.2.2">E</mi><mi id="S2.E4.m1.1.4.2.2.3">C</mi></msub><mrow id="S2.E4.m1.1.4.2.3"><mo id="S2.E4.m1.1.4.2.3.1" stretchy="false">(</mo><msub id="S2.E4.m1.1.4.2.3.2"><mi id="S2.E4.m1.1.4.2.3.2.2">𝒙</mi><mi id="S2.E4.m1.1.4.2.3.2.3">t</mi></msub><mo fence="false" id="S2.E4.m1.1.4.2.3.3" rspace="0.167em" stretchy="false">|</mo><msubsup id="S2.E4.m1.1.4.2.3.4"><mover accent="true" id="S2.E4.m1.1.4.2.3.4.2.2"><mi id="S2.E4.m1.1.4.2.3.4.2.2.2">𝒈</mi><mo id="S2.E4.m1.1.4.2.3.4.2.2.1">^</mo></mover><mi id="S2.E4.m1.1.4.2.3.4.2.3">t</mi><mn id="S2.E4.m1.1.4.2.3.4.3">0</mn></msubsup><mo id="S2.E4.m1.1.4.2.3.5">,</mo><msubsup id="S2.E4.m1.1.4.2.3.6"><mover accent="true" id="S2.E4.m1.1.4.2.3.6.2.2"><mi id="S2.E4.m1.1.4.2.3.6.2.2.2">𝒈</mi><mo id="S2.E4.m1.1.4.2.3.6.2.2.1">^</mo></mover><mi id="S2.E4.m1.1.4.2.3.6.2.3">t</mi><mn id="S2.E4.m1.1.4.2.3.6.3">1</mn></msubsup><mo id="S2.E4.m1.1.4.2.3.7">,</mo><msubsup id="S2.E4.m1.1.4.2.3.8"><mover accent="true" id="S2.E4.m1.1.4.2.3.8.2.2"><mi id="S2.E4.m1.1.4.2.3.8.2.2.2">𝒈</mi><mo id="S2.E4.m1.1.4.2.3.8.2.2.1">^</mo></mover><mi id="S2.E4.m1.1.4.2.3.8.2.3">t</mi><mn id="S2.E4.m1.1.4.2.3.8.3">2</mn></msubsup><mo id="S2.E4.m1.1.4.2.3.9" stretchy="false">)</mo></mrow><mo id="S2.E4.m1.1.4.2.4">⌉</mo></mrow><mo fence="false" id="S2.E4.m1.1.4.3" rspace="0.167em" stretchy="false">|</mo><msubsup id="S2.E4.m1.1.4.4"><mover accent="true" id="S2.E4.m1.1.4.4.2.2"><mi 
id="S2.E4.m1.1.4.4.2.2.2">𝒈</mi><mo id="S2.E4.m1.1.4.4.2.2.1">^</mo></mover><mi id="S2.E4.m1.1.4.4.2.3">t</mi><mn id="S2.E4.m1.1.4.4.3">0</mn></msubsup><mo id="S2.E4.m1.1.4.5">,</mo><msubsup id="S2.E4.m1.1.4.6"><mover accent="true" id="S2.E4.m1.1.4.6.2.2"><mi id="S2.E4.m1.1.4.6.2.2.2">𝒈</mi><mo id="S2.E4.m1.1.4.6.2.2.1">^</mo></mover><mi id="S2.E4.m1.1.4.6.2.3">t</mi><mn id="S2.E4.m1.1.4.6.3">1</mn></msubsup><mo id="S2.E4.m1.1.4.7">,</mo><msubsup id="S2.E4.m1.1.4.8"><mover accent="true" id="S2.E4.m1.1.4.8.2.2"><mi id="S2.E4.m1.1.4.8.2.2.2">𝒈</mi><mo id="S2.E4.m1.1.4.8.2.2.1">^</mo></mover><mi id="S2.E4.m1.1.4.8.2.3">t</mi><mn id="S2.E4.m1.1.4.8.3">2</mn></msubsup><mo id="S2.E4.m1.1.4.9">)</mo></mrow><mo id="S2.E4.m1.1.5" lspace="0em">.</mo></mrow><annotation encoding="application/x-tex" id="S2.E4.m1.1c">\hat{\boldsymbol{x}}_{t}=D_{C}\left(\left\lfloor E_{C}(\boldsymbol{x}_{t}|\hat% {\boldsymbol{g}}_{t}^{0},\hat{\boldsymbol{g}}_{t}^{1},\hat{\boldsymbol{g}}_{t}% ^{2})\right\rceil|\hat{\boldsymbol{g}}_{t}^{0},\hat{\boldsymbol{g}}_{t}^{1},% \hat{\boldsymbol{g}}_{t}^{2}\right).</annotation><annotation encoding="application/x-llamapun" id="S2.E4.m1.1d">over^ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = italic_D start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( ⌊ italic_E start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) ⌉ | over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_g end_ARG 
start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) .</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(4)</span></td> </tr></tbody> </table> </div> <figure class="ltx_figure" id="S2.F5"> <div class="ltx_flex_figure"> <div class="ltx_flex_cell ltx_flex_size_3"> <figure class="ltx_figure ltx_flex_size_3 ltx_align_center" id="S2.F5.sf1"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="170" id="S2.F5.sf1.g1" src="x9.png" width="203"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">(a) </span></figcaption> </figure> </div> <div class="ltx_flex_cell ltx_flex_size_3"> <figure class="ltx_figure ltx_flex_size_3 ltx_align_center" id="S2.F5.sf2"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="170" id="S2.F5.sf2.g1" src="x10.png" width="204"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">(b) </span></figcaption> </figure> </div> <div class="ltx_flex_cell ltx_flex_size_3"> <figure class="ltx_figure ltx_flex_size_3 ltx_align_center" id="S2.F5.sf3"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="170" id="S2.F5.sf3.g1" src="x11.png" width="211"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">(c) </span></figcaption> </figure> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_2"> <figure class="ltx_figure ltx_flex_size_2 ltx_align_center" id="S2.F5.sf4"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="170" id="S2.F5.sf4.g1" src="x12.png" width="211"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">(d) 
</span></figcaption> </figure> </div> <div class="ltx_flex_cell ltx_flex_size_2"> <figure class="ltx_figure ltx_flex_size_2 ltx_align_center" id="S2.F5.sf5"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="170" id="S2.F5.sf5.g1" src="x13.png" width="203"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">(e) </span></figcaption> </figure> </div> </div> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text ltx_font_bold" id="S2.F5.2.1.1">Fig. 5</span>: </span>Rate-distortion performance of the proposed LVC-LGMC, DCVC-TCM <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib9" title="">9</a>]</cite>, DCVC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib2" title="">2</a>]</cite>, DVCPro <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib15" title="">15</a>]</cite>, HM-16.20, and x265. The distortion metric is MS-SSIM <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib17" title="">17</a>]</cite>. Please zoom in for a better view. 
</figcaption> </figure> <figure class="ltx_table" id="S2.T1"> <table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle" id="S2.T1.2"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S2.T1.2.1.1"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_r ltx_border_tt" id="S2.T1.2.1.1.1" rowspan="2"> <span class="ltx_text" id="S2.T1.2.1.1.1.1" style="font-size:80%;">Methods</span><span class="ltx_text" id="S2.T1.2.1.1.1.2" style="font-size:80%;"></span> </th> <td class="ltx_td ltx_align_center ltx_border_tt" colspan="2" id="S2.T1.2.1.1.2"> <span class="ltx_text" id="S2.T1.2.1.1.2.1" style="font-size:80%;">MCL-JCV </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text" id="S2.T1.2.1.1.2.2.1" style="font-size:80%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib11" title="">11</a><span class="ltx_text" id="S2.T1.2.1.1.2.3.2" style="font-size:80%;">]</span></cite><span class="ltx_text" id="S2.T1.2.1.1.2.4" style="font-size:80%;"></span> </td> <td class="ltx_td ltx_align_center ltx_border_tt" colspan="2" id="S2.T1.2.1.1.3"> <span class="ltx_text" id="S2.T1.2.1.1.3.1" style="font-size:80%;">UVG </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text" id="S2.T1.2.1.1.3.2.1" style="font-size:80%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib18" title="">18</a><span class="ltx_text" id="S2.T1.2.1.1.3.3.2" style="font-size:80%;">]</span></cite><span class="ltx_text" id="S2.T1.2.1.1.3.4" style="font-size:80%;"></span> </td> <td class="ltx_td ltx_align_center ltx_border_tt" colspan="2" id="S2.T1.2.1.1.4"><span class="ltx_text" id="S2.T1.2.1.1.4.1" style="font-size:80%;">HEVC B</span></td> <td class="ltx_td ltx_align_center ltx_border_tt" colspan="2" id="S2.T1.2.1.1.5"><span class="ltx_text" id="S2.T1.2.1.1.5.1" style="font-size:80%;">HEVC C</span></td> <td class="ltx_td ltx_align_center ltx_border_tt" colspan="2" id="S2.T1.2.1.1.6"><span class="ltx_text" 
id="S2.T1.2.1.1.6.1" style="font-size:80%;">HEVC D</span></td> </tr> <tr class="ltx_tr" id="S2.T1.2.2.2"> <td class="ltx_td ltx_align_center" id="S2.T1.2.2.2.1"><span class="ltx_text" id="S2.T1.2.2.2.1.1" style="font-size:80%;">PSNR</span></td> <td class="ltx_td ltx_align_center" id="S2.T1.2.2.2.2"><span class="ltx_text" id="S2.T1.2.2.2.2.1" style="font-size:80%;">MS-SSIM</span></td> <td class="ltx_td ltx_align_center" id="S2.T1.2.2.2.3"><span class="ltx_text" id="S2.T1.2.2.2.3.1" style="font-size:80%;">PSNR</span></td> <td class="ltx_td ltx_align_center" id="S2.T1.2.2.2.4"><span class="ltx_text" id="S2.T1.2.2.2.4.1" style="font-size:80%;">MS-SSIM</span></td> <td class="ltx_td ltx_align_center" id="S2.T1.2.2.2.5"><span class="ltx_text" id="S2.T1.2.2.2.5.1" style="font-size:80%;">PSNR</span></td> <td class="ltx_td ltx_align_center" id="S2.T1.2.2.2.6"><span class="ltx_text" id="S2.T1.2.2.2.6.1" style="font-size:80%;">MS-SSIM</span></td> <td class="ltx_td ltx_align_center" id="S2.T1.2.2.2.7"><span class="ltx_text" id="S2.T1.2.2.2.7.1" style="font-size:80%;">PSNR</span></td> <td class="ltx_td ltx_align_center" id="S2.T1.2.2.2.8"><span class="ltx_text" id="S2.T1.2.2.2.8.1" style="font-size:80%;">MS-SSIM</span></td> <td class="ltx_td ltx_align_center" id="S2.T1.2.2.2.9"><span class="ltx_text" id="S2.T1.2.2.2.9.1" style="font-size:80%;">PSNR</span></td> <td class="ltx_td ltx_align_center" id="S2.T1.2.2.2.10"><span class="ltx_text" id="S2.T1.2.2.2.10.1" style="font-size:80%;">MS-SSIM</span></td> </tr> <tr class="ltx_tr" id="S2.T1.2.3.3"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S2.T1.2.3.3.1"><span class="ltx_text" id="S2.T1.2.3.3.1.1" style="font-size:80%;">HM-16.20</span></th> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.3.3.2"><span class="ltx_text" id="S2.T1.2.3.3.2.1" style="font-size:80%;">0.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.3.3.3"><span class="ltx_text" 
id="S2.T1.2.3.3.3.1" style="font-size:80%;">0.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.3.3.4"><span class="ltx_text" id="S2.T1.2.3.3.4.1" style="font-size:80%;">0.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.3.3.5"><span class="ltx_text" id="S2.T1.2.3.3.5.1" style="font-size:80%;">0.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.3.3.6"><span class="ltx_text" id="S2.T1.2.3.3.6.1" style="font-size:80%;">0.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.3.3.7"><span class="ltx_text" id="S2.T1.2.3.3.7.1" style="font-size:80%;">0.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.3.3.8"><span class="ltx_text" id="S2.T1.2.3.3.8.1" style="font-size:80%;">0.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.3.3.9"><span class="ltx_text" id="S2.T1.2.3.3.9.1" style="font-size:80%;">0.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.3.3.10"><span class="ltx_text" id="S2.T1.2.3.3.10.1" style="font-size:80%;">0.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.3.3.11"><span class="ltx_text" id="S2.T1.2.3.3.11.1" style="font-size:80%;">0.0</span></td> </tr> <tr class="ltx_tr" id="S2.T1.2.4.4"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S2.T1.2.4.4.1"> <span class="ltx_text" id="S2.T1.2.4.4.1.1" style="font-size:80%;">DVCPro </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text" id="S2.T1.2.4.4.1.2.1" style="font-size:80%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib15" title="">15</a><span class="ltx_text" id="S2.T1.2.4.4.1.3.2" style="font-size:80%;">]</span></cite><span class="ltx_text" id="S2.T1.2.4.4.1.4" style="font-size:80%;"> (TPAMI’20)</span> </th> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.4.4.2"><span class="ltx_text" id="S2.T1.2.4.4.2.1" 
style="font-size:80%;">99.3</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.4.4.3"><span class="ltx_text" id="S2.T1.2.4.4.3.1" style="font-size:80%;">7.8</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.4.4.4"><span class="ltx_text" id="S2.T1.2.4.4.4.1" style="font-size:80%;">137.7</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.4.4.5"><span class="ltx_text" id="S2.T1.2.4.4.5.1" style="font-size:80%;">36.2</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.4.4.6"><span class="ltx_text" id="S2.T1.2.4.4.6.1" style="font-size:80%;">123.7</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.4.4.7"><span class="ltx_text" id="S2.T1.2.4.4.7.1" style="font-size:80%;">23.5</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.4.4.8"><span class="ltx_text" id="S2.T1.2.4.4.8.1" style="font-size:80%;">124.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.4.4.9"><span class="ltx_text" id="S2.T1.2.4.4.9.1" style="font-size:80%;">17.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.4.4.10"><span class="ltx_text" id="S2.T1.2.4.4.10.1" style="font-size:80%;">93.6</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.4.4.11"><span class="ltx_text" id="S2.T1.2.4.4.11.1" style="font-size:80%;">-7.8</span></td> </tr> <tr class="ltx_tr" id="S2.T1.2.5.5"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S2.T1.2.5.5.1"> <span class="ltx_text" id="S2.T1.2.5.5.1.1" style="font-size:80%;">RLVC </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text" id="S2.T1.2.5.5.1.2.1" style="font-size:80%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib19" title="">19</a><span class="ltx_text" id="S2.T1.2.5.5.1.3.2" style="font-size:80%;">]</span></cite><span class="ltx_text" id="S2.T1.2.5.5.1.4" style="font-size:80%;"> 
(CVPR’20)</span> </th> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.5.5.2"><span class="ltx_text" id="S2.T1.2.5.5.2.1" style="font-size:80%;">124.8</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.5.5.3"><span class="ltx_text" id="S2.T1.2.5.5.3.1" style="font-size:80%;">34.5</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.5.5.4"><span class="ltx_text" id="S2.T1.2.5.5.4.1" style="font-size:80%;">140.1</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.5.5.5"><span class="ltx_text" id="S2.T1.2.5.5.5.1" style="font-size:80%;">49.4</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.5.5.6"><span class="ltx_text" id="S2.T1.2.5.5.6.1" style="font-size:80%;">122.6</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.5.5.7"><span class="ltx_text" id="S2.T1.2.5.5.7.1" style="font-size:80%;">28.3</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.5.5.8"><span class="ltx_text" id="S2.T1.2.5.5.8.1" style="font-size:80%;">118.9</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.5.5.9"><span class="ltx_text" id="S2.T1.2.5.5.9.1" style="font-size:80%;">30.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.5.5.10"><span class="ltx_text" id="S2.T1.2.5.5.10.1" style="font-size:80%;">81.2</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.5.5.11"><span class="ltx_text" id="S2.T1.2.5.5.11.1" style="font-size:80%;">0.2</span></td> </tr> <tr class="ltx_tr" id="S2.T1.2.6.6"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S2.T1.2.6.6.1"> <span class="ltx_text" id="S2.T1.2.6.6.1.1" style="font-size:80%;">MLVC </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text" id="S2.T1.2.6.6.1.2.1" style="font-size:80%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib20" title="">20</a><span 
class="ltx_text" id="S2.T1.2.6.6.1.3.2" style="font-size:80%;">]</span></cite><span class="ltx_text" id="S2.T1.2.6.6.1.4" style="font-size:80%;"> (CVPR’20)</span> </th> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.6.6.2"><span class="ltx_text" id="S2.T1.2.6.6.2.1" style="font-size:80%;">66.8</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.6.6.3"><span class="ltx_text" id="S2.T1.2.6.6.3.1" style="font-size:80%;">50.3</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.6.6.4"><span class="ltx_text" id="S2.T1.2.6.6.4.1" style="font-size:80%;">66.5</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.6.6.5"><span class="ltx_text" id="S2.T1.2.6.6.5.1" style="font-size:80%;">64.7</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.6.6.6"><span class="ltx_text" id="S2.T1.2.6.6.6.1" style="font-size:80%;">61.4</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.6.6.7"><span class="ltx_text" id="S2.T1.2.6.6.7.1" style="font-size:80%;">50.2</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.6.6.8"><span class="ltx_text" id="S2.T1.2.6.6.8.1" style="font-size:80%;">124.1</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.6.6.9"><span class="ltx_text" id="S2.T1.2.6.6.9.1" style="font-size:80%;">53.1</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.6.6.10"><span class="ltx_text" id="S2.T1.2.6.6.10.1" style="font-size:80%;">96.1</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.6.6.11"><span class="ltx_text" id="S2.T1.2.6.6.11.1" style="font-size:80%;">40.4</span></td> </tr> <tr class="ltx_tr" id="S2.T1.2.7.7"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S2.T1.2.7.7.1"> <span class="ltx_text" id="S2.T1.2.7.7.1.1" style="font-size:80%;">DCVC </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text" 
id="S2.T1.2.7.7.1.2.1" style="font-size:80%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib2" title="">2</a><span class="ltx_text" id="S2.T1.2.7.7.1.3.2" style="font-size:80%;">]</span></cite><span class="ltx_text" id="S2.T1.2.7.7.1.4" style="font-size:80%;"> (NeurIPS’21)</span> </th> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.7.7.2"><span class="ltx_text" id="S2.T1.2.7.7.2.1" style="font-size:80%;">42.8</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.7.7.3"><span class="ltx_text" id="S2.T1.2.7.7.3.1" style="font-size:80%;">-16.3</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.7.7.4"><span class="ltx_text" id="S2.T1.2.7.7.4.1" style="font-size:80%;">67.3</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.7.7.5"><span class="ltx_text" id="S2.T1.2.7.7.5.1" style="font-size:80%;">9.2</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.7.7.6"><span class="ltx_text" id="S2.T1.2.7.7.6.1" style="font-size:80%;">56.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.7.7.7"><span class="ltx_text" id="S2.T1.2.7.7.7.1" style="font-size:80%;">0.9</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.7.7.8"><span class="ltx_text" id="S2.T1.2.7.7.8.1" style="font-size:80%;">76.9</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.7.7.9"><span class="ltx_text" id="S2.T1.2.7.7.9.1" style="font-size:80%;">-8.9</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.7.7.10"><span class="ltx_text" id="S2.T1.2.7.7.10.1" style="font-size:80%;">52.8</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.7.7.11"><span class="ltx_text" id="S2.T1.2.7.7.11.1" style="font-size:80%;">-24.2</span></td> </tr> <tr class="ltx_tr" id="S2.T1.2.8.8"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S2.T1.2.8.8.1"> <span 
class="ltx_text" id="S2.T1.2.8.8.1.1" style="font-size:80%;">CANF-VC </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text" id="S2.T1.2.8.8.1.2.1" style="font-size:80%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib21" title="">21</a><span class="ltx_text" id="S2.T1.2.8.8.1.3.2" style="font-size:80%;">]</span></cite><span class="ltx_text" id="S2.T1.2.8.8.1.4" style="font-size:80%;"> (ECCV’22)</span> </th> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.8.8.2"><span class="ltx_text" id="S2.T1.2.8.8.2.1" style="font-size:80%;">8.5</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.8.8.3"><span class="ltx_text" id="S2.T1.2.8.8.3.1" style="font-size:80%;">-21.9</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.8.8.4"><span class="ltx_text" id="S2.T1.2.8.8.4.1" style="font-size:80%;">6.2</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.8.8.5"><span class="ltx_text" id="S2.T1.2.8.8.5.1" style="font-size:80%;">-4.4</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.8.8.6"><span class="ltx_text" id="S2.T1.2.8.8.6.1" style="font-size:80%;">9.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.8.8.7"><span class="ltx_text" id="S2.T1.2.8.8.7.1" style="font-size:80%;">-6.8</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.8.8.8"><span class="ltx_text" id="S2.T1.2.8.8.8.1" style="font-size:80%;">21.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.8.8.9"><span class="ltx_text" id="S2.T1.2.8.8.9.1" style="font-size:80%;">-9.4</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.8.8.10"><span class="ltx_text" id="S2.T1.2.8.8.10.1" style="font-size:80%;">12.5</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.8.8.11"><span class="ltx_text" id="S2.T1.2.8.8.11.1" style="font-size:80%;">-18.1</span></td> </tr> <tr 
class="ltx_tr" id="S2.T1.2.9.9"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S2.T1.2.9.9.1"> <span class="ltx_text" id="S2.T1.2.9.9.1.1" style="font-size:80%;">DCVC-TCM </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text" id="S2.T1.2.9.9.1.2.1" style="font-size:80%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib9" title="">9</a><span class="ltx_text" id="S2.T1.2.9.9.1.3.2" style="font-size:80%;">]</span></cite><span class="ltx_text" id="S2.T1.2.9.9.1.4" style="font-size:80%;"> (TMM’22)</span> </th> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.9.9.2"><span class="ltx_text" id="S2.T1.2.9.9.2.1" style="font-size:80%;">-3.2</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.9.9.3"><span class="ltx_text" id="S2.T1.2.9.9.3.1" style="font-size:80%;">-38.3</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.9.9.4"><span class="ltx_text" id="S2.T1.2.9.9.4.1" style="font-size:80%;">-9.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.9.9.5"><span class="ltx_text" id="S2.T1.2.9.9.5.1" style="font-size:80%;">-25.5</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.9.9.6"><span class="ltx_text" id="S2.T1.2.9.9.6.1" style="font-size:80%;">-5.3</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.9.9.7"><span class="ltx_text" id="S2.T1.2.9.9.7.1" style="font-size:80%;">-40.8</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.9.9.8"><span class="ltx_text" id="S2.T1.2.9.9.8.1" style="font-size:80%;">15.1</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.9.9.9"><span class="ltx_text" id="S2.T1.2.9.9.9.1" style="font-size:80%;">-42.4</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.2.9.9.10"><span class="ltx_text" id="S2.T1.2.9.9.10.1" style="font-size:80%;">-5.4</span></td> <td class="ltx_td 
ltx_align_center ltx_border_t" id="S2.T1.2.9.9.11"><span class="ltx_text" id="S2.T1.2.9.9.11.1" style="font-size:80%;">-52.6</span></td> </tr> <tr class="ltx_tr" id="S2.T1.2.10.10"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_bb ltx_border_r ltx_border_t" id="S2.T1.2.10.10.1"><span class="ltx_text" id="S2.T1.2.10.10.1.1" style="font-size:80%;">LVC-LGMC (Ours)</span></th> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S2.T1.2.10.10.2"><span class="ltx_text ltx_font_bold" id="S2.T1.2.10.10.2.1" style="font-size:80%;">-13.0</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S2.T1.2.10.10.3"><span class="ltx_text ltx_font_bold" id="S2.T1.2.10.10.3.1" style="font-size:80%;">-44.3</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S2.T1.2.10.10.4"><span class="ltx_text ltx_font_bold" id="S2.T1.2.10.10.4.1" style="font-size:80%;">-13.2</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S2.T1.2.10.10.5"><span class="ltx_text ltx_font_bold" id="S2.T1.2.10.10.5.1" style="font-size:80%;">-32.4</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S2.T1.2.10.10.6"><span class="ltx_text ltx_font_bold" id="S2.T1.2.10.10.6.1" style="font-size:80%;">-13.1</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S2.T1.2.10.10.7"><span class="ltx_text ltx_font_bold" id="S2.T1.2.10.10.7.1" style="font-size:80%;">-44.8</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S2.T1.2.10.10.8"><span class="ltx_text ltx_font_bold" id="S2.T1.2.10.10.8.1" style="font-size:80%;">-5.3</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S2.T1.2.10.10.9"><span class="ltx_text ltx_font_bold" id="S2.T1.2.10.10.9.1" style="font-size:80%;">-48.4</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S2.T1.2.10.10.10"><span class="ltx_text ltx_font_bold" 
id="S2.T1.2.10.10.10.1" style="font-size:80%;">-19.1</span></td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S2.T1.2.10.10.11"><span class="ltx_text ltx_font_bold" id="S2.T1.2.10.10.11.1" style="font-size:80%;">-56.3</span></td> </tr> </tbody> </table> <figcaption class="ltx_caption ltx_centering" style="font-size:80%;"><span class="ltx_tag ltx_tag_table"><span class="ltx_text ltx_font_bold" id="S2.T1.8.1.1">Table 1</span>: </span>BD-rate (%) for PSNR and MS-SSIM <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib17" title="">17</a>]</cite>, with HM-16.20 as the anchor. Negative values indicate bitrate savings. </figcaption> </figure> </section> <section class="ltx_subsection" id="S2.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">2.3 </span>Joint Local and Global Motion Compensation for Learned Video Compression</h3> <div class="ltx_para" id="S2.SS3.p1"> <p class="ltx_p" id="S2.SS3.p1.2">Our proposed joint local and global motion compensation (LGMC) is illustrated in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S2.F2" title="Figure 2 ‣ 2.2 Attention-based Global Compensation ‣ 2 The Proposed LVC-LGMC Method ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_tag">2</span></a> and Fig. <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S2.F3" title="Figure 3 ‣ 2.2 Attention-based Global Compensation ‣ 2 The Proposed LVC-LGMC Method ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_tag">3</span></a>. 
Flow-based warping is used to obtain local contexts <math alttext="\hat{\boldsymbol{l}}_{t}^{0},\hat{\boldsymbol{l}}_{t}^{1},\hat{\boldsymbol{l}}% _{t}^{2}" class="ltx_Math" display="inline" id="S2.SS3.p1.1.m1.3"><semantics id="S2.SS3.p1.1.m1.3a"><mrow id="S2.SS3.p1.1.m1.3.3.3" xref="S2.SS3.p1.1.m1.3.3.4.cmml"><msubsup id="S2.SS3.p1.1.m1.1.1.1.1" xref="S2.SS3.p1.1.m1.1.1.1.1.cmml"><mover accent="true" id="S2.SS3.p1.1.m1.1.1.1.1.2.2" xref="S2.SS3.p1.1.m1.1.1.1.1.2.2.cmml"><mi id="S2.SS3.p1.1.m1.1.1.1.1.2.2.2" xref="S2.SS3.p1.1.m1.1.1.1.1.2.2.2.cmml">𝒍</mi><mo id="S2.SS3.p1.1.m1.1.1.1.1.2.2.1" xref="S2.SS3.p1.1.m1.1.1.1.1.2.2.1.cmml">^</mo></mover><mi id="S2.SS3.p1.1.m1.1.1.1.1.2.3" xref="S2.SS3.p1.1.m1.1.1.1.1.2.3.cmml">t</mi><mn id="S2.SS3.p1.1.m1.1.1.1.1.3" xref="S2.SS3.p1.1.m1.1.1.1.1.3.cmml">0</mn></msubsup><mo id="S2.SS3.p1.1.m1.3.3.3.4" xref="S2.SS3.p1.1.m1.3.3.4.cmml">,</mo><msubsup id="S2.SS3.p1.1.m1.2.2.2.2" xref="S2.SS3.p1.1.m1.2.2.2.2.cmml"><mover accent="true" id="S2.SS3.p1.1.m1.2.2.2.2.2.2" xref="S2.SS3.p1.1.m1.2.2.2.2.2.2.cmml"><mi id="S2.SS3.p1.1.m1.2.2.2.2.2.2.2" xref="S2.SS3.p1.1.m1.2.2.2.2.2.2.2.cmml">𝒍</mi><mo id="S2.SS3.p1.1.m1.2.2.2.2.2.2.1" xref="S2.SS3.p1.1.m1.2.2.2.2.2.2.1.cmml">^</mo></mover><mi id="S2.SS3.p1.1.m1.2.2.2.2.2.3" xref="S2.SS3.p1.1.m1.2.2.2.2.2.3.cmml">t</mi><mn id="S2.SS3.p1.1.m1.2.2.2.2.3" xref="S2.SS3.p1.1.m1.2.2.2.2.3.cmml">1</mn></msubsup><mo id="S2.SS3.p1.1.m1.3.3.3.5" xref="S2.SS3.p1.1.m1.3.3.4.cmml">,</mo><msubsup id="S2.SS3.p1.1.m1.3.3.3.3" xref="S2.SS3.p1.1.m1.3.3.3.3.cmml"><mover accent="true" id="S2.SS3.p1.1.m1.3.3.3.3.2.2" xref="S2.SS3.p1.1.m1.3.3.3.3.2.2.cmml"><mi id="S2.SS3.p1.1.m1.3.3.3.3.2.2.2" xref="S2.SS3.p1.1.m1.3.3.3.3.2.2.2.cmml">𝒍</mi><mo id="S2.SS3.p1.1.m1.3.3.3.3.2.2.1" xref="S2.SS3.p1.1.m1.3.3.3.3.2.2.1.cmml">^</mo></mover><mi id="S2.SS3.p1.1.m1.3.3.3.3.2.3" xref="S2.SS3.p1.1.m1.3.3.3.3.2.3.cmml">t</mi><mn id="S2.SS3.p1.1.m1.3.3.3.3.3" 
xref="S2.SS3.p1.1.m1.3.3.3.3.3.cmml">2</mn></msubsup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.1.m1.3b"><list id="S2.SS3.p1.1.m1.3.3.4.cmml" xref="S2.SS3.p1.1.m1.3.3.3"><apply id="S2.SS3.p1.1.m1.1.1.1.1.cmml" xref="S2.SS3.p1.1.m1.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS3.p1.1.m1.1.1.1.1.1.cmml" xref="S2.SS3.p1.1.m1.1.1.1.1">superscript</csymbol><apply id="S2.SS3.p1.1.m1.1.1.1.1.2.cmml" xref="S2.SS3.p1.1.m1.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS3.p1.1.m1.1.1.1.1.2.1.cmml" xref="S2.SS3.p1.1.m1.1.1.1.1">subscript</csymbol><apply id="S2.SS3.p1.1.m1.1.1.1.1.2.2.cmml" xref="S2.SS3.p1.1.m1.1.1.1.1.2.2"><ci id="S2.SS3.p1.1.m1.1.1.1.1.2.2.1.cmml" xref="S2.SS3.p1.1.m1.1.1.1.1.2.2.1">^</ci><ci id="S2.SS3.p1.1.m1.1.1.1.1.2.2.2.cmml" xref="S2.SS3.p1.1.m1.1.1.1.1.2.2.2">𝒍</ci></apply><ci id="S2.SS3.p1.1.m1.1.1.1.1.2.3.cmml" xref="S2.SS3.p1.1.m1.1.1.1.1.2.3">𝑡</ci></apply><cn id="S2.SS3.p1.1.m1.1.1.1.1.3.cmml" type="integer" xref="S2.SS3.p1.1.m1.1.1.1.1.3">0</cn></apply><apply id="S2.SS3.p1.1.m1.2.2.2.2.cmml" xref="S2.SS3.p1.1.m1.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS3.p1.1.m1.2.2.2.2.1.cmml" xref="S2.SS3.p1.1.m1.2.2.2.2">superscript</csymbol><apply id="S2.SS3.p1.1.m1.2.2.2.2.2.cmml" xref="S2.SS3.p1.1.m1.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS3.p1.1.m1.2.2.2.2.2.1.cmml" xref="S2.SS3.p1.1.m1.2.2.2.2">subscript</csymbol><apply id="S2.SS3.p1.1.m1.2.2.2.2.2.2.cmml" xref="S2.SS3.p1.1.m1.2.2.2.2.2.2"><ci id="S2.SS3.p1.1.m1.2.2.2.2.2.2.1.cmml" xref="S2.SS3.p1.1.m1.2.2.2.2.2.2.1">^</ci><ci id="S2.SS3.p1.1.m1.2.2.2.2.2.2.2.cmml" xref="S2.SS3.p1.1.m1.2.2.2.2.2.2.2">𝒍</ci></apply><ci id="S2.SS3.p1.1.m1.2.2.2.2.2.3.cmml" xref="S2.SS3.p1.1.m1.2.2.2.2.2.3">𝑡</ci></apply><cn id="S2.SS3.p1.1.m1.2.2.2.2.3.cmml" type="integer" xref="S2.SS3.p1.1.m1.2.2.2.2.3">1</cn></apply><apply id="S2.SS3.p1.1.m1.3.3.3.3.cmml" xref="S2.SS3.p1.1.m1.3.3.3.3"><csymbol cd="ambiguous" id="S2.SS3.p1.1.m1.3.3.3.3.1.cmml" xref="S2.SS3.p1.1.m1.3.3.3.3">superscript</csymbol><apply 
id="S2.SS3.p1.1.m1.3.3.3.3.2.cmml" xref="S2.SS3.p1.1.m1.3.3.3.3"><csymbol cd="ambiguous" id="S2.SS3.p1.1.m1.3.3.3.3.2.1.cmml" xref="S2.SS3.p1.1.m1.3.3.3.3">subscript</csymbol><apply id="S2.SS3.p1.1.m1.3.3.3.3.2.2.cmml" xref="S2.SS3.p1.1.m1.3.3.3.3.2.2"><ci id="S2.SS3.p1.1.m1.3.3.3.3.2.2.1.cmml" xref="S2.SS3.p1.1.m1.3.3.3.3.2.2.1">^</ci><ci id="S2.SS3.p1.1.m1.3.3.3.3.2.2.2.cmml" xref="S2.SS3.p1.1.m1.3.3.3.3.2.2.2">𝒍</ci></apply><ci id="S2.SS3.p1.1.m1.3.3.3.3.2.3.cmml" xref="S2.SS3.p1.1.m1.3.3.3.3.2.3">𝑡</ci></apply><cn id="S2.SS3.p1.1.m1.3.3.3.3.3.cmml" type="integer" xref="S2.SS3.p1.1.m1.3.3.3.3.3">2</cn></apply></list></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.1.m1.3c">\hat{\boldsymbol{l}}_{t}^{0},\hat{\boldsymbol{l}}_{t}^{1},\hat{\boldsymbol{l}}% _{t}^{2}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.1.m1.3d">over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT</annotation></semantics></math> and attention is used to obtain global contexts <math alttext="\hat{\boldsymbol{g}}_{t}^{0},\hat{\boldsymbol{g}}_{t}^{1},\hat{\boldsymbol{g}}% _{t}^{2}" class="ltx_Math" display="inline" id="S2.SS3.p1.2.m2.3"><semantics id="S2.SS3.p1.2.m2.3a"><mrow id="S2.SS3.p1.2.m2.3.3.3" xref="S2.SS3.p1.2.m2.3.3.4.cmml"><msubsup id="S2.SS3.p1.2.m2.1.1.1.1" xref="S2.SS3.p1.2.m2.1.1.1.1.cmml"><mover accent="true" id="S2.SS3.p1.2.m2.1.1.1.1.2.2" xref="S2.SS3.p1.2.m2.1.1.1.1.2.2.cmml"><mi id="S2.SS3.p1.2.m2.1.1.1.1.2.2.2" xref="S2.SS3.p1.2.m2.1.1.1.1.2.2.2.cmml">𝒈</mi><mo id="S2.SS3.p1.2.m2.1.1.1.1.2.2.1" xref="S2.SS3.p1.2.m2.1.1.1.1.2.2.1.cmml">^</mo></mover><mi id="S2.SS3.p1.2.m2.1.1.1.1.2.3" 
xref="S2.SS3.p1.2.m2.1.1.1.1.2.3.cmml">t</mi><mn id="S2.SS3.p1.2.m2.1.1.1.1.3" xref="S2.SS3.p1.2.m2.1.1.1.1.3.cmml">0</mn></msubsup><mo id="S2.SS3.p1.2.m2.3.3.3.4" xref="S2.SS3.p1.2.m2.3.3.4.cmml">,</mo><msubsup id="S2.SS3.p1.2.m2.2.2.2.2" xref="S2.SS3.p1.2.m2.2.2.2.2.cmml"><mover accent="true" id="S2.SS3.p1.2.m2.2.2.2.2.2.2" xref="S2.SS3.p1.2.m2.2.2.2.2.2.2.cmml"><mi id="S2.SS3.p1.2.m2.2.2.2.2.2.2.2" xref="S2.SS3.p1.2.m2.2.2.2.2.2.2.2.cmml">𝒈</mi><mo id="S2.SS3.p1.2.m2.2.2.2.2.2.2.1" xref="S2.SS3.p1.2.m2.2.2.2.2.2.2.1.cmml">^</mo></mover><mi id="S2.SS3.p1.2.m2.2.2.2.2.2.3" xref="S2.SS3.p1.2.m2.2.2.2.2.2.3.cmml">t</mi><mn id="S2.SS3.p1.2.m2.2.2.2.2.3" xref="S2.SS3.p1.2.m2.2.2.2.2.3.cmml">1</mn></msubsup><mo id="S2.SS3.p1.2.m2.3.3.3.5" xref="S2.SS3.p1.2.m2.3.3.4.cmml">,</mo><msubsup id="S2.SS3.p1.2.m2.3.3.3.3" xref="S2.SS3.p1.2.m2.3.3.3.3.cmml"><mover accent="true" id="S2.SS3.p1.2.m2.3.3.3.3.2.2" xref="S2.SS3.p1.2.m2.3.3.3.3.2.2.cmml"><mi id="S2.SS3.p1.2.m2.3.3.3.3.2.2.2" xref="S2.SS3.p1.2.m2.3.3.3.3.2.2.2.cmml">𝒈</mi><mo id="S2.SS3.p1.2.m2.3.3.3.3.2.2.1" xref="S2.SS3.p1.2.m2.3.3.3.3.2.2.1.cmml">^</mo></mover><mi id="S2.SS3.p1.2.m2.3.3.3.3.2.3" xref="S2.SS3.p1.2.m2.3.3.3.3.2.3.cmml">t</mi><mn id="S2.SS3.p1.2.m2.3.3.3.3.3" xref="S2.SS3.p1.2.m2.3.3.3.3.3.cmml">2</mn></msubsup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.2.m2.3b"><list id="S2.SS3.p1.2.m2.3.3.4.cmml" xref="S2.SS3.p1.2.m2.3.3.3"><apply id="S2.SS3.p1.2.m2.1.1.1.1.cmml" xref="S2.SS3.p1.2.m2.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS3.p1.2.m2.1.1.1.1.1.cmml" xref="S2.SS3.p1.2.m2.1.1.1.1">superscript</csymbol><apply id="S2.SS3.p1.2.m2.1.1.1.1.2.cmml" xref="S2.SS3.p1.2.m2.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS3.p1.2.m2.1.1.1.1.2.1.cmml" xref="S2.SS3.p1.2.m2.1.1.1.1">subscript</csymbol><apply id="S2.SS3.p1.2.m2.1.1.1.1.2.2.cmml" xref="S2.SS3.p1.2.m2.1.1.1.1.2.2"><ci id="S2.SS3.p1.2.m2.1.1.1.1.2.2.1.cmml" xref="S2.SS3.p1.2.m2.1.1.1.1.2.2.1">^</ci><ci id="S2.SS3.p1.2.m2.1.1.1.1.2.2.2.cmml" 
xref="S2.SS3.p1.2.m2.1.1.1.1.2.2.2">𝒈</ci></apply><ci id="S2.SS3.p1.2.m2.1.1.1.1.2.3.cmml" xref="S2.SS3.p1.2.m2.1.1.1.1.2.3">𝑡</ci></apply><cn id="S2.SS3.p1.2.m2.1.1.1.1.3.cmml" type="integer" xref="S2.SS3.p1.2.m2.1.1.1.1.3">0</cn></apply><apply id="S2.SS3.p1.2.m2.2.2.2.2.cmml" xref="S2.SS3.p1.2.m2.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS3.p1.2.m2.2.2.2.2.1.cmml" xref="S2.SS3.p1.2.m2.2.2.2.2">superscript</csymbol><apply id="S2.SS3.p1.2.m2.2.2.2.2.2.cmml" xref="S2.SS3.p1.2.m2.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS3.p1.2.m2.2.2.2.2.2.1.cmml" xref="S2.SS3.p1.2.m2.2.2.2.2">subscript</csymbol><apply id="S2.SS3.p1.2.m2.2.2.2.2.2.2.cmml" xref="S2.SS3.p1.2.m2.2.2.2.2.2.2"><ci id="S2.SS3.p1.2.m2.2.2.2.2.2.2.1.cmml" xref="S2.SS3.p1.2.m2.2.2.2.2.2.2.1">^</ci><ci id="S2.SS3.p1.2.m2.2.2.2.2.2.2.2.cmml" xref="S2.SS3.p1.2.m2.2.2.2.2.2.2.2">𝒈</ci></apply><ci id="S2.SS3.p1.2.m2.2.2.2.2.2.3.cmml" xref="S2.SS3.p1.2.m2.2.2.2.2.2.3">𝑡</ci></apply><cn id="S2.SS3.p1.2.m2.2.2.2.2.3.cmml" type="integer" xref="S2.SS3.p1.2.m2.2.2.2.2.3">1</cn></apply><apply id="S2.SS3.p1.2.m2.3.3.3.3.cmml" xref="S2.SS3.p1.2.m2.3.3.3.3"><csymbol cd="ambiguous" id="S2.SS3.p1.2.m2.3.3.3.3.1.cmml" xref="S2.SS3.p1.2.m2.3.3.3.3">superscript</csymbol><apply id="S2.SS3.p1.2.m2.3.3.3.3.2.cmml" xref="S2.SS3.p1.2.m2.3.3.3.3"><csymbol cd="ambiguous" id="S2.SS3.p1.2.m2.3.3.3.3.2.1.cmml" xref="S2.SS3.p1.2.m2.3.3.3.3">subscript</csymbol><apply id="S2.SS3.p1.2.m2.3.3.3.3.2.2.cmml" xref="S2.SS3.p1.2.m2.3.3.3.3.2.2"><ci id="S2.SS3.p1.2.m2.3.3.3.3.2.2.1.cmml" xref="S2.SS3.p1.2.m2.3.3.3.3.2.2.1">^</ci><ci id="S2.SS3.p1.2.m2.3.3.3.3.2.2.2.cmml" xref="S2.SS3.p1.2.m2.3.3.3.3.2.2.2">𝒈</ci></apply><ci id="S2.SS3.p1.2.m2.3.3.3.3.2.3.cmml" xref="S2.SS3.p1.2.m2.3.3.3.3.2.3">𝑡</ci></apply><cn id="S2.SS3.p1.2.m2.3.3.3.3.3.cmml" type="integer" xref="S2.SS3.p1.2.m2.3.3.3.3.3">2</cn></apply></list></annotation-xml><annotation encoding="application/x-tex" 
id="S2.SS3.p1.2.m2.3c">\hat{\boldsymbol{g}}_{t}^{0},\hat{\boldsymbol{g}}_{t}^{1},\hat{\boldsymbol{g}}% _{t}^{2}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.2.m2.3d">over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT</annotation></semantics></math>. We concatenate the local and global contexts with the current frame or with intermediate features. In this manner, both local and global contexts can be utilized for conditional coding. The overall process can be described as follows:</p> <table class="ltx_equationgroup ltx_eqn_table" id="S2.E5"> <tbody> <tr class="ltx_equation ltx_eqn_row ltx_align_baseline" id="S2.E5X"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_td ltx_align_right ltx_eqn_cell"><math alttext="\displaystyle\hat{\boldsymbol{y}}_{t}" class="ltx_Math" display="inline" id="S2.E5X.2.1.1.m1.1"><semantics id="S2.E5X.2.1.1.m1.1a"><msub id="S2.E5X.2.1.1.m1.1.1" xref="S2.E5X.2.1.1.m1.1.1.cmml"><mover accent="true" id="S2.E5X.2.1.1.m1.1.1.2" xref="S2.E5X.2.1.1.m1.1.1.2.cmml"><mi id="S2.E5X.2.1.1.m1.1.1.2.2" xref="S2.E5X.2.1.1.m1.1.1.2.2.cmml">𝒚</mi><mo id="S2.E5X.2.1.1.m1.1.1.2.1" xref="S2.E5X.2.1.1.m1.1.1.2.1.cmml">^</mo></mover><mi id="S2.E5X.2.1.1.m1.1.1.3" xref="S2.E5X.2.1.1.m1.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.E5X.2.1.1.m1.1b"><apply id="S2.E5X.2.1.1.m1.1.1.cmml" xref="S2.E5X.2.1.1.m1.1.1"><csymbol cd="ambiguous" id="S2.E5X.2.1.1.m1.1.1.1.cmml" xref="S2.E5X.2.1.1.m1.1.1">subscript</csymbol><apply id="S2.E5X.2.1.1.m1.1.1.2.cmml" xref="S2.E5X.2.1.1.m1.1.1.2"><ci id="S2.E5X.2.1.1.m1.1.1.2.1.cmml" xref="S2.E5X.2.1.1.m1.1.1.2.1">^</ci><ci 
id="S2.E5X.2.1.1.m1.1.1.2.2.cmml" xref="S2.E5X.2.1.1.m1.1.1.2.2">𝒚</ci></apply><ci id="S2.E5X.2.1.1.m1.1.1.3.cmml" xref="S2.E5X.2.1.1.m1.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.E5X.2.1.1.m1.1c">\displaystyle\hat{\boldsymbol{y}}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.E5X.2.1.1.m1.1d">over^ start_ARG bold_italic_y end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_eqn_cell"><math alttext="\displaystyle=\lfloor E_{C}(\boldsymbol{x}_{t}|\hat{\boldsymbol{l}}_{t}^{0},% \hat{\boldsymbol{l}}_{t}^{1},\hat{\boldsymbol{l}}_{t}^{2},\hat{\boldsymbol{g}}% _{t}^{0},\hat{\boldsymbol{g}}_{t}^{1},\hat{\boldsymbol{g}}_{t}^{2})\rceil," class="ltx_math_unparsed" display="inline" id="S2.E5X.3.2.2.m1.1"><semantics id="S2.E5X.3.2.2.m1.1a"><mrow id="S2.E5X.3.2.2.m1.1b"><mo id="S2.E5X.3.2.2.m1.1.1">=</mo><mrow id="S2.E5X.3.2.2.m1.1.2"><mo id="S2.E5X.3.2.2.m1.1.2.1" stretchy="false">⌊</mo><msub id="S2.E5X.3.2.2.m1.1.2.2"><mi id="S2.E5X.3.2.2.m1.1.2.2.2">E</mi><mi id="S2.E5X.3.2.2.m1.1.2.2.3">C</mi></msub><mrow id="S2.E5X.3.2.2.m1.1.2.3"><mo id="S2.E5X.3.2.2.m1.1.2.3.1" stretchy="false">(</mo><msub id="S2.E5X.3.2.2.m1.1.2.3.2"><mi id="S2.E5X.3.2.2.m1.1.2.3.2.2">𝒙</mi><mi id="S2.E5X.3.2.2.m1.1.2.3.2.3">t</mi></msub><mo fence="false" id="S2.E5X.3.2.2.m1.1.2.3.3" rspace="0.167em" stretchy="false">|</mo><msubsup id="S2.E5X.3.2.2.m1.1.2.3.4"><mover accent="true" id="S2.E5X.3.2.2.m1.1.2.3.4.2.2"><mi id="S2.E5X.3.2.2.m1.1.2.3.4.2.2.2">𝒍</mi><mo id="S2.E5X.3.2.2.m1.1.2.3.4.2.2.1">^</mo></mover><mi id="S2.E5X.3.2.2.m1.1.2.3.4.2.3">t</mi><mn id="S2.E5X.3.2.2.m1.1.2.3.4.3">0</mn></msubsup><mo id="S2.E5X.3.2.2.m1.1.2.3.5">,</mo><msubsup id="S2.E5X.3.2.2.m1.1.2.3.6"><mover accent="true" id="S2.E5X.3.2.2.m1.1.2.3.6.2.2"><mi id="S2.E5X.3.2.2.m1.1.2.3.6.2.2.2">𝒍</mi><mo id="S2.E5X.3.2.2.m1.1.2.3.6.2.2.1">^</mo></mover><mi 
id="S2.E5X.3.2.2.m1.1.2.3.6.2.3">t</mi><mn id="S2.E5X.3.2.2.m1.1.2.3.6.3">1</mn></msubsup><mo id="S2.E5X.3.2.2.m1.1.2.3.7">,</mo><msubsup id="S2.E5X.3.2.2.m1.1.2.3.8"><mover accent="true" id="S2.E5X.3.2.2.m1.1.2.3.8.2.2"><mi id="S2.E5X.3.2.2.m1.1.2.3.8.2.2.2">𝒍</mi><mo id="S2.E5X.3.2.2.m1.1.2.3.8.2.2.1">^</mo></mover><mi id="S2.E5X.3.2.2.m1.1.2.3.8.2.3">t</mi><mn id="S2.E5X.3.2.2.m1.1.2.3.8.3">2</mn></msubsup><mo id="S2.E5X.3.2.2.m1.1.2.3.9">,</mo><msubsup id="S2.E5X.3.2.2.m1.1.2.3.10"><mover accent="true" id="S2.E5X.3.2.2.m1.1.2.3.10.2.2"><mi id="S2.E5X.3.2.2.m1.1.2.3.10.2.2.2">𝒈</mi><mo id="S2.E5X.3.2.2.m1.1.2.3.10.2.2.1">^</mo></mover><mi id="S2.E5X.3.2.2.m1.1.2.3.10.2.3">t</mi><mn id="S2.E5X.3.2.2.m1.1.2.3.10.3">0</mn></msubsup><mo id="S2.E5X.3.2.2.m1.1.2.3.11">,</mo><msubsup id="S2.E5X.3.2.2.m1.1.2.3.12"><mover accent="true" id="S2.E5X.3.2.2.m1.1.2.3.12.2.2"><mi id="S2.E5X.3.2.2.m1.1.2.3.12.2.2.2">𝒈</mi><mo id="S2.E5X.3.2.2.m1.1.2.3.12.2.2.1">^</mo></mover><mi id="S2.E5X.3.2.2.m1.1.2.3.12.2.3">t</mi><mn id="S2.E5X.3.2.2.m1.1.2.3.12.3">1</mn></msubsup><mo id="S2.E5X.3.2.2.m1.1.2.3.13">,</mo><msubsup id="S2.E5X.3.2.2.m1.1.2.3.14"><mover accent="true" id="S2.E5X.3.2.2.m1.1.2.3.14.2.2"><mi id="S2.E5X.3.2.2.m1.1.2.3.14.2.2.2">𝒈</mi><mo id="S2.E5X.3.2.2.m1.1.2.3.14.2.2.1">^</mo></mover><mi id="S2.E5X.3.2.2.m1.1.2.3.14.2.3">t</mi><mn id="S2.E5X.3.2.2.m1.1.2.3.14.3">2</mn></msubsup><mo id="S2.E5X.3.2.2.m1.1.2.3.15" stretchy="false">)</mo></mrow><mo id="S2.E5X.3.2.2.m1.1.2.4" stretchy="false">⌉</mo></mrow><mo id="S2.E5X.3.2.2.m1.1.3">,</mo></mrow><annotation encoding="application/x-tex" id="S2.E5X.3.2.2.m1.1c">\displaystyle=\lfloor E_{C}(\boldsymbol{x}_{t}|\hat{\boldsymbol{l}}_{t}^{0},% \hat{\boldsymbol{l}}_{t}^{1},\hat{\boldsymbol{l}}_{t}^{2},\hat{\boldsymbol{g}}% _{t}^{0},\hat{\boldsymbol{g}}_{t}^{1},\hat{\boldsymbol{g}}_{t}^{2})\rceil,</annotation><annotation encoding="application/x-llamapun" id="S2.E5X.3.2.2.m1.1d">= ⌊ italic_E start_POSTSUBSCRIPT italic_C 
end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) ⌉ ,</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="2"><span class="ltx_tag ltx_tag_equationgroup ltx_align_right">(5)</span></td> </tr> <tr class="ltx_equation ltx_eqn_row ltx_align_baseline" id="S2.E5Xa"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_td ltx_align_right ltx_eqn_cell"><math alttext="\displaystyle\hat{\boldsymbol{x}}_{t}" class="ltx_Math" display="inline" id="S2.E5Xa.2.1.1.m1.1"><semantics id="S2.E5Xa.2.1.1.m1.1a"><msub id="S2.E5Xa.2.1.1.m1.1.1" xref="S2.E5Xa.2.1.1.m1.1.1.cmml"><mover accent="true" id="S2.E5Xa.2.1.1.m1.1.1.2" xref="S2.E5Xa.2.1.1.m1.1.1.2.cmml"><mi id="S2.E5Xa.2.1.1.m1.1.1.2.2" xref="S2.E5Xa.2.1.1.m1.1.1.2.2.cmml">𝒙</mi><mo id="S2.E5Xa.2.1.1.m1.1.1.2.1" xref="S2.E5Xa.2.1.1.m1.1.1.2.1.cmml">^</mo></mover><mi id="S2.E5Xa.2.1.1.m1.1.1.3" xref="S2.E5Xa.2.1.1.m1.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.E5Xa.2.1.1.m1.1b"><apply id="S2.E5Xa.2.1.1.m1.1.1.cmml" xref="S2.E5Xa.2.1.1.m1.1.1"><csymbol cd="ambiguous" 
id="S2.E5Xa.2.1.1.m1.1.1.1.cmml" xref="S2.E5Xa.2.1.1.m1.1.1">subscript</csymbol><apply id="S2.E5Xa.2.1.1.m1.1.1.2.cmml" xref="S2.E5Xa.2.1.1.m1.1.1.2"><ci id="S2.E5Xa.2.1.1.m1.1.1.2.1.cmml" xref="S2.E5Xa.2.1.1.m1.1.1.2.1">^</ci><ci id="S2.E5Xa.2.1.1.m1.1.1.2.2.cmml" xref="S2.E5Xa.2.1.1.m1.1.1.2.2">𝒙</ci></apply><ci id="S2.E5Xa.2.1.1.m1.1.1.3.cmml" xref="S2.E5Xa.2.1.1.m1.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.E5Xa.2.1.1.m1.1c">\displaystyle\hat{\boldsymbol{x}}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.E5Xa.2.1.1.m1.1d">over^ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_eqn_cell"><math alttext="\displaystyle=D_{C}\left(\hat{\boldsymbol{y}}_{t}|\hat{\boldsymbol{l}}_{t}^{0}% ,\hat{\boldsymbol{l}}_{t}^{1},\hat{\boldsymbol{l}}_{t}^{2},\hat{\boldsymbol{g}% }_{t}^{0},\hat{\boldsymbol{g}}_{t}^{1},\hat{\boldsymbol{g}}_{t}^{2}\right)," class="ltx_Math" display="inline" id="S2.E5Xa.3.2.2.m1.1"><semantics id="S2.E5Xa.3.2.2.m1.1a"><mrow id="S2.E5Xa.3.2.2.m1.1.1.1" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.cmml"><mrow id="S2.E5Xa.3.2.2.m1.1.1.1.1" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.cmml"><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.3.cmml"></mi><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.2.cmml">=</mo><mrow id="S2.E5Xa.3.2.2.m1.1.1.1.1.1" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.cmml"><msub id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.3.cmml"><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.3.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.3.2.cmml">D</mi><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.3.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.3.3.cmml">C</mi></msub><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.2.cmml"></mo><mrow id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.cmml"><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.2" 
xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.cmml">(</mo><mrow id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.cmml"><msub id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.cmml"><mover accent="true" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.2.cmml"><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.2.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.2.2.cmml">𝒚</mi><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.2.1" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.2.1.cmml">^</mo></mover><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.3.cmml">t</mi></msub><mo fence="false" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.7" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.7.cmml">|</mo><mrow id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.7.cmml"><msubsup id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.cmml"><mover accent="true" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.2.cmml"><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.2.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.2.2.cmml">𝒍</mi><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.2.1" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.2.1.cmml">^</mo></mover><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.3.cmml">t</mi><mn id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.3.cmml">0</mn></msubsup><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.7" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.7.cmml">,</mo><msubsup id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.cmml"><mover accent="true" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.2.cmml"><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.2.2" 
xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.2.2.cmml">𝒍</mi><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.2.1" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.2.1.cmml">^</mo></mover><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.3.cmml">t</mi><mn id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.3.cmml">1</mn></msubsup><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.8" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.7.cmml">,</mo><msubsup id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.cmml"><mover accent="true" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.2.cmml"><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.2.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.2.2.cmml">𝒍</mi><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.2.1" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.2.1.cmml">^</mo></mover><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.3.cmml">t</mi><mn id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.3.cmml">2</mn></msubsup><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.9" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.7.cmml">,</mo><msubsup id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.cmml"><mover accent="true" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.2.cmml"><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.2.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.2.2.cmml">𝒈</mi><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.2.1" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.2.1.cmml">^</mo></mover><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.3.cmml">t</mi><mn id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.3" 
xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.3.cmml">0</mn></msubsup><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.10" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.7.cmml">,</mo><msubsup id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.cmml"><mover accent="true" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.2.cmml"><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.2.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.2.2.cmml">𝒈</mi><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.2.1" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.2.1.cmml">^</mo></mover><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.3.cmml">t</mi><mn id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.3.cmml">1</mn></msubsup><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.11" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.7.cmml">,</mo><msubsup id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.cmml"><mover accent="true" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.2.cmml"><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.2.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.2.2.cmml">𝒈</mi><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.2.1" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.2.1.cmml">^</mo></mover><mi id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.3.cmml">t</mi><mn id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.3.cmml">2</mn></msubsup></mrow></mrow><mo id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.3" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.cmml">)</mo></mrow></mrow></mrow><mo id="S2.E5Xa.3.2.2.m1.1.1.1.2" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.cmml">,</mo></mrow><annotation-xml encoding="MathML-Content" id="S2.E5Xa.3.2.2.m1.1b"><apply 
id="S2.E5Xa.3.2.2.m1.1.1.1.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1"><eq id="S2.E5Xa.3.2.2.m1.1.1.1.1.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.2"></eq><csymbol cd="latexml" id="S2.E5Xa.3.2.2.m1.1.1.1.1.3.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.3">absent</csymbol><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1"><times id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.2"></times><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.3.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.3.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.3">subscript</csymbol><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.3.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.3.2">𝐷</ci><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.3.3.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.3.3">𝐶</ci></apply><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1"><csymbol cd="latexml" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.7.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.7">conditional</csymbol><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8"><csymbol cd="ambiguous" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8">subscript</csymbol><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.2"><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.2.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.2.1">^</ci><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.2.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.2.2">𝒚</ci></apply><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.3.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.8.3">𝑡</ci></apply><list id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.7.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6"><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.1.cmml" 
xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1">superscript</csymbol><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1">subscript</csymbol><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.2"><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.2.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.2.1">^</ci><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.2.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.2.2">𝒍</ci></apply><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.3.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.2.3">𝑡</ci></apply><cn id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.3.cmml" type="integer" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.1.1.1.3">0</cn></apply><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2"><csymbol cd="ambiguous" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2">superscript</csymbol><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2"><csymbol cd="ambiguous" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2">subscript</csymbol><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.2"><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.2.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.2.1">^</ci><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.2.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.2.2">𝒍</ci></apply><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.3.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.2.3">𝑡</ci></apply><cn id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.3.cmml" type="integer" 
xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.2.2.2.3">1</cn></apply><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3"><csymbol cd="ambiguous" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3">superscript</csymbol><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3"><csymbol cd="ambiguous" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3">subscript</csymbol><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.2"><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.2.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.2.1">^</ci><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.2.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.2.2">𝒍</ci></apply><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.3.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.2.3">𝑡</ci></apply><cn id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.3.cmml" type="integer" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.3.3.3.3">2</cn></apply><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4"><csymbol cd="ambiguous" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4">superscript</csymbol><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4"><csymbol cd="ambiguous" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4">subscript</csymbol><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.2"><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.2.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.2.1">^</ci><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.2.2.cmml" 
xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.2.2">𝒈</ci></apply><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.3.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.2.3">𝑡</ci></apply><cn id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.3.cmml" type="integer" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.4.4.4.3">0</cn></apply><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5"><csymbol cd="ambiguous" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5">superscript</csymbol><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5"><csymbol cd="ambiguous" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5">subscript</csymbol><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.2"><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.2.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.2.1">^</ci><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.2.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.2.2">𝒈</ci></apply><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.3.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.2.3">𝑡</ci></apply><cn id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.3.cmml" type="integer" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.5.5.5.3">1</cn></apply><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6"><csymbol cd="ambiguous" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6">superscript</csymbol><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6"><csymbol cd="ambiguous" id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6">subscript</csymbol><apply id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.2.cmml" 
xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.2"><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.2.1.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.2.1">^</ci><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.2.2.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.2.2">𝒈</ci></apply><ci id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.3.cmml" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.2.3">𝑡</ci></apply><cn id="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.3.cmml" type="integer" xref="S2.E5Xa.3.2.2.m1.1.1.1.1.1.1.1.1.6.6.6.3">2</cn></apply></list></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.E5Xa.3.2.2.m1.1c">\displaystyle=D_{C}\left(\hat{\boldsymbol{y}}_{t}|\hat{\boldsymbol{l}}_{t}^{0}% ,\hat{\boldsymbol{l}}_{t}^{1},\hat{\boldsymbol{l}}_{t}^{2},\hat{\boldsymbol{g}% }_{t}^{0},\hat{\boldsymbol{g}}_{t}^{1},\hat{\boldsymbol{g}}_{t}^{2}\right),</annotation><annotation encoding="application/x-llamapun" id="S2.E5Xa.3.2.2.m1.1d">= italic_D start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ( over^ start_ARG bold_italic_y end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_l end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_g end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) ,</annotation></semantics></math></td> <td class="ltx_eqn_cell 
ltx_eqn_center_padright"></td> </tr> </tbody> </table> <p class="ltx_p" id="S2.SS3.p1.4">where <math alttext="\hat{\boldsymbol{y}}_{t}" class="ltx_Math" display="inline" id="S2.SS3.p1.3.m1.1"><semantics id="S2.SS3.p1.3.m1.1a"><msub id="S2.SS3.p1.3.m1.1.1" xref="S2.SS3.p1.3.m1.1.1.cmml"><mover accent="true" id="S2.SS3.p1.3.m1.1.1.2" xref="S2.SS3.p1.3.m1.1.1.2.cmml"><mi id="S2.SS3.p1.3.m1.1.1.2.2" xref="S2.SS3.p1.3.m1.1.1.2.2.cmml">𝒚</mi><mo id="S2.SS3.p1.3.m1.1.1.2.1" xref="S2.SS3.p1.3.m1.1.1.2.1.cmml">^</mo></mover><mi id="S2.SS3.p1.3.m1.1.1.3" xref="S2.SS3.p1.3.m1.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.3.m1.1b"><apply id="S2.SS3.p1.3.m1.1.1.cmml" xref="S2.SS3.p1.3.m1.1.1"><csymbol cd="ambiguous" id="S2.SS3.p1.3.m1.1.1.1.cmml" xref="S2.SS3.p1.3.m1.1.1">subscript</csymbol><apply id="S2.SS3.p1.3.m1.1.1.2.cmml" xref="S2.SS3.p1.3.m1.1.1.2"><ci id="S2.SS3.p1.3.m1.1.1.2.1.cmml" xref="S2.SS3.p1.3.m1.1.1.2.1">^</ci><ci id="S2.SS3.p1.3.m1.1.1.2.2.cmml" xref="S2.SS3.p1.3.m1.1.1.2.2">𝒚</ci></apply><ci id="S2.SS3.p1.3.m1.1.1.3.cmml" xref="S2.SS3.p1.3.m1.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.3.m1.1c">\hat{\boldsymbol{y}}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.3.m1.1d">over^ start_ARG bold_italic_y end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> is the quantized latent representation of <math alttext="\boldsymbol{x}_{t}" class="ltx_Math" display="inline" id="S2.SS3.p1.4.m2.1"><semantics id="S2.SS3.p1.4.m2.1a"><msub id="S2.SS3.p1.4.m2.1.1" xref="S2.SS3.p1.4.m2.1.1.cmml"><mi id="S2.SS3.p1.4.m2.1.1.2" xref="S2.SS3.p1.4.m2.1.1.2.cmml">𝒙</mi><mi id="S2.SS3.p1.4.m2.1.1.3" xref="S2.SS3.p1.4.m2.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.4.m2.1b"><apply id="S2.SS3.p1.4.m2.1.1.cmml" xref="S2.SS3.p1.4.m2.1.1"><csymbol cd="ambiguous" id="S2.SS3.p1.4.m2.1.1.1.cmml" 
xref="S2.SS3.p1.4.m2.1.1">subscript</csymbol><ci id="S2.SS3.p1.4.m2.1.1.2.cmml" xref="S2.SS3.p1.4.m2.1.1.2">𝒙</ci><ci id="S2.SS3.p1.4.m2.1.1.3.cmml" xref="S2.SS3.p1.4.m2.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.4.m2.1c">\boldsymbol{x}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.4.m2.1d">bold_italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math>.</p> </div> </section> </section> <section class="ltx_section" id="S3"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">3 </span>Experiments</h2> <section class="ltx_subsection" id="S3.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">3.1 </span>Experimental Setup</h3> <div class="ltx_para" id="S3.SS1.p1"> <p class="ltx_p" id="S3.SS1.p1.1">We train LVC-LGMC on the Vimeo-90k dataset <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib22" title="">22</a>]</cite>. 
Frames are randomly cropped into <math alttext="256\times 256" class="ltx_Math" display="inline" id="S3.SS1.p1.1.m1.1"><semantics id="S3.SS1.p1.1.m1.1a"><mrow id="S3.SS1.p1.1.m1.1.1" xref="S3.SS1.p1.1.m1.1.1.cmml"><mn id="S3.SS1.p1.1.m1.1.1.2" xref="S3.SS1.p1.1.m1.1.1.2.cmml">256</mn><mo id="S3.SS1.p1.1.m1.1.1.1" lspace="0.222em" rspace="0.222em" xref="S3.SS1.p1.1.m1.1.1.1.cmml">×</mo><mn id="S3.SS1.p1.1.m1.1.1.3" xref="S3.SS1.p1.1.m1.1.1.3.cmml">256</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.1.m1.1b"><apply id="S3.SS1.p1.1.m1.1.1.cmml" xref="S3.SS1.p1.1.m1.1.1"><times id="S3.SS1.p1.1.m1.1.1.1.cmml" xref="S3.SS1.p1.1.m1.1.1.1"></times><cn id="S3.SS1.p1.1.m1.1.1.2.cmml" type="integer" xref="S3.SS1.p1.1.m1.1.1.2">256</cn><cn id="S3.SS1.p1.1.m1.1.1.3.cmml" type="integer" xref="S3.SS1.p1.1.m1.1.1.3">256</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.1.m1.1c">256\times 256</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.1.m1.1d">256 × 256</annotation></semantics></math> patches. 
The proposed models are optimized with the rate-distortion loss as follows,</p> <table class="ltx_equationgroup ltx_eqn_table" id="S3.E6"> <tbody> <tr class="ltx_equation ltx_eqn_row ltx_align_baseline" id="S3.E6X"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_td ltx_align_right ltx_eqn_cell"><math alttext="\displaystyle\mathcal{L}" class="ltx_Math" display="inline" id="S3.E6X.2.1.1.m1.1"><semantics id="S3.E6X.2.1.1.m1.1a"><mi class="ltx_font_mathcaligraphic" id="S3.E6X.2.1.1.m1.1.1" xref="S3.E6X.2.1.1.m1.1.1.cmml">ℒ</mi><annotation-xml encoding="MathML-Content" id="S3.E6X.2.1.1.m1.1b"><ci id="S3.E6X.2.1.1.m1.1.1.cmml" xref="S3.E6X.2.1.1.m1.1.1">ℒ</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.E6X.2.1.1.m1.1c">\displaystyle\mathcal{L}</annotation><annotation encoding="application/x-llamapun" id="S3.E6X.2.1.1.m1.1d">caligraphic_L</annotation></semantics></math></td> <td class="ltx_td ltx_align_left ltx_eqn_cell"><math alttext="\displaystyle=\mathcal{R}+\lambda\times\mathcal{D}," class="ltx_Math" display="inline" id="S3.E6X.3.2.2.m1.1"><semantics id="S3.E6X.3.2.2.m1.1a"><mrow id="S3.E6X.3.2.2.m1.1.1.1" xref="S3.E6X.3.2.2.m1.1.1.1.1.cmml"><mrow id="S3.E6X.3.2.2.m1.1.1.1.1" xref="S3.E6X.3.2.2.m1.1.1.1.1.cmml"><mi id="S3.E6X.3.2.2.m1.1.1.1.1.2" xref="S3.E6X.3.2.2.m1.1.1.1.1.2.cmml"></mi><mo id="S3.E6X.3.2.2.m1.1.1.1.1.1" xref="S3.E6X.3.2.2.m1.1.1.1.1.1.cmml">=</mo><mrow id="S3.E6X.3.2.2.m1.1.1.1.1.3" xref="S3.E6X.3.2.2.m1.1.1.1.1.3.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.E6X.3.2.2.m1.1.1.1.1.3.2" xref="S3.E6X.3.2.2.m1.1.1.1.1.3.2.cmml">ℛ</mi><mo id="S3.E6X.3.2.2.m1.1.1.1.1.3.1" xref="S3.E6X.3.2.2.m1.1.1.1.1.3.1.cmml">+</mo><mrow id="S3.E6X.3.2.2.m1.1.1.1.1.3.3" xref="S3.E6X.3.2.2.m1.1.1.1.1.3.3.cmml"><mi id="S3.E6X.3.2.2.m1.1.1.1.1.3.3.2" xref="S3.E6X.3.2.2.m1.1.1.1.1.3.3.2.cmml">λ</mi><mo id="S3.E6X.3.2.2.m1.1.1.1.1.3.3.1" lspace="0.222em" rspace="0.222em" xref="S3.E6X.3.2.2.m1.1.1.1.1.3.3.1.cmml">×</mo><mi 
class="ltx_font_mathcaligraphic" id="S3.E6X.3.2.2.m1.1.1.1.1.3.3.3" xref="S3.E6X.3.2.2.m1.1.1.1.1.3.3.3.cmml">𝒟</mi></mrow></mrow></mrow><mo id="S3.E6X.3.2.2.m1.1.1.1.2" xref="S3.E6X.3.2.2.m1.1.1.1.1.cmml">,</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.E6X.3.2.2.m1.1b"><apply id="S3.E6X.3.2.2.m1.1.1.1.1.cmml" xref="S3.E6X.3.2.2.m1.1.1.1"><eq id="S3.E6X.3.2.2.m1.1.1.1.1.1.cmml" xref="S3.E6X.3.2.2.m1.1.1.1.1.1"></eq><csymbol cd="latexml" id="S3.E6X.3.2.2.m1.1.1.1.1.2.cmml" xref="S3.E6X.3.2.2.m1.1.1.1.1.2">absent</csymbol><apply id="S3.E6X.3.2.2.m1.1.1.1.1.3.cmml" xref="S3.E6X.3.2.2.m1.1.1.1.1.3"><plus id="S3.E6X.3.2.2.m1.1.1.1.1.3.1.cmml" xref="S3.E6X.3.2.2.m1.1.1.1.1.3.1"></plus><ci id="S3.E6X.3.2.2.m1.1.1.1.1.3.2.cmml" xref="S3.E6X.3.2.2.m1.1.1.1.1.3.2">ℛ</ci><apply id="S3.E6X.3.2.2.m1.1.1.1.1.3.3.cmml" xref="S3.E6X.3.2.2.m1.1.1.1.1.3.3"><times id="S3.E6X.3.2.2.m1.1.1.1.1.3.3.1.cmml" xref="S3.E6X.3.2.2.m1.1.1.1.1.3.3.1"></times><ci id="S3.E6X.3.2.2.m1.1.1.1.1.3.3.2.cmml" xref="S3.E6X.3.2.2.m1.1.1.1.1.3.3.2">𝜆</ci><ci id="S3.E6X.3.2.2.m1.1.1.1.1.3.3.3.cmml" xref="S3.E6X.3.2.2.m1.1.1.1.1.3.3.3">𝒟</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E6X.3.2.2.m1.1c">\displaystyle=\mathcal{R}+\lambda\times\mathcal{D},</annotation><annotation encoding="application/x-llamapun" id="S3.E6X.3.2.2.m1.1d">= caligraphic_R + italic_λ × caligraphic_D ,</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equationgroup ltx_align_right">(6)</span></td> </tr> </tbody> </table> <p class="ltx_p" id="S3.SS1.p1.6">where <math alttext="\mathcal{D}" class="ltx_Math" display="inline" id="S3.SS1.p1.2.m1.1"><semantics id="S3.SS1.p1.2.m1.1a"><mi class="ltx_font_mathcaligraphic" id="S3.SS1.p1.2.m1.1.1" xref="S3.SS1.p1.2.m1.1.1.cmml">𝒟</mi><annotation-xml encoding="MathML-Content" 
id="S3.SS1.p1.2.m1.1b"><ci id="S3.SS1.p1.2.m1.1.1.cmml" xref="S3.SS1.p1.2.m1.1.1">𝒟</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.2.m1.1c">\mathcal{D}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.2.m1.1d">caligraphic_D</annotation></semantics></math> denotes the distortion, measured by mean squared error (MSE) or MS-SSIM <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib17" title="">17</a>]</cite>. To cover a wide range of coding rates, <math alttext="\lambda" class="ltx_Math" display="inline" id="S3.SS1.p1.3.m2.1"><semantics id="S3.SS1.p1.3.m2.1a"><mi id="S3.SS1.p1.3.m2.1.1" xref="S3.SS1.p1.3.m2.1.1.cmml">λ</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.3.m2.1b"><ci id="S3.SS1.p1.3.m2.1.1.cmml" xref="S3.SS1.p1.3.m2.1.1">𝜆</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.3.m2.1c">\lambda</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.3.m2.1d">italic_λ</annotation></semantics></math> is set to <math alttext="\{256,512,1024,2048\}" class="ltx_Math" display="inline" id="S3.SS1.p1.4.m3.4"><semantics id="S3.SS1.p1.4.m3.4a"><mrow id="S3.SS1.p1.4.m3.4.5.2" xref="S3.SS1.p1.4.m3.4.5.1.cmml"><mo id="S3.SS1.p1.4.m3.4.5.2.1" stretchy="false" xref="S3.SS1.p1.4.m3.4.5.1.cmml">{</mo><mn id="S3.SS1.p1.4.m3.1.1" xref="S3.SS1.p1.4.m3.1.1.cmml">256</mn><mo id="S3.SS1.p1.4.m3.4.5.2.2" xref="S3.SS1.p1.4.m3.4.5.1.cmml">,</mo><mn id="S3.SS1.p1.4.m3.2.2" xref="S3.SS1.p1.4.m3.2.2.cmml">512</mn><mo id="S3.SS1.p1.4.m3.4.5.2.3" xref="S3.SS1.p1.4.m3.4.5.1.cmml">,</mo><mn id="S3.SS1.p1.4.m3.3.3" xref="S3.SS1.p1.4.m3.3.3.cmml">1024</mn><mo id="S3.SS1.p1.4.m3.4.5.2.4" xref="S3.SS1.p1.4.m3.4.5.1.cmml">,</mo><mn id="S3.SS1.p1.4.m3.4.4" xref="S3.SS1.p1.4.m3.4.4.cmml">2048</mn><mo id="S3.SS1.p1.4.m3.4.5.2.5" stretchy="false" xref="S3.SS1.p1.4.m3.4.5.1.cmml">}</mo></mrow><annotation-xml encoding="MathML-Content" 
id="S3.SS1.p1.4.m3.4b"><set id="S3.SS1.p1.4.m3.4.5.1.cmml" xref="S3.SS1.p1.4.m3.4.5.2"><cn id="S3.SS1.p1.4.m3.1.1.cmml" type="integer" xref="S3.SS1.p1.4.m3.1.1">256</cn><cn id="S3.SS1.p1.4.m3.2.2.cmml" type="integer" xref="S3.SS1.p1.4.m3.2.2">512</cn><cn id="S3.SS1.p1.4.m3.3.3.cmml" type="integer" xref="S3.SS1.p1.4.m3.3.3">1024</cn><cn id="S3.SS1.p1.4.m3.4.4.cmml" type="integer" xref="S3.SS1.p1.4.m3.4.4">2048</cn></set></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.4.m3.4c">\{256,512,1024,2048\}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.4.m3.4d">{ 256 , 512 , 1024 , 2048 }</annotation></semantics></math> for MSE and <math alttext="\{256,512,1024,2048\}\times\frac{1}{50}" class="ltx_Math" display="inline" id="S3.SS1.p1.5.m4.4"><semantics id="S3.SS1.p1.5.m4.4a"><mrow id="S3.SS1.p1.5.m4.4.5" xref="S3.SS1.p1.5.m4.4.5.cmml"><mrow id="S3.SS1.p1.5.m4.4.5.2.2" xref="S3.SS1.p1.5.m4.4.5.2.1.cmml"><mo id="S3.SS1.p1.5.m4.4.5.2.2.1" stretchy="false" xref="S3.SS1.p1.5.m4.4.5.2.1.cmml">{</mo><mn id="S3.SS1.p1.5.m4.1.1" xref="S3.SS1.p1.5.m4.1.1.cmml">256</mn><mo id="S3.SS1.p1.5.m4.4.5.2.2.2" xref="S3.SS1.p1.5.m4.4.5.2.1.cmml">,</mo><mn id="S3.SS1.p1.5.m4.2.2" xref="S3.SS1.p1.5.m4.2.2.cmml">512</mn><mo id="S3.SS1.p1.5.m4.4.5.2.2.3" xref="S3.SS1.p1.5.m4.4.5.2.1.cmml">,</mo><mn id="S3.SS1.p1.5.m4.3.3" xref="S3.SS1.p1.5.m4.3.3.cmml">1024</mn><mo id="S3.SS1.p1.5.m4.4.5.2.2.4" xref="S3.SS1.p1.5.m4.4.5.2.1.cmml">,</mo><mn id="S3.SS1.p1.5.m4.4.4" xref="S3.SS1.p1.5.m4.4.4.cmml">2048</mn><mo id="S3.SS1.p1.5.m4.4.5.2.2.5" rspace="0.055em" stretchy="false" xref="S3.SS1.p1.5.m4.4.5.2.1.cmml">}</mo></mrow><mo id="S3.SS1.p1.5.m4.4.5.1" rspace="0.222em" xref="S3.SS1.p1.5.m4.4.5.1.cmml">×</mo><mfrac id="S3.SS1.p1.5.m4.4.5.3" xref="S3.SS1.p1.5.m4.4.5.3.cmml"><mn id="S3.SS1.p1.5.m4.4.5.3.2" xref="S3.SS1.p1.5.m4.4.5.3.2.cmml">1</mn><mn id="S3.SS1.p1.5.m4.4.5.3.3" xref="S3.SS1.p1.5.m4.4.5.3.3.cmml">50</mn></mfrac></mrow><annotation-xml 
encoding="MathML-Content" id="S3.SS1.p1.5.m4.4b"><apply id="S3.SS1.p1.5.m4.4.5.cmml" xref="S3.SS1.p1.5.m4.4.5"><times id="S3.SS1.p1.5.m4.4.5.1.cmml" xref="S3.SS1.p1.5.m4.4.5.1"></times><set id="S3.SS1.p1.5.m4.4.5.2.1.cmml" xref="S3.SS1.p1.5.m4.4.5.2.2"><cn id="S3.SS1.p1.5.m4.1.1.cmml" type="integer" xref="S3.SS1.p1.5.m4.1.1">256</cn><cn id="S3.SS1.p1.5.m4.2.2.cmml" type="integer" xref="S3.SS1.p1.5.m4.2.2">512</cn><cn id="S3.SS1.p1.5.m4.3.3.cmml" type="integer" xref="S3.SS1.p1.5.m4.3.3">1024</cn><cn id="S3.SS1.p1.5.m4.4.4.cmml" type="integer" xref="S3.SS1.p1.5.m4.4.4">2048</cn></set><apply id="S3.SS1.p1.5.m4.4.5.3.cmml" xref="S3.SS1.p1.5.m4.4.5.3"><divide id="S3.SS1.p1.5.m4.4.5.3.1.cmml" xref="S3.SS1.p1.5.m4.4.5.3"></divide><cn id="S3.SS1.p1.5.m4.4.5.3.2.cmml" type="integer" xref="S3.SS1.p1.5.m4.4.5.3.2">1</cn><cn id="S3.SS1.p1.5.m4.4.5.3.3.cmml" type="integer" xref="S3.SS1.p1.5.m4.4.5.3.3">50</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.5.m4.4c">\{256,512,1024,2048\}\times\frac{1}{50}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.5.m4.4d">{ 256 , 512 , 1024 , 2048 } × divide start_ARG 1 end_ARG start_ARG 50 end_ARG</annotation></semantics></math> for MS-SSIM. 
That is, the MS-SSIM settings evaluate to 5.12, 10.24, 20.48, and 40.96. 
Following DCVC-TCM <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib9" title="">9</a>]</cite>, we adopt the multi-stage training strategy with the AdamW optimizer, and the batch size is set to <math alttext="4" class="ltx_Math" display="inline" id="S3.SS1.p1.6.m5.1"><semantics id="S3.SS1.p1.6.m5.1a"><mn id="S3.SS1.p1.6.m5.1.1" xref="S3.SS1.p1.6.m5.1.1.cmml">4</mn><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.6.m5.1b"><cn id="S3.SS1.p1.6.m5.1.1.cmml" type="integer" xref="S3.SS1.p1.6.m5.1.1">4</cn></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.6.m5.1c">4</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.6.m5.1d">4</annotation></semantics></math>. </p> </div> </section> <section class="ltx_subsection" id="S3.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">3.2 </span>Rate Distortion Performance</h3> <div class="ltx_para" id="S3.SS2.p1"> <p class="ltx_p" id="S3.SS2.p1.1">To demonstrate the effectiveness of the proposed method, we evaluate its rate-distortion performance and model complexity. 
We compare our LVC-LGMC with existing video coding schemes, including DCVC-TCM <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib9" title="">9</a>]</cite>, CANF-VC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib21" title="">21</a>]</cite>, DCVC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib2" title="">2</a>]</cite>, MLVC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib20" title="">20</a>]</cite>, RLVC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib19" title="">19</a>]</cite>, DVCPro <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib15" title="">15</a>]</cite>, HM-16.20, and x265. The UVG <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib18" title="">18</a>]</cite>, MCL-JCV <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib11" title="">11</a>]</cite>, and HEVC test sequences (class B, class C and class D) are all used for performance evaluation. For the learned coding schemes, we test the official models. The first 96 frames of each sequence are tested, and the intra period is set to 32. The height and width of each frame are padded to multiples of 64 to facilitate testing. Fig. <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S2.F4" title="Figure 4 ‣ 2.2 Attention-based Global Compensation ‣ 2 The Proposed LVC-LGMC Method ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_tag">4</span></a>, Fig. 
<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S2.F5" title="Figure 5 ‣ 2.2 Attention-based Global Compensation ‣ 2 The Proposed LVC-LGMC Method ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_tag">5</span></a> and Table <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S2.T1" title="Table 1 ‣ 2.2 Attention-based Global Compensation ‣ 2 The Proposed LVC-LGMC Method ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_tag">1</span></a> present the rate-distortion performance. The proposed global and local motion compensation model LVC-LGMC outperforms the baseline DCVC-TCM model <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib9" title="">9</a>]</cite> on all datasets, indicating the effectiveness of employing global and local motion compensation for motion estimation in learning-based video coding. LVC-LGMC also shows superior rate-distortion performance compared with CANF-VC, DCVC, MLVC, RLVC, DVCPro, HM-16.20, and x265. Compared with the baseline DCVC-TCM, LVC-LGMC achieves significant bit-rate savings: on MCL-JCV, the BD-rate savings are 10.09% in terms of PSNR and 9.74% in terms of MS-SSIM. Moreover, the rate-distortion improvement is consistent across different resolutions and contents, indicating the effectiveness of the proposed method. We also compare LVC-LGMC with the baseline DCVC-TCM in terms of parameter count and coding complexity. The tests are run on a Tesla A100 GPU. The parameter counts of LVC-LGMC and DCVC-TCM are 14.09M and 10.71M, respectively. The increase in coding complexity is modest: the decoding time increases by 20% on 1080p sequences. 
The complexity increase of LVC-LGMC is tolerable, as the rate-distortion performance improvements are significant. </p> </div> <figure class="ltx_figure" id="S3.F6"> <div class="ltx_flex_figure"> <div class="ltx_flex_cell ltx_flex_size_1"> <figure class="ltx_figure ltx_flex_size_1 ltx_align_center" id="S3.F6.sf1"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="87" id="S3.F6.sf1.g1" src="x14.png" width="336"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">(a) </span></figcaption> </figure> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <figure class="ltx_figure ltx_flex_size_1 ltx_align_center" id="S3.F6.sf2"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="87" id="S3.F6.sf2.g1" src="x15.png" width="340"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">(b) </span></figcaption> </figure> </div> </div> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text ltx_font_bold" id="S3.F6.2.1.1">Fig. 6</span>: </span>Bit allocation of LVC-LGMC and the baseline model DCVC-TCM <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib9" title="">9</a>]</cite>.</figcaption> </figure> <figure class="ltx_figure" id="S3.F7"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="230" id="S3.F7.g1" src="x16.png" width="830"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text ltx_font_bold" id="S3.F7.2.1.1">Fig. 
7</span>: </span>Ablation studies on the MCL-JCV dataset.</figcaption> </figure> </section> <section class="ltx_subsection" id="S3.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">3.3 </span>Analyses and Discussions</h3> <section class="ltx_subsubsection" id="S3.SS3.SSS1"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection">3.3.1 </span>Bit Allocation</h4> <div class="ltx_para" id="S3.SS3.SSS1.p1"> <p class="ltx_p" id="S3.SS3.SSS1.p1.1">We analyze the proposed mixed attention- and flow-based motion estimation and compensation through its bit allocation. We take the sequence BQTerrace from HEVC class B as an example; its resolution is 1920<math alttext="\times" class="ltx_Math" display="inline" id="S3.SS3.SSS1.p1.1.m1.1"><semantics id="S3.SS3.SSS1.p1.1.m1.1a"><mo id="S3.SS3.SSS1.p1.1.m1.1.1" xref="S3.SS3.SSS1.p1.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S3.SS3.SSS1.p1.1.m1.1b"><times id="S3.SS3.SSS1.p1.1.m1.1.1.cmml" xref="S3.SS3.SSS1.p1.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.SSS1.p1.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.SSS1.p1.1.m1.1d">×</annotation></semantics></math>1080 and its frame rate is 60. The average PSNR is 33.878 dB for LVC-LGMC and 33.871 dB for DCVC-TCM, so the two models are compared at nearly the same quality. The overall bit rate and the motion bit rate of each frame are presented in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S3.F6" title="Figure 6 ‣ 3.2 Rate Distortion Performance ‣ 3 Experiments ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_tag">6</span></a>. LVC-LGMC spends fewer bits on motion representation, and its overall bit consumption is lower than that of DCVC-TCM. It is worth noting that the proposed global module requires no bits for motion representation. 
The global module estimates long-range similarities well, so fewer bits are required to represent the flow map. Consequently, the enhanced prediction reduces the overall number of bits needed to code each frame. </p> </div> </section> <section class="ltx_subsubsection" id="S3.SS3.SSS2"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection">3.3.2 </span>Ablation Studies</h4> <div class="ltx_para" id="S3.SS3.SSS2.p1"> <p class="ltx_p" id="S3.SS3.SSS2.p1.1">To comprehensively evaluate the effectiveness of the proposed attention-based global context for motion compensation, we test two variants: one with the global module removed at the encoder side and one with it removed at the decoder side. Removing the global module at the encoder side is quite similar to distributed source coding. The results are presented in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#S3.F7" title="Figure 7 ‣ 3.2 Rate Distortion Performance ‣ 3 Experiments ‣ LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression"><span class="ltx_text ltx_ref_tag">7</span></a>. Both variants degrade performance relative to the full model, but they still perform better than the baseline DCVC-TCM <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib9" title="">9</a>]</cite>.</p> </div> </section> </section> </section> <section class="ltx_section" id="S4"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">4 </span>Conclusion</h2> <div class="ltx_para" id="S4.p1"> <p class="ltx_p" id="S4.p1.1">In this paper, we propose joint global and local motion compensation (LGMC) for learned video coding. We employ cross attention to capture global contexts for global motion compensation. To reduce the complexity, we divide the softmax operation in vanilla attention into two independent softmax operations, leading to linear complexity. 
We employ existing flow-based motion compensation for local contexts. To evaluate the proposed module, we incorporate it into DCVC-TCM to obtain the video compression model LVC-LGMC. Extensive experiments demonstrate that LVC-LGMC achieves significant improvements over the corresponding baseline DCVC-TCM. Our method is plug-and-play and can be employed in other conditional coding based models <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib2" title="">2</a>, <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib6" title="">6</a>, <a class="ltx_ref" href="https://arxiv.org/html/2402.00680v3#bib.bib8" title="">8</a>]</cite> for further performance improvements.</p> </div> </section> <section class="ltx_bibliography" id="bib"> <h2 class="ltx_title ltx_title_bibliography">References</h2> <ul class="ltx_biblist"> <li class="ltx_bibitem" id="bib.bib1"> <span class="ltx_tag ltx_tag_bibitem">[1]</span> <span class="ltx_bibblock"> Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, and Zhiyong Gao, </span> <span class="ltx_bibblock">“DVC: An end-to-end deep video compression framework,” </span> <span class="ltx_bibblock">in <span class="ltx_text ltx_font_italic" id="bib.bib1.1.1">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</span>, 2019, pp. 11006–11015. </span> </li> <li class="ltx_bibitem" id="bib.bib2"> <span class="ltx_tag ltx_tag_bibitem">[2]</span> <span class="ltx_bibblock"> Jiahao Li, Bin Li, and Yan Lu, </span> <span class="ltx_bibblock">“Deep contextual video compression,” </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib2.1.1">Advances in Neural Information Processing Systems</span>, vol. 34, pp. 18114–18125, 2021. 
</span> </li> <li class="ltx_bibitem" id="bib.bib3"> <span class="ltx_tag ltx_tag_bibitem">[3]</span> <span class="ltx_bibblock"> Kai Lin, Chuanmin Jia, Xinfeng Zhang, Shanshe Wang, Siwei Ma, and Wen Gao, </span> <span class="ltx_bibblock">“DMVC: Decomposed motion modeling for learned video compression,” </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib3.1.1">IEEE Transactions on Circuits and Systems for Video Technology</span>, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib4"> <span class="ltx_tag ltx_tag_bibitem">[4]</span> <span class="ltx_bibblock"> Haifeng Guo, Sam Kwong, Chuanmin Jia, and Shiqi Wang, </span> <span class="ltx_bibblock">“Enhanced motion compensation for deep video compression,” </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib4.1.1">IEEE Signal Processing Letters</span>, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib5"> <span class="ltx_tag ltx_tag_bibitem">[5]</span> <span class="ltx_bibblock"> Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei, </span> <span class="ltx_bibblock">“Deformable convolutional networks,” </span> <span class="ltx_bibblock">in <span class="ltx_text ltx_font_italic" id="bib.bib5.1.1">Proceedings of the IEEE international conference on computer vision</span>, 2017, pp. 764–773. </span> </li> <li class="ltx_bibitem" id="bib.bib6"> <span class="ltx_tag ltx_tag_bibitem">[6]</span> <span class="ltx_bibblock"> Jiahao Li, Bin Li, and Yan Lu, </span> <span class="ltx_bibblock">“Hybrid spatial-temporal entropy modelling for neural video compression,” </span> <span class="ltx_bibblock">in <span class="ltx_text ltx_font_italic" id="bib.bib6.1.1">Proceedings of the 30th ACM International Conference on Multimedia</span>, 2022, pp. 1503–1511. 
</span> </li> <li class="ltx_bibitem" id="bib.bib7"> <span class="ltx_tag ltx_tag_bibitem">[7]</span> <span class="ltx_bibblock"> Zhihao Hu, Guo Lu, Jinyang Guo, Shan Liu, Wei Jiang, and Dong Xu, </span> <span class="ltx_bibblock">“Coarse-to-fine deep video coding with hyperprior-guided mode prediction,” </span> <span class="ltx_bibblock">in <span class="ltx_text ltx_font_italic" id="bib.bib7.1.1">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</span>, 2022, pp. 5921–5930. </span> </li> <li class="ltx_bibitem" id="bib.bib8"> <span class="ltx_tag ltx_tag_bibitem">[8]</span> <span class="ltx_bibblock"> Jiahao Li, Bin Li, and Yan Lu, </span> <span class="ltx_bibblock">“Neural video compression with diverse contexts,” </span> <span class="ltx_bibblock">in <span class="ltx_text ltx_font_italic" id="bib.bib8.1.1">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</span>, 2023, pp. 22616–22626. </span> </li> <li class="ltx_bibitem" id="bib.bib9"> <span class="ltx_tag ltx_tag_bibitem">[9]</span> <span class="ltx_bibblock"> Xihua Sheng, Jiahao Li, Bin Li, Li Li, Dong Liu, and Yan Lu, </span> <span class="ltx_bibblock">“Temporal context mining for learned video compression,” </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib9.1.1">IEEE Transactions on Multimedia</span>, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib10"> <span class="ltx_tag ltx_tag_bibitem">[10]</span> <span class="ltx_bibblock"> Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, and Hongsheng Li, </span> <span class="ltx_bibblock">“Efficient attention: Attention with linear complexities,” </span> <span class="ltx_bibblock">in <span class="ltx_text ltx_font_italic" id="bib.bib10.1.1">Proceedings of the IEEE/CVF winter conference on applications of computer vision</span>, 2021, pp. 3531–3539. 
</span> </li> <li class="ltx_bibitem" id="bib.bib11"> <span class="ltx_tag ltx_tag_bibitem">[11]</span> <span class="ltx_bibblock"> Haiqiang Wang, Weihao Gan, Sudeng Hu, Joe Yuchieh Lin, Lina Jin, Longguang Song, Ping Wang, Ioannis Katsavounidis, Anne Aaron, and C-C Jay Kuo, </span> <span class="ltx_bibblock">“MCL-JCV: A JND-based H.264/AVC video quality assessment dataset,” </span> <span class="ltx_bibblock">in <span class="ltx_text ltx_font_italic" id="bib.bib11.1.1">2016 IEEE international conference on image processing (ICIP)</span>. IEEE, 2016, pp. 1509–1513. </span> </li> <li class="ltx_bibitem" id="bib.bib12"> <span class="ltx_tag ltx_tag_bibitem">[12]</span> <span class="ltx_bibblock"> Wei Jiang, Peirong Ning, and Ronggang Wang, </span> <span class="ltx_bibblock">“SLIC: Self-conditioned adaptive transform with large-scale receptive fields for learned image compression,” </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib12.1.1">arXiv preprint arXiv:2304.09571</span>, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib13"> <span class="ltx_tag ltx_tag_bibitem">[13]</span> <span class="ltx_bibblock"> Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin, </span> <span class="ltx_bibblock">“Attention is all you need,” </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib13.2.1">Advances in neural information processing systems</span>, vol. 30, 2017. 
</span> </li> <li class="ltx_bibitem" id="bib.bib14"> <span class="ltx_tag ltx_tag_bibitem">[14]</span> <span class="ltx_bibblock"> Wei Jiang, Jiayu Yang, Yongqi Zhai, Peirong Ning, Feng Gao, and Ronggang Wang, </span> <span class="ltx_bibblock">“MLIC: Multi-reference entropy model for learned image compression,” </span> <span class="ltx_bibblock">in <span class="ltx_text ltx_font_italic" id="bib.bib14.1.1">Proceedings of the 31st ACM International Conference on Multimedia</span>, 2023, pp. 7618–7627. </span> </li> <li class="ltx_bibitem" id="bib.bib15"> <span class="ltx_tag ltx_tag_bibitem">[15]</span> <span class="ltx_bibblock"> Guo Lu, Xiaoyun Zhang, Wanli Ouyang, Li Chen, Zhiyong Gao, and Dong Xu, </span> <span class="ltx_bibblock">“An end-to-end learning framework for video compression,” </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib15.1.1">IEEE transactions on pattern analysis and machine intelligence</span>, vol. 43, no. 10, pp. 3292–3308, 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib16"> <span class="ltx_tag ltx_tag_bibitem">[16]</span> <span class="ltx_bibblock"> Wei Jiang and Ronggang Wang, </span> <span class="ltx_bibblock">“MLIC++: Linear complexity multi-reference entropy modeling for learned image compression,” </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib16.1.1">arXiv preprint arXiv:2307.15421</span>, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib17"> <span class="ltx_tag ltx_tag_bibitem">[17]</span> <span class="ltx_bibblock"> Zhou Wang, Eero P Simoncelli, and Alan C Bovik, </span> <span class="ltx_bibblock">“Multiscale structural similarity for image quality assessment,” </span> <span class="ltx_bibblock">in <span class="ltx_text ltx_font_italic" id="bib.bib17.1.1">The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003</span>. IEEE, 2003, vol. 2, pp. 1398–1402. 
</span> </li> <li class="ltx_bibitem" id="bib.bib18"> <span class="ltx_tag ltx_tag_bibitem">[18]</span> <span class="ltx_bibblock"> Alexandre Mercat, Marko Viitanen, and Jarno Vanne, </span> <span class="ltx_bibblock">“UVG dataset: 50/120fps 4K sequences for video codec analysis and development,” </span> <span class="ltx_bibblock">in <span class="ltx_text ltx_font_italic" id="bib.bib18.1.1">Proceedings of the 11th ACM Multimedia Systems Conference</span>, 2020, pp. 297–302. </span> </li> <li class="ltx_bibitem" id="bib.bib19"> <span class="ltx_tag ltx_tag_bibitem">[19]</span> <span class="ltx_bibblock"> Ren Yang, Fabian Mentzer, Luc Van Gool, and Radu Timofte, </span> <span class="ltx_bibblock">“Learning for video compression with hierarchical quality and recurrent enhancement,” </span> <span class="ltx_bibblock">in <span class="ltx_text ltx_font_italic" id="bib.bib19.1.1">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</span>, 2020, pp. 6628–6637. </span> </li> <li class="ltx_bibitem" id="bib.bib20"> <span class="ltx_tag ltx_tag_bibitem">[20]</span> <span class="ltx_bibblock"> Jianping Lin, Dong Liu, Houqiang Li, and Feng Wu, </span> <span class="ltx_bibblock">“M-LVC: Multiple frames prediction for learned video compression,” </span> <span class="ltx_bibblock">in <span class="ltx_text ltx_font_italic" id="bib.bib20.1.1">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</span>, June 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib21"> <span class="ltx_tag ltx_tag_bibitem">[21]</span> <span class="ltx_bibblock"> Yung-Han Ho, Chih-Peng Chang, Peng-Yu Chen, Alessandro Gnutti, and Wen-Hsiao Peng, </span> <span class="ltx_bibblock">“CANF-VC: Conditional augmented normalizing flows for video compression,” </span> <span class="ltx_bibblock">in <span class="ltx_text ltx_font_italic" id="bib.bib21.1.1">European Conference on Computer Vision</span>. Springer, 2022, pp. 207–223. 
</span> </li> <li class="ltx_bibitem" id="bib.bib22"> <span class="ltx_tag ltx_tag_bibitem">[22]</span> <span class="ltx_bibblock"> Tianfan Xue, Baian Chen, Jiajun Wu, Donglai Wei, and William T Freeman, </span> <span class="ltx_bibblock">“Video enhancement with task-oriented flow,” </span> <span class="ltx_bibblock"><span class="ltx_text ltx_font_italic" id="bib.bib22.1.1">International Journal of Computer Vision (IJCV)</span>, vol. 127, no. 8, pp. 1106–1125, 2019. </span> </li> </ul> </section> </article> </div> </div> </body> </html>