An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation

<!DOCTYPE html> <html lang="en"> <head> <meta content="text/html; charset=utf-8" http-equiv="content-type"/> <title>An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation</title> <!--Generated on Tue Mar 18 07:27:46 2025 by LaTeXML (version 0.8.8) http://dlmf.nist.gov/LaTeXML/.--> <meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/> <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv-fonts.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/latexml_styles.css" rel="stylesheet" type="text/css"/> <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.3.3/html2canvas.min.js"></script> <script src="/static/browse/0.3.4/js/addons_new.js"></script> <script src="/static/browse/0.3.4/js/feedbackOverlay.js"></script> <base href="/html/2503.10118v2/"/></head> <body> <nav class="ltx_page_navbar"> <nav class="ltx_TOC"> <ol class="ltx_toclist"> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S1" title="In An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">I </span><span class="ltx_text ltx_font_smallcaps">Introduction</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S2" title="In An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">II </span><span 
class="ltx_text ltx_font_smallcaps">Preliminary</span></span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S2.SS1" title="In II Preliminary ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-A</span> </span><span class="ltx_text ltx_font_italic">Reinforcement Learning Algorithms</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S2.SS2" title="In II Preliminary ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-B</span> </span><span class="ltx_text ltx_font_italic">Differentiable Simulator</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S2.SS3" title="In II Preliminary ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-C</span> </span><span class="ltx_text ltx_font_italic">Data Collection Approaches</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"> <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S2.SS4" title="In II Preliminary ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-D</span> </span><span class="ltx_text ltx_font_italic">Information Theory</span></span></a> <ol class="ltx_toclist ltx_toclist_subsection"> <li 
class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S2.SS4.SSS1" title="In II-D Information Theory ‣ II Preliminary ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-D</span>1 </span>Kernel Density Estimation (KDE)</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S2.SS4.SSS2" title="In II-D Information Theory ‣ II Preliminary ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-D</span>2 </span>Distribution Divergence Measure</span></a></li> </ol> </li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S3" title="In An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">III </span><span class="ltx_text ltx_font_smallcaps">Method</span></span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S3.SS1" title="In III Method ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">III-A</span> </span><span class="ltx_text ltx_font_italic">System Overview</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S3.SS2" title="In III Method ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable 
Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">III-B</span> </span><span class="ltx_text ltx_font_italic">Simulation-Env Parameter Tuning Process</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S3.SS3" title="In III Method ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">III-C</span> </span><span class="ltx_text ltx_font_italic">Adaptive InfoGap Loss construction</span></span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S4" title="In An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">IV </span><span class="ltx_text ltx_font_smallcaps">Experiments</span></span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S4.SS1" title="In IV Experiments ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">IV-A</span> </span><span class="ltx_text ltx_font_italic">Experimental Settings</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S4.SS2" title="In IV Experiments ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">IV-B</span> 
</span><span class="ltx_text ltx_font_italic">Block Pushing Experiment</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S4.SS3" title="In IV Experiments ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">IV-C</span> </span><span class="ltx_text ltx_font_italic">T-shaped Block Pushing</span></span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S5" title="In An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">V </span><span class="ltx_text ltx_font_smallcaps">Discussion on Incorporating Visual Loss</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S6" title="In An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">VI </span><span class="ltx_text ltx_font_smallcaps">Limitations</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S7" title="In An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">VII </span><span class="ltx_text ltx_font_smallcaps">Conclusion and Future Work</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S8" title="In An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable 
Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">VIII </span><span class="ltx_text ltx_font_smallcaps">Appendix</span></span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S8.SS1" title="In VIII Appendix ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">VIII-A</span> </span><span class="ltx_text ltx_font_italic">Algorithm</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S8.SS2" title="In VIII Appendix ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">VIII-B</span> </span><span class="ltx_text ltx_font_italic">RL Settings</span></span></a></li> </ol> </li> </ol></nav> </nav> <div class="ltx_page_main"> <div class="ltx_page_content"> <article class="ltx_document ltx_authors_1line"> <h1 class="ltx_title ltx_font_bold ltx_title_document" style="font-size:173%;">An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation </h1> <div class="ltx_authors"> <span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Lu Shi<sup class="ltx_sup" id="id12.12.id1"><span class="ltx_text ltx_font_italic" id="id12.12.id1.1">1</span></sup>, Yuxuan Xu<sup class="ltx_sup" id="id13.13.id2"><span class="ltx_text ltx_font_italic" id="id13.13.id2.1">2</span></sup>, Shiyu Wang<sup class="ltx_sup" id="id14.14.id3"><span class="ltx_text ltx_font_italic" id="id14.14.id3.1">2</span></sup>, Jinhao Huang<sup class="ltx_sup" id="id15.15.id4"><span class="ltx_text ltx_font_italic" 
id="id15.15.id4.1">1</span></sup>, Wenhao Zhao<sup class="ltx_sup" id="id16.16.id5"><span class="ltx_text ltx_font_italic" id="id16.16.id5.1">3</span></sup>, <br class="ltx_break"/>Zike Yan<sup class="ltx_sup" id="id17.17.id6"><span class="ltx_text ltx_font_italic" id="id17.17.id6.1">1</span></sup>, Weibin Gu<sup class="ltx_sup" id="id18.18.id7"><span class="ltx_text ltx_font_italic" id="id18.18.id7.1">1</span></sup>, and Guyue Zhou<sup class="ltx_sup" id="id19.19.id8"><span class="ltx_text ltx_font_italic" id="id19.19.id8.1">1</span></sup> </span><span class="ltx_author_notes">The authors are with <sup class="ltx_sup" id="id20.20.id1"><span class="ltx_text ltx_font_italic" id="id20.20.id1.1">1</span></sup>Institute for AI Industry Research (AIR), Tsinghua University, <sup class="ltx_sup" id="id21.21.id2">2</sup> Beijing Jiaotong University, and <sup class="ltx_sup" id="id22.22.id3">3</sup>The Hong Kong University of Science and Technology (Guangzhou). We gratefully acknowledge the support of Wuxi Research Institute of Applied Technologies, Tsinghua University under Grant 20242001120 and the Shuimu Scholarship of Tsinghua University. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.</span></span> </div> <div class="ltx_abstract"> <h6 class="ltx_title ltx_title_abstract">Abstract</h6> <p class="ltx_p" id="id23.id1">The sim-to-real gap remains a critical challenge in robotics, hindering the deployment of algorithms trained in simulation to real-world systems. This paper introduces a novel Real-Sim-Real (RSR) loop framework leveraging differentiable simulation to address this gap by iteratively refining simulation parameters, aligning them with real-world conditions, and enabling robust and efficient policy transfer. 
A key contribution of our work is the design of an informative cost function that encourages the collection of diverse and representative real-world data, minimizing bias and maximizing the utility of each data point for simulation refinement. This cost function integrates seamlessly into existing reinforcement learning algorithms (e.g., PPO, SAC) and ensures a balanced exploration of critical regions in the real domain. Furthermore, our approach is implemented on the versatile MuJoCo MJX platform, and our framework is compatible with a wide range of robotic systems. Experimental results on several robotic manipulation tasks demonstrate that our method significantly reduces the sim-to-real gap, achieving high task performance and generalizability across diverse scenarios with both explicit and implicit environmental uncertainties. Please refer to our GitHub repository <span class="ltx_ref ltx_nolink ltx_url ltx_font_typewriter ltx_ref_self">https://github.com/sunnyshi0310/RSR-MJX.git</span> for the code.</p> </div> <section class="ltx_section" id="S1"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">I </span><span class="ltx_text ltx_font_smallcaps" id="S1.1.1">Introduction</span> </h2> <div class="ltx_para" id="S1.p1"> <p class="ltx_p" id="S1.p1.1">In recent years, simulation has become an indispensable tool in the field of robot learning, especially for developing and testing intelligent control algorithms <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib1" title="">1</a>]</cite>. The use of simulations allows researchers to train policies in a safe and cost-effective environment, avoiding the risks associated with physical robots, such as damage to the robot or the surrounding environment. 
In addition, simulation enables rapid experimentation, which is critical for reducing the time and data costs required to train machine learning models.</p> </div> <div class="ltx_para" id="S1.p2"> <p class="ltx_p" id="S1.p2.1">The sim2real gap refers to the discrepancies between the simulated environment and the real world <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib2" title="">2</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib3" title="">3</a>]</cite>. While simulation provides a high level of control over the training process, the virtual environment often does not capture the full complexity and variability of the real world. As a result, policies trained in simulation may not generalize well when applied to real robots, leading to suboptimal or unsafe behavior <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib4" title="">4</a>]</cite>. The sim2real gap may arise from various sources, especially differences in physics, sensor noise, and environmental uncertainties.</p> </div> <div class="ltx_para" id="S1.p3"> <p class="ltx_p" id="S1.p3.1">Various approaches have been proposed to address these challenges <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib5" title="">5</a>]</cite>. One of the most widely used techniques is Domain Randomization (DR) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib6" title="">6</a>]</cite>. DR enhances generalization by increasing the diversity of the simulated environment, training algorithms across a wide range of parameter variations. 
Although effective, DR often requires manual selection of parameters to randomize, lacks scalability as the complexity of the problem grows, and operates as an open-loop approach that does not incorporate feedback from real-world data. Another promising avenue lies in domain adaptation <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib7" title="">7</a>]</cite>, where the sim2real problem is framed as a transfer learning task. In this paradigm, the simulation domain (source domain) and the real-world domain (target domain) are bridged through feature alignment techniques, such as adversarial methods <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib8" title="">8</a>]</cite> or reconstruction-based approaches <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib9" title="">9</a>]</cite>. Despite their success in computer vision, adapting these methods to robotics poses unique challenges, particularly in extracting and leveraging invariant features for domain matching and transfer. Additionally, techniques such as neural-augmented simulation <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib10" title="">10</a>]</cite> utilize models like LSTM networks to predict discrepancies between simulation and real-world performance. However, these approaches often require large amounts of real-world data and are sensitive to noise, limiting their robustness.</p> </div> <div class="ltx_para" id="S1.p4"> <p class="ltx_p" id="S1.p4.1">Recent work has demonstrated the effectiveness of using real-world data to tune simulators, helping bridge the sim-to-real gap <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib11" title="">11</a>]</cite>. 
For instance, <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib12" title="">12</a>]</cite> proposes a novel real-sim-real (RSR) transfer method. In the real-to-sim training phase, a task-relevant simulated environment is constructed using semantic information. A policy is then trained within this simulated environment. In the subsequent sim-to-real inference phase, the trained policy is directly applied to control the robot in real-world scenarios, without requiring any additional real-world data. Other methods, such as BayesSim <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib13" title="">13</a>]</cite>, enhance domain randomization by incorporating real-world data to build a posterior distribution of simulation parameters. Additionally, <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib14" title="">14</a>]</cite> integrates vision-based foundation models with robotics tasks, enabling the reconstruction of robot poses from video using forward kinematics and gradient-based optimization techniques. Beyond these methods, differentiable simulators <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib15" title="">15</a>]</cite> allow simulation parameters to be fine-tuned using gradient-based optimization techniques with real-world data. This enables the simulation environment to more closely match the real-world dynamics of the robot <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib16" title="">16</a>]</cite>. 
For example, <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib17" title="">17</a>]</cite> defines key physical parameters, such as mass and force control inputs, and uses a differentiable physics engine to simulate corresponding states. These simulated states are then compared with real-world data, such as video recordings, to compute a loss. Backpropagation is applied to update the simulator parameters, improving the simulation’s accuracy. In another example, <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib18" title="">18</a>]</cite> establishes a fully differentiable pipeline, linking pixel observations to reinforcement learning (RL) policies and state estimation, which significantly enhances RL performance.</p> </div> <div class="ltx_para" id="S1.p5"> <p class="ltx_p" id="S1.p5.1">However, these approaches face several challenges. First, many existing methods overlook potential biases in real-world data collection, which can lead to suboptimal tuning results. The assumption that real-world data accurately represents the robot’s domain may not always hold, especially if the data is collected in a non-representative manner or fails to cover the regions of interest for the policy. Second, some methods rely solely on visual appearance or video data for tuning. While this approach can provide useful information, it is highly sensitive to factors such as lighting conditions, camera angles, and sensor calibration, which require rigorous experimental setups. Additionally, visual data primarily captures information about the pose of the robot, but it often fails to provide insights into critical dynamic variables like velocity, acceleration, or thrust, which are essential for accurate simulation and control. 
Finally, many of these approaches are limited to specific types of robots or environments, reducing their generalizability.</p> </div> <div class="ltx_para" id="S1.p6"> <p class="ltx_p" id="S1.p6.1">To overcome these limitations, we propose a novel framework for sim-to-real transfer that leverages real-world data to adjust the parameters of a differentiable simulator. Our framework uses a cost function derived from information theory, which incorporates sim2real gap considerations and encourages the policy to collect data that is most informative for tuning the simulator. This cost function can be integrated with various reinforcement learning algorithms, such as PPO <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib19" title="">19</a>]</cite> and SAC <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib20" title="">20</a>]</cite>. In addition, we consider both physical states and visual appearance as part of the cost, enabling more comprehensive tuning. Our framework is built on the MuJoCo MJX engine <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib21" title="">21</a>]</cite>, ensuring compatibility with a wide range of robotic platforms. We demonstrate the effectiveness of our approach through several experiments on a robotic arm. 
At the end of the paper, we discuss the effect of including a visual loss in the computation, the limitations of this work, and directions for future work.</p> </div> </section> <section class="ltx_section" id="S2"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">II </span><span class="ltx_text ltx_font_smallcaps" id="S2.1.1">Preliminary</span> </h2> <section class="ltx_subsection" id="S2.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S2.SS1.4.1.1">II-A</span> </span><span class="ltx_text ltx_font_italic" id="S2.SS1.5.2">Reinforcement Learning Algorithms</span> </h3> <div class="ltx_para" id="S2.SS1.p1"> <p class="ltx_p" id="S2.SS1.p1.7">Reinforcement Learning (RL) is a framework where an agent learns to make sequential decisions by interacting with an environment to maximize a cumulative reward. The agent observes the current state <math alttext="s_{t}" class="ltx_Math" display="inline" id="S2.SS1.p1.1.m1.1"><semantics id="S2.SS1.p1.1.m1.1a"><msub id="S2.SS1.p1.1.m1.1.1" xref="S2.SS1.p1.1.m1.1.1.cmml"><mi id="S2.SS1.p1.1.m1.1.1.2" xref="S2.SS1.p1.1.m1.1.1.2.cmml">s</mi><mi id="S2.SS1.p1.1.m1.1.1.3" xref="S2.SS1.p1.1.m1.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.1.m1.1b"><apply id="S2.SS1.p1.1.m1.1.1.cmml" xref="S2.SS1.p1.1.m1.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.1.m1.1.1.1.cmml" xref="S2.SS1.p1.1.m1.1.1">subscript</csymbol><ci id="S2.SS1.p1.1.m1.1.1.2.cmml" xref="S2.SS1.p1.1.m1.1.1.2">𝑠</ci><ci id="S2.SS1.p1.1.m1.1.1.3.cmml" xref="S2.SS1.p1.1.m1.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.1.m1.1c">s_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.1.m1.1d">italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> at each time step <math alttext="t" class="ltx_Math" display="inline" id="S2.SS1.p1.2.m2.1"><semantics 
id="S2.SS1.p1.2.m2.1a"><mi id="S2.SS1.p1.2.m2.1.1" xref="S2.SS1.p1.2.m2.1.1.cmml">t</mi><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.2.m2.1b"><ci id="S2.SS1.p1.2.m2.1.1.cmml" xref="S2.SS1.p1.2.m2.1.1">𝑡</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.2.m2.1c">t</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.2.m2.1d">italic_t</annotation></semantics></math>, selects an action <math alttext="a_{t}" class="ltx_Math" display="inline" id="S2.SS1.p1.3.m3.1"><semantics id="S2.SS1.p1.3.m3.1a"><msub id="S2.SS1.p1.3.m3.1.1" xref="S2.SS1.p1.3.m3.1.1.cmml"><mi id="S2.SS1.p1.3.m3.1.1.2" xref="S2.SS1.p1.3.m3.1.1.2.cmml">a</mi><mi id="S2.SS1.p1.3.m3.1.1.3" xref="S2.SS1.p1.3.m3.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.3.m3.1b"><apply id="S2.SS1.p1.3.m3.1.1.cmml" xref="S2.SS1.p1.3.m3.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.3.m3.1.1.1.cmml" xref="S2.SS1.p1.3.m3.1.1">subscript</csymbol><ci id="S2.SS1.p1.3.m3.1.1.2.cmml" xref="S2.SS1.p1.3.m3.1.1.2">𝑎</ci><ci id="S2.SS1.p1.3.m3.1.1.3.cmml" xref="S2.SS1.p1.3.m3.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.3.m3.1c">a_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.3.m3.1d">italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math>, and receives feedback in the form of a reward <math alttext="r_{t}" class="ltx_Math" display="inline" id="S2.SS1.p1.4.m4.1"><semantics id="S2.SS1.p1.4.m4.1a"><msub id="S2.SS1.p1.4.m4.1.1" xref="S2.SS1.p1.4.m4.1.1.cmml"><mi id="S2.SS1.p1.4.m4.1.1.2" xref="S2.SS1.p1.4.m4.1.1.2.cmml">r</mi><mi id="S2.SS1.p1.4.m4.1.1.3" xref="S2.SS1.p1.4.m4.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.4.m4.1b"><apply id="S2.SS1.p1.4.m4.1.1.cmml" xref="S2.SS1.p1.4.m4.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.4.m4.1.1.1.cmml" xref="S2.SS1.p1.4.m4.1.1">subscript</csymbol><ci 
id="S2.SS1.p1.4.m4.1.1.2.cmml" xref="S2.SS1.p1.4.m4.1.1.2">𝑟</ci><ci id="S2.SS1.p1.4.m4.1.1.3.cmml" xref="S2.SS1.p1.4.m4.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.4.m4.1c">r_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.4.m4.1d">italic_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math>. The goal is to learn a policy <math alttext="\pi(a_{t}|s_{t})" class="ltx_Math" display="inline" id="S2.SS1.p1.5.m5.1"><semantics id="S2.SS1.p1.5.m5.1a"><mrow id="S2.SS1.p1.5.m5.1.1" xref="S2.SS1.p1.5.m5.1.1.cmml"><mi id="S2.SS1.p1.5.m5.1.1.3" xref="S2.SS1.p1.5.m5.1.1.3.cmml">π</mi><mo id="S2.SS1.p1.5.m5.1.1.2" xref="S2.SS1.p1.5.m5.1.1.2.cmml">⁢</mo><mrow id="S2.SS1.p1.5.m5.1.1.1.1" xref="S2.SS1.p1.5.m5.1.1.1.1.1.cmml"><mo id="S2.SS1.p1.5.m5.1.1.1.1.2" stretchy="false" xref="S2.SS1.p1.5.m5.1.1.1.1.1.cmml">(</mo><mrow id="S2.SS1.p1.5.m5.1.1.1.1.1" xref="S2.SS1.p1.5.m5.1.1.1.1.1.cmml"><msub id="S2.SS1.p1.5.m5.1.1.1.1.1.2" xref="S2.SS1.p1.5.m5.1.1.1.1.1.2.cmml"><mi id="S2.SS1.p1.5.m5.1.1.1.1.1.2.2" xref="S2.SS1.p1.5.m5.1.1.1.1.1.2.2.cmml">a</mi><mi id="S2.SS1.p1.5.m5.1.1.1.1.1.2.3" xref="S2.SS1.p1.5.m5.1.1.1.1.1.2.3.cmml">t</mi></msub><mo fence="false" id="S2.SS1.p1.5.m5.1.1.1.1.1.1" xref="S2.SS1.p1.5.m5.1.1.1.1.1.1.cmml">|</mo><msub id="S2.SS1.p1.5.m5.1.1.1.1.1.3" xref="S2.SS1.p1.5.m5.1.1.1.1.1.3.cmml"><mi id="S2.SS1.p1.5.m5.1.1.1.1.1.3.2" xref="S2.SS1.p1.5.m5.1.1.1.1.1.3.2.cmml">s</mi><mi id="S2.SS1.p1.5.m5.1.1.1.1.1.3.3" xref="S2.SS1.p1.5.m5.1.1.1.1.1.3.3.cmml">t</mi></msub></mrow><mo id="S2.SS1.p1.5.m5.1.1.1.1.3" stretchy="false" xref="S2.SS1.p1.5.m5.1.1.1.1.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.5.m5.1b"><apply id="S2.SS1.p1.5.m5.1.1.cmml" xref="S2.SS1.p1.5.m5.1.1"><times id="S2.SS1.p1.5.m5.1.1.2.cmml" xref="S2.SS1.p1.5.m5.1.1.2"></times><ci id="S2.SS1.p1.5.m5.1.1.3.cmml" xref="S2.SS1.p1.5.m5.1.1.3">𝜋</ci><apply 
id="S2.SS1.p1.5.m5.1.1.1.1.1.cmml" xref="S2.SS1.p1.5.m5.1.1.1.1"><csymbol cd="latexml" id="S2.SS1.p1.5.m5.1.1.1.1.1.1.cmml" xref="S2.SS1.p1.5.m5.1.1.1.1.1.1">conditional</csymbol><apply id="S2.SS1.p1.5.m5.1.1.1.1.1.2.cmml" xref="S2.SS1.p1.5.m5.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S2.SS1.p1.5.m5.1.1.1.1.1.2.1.cmml" xref="S2.SS1.p1.5.m5.1.1.1.1.1.2">subscript</csymbol><ci id="S2.SS1.p1.5.m5.1.1.1.1.1.2.2.cmml" xref="S2.SS1.p1.5.m5.1.1.1.1.1.2.2">𝑎</ci><ci id="S2.SS1.p1.5.m5.1.1.1.1.1.2.3.cmml" xref="S2.SS1.p1.5.m5.1.1.1.1.1.2.3">𝑡</ci></apply><apply id="S2.SS1.p1.5.m5.1.1.1.1.1.3.cmml" xref="S2.SS1.p1.5.m5.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S2.SS1.p1.5.m5.1.1.1.1.1.3.1.cmml" xref="S2.SS1.p1.5.m5.1.1.1.1.1.3">subscript</csymbol><ci id="S2.SS1.p1.5.m5.1.1.1.1.1.3.2.cmml" xref="S2.SS1.p1.5.m5.1.1.1.1.1.3.2">𝑠</ci><ci id="S2.SS1.p1.5.m5.1.1.1.1.1.3.3.cmml" xref="S2.SS1.p1.5.m5.1.1.1.1.1.3.3">𝑡</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.5.m5.1c">\pi(a_{t}|s_{t})</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.5.m5.1d">italic_π ( italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT )</annotation></semantics></math>, which defines the probability distribution over actions <math alttext="a_{t}" class="ltx_Math" display="inline" id="S2.SS1.p1.6.m6.1"><semantics id="S2.SS1.p1.6.m6.1a"><msub id="S2.SS1.p1.6.m6.1.1" xref="S2.SS1.p1.6.m6.1.1.cmml"><mi id="S2.SS1.p1.6.m6.1.1.2" xref="S2.SS1.p1.6.m6.1.1.2.cmml">a</mi><mi id="S2.SS1.p1.6.m6.1.1.3" xref="S2.SS1.p1.6.m6.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.6.m6.1b"><apply id="S2.SS1.p1.6.m6.1.1.cmml" xref="S2.SS1.p1.6.m6.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.6.m6.1.1.1.cmml" xref="S2.SS1.p1.6.m6.1.1">subscript</csymbol><ci id="S2.SS1.p1.6.m6.1.1.2.cmml" xref="S2.SS1.p1.6.m6.1.1.2">𝑎</ci><ci id="S2.SS1.p1.6.m6.1.1.3.cmml" 
xref="S2.SS1.p1.6.m6.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.6.m6.1c">a_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.6.m6.1d">italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> given a state <math alttext="s_{t}" class="ltx_Math" display="inline" id="S2.SS1.p1.7.m7.1"><semantics id="S2.SS1.p1.7.m7.1a"><msub id="S2.SS1.p1.7.m7.1.1" xref="S2.SS1.p1.7.m7.1.1.cmml"><mi id="S2.SS1.p1.7.m7.1.1.2" xref="S2.SS1.p1.7.m7.1.1.2.cmml">s</mi><mi id="S2.SS1.p1.7.m7.1.1.3" xref="S2.SS1.p1.7.m7.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.7.m7.1b"><apply id="S2.SS1.p1.7.m7.1.1.cmml" xref="S2.SS1.p1.7.m7.1.1"><csymbol cd="ambiguous" id="S2.SS1.p1.7.m7.1.1.1.cmml" xref="S2.SS1.p1.7.m7.1.1">subscript</csymbol><ci id="S2.SS1.p1.7.m7.1.1.2.cmml" xref="S2.SS1.p1.7.m7.1.1.2">𝑠</ci><ci id="S2.SS1.p1.7.m7.1.1.3.cmml" xref="S2.SS1.p1.7.m7.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.7.m7.1c">s_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.7.m7.1d">italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math>, that maximizes the expected cumulative reward.</p> </div> <div class="ltx_para" id="S2.SS1.p2"> <p class="ltx_p" id="S2.SS1.p2.1">The networks are usually trained using the following objective:</p> <table class="ltx_equation ltx_eqn_table" id="S2.E1"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\mathcal{L}_{task}=\mathbb{E}[(r_{t}+\gamma V(s_{t+1})-V(s_{t}))\nabla_{\pi}% \log\pi(a_{t}|s_{t})]" class="ltx_Math" display="block" id="S2.E1.m1.1"><semantics id="S2.E1.m1.1a"><mrow id="S2.E1.m1.1.1" xref="S2.E1.m1.1.1.cmml"><msub id="S2.E1.m1.1.1.3" xref="S2.E1.m1.1.1.3.cmml"><mi 
class="ltx_font_mathcaligraphic" id="S2.E1.m1.1.1.3.2" xref="S2.E1.m1.1.1.3.2.cmml">ℒ</mi><mrow id="S2.E1.m1.1.1.3.3" xref="S2.E1.m1.1.1.3.3.cmml"><mi id="S2.E1.m1.1.1.3.3.2" xref="S2.E1.m1.1.1.3.3.2.cmml">t</mi><mo id="S2.E1.m1.1.1.3.3.1" xref="S2.E1.m1.1.1.3.3.1.cmml">⁢</mo><mi id="S2.E1.m1.1.1.3.3.3" xref="S2.E1.m1.1.1.3.3.3.cmml">a</mi><mo id="S2.E1.m1.1.1.3.3.1a" xref="S2.E1.m1.1.1.3.3.1.cmml">⁢</mo><mi id="S2.E1.m1.1.1.3.3.4" xref="S2.E1.m1.1.1.3.3.4.cmml">s</mi><mo id="S2.E1.m1.1.1.3.3.1b" xref="S2.E1.m1.1.1.3.3.1.cmml">⁢</mo><mi id="S2.E1.m1.1.1.3.3.5" xref="S2.E1.m1.1.1.3.3.5.cmml">k</mi></mrow></msub><mo id="S2.E1.m1.1.1.2" xref="S2.E1.m1.1.1.2.cmml">=</mo><mrow id="S2.E1.m1.1.1.1" xref="S2.E1.m1.1.1.1.cmml"><mi id="S2.E1.m1.1.1.1.3" xref="S2.E1.m1.1.1.1.3.cmml">𝔼</mi><mo id="S2.E1.m1.1.1.1.2" xref="S2.E1.m1.1.1.1.2.cmml">⁢</mo><mrow id="S2.E1.m1.1.1.1.1.1" xref="S2.E1.m1.1.1.1.1.2.cmml"><mo id="S2.E1.m1.1.1.1.1.1.2" stretchy="false" xref="S2.E1.m1.1.1.1.1.2.1.cmml">[</mo><mrow id="S2.E1.m1.1.1.1.1.1.1" xref="S2.E1.m1.1.1.1.1.1.1.cmml"><mrow id="S2.E1.m1.1.1.1.1.1.1.1.1" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.cmml"><mo id="S2.E1.m1.1.1.1.1.1.1.1.1.2" stretchy="false" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.cmml">(</mo><mrow id="S2.E1.m1.1.1.1.1.1.1.1.1.1" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.cmml"><mrow id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.cmml"><msub id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.3" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.3.cmml"><mi id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.3.2" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.3.2.cmml">r</mi><mi id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.3.3" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.3.3.cmml">t</mi></msub><mo id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.2" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.2.cmml">+</mo><mrow id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.cmml"><mi id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.3" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.3.cmml">γ</mi><mo id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.2" 
xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.2.cmml">⁢</mo><mi id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.4" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.4.cmml">V</mi><mo id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.2a" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.2.cmml">⁢</mo><mrow id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.cmml"><mo id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.2" stretchy="false" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.cmml">(</mo><msub id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.cmml"><mi id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.cmml">s</mi><mrow id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.cmml"><mi id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.2" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.2.cmml">t</mi><mo id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.1" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.1.cmml">+</mo><mn id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.3" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.3.cmml">1</mn></mrow></msub><mo id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.3" stretchy="false" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.cmml">)</mo></mrow></mrow></mrow><mo id="S2.E1.m1.1.1.1.1.1.1.1.1.1.3" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.3.cmml">−</mo><mrow id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.cmml"><mi id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.3" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.3.cmml">V</mi><mo id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.2" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.2.cmml">⁢</mo><mrow id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.1.cmml"><mo id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.2" stretchy="false" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.1.cmml">(</mo><msub id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.1" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.1.cmml"><mi id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.1.2" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.cmml">s</mi><mi 
id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.1.3" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.cmml">t</mi></msub><mo id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.3" stretchy="false" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.1.cmml">)</mo></mrow></mrow></mrow><mo id="S2.E1.m1.1.1.1.1.1.1.1.1.3" stretchy="false" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.cmml">)</mo></mrow><mo id="S2.E1.m1.1.1.1.1.1.1.3" lspace="0.167em" xref="S2.E1.m1.1.1.1.1.1.1.3.cmml">⁢</mo><mrow id="S2.E1.m1.1.1.1.1.1.1.4" xref="S2.E1.m1.1.1.1.1.1.1.4.cmml"><mrow id="S2.E1.m1.1.1.1.1.1.1.4.1" xref="S2.E1.m1.1.1.1.1.1.1.4.1.cmml"><msub id="S2.E1.m1.1.1.1.1.1.1.4.1.1" xref="S2.E1.m1.1.1.1.1.1.1.4.1.1.cmml"><mo id="S2.E1.m1.1.1.1.1.1.1.4.1.1.2" rspace="0.167em" xref="S2.E1.m1.1.1.1.1.1.1.4.1.1.2.cmml">∇</mo><mi id="S2.E1.m1.1.1.1.1.1.1.4.1.1.3" xref="S2.E1.m1.1.1.1.1.1.1.4.1.1.3.cmml">π</mi></msub><mi id="S2.E1.m1.1.1.1.1.1.1.4.1.2" xref="S2.E1.m1.1.1.1.1.1.1.4.1.2.cmml">log</mi></mrow><mo id="S2.E1.m1.1.1.1.1.1.1.4a" lspace="0.167em" xref="S2.E1.m1.1.1.1.1.1.1.4.cmml">⁡</mo><mi id="S2.E1.m1.1.1.1.1.1.1.4.2" xref="S2.E1.m1.1.1.1.1.1.1.4.2.cmml">π</mi></mrow><mo id="S2.E1.m1.1.1.1.1.1.1.3a" xref="S2.E1.m1.1.1.1.1.1.1.3.cmml">⁢</mo><mrow id="S2.E1.m1.1.1.1.1.1.1.2.1" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.cmml"><mo id="S2.E1.m1.1.1.1.1.1.1.2.1.2" stretchy="false" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.cmml">(</mo><mrow id="S2.E1.m1.1.1.1.1.1.1.2.1.1" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.cmml"><msub id="S2.E1.m1.1.1.1.1.1.1.2.1.1.2" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.2.cmml"><mi id="S2.E1.m1.1.1.1.1.1.1.2.1.1.2.2" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.2.2.cmml">a</mi><mi id="S2.E1.m1.1.1.1.1.1.1.2.1.1.2.3" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.2.3.cmml">t</mi></msub><mo fence="false" id="S2.E1.m1.1.1.1.1.1.1.2.1.1.1" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.1.cmml">|</mo><msub id="S2.E1.m1.1.1.1.1.1.1.2.1.1.3" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.3.cmml"><mi id="S2.E1.m1.1.1.1.1.1.1.2.1.1.3.2" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.3.2.cmml">s</mi><mi 
id="S2.E1.m1.1.1.1.1.1.1.2.1.1.3.3" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.3.3.cmml">t</mi></msub></mrow><mo id="S2.E1.m1.1.1.1.1.1.1.2.1.3" stretchy="false" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.cmml">)</mo></mrow></mrow><mo id="S2.E1.m1.1.1.1.1.1.3" stretchy="false" xref="S2.E1.m1.1.1.1.1.2.1.cmml">]</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.E1.m1.1b"><apply id="S2.E1.m1.1.1.cmml" xref="S2.E1.m1.1.1"><eq id="S2.E1.m1.1.1.2.cmml" xref="S2.E1.m1.1.1.2"></eq><apply id="S2.E1.m1.1.1.3.cmml" xref="S2.E1.m1.1.1.3"><csymbol cd="ambiguous" id="S2.E1.m1.1.1.3.1.cmml" xref="S2.E1.m1.1.1.3">subscript</csymbol><ci id="S2.E1.m1.1.1.3.2.cmml" xref="S2.E1.m1.1.1.3.2">ℒ</ci><apply id="S2.E1.m1.1.1.3.3.cmml" xref="S2.E1.m1.1.1.3.3"><times id="S2.E1.m1.1.1.3.3.1.cmml" xref="S2.E1.m1.1.1.3.3.1"></times><ci id="S2.E1.m1.1.1.3.3.2.cmml" xref="S2.E1.m1.1.1.3.3.2">𝑡</ci><ci id="S2.E1.m1.1.1.3.3.3.cmml" xref="S2.E1.m1.1.1.3.3.3">𝑎</ci><ci id="S2.E1.m1.1.1.3.3.4.cmml" xref="S2.E1.m1.1.1.3.3.4">𝑠</ci><ci id="S2.E1.m1.1.1.3.3.5.cmml" xref="S2.E1.m1.1.1.3.3.5">𝑘</ci></apply></apply><apply id="S2.E1.m1.1.1.1.cmml" xref="S2.E1.m1.1.1.1"><times id="S2.E1.m1.1.1.1.2.cmml" xref="S2.E1.m1.1.1.1.2"></times><ci id="S2.E1.m1.1.1.1.3.cmml" xref="S2.E1.m1.1.1.1.3">𝔼</ci><apply id="S2.E1.m1.1.1.1.1.2.cmml" xref="S2.E1.m1.1.1.1.1.1"><csymbol cd="latexml" id="S2.E1.m1.1.1.1.1.2.1.cmml" xref="S2.E1.m1.1.1.1.1.1.2">delimited-[]</csymbol><apply id="S2.E1.m1.1.1.1.1.1.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1"><times id="S2.E1.m1.1.1.1.1.1.1.3.cmml" xref="S2.E1.m1.1.1.1.1.1.1.3"></times><apply id="S2.E1.m1.1.1.1.1.1.1.1.1.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1"><minus id="S2.E1.m1.1.1.1.1.1.1.1.1.1.3.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.3"></minus><apply id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1"><plus id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.2.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.2"></plus><apply id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.3.cmml" 
xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.3.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.3">subscript</csymbol><ci id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.3.2.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.3.2">𝑟</ci><ci id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.3.3.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.3.3">𝑡</ci></apply><apply id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1"><times id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.2.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.2"></times><ci id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.3.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.3">𝛾</ci><ci id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.4.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.4">𝑉</ci><apply id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1">subscript</csymbol><ci id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2">𝑠</ci><apply id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3"><plus id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.1"></plus><ci id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.2.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.2">𝑡</ci><cn id="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.3.cmml" type="integer" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.3">1</cn></apply></apply></apply></apply><apply id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2"><times id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.2.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.2"></times><ci id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.3.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.3">𝑉</ci><apply id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1"><csymbol cd="ambiguous" id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.cmml" 
xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1">subscript</csymbol><ci id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.1.2">𝑠</ci><ci id="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.cmml" xref="S2.E1.m1.1.1.1.1.1.1.1.1.1.2.1.1.1.3">𝑡</ci></apply></apply></apply><apply id="S2.E1.m1.1.1.1.1.1.1.4.cmml" xref="S2.E1.m1.1.1.1.1.1.1.4"><apply id="S2.E1.m1.1.1.1.1.1.1.4.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1.4.1"><apply id="S2.E1.m1.1.1.1.1.1.1.4.1.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1.4.1.1"><csymbol cd="ambiguous" id="S2.E1.m1.1.1.1.1.1.1.4.1.1.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1.4.1.1">subscript</csymbol><ci id="S2.E1.m1.1.1.1.1.1.1.4.1.1.2.cmml" xref="S2.E1.m1.1.1.1.1.1.1.4.1.1.2">∇</ci><ci id="S2.E1.m1.1.1.1.1.1.1.4.1.1.3.cmml" xref="S2.E1.m1.1.1.1.1.1.1.4.1.1.3">𝜋</ci></apply><log id="S2.E1.m1.1.1.1.1.1.1.4.1.2.cmml" xref="S2.E1.m1.1.1.1.1.1.1.4.1.2"></log></apply><ci id="S2.E1.m1.1.1.1.1.1.1.4.2.cmml" xref="S2.E1.m1.1.1.1.1.1.1.4.2">𝜋</ci></apply><apply id="S2.E1.m1.1.1.1.1.1.1.2.1.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1.2.1"><csymbol cd="latexml" id="S2.E1.m1.1.1.1.1.1.1.2.1.1.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.1">conditional</csymbol><apply id="S2.E1.m1.1.1.1.1.1.1.2.1.1.2.cmml" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.2"><csymbol cd="ambiguous" id="S2.E1.m1.1.1.1.1.1.1.2.1.1.2.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.2">subscript</csymbol><ci id="S2.E1.m1.1.1.1.1.1.1.2.1.1.2.2.cmml" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.2.2">𝑎</ci><ci id="S2.E1.m1.1.1.1.1.1.1.2.1.1.2.3.cmml" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.2.3">𝑡</ci></apply><apply id="S2.E1.m1.1.1.1.1.1.1.2.1.1.3.cmml" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.3"><csymbol cd="ambiguous" id="S2.E1.m1.1.1.1.1.1.1.2.1.1.3.1.cmml" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.3">subscript</csymbol><ci id="S2.E1.m1.1.1.1.1.1.1.2.1.1.3.2.cmml" xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.3.2">𝑠</ci><ci id="S2.E1.m1.1.1.1.1.1.1.2.1.1.3.3.cmml" 
xref="S2.E1.m1.1.1.1.1.1.1.2.1.1.3.3">𝑡</ci></apply></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.E1.m1.1c">\mathcal{L}_{task}=\mathbb{E}[(r_{t}+\gamma V(s_{t+1})-V(s_{t}))\nabla_{\pi}% \log\pi(a_{t}|s_{t})]</annotation><annotation encoding="application/x-llamapun" id="S2.E1.m1.1d">caligraphic_L start_POSTSUBSCRIPT italic_t italic_a italic_s italic_k end_POSTSUBSCRIPT = blackboard_E [ ( italic_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_γ italic_V ( italic_s start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT ) - italic_V ( italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ) ∇ start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT roman_log italic_π ( italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ]</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(1)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S2.SS1.p2.2">The goal of training is to minimize this cost, thereby improving the value function and, consequently, the policy over time.</p> </div> </section> <section class="ltx_subsection" id="S2.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S2.SS2.4.1.1">II-B</span> </span><span class="ltx_text ltx_font_italic" id="S2.SS2.5.2">Differentiable Simulator</span> </h3> <div class="ltx_para" id="S2.SS2.p1"> <p class="ltx_p" id="S2.SS2.p1.5">A differentiable physics engine allows the computation of gradients with respect to physical simulation parameters (e.g., friction, mass, and elasticity), enabling gradient-based optimization of those parameters. 
In a typical physics engine, the state of the system <math alttext="s_{t}" class="ltx_Math" display="inline" id="S2.SS2.p1.1.m1.1"><semantics id="S2.SS2.p1.1.m1.1a"><msub id="S2.SS2.p1.1.m1.1.1" xref="S2.SS2.p1.1.m1.1.1.cmml"><mi id="S2.SS2.p1.1.m1.1.1.2" xref="S2.SS2.p1.1.m1.1.1.2.cmml">s</mi><mi id="S2.SS2.p1.1.m1.1.1.3" xref="S2.SS2.p1.1.m1.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.1.m1.1b"><apply id="S2.SS2.p1.1.m1.1.1.cmml" xref="S2.SS2.p1.1.m1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p1.1.m1.1.1.1.cmml" xref="S2.SS2.p1.1.m1.1.1">subscript</csymbol><ci id="S2.SS2.p1.1.m1.1.1.2.cmml" xref="S2.SS2.p1.1.m1.1.1.2">𝑠</ci><ci id="S2.SS2.p1.1.m1.1.1.3.cmml" xref="S2.SS2.p1.1.m1.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.1.m1.1c">s_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.1.m1.1d">italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> evolves according to some physical laws, modeled as a function <math alttext="f_{\theta}(s_{t},a_{t})" class="ltx_Math" display="inline" id="S2.SS2.p1.2.m2.2"><semantics id="S2.SS2.p1.2.m2.2a"><mrow id="S2.SS2.p1.2.m2.2.2" xref="S2.SS2.p1.2.m2.2.2.cmml"><msub id="S2.SS2.p1.2.m2.2.2.4" xref="S2.SS2.p1.2.m2.2.2.4.cmml"><mi id="S2.SS2.p1.2.m2.2.2.4.2" xref="S2.SS2.p1.2.m2.2.2.4.2.cmml">f</mi><mi id="S2.SS2.p1.2.m2.2.2.4.3" xref="S2.SS2.p1.2.m2.2.2.4.3.cmml">θ</mi></msub><mo id="S2.SS2.p1.2.m2.2.2.3" xref="S2.SS2.p1.2.m2.2.2.3.cmml">⁢</mo><mrow id="S2.SS2.p1.2.m2.2.2.2.2" xref="S2.SS2.p1.2.m2.2.2.2.3.cmml"><mo id="S2.SS2.p1.2.m2.2.2.2.2.3" stretchy="false" xref="S2.SS2.p1.2.m2.2.2.2.3.cmml">(</mo><msub id="S2.SS2.p1.2.m2.1.1.1.1.1" xref="S2.SS2.p1.2.m2.1.1.1.1.1.cmml"><mi id="S2.SS2.p1.2.m2.1.1.1.1.1.2" xref="S2.SS2.p1.2.m2.1.1.1.1.1.2.cmml">s</mi><mi id="S2.SS2.p1.2.m2.1.1.1.1.1.3" xref="S2.SS2.p1.2.m2.1.1.1.1.1.3.cmml">t</mi></msub><mo id="S2.SS2.p1.2.m2.2.2.2.2.4" 
xref="S2.SS2.p1.2.m2.2.2.2.3.cmml">,</mo><msub id="S2.SS2.p1.2.m2.2.2.2.2.2" xref="S2.SS2.p1.2.m2.2.2.2.2.2.cmml"><mi id="S2.SS2.p1.2.m2.2.2.2.2.2.2" xref="S2.SS2.p1.2.m2.2.2.2.2.2.2.cmml">a</mi><mi id="S2.SS2.p1.2.m2.2.2.2.2.2.3" xref="S2.SS2.p1.2.m2.2.2.2.2.2.3.cmml">t</mi></msub><mo id="S2.SS2.p1.2.m2.2.2.2.2.5" stretchy="false" xref="S2.SS2.p1.2.m2.2.2.2.3.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.2.m2.2b"><apply id="S2.SS2.p1.2.m2.2.2.cmml" xref="S2.SS2.p1.2.m2.2.2"><times id="S2.SS2.p1.2.m2.2.2.3.cmml" xref="S2.SS2.p1.2.m2.2.2.3"></times><apply id="S2.SS2.p1.2.m2.2.2.4.cmml" xref="S2.SS2.p1.2.m2.2.2.4"><csymbol cd="ambiguous" id="S2.SS2.p1.2.m2.2.2.4.1.cmml" xref="S2.SS2.p1.2.m2.2.2.4">subscript</csymbol><ci id="S2.SS2.p1.2.m2.2.2.4.2.cmml" xref="S2.SS2.p1.2.m2.2.2.4.2">𝑓</ci><ci id="S2.SS2.p1.2.m2.2.2.4.3.cmml" xref="S2.SS2.p1.2.m2.2.2.4.3">𝜃</ci></apply><interval closure="open" id="S2.SS2.p1.2.m2.2.2.2.3.cmml" xref="S2.SS2.p1.2.m2.2.2.2.2"><apply id="S2.SS2.p1.2.m2.1.1.1.1.1.cmml" xref="S2.SS2.p1.2.m2.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.p1.2.m2.1.1.1.1.1.1.cmml" xref="S2.SS2.p1.2.m2.1.1.1.1.1">subscript</csymbol><ci id="S2.SS2.p1.2.m2.1.1.1.1.1.2.cmml" xref="S2.SS2.p1.2.m2.1.1.1.1.1.2">𝑠</ci><ci id="S2.SS2.p1.2.m2.1.1.1.1.1.3.cmml" xref="S2.SS2.p1.2.m2.1.1.1.1.1.3">𝑡</ci></apply><apply id="S2.SS2.p1.2.m2.2.2.2.2.2.cmml" xref="S2.SS2.p1.2.m2.2.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS2.p1.2.m2.2.2.2.2.2.1.cmml" xref="S2.SS2.p1.2.m2.2.2.2.2.2">subscript</csymbol><ci id="S2.SS2.p1.2.m2.2.2.2.2.2.2.cmml" xref="S2.SS2.p1.2.m2.2.2.2.2.2.2">𝑎</ci><ci id="S2.SS2.p1.2.m2.2.2.2.2.2.3.cmml" xref="S2.SS2.p1.2.m2.2.2.2.2.2.3">𝑡</ci></apply></interval></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.2.m2.2c">f_{\theta}(s_{t},a_{t})</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.2.m2.2d">italic_f start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_s 
start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT )</annotation></semantics></math>, where <math alttext="\theta" class="ltx_Math" display="inline" id="S2.SS2.p1.3.m3.1"><semantics id="S2.SS2.p1.3.m3.1a"><mi id="S2.SS2.p1.3.m3.1.1" xref="S2.SS2.p1.3.m3.1.1.cmml">θ</mi><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.3.m3.1b"><ci id="S2.SS2.p1.3.m3.1.1.cmml" xref="S2.SS2.p1.3.m3.1.1">𝜃</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.3.m3.1c">\theta</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.3.m3.1d">italic_θ</annotation></semantics></math> represents the system parameters, and <math alttext="a_{t}" class="ltx_Math" display="inline" id="S2.SS2.p1.4.m4.1"><semantics id="S2.SS2.p1.4.m4.1a"><msub id="S2.SS2.p1.4.m4.1.1" xref="S2.SS2.p1.4.m4.1.1.cmml"><mi id="S2.SS2.p1.4.m4.1.1.2" xref="S2.SS2.p1.4.m4.1.1.2.cmml">a</mi><mi id="S2.SS2.p1.4.m4.1.1.3" xref="S2.SS2.p1.4.m4.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.4.m4.1b"><apply id="S2.SS2.p1.4.m4.1.1.cmml" xref="S2.SS2.p1.4.m4.1.1"><csymbol cd="ambiguous" id="S2.SS2.p1.4.m4.1.1.1.cmml" xref="S2.SS2.p1.4.m4.1.1">subscript</csymbol><ci id="S2.SS2.p1.4.m4.1.1.2.cmml" xref="S2.SS2.p1.4.m4.1.1.2">𝑎</ci><ci id="S2.SS2.p1.4.m4.1.1.3.cmml" xref="S2.SS2.p1.4.m4.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.4.m4.1c">a_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.4.m4.1d">italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> is the control input (action). The gradient of the system dynamics with respect to the system parameters can be computed by applying the chain rule to the physics engine’s equations of motion. 
This gradient allows us to optimize the system’s parameters <math alttext="\theta" class="ltx_Math" display="inline" id="S2.SS2.p1.5.m5.1"><semantics id="S2.SS2.p1.5.m5.1a"><mi id="S2.SS2.p1.5.m5.1.1" xref="S2.SS2.p1.5.m5.1.1.cmml">θ</mi><annotation-xml encoding="MathML-Content" id="S2.SS2.p1.5.m5.1b"><ci id="S2.SS2.p1.5.m5.1.1.cmml" xref="S2.SS2.p1.5.m5.1.1">𝜃</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.p1.5.m5.1c">\theta</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.p1.5.m5.1d">italic_θ</annotation></semantics></math> using standard gradient-based techniques.</p> </div> <figure class="ltx_figure" id="S2.F1"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="235" id="S2.F1.g1" src="extracted/6289317/Figure/Overview.png" width="509"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 1: </span>Overview of the proposed RSR (Real-to-Sim-to-Real) Loop Framework (marked in red), consisting of two key feedback loops. The “Sim-Env Parameters Tuning Loop” (marked in green) adjusts the simulator parameters by utilizing data from the real robot. This loop iterates (indexed by <math alttext="i" class="ltx_Math" display="inline" id="S2.F1.3.m1.1"><semantics id="S2.F1.3.m1.1b"><mi id="S2.F1.3.m1.1.1" xref="S2.F1.3.m1.1.1.cmml">i</mi><annotation-xml encoding="MathML-Content" id="S2.F1.3.m1.1c"><ci id="S2.F1.3.m1.1.1.cmml" xref="S2.F1.3.m1.1.1">𝑖</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.F1.3.m1.1d">i</annotation><annotation encoding="application/x-llamapun" id="S2.F1.3.m1.1e">italic_i</annotation></semantics></math>) to fine-tune the simulator, reducing the sim-to-real gap. 
The “Policy Training Loop” (marked in blue) uses the tuned simulator of the current iteration <math alttext="k" class="ltx_Math" display="inline" id="S2.F1.4.m2.1"><semantics id="S2.F1.4.m2.1b"><mi id="S2.F1.4.m2.1.1" xref="S2.F1.4.m2.1.1.cmml">k</mi><annotation-xml encoding="MathML-Content" id="S2.F1.4.m2.1c"><ci id="S2.F1.4.m2.1.1.cmml" xref="S2.F1.4.m2.1.1">𝑘</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.F1.4.m2.1d">k</annotation><annotation encoding="application/x-llamapun" id="S2.F1.4.m2.1e">italic_k</annotation></semantics></math> and the adaptive InfoGap loss to further train a policy for the next iteration. Together, these loops facilitate continuous improvement of both the policy and the simulator to enhance real-world performance.</figcaption> </figure> </section> <section class="ltx_subsection" id="S2.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S2.SS3.4.1.1">II-C</span> </span><span class="ltx_text ltx_font_italic" id="S2.SS3.5.2">Data Collection Approaches</span> </h3> <div class="ltx_para" id="S2.SS3.p1"> <p class="ltx_p" id="S2.SS3.p1.11">Common methods for data collection <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib22" title="">22</a>]</cite> include random sampling, grid sampling, and trajectory-based sampling. (1) In random sampling <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib23" title="">23</a>]</cite>, data points are selected independently from the state space, often resulting in an uneven distribution that may not cover critical regions. 
Mathematically, if <math alttext="S" class="ltx_Math" display="inline" id="S2.SS3.p1.1.m1.1"><semantics id="S2.SS3.p1.1.m1.1a"><mi id="S2.SS3.p1.1.m1.1.1" xref="S2.SS3.p1.1.m1.1.1.cmml">S</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.1.m1.1b"><ci id="S2.SS3.p1.1.m1.1.1.cmml" xref="S2.SS3.p1.1.m1.1.1">𝑆</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.1.m1.1c">S</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.1.m1.1d">italic_S</annotation></semantics></math> represents the state space, random sampling involves selecting points <math alttext="\mathbf{s}_{i}\in S" class="ltx_Math" display="inline" id="S2.SS3.p1.2.m2.1"><semantics id="S2.SS3.p1.2.m2.1a"><mrow id="S2.SS3.p1.2.m2.1.1" xref="S2.SS3.p1.2.m2.1.1.cmml"><msub id="S2.SS3.p1.2.m2.1.1.2" xref="S2.SS3.p1.2.m2.1.1.2.cmml"><mi id="S2.SS3.p1.2.m2.1.1.2.2" xref="S2.SS3.p1.2.m2.1.1.2.2.cmml">𝐬</mi><mi id="S2.SS3.p1.2.m2.1.1.2.3" xref="S2.SS3.p1.2.m2.1.1.2.3.cmml">i</mi></msub><mo id="S2.SS3.p1.2.m2.1.1.1" xref="S2.SS3.p1.2.m2.1.1.1.cmml">∈</mo><mi id="S2.SS3.p1.2.m2.1.1.3" xref="S2.SS3.p1.2.m2.1.1.3.cmml">S</mi></mrow><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.2.m2.1b"><apply id="S2.SS3.p1.2.m2.1.1.cmml" xref="S2.SS3.p1.2.m2.1.1"><in id="S2.SS3.p1.2.m2.1.1.1.cmml" xref="S2.SS3.p1.2.m2.1.1.1"></in><apply id="S2.SS3.p1.2.m2.1.1.2.cmml" xref="S2.SS3.p1.2.m2.1.1.2"><csymbol cd="ambiguous" id="S2.SS3.p1.2.m2.1.1.2.1.cmml" xref="S2.SS3.p1.2.m2.1.1.2">subscript</csymbol><ci id="S2.SS3.p1.2.m2.1.1.2.2.cmml" xref="S2.SS3.p1.2.m2.1.1.2.2">𝐬</ci><ci id="S2.SS3.p1.2.m2.1.1.2.3.cmml" xref="S2.SS3.p1.2.m2.1.1.2.3">𝑖</ci></apply><ci id="S2.SS3.p1.2.m2.1.1.3.cmml" xref="S2.SS3.p1.2.m2.1.1.3">𝑆</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.2.m2.1c">\mathbf{s}_{i}\in S</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.2.m2.1d">bold_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ 
italic_S</annotation></semantics></math> with uniform probability, which can lead to underrepresentation of important regions <math alttext="S_{\text{critical}}\subset S" class="ltx_Math" display="inline" id="S2.SS3.p1.3.m3.1"><semantics id="S2.SS3.p1.3.m3.1a"><mrow id="S2.SS3.p1.3.m3.1.1" xref="S2.SS3.p1.3.m3.1.1.cmml"><msub id="S2.SS3.p1.3.m3.1.1.2" xref="S2.SS3.p1.3.m3.1.1.2.cmml"><mi id="S2.SS3.p1.3.m3.1.1.2.2" xref="S2.SS3.p1.3.m3.1.1.2.2.cmml">S</mi><mtext id="S2.SS3.p1.3.m3.1.1.2.3" xref="S2.SS3.p1.3.m3.1.1.2.3a.cmml">critical</mtext></msub><mo id="S2.SS3.p1.3.m3.1.1.1" xref="S2.SS3.p1.3.m3.1.1.1.cmml">⊂</mo><mi id="S2.SS3.p1.3.m3.1.1.3" xref="S2.SS3.p1.3.m3.1.1.3.cmml">S</mi></mrow><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.3.m3.1b"><apply id="S2.SS3.p1.3.m3.1.1.cmml" xref="S2.SS3.p1.3.m3.1.1"><subset id="S2.SS3.p1.3.m3.1.1.1.cmml" xref="S2.SS3.p1.3.m3.1.1.1"></subset><apply id="S2.SS3.p1.3.m3.1.1.2.cmml" xref="S2.SS3.p1.3.m3.1.1.2"><csymbol cd="ambiguous" id="S2.SS3.p1.3.m3.1.1.2.1.cmml" xref="S2.SS3.p1.3.m3.1.1.2">subscript</csymbol><ci id="S2.SS3.p1.3.m3.1.1.2.2.cmml" xref="S2.SS3.p1.3.m3.1.1.2.2">𝑆</ci><ci id="S2.SS3.p1.3.m3.1.1.2.3a.cmml" xref="S2.SS3.p1.3.m3.1.1.2.3"><mtext id="S2.SS3.p1.3.m3.1.1.2.3.cmml" mathsize="70%" xref="S2.SS3.p1.3.m3.1.1.2.3">critical</mtext></ci></apply><ci id="S2.SS3.p1.3.m3.1.1.3.cmml" xref="S2.SS3.p1.3.m3.1.1.3">𝑆</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.3.m3.1c">S_{\text{critical}}\subset S</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.3.m3.1d">italic_S start_POSTSUBSCRIPT critical end_POSTSUBSCRIPT ⊂ italic_S</annotation></semantics></math>. 
(2) Grid sampling <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib24" title="">24</a>]</cite>, where data points are chosen systematically over a predefined grid, ensures coverage of the state space but can suffer from high computational costs, particularly in high-dimensional environments. If the state space <math alttext="S" class="ltx_Math" display="inline" id="S2.SS3.p1.4.m4.1"><semantics id="S2.SS3.p1.4.m4.1a"><mi id="S2.SS3.p1.4.m4.1.1" xref="S2.SS3.p1.4.m4.1.1.cmml">S</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.4.m4.1b"><ci id="S2.SS3.p1.4.m4.1.1.cmml" xref="S2.SS3.p1.4.m4.1.1">𝑆</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.4.m4.1c">S</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.4.m4.1d">italic_S</annotation></semantics></math> is partitioned into <math alttext="N" class="ltx_Math" display="inline" id="S2.SS3.p1.5.m5.1"><semantics id="S2.SS3.p1.5.m5.1a"><mi id="S2.SS3.p1.5.m5.1.1" xref="S2.SS3.p1.5.m5.1.1.cmml">N</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.5.m5.1b"><ci id="S2.SS3.p1.5.m5.1.1.cmml" xref="S2.SS3.p1.5.m5.1.1">𝑁</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.5.m5.1c">N</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.5.m5.1d">italic_N</annotation></semantics></math> discrete intervals along each dimension, the number of samples grows as <math alttext="O(N^{d})" class="ltx_Math" display="inline" id="S2.SS3.p1.6.m6.1"><semantics id="S2.SS3.p1.6.m6.1a"><mrow id="S2.SS3.p1.6.m6.1.1" xref="S2.SS3.p1.6.m6.1.1.cmml"><mi id="S2.SS3.p1.6.m6.1.1.3" xref="S2.SS3.p1.6.m6.1.1.3.cmml">O</mi><mo id="S2.SS3.p1.6.m6.1.1.2" xref="S2.SS3.p1.6.m6.1.1.2.cmml">⁢</mo><mrow id="S2.SS3.p1.6.m6.1.1.1.1" xref="S2.SS3.p1.6.m6.1.1.1.1.1.cmml"><mo id="S2.SS3.p1.6.m6.1.1.1.1.2" stretchy="false" xref="S2.SS3.p1.6.m6.1.1.1.1.1.cmml">(</mo><msup id="S2.SS3.p1.6.m6.1.1.1.1.1" 
xref="S2.SS3.p1.6.m6.1.1.1.1.1.cmml"><mi id="S2.SS3.p1.6.m6.1.1.1.1.1.2" xref="S2.SS3.p1.6.m6.1.1.1.1.1.2.cmml">N</mi><mi id="S2.SS3.p1.6.m6.1.1.1.1.1.3" xref="S2.SS3.p1.6.m6.1.1.1.1.1.3.cmml">d</mi></msup><mo id="S2.SS3.p1.6.m6.1.1.1.1.3" stretchy="false" xref="S2.SS3.p1.6.m6.1.1.1.1.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.6.m6.1b"><apply id="S2.SS3.p1.6.m6.1.1.cmml" xref="S2.SS3.p1.6.m6.1.1"><times id="S2.SS3.p1.6.m6.1.1.2.cmml" xref="S2.SS3.p1.6.m6.1.1.2"></times><ci id="S2.SS3.p1.6.m6.1.1.3.cmml" xref="S2.SS3.p1.6.m6.1.1.3">𝑂</ci><apply id="S2.SS3.p1.6.m6.1.1.1.1.1.cmml" xref="S2.SS3.p1.6.m6.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS3.p1.6.m6.1.1.1.1.1.1.cmml" xref="S2.SS3.p1.6.m6.1.1.1.1">superscript</csymbol><ci id="S2.SS3.p1.6.m6.1.1.1.1.1.2.cmml" xref="S2.SS3.p1.6.m6.1.1.1.1.1.2">𝑁</ci><ci id="S2.SS3.p1.6.m6.1.1.1.1.1.3.cmml" xref="S2.SS3.p1.6.m6.1.1.1.1.1.3">𝑑</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.6.m6.1c">O(N^{d})</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.6.m6.1d">italic_O ( italic_N start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT )</annotation></semantics></math> for <math alttext="d" class="ltx_Math" display="inline" id="S2.SS3.p1.7.m7.1"><semantics id="S2.SS3.p1.7.m7.1a"><mi id="S2.SS3.p1.7.m7.1.1" xref="S2.SS3.p1.7.m7.1.1.cmml">d</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.7.m7.1b"><ci id="S2.SS3.p1.7.m7.1.1.cmml" xref="S2.SS3.p1.7.m7.1.1">𝑑</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.7.m7.1c">d</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.7.m7.1d">italic_d</annotation></semantics></math>-dimensional spaces, making this approach inefficient in large-scale settings. (3) Trajectory-based sampling collects data along the robot’s trajectories, but it often focuses disproportionately on regions the current policy already explores. 
Let <math alttext="\mathcal{T}_{i}" class="ltx_Math" display="inline" id="S2.SS3.p1.8.m8.1"><semantics id="S2.SS3.p1.8.m8.1a"><msub id="S2.SS3.p1.8.m8.1.1" xref="S2.SS3.p1.8.m8.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S2.SS3.p1.8.m8.1.1.2" xref="S2.SS3.p1.8.m8.1.1.2.cmml">𝒯</mi><mi id="S2.SS3.p1.8.m8.1.1.3" xref="S2.SS3.p1.8.m8.1.1.3.cmml">i</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.8.m8.1b"><apply id="S2.SS3.p1.8.m8.1.1.cmml" xref="S2.SS3.p1.8.m8.1.1"><csymbol cd="ambiguous" id="S2.SS3.p1.8.m8.1.1.1.cmml" xref="S2.SS3.p1.8.m8.1.1">subscript</csymbol><ci id="S2.SS3.p1.8.m8.1.1.2.cmml" xref="S2.SS3.p1.8.m8.1.1.2">𝒯</ci><ci id="S2.SS3.p1.8.m8.1.1.3.cmml" xref="S2.SS3.p1.8.m8.1.1.3">𝑖</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.8.m8.1c">\mathcal{T}_{i}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.8.m8.1d">caligraphic_T start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT</annotation></semantics></math> denote a trajectory of the robot. The data sampled along <math alttext="\mathcal{T}_{i}" class="ltx_Math" display="inline" id="S2.SS3.p1.9.m9.1"><semantics id="S2.SS3.p1.9.m9.1a"><msub id="S2.SS3.p1.9.m9.1.1" xref="S2.SS3.p1.9.m9.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S2.SS3.p1.9.m9.1.1.2" xref="S2.SS3.p1.9.m9.1.1.2.cmml">𝒯</mi><mi id="S2.SS3.p1.9.m9.1.1.3" xref="S2.SS3.p1.9.m9.1.1.3.cmml">i</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.9.m9.1b"><apply id="S2.SS3.p1.9.m9.1.1.cmml" xref="S2.SS3.p1.9.m9.1.1"><csymbol cd="ambiguous" id="S2.SS3.p1.9.m9.1.1.1.cmml" xref="S2.SS3.p1.9.m9.1.1">subscript</csymbol><ci id="S2.SS3.p1.9.m9.1.1.2.cmml" xref="S2.SS3.p1.9.m9.1.1.2">𝒯</ci><ci id="S2.SS3.p1.9.m9.1.1.3.cmml" xref="S2.SS3.p1.9.m9.1.1.3">𝑖</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.9.m9.1c">\mathcal{T}_{i}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.9.m9.1d">caligraphic_T 
start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT</annotation></semantics></math> may over-represent regions <math alttext="\mathcal{T}_{i}\subset S" class="ltx_Math" display="inline" id="S2.SS3.p1.10.m10.1"><semantics id="S2.SS3.p1.10.m10.1a"><mrow id="S2.SS3.p1.10.m10.1.1" xref="S2.SS3.p1.10.m10.1.1.cmml"><msub id="S2.SS3.p1.10.m10.1.1.2" xref="S2.SS3.p1.10.m10.1.1.2.cmml"><mi class="ltx_font_mathcaligraphic" id="S2.SS3.p1.10.m10.1.1.2.2" xref="S2.SS3.p1.10.m10.1.1.2.2.cmml">𝒯</mi><mi id="S2.SS3.p1.10.m10.1.1.2.3" xref="S2.SS3.p1.10.m10.1.1.2.3.cmml">i</mi></msub><mo id="S2.SS3.p1.10.m10.1.1.1" xref="S2.SS3.p1.10.m10.1.1.1.cmml">⊂</mo><mi id="S2.SS3.p1.10.m10.1.1.3" xref="S2.SS3.p1.10.m10.1.1.3.cmml">S</mi></mrow><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.10.m10.1b"><apply id="S2.SS3.p1.10.m10.1.1.cmml" xref="S2.SS3.p1.10.m10.1.1"><subset id="S2.SS3.p1.10.m10.1.1.1.cmml" xref="S2.SS3.p1.10.m10.1.1.1"></subset><apply id="S2.SS3.p1.10.m10.1.1.2.cmml" xref="S2.SS3.p1.10.m10.1.1.2"><csymbol cd="ambiguous" id="S2.SS3.p1.10.m10.1.1.2.1.cmml" xref="S2.SS3.p1.10.m10.1.1.2">subscript</csymbol><ci id="S2.SS3.p1.10.m10.1.1.2.2.cmml" xref="S2.SS3.p1.10.m10.1.1.2.2">𝒯</ci><ci id="S2.SS3.p1.10.m10.1.1.2.3.cmml" xref="S2.SS3.p1.10.m10.1.1.2.3">𝑖</ci></apply><ci id="S2.SS3.p1.10.m10.1.1.3.cmml" xref="S2.SS3.p1.10.m10.1.1.3">𝑆</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.10.m10.1c">\mathcal{T}_{i}\subset S</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.10.m10.1d">caligraphic_T start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ⊂ italic_S</annotation></semantics></math> that the policy already visits, neglecting unexplored regions <math alttext="S\setminus\mathcal{T}_{i}" class="ltx_Math" display="inline" id="S2.SS3.p1.11.m11.1"><semantics id="S2.SS3.p1.11.m11.1a"><mrow id="S2.SS3.p1.11.m11.1.1" xref="S2.SS3.p1.11.m11.1.1.cmml"><mi id="S2.SS3.p1.11.m11.1.1.2" xref="S2.SS3.p1.11.m11.1.1.2.cmml">S</mi><mo 
id="S2.SS3.p1.11.m11.1.1.1" xref="S2.SS3.p1.11.m11.1.1.1.cmml">∖</mo><msub id="S2.SS3.p1.11.m11.1.1.3" xref="S2.SS3.p1.11.m11.1.1.3.cmml"><mi class="ltx_font_mathcaligraphic" id="S2.SS3.p1.11.m11.1.1.3.2" xref="S2.SS3.p1.11.m11.1.1.3.2.cmml">𝒯</mi><mi id="S2.SS3.p1.11.m11.1.1.3.3" xref="S2.SS3.p1.11.m11.1.1.3.3.cmml">i</mi></msub></mrow><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.11.m11.1b"><apply id="S2.SS3.p1.11.m11.1.1.cmml" xref="S2.SS3.p1.11.m11.1.1"><setdiff id="S2.SS3.p1.11.m11.1.1.1.cmml" xref="S2.SS3.p1.11.m11.1.1.1"></setdiff><ci id="S2.SS3.p1.11.m11.1.1.2.cmml" xref="S2.SS3.p1.11.m11.1.1.2">𝑆</ci><apply id="S2.SS3.p1.11.m11.1.1.3.cmml" xref="S2.SS3.p1.11.m11.1.1.3"><csymbol cd="ambiguous" id="S2.SS3.p1.11.m11.1.1.3.1.cmml" xref="S2.SS3.p1.11.m11.1.1.3">subscript</csymbol><ci id="S2.SS3.p1.11.m11.1.1.3.2.cmml" xref="S2.SS3.p1.11.m11.1.1.3.2">𝒯</ci><ci id="S2.SS3.p1.11.m11.1.1.3.3.cmml" xref="S2.SS3.p1.11.m11.1.1.3.3">𝑖</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.11.m11.1c">S\setminus\mathcal{T}_{i}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.11.m11.1d">italic_S ∖ caligraphic_T start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT</annotation></semantics></math>.</p> </div> </section> <section class="ltx_subsection" id="S2.SS4"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S2.SS4.4.1.1">II-D</span> </span><span class="ltx_text ltx_font_italic" id="S2.SS4.5.2">Information Theory</span> </h3> <div class="ltx_para" id="S2.SS4.p1"> <p class="ltx_p" id="S2.SS4.p1.1">Information theory provides fundamental tools for quantifying and comparing distributions of information <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib25" title="">25</a>]</cite>. 
If a particular data point or event significantly alters the distribution, that data point is considered “informative” <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib26" title="">26</a>]</cite>. Here we briefly introduce the basic information-theoretic concepts used in the proposed framework.</p> </div> <section class="ltx_subsubsection" id="S2.SS4.SSS1"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection"><span class="ltx_text" id="S2.SS4.SSS1.4.1.1">II-D</span>1 </span>Kernel Density Estimation (KDE)</h4> <div class="ltx_para" id="S2.SS4.SSS1.p1"> <p class="ltx_p" id="S2.SS4.SSS1.p1.1">Kernel Density Estimation <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib27" title="">27</a>]</cite> is a non-parametric way to estimate the probability density function of a random variable from a set of samples <math alttext="\{D\}" class="ltx_Math" display="inline" id="S2.SS4.SSS1.p1.1.m1.1"><semantics id="S2.SS4.SSS1.p1.1.m1.1a"><mrow id="S2.SS4.SSS1.p1.1.m1.1.2.2" xref="S2.SS4.SSS1.p1.1.m1.1.2.1.cmml"><mo id="S2.SS4.SSS1.p1.1.m1.1.2.2.1" stretchy="false" xref="S2.SS4.SSS1.p1.1.m1.1.2.1.cmml">{</mo><mi id="S2.SS4.SSS1.p1.1.m1.1.1" xref="S2.SS4.SSS1.p1.1.m1.1.1.cmml">D</mi><mo id="S2.SS4.SSS1.p1.1.m1.1.2.2.2" stretchy="false" xref="S2.SS4.SSS1.p1.1.m1.1.2.1.cmml">}</mo></mrow><annotation-xml encoding="MathML-Content" id="S2.SS4.SSS1.p1.1.m1.1b"><set id="S2.SS4.SSS1.p1.1.m1.1.2.1.cmml" xref="S2.SS4.SSS1.p1.1.m1.1.2.2"><ci id="S2.SS4.SSS1.p1.1.m1.1.1.cmml" xref="S2.SS4.SSS1.p1.1.m1.1.1">𝐷</ci></set></annotation-xml><annotation encoding="application/x-tex" id="S2.SS4.SSS1.p1.1.m1.1c">\{D\}</annotation><annotation encoding="application/x-llamapun" id="S2.SS4.SSS1.p1.1.m1.1d">{ italic_D }</annotation></semantics></math>. 
It smooths the distribution of data by placing a kernel (e.g., a Gaussian) on each data point and summing these kernels to approximate the overall distribution. Mathematically, KDE is expressed as:</p> <table class="ltx_equation ltx_eqn_table" id="S2.Ex1"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\hat{p}(\mathcal{D})=\frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{x-x_{i}}{h}\right)" class="ltx_Math" display="block" id="S2.Ex1.m1.2"><semantics id="S2.Ex1.m1.2a"><mrow id="S2.Ex1.m1.2.3" xref="S2.Ex1.m1.2.3.cmml"><mrow id="S2.Ex1.m1.2.3.2" xref="S2.Ex1.m1.2.3.2.cmml"><mover accent="true" id="S2.Ex1.m1.2.3.2.2" xref="S2.Ex1.m1.2.3.2.2.cmml"><mi id="S2.Ex1.m1.2.3.2.2.2" xref="S2.Ex1.m1.2.3.2.2.2.cmml">p</mi><mo id="S2.Ex1.m1.2.3.2.2.1" xref="S2.Ex1.m1.2.3.2.2.1.cmml">^</mo></mover><mo id="S2.Ex1.m1.2.3.2.1" xref="S2.Ex1.m1.2.3.2.1.cmml">⁢</mo><mrow id="S2.Ex1.m1.2.3.2.3.2" xref="S2.Ex1.m1.2.3.2.cmml"><mo id="S2.Ex1.m1.2.3.2.3.2.1" stretchy="false" xref="S2.Ex1.m1.2.3.2.cmml">(</mo><mi class="ltx_font_mathcaligraphic" id="S2.Ex1.m1.1.1" xref="S2.Ex1.m1.1.1.cmml">𝒟</mi><mo id="S2.Ex1.m1.2.3.2.3.2.2" stretchy="false" xref="S2.Ex1.m1.2.3.2.cmml">)</mo></mrow></mrow><mo id="S2.Ex1.m1.2.3.1" xref="S2.Ex1.m1.2.3.1.cmml">=</mo><mrow id="S2.Ex1.m1.2.3.3" xref="S2.Ex1.m1.2.3.3.cmml"><mfrac id="S2.Ex1.m1.2.3.3.2" xref="S2.Ex1.m1.2.3.3.2.cmml"><mn id="S2.Ex1.m1.2.3.3.2.2" xref="S2.Ex1.m1.2.3.3.2.2.cmml">1</mn><mrow id="S2.Ex1.m1.2.3.3.2.3" xref="S2.Ex1.m1.2.3.3.2.3.cmml"><mi id="S2.Ex1.m1.2.3.3.2.3.2" xref="S2.Ex1.m1.2.3.3.2.3.2.cmml">n</mi><mo id="S2.Ex1.m1.2.3.3.2.3.1" xref="S2.Ex1.m1.2.3.3.2.3.1.cmml">⁢</mo><mi id="S2.Ex1.m1.2.3.3.2.3.3" xref="S2.Ex1.m1.2.3.3.2.3.3.cmml">h</mi></mrow></mfrac><mo id="S2.Ex1.m1.2.3.3.1" xref="S2.Ex1.m1.2.3.3.1.cmml">⁢</mo><mrow id="S2.Ex1.m1.2.3.3.3" xref="S2.Ex1.m1.2.3.3.3.cmml"><munderover id="S2.Ex1.m1.2.3.3.3.1" 
xref="S2.Ex1.m1.2.3.3.3.1.cmml"><mo id="S2.Ex1.m1.2.3.3.3.1.2.2" movablelimits="false" xref="S2.Ex1.m1.2.3.3.3.1.2.2.cmml">∑</mo><mrow id="S2.Ex1.m1.2.3.3.3.1.2.3" xref="S2.Ex1.m1.2.3.3.3.1.2.3.cmml"><mi id="S2.Ex1.m1.2.3.3.3.1.2.3.2" xref="S2.Ex1.m1.2.3.3.3.1.2.3.2.cmml">i</mi><mo id="S2.Ex1.m1.2.3.3.3.1.2.3.1" xref="S2.Ex1.m1.2.3.3.3.1.2.3.1.cmml">=</mo><mn id="S2.Ex1.m1.2.3.3.3.1.2.3.3" xref="S2.Ex1.m1.2.3.3.3.1.2.3.3.cmml">1</mn></mrow><mi id="S2.Ex1.m1.2.3.3.3.1.3" xref="S2.Ex1.m1.2.3.3.3.1.3.cmml">n</mi></munderover><mrow id="S2.Ex1.m1.2.3.3.3.2" xref="S2.Ex1.m1.2.3.3.3.2.cmml"><mi id="S2.Ex1.m1.2.3.3.3.2.2" xref="S2.Ex1.m1.2.3.3.3.2.2.cmml">K</mi><mo id="S2.Ex1.m1.2.3.3.3.2.1" xref="S2.Ex1.m1.2.3.3.3.2.1.cmml">⁢</mo><mrow id="S2.Ex1.m1.2.3.3.3.2.3.2" xref="S2.Ex1.m1.2.2.cmml"><mo id="S2.Ex1.m1.2.3.3.3.2.3.2.1" xref="S2.Ex1.m1.2.2.cmml">(</mo><mfrac id="S2.Ex1.m1.2.2" xref="S2.Ex1.m1.2.2.cmml"><mrow id="S2.Ex1.m1.2.2.2" xref="S2.Ex1.m1.2.2.2.cmml"><mi id="S2.Ex1.m1.2.2.2.2" xref="S2.Ex1.m1.2.2.2.2.cmml">x</mi><mo id="S2.Ex1.m1.2.2.2.1" xref="S2.Ex1.m1.2.2.2.1.cmml">−</mo><msub id="S2.Ex1.m1.2.2.2.3" xref="S2.Ex1.m1.2.2.2.3.cmml"><mi id="S2.Ex1.m1.2.2.2.3.2" xref="S2.Ex1.m1.2.2.2.3.2.cmml">x</mi><mi id="S2.Ex1.m1.2.2.2.3.3" xref="S2.Ex1.m1.2.2.2.3.3.cmml">i</mi></msub></mrow><mi id="S2.Ex1.m1.2.2.3" xref="S2.Ex1.m1.2.2.3.cmml">h</mi></mfrac><mo id="S2.Ex1.m1.2.3.3.3.2.3.2.2" xref="S2.Ex1.m1.2.2.cmml">)</mo></mrow></mrow></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.Ex1.m1.2b"><apply id="S2.Ex1.m1.2.3.cmml" xref="S2.Ex1.m1.2.3"><eq id="S2.Ex1.m1.2.3.1.cmml" xref="S2.Ex1.m1.2.3.1"></eq><apply id="S2.Ex1.m1.2.3.2.cmml" xref="S2.Ex1.m1.2.3.2"><times id="S2.Ex1.m1.2.3.2.1.cmml" xref="S2.Ex1.m1.2.3.2.1"></times><apply id="S2.Ex1.m1.2.3.2.2.cmml" xref="S2.Ex1.m1.2.3.2.2"><ci id="S2.Ex1.m1.2.3.2.2.1.cmml" xref="S2.Ex1.m1.2.3.2.2.1">^</ci><ci id="S2.Ex1.m1.2.3.2.2.2.cmml" xref="S2.Ex1.m1.2.3.2.2.2">𝑝</ci></apply><ci id="S2.Ex1.m1.1.1.cmml" 
xref="S2.Ex1.m1.1.1">𝒟</ci></apply><apply id="S2.Ex1.m1.2.3.3.cmml" xref="S2.Ex1.m1.2.3.3"><times id="S2.Ex1.m1.2.3.3.1.cmml" xref="S2.Ex1.m1.2.3.3.1"></times><apply id="S2.Ex1.m1.2.3.3.2.cmml" xref="S2.Ex1.m1.2.3.3.2"><divide id="S2.Ex1.m1.2.3.3.2.1.cmml" xref="S2.Ex1.m1.2.3.3.2"></divide><cn id="S2.Ex1.m1.2.3.3.2.2.cmml" type="integer" xref="S2.Ex1.m1.2.3.3.2.2">1</cn><apply id="S2.Ex1.m1.2.3.3.2.3.cmml" xref="S2.Ex1.m1.2.3.3.2.3"><times id="S2.Ex1.m1.2.3.3.2.3.1.cmml" xref="S2.Ex1.m1.2.3.3.2.3.1"></times><ci id="S2.Ex1.m1.2.3.3.2.3.2.cmml" xref="S2.Ex1.m1.2.3.3.2.3.2">𝑛</ci><ci id="S2.Ex1.m1.2.3.3.2.3.3.cmml" xref="S2.Ex1.m1.2.3.3.2.3.3">ℎ</ci></apply></apply><apply id="S2.Ex1.m1.2.3.3.3.cmml" xref="S2.Ex1.m1.2.3.3.3"><apply id="S2.Ex1.m1.2.3.3.3.1.cmml" xref="S2.Ex1.m1.2.3.3.3.1"><csymbol cd="ambiguous" id="S2.Ex1.m1.2.3.3.3.1.1.cmml" xref="S2.Ex1.m1.2.3.3.3.1">superscript</csymbol><apply id="S2.Ex1.m1.2.3.3.3.1.2.cmml" xref="S2.Ex1.m1.2.3.3.3.1"><csymbol cd="ambiguous" id="S2.Ex1.m1.2.3.3.3.1.2.1.cmml" xref="S2.Ex1.m1.2.3.3.3.1">subscript</csymbol><sum id="S2.Ex1.m1.2.3.3.3.1.2.2.cmml" xref="S2.Ex1.m1.2.3.3.3.1.2.2"></sum><apply id="S2.Ex1.m1.2.3.3.3.1.2.3.cmml" xref="S2.Ex1.m1.2.3.3.3.1.2.3"><eq id="S2.Ex1.m1.2.3.3.3.1.2.3.1.cmml" xref="S2.Ex1.m1.2.3.3.3.1.2.3.1"></eq><ci id="S2.Ex1.m1.2.3.3.3.1.2.3.2.cmml" xref="S2.Ex1.m1.2.3.3.3.1.2.3.2">𝑖</ci><cn id="S2.Ex1.m1.2.3.3.3.1.2.3.3.cmml" type="integer" xref="S2.Ex1.m1.2.3.3.3.1.2.3.3">1</cn></apply></apply><ci id="S2.Ex1.m1.2.3.3.3.1.3.cmml" xref="S2.Ex1.m1.2.3.3.3.1.3">𝑛</ci></apply><apply id="S2.Ex1.m1.2.3.3.3.2.cmml" xref="S2.Ex1.m1.2.3.3.3.2"><times id="S2.Ex1.m1.2.3.3.3.2.1.cmml" xref="S2.Ex1.m1.2.3.3.3.2.1"></times><ci id="S2.Ex1.m1.2.3.3.3.2.2.cmml" xref="S2.Ex1.m1.2.3.3.3.2.2">𝐾</ci><apply id="S2.Ex1.m1.2.2.cmml" xref="S2.Ex1.m1.2.3.3.3.2.3.2"><divide id="S2.Ex1.m1.2.2.1.cmml" xref="S2.Ex1.m1.2.3.3.3.2.3.2"></divide><apply id="S2.Ex1.m1.2.2.2.cmml" xref="S2.Ex1.m1.2.2.2"><minus 
id="S2.Ex1.m1.2.2.2.1.cmml" xref="S2.Ex1.m1.2.2.2.1"></minus><ci id="S2.Ex1.m1.2.2.2.2.cmml" xref="S2.Ex1.m1.2.2.2.2">𝑥</ci><apply id="S2.Ex1.m1.2.2.2.3.cmml" xref="S2.Ex1.m1.2.2.2.3"><csymbol cd="ambiguous" id="S2.Ex1.m1.2.2.2.3.1.cmml" xref="S2.Ex1.m1.2.2.2.3">subscript</csymbol><ci id="S2.Ex1.m1.2.2.2.3.2.cmml" xref="S2.Ex1.m1.2.2.2.3.2">𝑥</ci><ci id="S2.Ex1.m1.2.2.2.3.3.cmml" xref="S2.Ex1.m1.2.2.2.3.3">𝑖</ci></apply></apply><ci id="S2.Ex1.m1.2.2.3.cmml" xref="S2.Ex1.m1.2.2.3">ℎ</ci></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.Ex1.m1.2c">\hat{p}(\mathcal{D})=\frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{x-x_{i}}{h}\right)</annotation><annotation encoding="application/x-llamapun" id="S2.Ex1.m1.2d">over^ start_ARG italic_p end_ARG ( caligraphic_D ) = divide start_ARG 1 end_ARG start_ARG italic_n italic_h end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_K ( divide start_ARG italic_x - italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG italic_h end_ARG )</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> </tr></tbody> </table> <p class="ltx_p" id="S2.SS4.SSS1.p1.4">where <math alttext="K" class="ltx_Math" display="inline" id="S2.SS4.SSS1.p1.2.m1.1"><semantics id="S2.SS4.SSS1.p1.2.m1.1a"><mi id="S2.SS4.SSS1.p1.2.m1.1.1" xref="S2.SS4.SSS1.p1.2.m1.1.1.cmml">K</mi><annotation-xml encoding="MathML-Content" id="S2.SS4.SSS1.p1.2.m1.1b"><ci id="S2.SS4.SSS1.p1.2.m1.1.1.cmml" xref="S2.SS4.SSS1.p1.2.m1.1.1">𝐾</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS4.SSS1.p1.2.m1.1c">K</annotation><annotation encoding="application/x-llamapun" id="S2.SS4.SSS1.p1.2.m1.1d">italic_K</annotation></semantics></math> is the kernel function, <math alttext="h" class="ltx_Math" display="inline" id="S2.SS4.SSS1.p1.3.m2.1"><semantics id="S2.SS4.SSS1.p1.3.m2.1a"><mi 
id="S2.SS4.SSS1.p1.3.m2.1.1" xref="S2.SS4.SSS1.p1.3.m2.1.1.cmml">h</mi><annotation-xml encoding="MathML-Content" id="S2.SS4.SSS1.p1.3.m2.1b"><ci id="S2.SS4.SSS1.p1.3.m2.1.1.cmml" xref="S2.SS4.SSS1.p1.3.m2.1.1">ℎ</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS4.SSS1.p1.3.m2.1c">h</annotation><annotation encoding="application/x-llamapun" id="S2.SS4.SSS1.p1.3.m2.1d">italic_h</annotation></semantics></math> is the bandwidth parameter that controls the smoothness, and <math alttext="x_{i}\in\{\mathcal{D}\}" class="ltx_Math" display="inline" id="S2.SS4.SSS1.p1.4.m3.1"><semantics id="S2.SS4.SSS1.p1.4.m3.1a"><mrow id="S2.SS4.SSS1.p1.4.m3.1.2" xref="S2.SS4.SSS1.p1.4.m3.1.2.cmml"><msub id="S2.SS4.SSS1.p1.4.m3.1.2.2" xref="S2.SS4.SSS1.p1.4.m3.1.2.2.cmml"><mi id="S2.SS4.SSS1.p1.4.m3.1.2.2.2" xref="S2.SS4.SSS1.p1.4.m3.1.2.2.2.cmml">x</mi><mi id="S2.SS4.SSS1.p1.4.m3.1.2.2.3" xref="S2.SS4.SSS1.p1.4.m3.1.2.2.3.cmml">i</mi></msub><mo id="S2.SS4.SSS1.p1.4.m3.1.2.1" xref="S2.SS4.SSS1.p1.4.m3.1.2.1.cmml">∈</mo><mrow id="S2.SS4.SSS1.p1.4.m3.1.2.3.2" xref="S2.SS4.SSS1.p1.4.m3.1.2.3.1.cmml"><mo id="S2.SS4.SSS1.p1.4.m3.1.2.3.2.1" stretchy="false" xref="S2.SS4.SSS1.p1.4.m3.1.2.3.1.cmml">{</mo><mi class="ltx_font_mathcaligraphic" id="S2.SS4.SSS1.p1.4.m3.1.1" xref="S2.SS4.SSS1.p1.4.m3.1.1.cmml">𝒟</mi><mo id="S2.SS4.SSS1.p1.4.m3.1.2.3.2.2" stretchy="false" xref="S2.SS4.SSS1.p1.4.m3.1.2.3.1.cmml">}</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS4.SSS1.p1.4.m3.1b"><apply id="S2.SS4.SSS1.p1.4.m3.1.2.cmml" xref="S2.SS4.SSS1.p1.4.m3.1.2"><in id="S2.SS4.SSS1.p1.4.m3.1.2.1.cmml" xref="S2.SS4.SSS1.p1.4.m3.1.2.1"></in><apply id="S2.SS4.SSS1.p1.4.m3.1.2.2.cmml" xref="S2.SS4.SSS1.p1.4.m3.1.2.2"><csymbol cd="ambiguous" id="S2.SS4.SSS1.p1.4.m3.1.2.2.1.cmml" xref="S2.SS4.SSS1.p1.4.m3.1.2.2">subscript</csymbol><ci id="S2.SS4.SSS1.p1.4.m3.1.2.2.2.cmml" xref="S2.SS4.SSS1.p1.4.m3.1.2.2.2">𝑥</ci><ci id="S2.SS4.SSS1.p1.4.m3.1.2.2.3.cmml" 
xref="S2.SS4.SSS1.p1.4.m3.1.2.2.3">𝑖</ci></apply><set id="S2.SS4.SSS1.p1.4.m3.1.2.3.1.cmml" xref="S2.SS4.SSS1.p1.4.m3.1.2.3.2"><ci id="S2.SS4.SSS1.p1.4.m3.1.1.cmml" xref="S2.SS4.SSS1.p1.4.m3.1.1">𝒟</ci></set></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS4.SSS1.p1.4.m3.1c">x_{i}\in\{\mathcal{D}\}</annotation><annotation encoding="application/x-llamapun" id="S2.SS4.SSS1.p1.4.m3.1d">italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ { caligraphic_D }</annotation></semantics></math> are the <span class="ltx_text ltx_font_italic">n</span> data points; <span class="ltx_text ltx_font_italic">x</span> denotes the point at which the density is evaluated.</p> </div> </section> <section class="ltx_subsubsection" id="S2.SS4.SSS2"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection"><span class="ltx_text" id="S2.SS4.SSS2.4.1.1">II-D</span>2 </span>Distribution Divergence Measure</h4> <div class="ltx_para" id="S2.SS4.SSS2.p1"> <p class="ltx_p" id="S2.SS4.SSS2.p1.2">The “Kullback-Leibler (KL) divergence” measures how one probability distribution diverges from a second, reference distribution. 
KL divergence is often used as a cost function to measure the dissimilarity between two distributions <math alttext="p" class="ltx_Math" display="inline" id="S2.SS4.SSS2.p1.1.m1.1"><semantics id="S2.SS4.SSS2.p1.1.m1.1a"><mi id="S2.SS4.SSS2.p1.1.m1.1.1" xref="S2.SS4.SSS2.p1.1.m1.1.1.cmml">p</mi><annotation-xml encoding="MathML-Content" id="S2.SS4.SSS2.p1.1.m1.1b"><ci id="S2.SS4.SSS2.p1.1.m1.1.1.cmml" xref="S2.SS4.SSS2.p1.1.m1.1.1">𝑝</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS4.SSS2.p1.1.m1.1c">p</annotation><annotation encoding="application/x-llamapun" id="S2.SS4.SSS2.p1.1.m1.1d">italic_p</annotation></semantics></math> and <math alttext="q" class="ltx_Math" display="inline" id="S2.SS4.SSS2.p1.2.m2.1"><semantics id="S2.SS4.SSS2.p1.2.m2.1a"><mi id="S2.SS4.SSS2.p1.2.m2.1.1" xref="S2.SS4.SSS2.p1.2.m2.1.1.cmml">q</mi><annotation-xml encoding="MathML-Content" id="S2.SS4.SSS2.p1.2.m2.1b"><ci id="S2.SS4.SSS2.p1.2.m2.1.1.cmml" xref="S2.SS4.SSS2.p1.2.m2.1.1">𝑞</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS4.SSS2.p1.2.m2.1c">q</annotation><annotation encoding="application/x-llamapun" id="S2.SS4.SSS2.p1.2.m2.1d">italic_q</annotation></semantics></math>:</p> <table class="ltx_equation ltx_eqn_table" id="S2.Ex2"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\text{KL}(p\parallel q)=\sum_{x}p(x)\log\left(\frac{p(x)}{q(x)}\right)" class="ltx_Math" display="block" id="S2.Ex2.m1.5"><semantics id="S2.Ex2.m1.5a"><mrow id="S2.Ex2.m1.5.5" xref="S2.Ex2.m1.5.5.cmml"><mrow id="S2.Ex2.m1.5.5.1" xref="S2.Ex2.m1.5.5.1.cmml"><mtext id="S2.Ex2.m1.5.5.1.3" xref="S2.Ex2.m1.5.5.1.3a.cmml">KL</mtext><mo id="S2.Ex2.m1.5.5.1.2" xref="S2.Ex2.m1.5.5.1.2.cmml">⁢</mo><mrow id="S2.Ex2.m1.5.5.1.1.1" xref="S2.Ex2.m1.5.5.1.1.1.1.cmml"><mo id="S2.Ex2.m1.5.5.1.1.1.2" stretchy="false" xref="S2.Ex2.m1.5.5.1.1.1.1.cmml">(</mo><mrow 
id="S2.Ex2.m1.5.5.1.1.1.1" xref="S2.Ex2.m1.5.5.1.1.1.1.cmml"><mi id="S2.Ex2.m1.5.5.1.1.1.1.2" xref="S2.Ex2.m1.5.5.1.1.1.1.2.cmml">p</mi><mo id="S2.Ex2.m1.5.5.1.1.1.1.1" xref="S2.Ex2.m1.5.5.1.1.1.1.1.cmml">∥</mo><mi id="S2.Ex2.m1.5.5.1.1.1.1.3" xref="S2.Ex2.m1.5.5.1.1.1.1.3.cmml">q</mi></mrow><mo id="S2.Ex2.m1.5.5.1.1.1.3" stretchy="false" xref="S2.Ex2.m1.5.5.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S2.Ex2.m1.5.5.2" rspace="0.111em" xref="S2.Ex2.m1.5.5.2.cmml">=</mo><mrow id="S2.Ex2.m1.5.5.3" xref="S2.Ex2.m1.5.5.3.cmml"><munder id="S2.Ex2.m1.5.5.3.1" xref="S2.Ex2.m1.5.5.3.1.cmml"><mo id="S2.Ex2.m1.5.5.3.1.2" movablelimits="false" xref="S2.Ex2.m1.5.5.3.1.2.cmml">∑</mo><mi id="S2.Ex2.m1.5.5.3.1.3" xref="S2.Ex2.m1.5.5.3.1.3.cmml">x</mi></munder><mrow id="S2.Ex2.m1.5.5.3.2" xref="S2.Ex2.m1.5.5.3.2.cmml"><mi id="S2.Ex2.m1.5.5.3.2.2" xref="S2.Ex2.m1.5.5.3.2.2.cmml">p</mi><mo id="S2.Ex2.m1.5.5.3.2.1" xref="S2.Ex2.m1.5.5.3.2.1.cmml">⁢</mo><mrow id="S2.Ex2.m1.5.5.3.2.3.2" xref="S2.Ex2.m1.5.5.3.2.cmml"><mo id="S2.Ex2.m1.5.5.3.2.3.2.1" stretchy="false" xref="S2.Ex2.m1.5.5.3.2.cmml">(</mo><mi id="S2.Ex2.m1.3.3" xref="S2.Ex2.m1.3.3.cmml">x</mi><mo id="S2.Ex2.m1.5.5.3.2.3.2.2" stretchy="false" xref="S2.Ex2.m1.5.5.3.2.cmml">)</mo></mrow><mo id="S2.Ex2.m1.5.5.3.2.1a" lspace="0.167em" xref="S2.Ex2.m1.5.5.3.2.1.cmml">⁢</mo><mrow id="S2.Ex2.m1.5.5.3.2.4.2" xref="S2.Ex2.m1.5.5.3.2.4.1.cmml"><mi id="S2.Ex2.m1.4.4" xref="S2.Ex2.m1.4.4.cmml">log</mi><mo id="S2.Ex2.m1.5.5.3.2.4.2a" xref="S2.Ex2.m1.5.5.3.2.4.1.cmml">⁡</mo><mrow id="S2.Ex2.m1.5.5.3.2.4.2.1" xref="S2.Ex2.m1.5.5.3.2.4.1.cmml"><mo id="S2.Ex2.m1.5.5.3.2.4.2.1.1" xref="S2.Ex2.m1.5.5.3.2.4.1.cmml">(</mo><mfrac id="S2.Ex2.m1.2.2" xref="S2.Ex2.m1.2.2.cmml"><mrow id="S2.Ex2.m1.1.1.1" xref="S2.Ex2.m1.1.1.1.cmml"><mi id="S2.Ex2.m1.1.1.1.3" xref="S2.Ex2.m1.1.1.1.3.cmml">p</mi><mo id="S2.Ex2.m1.1.1.1.2" xref="S2.Ex2.m1.1.1.1.2.cmml">⁢</mo><mrow id="S2.Ex2.m1.1.1.1.4.2" xref="S2.Ex2.m1.1.1.1.cmml"><mo id="S2.Ex2.m1.1.1.1.4.2.1" 
stretchy="false" xref="S2.Ex2.m1.1.1.1.cmml">(</mo><mi id="S2.Ex2.m1.1.1.1.1" xref="S2.Ex2.m1.1.1.1.1.cmml">x</mi><mo id="S2.Ex2.m1.1.1.1.4.2.2" stretchy="false" xref="S2.Ex2.m1.1.1.1.cmml">)</mo></mrow></mrow><mrow id="S2.Ex2.m1.2.2.2" xref="S2.Ex2.m1.2.2.2.cmml"><mi id="S2.Ex2.m1.2.2.2.3" xref="S2.Ex2.m1.2.2.2.3.cmml">q</mi><mo id="S2.Ex2.m1.2.2.2.2" xref="S2.Ex2.m1.2.2.2.2.cmml">⁢</mo><mrow id="S2.Ex2.m1.2.2.2.4.2" xref="S2.Ex2.m1.2.2.2.cmml"><mo id="S2.Ex2.m1.2.2.2.4.2.1" stretchy="false" xref="S2.Ex2.m1.2.2.2.cmml">(</mo><mi id="S2.Ex2.m1.2.2.2.1" xref="S2.Ex2.m1.2.2.2.1.cmml">x</mi><mo id="S2.Ex2.m1.2.2.2.4.2.2" stretchy="false" xref="S2.Ex2.m1.2.2.2.cmml">)</mo></mrow></mrow></mfrac><mo id="S2.Ex2.m1.5.5.3.2.4.2.1.2" xref="S2.Ex2.m1.5.5.3.2.4.1.cmml">)</mo></mrow></mrow></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.Ex2.m1.5b"><apply id="S2.Ex2.m1.5.5.cmml" xref="S2.Ex2.m1.5.5"><eq id="S2.Ex2.m1.5.5.2.cmml" xref="S2.Ex2.m1.5.5.2"></eq><apply id="S2.Ex2.m1.5.5.1.cmml" xref="S2.Ex2.m1.5.5.1"><times id="S2.Ex2.m1.5.5.1.2.cmml" xref="S2.Ex2.m1.5.5.1.2"></times><ci id="S2.Ex2.m1.5.5.1.3a.cmml" xref="S2.Ex2.m1.5.5.1.3"><mtext id="S2.Ex2.m1.5.5.1.3.cmml" xref="S2.Ex2.m1.5.5.1.3">KL</mtext></ci><apply id="S2.Ex2.m1.5.5.1.1.1.1.cmml" xref="S2.Ex2.m1.5.5.1.1.1"><csymbol cd="latexml" id="S2.Ex2.m1.5.5.1.1.1.1.1.cmml" xref="S2.Ex2.m1.5.5.1.1.1.1.1">conditional</csymbol><ci id="S2.Ex2.m1.5.5.1.1.1.1.2.cmml" xref="S2.Ex2.m1.5.5.1.1.1.1.2">𝑝</ci><ci id="S2.Ex2.m1.5.5.1.1.1.1.3.cmml" xref="S2.Ex2.m1.5.5.1.1.1.1.3">𝑞</ci></apply></apply><apply id="S2.Ex2.m1.5.5.3.cmml" xref="S2.Ex2.m1.5.5.3"><apply id="S2.Ex2.m1.5.5.3.1.cmml" xref="S2.Ex2.m1.5.5.3.1"><csymbol cd="ambiguous" id="S2.Ex2.m1.5.5.3.1.1.cmml" xref="S2.Ex2.m1.5.5.3.1">subscript</csymbol><sum id="S2.Ex2.m1.5.5.3.1.2.cmml" xref="S2.Ex2.m1.5.5.3.1.2"></sum><ci id="S2.Ex2.m1.5.5.3.1.3.cmml" xref="S2.Ex2.m1.5.5.3.1.3">𝑥</ci></apply><apply id="S2.Ex2.m1.5.5.3.2.cmml" xref="S2.Ex2.m1.5.5.3.2"><times 
id="S2.Ex2.m1.5.5.3.2.1.cmml" xref="S2.Ex2.m1.5.5.3.2.1"></times><ci id="S2.Ex2.m1.5.5.3.2.2.cmml" xref="S2.Ex2.m1.5.5.3.2.2">𝑝</ci><ci id="S2.Ex2.m1.3.3.cmml" xref="S2.Ex2.m1.3.3">𝑥</ci><apply id="S2.Ex2.m1.5.5.3.2.4.1.cmml" xref="S2.Ex2.m1.5.5.3.2.4.2"><log id="S2.Ex2.m1.4.4.cmml" xref="S2.Ex2.m1.4.4"></log><apply id="S2.Ex2.m1.2.2.cmml" xref="S2.Ex2.m1.2.2"><divide id="S2.Ex2.m1.2.2.3.cmml" xref="S2.Ex2.m1.2.2"></divide><apply id="S2.Ex2.m1.1.1.1.cmml" xref="S2.Ex2.m1.1.1.1"><times id="S2.Ex2.m1.1.1.1.2.cmml" xref="S2.Ex2.m1.1.1.1.2"></times><ci id="S2.Ex2.m1.1.1.1.3.cmml" xref="S2.Ex2.m1.1.1.1.3">𝑝</ci><ci id="S2.Ex2.m1.1.1.1.1.cmml" xref="S2.Ex2.m1.1.1.1.1">𝑥</ci></apply><apply id="S2.Ex2.m1.2.2.2.cmml" xref="S2.Ex2.m1.2.2.2"><times id="S2.Ex2.m1.2.2.2.2.cmml" xref="S2.Ex2.m1.2.2.2.2"></times><ci id="S2.Ex2.m1.2.2.2.3.cmml" xref="S2.Ex2.m1.2.2.2.3">𝑞</ci><ci id="S2.Ex2.m1.2.2.2.1.cmml" xref="S2.Ex2.m1.2.2.2.1">𝑥</ci></apply></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.Ex2.m1.5c">\text{KL}(p\parallel q)=\sum_{x}p(x)\log\left(\frac{p(x)}{q(x)}\right)</annotation><annotation encoding="application/x-llamapun" id="S2.Ex2.m1.5d">KL ( italic_p ∥ italic_q ) = ∑ start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT italic_p ( italic_x ) roman_log ( divide start_ARG italic_p ( italic_x ) end_ARG start_ARG italic_q ( italic_x ) end_ARG )</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> </tr></tbody> </table> <p class="ltx_p" id="S2.SS4.SSS2.p1.6">The “Wasserstein distance”, also known as “Earth Mover’s Distance (EMD)”, is another metric used to measure the difference between two probability distributions. It is particularly useful in situations when the geometry of the space is important. 
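</p>
<p class="ltx_p">As a rough illustration of the quantities introduced in this subsection and the previous one, the following minimal sketch (not the authors' implementation) evaluates the KDE formula with a Gaussian kernel, discretizes the resulting densities on an evaluation grid to compute the KL divergence, and computes an order-1 Wasserstein distance. It assumes one-dimensional samples and equally sized sample sets; the function names, the toy data, the grid, and the bandwidth value are all hypothetical choices made for this example.</p>
<pre class="ltx_verbatim ltx_font_typewriter">
```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, h):
    """Evaluate p_hat(x) = 1/(n h) * sum_i K((x - x_i) / h)."""
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)


def normalize(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """Discrete KL(p || q) = sum_x p(x) log(p(x) / q(x)).

    Assumes q(x) is positive wherever p(x) is positive.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:  # terms with p(x) = 0 contribute nothing
            total += pi * math.log(pi / qi)
    return total


def wasserstein_1(xs, ys):
    """Order-1 Wasserstein distance between two equal-size 1-D sample sets.

    In one dimension the optimal transport plan matches sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many samples")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical 1-D samples standing in for two data sets.
data_a = [0.0, 0.2, 0.4, 0.6, 0.8]
data_b = [1.0, 1.2, 1.4, 1.6, 1.8]

# Evaluation grid on [-1, 3] with spacing 0.1 (an arbitrary choice).
grid = [i * 0.1 - 1.0 for i in range(41)]
p = normalize([kde(x, data_a, h=0.3) for x in grid])
q = normalize([kde(x, data_b, h=0.3) for x in grid])

kl_pq = kl_divergence(p, q)  # positive, since p and q differ
kl_pp = kl_divergence(p, p)  # zero for identical distributions
w1 = wasserstein_1(data_a, data_b)
print(kl_pq, kl_pp, w1)
```
</pre>
<p class="ltx_p">In this toy setting the second sample set is the first one shifted by 1.0, and the order-1 Wasserstein value recovers that shift directly from the samples, which is one way to see why it is useful when the geometry of the space matters. The KL value, by contrast, depends on the chosen bandwidth and grid.</p>
<p class="ltx_p">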
The Wasserstein distance <math alttext="W_{\beta}(p,q)" class="ltx_Math" display="inline" id="S2.SS4.SSS2.p1.3.m1.2"><semantics id="S2.SS4.SSS2.p1.3.m1.2a"><mrow id="S2.SS4.SSS2.p1.3.m1.2.3" xref="S2.SS4.SSS2.p1.3.m1.2.3.cmml"><msub id="S2.SS4.SSS2.p1.3.m1.2.3.2" xref="S2.SS4.SSS2.p1.3.m1.2.3.2.cmml"><mi id="S2.SS4.SSS2.p1.3.m1.2.3.2.2" xref="S2.SS4.SSS2.p1.3.m1.2.3.2.2.cmml">W</mi><mi id="S2.SS4.SSS2.p1.3.m1.2.3.2.3" xref="S2.SS4.SSS2.p1.3.m1.2.3.2.3.cmml">β</mi></msub><mo id="S2.SS4.SSS2.p1.3.m1.2.3.1" xref="S2.SS4.SSS2.p1.3.m1.2.3.1.cmml">⁢</mo><mrow id="S2.SS4.SSS2.p1.3.m1.2.3.3.2" xref="S2.SS4.SSS2.p1.3.m1.2.3.3.1.cmml"><mo id="S2.SS4.SSS2.p1.3.m1.2.3.3.2.1" stretchy="false" xref="S2.SS4.SSS2.p1.3.m1.2.3.3.1.cmml">(</mo><mi id="S2.SS4.SSS2.p1.3.m1.1.1" xref="S2.SS4.SSS2.p1.3.m1.1.1.cmml">p</mi><mo id="S2.SS4.SSS2.p1.3.m1.2.3.3.2.2" xref="S2.SS4.SSS2.p1.3.m1.2.3.3.1.cmml">,</mo><mi id="S2.SS4.SSS2.p1.3.m1.2.2" xref="S2.SS4.SSS2.p1.3.m1.2.2.cmml">q</mi><mo id="S2.SS4.SSS2.p1.3.m1.2.3.3.2.3" stretchy="false" xref="S2.SS4.SSS2.p1.3.m1.2.3.3.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS4.SSS2.p1.3.m1.2b"><apply id="S2.SS4.SSS2.p1.3.m1.2.3.cmml" xref="S2.SS4.SSS2.p1.3.m1.2.3"><times id="S2.SS4.SSS2.p1.3.m1.2.3.1.cmml" xref="S2.SS4.SSS2.p1.3.m1.2.3.1"></times><apply id="S2.SS4.SSS2.p1.3.m1.2.3.2.cmml" xref="S2.SS4.SSS2.p1.3.m1.2.3.2"><csymbol cd="ambiguous" id="S2.SS4.SSS2.p1.3.m1.2.3.2.1.cmml" xref="S2.SS4.SSS2.p1.3.m1.2.3.2">subscript</csymbol><ci id="S2.SS4.SSS2.p1.3.m1.2.3.2.2.cmml" xref="S2.SS4.SSS2.p1.3.m1.2.3.2.2">𝑊</ci><ci id="S2.SS4.SSS2.p1.3.m1.2.3.2.3.cmml" xref="S2.SS4.SSS2.p1.3.m1.2.3.2.3">𝛽</ci></apply><interval closure="open" id="S2.SS4.SSS2.p1.3.m1.2.3.3.1.cmml" xref="S2.SS4.SSS2.p1.3.m1.2.3.3.2"><ci id="S2.SS4.SSS2.p1.3.m1.1.1.cmml" xref="S2.SS4.SSS2.p1.3.m1.1.1">𝑝</ci><ci id="S2.SS4.SSS2.p1.3.m1.2.2.cmml" xref="S2.SS4.SSS2.p1.3.m1.2.2">𝑞</ci></interval></apply></annotation-xml><annotation 
encoding="application/x-tex" id="S2.SS4.SSS2.p1.3.m1.2c">W_{\beta}(p,q)</annotation><annotation encoding="application/x-llamapun" id="S2.SS4.SSS2.p1.3.m1.2d">italic_W start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT ( italic_p , italic_q )</annotation></semantics></math> of order <math alttext="\beta" class="ltx_Math" display="inline" id="S2.SS4.SSS2.p1.4.m2.1"><semantics id="S2.SS4.SSS2.p1.4.m2.1a"><mi id="S2.SS4.SSS2.p1.4.m2.1.1" xref="S2.SS4.SSS2.p1.4.m2.1.1.cmml">β</mi><annotation-xml encoding="MathML-Content" id="S2.SS4.SSS2.p1.4.m2.1b"><ci id="S2.SS4.SSS2.p1.4.m2.1.1.cmml" xref="S2.SS4.SSS2.p1.4.m2.1.1">𝛽</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS4.SSS2.p1.4.m2.1c">\beta</annotation><annotation encoding="application/x-llamapun" id="S2.SS4.SSS2.p1.4.m2.1d">italic_β</annotation></semantics></math> for two probability distributions <math alttext="p" class="ltx_Math" display="inline" id="S2.SS4.SSS2.p1.5.m3.1"><semantics id="S2.SS4.SSS2.p1.5.m3.1a"><mi id="S2.SS4.SSS2.p1.5.m3.1.1" xref="S2.SS4.SSS2.p1.5.m3.1.1.cmml">p</mi><annotation-xml encoding="MathML-Content" id="S2.SS4.SSS2.p1.5.m3.1b"><ci id="S2.SS4.SSS2.p1.5.m3.1.1.cmml" xref="S2.SS4.SSS2.p1.5.m3.1.1">𝑝</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS4.SSS2.p1.5.m3.1c">p</annotation><annotation encoding="application/x-llamapun" id="S2.SS4.SSS2.p1.5.m3.1d">italic_p</annotation></semantics></math> and <math alttext="q" class="ltx_Math" display="inline" id="S2.SS4.SSS2.p1.6.m4.1"><semantics id="S2.SS4.SSS2.p1.6.m4.1a"><mi id="S2.SS4.SSS2.p1.6.m4.1.1" xref="S2.SS4.SSS2.p1.6.m4.1.1.cmml">q</mi><annotation-xml encoding="MathML-Content" id="S2.SS4.SSS2.p1.6.m4.1b"><ci id="S2.SS4.SSS2.p1.6.m4.1.1.cmml" xref="S2.SS4.SSS2.p1.6.m4.1.1">𝑞</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS4.SSS2.p1.6.m4.1c">q</annotation><annotation encoding="application/x-llamapun" id="S2.SS4.SSS2.p1.6.m4.1d">italic_q</annotation></semantics></math> can be 
intuitively thought of as the minimum cost of transforming one distribution into another, where the cost is determined by the amount of mass moved and the distance it is moved.</p> </div> </section> </section> </section> <section class="ltx_section" id="S3"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">III </span><span class="ltx_text ltx_font_smallcaps" id="S3.1.1">Method</span> </h2> <section class="ltx_subsection" id="S3.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S3.SS1.4.1.1">III-A</span> </span><span class="ltx_text ltx_font_italic" id="S3.SS1.5.2">System Overview</span> </h3> <div class="ltx_para" id="S3.SS1.p1"> <p class="ltx_p" id="S3.SS1.p1.1">In this section, we introduce the proposed framework, shown in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S2.F1" title="Figure 1 ‣ II-B Differentiable Simulator ‣ II Preliminary ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_tag">1</span></a>. Real-world data collected from the robot are used to tune the simulator and to construct the cost function for policy training; the trained policy is then deployed on the real robot to collect new data for the next iteration. This iterative process is referred to as the “Real-Sim-Real (RSR) Loop Framework”, which aims to improve sim-to-real transfer. Through continuous cycles of environment tuning and policy training, the robot’s performance is progressively improved within a limited number of iterations, resulting in a more robust system. The following subsections detail each key component of the proposed framework. 
The algorithm is given in Appendix <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S8.SS1" title="VIII-A Algorithm ‣ VIII Appendix ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">VIII-A</span></span></a>.</p> </div> </section> <section class="ltx_subsection" id="S3.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S3.SS2.4.1.1">III-B</span> </span><span class="ltx_text ltx_font_italic" id="S3.SS2.5.2">Simulation-Env Parameter Tuning Process</span> </h3> <div class="ltx_para" id="S3.SS2.p1"> <p class="ltx_p" id="S3.SS2.p1.1">To align the simulation environment with real-world dynamics, we iteratively optimize the simulation parameters <math alttext="\theta" class="ltx_Math" display="inline" id="S3.SS2.p1.1.m1.1"><semantics id="S3.SS2.p1.1.m1.1a"><mi id="S3.SS2.p1.1.m1.1.1" xref="S3.SS2.p1.1.m1.1.1.cmml">θ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.p1.1.m1.1b"><ci id="S3.SS2.p1.1.m1.1.1.cmml" xref="S3.SS2.p1.1.m1.1.1">𝜃</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p1.1.m1.1c">\theta</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p1.1.m1.1d">italic_θ</annotation></semantics></math> using data collected from real-world experiments.</p> <table class="ltx_equation ltx_eqn_table" id="S3.E2"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\theta=arg\min\enspace\mathcal{L}_{physical}(\mathcal{D}_{real}-\mathcal{D}_{% sim}(\theta))" class="ltx_Math" display="block" id="S3.E2.m1.2"><semantics id="S3.E2.m1.2a"><mrow id="S3.E2.m1.2.2" xref="S3.E2.m1.2.2.cmml"><mi id="S3.E2.m1.2.2.3" xref="S3.E2.m1.2.2.3.cmml">θ</mi><mo id="S3.E2.m1.2.2.2" xref="S3.E2.m1.2.2.2.cmml">=</mo><mrow 
id="S3.E2.m1.2.2.1" xref="S3.E2.m1.2.2.1.cmml"><mi id="S3.E2.m1.2.2.1.3" xref="S3.E2.m1.2.2.1.3.cmml">a</mi><mo id="S3.E2.m1.2.2.1.2" xref="S3.E2.m1.2.2.1.2.cmml">⁢</mo><mi id="S3.E2.m1.2.2.1.4" xref="S3.E2.m1.2.2.1.4.cmml">r</mi><mo id="S3.E2.m1.2.2.1.2a" xref="S3.E2.m1.2.2.1.2.cmml">⁢</mo><mi id="S3.E2.m1.2.2.1.5" xref="S3.E2.m1.2.2.1.5.cmml">g</mi><mo id="S3.E2.m1.2.2.1.2b" lspace="0.167em" xref="S3.E2.m1.2.2.1.2.cmml">⁢</mo><mrow id="S3.E2.m1.2.2.1.6" xref="S3.E2.m1.2.2.1.6.cmml"><mi id="S3.E2.m1.2.2.1.6.1" xref="S3.E2.m1.2.2.1.6.1.cmml">min</mi><mo id="S3.E2.m1.2.2.1.6a" lspace="0.667em" xref="S3.E2.m1.2.2.1.6.cmml">⁡</mo><msub id="S3.E2.m1.2.2.1.6.2" xref="S3.E2.m1.2.2.1.6.2.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.E2.m1.2.2.1.6.2.2" xref="S3.E2.m1.2.2.1.6.2.2.cmml">ℒ</mi><mrow id="S3.E2.m1.2.2.1.6.2.3" xref="S3.E2.m1.2.2.1.6.2.3.cmml"><mi id="S3.E2.m1.2.2.1.6.2.3.2" xref="S3.E2.m1.2.2.1.6.2.3.2.cmml">p</mi><mo id="S3.E2.m1.2.2.1.6.2.3.1" xref="S3.E2.m1.2.2.1.6.2.3.1.cmml">⁢</mo><mi id="S3.E2.m1.2.2.1.6.2.3.3" xref="S3.E2.m1.2.2.1.6.2.3.3.cmml">h</mi><mo id="S3.E2.m1.2.2.1.6.2.3.1a" xref="S3.E2.m1.2.2.1.6.2.3.1.cmml">⁢</mo><mi id="S3.E2.m1.2.2.1.6.2.3.4" xref="S3.E2.m1.2.2.1.6.2.3.4.cmml">y</mi><mo id="S3.E2.m1.2.2.1.6.2.3.1b" xref="S3.E2.m1.2.2.1.6.2.3.1.cmml">⁢</mo><mi id="S3.E2.m1.2.2.1.6.2.3.5" xref="S3.E2.m1.2.2.1.6.2.3.5.cmml">s</mi><mo id="S3.E2.m1.2.2.1.6.2.3.1c" xref="S3.E2.m1.2.2.1.6.2.3.1.cmml">⁢</mo><mi id="S3.E2.m1.2.2.1.6.2.3.6" xref="S3.E2.m1.2.2.1.6.2.3.6.cmml">i</mi><mo id="S3.E2.m1.2.2.1.6.2.3.1d" xref="S3.E2.m1.2.2.1.6.2.3.1.cmml">⁢</mo><mi id="S3.E2.m1.2.2.1.6.2.3.7" xref="S3.E2.m1.2.2.1.6.2.3.7.cmml">c</mi><mo id="S3.E2.m1.2.2.1.6.2.3.1e" xref="S3.E2.m1.2.2.1.6.2.3.1.cmml">⁢</mo><mi id="S3.E2.m1.2.2.1.6.2.3.8" xref="S3.E2.m1.2.2.1.6.2.3.8.cmml">a</mi><mo id="S3.E2.m1.2.2.1.6.2.3.1f" xref="S3.E2.m1.2.2.1.6.2.3.1.cmml">⁢</mo><mi id="S3.E2.m1.2.2.1.6.2.3.9" xref="S3.E2.m1.2.2.1.6.2.3.9.cmml">l</mi></mrow></msub></mrow><mo 
id="S3.E2.m1.2.2.1.2c" xref="S3.E2.m1.2.2.1.2.cmml">⁢</mo><mrow id="S3.E2.m1.2.2.1.1.1" xref="S3.E2.m1.2.2.1.1.1.1.cmml"><mo id="S3.E2.m1.2.2.1.1.1.2" stretchy="false" xref="S3.E2.m1.2.2.1.1.1.1.cmml">(</mo><mrow id="S3.E2.m1.2.2.1.1.1.1" xref="S3.E2.m1.2.2.1.1.1.1.cmml"><msub id="S3.E2.m1.2.2.1.1.1.1.2" xref="S3.E2.m1.2.2.1.1.1.1.2.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.E2.m1.2.2.1.1.1.1.2.2" xref="S3.E2.m1.2.2.1.1.1.1.2.2.cmml">𝒟</mi><mrow id="S3.E2.m1.2.2.1.1.1.1.2.3" xref="S3.E2.m1.2.2.1.1.1.1.2.3.cmml"><mi id="S3.E2.m1.2.2.1.1.1.1.2.3.2" xref="S3.E2.m1.2.2.1.1.1.1.2.3.2.cmml">r</mi><mo id="S3.E2.m1.2.2.1.1.1.1.2.3.1" xref="S3.E2.m1.2.2.1.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.E2.m1.2.2.1.1.1.1.2.3.3" xref="S3.E2.m1.2.2.1.1.1.1.2.3.3.cmml">e</mi><mo id="S3.E2.m1.2.2.1.1.1.1.2.3.1a" xref="S3.E2.m1.2.2.1.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.E2.m1.2.2.1.1.1.1.2.3.4" xref="S3.E2.m1.2.2.1.1.1.1.2.3.4.cmml">a</mi><mo id="S3.E2.m1.2.2.1.1.1.1.2.3.1b" xref="S3.E2.m1.2.2.1.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.E2.m1.2.2.1.1.1.1.2.3.5" xref="S3.E2.m1.2.2.1.1.1.1.2.3.5.cmml">l</mi></mrow></msub><mo id="S3.E2.m1.2.2.1.1.1.1.1" xref="S3.E2.m1.2.2.1.1.1.1.1.cmml">−</mo><mrow id="S3.E2.m1.2.2.1.1.1.1.3" xref="S3.E2.m1.2.2.1.1.1.1.3.cmml"><msub id="S3.E2.m1.2.2.1.1.1.1.3.2" xref="S3.E2.m1.2.2.1.1.1.1.3.2.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.E2.m1.2.2.1.1.1.1.3.2.2" xref="S3.E2.m1.2.2.1.1.1.1.3.2.2.cmml">𝒟</mi><mrow id="S3.E2.m1.2.2.1.1.1.1.3.2.3" xref="S3.E2.m1.2.2.1.1.1.1.3.2.3.cmml"><mi id="S3.E2.m1.2.2.1.1.1.1.3.2.3.2" xref="S3.E2.m1.2.2.1.1.1.1.3.2.3.2.cmml">s</mi><mo id="S3.E2.m1.2.2.1.1.1.1.3.2.3.1" xref="S3.E2.m1.2.2.1.1.1.1.3.2.3.1.cmml">⁢</mo><mi id="S3.E2.m1.2.2.1.1.1.1.3.2.3.3" xref="S3.E2.m1.2.2.1.1.1.1.3.2.3.3.cmml">i</mi><mo id="S3.E2.m1.2.2.1.1.1.1.3.2.3.1a" xref="S3.E2.m1.2.2.1.1.1.1.3.2.3.1.cmml">⁢</mo><mi id="S3.E2.m1.2.2.1.1.1.1.3.2.3.4" xref="S3.E2.m1.2.2.1.1.1.1.3.2.3.4.cmml">m</mi></mrow></msub><mo id="S3.E2.m1.2.2.1.1.1.1.3.1" 
xref="S3.E2.m1.2.2.1.1.1.1.3.1.cmml">⁢</mo><mrow id="S3.E2.m1.2.2.1.1.1.1.3.3.2" xref="S3.E2.m1.2.2.1.1.1.1.3.cmml"><mo id="S3.E2.m1.2.2.1.1.1.1.3.3.2.1" stretchy="false" xref="S3.E2.m1.2.2.1.1.1.1.3.cmml">(</mo><mi id="S3.E2.m1.1.1" xref="S3.E2.m1.1.1.cmml">θ</mi><mo id="S3.E2.m1.2.2.1.1.1.1.3.3.2.2" stretchy="false" xref="S3.E2.m1.2.2.1.1.1.1.3.cmml">)</mo></mrow></mrow></mrow><mo id="S3.E2.m1.2.2.1.1.1.3" stretchy="false" xref="S3.E2.m1.2.2.1.1.1.1.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.E2.m1.2b"><apply id="S3.E2.m1.2.2.cmml" xref="S3.E2.m1.2.2"><eq id="S3.E2.m1.2.2.2.cmml" xref="S3.E2.m1.2.2.2"></eq><ci id="S3.E2.m1.2.2.3.cmml" xref="S3.E2.m1.2.2.3">𝜃</ci><apply id="S3.E2.m1.2.2.1.cmml" xref="S3.E2.m1.2.2.1"><times id="S3.E2.m1.2.2.1.2.cmml" xref="S3.E2.m1.2.2.1.2"></times><ci id="S3.E2.m1.2.2.1.3.cmml" xref="S3.E2.m1.2.2.1.3">𝑎</ci><ci id="S3.E2.m1.2.2.1.4.cmml" xref="S3.E2.m1.2.2.1.4">𝑟</ci><ci id="S3.E2.m1.2.2.1.5.cmml" xref="S3.E2.m1.2.2.1.5">𝑔</ci><apply id="S3.E2.m1.2.2.1.6.cmml" xref="S3.E2.m1.2.2.1.6"><min id="S3.E2.m1.2.2.1.6.1.cmml" xref="S3.E2.m1.2.2.1.6.1"></min><apply id="S3.E2.m1.2.2.1.6.2.cmml" xref="S3.E2.m1.2.2.1.6.2"><csymbol cd="ambiguous" id="S3.E2.m1.2.2.1.6.2.1.cmml" xref="S3.E2.m1.2.2.1.6.2">subscript</csymbol><ci id="S3.E2.m1.2.2.1.6.2.2.cmml" xref="S3.E2.m1.2.2.1.6.2.2">ℒ</ci><apply id="S3.E2.m1.2.2.1.6.2.3.cmml" xref="S3.E2.m1.2.2.1.6.2.3"><times id="S3.E2.m1.2.2.1.6.2.3.1.cmml" xref="S3.E2.m1.2.2.1.6.2.3.1"></times><ci id="S3.E2.m1.2.2.1.6.2.3.2.cmml" xref="S3.E2.m1.2.2.1.6.2.3.2">𝑝</ci><ci id="S3.E2.m1.2.2.1.6.2.3.3.cmml" xref="S3.E2.m1.2.2.1.6.2.3.3">ℎ</ci><ci id="S3.E2.m1.2.2.1.6.2.3.4.cmml" xref="S3.E2.m1.2.2.1.6.2.3.4">𝑦</ci><ci id="S3.E2.m1.2.2.1.6.2.3.5.cmml" xref="S3.E2.m1.2.2.1.6.2.3.5">𝑠</ci><ci id="S3.E2.m1.2.2.1.6.2.3.6.cmml" xref="S3.E2.m1.2.2.1.6.2.3.6">𝑖</ci><ci id="S3.E2.m1.2.2.1.6.2.3.7.cmml" xref="S3.E2.m1.2.2.1.6.2.3.7">𝑐</ci><ci id="S3.E2.m1.2.2.1.6.2.3.8.cmml" 
xref="S3.E2.m1.2.2.1.6.2.3.8">𝑎</ci><ci id="S3.E2.m1.2.2.1.6.2.3.9.cmml" xref="S3.E2.m1.2.2.1.6.2.3.9">𝑙</ci></apply></apply></apply><apply id="S3.E2.m1.2.2.1.1.1.1.cmml" xref="S3.E2.m1.2.2.1.1.1"><minus id="S3.E2.m1.2.2.1.1.1.1.1.cmml" xref="S3.E2.m1.2.2.1.1.1.1.1"></minus><apply id="S3.E2.m1.2.2.1.1.1.1.2.cmml" xref="S3.E2.m1.2.2.1.1.1.1.2"><csymbol cd="ambiguous" id="S3.E2.m1.2.2.1.1.1.1.2.1.cmml" xref="S3.E2.m1.2.2.1.1.1.1.2">subscript</csymbol><ci id="S3.E2.m1.2.2.1.1.1.1.2.2.cmml" xref="S3.E2.m1.2.2.1.1.1.1.2.2">𝒟</ci><apply id="S3.E2.m1.2.2.1.1.1.1.2.3.cmml" xref="S3.E2.m1.2.2.1.1.1.1.2.3"><times id="S3.E2.m1.2.2.1.1.1.1.2.3.1.cmml" xref="S3.E2.m1.2.2.1.1.1.1.2.3.1"></times><ci id="S3.E2.m1.2.2.1.1.1.1.2.3.2.cmml" xref="S3.E2.m1.2.2.1.1.1.1.2.3.2">𝑟</ci><ci id="S3.E2.m1.2.2.1.1.1.1.2.3.3.cmml" xref="S3.E2.m1.2.2.1.1.1.1.2.3.3">𝑒</ci><ci id="S3.E2.m1.2.2.1.1.1.1.2.3.4.cmml" xref="S3.E2.m1.2.2.1.1.1.1.2.3.4">𝑎</ci><ci id="S3.E2.m1.2.2.1.1.1.1.2.3.5.cmml" xref="S3.E2.m1.2.2.1.1.1.1.2.3.5">𝑙</ci></apply></apply><apply id="S3.E2.m1.2.2.1.1.1.1.3.cmml" xref="S3.E2.m1.2.2.1.1.1.1.3"><times id="S3.E2.m1.2.2.1.1.1.1.3.1.cmml" xref="S3.E2.m1.2.2.1.1.1.1.3.1"></times><apply id="S3.E2.m1.2.2.1.1.1.1.3.2.cmml" xref="S3.E2.m1.2.2.1.1.1.1.3.2"><csymbol cd="ambiguous" id="S3.E2.m1.2.2.1.1.1.1.3.2.1.cmml" xref="S3.E2.m1.2.2.1.1.1.1.3.2">subscript</csymbol><ci id="S3.E2.m1.2.2.1.1.1.1.3.2.2.cmml" xref="S3.E2.m1.2.2.1.1.1.1.3.2.2">𝒟</ci><apply id="S3.E2.m1.2.2.1.1.1.1.3.2.3.cmml" xref="S3.E2.m1.2.2.1.1.1.1.3.2.3"><times id="S3.E2.m1.2.2.1.1.1.1.3.2.3.1.cmml" xref="S3.E2.m1.2.2.1.1.1.1.3.2.3.1"></times><ci id="S3.E2.m1.2.2.1.1.1.1.3.2.3.2.cmml" xref="S3.E2.m1.2.2.1.1.1.1.3.2.3.2">𝑠</ci><ci id="S3.E2.m1.2.2.1.1.1.1.3.2.3.3.cmml" xref="S3.E2.m1.2.2.1.1.1.1.3.2.3.3">𝑖</ci><ci id="S3.E2.m1.2.2.1.1.1.1.3.2.3.4.cmml" xref="S3.E2.m1.2.2.1.1.1.1.3.2.3.4">𝑚</ci></apply></apply><ci id="S3.E2.m1.1.1.cmml" 
xref="S3.E2.m1.1.1">𝜃</ci></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E2.m1.2c">\theta=arg\min\enspace\mathcal{L}_{physical}(\mathcal{D}_{real}-\mathcal{D}_{% sim}(\theta))</annotation><annotation encoding="application/x-llamapun" id="S3.E2.m1.2d">italic_θ = italic_a italic_r italic_g roman_min caligraphic_L start_POSTSUBSCRIPT italic_p italic_h italic_y italic_s italic_i italic_c italic_a italic_l end_POSTSUBSCRIPT ( caligraphic_D start_POSTSUBSCRIPT italic_r italic_e italic_a italic_l end_POSTSUBSCRIPT - caligraphic_D start_POSTSUBSCRIPT italic_s italic_i italic_m end_POSTSUBSCRIPT ( italic_θ ) )</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(2)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS2.p1.6">The optimization process minimizes the physical loss <math alttext="\mathcal{L}_{physical}" class="ltx_Math" display="inline" id="S3.SS2.p1.2.m1.1"><semantics id="S3.SS2.p1.2.m1.1a"><msub id="S3.SS2.p1.2.m1.1.1" xref="S3.SS2.p1.2.m1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.SS2.p1.2.m1.1.1.2" xref="S3.SS2.p1.2.m1.1.1.2.cmml">ℒ</mi><mrow id="S3.SS2.p1.2.m1.1.1.3" xref="S3.SS2.p1.2.m1.1.1.3.cmml"><mi id="S3.SS2.p1.2.m1.1.1.3.2" xref="S3.SS2.p1.2.m1.1.1.3.2.cmml">p</mi><mo id="S3.SS2.p1.2.m1.1.1.3.1" xref="S3.SS2.p1.2.m1.1.1.3.1.cmml">⁢</mo><mi id="S3.SS2.p1.2.m1.1.1.3.3" xref="S3.SS2.p1.2.m1.1.1.3.3.cmml">h</mi><mo id="S3.SS2.p1.2.m1.1.1.3.1a" xref="S3.SS2.p1.2.m1.1.1.3.1.cmml">⁢</mo><mi id="S3.SS2.p1.2.m1.1.1.3.4" xref="S3.SS2.p1.2.m1.1.1.3.4.cmml">y</mi><mo id="S3.SS2.p1.2.m1.1.1.3.1b" xref="S3.SS2.p1.2.m1.1.1.3.1.cmml">⁢</mo><mi id="S3.SS2.p1.2.m1.1.1.3.5" xref="S3.SS2.p1.2.m1.1.1.3.5.cmml">s</mi><mo id="S3.SS2.p1.2.m1.1.1.3.1c" xref="S3.SS2.p1.2.m1.1.1.3.1.cmml">⁢</mo><mi id="S3.SS2.p1.2.m1.1.1.3.6" 
xref="S3.SS2.p1.2.m1.1.1.3.6.cmml">i</mi><mo id="S3.SS2.p1.2.m1.1.1.3.1d" xref="S3.SS2.p1.2.m1.1.1.3.1.cmml">⁢</mo><mi id="S3.SS2.p1.2.m1.1.1.3.7" xref="S3.SS2.p1.2.m1.1.1.3.7.cmml">c</mi><mo id="S3.SS2.p1.2.m1.1.1.3.1e" xref="S3.SS2.p1.2.m1.1.1.3.1.cmml">⁢</mo><mi id="S3.SS2.p1.2.m1.1.1.3.8" xref="S3.SS2.p1.2.m1.1.1.3.8.cmml">a</mi><mo id="S3.SS2.p1.2.m1.1.1.3.1f" xref="S3.SS2.p1.2.m1.1.1.3.1.cmml">⁢</mo><mi id="S3.SS2.p1.2.m1.1.1.3.9" xref="S3.SS2.p1.2.m1.1.1.3.9.cmml">l</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.p1.2.m1.1b"><apply id="S3.SS2.p1.2.m1.1.1.cmml" xref="S3.SS2.p1.2.m1.1.1"><csymbol cd="ambiguous" id="S3.SS2.p1.2.m1.1.1.1.cmml" xref="S3.SS2.p1.2.m1.1.1">subscript</csymbol><ci id="S3.SS2.p1.2.m1.1.1.2.cmml" xref="S3.SS2.p1.2.m1.1.1.2">ℒ</ci><apply id="S3.SS2.p1.2.m1.1.1.3.cmml" xref="S3.SS2.p1.2.m1.1.1.3"><times id="S3.SS2.p1.2.m1.1.1.3.1.cmml" xref="S3.SS2.p1.2.m1.1.1.3.1"></times><ci id="S3.SS2.p1.2.m1.1.1.3.2.cmml" xref="S3.SS2.p1.2.m1.1.1.3.2">𝑝</ci><ci id="S3.SS2.p1.2.m1.1.1.3.3.cmml" xref="S3.SS2.p1.2.m1.1.1.3.3">ℎ</ci><ci id="S3.SS2.p1.2.m1.1.1.3.4.cmml" xref="S3.SS2.p1.2.m1.1.1.3.4">𝑦</ci><ci id="S3.SS2.p1.2.m1.1.1.3.5.cmml" xref="S3.SS2.p1.2.m1.1.1.3.5">𝑠</ci><ci id="S3.SS2.p1.2.m1.1.1.3.6.cmml" xref="S3.SS2.p1.2.m1.1.1.3.6">𝑖</ci><ci id="S3.SS2.p1.2.m1.1.1.3.7.cmml" xref="S3.SS2.p1.2.m1.1.1.3.7">𝑐</ci><ci id="S3.SS2.p1.2.m1.1.1.3.8.cmml" xref="S3.SS2.p1.2.m1.1.1.3.8">𝑎</ci><ci id="S3.SS2.p1.2.m1.1.1.3.9.cmml" xref="S3.SS2.p1.2.m1.1.1.3.9">𝑙</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p1.2.m1.1c">\mathcal{L}_{physical}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p1.2.m1.1d">caligraphic_L start_POSTSUBSCRIPT italic_p italic_h italic_y italic_s italic_i italic_c italic_a italic_l end_POSTSUBSCRIPT</annotation></semantics></math>, which is computed as the discrepancy between the real-world measurements <math alttext="\mathcal{D}_{real}" 
class="ltx_Math" display="inline" id="S3.SS2.p1.3.m2.1"><semantics id="S3.SS2.p1.3.m2.1a"><msub id="S3.SS2.p1.3.m2.1.1" xref="S3.SS2.p1.3.m2.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.SS2.p1.3.m2.1.1.2" xref="S3.SS2.p1.3.m2.1.1.2.cmml">𝒟</mi><mrow id="S3.SS2.p1.3.m2.1.1.3" xref="S3.SS2.p1.3.m2.1.1.3.cmml"><mi id="S3.SS2.p1.3.m2.1.1.3.2" xref="S3.SS2.p1.3.m2.1.1.3.2.cmml">r</mi><mo id="S3.SS2.p1.3.m2.1.1.3.1" xref="S3.SS2.p1.3.m2.1.1.3.1.cmml">⁢</mo><mi id="S3.SS2.p1.3.m2.1.1.3.3" xref="S3.SS2.p1.3.m2.1.1.3.3.cmml">e</mi><mo id="S3.SS2.p1.3.m2.1.1.3.1a" xref="S3.SS2.p1.3.m2.1.1.3.1.cmml">⁢</mo><mi id="S3.SS2.p1.3.m2.1.1.3.4" xref="S3.SS2.p1.3.m2.1.1.3.4.cmml">a</mi><mo id="S3.SS2.p1.3.m2.1.1.3.1b" xref="S3.SS2.p1.3.m2.1.1.3.1.cmml">⁢</mo><mi id="S3.SS2.p1.3.m2.1.1.3.5" xref="S3.SS2.p1.3.m2.1.1.3.5.cmml">l</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.SS2.p1.3.m2.1b"><apply id="S3.SS2.p1.3.m2.1.1.cmml" xref="S3.SS2.p1.3.m2.1.1"><csymbol cd="ambiguous" id="S3.SS2.p1.3.m2.1.1.1.cmml" xref="S3.SS2.p1.3.m2.1.1">subscript</csymbol><ci id="S3.SS2.p1.3.m2.1.1.2.cmml" xref="S3.SS2.p1.3.m2.1.1.2">𝒟</ci><apply id="S3.SS2.p1.3.m2.1.1.3.cmml" xref="S3.SS2.p1.3.m2.1.1.3"><times id="S3.SS2.p1.3.m2.1.1.3.1.cmml" xref="S3.SS2.p1.3.m2.1.1.3.1"></times><ci id="S3.SS2.p1.3.m2.1.1.3.2.cmml" xref="S3.SS2.p1.3.m2.1.1.3.2">𝑟</ci><ci id="S3.SS2.p1.3.m2.1.1.3.3.cmml" xref="S3.SS2.p1.3.m2.1.1.3.3">𝑒</ci><ci id="S3.SS2.p1.3.m2.1.1.3.4.cmml" xref="S3.SS2.p1.3.m2.1.1.3.4">𝑎</ci><ci id="S3.SS2.p1.3.m2.1.1.3.5.cmml" xref="S3.SS2.p1.3.m2.1.1.3.5">𝑙</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p1.3.m2.1c">\mathcal{D}_{real}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p1.3.m2.1d">caligraphic_D start_POSTSUBSCRIPT italic_r italic_e italic_a italic_l end_POSTSUBSCRIPT</annotation></semantics></math> and their simulated counterparts <math alttext="\mathcal{D}_{sim}(\theta)" class="ltx_Math" 
display="inline" id="S3.SS2.p1.4.m3.1"><semantics id="S3.SS2.p1.4.m3.1a"><mrow id="S3.SS2.p1.4.m3.1.2" xref="S3.SS2.p1.4.m3.1.2.cmml"><msub id="S3.SS2.p1.4.m3.1.2.2" xref="S3.SS2.p1.4.m3.1.2.2.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.SS2.p1.4.m3.1.2.2.2" xref="S3.SS2.p1.4.m3.1.2.2.2.cmml">𝒟</mi><mrow id="S3.SS2.p1.4.m3.1.2.2.3" xref="S3.SS2.p1.4.m3.1.2.2.3.cmml"><mi id="S3.SS2.p1.4.m3.1.2.2.3.2" xref="S3.SS2.p1.4.m3.1.2.2.3.2.cmml">s</mi><mo id="S3.SS2.p1.4.m3.1.2.2.3.1" xref="S3.SS2.p1.4.m3.1.2.2.3.1.cmml">⁢</mo><mi id="S3.SS2.p1.4.m3.1.2.2.3.3" xref="S3.SS2.p1.4.m3.1.2.2.3.3.cmml">i</mi><mo id="S3.SS2.p1.4.m3.1.2.2.3.1a" xref="S3.SS2.p1.4.m3.1.2.2.3.1.cmml">⁢</mo><mi id="S3.SS2.p1.4.m3.1.2.2.3.4" xref="S3.SS2.p1.4.m3.1.2.2.3.4.cmml">m</mi></mrow></msub><mo id="S3.SS2.p1.4.m3.1.2.1" xref="S3.SS2.p1.4.m3.1.2.1.cmml">⁢</mo><mrow id="S3.SS2.p1.4.m3.1.2.3.2" xref="S3.SS2.p1.4.m3.1.2.cmml"><mo id="S3.SS2.p1.4.m3.1.2.3.2.1" stretchy="false" xref="S3.SS2.p1.4.m3.1.2.cmml">(</mo><mi id="S3.SS2.p1.4.m3.1.1" xref="S3.SS2.p1.4.m3.1.1.cmml">θ</mi><mo id="S3.SS2.p1.4.m3.1.2.3.2.2" stretchy="false" xref="S3.SS2.p1.4.m3.1.2.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.p1.4.m3.1b"><apply id="S3.SS2.p1.4.m3.1.2.cmml" xref="S3.SS2.p1.4.m3.1.2"><times id="S3.SS2.p1.4.m3.1.2.1.cmml" xref="S3.SS2.p1.4.m3.1.2.1"></times><apply id="S3.SS2.p1.4.m3.1.2.2.cmml" xref="S3.SS2.p1.4.m3.1.2.2"><csymbol cd="ambiguous" id="S3.SS2.p1.4.m3.1.2.2.1.cmml" xref="S3.SS2.p1.4.m3.1.2.2">subscript</csymbol><ci id="S3.SS2.p1.4.m3.1.2.2.2.cmml" xref="S3.SS2.p1.4.m3.1.2.2.2">𝒟</ci><apply id="S3.SS2.p1.4.m3.1.2.2.3.cmml" xref="S3.SS2.p1.4.m3.1.2.2.3"><times id="S3.SS2.p1.4.m3.1.2.2.3.1.cmml" xref="S3.SS2.p1.4.m3.1.2.2.3.1"></times><ci id="S3.SS2.p1.4.m3.1.2.2.3.2.cmml" xref="S3.SS2.p1.4.m3.1.2.2.3.2">𝑠</ci><ci id="S3.SS2.p1.4.m3.1.2.2.3.3.cmml" xref="S3.SS2.p1.4.m3.1.2.2.3.3">𝑖</ci><ci id="S3.SS2.p1.4.m3.1.2.2.3.4.cmml" 
xref="S3.SS2.p1.4.m3.1.2.2.3.4">𝑚</ci></apply></apply><ci id="S3.SS2.p1.4.m3.1.1.cmml" xref="S3.SS2.p1.4.m3.1.1">𝜃</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p1.4.m3.1c">\mathcal{D}_{sim}(\theta)</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p1.4.m3.1d">caligraphic_D start_POSTSUBSCRIPT italic_s italic_i italic_m end_POSTSUBSCRIPT ( italic_θ )</annotation></semantics></math>. The total loss <math alttext="\mathcal{L}" class="ltx_Math" display="inline" id="S3.SS2.p1.5.m4.1"><semantics id="S3.SS2.p1.5.m4.1a"><mi class="ltx_font_mathcaligraphic" id="S3.SS2.p1.5.m4.1.1" xref="S3.SS2.p1.5.m4.1.1.cmml">ℒ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.p1.5.m4.1b"><ci id="S3.SS2.p1.5.m4.1.1.cmml" xref="S3.SS2.p1.5.m4.1.1">ℒ</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p1.5.m4.1c">\mathcal{L}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p1.5.m4.1d">caligraphic_L</annotation></semantics></math> is then minimized using gradient-based optimization, where the parameters <math alttext="\theta" class="ltx_Math" display="inline" id="S3.SS2.p1.6.m5.1"><semantics id="S3.SS2.p1.6.m5.1a"><mi id="S3.SS2.p1.6.m5.1.1" xref="S3.SS2.p1.6.m5.1.1.cmml">θ</mi><annotation-xml encoding="MathML-Content" id="S3.SS2.p1.6.m5.1b"><ci id="S3.SS2.p1.6.m5.1.1.cmml" xref="S3.SS2.p1.6.m5.1.1">𝜃</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p1.6.m5.1c">\theta</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p1.6.m5.1d">italic_θ</annotation></semantics></math> are updated as</p> <table class="ltx_equation ltx_eqn_table" id="S3.Ex3"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\theta\leftarrow\theta-\alpha\nabla_{\theta}\mathcal{L}(\theta)\enspace." 
class="ltx_Math" display="block" id="S3.Ex3.m1.2"><semantics id="S3.Ex3.m1.2a"><mrow id="S3.Ex3.m1.2.2.1" xref="S3.Ex3.m1.2.2.1.1.cmml"><mrow id="S3.Ex3.m1.2.2.1.1" xref="S3.Ex3.m1.2.2.1.1.cmml"><mi id="S3.Ex3.m1.2.2.1.1.2" xref="S3.Ex3.m1.2.2.1.1.2.cmml">θ</mi><mo id="S3.Ex3.m1.2.2.1.1.1" stretchy="false" xref="S3.Ex3.m1.2.2.1.1.1.cmml">←</mo><mrow id="S3.Ex3.m1.2.2.1.1.3" xref="S3.Ex3.m1.2.2.1.1.3.cmml"><mi id="S3.Ex3.m1.2.2.1.1.3.2" xref="S3.Ex3.m1.2.2.1.1.3.2.cmml">θ</mi><mo id="S3.Ex3.m1.2.2.1.1.3.1" xref="S3.Ex3.m1.2.2.1.1.3.1.cmml">−</mo><mrow id="S3.Ex3.m1.2.2.1.1.3.3" xref="S3.Ex3.m1.2.2.1.1.3.3.cmml"><mi id="S3.Ex3.m1.2.2.1.1.3.3.2" xref="S3.Ex3.m1.2.2.1.1.3.3.2.cmml">α</mi><mo id="S3.Ex3.m1.2.2.1.1.3.3.1" lspace="0.167em" xref="S3.Ex3.m1.2.2.1.1.3.3.1.cmml">⁢</mo><mrow id="S3.Ex3.m1.2.2.1.1.3.3.3" xref="S3.Ex3.m1.2.2.1.1.3.3.3.cmml"><msub id="S3.Ex3.m1.2.2.1.1.3.3.3.1" xref="S3.Ex3.m1.2.2.1.1.3.3.3.1.cmml"><mo id="S3.Ex3.m1.2.2.1.1.3.3.3.1.2" rspace="0.167em" xref="S3.Ex3.m1.2.2.1.1.3.3.3.1.2.cmml">∇</mo><mi id="S3.Ex3.m1.2.2.1.1.3.3.3.1.3" xref="S3.Ex3.m1.2.2.1.1.3.3.3.1.3.cmml">θ</mi></msub><mi class="ltx_font_mathcaligraphic" id="S3.Ex3.m1.2.2.1.1.3.3.3.2" xref="S3.Ex3.m1.2.2.1.1.3.3.3.2.cmml">ℒ</mi></mrow><mo id="S3.Ex3.m1.2.2.1.1.3.3.1a" xref="S3.Ex3.m1.2.2.1.1.3.3.1.cmml">⁢</mo><mrow id="S3.Ex3.m1.2.2.1.1.3.3.4.2" xref="S3.Ex3.m1.2.2.1.1.3.3.cmml"><mo id="S3.Ex3.m1.2.2.1.1.3.3.4.2.1" stretchy="false" xref="S3.Ex3.m1.2.2.1.1.3.3.cmml">(</mo><mi id="S3.Ex3.m1.1.1" xref="S3.Ex3.m1.1.1.cmml">θ</mi><mo id="S3.Ex3.m1.2.2.1.1.3.3.4.2.2" rspace="0.222em" stretchy="false" xref="S3.Ex3.m1.2.2.1.1.3.3.cmml">)</mo></mrow></mrow></mrow></mrow><mo id="S3.Ex3.m1.2.2.1.2" xref="S3.Ex3.m1.2.2.1.1.cmml">.</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.Ex3.m1.2b"><apply id="S3.Ex3.m1.2.2.1.1.cmml" xref="S3.Ex3.m1.2.2.1"><ci id="S3.Ex3.m1.2.2.1.1.1.cmml" xref="S3.Ex3.m1.2.2.1.1.1">←</ci><ci id="S3.Ex3.m1.2.2.1.1.2.cmml" 
xref="S3.Ex3.m1.2.2.1.1.2">𝜃</ci><apply id="S3.Ex3.m1.2.2.1.1.3.cmml" xref="S3.Ex3.m1.2.2.1.1.3"><minus id="S3.Ex3.m1.2.2.1.1.3.1.cmml" xref="S3.Ex3.m1.2.2.1.1.3.1"></minus><ci id="S3.Ex3.m1.2.2.1.1.3.2.cmml" xref="S3.Ex3.m1.2.2.1.1.3.2">𝜃</ci><apply id="S3.Ex3.m1.2.2.1.1.3.3.cmml" xref="S3.Ex3.m1.2.2.1.1.3.3"><times id="S3.Ex3.m1.2.2.1.1.3.3.1.cmml" xref="S3.Ex3.m1.2.2.1.1.3.3.1"></times><ci id="S3.Ex3.m1.2.2.1.1.3.3.2.cmml" xref="S3.Ex3.m1.2.2.1.1.3.3.2">𝛼</ci><apply id="S3.Ex3.m1.2.2.1.1.3.3.3.cmml" xref="S3.Ex3.m1.2.2.1.1.3.3.3"><apply id="S3.Ex3.m1.2.2.1.1.3.3.3.1.cmml" xref="S3.Ex3.m1.2.2.1.1.3.3.3.1"><csymbol cd="ambiguous" id="S3.Ex3.m1.2.2.1.1.3.3.3.1.1.cmml" xref="S3.Ex3.m1.2.2.1.1.3.3.3.1">subscript</csymbol><ci id="S3.Ex3.m1.2.2.1.1.3.3.3.1.2.cmml" xref="S3.Ex3.m1.2.2.1.1.3.3.3.1.2">∇</ci><ci id="S3.Ex3.m1.2.2.1.1.3.3.3.1.3.cmml" xref="S3.Ex3.m1.2.2.1.1.3.3.3.1.3">𝜃</ci></apply><ci id="S3.Ex3.m1.2.2.1.1.3.3.3.2.cmml" xref="S3.Ex3.m1.2.2.1.1.3.3.3.2">ℒ</ci></apply><ci id="S3.Ex3.m1.1.1.cmml" xref="S3.Ex3.m1.1.1">𝜃</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.Ex3.m1.2c">\theta\leftarrow\theta-\alpha\nabla_{\theta}\mathcal{L}(\theta)\enspace.</annotation><annotation encoding="application/x-llamapun" id="S3.Ex3.m1.2d">italic_θ ← italic_θ - italic_α ∇ start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT caligraphic_L ( italic_θ ) .</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS2.p1.7">Through backpropagation, this process adjusts the simulation parameters iteratively to ensure the environment accurately reproduces both the physical behaviors and visual appearances observed in real-world data.</p> </div> </section> <section class="ltx_subsection" id="S3.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S3.SS3.4.1.1">III-C</span> </span><span class="ltx_text 
 ltx_font_italic" id="S3.SS3.5.2">Adaptive InfoGap Loss construction</span> </h3> <div class="ltx_para" id="S3.SS3.p1"> <p class="ltx_p" id="S3.SS3.p1.1">Once the simulator is tuned for the current iteration, it is used to train the new policy. The trained policy is then deployed on the real robot to collect data for the next iteration, using the trajectory-based sampling method discussed in Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S2.SS3" title="II-C Data Collection Approaches ‣ II Preliminary ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">II-C</span></span></a>. Thus, the cost function used for policy training must not only drive task completion but also encourage the collection of new real-world data that helps narrow the sim-to-real gap. As noted earlier, trajectory-based sampling can introduce bias when estimating the distribution of the real-world domain. To mitigate this, we balance the estimation of the real domain with task completion throughout the iterations, so that the policy explores underrepresented regions of the real world while still optimizing for task performance.</p> </div> <figure class="ltx_figure" id="S3.F2"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="140" id="S3.F2.g1" src="extracted/6289317/Figure/cost.png" width="269"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 2: </span>The process of bridging the sim-to-real gap in robot training. When the discrepancy between the simulation (blue domain) and the real robot (orange domain) is large, the policy prioritizes collecting informative data (marked as crosses) from the real domain, beyond the task trajectory (black dashed line), to better characterize its properties. 
</figcaption> </figure> <div class="ltx_para" id="S3.SS3.p2"> <p class="ltx_p" id="S3.SS3.p2.1">The core idea as shown in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S3.F2" title="Figure 2 ‣ III-C Adaptive InfoGap Loss construction ‣ III Method ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_tag">2</span></a> is that when the discrepancy between the simulation environment and the real robot is substantial, the policy prioritizes collecting more informative data that better characterizes the real domain properties rather than focusing solely on task completion. This is because a policy trained under such a significant sim-to-real gap is unlikely to perform well on the real robot. As the gap narrows and the simulator parameters converge towards the real domain, this term in the loss function diminishes, allowing the policy to shift its focus to task completion.</p> </div> <div class="ltx_para" id="S3.SS3.p3"> <p class="ltx_p" id="S3.SS3.p3.2">Specifically, this adaptive InfoGap cost and the corresponding minimization problem to determine the action at timestep <math alttext="t" class="ltx_Math" display="inline" id="S3.SS3.p3.1.m1.1"><semantics id="S3.SS3.p3.1.m1.1a"><mi id="S3.SS3.p3.1.m1.1.1" xref="S3.SS3.p3.1.m1.1.1.cmml">t</mi><annotation-xml encoding="MathML-Content" id="S3.SS3.p3.1.m1.1b"><ci id="S3.SS3.p3.1.m1.1.1.cmml" xref="S3.SS3.p3.1.m1.1.1">𝑡</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p3.1.m1.1c">t</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p3.1.m1.1d">italic_t</annotation></semantics></math> during the <math alttext="k" class="ltx_Math" display="inline" id="S3.SS3.p3.2.m2.1"><semantics id="S3.SS3.p3.2.m2.1a"><mi id="S3.SS3.p3.2.m2.1.1" xref="S3.SS3.p3.2.m2.1.1.cmml">k</mi><annotation-xml encoding="MathML-Content" id="S3.SS3.p3.2.m2.1b"><ci id="S3.SS3.p3.2.m2.1.1.cmml" 
xref="S3.SS3.p3.2.m2.1.1">𝑘</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p3.2.m2.1c">k</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p3.2.m2.1d">italic_k</annotation></semantics></math>-th sim-to-real tuning iteration is constructed as</p> <table class="ltx_equation ltx_eqn_table" id="S3.Ex4"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="a_{k,t}=arg\min\enspace\mathcal{L}(a_{k,t})=\mathcal{L}_{task}(a_{t})+\mathcal% {L}_{sr}(a_{k,t})\enspace," class="ltx_Math" display="block" id="S3.Ex4.m1.7"><semantics id="S3.Ex4.m1.7a"><mrow id="S3.Ex4.m1.7.7.1" xref="S3.Ex4.m1.7.7.1.1.cmml"><mrow id="S3.Ex4.m1.7.7.1.1" xref="S3.Ex4.m1.7.7.1.1.cmml"><msub id="S3.Ex4.m1.7.7.1.1.5" xref="S3.Ex4.m1.7.7.1.1.5.cmml"><mi id="S3.Ex4.m1.7.7.1.1.5.2" xref="S3.Ex4.m1.7.7.1.1.5.2.cmml">a</mi><mrow id="S3.Ex4.m1.2.2.2.4" xref="S3.Ex4.m1.2.2.2.3.cmml"><mi id="S3.Ex4.m1.1.1.1.1" xref="S3.Ex4.m1.1.1.1.1.cmml">k</mi><mo id="S3.Ex4.m1.2.2.2.4.1" xref="S3.Ex4.m1.2.2.2.3.cmml">,</mo><mi id="S3.Ex4.m1.2.2.2.2" xref="S3.Ex4.m1.2.2.2.2.cmml">t</mi></mrow></msub><mo id="S3.Ex4.m1.7.7.1.1.6" xref="S3.Ex4.m1.7.7.1.1.6.cmml">=</mo><mrow id="S3.Ex4.m1.7.7.1.1.1" xref="S3.Ex4.m1.7.7.1.1.1.cmml"><mi id="S3.Ex4.m1.7.7.1.1.1.3" xref="S3.Ex4.m1.7.7.1.1.1.3.cmml">a</mi><mo id="S3.Ex4.m1.7.7.1.1.1.2" xref="S3.Ex4.m1.7.7.1.1.1.2.cmml">⁢</mo><mi id="S3.Ex4.m1.7.7.1.1.1.4" xref="S3.Ex4.m1.7.7.1.1.1.4.cmml">r</mi><mo id="S3.Ex4.m1.7.7.1.1.1.2a" xref="S3.Ex4.m1.7.7.1.1.1.2.cmml">⁢</mo><mi id="S3.Ex4.m1.7.7.1.1.1.5" xref="S3.Ex4.m1.7.7.1.1.1.5.cmml">g</mi><mo id="S3.Ex4.m1.7.7.1.1.1.2b" lspace="0.167em" xref="S3.Ex4.m1.7.7.1.1.1.2.cmml">⁢</mo><mrow id="S3.Ex4.m1.7.7.1.1.1.6" xref="S3.Ex4.m1.7.7.1.1.1.6.cmml"><mi id="S3.Ex4.m1.7.7.1.1.1.6.1" xref="S3.Ex4.m1.7.7.1.1.1.6.1.cmml">min</mi><mo id="S3.Ex4.m1.7.7.1.1.1.6a" lspace="0.667em" 
xref="S3.Ex4.m1.7.7.1.1.1.6.cmml">⁡</mo><mi class="ltx_font_mathcaligraphic" id="S3.Ex4.m1.7.7.1.1.1.6.2" xref="S3.Ex4.m1.7.7.1.1.1.6.2.cmml">ℒ</mi></mrow><mo id="S3.Ex4.m1.7.7.1.1.1.2c" xref="S3.Ex4.m1.7.7.1.1.1.2.cmml">⁢</mo><mrow id="S3.Ex4.m1.7.7.1.1.1.1.1" xref="S3.Ex4.m1.7.7.1.1.1.1.1.1.cmml"><mo id="S3.Ex4.m1.7.7.1.1.1.1.1.2" stretchy="false" xref="S3.Ex4.m1.7.7.1.1.1.1.1.1.cmml">(</mo><msub id="S3.Ex4.m1.7.7.1.1.1.1.1.1" xref="S3.Ex4.m1.7.7.1.1.1.1.1.1.cmml"><mi id="S3.Ex4.m1.7.7.1.1.1.1.1.1.2" xref="S3.Ex4.m1.7.7.1.1.1.1.1.1.2.cmml">a</mi><mrow id="S3.Ex4.m1.4.4.2.4" xref="S3.Ex4.m1.4.4.2.3.cmml"><mi id="S3.Ex4.m1.3.3.1.1" xref="S3.Ex4.m1.3.3.1.1.cmml">k</mi><mo id="S3.Ex4.m1.4.4.2.4.1" xref="S3.Ex4.m1.4.4.2.3.cmml">,</mo><mi id="S3.Ex4.m1.4.4.2.2" xref="S3.Ex4.m1.4.4.2.2.cmml">t</mi></mrow></msub><mo id="S3.Ex4.m1.7.7.1.1.1.1.1.3" stretchy="false" xref="S3.Ex4.m1.7.7.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.Ex4.m1.7.7.1.1.7" xref="S3.Ex4.m1.7.7.1.1.7.cmml">=</mo><mrow id="S3.Ex4.m1.7.7.1.1.3" xref="S3.Ex4.m1.7.7.1.1.3.cmml"><mrow id="S3.Ex4.m1.7.7.1.1.2.1" xref="S3.Ex4.m1.7.7.1.1.2.1.cmml"><msub id="S3.Ex4.m1.7.7.1.1.2.1.3" xref="S3.Ex4.m1.7.7.1.1.2.1.3.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.Ex4.m1.7.7.1.1.2.1.3.2" xref="S3.Ex4.m1.7.7.1.1.2.1.3.2.cmml">ℒ</mi><mrow id="S3.Ex4.m1.7.7.1.1.2.1.3.3" xref="S3.Ex4.m1.7.7.1.1.2.1.3.3.cmml"><mi id="S3.Ex4.m1.7.7.1.1.2.1.3.3.2" xref="S3.Ex4.m1.7.7.1.1.2.1.3.3.2.cmml">t</mi><mo id="S3.Ex4.m1.7.7.1.1.2.1.3.3.1" xref="S3.Ex4.m1.7.7.1.1.2.1.3.3.1.cmml">⁢</mo><mi id="S3.Ex4.m1.7.7.1.1.2.1.3.3.3" xref="S3.Ex4.m1.7.7.1.1.2.1.3.3.3.cmml">a</mi><mo id="S3.Ex4.m1.7.7.1.1.2.1.3.3.1a" xref="S3.Ex4.m1.7.7.1.1.2.1.3.3.1.cmml">⁢</mo><mi id="S3.Ex4.m1.7.7.1.1.2.1.3.3.4" xref="S3.Ex4.m1.7.7.1.1.2.1.3.3.4.cmml">s</mi><mo id="S3.Ex4.m1.7.7.1.1.2.1.3.3.1b" xref="S3.Ex4.m1.7.7.1.1.2.1.3.3.1.cmml">⁢</mo><mi id="S3.Ex4.m1.7.7.1.1.2.1.3.3.5" xref="S3.Ex4.m1.7.7.1.1.2.1.3.3.5.cmml">k</mi></mrow></msub><mo 
id="S3.Ex4.m1.7.7.1.1.2.1.2" xref="S3.Ex4.m1.7.7.1.1.2.1.2.cmml">⁢</mo><mrow id="S3.Ex4.m1.7.7.1.1.2.1.1.1" xref="S3.Ex4.m1.7.7.1.1.2.1.1.1.1.cmml"><mo id="S3.Ex4.m1.7.7.1.1.2.1.1.1.2" stretchy="false" xref="S3.Ex4.m1.7.7.1.1.2.1.1.1.1.cmml">(</mo><msub id="S3.Ex4.m1.7.7.1.1.2.1.1.1.1" xref="S3.Ex4.m1.7.7.1.1.2.1.1.1.1.cmml"><mi id="S3.Ex4.m1.7.7.1.1.2.1.1.1.1.2" xref="S3.Ex4.m1.7.7.1.1.2.1.1.1.1.2.cmml">a</mi><mi id="S3.Ex4.m1.7.7.1.1.2.1.1.1.1.3" xref="S3.Ex4.m1.7.7.1.1.2.1.1.1.1.3.cmml">t</mi></msub><mo id="S3.Ex4.m1.7.7.1.1.2.1.1.1.3" stretchy="false" xref="S3.Ex4.m1.7.7.1.1.2.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.Ex4.m1.7.7.1.1.3.3" xref="S3.Ex4.m1.7.7.1.1.3.3.cmml">+</mo><mrow id="S3.Ex4.m1.7.7.1.1.3.2" xref="S3.Ex4.m1.7.7.1.1.3.2.cmml"><msub id="S3.Ex4.m1.7.7.1.1.3.2.3" xref="S3.Ex4.m1.7.7.1.1.3.2.3.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.Ex4.m1.7.7.1.1.3.2.3.2" xref="S3.Ex4.m1.7.7.1.1.3.2.3.2.cmml">ℒ</mi><mrow id="S3.Ex4.m1.7.7.1.1.3.2.3.3" xref="S3.Ex4.m1.7.7.1.1.3.2.3.3.cmml"><mi id="S3.Ex4.m1.7.7.1.1.3.2.3.3.2" xref="S3.Ex4.m1.7.7.1.1.3.2.3.3.2.cmml">s</mi><mo id="S3.Ex4.m1.7.7.1.1.3.2.3.3.1" xref="S3.Ex4.m1.7.7.1.1.3.2.3.3.1.cmml">⁢</mo><mi id="S3.Ex4.m1.7.7.1.1.3.2.3.3.3" xref="S3.Ex4.m1.7.7.1.1.3.2.3.3.3.cmml">r</mi></mrow></msub><mo id="S3.Ex4.m1.7.7.1.1.3.2.2" xref="S3.Ex4.m1.7.7.1.1.3.2.2.cmml">⁢</mo><mrow id="S3.Ex4.m1.7.7.1.1.3.2.1.1" xref="S3.Ex4.m1.7.7.1.1.3.2.1.1.1.cmml"><mo id="S3.Ex4.m1.7.7.1.1.3.2.1.1.2" stretchy="false" xref="S3.Ex4.m1.7.7.1.1.3.2.1.1.1.cmml">(</mo><msub id="S3.Ex4.m1.7.7.1.1.3.2.1.1.1" xref="S3.Ex4.m1.7.7.1.1.3.2.1.1.1.cmml"><mi id="S3.Ex4.m1.7.7.1.1.3.2.1.1.1.2" xref="S3.Ex4.m1.7.7.1.1.3.2.1.1.1.2.cmml">a</mi><mrow id="S3.Ex4.m1.6.6.2.4" xref="S3.Ex4.m1.6.6.2.3.cmml"><mi id="S3.Ex4.m1.5.5.1.1" xref="S3.Ex4.m1.5.5.1.1.cmml">k</mi><mo id="S3.Ex4.m1.6.6.2.4.1" xref="S3.Ex4.m1.6.6.2.3.cmml">,</mo><mi id="S3.Ex4.m1.6.6.2.2" xref="S3.Ex4.m1.6.6.2.2.cmml">t</mi></mrow></msub><mo id="S3.Ex4.m1.7.7.1.1.3.2.1.1.3" 
rspace="0.500em" stretchy="false" xref="S3.Ex4.m1.7.7.1.1.3.2.1.1.1.cmml">)</mo></mrow></mrow></mrow></mrow><mo id="S3.Ex4.m1.7.7.1.2" xref="S3.Ex4.m1.7.7.1.1.cmml">,</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.Ex4.m1.7b"><apply id="S3.Ex4.m1.7.7.1.1.cmml" xref="S3.Ex4.m1.7.7.1"><and id="S3.Ex4.m1.7.7.1.1a.cmml" xref="S3.Ex4.m1.7.7.1"></and><apply id="S3.Ex4.m1.7.7.1.1b.cmml" xref="S3.Ex4.m1.7.7.1"><eq id="S3.Ex4.m1.7.7.1.1.6.cmml" xref="S3.Ex4.m1.7.7.1.1.6"></eq><apply id="S3.Ex4.m1.7.7.1.1.5.cmml" xref="S3.Ex4.m1.7.7.1.1.5"><csymbol cd="ambiguous" id="S3.Ex4.m1.7.7.1.1.5.1.cmml" xref="S3.Ex4.m1.7.7.1.1.5">subscript</csymbol><ci id="S3.Ex4.m1.7.7.1.1.5.2.cmml" xref="S3.Ex4.m1.7.7.1.1.5.2">𝑎</ci><list id="S3.Ex4.m1.2.2.2.3.cmml" xref="S3.Ex4.m1.2.2.2.4"><ci id="S3.Ex4.m1.1.1.1.1.cmml" xref="S3.Ex4.m1.1.1.1.1">𝑘</ci><ci id="S3.Ex4.m1.2.2.2.2.cmml" xref="S3.Ex4.m1.2.2.2.2">𝑡</ci></list></apply><apply id="S3.Ex4.m1.7.7.1.1.1.cmml" xref="S3.Ex4.m1.7.7.1.1.1"><times id="S3.Ex4.m1.7.7.1.1.1.2.cmml" xref="S3.Ex4.m1.7.7.1.1.1.2"></times><ci id="S3.Ex4.m1.7.7.1.1.1.3.cmml" xref="S3.Ex4.m1.7.7.1.1.1.3">𝑎</ci><ci id="S3.Ex4.m1.7.7.1.1.1.4.cmml" xref="S3.Ex4.m1.7.7.1.1.1.4">𝑟</ci><ci id="S3.Ex4.m1.7.7.1.1.1.5.cmml" xref="S3.Ex4.m1.7.7.1.1.1.5">𝑔</ci><apply id="S3.Ex4.m1.7.7.1.1.1.6.cmml" xref="S3.Ex4.m1.7.7.1.1.1.6"><min id="S3.Ex4.m1.7.7.1.1.1.6.1.cmml" xref="S3.Ex4.m1.7.7.1.1.1.6.1"></min><ci id="S3.Ex4.m1.7.7.1.1.1.6.2.cmml" xref="S3.Ex4.m1.7.7.1.1.1.6.2">ℒ</ci></apply><apply id="S3.Ex4.m1.7.7.1.1.1.1.1.1.cmml" xref="S3.Ex4.m1.7.7.1.1.1.1.1"><csymbol cd="ambiguous" id="S3.Ex4.m1.7.7.1.1.1.1.1.1.1.cmml" xref="S3.Ex4.m1.7.7.1.1.1.1.1">subscript</csymbol><ci id="S3.Ex4.m1.7.7.1.1.1.1.1.1.2.cmml" xref="S3.Ex4.m1.7.7.1.1.1.1.1.1.2">𝑎</ci><list id="S3.Ex4.m1.4.4.2.3.cmml" xref="S3.Ex4.m1.4.4.2.4"><ci id="S3.Ex4.m1.3.3.1.1.cmml" xref="S3.Ex4.m1.3.3.1.1">𝑘</ci><ci id="S3.Ex4.m1.4.4.2.2.cmml" xref="S3.Ex4.m1.4.4.2.2">𝑡</ci></list></apply></apply></apply><apply 
id="S3.Ex4.m1.7.7.1.1c.cmml" xref="S3.Ex4.m1.7.7.1"><eq id="S3.Ex4.m1.7.7.1.1.7.cmml" xref="S3.Ex4.m1.7.7.1.1.7"></eq><share href="https://arxiv.org/html/2503.10118v2#S3.Ex4.m1.7.7.1.1.1.cmml" id="S3.Ex4.m1.7.7.1.1d.cmml" xref="S3.Ex4.m1.7.7.1"></share><apply id="S3.Ex4.m1.7.7.1.1.3.cmml" xref="S3.Ex4.m1.7.7.1.1.3"><plus id="S3.Ex4.m1.7.7.1.1.3.3.cmml" xref="S3.Ex4.m1.7.7.1.1.3.3"></plus><apply id="S3.Ex4.m1.7.7.1.1.2.1.cmml" xref="S3.Ex4.m1.7.7.1.1.2.1"><times id="S3.Ex4.m1.7.7.1.1.2.1.2.cmml" xref="S3.Ex4.m1.7.7.1.1.2.1.2"></times><apply id="S3.Ex4.m1.7.7.1.1.2.1.3.cmml" xref="S3.Ex4.m1.7.7.1.1.2.1.3"><csymbol cd="ambiguous" id="S3.Ex4.m1.7.7.1.1.2.1.3.1.cmml" xref="S3.Ex4.m1.7.7.1.1.2.1.3">subscript</csymbol><ci id="S3.Ex4.m1.7.7.1.1.2.1.3.2.cmml" xref="S3.Ex4.m1.7.7.1.1.2.1.3.2">ℒ</ci><apply id="S3.Ex4.m1.7.7.1.1.2.1.3.3.cmml" xref="S3.Ex4.m1.7.7.1.1.2.1.3.3"><times id="S3.Ex4.m1.7.7.1.1.2.1.3.3.1.cmml" xref="S3.Ex4.m1.7.7.1.1.2.1.3.3.1"></times><ci id="S3.Ex4.m1.7.7.1.1.2.1.3.3.2.cmml" xref="S3.Ex4.m1.7.7.1.1.2.1.3.3.2">𝑡</ci><ci id="S3.Ex4.m1.7.7.1.1.2.1.3.3.3.cmml" xref="S3.Ex4.m1.7.7.1.1.2.1.3.3.3">𝑎</ci><ci id="S3.Ex4.m1.7.7.1.1.2.1.3.3.4.cmml" xref="S3.Ex4.m1.7.7.1.1.2.1.3.3.4">𝑠</ci><ci id="S3.Ex4.m1.7.7.1.1.2.1.3.3.5.cmml" xref="S3.Ex4.m1.7.7.1.1.2.1.3.3.5">𝑘</ci></apply></apply><apply id="S3.Ex4.m1.7.7.1.1.2.1.1.1.1.cmml" xref="S3.Ex4.m1.7.7.1.1.2.1.1.1"><csymbol cd="ambiguous" id="S3.Ex4.m1.7.7.1.1.2.1.1.1.1.1.cmml" xref="S3.Ex4.m1.7.7.1.1.2.1.1.1">subscript</csymbol><ci id="S3.Ex4.m1.7.7.1.1.2.1.1.1.1.2.cmml" xref="S3.Ex4.m1.7.7.1.1.2.1.1.1.1.2">𝑎</ci><ci id="S3.Ex4.m1.7.7.1.1.2.1.1.1.1.3.cmml" xref="S3.Ex4.m1.7.7.1.1.2.1.1.1.1.3">𝑡</ci></apply></apply><apply id="S3.Ex4.m1.7.7.1.1.3.2.cmml" xref="S3.Ex4.m1.7.7.1.1.3.2"><times id="S3.Ex4.m1.7.7.1.1.3.2.2.cmml" xref="S3.Ex4.m1.7.7.1.1.3.2.2"></times><apply id="S3.Ex4.m1.7.7.1.1.3.2.3.cmml" xref="S3.Ex4.m1.7.7.1.1.3.2.3"><csymbol cd="ambiguous" id="S3.Ex4.m1.7.7.1.1.3.2.3.1.cmml" 
xref="S3.Ex4.m1.7.7.1.1.3.2.3">subscript</csymbol><ci id="S3.Ex4.m1.7.7.1.1.3.2.3.2.cmml" xref="S3.Ex4.m1.7.7.1.1.3.2.3.2">ℒ</ci><apply id="S3.Ex4.m1.7.7.1.1.3.2.3.3.cmml" xref="S3.Ex4.m1.7.7.1.1.3.2.3.3"><times id="S3.Ex4.m1.7.7.1.1.3.2.3.3.1.cmml" xref="S3.Ex4.m1.7.7.1.1.3.2.3.3.1"></times><ci id="S3.Ex4.m1.7.7.1.1.3.2.3.3.2.cmml" xref="S3.Ex4.m1.7.7.1.1.3.2.3.3.2">𝑠</ci><ci id="S3.Ex4.m1.7.7.1.1.3.2.3.3.3.cmml" xref="S3.Ex4.m1.7.7.1.1.3.2.3.3.3">𝑟</ci></apply></apply><apply id="S3.Ex4.m1.7.7.1.1.3.2.1.1.1.cmml" xref="S3.Ex4.m1.7.7.1.1.3.2.1.1"><csymbol cd="ambiguous" id="S3.Ex4.m1.7.7.1.1.3.2.1.1.1.1.cmml" xref="S3.Ex4.m1.7.7.1.1.3.2.1.1">subscript</csymbol><ci id="S3.Ex4.m1.7.7.1.1.3.2.1.1.1.2.cmml" xref="S3.Ex4.m1.7.7.1.1.3.2.1.1.1.2">𝑎</ci><list id="S3.Ex4.m1.6.6.2.3.cmml" xref="S3.Ex4.m1.6.6.2.4"><ci id="S3.Ex4.m1.5.5.1.1.cmml" xref="S3.Ex4.m1.5.5.1.1">𝑘</ci><ci id="S3.Ex4.m1.6.6.2.2.cmml" xref="S3.Ex4.m1.6.6.2.2">𝑡</ci></list></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.Ex4.m1.7c">a_{k,t}=arg\min\enspace\mathcal{L}(a_{k,t})=\mathcal{L}_{task}(a_{t})+\mathcal% {L}_{sr}(a_{k,t})\enspace,</annotation><annotation encoding="application/x-llamapun" id="S3.Ex4.m1.7d">italic_a start_POSTSUBSCRIPT italic_k , italic_t end_POSTSUBSCRIPT = italic_a italic_r italic_g roman_min caligraphic_L ( italic_a start_POSTSUBSCRIPT italic_k , italic_t end_POSTSUBSCRIPT ) = caligraphic_L start_POSTSUBSCRIPT italic_t italic_a italic_s italic_k end_POSTSUBSCRIPT ( italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) + caligraphic_L start_POSTSUBSCRIPT italic_s italic_r end_POSTSUBSCRIPT ( italic_a start_POSTSUBSCRIPT italic_k , italic_t end_POSTSUBSCRIPT ) ,</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS3.p3.3">where <math alttext="\mathcal{L}_{task}" class="ltx_Math" display="inline" id="S3.SS3.p3.3.m1.1"><semantics 
id="S3.SS3.p3.3.m1.1a"><msub id="S3.SS3.p3.3.m1.1.1" xref="S3.SS3.p3.3.m1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.SS3.p3.3.m1.1.1.2" xref="S3.SS3.p3.3.m1.1.1.2.cmml">ℒ</mi><mrow id="S3.SS3.p3.3.m1.1.1.3" xref="S3.SS3.p3.3.m1.1.1.3.cmml"><mi id="S3.SS3.p3.3.m1.1.1.3.2" xref="S3.SS3.p3.3.m1.1.1.3.2.cmml">t</mi><mo id="S3.SS3.p3.3.m1.1.1.3.1" xref="S3.SS3.p3.3.m1.1.1.3.1.cmml">⁢</mo><mi id="S3.SS3.p3.3.m1.1.1.3.3" xref="S3.SS3.p3.3.m1.1.1.3.3.cmml">a</mi><mo id="S3.SS3.p3.3.m1.1.1.3.1a" xref="S3.SS3.p3.3.m1.1.1.3.1.cmml">⁢</mo><mi id="S3.SS3.p3.3.m1.1.1.3.4" xref="S3.SS3.p3.3.m1.1.1.3.4.cmml">s</mi><mo id="S3.SS3.p3.3.m1.1.1.3.1b" xref="S3.SS3.p3.3.m1.1.1.3.1.cmml">⁢</mo><mi id="S3.SS3.p3.3.m1.1.1.3.5" xref="S3.SS3.p3.3.m1.1.1.3.5.cmml">k</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.SS3.p3.3.m1.1b"><apply id="S3.SS3.p3.3.m1.1.1.cmml" xref="S3.SS3.p3.3.m1.1.1"><csymbol cd="ambiguous" id="S3.SS3.p3.3.m1.1.1.1.cmml" xref="S3.SS3.p3.3.m1.1.1">subscript</csymbol><ci id="S3.SS3.p3.3.m1.1.1.2.cmml" xref="S3.SS3.p3.3.m1.1.1.2">ℒ</ci><apply id="S3.SS3.p3.3.m1.1.1.3.cmml" xref="S3.SS3.p3.3.m1.1.1.3"><times id="S3.SS3.p3.3.m1.1.1.3.1.cmml" xref="S3.SS3.p3.3.m1.1.1.3.1"></times><ci id="S3.SS3.p3.3.m1.1.1.3.2.cmml" xref="S3.SS3.p3.3.m1.1.1.3.2">𝑡</ci><ci id="S3.SS3.p3.3.m1.1.1.3.3.cmml" xref="S3.SS3.p3.3.m1.1.1.3.3">𝑎</ci><ci id="S3.SS3.p3.3.m1.1.1.3.4.cmml" xref="S3.SS3.p3.3.m1.1.1.3.4">𝑠</ci><ci id="S3.SS3.p3.3.m1.1.1.3.5.cmml" xref="S3.SS3.p3.3.m1.1.1.3.5">𝑘</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p3.3.m1.1c">\mathcal{L}_{task}</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p3.3.m1.1d">caligraphic_L start_POSTSUBSCRIPT italic_t italic_a italic_s italic_k end_POSTSUBSCRIPT</annotation></semantics></math> is the nominal cost (<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S2.E1" title="In II-A Reinforcement Learning Algorithms ‣ II Preliminary ‣ An 
Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_tag">1</span></a>) for task completion and</p> <table class="ltx_equation ltx_eqn_table" id="S3.E3"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\mathcal{L}_{sr}=-\text{KL}\left(\hat{p}(\mathcal{D}_{real}^{k})\parallel\hat{p}(\mathcal{D}^{k-1}_{sim})\right)\cdot W_{\beta}\left(\hat{p}(\mathcal{D}_{sim}^{k}+\mathcal{D}_{t}),\hat{p}(\mathcal{D}_{sim}^{k})\right)\enspace." class="ltx_Math" display="block" id="S3.E3.m1.1"><semantics id="S3.E3.m1.1a"><mrow id="S3.E3.m1.1.1.1" xref="S3.E3.m1.1.1.1.1.cmml"><mrow id="S3.E3.m1.1.1.1.1" xref="S3.E3.m1.1.1.1.1.cmml"><mrow id="S3.E3.m1.1.1.1.1.5" xref="S3.E3.m1.1.1.1.1.5.cmml"><msub id="S3.E3.m1.1.1.1.1.5.3" xref="S3.E3.m1.1.1.1.1.5.3.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.E3.m1.1.1.1.1.5.3.2" xref="S3.E3.m1.1.1.1.1.5.3.2.cmml">ℒ</mi><mrow id="S3.E3.m1.1.1.1.1.5.3.3" xref="S3.E3.m1.1.1.1.1.5.3.3.cmml"><mi id="S3.E3.m1.1.1.1.1.5.3.3.2" xref="S3.E3.m1.1.1.1.1.5.3.3.2.cmml">s</mi><mo id="S3.E3.m1.1.1.1.1.5.3.3.1" xref="S3.E3.m1.1.1.1.1.5.3.3.1.cmml">⁢</mo><mi id="S3.E3.m1.1.1.1.1.5.3.3.3" xref="S3.E3.m1.1.1.1.1.5.3.3.3.cmml">r</mi></mrow></msub></mrow><mo id="S3.E3.m1.1.1.1.1.4" xref="S3.E3.m1.1.1.1.1.4.cmml">=</mo><mrow id="S3.E3.m1.1.1.1.1.3" xref="S3.E3.m1.1.1.1.1.3.cmml"><mo id="S3.E3.m1.1.1.1.1.3a" xref="S3.E3.m1.1.1.1.1.3.cmml">−</mo><mrow id="S3.E3.m1.1.1.1.1.3.3" xref="S3.E3.m1.1.1.1.1.3.3.cmml"><mrow id="S3.E3.m1.1.1.1.1.1.1.1" xref="S3.E3.m1.1.1.1.1.1.1.1.cmml"><mrow 
id="S3.E3.m1.1.1.1.1.1.1.1.1" xref="S3.E3.m1.1.1.1.1.1.1.1.1.cmml"><mtext id="S3.E3.m1.1.1.1.1.1.1.1.1.3" xref="S3.E3.m1.1.1.1.1.1.1.1.1.3a.cmml">KL</mtext><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.2" xref="S3.E3.m1.1.1.1.1.1.1.1.1.2.cmml">⁢</mo><mrow id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.cmml"><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.2" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.cmml">(</mo><mrow id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.cmml"><mrow id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.cmml"><mover accent="true" id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.3" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.3.cmml"><mi id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.3.2" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.3.2.cmml">p</mi><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.3.1" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.3.1.cmml">^</mo></mover><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.2" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.2.cmml">⁢</mo><mrow id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.cmml"><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2" stretchy="false" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.cmml">(</mo><msubsup id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.2" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.2.cmml">𝒟</mi><mrow id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.cmml"><mi id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.2" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.2.cmml">r</mi><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.1" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.3" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.3.cmml">e</mi><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.1a" 
xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.4" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.4.cmml">a</mi><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.1b" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.5" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.5.cmml">l</mi></mrow><mi id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.cmml">k</mi></msubsup><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3" stretchy="false" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.3" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.3.cmml">∥</mo><mrow id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.cmml"><mover accent="true" id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.3" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.3.cmml"><mi id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.3.2" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.3.2.cmml">p</mi><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.3.1" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.3.1.cmml">^</mo></mover><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.2" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.2.cmml">⁢</mo><mrow id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.cmml"><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.2" stretchy="false" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.cmml">(</mo><msubsup id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.2" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.2.cmml">𝒟</mi><mrow id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.cmml"><mi id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.2" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.2.cmml">s</mi><mo 
id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.1" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.1.cmml">⁢</mo><mi id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.3" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.3.cmml">i</mi><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.1a" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.1.cmml">⁢</mo><mi id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.4" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.4.cmml">m</mi></mrow><mrow id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3.cmml"><mi id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3.2" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3.2.cmml">k</mi><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3.1" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3.1.cmml">−</mo><mn id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3.3" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3.3.cmml">1</mn></mrow></msubsup><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.3" stretchy="false" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.cmml">)</mo></mrow></mrow></mrow><mo id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.3" rspace="0.055em" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.E3.m1.1.1.1.1.1.1.1.2" rspace="0.222em" xref="S3.E3.m1.1.1.1.1.1.1.1.2.cmml">⋅</mo><msub id="S3.E3.m1.1.1.1.1.1.1.1.3" xref="S3.E3.m1.1.1.1.1.1.1.1.3.cmml"><mi id="S3.E3.m1.1.1.1.1.1.1.1.3.2" xref="S3.E3.m1.1.1.1.1.1.1.1.3.2.cmml">W</mi><mi id="S3.E3.m1.1.1.1.1.1.1.1.3.3" xref="S3.E3.m1.1.1.1.1.1.1.1.3.3.cmml">β</mi></msub></mrow><mo id="S3.E3.m1.1.1.1.1.3.3.4" xref="S3.E3.m1.1.1.1.1.3.3.4.cmml">⁢</mo><mrow id="S3.E3.m1.1.1.1.1.3.3.3.2" xref="S3.E3.m1.1.1.1.1.3.3.3.3.cmml"><mo id="S3.E3.m1.1.1.1.1.3.3.3.2.3" xref="S3.E3.m1.1.1.1.1.3.3.3.3.cmml">(</mo><mrow id="S3.E3.m1.1.1.1.1.2.2.2.1.1" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.cmml"><mover accent="true" id="S3.E3.m1.1.1.1.1.2.2.2.1.1.3" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.3.cmml"><mi id="S3.E3.m1.1.1.1.1.2.2.2.1.1.3.2" 
xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.3.2.cmml">p</mi><mo id="S3.E3.m1.1.1.1.1.2.2.2.1.1.3.1" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.3.1.cmml">^</mo></mover><mo id="S3.E3.m1.1.1.1.1.2.2.2.1.1.2" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.2.cmml">⁢</mo><mrow id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.cmml"><mo id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.2" stretchy="false" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.cmml">(</mo><mrow id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.cmml"><msubsup id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.2" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.2.cmml">𝒟</mi><mrow id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.cmml"><mi id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.2" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.2.cmml">s</mi><mo id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.1" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.1.cmml">⁢</mo><mi id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.3" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.3.cmml">i</mi><mo id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.1a" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.1.cmml">⁢</mo><mi id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.4" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.4.cmml">m</mi></mrow><mi id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.3" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.3.cmml">k</mi></msubsup><mo id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.1" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.1.cmml">+</mo><msub id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.3" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.3.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.3.2" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.3.2.cmml">𝒟</mi><mi id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.3.3" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.3.3.cmml">t</mi></msub></mrow><mo id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.3" 
stretchy="false" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.E3.m1.1.1.1.1.3.3.3.2.4" xref="S3.E3.m1.1.1.1.1.3.3.3.3.cmml">,</mo><mrow id="S3.E3.m1.1.1.1.1.3.3.3.2.2" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.cmml"><mover accent="true" id="S3.E3.m1.1.1.1.1.3.3.3.2.2.3" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.3.cmml"><mi id="S3.E3.m1.1.1.1.1.3.3.3.2.2.3.2" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.3.2.cmml">p</mi><mo id="S3.E3.m1.1.1.1.1.3.3.3.2.2.3.1" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.3.1.cmml">^</mo></mover><mo id="S3.E3.m1.1.1.1.1.3.3.3.2.2.2" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.2.cmml">⁢</mo><mrow id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.cmml"><mo id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.2" stretchy="false" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.cmml">(</mo><msubsup id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.2" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.2.cmml">𝒟</mi><mrow id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.cmml"><mi id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.2" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.2.cmml">s</mi><mo id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.1" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.3" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.3.cmml">i</mi><mo id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.1a" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.4" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.4.cmml">m</mi></mrow><mi id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.3" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.3.cmml">k</mi></msubsup><mo id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.3" stretchy="false" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.E3.m1.1.1.1.1.3.3.3.2.5" rspace="0.222em" 
xref="S3.E3.m1.1.1.1.1.3.3.3.3.cmml">)</mo></mrow></mrow></mrow></mrow><mo id="S3.E3.m1.1.1.1.2" xref="S3.E3.m1.1.1.1.1.cmml">.</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.E3.m1.1b"><apply id="S3.E3.m1.1.1.1.1.cmml" xref="S3.E3.m1.1.1.1"><eq id="S3.E3.m1.1.1.1.1.4.cmml" xref="S3.E3.m1.1.1.1.1.4"></eq><apply id="S3.E3.m1.1.1.1.1.5.cmml" xref="S3.E3.m1.1.1.1.1.5"><times id="S3.E3.m1.1.1.1.1.5.1.cmml" xref="S3.E3.m1.1.1.1.1.5.1"></times><ci id="S3.E3.m1.1.1.1.1.5.2b.cmml" xref="S3.E3.m1.1.1.1.1.5.2"><merror class="ltx_ERROR undefined undefined" id="S3.E3.m1.1.1.1.1.5.2.cmml" xref="S3.E3.m1.1.1.1.1.5.2"><mtext id="S3.E3.m1.1.1.1.1.5.2a.cmml" xref="S3.E3.m1.1.1.1.1.5.2">{medsize}</mtext></merror></ci><apply id="S3.E3.m1.1.1.1.1.5.3.cmml" xref="S3.E3.m1.1.1.1.1.5.3"><csymbol cd="ambiguous" id="S3.E3.m1.1.1.1.1.5.3.1.cmml" xref="S3.E3.m1.1.1.1.1.5.3">subscript</csymbol><ci id="S3.E3.m1.1.1.1.1.5.3.2.cmml" xref="S3.E3.m1.1.1.1.1.5.3.2">ℒ</ci><apply id="S3.E3.m1.1.1.1.1.5.3.3.cmml" xref="S3.E3.m1.1.1.1.1.5.3.3"><times id="S3.E3.m1.1.1.1.1.5.3.3.1.cmml" xref="S3.E3.m1.1.1.1.1.5.3.3.1"></times><ci id="S3.E3.m1.1.1.1.1.5.3.3.2.cmml" xref="S3.E3.m1.1.1.1.1.5.3.3.2">𝑠</ci><ci id="S3.E3.m1.1.1.1.1.5.3.3.3.cmml" xref="S3.E3.m1.1.1.1.1.5.3.3.3">𝑟</ci></apply></apply></apply><apply id="S3.E3.m1.1.1.1.1.3.cmml" xref="S3.E3.m1.1.1.1.1.3"><minus id="S3.E3.m1.1.1.1.1.3.4.cmml" xref="S3.E3.m1.1.1.1.1.3"></minus><apply id="S3.E3.m1.1.1.1.1.3.3.cmml" xref="S3.E3.m1.1.1.1.1.3.3"><times id="S3.E3.m1.1.1.1.1.3.3.4.cmml" xref="S3.E3.m1.1.1.1.1.3.3.4"></times><apply id="S3.E3.m1.1.1.1.1.1.1.1.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1"><ci id="S3.E3.m1.1.1.1.1.1.1.1.2.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.2">⋅</ci><apply id="S3.E3.m1.1.1.1.1.1.1.1.1.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1"><times id="S3.E3.m1.1.1.1.1.1.1.1.1.2.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.2"></times><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.3a.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.3"><mtext id="S3.E3.m1.1.1.1.1.1.1.1.1.3.cmml" 
xref="S3.E3.m1.1.1.1.1.1.1.1.1.3">KL</mtext></ci><apply id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1"><csymbol cd="latexml" id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.3.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.3">conditional</csymbol><apply id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1"><times id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.2.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.2"></times><apply id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.3.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.3"><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.3.1.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.3.1">^</ci><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.3.2.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.3.2">𝑝</ci></apply><apply id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1">superscript</csymbol><apply id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1">subscript</csymbol><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.2.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.2">𝒟</ci><apply id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3"><times id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.1.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.1"></times><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.2.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.2">𝑟</ci><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.3.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.3">𝑒</ci><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.4.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.4">𝑎</ci><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.5.cmml" 
xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.3.5">𝑙</ci></apply></apply><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3">𝑘</ci></apply></apply><apply id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2"><times id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.2.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.2"></times><apply id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.3.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.3"><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.3.1.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.3.1">^</ci><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.3.2.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.3.2">𝑝</ci></apply><apply id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1"><csymbol cd="ambiguous" id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1">subscript</csymbol><apply id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1"><csymbol cd="ambiguous" id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.1.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1">superscript</csymbol><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.2.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.2">𝒟</ci><apply id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3"><minus id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3.1.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3.1"></minus><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3.2.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3.2">𝑘</ci><cn id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3.3.cmml" type="integer" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.2.3.3">1</cn></apply></apply><apply id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3"><times id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.1.cmml" 
xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.1"></times><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.2.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.2">𝑠</ci><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.3.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.3">𝑖</ci><ci id="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.4.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.3.4">𝑚</ci></apply></apply></apply></apply></apply><apply id="S3.E3.m1.1.1.1.1.1.1.1.3.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S3.E3.m1.1.1.1.1.1.1.1.3.1.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.3">subscript</csymbol><ci id="S3.E3.m1.1.1.1.1.1.1.1.3.2.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.3.2">𝑊</ci><ci id="S3.E3.m1.1.1.1.1.1.1.1.3.3.cmml" xref="S3.E3.m1.1.1.1.1.1.1.1.3.3">𝛽</ci></apply></apply><interval closure="open" id="S3.E3.m1.1.1.1.1.3.3.3.3.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2"><apply id="S3.E3.m1.1.1.1.1.2.2.2.1.1.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1"><times id="S3.E3.m1.1.1.1.1.2.2.2.1.1.2.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.2"></times><apply id="S3.E3.m1.1.1.1.1.2.2.2.1.1.3.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.3"><ci id="S3.E3.m1.1.1.1.1.2.2.2.1.1.3.1.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.3.1">^</ci><ci id="S3.E3.m1.1.1.1.1.2.2.2.1.1.3.2.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.3.2">𝑝</ci></apply><apply id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1"><plus id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.1.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.1"></plus><apply id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.1.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2">superscript</csymbol><apply id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.1.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2">subscript</csymbol><ci 
id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.2.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.2">𝒟</ci><apply id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3"><times id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.1.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.1"></times><ci id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.2.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.2">𝑠</ci><ci id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.3.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.3">𝑖</ci><ci id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.4.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.2.3.4">𝑚</ci></apply></apply><ci id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.3.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.2.3">𝑘</ci></apply><apply id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.3.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.3.1.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.3">subscript</csymbol><ci id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.3.2.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.3.2">𝒟</ci><ci id="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.3.3.cmml" xref="S3.E3.m1.1.1.1.1.2.2.2.1.1.1.1.1.3.3">𝑡</ci></apply></apply></apply><apply id="S3.E3.m1.1.1.1.1.3.3.3.2.2.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2"><times id="S3.E3.m1.1.1.1.1.3.3.3.2.2.2.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.2"></times><apply id="S3.E3.m1.1.1.1.1.3.3.3.2.2.3.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.3"><ci id="S3.E3.m1.1.1.1.1.3.3.3.2.2.3.1.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.3.1">^</ci><ci id="S3.E3.m1.1.1.1.1.3.3.3.2.2.3.2.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.3.2">𝑝</ci></apply><apply id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1"><csymbol cd="ambiguous" id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.1.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1">superscript</csymbol><apply id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1"><csymbol cd="ambiguous" 
id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.1.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1">subscript</csymbol><ci id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.2.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.2">𝒟</ci><apply id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3"><times id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.1.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.1"></times><ci id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.2.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.2">𝑠</ci><ci id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.3.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.3">𝑖</ci><ci id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.4.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.2.3.4">𝑚</ci></apply></apply><ci id="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.3.cmml" xref="S3.E3.m1.1.1.1.1.3.3.3.2.2.1.1.1.3">𝑘</ci></apply></apply></interval></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E3.m1.1c">\medsize\mathcal{L}_{sr}=-\text{KL}\left(\hat{p}(\mathcal{D}_{real}^{k})% \parallel\hat{p}(\mathcal{D}^{k-1}_{sim})\right)\cdot W_{\beta}\left(\hat{p}(% \mathcal{D}_{sim}^{k}+\mathcal{D}_{t}),\hat{p}(\mathcal{D}_{sim}^{k})\right)\enspace.</annotation><annotation encoding="application/x-llamapun" id="S3.E3.m1.1d">caligraphic_L start_POSTSUBSCRIPT italic_s italic_r end_POSTSUBSCRIPT = - KL ( over^ start_ARG italic_p end_ARG ( caligraphic_D start_POSTSUBSCRIPT italic_r italic_e italic_a italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ) ∥ over^ start_ARG italic_p end_ARG ( caligraphic_D start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_s italic_i italic_m end_POSTSUBSCRIPT ) ) ⋅ italic_W start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT ( over^ start_ARG italic_p end_ARG ( caligraphic_D start_POSTSUBSCRIPT italic_s italic_i italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT + caligraphic_D start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT 
) , over^ start_ARG italic_p end_ARG ( caligraphic_D start_POSTSUBSCRIPT italic_s italic_i italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ) ) .</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(3)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S3.SS3.p3.4">The notation is defined as follows:</p> <ul class="ltx_itemize" id="S3.I1"> <li class="ltx_item" id="S3.I1.i1" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I1.i1.p1"> <p class="ltx_p" id="S3.I1.i1.p1.5"><math alttext="\mathcal{D}_{real}^{k}" class="ltx_Math" display="inline" id="S3.I1.i1.p1.1.m1.1"><semantics id="S3.I1.i1.p1.1.m1.1a"><msubsup id="S3.I1.i1.p1.1.m1.1.1" xref="S3.I1.i1.p1.1.m1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.I1.i1.p1.1.m1.1.1.2.2" xref="S3.I1.i1.p1.1.m1.1.1.2.2.cmml">𝒟</mi><mrow id="S3.I1.i1.p1.1.m1.1.1.2.3" xref="S3.I1.i1.p1.1.m1.1.1.2.3.cmml"><mi id="S3.I1.i1.p1.1.m1.1.1.2.3.2" xref="S3.I1.i1.p1.1.m1.1.1.2.3.2.cmml">r</mi><mo id="S3.I1.i1.p1.1.m1.1.1.2.3.1" xref="S3.I1.i1.p1.1.m1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.I1.i1.p1.1.m1.1.1.2.3.3" xref="S3.I1.i1.p1.1.m1.1.1.2.3.3.cmml">e</mi><mo id="S3.I1.i1.p1.1.m1.1.1.2.3.1a" xref="S3.I1.i1.p1.1.m1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.I1.i1.p1.1.m1.1.1.2.3.4" xref="S3.I1.i1.p1.1.m1.1.1.2.3.4.cmml">a</mi><mo id="S3.I1.i1.p1.1.m1.1.1.2.3.1b" xref="S3.I1.i1.p1.1.m1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.I1.i1.p1.1.m1.1.1.2.3.5" xref="S3.I1.i1.p1.1.m1.1.1.2.3.5.cmml">l</mi></mrow><mi id="S3.I1.i1.p1.1.m1.1.1.3" xref="S3.I1.i1.p1.1.m1.1.1.3.cmml">k</mi></msubsup><annotation-xml encoding="MathML-Content" id="S3.I1.i1.p1.1.m1.1b"><apply id="S3.I1.i1.p1.1.m1.1.1.cmml" 
xref="S3.I1.i1.p1.1.m1.1.1">superscript</csymbol><apply id="S3.I1.i1.p1.1.m1.1.1.2.cmml" xref="S3.I1.i1.p1.1.m1.1.1"><csymbol cd="ambiguous" id="S3.I1.i1.p1.1.m1.1.1.2.1.cmml" xref="S3.I1.i1.p1.1.m1.1.1">subscript</csymbol><ci id="S3.I1.i1.p1.1.m1.1.1.2.2.cmml" xref="S3.I1.i1.p1.1.m1.1.1.2.2">𝒟</ci><apply id="S3.I1.i1.p1.1.m1.1.1.2.3.cmml" xref="S3.I1.i1.p1.1.m1.1.1.2.3"><times id="S3.I1.i1.p1.1.m1.1.1.2.3.1.cmml" xref="S3.I1.i1.p1.1.m1.1.1.2.3.1"></times><ci id="S3.I1.i1.p1.1.m1.1.1.2.3.2.cmml" xref="S3.I1.i1.p1.1.m1.1.1.2.3.2">𝑟</ci><ci id="S3.I1.i1.p1.1.m1.1.1.2.3.3.cmml" xref="S3.I1.i1.p1.1.m1.1.1.2.3.3">𝑒</ci><ci id="S3.I1.i1.p1.1.m1.1.1.2.3.4.cmml" xref="S3.I1.i1.p1.1.m1.1.1.2.3.4">𝑎</ci><ci id="S3.I1.i1.p1.1.m1.1.1.2.3.5.cmml" xref="S3.I1.i1.p1.1.m1.1.1.2.3.5">𝑙</ci></apply></apply><ci id="S3.I1.i1.p1.1.m1.1.1.3.cmml" xref="S3.I1.i1.p1.1.m1.1.1.3">𝑘</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.I1.i1.p1.1.m1.1c">\mathcal{D}_{real}^{k}</annotation><annotation encoding="application/x-llamapun" id="S3.I1.i1.p1.1.m1.1d">caligraphic_D start_POSTSUBSCRIPT italic_r italic_e italic_a italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT</annotation></semantics></math>: the data set of length <math alttext="M" class="ltx_Math" display="inline" id="S3.I1.i1.p1.2.m2.1"><semantics id="S3.I1.i1.p1.2.m2.1a"><mi id="S3.I1.i1.p1.2.m2.1.1" xref="S3.I1.i1.p1.2.m2.1.1.cmml">M</mi><annotation-xml encoding="MathML-Content" id="S3.I1.i1.p1.2.m2.1b"><ci id="S3.I1.i1.p1.2.m2.1.1.cmml" xref="S3.I1.i1.p1.2.m2.1.1">𝑀</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.I1.i1.p1.2.m2.1c">M</annotation><annotation encoding="application/x-llamapun" id="S3.I1.i1.p1.2.m2.1d">italic_M</annotation></semantics></math> <math alttext="\{s_{m},a_{m}=\pi_{k-1}(s_{m}),s_{m+1}^{real}\}_{0}^{M}" class="ltx_math_unparsed" display="inline" id="S3.I1.i1.p1.3.m3.1"><semantics id="S3.I1.i1.p1.3.m3.1a"><msubsup 
id="S3.I1.i1.p1.3.m3.1.1"><mrow id="S3.I1.i1.p1.3.m3.1.1.2.2"><mo id="S3.I1.i1.p1.3.m3.1.1.2.2.1" stretchy="false">{</mo><msub id="S3.I1.i1.p1.3.m3.1.1.2.2.2"><mi id="S3.I1.i1.p1.3.m3.1.1.2.2.2.2">s</mi><mi id="S3.I1.i1.p1.3.m3.1.1.2.2.2.3">m</mi></msub><mo id="S3.I1.i1.p1.3.m3.1.1.2.2.3">,</mo><msub id="S3.I1.i1.p1.3.m3.1.1.2.2.4"><mi id="S3.I1.i1.p1.3.m3.1.1.2.2.4.2">a</mi><mi id="S3.I1.i1.p1.3.m3.1.1.2.2.4.3">m</mi></msub><mo id="S3.I1.i1.p1.3.m3.1.1.2.2.5">=</mo><msub id="S3.I1.i1.p1.3.m3.1.1.2.2.6"><mi id="S3.I1.i1.p1.3.m3.1.1.2.2.6.2">π</mi><mrow id="S3.I1.i1.p1.3.m3.1.1.2.2.6.3"><mi id="S3.I1.i1.p1.3.m3.1.1.2.2.6.3.2">k</mi><mo id="S3.I1.i1.p1.3.m3.1.1.2.2.6.3.1">−</mo><mn id="S3.I1.i1.p1.3.m3.1.1.2.2.6.3.3">1</mn></mrow></msub><mrow id="S3.I1.i1.p1.3.m3.1.1.2.2.7"><mo id="S3.I1.i1.p1.3.m3.1.1.2.2.7.1" stretchy="false">(</mo><msub id="S3.I1.i1.p1.3.m3.1.1.2.2.7.2"><mi id="S3.I1.i1.p1.3.m3.1.1.2.2.7.2.2">s</mi><mi id="S3.I1.i1.p1.3.m3.1.1.2.2.7.2.3">m</mi></msub><mo id="S3.I1.i1.p1.3.m3.1.1.2.2.7.3" stretchy="false">)</mo></mrow><mo id="S3.I1.i1.p1.3.m3.1.1.2.2.8">,</mo><msubsup id="S3.I1.i1.p1.3.m3.1.1.2.2.9"><mi id="S3.I1.i1.p1.3.m3.1.1.2.2.9.2.2">s</mi><mrow id="S3.I1.i1.p1.3.m3.1.1.2.2.9.2.3"><mi id="S3.I1.i1.p1.3.m3.1.1.2.2.9.2.3.2">m</mi><mo id="S3.I1.i1.p1.3.m3.1.1.2.2.9.2.3.1">+</mo><mn id="S3.I1.i1.p1.3.m3.1.1.2.2.9.2.3.3">1</mn></mrow><mrow id="S3.I1.i1.p1.3.m3.1.1.2.2.9.3"><mi id="S3.I1.i1.p1.3.m3.1.1.2.2.9.3.2">r</mi><mo id="S3.I1.i1.p1.3.m3.1.1.2.2.9.3.1">⁢</mo><mi id="S3.I1.i1.p1.3.m3.1.1.2.2.9.3.3">e</mi><mo id="S3.I1.i1.p1.3.m3.1.1.2.2.9.3.1a">⁢</mo><mi id="S3.I1.i1.p1.3.m3.1.1.2.2.9.3.4">a</mi><mo id="S3.I1.i1.p1.3.m3.1.1.2.2.9.3.1b">⁢</mo><mi id="S3.I1.i1.p1.3.m3.1.1.2.2.9.3.5">l</mi></mrow></msubsup><mo id="S3.I1.i1.p1.3.m3.1.1.2.2.10" stretchy="false">}</mo></mrow><mn id="S3.I1.i1.p1.3.m3.1.1.2.3">0</mn><mi id="S3.I1.i1.p1.3.m3.1.1.3">M</mi></msubsup><annotation encoding="application/x-tex" 
id="S3.I1.i1.p1.3.m3.1b">\{s_{m},a_{m}=\pi_{k-1}(s_{m}),s_{m+1}^{real}\}_{0}^{M}</annotation><annotation encoding="application/x-llamapun" id="S3.I1.i1.p1.3.m3.1c">{ italic_s start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT = italic_π start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ( italic_s start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) , italic_s start_POSTSUBSCRIPT italic_m + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r italic_e italic_a italic_l end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT</annotation></semantics></math> collected from the actual robot with the policy <math alttext="\pi_{k-1}" class="ltx_Math" display="inline" id="S3.I1.i1.p1.4.m4.1"><semantics id="S3.I1.i1.p1.4.m4.1a"><msub id="S3.I1.i1.p1.4.m4.1.1" xref="S3.I1.i1.p1.4.m4.1.1.cmml"><mi id="S3.I1.i1.p1.4.m4.1.1.2" xref="S3.I1.i1.p1.4.m4.1.1.2.cmml">π</mi><mrow id="S3.I1.i1.p1.4.m4.1.1.3" xref="S3.I1.i1.p1.4.m4.1.1.3.cmml"><mi id="S3.I1.i1.p1.4.m4.1.1.3.2" xref="S3.I1.i1.p1.4.m4.1.1.3.2.cmml">k</mi><mo id="S3.I1.i1.p1.4.m4.1.1.3.1" xref="S3.I1.i1.p1.4.m4.1.1.3.1.cmml">−</mo><mn id="S3.I1.i1.p1.4.m4.1.1.3.3" xref="S3.I1.i1.p1.4.m4.1.1.3.3.cmml">1</mn></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.I1.i1.p1.4.m4.1b"><apply id="S3.I1.i1.p1.4.m4.1.1.cmml" xref="S3.I1.i1.p1.4.m4.1.1"><csymbol cd="ambiguous" id="S3.I1.i1.p1.4.m4.1.1.1.cmml" xref="S3.I1.i1.p1.4.m4.1.1">subscript</csymbol><ci id="S3.I1.i1.p1.4.m4.1.1.2.cmml" xref="S3.I1.i1.p1.4.m4.1.1.2">𝜋</ci><apply id="S3.I1.i1.p1.4.m4.1.1.3.cmml" xref="S3.I1.i1.p1.4.m4.1.1.3"><minus id="S3.I1.i1.p1.4.m4.1.1.3.1.cmml" xref="S3.I1.i1.p1.4.m4.1.1.3.1"></minus><ci id="S3.I1.i1.p1.4.m4.1.1.3.2.cmml" xref="S3.I1.i1.p1.4.m4.1.1.3.2">𝑘</ci><cn id="S3.I1.i1.p1.4.m4.1.1.3.3.cmml" type="integer" xref="S3.I1.i1.p1.4.m4.1.1.3.3">1</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" 
id="S3.I1.i1.p1.4.m4.1c">\pi_{k-1}</annotation><annotation encoding="application/x-llamapun" id="S3.I1.i1.p1.4.m4.1d">italic_π start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT</annotation></semantics></math> trained in the previous iteration <math alttext="k-1" class="ltx_Math" display="inline" id="S3.I1.i1.p1.5.m5.1"><semantics id="S3.I1.i1.p1.5.m5.1a"><mrow id="S3.I1.i1.p1.5.m5.1.1" xref="S3.I1.i1.p1.5.m5.1.1.cmml"><mi id="S3.I1.i1.p1.5.m5.1.1.2" xref="S3.I1.i1.p1.5.m5.1.1.2.cmml">k</mi><mo id="S3.I1.i1.p1.5.m5.1.1.1" xref="S3.I1.i1.p1.5.m5.1.1.1.cmml">−</mo><mn id="S3.I1.i1.p1.5.m5.1.1.3" xref="S3.I1.i1.p1.5.m5.1.1.3.cmml">1</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.I1.i1.p1.5.m5.1b"><apply id="S3.I1.i1.p1.5.m5.1.1.cmml" xref="S3.I1.i1.p1.5.m5.1.1"><minus id="S3.I1.i1.p1.5.m5.1.1.1.cmml" xref="S3.I1.i1.p1.5.m5.1.1.1"></minus><ci id="S3.I1.i1.p1.5.m5.1.1.2.cmml" xref="S3.I1.i1.p1.5.m5.1.1.2">𝑘</ci><cn id="S3.I1.i1.p1.5.m5.1.1.3.cmml" type="integer" xref="S3.I1.i1.p1.5.m5.1.1.3">1</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.I1.i1.p1.5.m5.1c">k-1</annotation><annotation encoding="application/x-llamapun" id="S3.I1.i1.p1.5.m5.1d">italic_k - 1</annotation></semantics></math>.</p> </div> </li> <li class="ltx_item" id="S3.I1.i2" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I1.i2.p1"> <p class="ltx_p" id="S3.I1.i2.p1.2"><math alttext="\mathcal{D}_{sim}^{k-1}" class="ltx_Math" display="inline" id="S3.I1.i2.p1.1.m1.1"><semantics id="S3.I1.i2.p1.1.m1.1a"><msubsup id="S3.I1.i2.p1.1.m1.1.1" xref="S3.I1.i2.p1.1.m1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.I1.i2.p1.1.m1.1.1.2.2" xref="S3.I1.i2.p1.1.m1.1.1.2.2.cmml">𝒟</mi><mrow id="S3.I1.i2.p1.1.m1.1.1.2.3" xref="S3.I1.i2.p1.1.m1.1.1.2.3.cmml"><mi id="S3.I1.i2.p1.1.m1.1.1.2.3.2" xref="S3.I1.i2.p1.1.m1.1.1.2.3.2.cmml">s</mi><mo id="S3.I1.i2.p1.1.m1.1.1.2.3.1" xref="S3.I1.i2.p1.1.m1.1.1.2.3.1.cmml">⁢</mo><mi 
id="S3.I1.i2.p1.1.m1.1.1.2.3.3" xref="S3.I1.i2.p1.1.m1.1.1.2.3.3.cmml">i</mi><mo id="S3.I1.i2.p1.1.m1.1.1.2.3.1a" xref="S3.I1.i2.p1.1.m1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.I1.i2.p1.1.m1.1.1.2.3.4" xref="S3.I1.i2.p1.1.m1.1.1.2.3.4.cmml">m</mi></mrow><mrow id="S3.I1.i2.p1.1.m1.1.1.3" xref="S3.I1.i2.p1.1.m1.1.1.3.cmml"><mi id="S3.I1.i2.p1.1.m1.1.1.3.2" xref="S3.I1.i2.p1.1.m1.1.1.3.2.cmml">k</mi><mo id="S3.I1.i2.p1.1.m1.1.1.3.1" xref="S3.I1.i2.p1.1.m1.1.1.3.1.cmml">−</mo><mn id="S3.I1.i2.p1.1.m1.1.1.3.3" xref="S3.I1.i2.p1.1.m1.1.1.3.3.cmml">1</mn></mrow></msubsup><annotation-xml encoding="MathML-Content" id="S3.I1.i2.p1.1.m1.1b"><apply id="S3.I1.i2.p1.1.m1.1.1.cmml" xref="S3.I1.i2.p1.1.m1.1.1"><csymbol cd="ambiguous" id="S3.I1.i2.p1.1.m1.1.1.1.cmml" xref="S3.I1.i2.p1.1.m1.1.1">superscript</csymbol><apply id="S3.I1.i2.p1.1.m1.1.1.2.cmml" xref="S3.I1.i2.p1.1.m1.1.1"><csymbol cd="ambiguous" id="S3.I1.i2.p1.1.m1.1.1.2.1.cmml" xref="S3.I1.i2.p1.1.m1.1.1">subscript</csymbol><ci id="S3.I1.i2.p1.1.m1.1.1.2.2.cmml" xref="S3.I1.i2.p1.1.m1.1.1.2.2">𝒟</ci><apply id="S3.I1.i2.p1.1.m1.1.1.2.3.cmml" xref="S3.I1.i2.p1.1.m1.1.1.2.3"><times id="S3.I1.i2.p1.1.m1.1.1.2.3.1.cmml" xref="S3.I1.i2.p1.1.m1.1.1.2.3.1"></times><ci id="S3.I1.i2.p1.1.m1.1.1.2.3.2.cmml" xref="S3.I1.i2.p1.1.m1.1.1.2.3.2">𝑠</ci><ci id="S3.I1.i2.p1.1.m1.1.1.2.3.3.cmml" xref="S3.I1.i2.p1.1.m1.1.1.2.3.3">𝑖</ci><ci id="S3.I1.i2.p1.1.m1.1.1.2.3.4.cmml" xref="S3.I1.i2.p1.1.m1.1.1.2.3.4">𝑚</ci></apply></apply><apply id="S3.I1.i2.p1.1.m1.1.1.3.cmml" xref="S3.I1.i2.p1.1.m1.1.1.3"><minus id="S3.I1.i2.p1.1.m1.1.1.3.1.cmml" xref="S3.I1.i2.p1.1.m1.1.1.3.1"></minus><ci id="S3.I1.i2.p1.1.m1.1.1.3.2.cmml" xref="S3.I1.i2.p1.1.m1.1.1.3.2">𝑘</ci><cn id="S3.I1.i2.p1.1.m1.1.1.3.3.cmml" type="integer" xref="S3.I1.i2.p1.1.m1.1.1.3.3">1</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.I1.i2.p1.1.m1.1c">\mathcal{D}_{sim}^{k-1}</annotation><annotation encoding="application/x-llamapun" 
id="S3.I1.i2.p1.1.m1.1d">caligraphic_D start_POSTSUBSCRIPT italic_s italic_i italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT</annotation></semantics></math>: the data set generated from the same inputs but propagated through the untuned simulator <math alttext="\{s_{m},a_{m},s_{m+1}^{sim}\}_{0}^{M}" class="ltx_Math" display="inline" id="S3.I1.i2.p1.2.m2.3"><semantics id="S3.I1.i2.p1.2.m2.3a"><msubsup id="S3.I1.i2.p1.2.m2.3.3" xref="S3.I1.i2.p1.2.m2.3.3.cmml"><mrow id="S3.I1.i2.p1.2.m2.3.3.3.3.3" xref="S3.I1.i2.p1.2.m2.3.3.3.3.4.cmml"><mo id="S3.I1.i2.p1.2.m2.3.3.3.3.3.4" stretchy="false" xref="S3.I1.i2.p1.2.m2.3.3.3.3.4.cmml">{</mo><msub id="S3.I1.i2.p1.2.m2.1.1.1.1.1.1" xref="S3.I1.i2.p1.2.m2.1.1.1.1.1.1.cmml"><mi id="S3.I1.i2.p1.2.m2.1.1.1.1.1.1.2" xref="S3.I1.i2.p1.2.m2.1.1.1.1.1.1.2.cmml">s</mi><mi id="S3.I1.i2.p1.2.m2.1.1.1.1.1.1.3" xref="S3.I1.i2.p1.2.m2.1.1.1.1.1.1.3.cmml">m</mi></msub><mo id="S3.I1.i2.p1.2.m2.3.3.3.3.3.5" xref="S3.I1.i2.p1.2.m2.3.3.3.3.4.cmml">,</mo><msub id="S3.I1.i2.p1.2.m2.2.2.2.2.2.2" xref="S3.I1.i2.p1.2.m2.2.2.2.2.2.2.cmml"><mi id="S3.I1.i2.p1.2.m2.2.2.2.2.2.2.2" xref="S3.I1.i2.p1.2.m2.2.2.2.2.2.2.2.cmml">a</mi><mi id="S3.I1.i2.p1.2.m2.2.2.2.2.2.2.3" xref="S3.I1.i2.p1.2.m2.2.2.2.2.2.2.3.cmml">m</mi></msub><mo id="S3.I1.i2.p1.2.m2.3.3.3.3.3.6" xref="S3.I1.i2.p1.2.m2.3.3.3.3.4.cmml">,</mo><msubsup id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.cmml"><mi id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.2" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.2.cmml">s</mi><mrow id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3.cmml"><mi id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3.2" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3.2.cmml">m</mi><mo id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3.1" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3.1.cmml">+</mo><mn id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3.3" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3.3.cmml">1</mn></mrow><mrow id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3" 
xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.cmml"><mi id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.2" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.2.cmml">s</mi><mo id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.1" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.1.cmml">⁢</mo><mi id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.3" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.3.cmml">i</mi><mo id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.1a" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.1.cmml">⁢</mo><mi id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.4" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.4.cmml">m</mi></mrow></msubsup><mo id="S3.I1.i2.p1.2.m2.3.3.3.3.3.7" stretchy="false" xref="S3.I1.i2.p1.2.m2.3.3.3.3.4.cmml">}</mo></mrow><mn id="S3.I1.i2.p1.2.m2.3.3.3.5" xref="S3.I1.i2.p1.2.m2.3.3.3.5.cmml">0</mn><mi id="S3.I1.i2.p1.2.m2.3.3.5" xref="S3.I1.i2.p1.2.m2.3.3.5.cmml">M</mi></msubsup><annotation-xml encoding="MathML-Content" id="S3.I1.i2.p1.2.m2.3b"><apply id="S3.I1.i2.p1.2.m2.3.3.cmml" xref="S3.I1.i2.p1.2.m2.3.3"><csymbol cd="ambiguous" id="S3.I1.i2.p1.2.m2.3.3.4.cmml" xref="S3.I1.i2.p1.2.m2.3.3">superscript</csymbol><apply id="S3.I1.i2.p1.2.m2.3.3.3.cmml" xref="S3.I1.i2.p1.2.m2.3.3"><csymbol cd="ambiguous" id="S3.I1.i2.p1.2.m2.3.3.3.4.cmml" xref="S3.I1.i2.p1.2.m2.3.3">subscript</csymbol><set id="S3.I1.i2.p1.2.m2.3.3.3.3.4.cmml" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3"><apply id="S3.I1.i2.p1.2.m2.1.1.1.1.1.1.cmml" xref="S3.I1.i2.p1.2.m2.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S3.I1.i2.p1.2.m2.1.1.1.1.1.1.1.cmml" xref="S3.I1.i2.p1.2.m2.1.1.1.1.1.1">subscript</csymbol><ci id="S3.I1.i2.p1.2.m2.1.1.1.1.1.1.2.cmml" xref="S3.I1.i2.p1.2.m2.1.1.1.1.1.1.2">𝑠</ci><ci id="S3.I1.i2.p1.2.m2.1.1.1.1.1.1.3.cmml" xref="S3.I1.i2.p1.2.m2.1.1.1.1.1.1.3">𝑚</ci></apply><apply id="S3.I1.i2.p1.2.m2.2.2.2.2.2.2.cmml" xref="S3.I1.i2.p1.2.m2.2.2.2.2.2.2"><csymbol cd="ambiguous" id="S3.I1.i2.p1.2.m2.2.2.2.2.2.2.1.cmml" xref="S3.I1.i2.p1.2.m2.2.2.2.2.2.2">subscript</csymbol><ci id="S3.I1.i2.p1.2.m2.2.2.2.2.2.2.2.cmml" xref="S3.I1.i2.p1.2.m2.2.2.2.2.2.2.2">𝑎</ci><ci 
id="S3.I1.i2.p1.2.m2.2.2.2.2.2.2.3.cmml" xref="S3.I1.i2.p1.2.m2.2.2.2.2.2.2.3">𝑚</ci></apply><apply id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.cmml" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3"><csymbol cd="ambiguous" id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.1.cmml" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3">superscript</csymbol><apply id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.cmml" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3"><csymbol cd="ambiguous" id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.1.cmml" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3">subscript</csymbol><ci id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.2.cmml" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.2">𝑠</ci><apply id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3.cmml" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3"><plus id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3.1.cmml" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3.1"></plus><ci id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3.2.cmml" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3.2">𝑚</ci><cn id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3.3.cmml" type="integer" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.2.3.3">1</cn></apply></apply><apply id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.cmml" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3"><times id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.1.cmml" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.1"></times><ci id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.2.cmml" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.2">𝑠</ci><ci id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.3.cmml" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.3">𝑖</ci><ci id="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.4.cmml" xref="S3.I1.i2.p1.2.m2.3.3.3.3.3.3.3.4">𝑚</ci></apply></apply></set><cn id="S3.I1.i2.p1.2.m2.3.3.3.5.cmml" type="integer" xref="S3.I1.i2.p1.2.m2.3.3.3.5">0</cn></apply><ci id="S3.I1.i2.p1.2.m2.3.3.5.cmml" xref="S3.I1.i2.p1.2.m2.3.3.5">𝑀</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.I1.i2.p1.2.m2.3c">\{s_{m},a_{m},s_{m+1}^{sim}\}_{0}^{M}</annotation><annotation encoding="application/x-llamapun" id="S3.I1.i2.p1.2.m2.3d">{ italic_s start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , 
italic_s start_POSTSUBSCRIPT italic_m + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_s italic_i italic_m end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT</annotation></semantics></math>.</p> </div> </li> <li class="ltx_item" id="S3.I1.i3" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I1.i3.p1"> <p class="ltx_p" id="S3.I1.i3.p1.2"><math alttext="\mathcal{D}_{sim}^{k}" class="ltx_Math" display="inline" id="S3.I1.i3.p1.1.m1.1"><semantics id="S3.I1.i3.p1.1.m1.1a"><msubsup id="S3.I1.i3.p1.1.m1.1.1" xref="S3.I1.i3.p1.1.m1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.I1.i3.p1.1.m1.1.1.2.2" xref="S3.I1.i3.p1.1.m1.1.1.2.2.cmml">𝒟</mi><mrow id="S3.I1.i3.p1.1.m1.1.1.2.3" xref="S3.I1.i3.p1.1.m1.1.1.2.3.cmml"><mi id="S3.I1.i3.p1.1.m1.1.1.2.3.2" xref="S3.I1.i3.p1.1.m1.1.1.2.3.2.cmml">s</mi><mo id="S3.I1.i3.p1.1.m1.1.1.2.3.1" xref="S3.I1.i3.p1.1.m1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.I1.i3.p1.1.m1.1.1.2.3.3" xref="S3.I1.i3.p1.1.m1.1.1.2.3.3.cmml">i</mi><mo id="S3.I1.i3.p1.1.m1.1.1.2.3.1a" xref="S3.I1.i3.p1.1.m1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.I1.i3.p1.1.m1.1.1.2.3.4" xref="S3.I1.i3.p1.1.m1.1.1.2.3.4.cmml">m</mi></mrow><mi id="S3.I1.i3.p1.1.m1.1.1.3" xref="S3.I1.i3.p1.1.m1.1.1.3.cmml">k</mi></msubsup><annotation-xml encoding="MathML-Content" id="S3.I1.i3.p1.1.m1.1b"><apply id="S3.I1.i3.p1.1.m1.1.1.cmml" xref="S3.I1.i3.p1.1.m1.1.1"><csymbol cd="ambiguous" id="S3.I1.i3.p1.1.m1.1.1.1.cmml" xref="S3.I1.i3.p1.1.m1.1.1">superscript</csymbol><apply id="S3.I1.i3.p1.1.m1.1.1.2.cmml" xref="S3.I1.i3.p1.1.m1.1.1"><csymbol cd="ambiguous" id="S3.I1.i3.p1.1.m1.1.1.2.1.cmml" xref="S3.I1.i3.p1.1.m1.1.1">subscript</csymbol><ci id="S3.I1.i3.p1.1.m1.1.1.2.2.cmml" xref="S3.I1.i3.p1.1.m1.1.1.2.2">𝒟</ci><apply id="S3.I1.i3.p1.1.m1.1.1.2.3.cmml" xref="S3.I1.i3.p1.1.m1.1.1.2.3"><times id="S3.I1.i3.p1.1.m1.1.1.2.3.1.cmml" xref="S3.I1.i3.p1.1.m1.1.1.2.3.1"></times><ci 
id="S3.I1.i3.p1.1.m1.1.1.2.3.2.cmml" xref="S3.I1.i3.p1.1.m1.1.1.2.3.2">𝑠</ci><ci id="S3.I1.i3.p1.1.m1.1.1.2.3.3.cmml" xref="S3.I1.i3.p1.1.m1.1.1.2.3.3">𝑖</ci><ci id="S3.I1.i3.p1.1.m1.1.1.2.3.4.cmml" xref="S3.I1.i3.p1.1.m1.1.1.2.3.4">𝑚</ci></apply></apply><ci id="S3.I1.i3.p1.1.m1.1.1.3.cmml" xref="S3.I1.i3.p1.1.m1.1.1.3">𝑘</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.I1.i3.p1.1.m1.1c">\mathcal{D}_{sim}^{k}</annotation><annotation encoding="application/x-llamapun" id="S3.I1.i3.p1.1.m1.1d">caligraphic_D start_POSTSUBSCRIPT italic_s italic_i italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT</annotation></semantics></math>: the data set propagated through the newly tuned simulator <math alttext="\{s_{m},a_{m},\hat{s}_{m+1}^{sim}\}_{0}^{M}" class="ltx_Math" display="inline" id="S3.I1.i3.p1.2.m2.3"><semantics id="S3.I1.i3.p1.2.m2.3a"><msubsup id="S3.I1.i3.p1.2.m2.3.3" xref="S3.I1.i3.p1.2.m2.3.3.cmml"><mrow id="S3.I1.i3.p1.2.m2.3.3.3.3.3" xref="S3.I1.i3.p1.2.m2.3.3.3.3.4.cmml"><mo id="S3.I1.i3.p1.2.m2.3.3.3.3.3.4" stretchy="false" xref="S3.I1.i3.p1.2.m2.3.3.3.3.4.cmml">{</mo><msub id="S3.I1.i3.p1.2.m2.1.1.1.1.1.1" xref="S3.I1.i3.p1.2.m2.1.1.1.1.1.1.cmml"><mi id="S3.I1.i3.p1.2.m2.1.1.1.1.1.1.2" xref="S3.I1.i3.p1.2.m2.1.1.1.1.1.1.2.cmml">s</mi><mi id="S3.I1.i3.p1.2.m2.1.1.1.1.1.1.3" xref="S3.I1.i3.p1.2.m2.1.1.1.1.1.1.3.cmml">m</mi></msub><mo id="S3.I1.i3.p1.2.m2.3.3.3.3.3.5" xref="S3.I1.i3.p1.2.m2.3.3.3.3.4.cmml">,</mo><msub id="S3.I1.i3.p1.2.m2.2.2.2.2.2.2" xref="S3.I1.i3.p1.2.m2.2.2.2.2.2.2.cmml"><mi id="S3.I1.i3.p1.2.m2.2.2.2.2.2.2.2" xref="S3.I1.i3.p1.2.m2.2.2.2.2.2.2.2.cmml">a</mi><mi id="S3.I1.i3.p1.2.m2.2.2.2.2.2.2.3" xref="S3.I1.i3.p1.2.m2.2.2.2.2.2.2.3.cmml">m</mi></msub><mo id="S3.I1.i3.p1.2.m2.3.3.3.3.3.6" xref="S3.I1.i3.p1.2.m2.3.3.3.3.4.cmml">,</mo><msubsup id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.cmml"><mover accent="true" id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.2" 
xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.2.cmml"><mi id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.2.2" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.2.2.cmml">s</mi><mo id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.2.1" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.2.1.cmml">^</mo></mover><mrow id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3.cmml"><mi id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3.2" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3.2.cmml">m</mi><mo id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3.1" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3.1.cmml">+</mo><mn id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3.3" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3.3.cmml">1</mn></mrow><mrow id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.cmml"><mi id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.2" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.2.cmml">s</mi><mo id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.1" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.1.cmml">⁢</mo><mi id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.3" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.3.cmml">i</mi><mo id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.1a" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.1.cmml">⁢</mo><mi id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.4" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.4.cmml">m</mi></mrow></msubsup><mo id="S3.I1.i3.p1.2.m2.3.3.3.3.3.7" stretchy="false" xref="S3.I1.i3.p1.2.m2.3.3.3.3.4.cmml">}</mo></mrow><mn id="S3.I1.i3.p1.2.m2.3.3.3.5" xref="S3.I1.i3.p1.2.m2.3.3.3.5.cmml">0</mn><mi id="S3.I1.i3.p1.2.m2.3.3.5" xref="S3.I1.i3.p1.2.m2.3.3.5.cmml">M</mi></msubsup><annotation-xml encoding="MathML-Content" id="S3.I1.i3.p1.2.m2.3b"><apply id="S3.I1.i3.p1.2.m2.3.3.cmml" xref="S3.I1.i3.p1.2.m2.3.3"><csymbol cd="ambiguous" id="S3.I1.i3.p1.2.m2.3.3.4.cmml" xref="S3.I1.i3.p1.2.m2.3.3">superscript</csymbol><apply id="S3.I1.i3.p1.2.m2.3.3.3.cmml" xref="S3.I1.i3.p1.2.m2.3.3"><csymbol cd="ambiguous" id="S3.I1.i3.p1.2.m2.3.3.3.4.cmml" xref="S3.I1.i3.p1.2.m2.3.3">subscript</csymbol><set id="S3.I1.i3.p1.2.m2.3.3.3.3.4.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3"><apply 
id="S3.I1.i3.p1.2.m2.1.1.1.1.1.1.cmml" xref="S3.I1.i3.p1.2.m2.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S3.I1.i3.p1.2.m2.1.1.1.1.1.1.1.cmml" xref="S3.I1.i3.p1.2.m2.1.1.1.1.1.1">subscript</csymbol><ci id="S3.I1.i3.p1.2.m2.1.1.1.1.1.1.2.cmml" xref="S3.I1.i3.p1.2.m2.1.1.1.1.1.1.2">𝑠</ci><ci id="S3.I1.i3.p1.2.m2.1.1.1.1.1.1.3.cmml" xref="S3.I1.i3.p1.2.m2.1.1.1.1.1.1.3">𝑚</ci></apply><apply id="S3.I1.i3.p1.2.m2.2.2.2.2.2.2.cmml" xref="S3.I1.i3.p1.2.m2.2.2.2.2.2.2"><csymbol cd="ambiguous" id="S3.I1.i3.p1.2.m2.2.2.2.2.2.2.1.cmml" xref="S3.I1.i3.p1.2.m2.2.2.2.2.2.2">subscript</csymbol><ci id="S3.I1.i3.p1.2.m2.2.2.2.2.2.2.2.cmml" xref="S3.I1.i3.p1.2.m2.2.2.2.2.2.2.2">𝑎</ci><ci id="S3.I1.i3.p1.2.m2.2.2.2.2.2.2.3.cmml" xref="S3.I1.i3.p1.2.m2.2.2.2.2.2.2.3">𝑚</ci></apply><apply id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3"><csymbol cd="ambiguous" id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.1.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3">superscript</csymbol><apply id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3"><csymbol cd="ambiguous" id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.1.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3">subscript</csymbol><apply id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.2.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.2"><ci id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.2.1.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.2.1">^</ci><ci id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.2.2.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.2.2">𝑠</ci></apply><apply id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3"><plus id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3.1.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3.1"></plus><ci id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3.2.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3.2">𝑚</ci><cn id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3.3.cmml" type="integer" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.2.3.3">1</cn></apply></apply><apply id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3"><times 
id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.1.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.1"></times><ci id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.2.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.2">𝑠</ci><ci id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.3.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.3">𝑖</ci><ci id="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.4.cmml" xref="S3.I1.i3.p1.2.m2.3.3.3.3.3.3.3.4">𝑚</ci></apply></apply></set><cn id="S3.I1.i3.p1.2.m2.3.3.3.5.cmml" type="integer" xref="S3.I1.i3.p1.2.m2.3.3.3.5">0</cn></apply><ci id="S3.I1.i3.p1.2.m2.3.3.5.cmml" xref="S3.I1.i3.p1.2.m2.3.3.5">𝑀</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.I1.i3.p1.2.m2.3c">\{s_{m},a_{m},\hat{s}_{m+1}^{sim}\}_{0}^{M}</annotation><annotation encoding="application/x-llamapun" id="S3.I1.i3.p1.2.m2.3d">{ italic_s start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , over^ start_ARG italic_s end_ARG start_POSTSUBSCRIPT italic_m + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_s italic_i italic_m end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT</annotation></semantics></math>.</p> </div> </li> <li class="ltx_item" id="S3.I1.i4" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I1.i4.p1"> <p class="ltx_p" id="S3.I1.i4.p1.2"><math alttext="\mathcal{D}_{t}" class="ltx_Math" display="inline" id="S3.I1.i4.p1.1.m1.1"><semantics id="S3.I1.i4.p1.1.m1.1a"><msub id="S3.I1.i4.p1.1.m1.1.1" xref="S3.I1.i4.p1.1.m1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.I1.i4.p1.1.m1.1.1.2" xref="S3.I1.i4.p1.1.m1.1.1.2.cmml">𝒟</mi><mi id="S3.I1.i4.p1.1.m1.1.1.3" xref="S3.I1.i4.p1.1.m1.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S3.I1.i4.p1.1.m1.1b"><apply id="S3.I1.i4.p1.1.m1.1.1.cmml" xref="S3.I1.i4.p1.1.m1.1.1"><csymbol cd="ambiguous" id="S3.I1.i4.p1.1.m1.1.1.1.cmml" 
xref="S3.I1.i4.p1.1.m1.1.1">subscript</csymbol><ci id="S3.I1.i4.p1.1.m1.1.1.2.cmml" xref="S3.I1.i4.p1.1.m1.1.1.2">𝒟</ci><ci id="S3.I1.i4.p1.1.m1.1.1.3.cmml" xref="S3.I1.i4.p1.1.m1.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.I1.i4.p1.1.m1.1c">\mathcal{D}_{t}</annotation><annotation encoding="application/x-llamapun" id="S3.I1.i4.p1.1.m1.1d">caligraphic_D start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math>: the data pair generated by the current input <math alttext="a_{k,t}" class="ltx_Math" display="inline" id="S3.I1.i4.p1.2.m2.2"><semantics id="S3.I1.i4.p1.2.m2.2a"><msub id="S3.I1.i4.p1.2.m2.2.3" xref="S3.I1.i4.p1.2.m2.2.3.cmml"><mi id="S3.I1.i4.p1.2.m2.2.3.2" xref="S3.I1.i4.p1.2.m2.2.3.2.cmml">a</mi><mrow id="S3.I1.i4.p1.2.m2.2.2.2.4" xref="S3.I1.i4.p1.2.m2.2.2.2.3.cmml"><mi id="S3.I1.i4.p1.2.m2.1.1.1.1" xref="S3.I1.i4.p1.2.m2.1.1.1.1.cmml">k</mi><mo id="S3.I1.i4.p1.2.m2.2.2.2.4.1" xref="S3.I1.i4.p1.2.m2.2.2.2.3.cmml">,</mo><mi id="S3.I1.i4.p1.2.m2.2.2.2.2" xref="S3.I1.i4.p1.2.m2.2.2.2.2.cmml">t</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.I1.i4.p1.2.m2.2b"><apply id="S3.I1.i4.p1.2.m2.2.3.cmml" xref="S3.I1.i4.p1.2.m2.2.3"><csymbol cd="ambiguous" id="S3.I1.i4.p1.2.m2.2.3.1.cmml" xref="S3.I1.i4.p1.2.m2.2.3">subscript</csymbol><ci id="S3.I1.i4.p1.2.m2.2.3.2.cmml" xref="S3.I1.i4.p1.2.m2.2.3.2">𝑎</ci><list id="S3.I1.i4.p1.2.m2.2.2.2.3.cmml" xref="S3.I1.i4.p1.2.m2.2.2.2.4"><ci id="S3.I1.i4.p1.2.m2.1.1.1.1.cmml" xref="S3.I1.i4.p1.2.m2.1.1.1.1">𝑘</ci><ci id="S3.I1.i4.p1.2.m2.2.2.2.2.cmml" xref="S3.I1.i4.p1.2.m2.2.2.2.2">𝑡</ci></list></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.I1.i4.p1.2.m2.2c">a_{k,t}</annotation><annotation encoding="application/x-llamapun" id="S3.I1.i4.p1.2.m2.2d">italic_a start_POSTSUBSCRIPT italic_k , italic_t end_POSTSUBSCRIPT</annotation></semantics></math>.</p> </div> </li> <li class="ltx_item" id="S3.I1.i5" 
style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I1.i5.p1"> <p class="ltx_p" id="S3.I1.i5.p1.2"><math alttext="\hat{p}(\{\mathcal{D}\})" class="ltx_Math" display="inline" id="S3.I1.i5.p1.1.m1.2"><semantics id="S3.I1.i5.p1.1.m1.2a"><mrow id="S3.I1.i5.p1.1.m1.2.2" xref="S3.I1.i5.p1.1.m1.2.2.cmml"><mover accent="true" id="S3.I1.i5.p1.1.m1.2.2.3" xref="S3.I1.i5.p1.1.m1.2.2.3.cmml"><mi id="S3.I1.i5.p1.1.m1.2.2.3.2" xref="S3.I1.i5.p1.1.m1.2.2.3.2.cmml">p</mi><mo id="S3.I1.i5.p1.1.m1.2.2.3.1" xref="S3.I1.i5.p1.1.m1.2.2.3.1.cmml">^</mo></mover><mo id="S3.I1.i5.p1.1.m1.2.2.2" xref="S3.I1.i5.p1.1.m1.2.2.2.cmml">⁢</mo><mrow id="S3.I1.i5.p1.1.m1.2.2.1.1" xref="S3.I1.i5.p1.1.m1.2.2.cmml"><mo id="S3.I1.i5.p1.1.m1.2.2.1.1.2" stretchy="false" xref="S3.I1.i5.p1.1.m1.2.2.cmml">(</mo><mrow id="S3.I1.i5.p1.1.m1.2.2.1.1.1.2" xref="S3.I1.i5.p1.1.m1.2.2.1.1.1.1.cmml"><mo id="S3.I1.i5.p1.1.m1.2.2.1.1.1.2.1" stretchy="false" xref="S3.I1.i5.p1.1.m1.2.2.1.1.1.1.cmml">{</mo><mi class="ltx_font_mathcaligraphic" id="S3.I1.i5.p1.1.m1.1.1" xref="S3.I1.i5.p1.1.m1.1.1.cmml">𝒟</mi><mo id="S3.I1.i5.p1.1.m1.2.2.1.1.1.2.2" stretchy="false" xref="S3.I1.i5.p1.1.m1.2.2.1.1.1.1.cmml">}</mo></mrow><mo id="S3.I1.i5.p1.1.m1.2.2.1.1.3" stretchy="false" xref="S3.I1.i5.p1.1.m1.2.2.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.I1.i5.p1.1.m1.2b"><apply id="S3.I1.i5.p1.1.m1.2.2.cmml" xref="S3.I1.i5.p1.1.m1.2.2"><times id="S3.I1.i5.p1.1.m1.2.2.2.cmml" xref="S3.I1.i5.p1.1.m1.2.2.2"></times><apply id="S3.I1.i5.p1.1.m1.2.2.3.cmml" xref="S3.I1.i5.p1.1.m1.2.2.3"><ci id="S3.I1.i5.p1.1.m1.2.2.3.1.cmml" xref="S3.I1.i5.p1.1.m1.2.2.3.1">^</ci><ci id="S3.I1.i5.p1.1.m1.2.2.3.2.cmml" xref="S3.I1.i5.p1.1.m1.2.2.3.2">𝑝</ci></apply><set id="S3.I1.i5.p1.1.m1.2.2.1.1.1.1.cmml" xref="S3.I1.i5.p1.1.m1.2.2.1.1.1.2"><ci id="S3.I1.i5.p1.1.m1.1.1.cmml" xref="S3.I1.i5.p1.1.m1.1.1">𝒟</ci></set></apply></annotation-xml><annotation encoding="application/x-tex" 
id="S3.I1.i5.p1.1.m1.2c">\hat{p}(\{\mathcal{D}\})</annotation><annotation encoding="application/x-llamapun" id="S3.I1.i5.p1.1.m1.2d">over^ start_ARG italic_p end_ARG ( { caligraphic_D } )</annotation></semantics></math> is the function that estimates a distribution from the sample set <math alttext="\mathcal{D}" class="ltx_Math" display="inline" id="S3.I1.i5.p1.2.m2.1"><semantics id="S3.I1.i5.p1.2.m2.1a"><mi class="ltx_font_mathcaligraphic" id="S3.I1.i5.p1.2.m2.1.1" xref="S3.I1.i5.p1.2.m2.1.1.cmml">𝒟</mi><annotation-xml encoding="MathML-Content" id="S3.I1.i5.p1.2.m2.1b"><ci id="S3.I1.i5.p1.2.m2.1.1.cmml" xref="S3.I1.i5.p1.2.m2.1.1">𝒟</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.I1.i5.p1.2.m2.1c">\mathcal{D}</annotation><annotation encoding="application/x-llamapun" id="S3.I1.i5.p1.2.m2.1d">caligraphic_D</annotation></semantics></math>.</p> </div> </li> <li class="ltx_item" id="S3.I1.i6" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I1.i6.p1"> <p class="ltx_p" id="S3.I1.i6.p1.2"><math alttext="\text{KL}(p,q)" class="ltx_Math" display="inline" id="S3.I1.i6.p1.1.m1.2"><semantics id="S3.I1.i6.p1.1.m1.2a"><mrow id="S3.I1.i6.p1.1.m1.2.3" xref="S3.I1.i6.p1.1.m1.2.3.cmml"><mtext id="S3.I1.i6.p1.1.m1.2.3.2" xref="S3.I1.i6.p1.1.m1.2.3.2a.cmml">KL</mtext><mo id="S3.I1.i6.p1.1.m1.2.3.1" xref="S3.I1.i6.p1.1.m1.2.3.1.cmml">⁢</mo><mrow id="S3.I1.i6.p1.1.m1.2.3.3.2" xref="S3.I1.i6.p1.1.m1.2.3.3.1.cmml"><mo id="S3.I1.i6.p1.1.m1.2.3.3.2.1" stretchy="false" xref="S3.I1.i6.p1.1.m1.2.3.3.1.cmml">(</mo><mi id="S3.I1.i6.p1.1.m1.1.1" xref="S3.I1.i6.p1.1.m1.1.1.cmml">p</mi><mo id="S3.I1.i6.p1.1.m1.2.3.3.2.2" xref="S3.I1.i6.p1.1.m1.2.3.3.1.cmml">,</mo><mi id="S3.I1.i6.p1.1.m1.2.2" xref="S3.I1.i6.p1.1.m1.2.2.cmml">q</mi><mo id="S3.I1.i6.p1.1.m1.2.3.3.2.3" stretchy="false" xref="S3.I1.i6.p1.1.m1.2.3.3.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.I1.i6.p1.1.m1.2b"><apply 
id="S3.I1.i6.p1.1.m1.2.3.cmml" xref="S3.I1.i6.p1.1.m1.2.3"><times id="S3.I1.i6.p1.1.m1.2.3.1.cmml" xref="S3.I1.i6.p1.1.m1.2.3.1"></times><ci id="S3.I1.i6.p1.1.m1.2.3.2a.cmml" xref="S3.I1.i6.p1.1.m1.2.3.2"><mtext id="S3.I1.i6.p1.1.m1.2.3.2.cmml" xref="S3.I1.i6.p1.1.m1.2.3.2">KL</mtext></ci><interval closure="open" id="S3.I1.i6.p1.1.m1.2.3.3.1.cmml" xref="S3.I1.i6.p1.1.m1.2.3.3.2"><ci id="S3.I1.i6.p1.1.m1.1.1.cmml" xref="S3.I1.i6.p1.1.m1.1.1">𝑝</ci><ci id="S3.I1.i6.p1.1.m1.2.2.cmml" xref="S3.I1.i6.p1.1.m1.2.2">𝑞</ci></interval></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.I1.i6.p1.1.m1.2c">\text{KL}(p,q)</annotation><annotation encoding="application/x-llamapun" id="S3.I1.i6.p1.1.m1.2d">KL ( italic_p , italic_q )</annotation></semantics></math> and <math alttext="W_{\beta}" class="ltx_Math" display="inline" id="S3.I1.i6.p1.2.m2.1"><semantics id="S3.I1.i6.p1.2.m2.1a"><msub id="S3.I1.i6.p1.2.m2.1.1" xref="S3.I1.i6.p1.2.m2.1.1.cmml"><mi id="S3.I1.i6.p1.2.m2.1.1.2" xref="S3.I1.i6.p1.2.m2.1.1.2.cmml">W</mi><mi id="S3.I1.i6.p1.2.m2.1.1.3" xref="S3.I1.i6.p1.2.m2.1.1.3.cmml">β</mi></msub><annotation-xml encoding="MathML-Content" id="S3.I1.i6.p1.2.m2.1b"><apply id="S3.I1.i6.p1.2.m2.1.1.cmml" xref="S3.I1.i6.p1.2.m2.1.1"><csymbol cd="ambiguous" id="S3.I1.i6.p1.2.m2.1.1.1.cmml" xref="S3.I1.i6.p1.2.m2.1.1">subscript</csymbol><ci id="S3.I1.i6.p1.2.m2.1.1.2.cmml" xref="S3.I1.i6.p1.2.m2.1.1.2">𝑊</ci><ci id="S3.I1.i6.p1.2.m2.1.1.3.cmml" xref="S3.I1.i6.p1.2.m2.1.1.3">𝛽</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.I1.i6.p1.2.m2.1c">W_{\beta}</annotation><annotation encoding="application/x-llamapun" id="S3.I1.i6.p1.2.m2.1d">italic_W start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT</annotation></semantics></math> denote the KL divergence and the Wasserstein distance, respectively.</p> </div> </li> </ul> </div> <div class="ltx_para" id="S3.SS3.p4"> <p class="ltx_p" id="S3.SS3.p4.8">At the beginning of each training iteration <math 
alttext="k" class="ltx_Math" display="inline" id="S3.SS3.p4.1.m1.1"><semantics id="S3.SS3.p4.1.m1.1a"><mi id="S3.SS3.p4.1.m1.1.1" xref="S3.SS3.p4.1.m1.1.1.cmml">k</mi><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.1.m1.1b"><ci id="S3.SS3.p4.1.m1.1.1.cmml" xref="S3.SS3.p4.1.m1.1.1">𝑘</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.1.m1.1c">k</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.1.m1.1d">italic_k</annotation></semantics></math> in the simulation, the first term, <math alttext="\text{KL}\left(\hat{p}(\mathcal{D}_{real}^{k})\parallel\hat{p}(\mathcal{D}^{k-% 1}_{sim})\right)" class="ltx_Math" display="inline" id="S3.SS3.p4.2.m2.1"><semantics id="S3.SS3.p4.2.m2.1a"><mrow id="S3.SS3.p4.2.m2.1.1" xref="S3.SS3.p4.2.m2.1.1.cmml"><mtext id="S3.SS3.p4.2.m2.1.1.3" xref="S3.SS3.p4.2.m2.1.1.3a.cmml">KL</mtext><mo id="S3.SS3.p4.2.m2.1.1.2" xref="S3.SS3.p4.2.m2.1.1.2.cmml">⁢</mo><mrow id="S3.SS3.p4.2.m2.1.1.1.1" xref="S3.SS3.p4.2.m2.1.1.1.1.1.cmml"><mo id="S3.SS3.p4.2.m2.1.1.1.1.2" xref="S3.SS3.p4.2.m2.1.1.1.1.1.cmml">(</mo><mrow id="S3.SS3.p4.2.m2.1.1.1.1.1" xref="S3.SS3.p4.2.m2.1.1.1.1.1.cmml"><mrow id="S3.SS3.p4.2.m2.1.1.1.1.1.1" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.cmml"><mover accent="true" id="S3.SS3.p4.2.m2.1.1.1.1.1.1.3" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.3.cmml"><mi id="S3.SS3.p4.2.m2.1.1.1.1.1.1.3.2" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.3.2.cmml">p</mi><mo id="S3.SS3.p4.2.m2.1.1.1.1.1.1.3.1" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.3.1.cmml">^</mo></mover><mo id="S3.SS3.p4.2.m2.1.1.1.1.1.1.2" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.2.cmml">⁢</mo><mrow id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.cmml"><mo id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.2" stretchy="false" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.cmml">(</mo><msubsup id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.2" 
xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.2.cmml">𝒟</mi><mrow id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.cmml"><mi id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.2" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.2.cmml">r</mi><mo id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.1" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.3" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.3.cmml">e</mi><mo id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.1a" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.4" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.4.cmml">a</mi><mo id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.1b" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.5" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.5.cmml">l</mi></mrow><mi id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.3" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.3.cmml">k</mi></msubsup><mo id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.3" stretchy="false" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.SS3.p4.2.m2.1.1.1.1.1.3" xref="S3.SS3.p4.2.m2.1.1.1.1.1.3.cmml">∥</mo><mrow id="S3.SS3.p4.2.m2.1.1.1.1.1.2" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.cmml"><mover accent="true" id="S3.SS3.p4.2.m2.1.1.1.1.1.2.3" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.3.cmml"><mi id="S3.SS3.p4.2.m2.1.1.1.1.1.2.3.2" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.3.2.cmml">p</mi><mo id="S3.SS3.p4.2.m2.1.1.1.1.1.2.3.1" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.3.1.cmml">^</mo></mover><mo id="S3.SS3.p4.2.m2.1.1.1.1.1.2.2" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.2.cmml">⁢</mo><mrow id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.cmml"><mo id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.2" stretchy="false" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.cmml">(</mo><msubsup id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" 
id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.2" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.2.cmml">𝒟</mi><mrow id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.cmml"><mi id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.2" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.2.cmml">s</mi><mo id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.1" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.3" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.3.cmml">i</mi><mo id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.1a" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.4" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.4.cmml">m</mi></mrow><mrow id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3.cmml"><mi id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3.2" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3.2.cmml">k</mi><mo id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3.1" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3.1.cmml">−</mo><mn id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3.3" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3.3.cmml">1</mn></mrow></msubsup><mo id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.3" stretchy="false" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.cmml">)</mo></mrow></mrow></mrow><mo id="S3.SS3.p4.2.m2.1.1.1.1.3" xref="S3.SS3.p4.2.m2.1.1.1.1.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.2.m2.1b"><apply id="S3.SS3.p4.2.m2.1.1.cmml" xref="S3.SS3.p4.2.m2.1.1"><times id="S3.SS3.p4.2.m2.1.1.2.cmml" xref="S3.SS3.p4.2.m2.1.1.2"></times><ci id="S3.SS3.p4.2.m2.1.1.3a.cmml" xref="S3.SS3.p4.2.m2.1.1.3"><mtext id="S3.SS3.p4.2.m2.1.1.3.cmml" xref="S3.SS3.p4.2.m2.1.1.3">KL</mtext></ci><apply id="S3.SS3.p4.2.m2.1.1.1.1.1.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1"><csymbol cd="latexml" id="S3.SS3.p4.2.m2.1.1.1.1.1.3.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.3">conditional</csymbol><apply id="S3.SS3.p4.2.m2.1.1.1.1.1.1.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1"><times 
id="S3.SS3.p4.2.m2.1.1.1.1.1.1.2.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.2"></times><apply id="S3.SS3.p4.2.m2.1.1.1.1.1.1.3.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.3"><ci id="S3.SS3.p4.2.m2.1.1.1.1.1.1.3.1.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.3.1">^</ci><ci id="S3.SS3.p4.2.m2.1.1.1.1.1.1.3.2.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.3.2">𝑝</ci></apply><apply id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.1.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1">superscript</csymbol><apply id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.1.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1">subscript</csymbol><ci id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.2.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.2">𝒟</ci><apply id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3"><times id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.1.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.1"></times><ci id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.2.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.2">𝑟</ci><ci id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.3.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.3">𝑒</ci><ci id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.4.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.4">𝑎</ci><ci id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.5.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.2.3.5">𝑙</ci></apply></apply><ci id="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.3.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.1.1.1.1.3">𝑘</ci></apply></apply><apply id="S3.SS3.p4.2.m2.1.1.1.1.1.2.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2"><times id="S3.SS3.p4.2.m2.1.1.1.1.1.2.2.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.2"></times><apply id="S3.SS3.p4.2.m2.1.1.1.1.1.2.3.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.3"><ci id="S3.SS3.p4.2.m2.1.1.1.1.1.2.3.1.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.3.1">^</ci><ci 
id="S3.SS3.p4.2.m2.1.1.1.1.1.2.3.2.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.3.2">𝑝</ci></apply><apply id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1"><csymbol cd="ambiguous" id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.1.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1">subscript</csymbol><apply id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1"><csymbol cd="ambiguous" id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.1.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1">superscript</csymbol><ci id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.2.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.2">𝒟</ci><apply id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3"><minus id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3.1.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3.1"></minus><ci id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3.2.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3.2">𝑘</ci><cn id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3.3.cmml" type="integer" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.2.3.3">1</cn></apply></apply><apply id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3"><times id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.1.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.1"></times><ci id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.2.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.2">𝑠</ci><ci id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.3.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.3">𝑖</ci><ci id="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.4.cmml" xref="S3.SS3.p4.2.m2.1.1.1.1.1.2.1.1.1.3.4">𝑚</ci></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.2.m2.1c">\text{KL}\left(\hat{p}(\mathcal{D}_{real}^{k})\parallel\hat{p}(\mathcal{D}^{k-% 1}_{sim})\right)</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.2.m2.1d">KL ( over^ start_ARG italic_p end_ARG ( caligraphic_D start_POSTSUBSCRIPT italic_r italic_e italic_a italic_l 
end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ) ∥ over^ start_ARG italic_p end_ARG ( caligraphic_D start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_s italic_i italic_m end_POSTSUBSCRIPT ) )</annotation></semantics></math>, remains constant throughout the training process and represents the sim-to-real gap at the current iteration. Meanwhile, a specific data pair <math alttext="\mathcal{D}_{t}" class="ltx_Math" display="inline" id="S3.SS3.p4.3.m3.1"><semantics id="S3.SS3.p4.3.m3.1a"><msub id="S3.SS3.p4.3.m3.1.1" xref="S3.SS3.p4.3.m3.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.SS3.p4.3.m3.1.1.2" xref="S3.SS3.p4.3.m3.1.1.2.cmml">𝒟</mi><mi id="S3.SS3.p4.3.m3.1.1.3" xref="S3.SS3.p4.3.m3.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.3.m3.1b"><apply id="S3.SS3.p4.3.m3.1.1.cmml" xref="S3.SS3.p4.3.m3.1.1"><csymbol cd="ambiguous" id="S3.SS3.p4.3.m3.1.1.1.cmml" xref="S3.SS3.p4.3.m3.1.1">subscript</csymbol><ci id="S3.SS3.p4.3.m3.1.1.2.cmml" xref="S3.SS3.p4.3.m3.1.1.2">𝒟</ci><ci id="S3.SS3.p4.3.m3.1.1.3.cmml" xref="S3.SS3.p4.3.m3.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.3.m3.1c">\mathcal{D}_{t}</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.3.m3.1d">caligraphic_D start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math>, influenced by the policy’s action <math alttext="a_{t}" class="ltx_Math" display="inline" id="S3.SS3.p4.4.m4.1"><semantics id="S3.SS3.p4.4.m4.1a"><msub id="S3.SS3.p4.4.m4.1.1" xref="S3.SS3.p4.4.m4.1.1.cmml"><mi id="S3.SS3.p4.4.m4.1.1.2" xref="S3.SS3.p4.4.m4.1.1.2.cmml">a</mi><mi id="S3.SS3.p4.4.m4.1.1.3" xref="S3.SS3.p4.4.m4.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.4.m4.1b"><apply id="S3.SS3.p4.4.m4.1.1.cmml" xref="S3.SS3.p4.4.m4.1.1"><csymbol cd="ambiguous" id="S3.SS3.p4.4.m4.1.1.1.cmml" 
xref="S3.SS3.p4.4.m4.1.1">subscript</csymbol><ci id="S3.SS3.p4.4.m4.1.1.2.cmml" xref="S3.SS3.p4.4.m4.1.1.2">𝑎</ci><ci id="S3.SS3.p4.4.m4.1.1.3.cmml" xref="S3.SS3.p4.4.m4.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.4.m4.1c">a_{t}</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.4.m4.1d">italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> during training, affects the second term, <math alttext="W_{\beta}\left(\hat{p}(\mathcal{D}_{sim}^{k,t}+\mathcal{D}_{t}),\hat{p}(% \mathcal{D}_{sim}^{k,t})\right)" class="ltx_Math" display="inline" id="S3.SS3.p4.5.m5.6"><semantics id="S3.SS3.p4.5.m5.6a"><mrow id="S3.SS3.p4.5.m5.6.6" xref="S3.SS3.p4.5.m5.6.6.cmml"><msub id="S3.SS3.p4.5.m5.6.6.4" xref="S3.SS3.p4.5.m5.6.6.4.cmml"><mi id="S3.SS3.p4.5.m5.6.6.4.2" xref="S3.SS3.p4.5.m5.6.6.4.2.cmml">W</mi><mi id="S3.SS3.p4.5.m5.6.6.4.3" xref="S3.SS3.p4.5.m5.6.6.4.3.cmml">β</mi></msub><mo id="S3.SS3.p4.5.m5.6.6.3" xref="S3.SS3.p4.5.m5.6.6.3.cmml">⁢</mo><mrow id="S3.SS3.p4.5.m5.6.6.2.2" xref="S3.SS3.p4.5.m5.6.6.2.3.cmml"><mo id="S3.SS3.p4.5.m5.6.6.2.2.3" xref="S3.SS3.p4.5.m5.6.6.2.3.cmml">(</mo><mrow id="S3.SS3.p4.5.m5.5.5.1.1.1" xref="S3.SS3.p4.5.m5.5.5.1.1.1.cmml"><mover accent="true" id="S3.SS3.p4.5.m5.5.5.1.1.1.3" xref="S3.SS3.p4.5.m5.5.5.1.1.1.3.cmml"><mi id="S3.SS3.p4.5.m5.5.5.1.1.1.3.2" xref="S3.SS3.p4.5.m5.5.5.1.1.1.3.2.cmml">p</mi><mo id="S3.SS3.p4.5.m5.5.5.1.1.1.3.1" xref="S3.SS3.p4.5.m5.5.5.1.1.1.3.1.cmml">^</mo></mover><mo id="S3.SS3.p4.5.m5.5.5.1.1.1.2" xref="S3.SS3.p4.5.m5.5.5.1.1.1.2.cmml">⁢</mo><mrow id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.cmml"><mo id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.2" stretchy="false" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.cmml">(</mo><mrow id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.cmml"><msubsup id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.cmml"><mi 
class="ltx_font_mathcaligraphic" id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.2" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.2.cmml">𝒟</mi><mrow id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.cmml"><mi id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.2" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.2.cmml">s</mi><mo id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.1" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.3" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.3.cmml">i</mi><mo id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.1a" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.4" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.4.cmml">m</mi></mrow><mrow id="S3.SS3.p4.5.m5.2.2.2.4" xref="S3.SS3.p4.5.m5.2.2.2.3.cmml"><mi id="S3.SS3.p4.5.m5.1.1.1.1" xref="S3.SS3.p4.5.m5.1.1.1.1.cmml">k</mi><mo id="S3.SS3.p4.5.m5.2.2.2.4.1" xref="S3.SS3.p4.5.m5.2.2.2.3.cmml">,</mo><mi id="S3.SS3.p4.5.m5.2.2.2.2" xref="S3.SS3.p4.5.m5.2.2.2.2.cmml">t</mi></mrow></msubsup><mo id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.1" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.1.cmml">+</mo><msub id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.3" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.3.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.3.2" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.3.2.cmml">𝒟</mi><mi id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.3.3" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.3.3.cmml">t</mi></msub></mrow><mo id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.3" stretchy="false" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.SS3.p4.5.m5.6.6.2.2.4" xref="S3.SS3.p4.5.m5.6.6.2.3.cmml">,</mo><mrow id="S3.SS3.p4.5.m5.6.6.2.2.2" xref="S3.SS3.p4.5.m5.6.6.2.2.2.cmml"><mover accent="true" id="S3.SS3.p4.5.m5.6.6.2.2.2.3" xref="S3.SS3.p4.5.m5.6.6.2.2.2.3.cmml"><mi id="S3.SS3.p4.5.m5.6.6.2.2.2.3.2" xref="S3.SS3.p4.5.m5.6.6.2.2.2.3.2.cmml">p</mi><mo id="S3.SS3.p4.5.m5.6.6.2.2.2.3.1" 
xref="S3.SS3.p4.5.m5.6.6.2.2.2.3.1.cmml">^</mo></mover><mo id="S3.SS3.p4.5.m5.6.6.2.2.2.2" xref="S3.SS3.p4.5.m5.6.6.2.2.2.2.cmml">⁢</mo><mrow id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.cmml"><mo id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.2" stretchy="false" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.cmml">(</mo><msubsup id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.2" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.2.cmml">𝒟</mi><mrow id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.cmml"><mi id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.2" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.2.cmml">s</mi><mo id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.1" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.3" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.3.cmml">i</mi><mo id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.1a" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.4" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.4.cmml">m</mi></mrow><mrow id="S3.SS3.p4.5.m5.4.4.2.4" xref="S3.SS3.p4.5.m5.4.4.2.3.cmml"><mi id="S3.SS3.p4.5.m5.3.3.1.1" xref="S3.SS3.p4.5.m5.3.3.1.1.cmml">k</mi><mo id="S3.SS3.p4.5.m5.4.4.2.4.1" xref="S3.SS3.p4.5.m5.4.4.2.3.cmml">,</mo><mi id="S3.SS3.p4.5.m5.4.4.2.2" xref="S3.SS3.p4.5.m5.4.4.2.2.cmml">t</mi></mrow></msubsup><mo id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.3" stretchy="false" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.SS3.p4.5.m5.6.6.2.2.5" xref="S3.SS3.p4.5.m5.6.6.2.3.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.5.m5.6b"><apply id="S3.SS3.p4.5.m5.6.6.cmml" xref="S3.SS3.p4.5.m5.6.6"><times id="S3.SS3.p4.5.m5.6.6.3.cmml" xref="S3.SS3.p4.5.m5.6.6.3"></times><apply id="S3.SS3.p4.5.m5.6.6.4.cmml" xref="S3.SS3.p4.5.m5.6.6.4"><csymbol cd="ambiguous" id="S3.SS3.p4.5.m5.6.6.4.1.cmml" 
xref="S3.SS3.p4.5.m5.6.6.4">subscript</csymbol><ci id="S3.SS3.p4.5.m5.6.6.4.2.cmml" xref="S3.SS3.p4.5.m5.6.6.4.2">𝑊</ci><ci id="S3.SS3.p4.5.m5.6.6.4.3.cmml" xref="S3.SS3.p4.5.m5.6.6.4.3">𝛽</ci></apply><interval closure="open" id="S3.SS3.p4.5.m5.6.6.2.3.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2"><apply id="S3.SS3.p4.5.m5.5.5.1.1.1.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1"><times id="S3.SS3.p4.5.m5.5.5.1.1.1.2.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.2"></times><apply id="S3.SS3.p4.5.m5.5.5.1.1.1.3.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.3"><ci id="S3.SS3.p4.5.m5.5.5.1.1.1.3.1.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.3.1">^</ci><ci id="S3.SS3.p4.5.m5.5.5.1.1.1.3.2.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.3.2">𝑝</ci></apply><apply id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1"><plus id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.1.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.1"></plus><apply id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.1.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2">superscript</csymbol><apply id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.1.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2">subscript</csymbol><ci id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.2.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.2">𝒟</ci><apply id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3"><times id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.1.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.1"></times><ci id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.2.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.2">𝑠</ci><ci id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.3.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.3">𝑖</ci><ci id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.4.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.2.2.3.4">𝑚</ci></apply></apply><list 
id="S3.SS3.p4.5.m5.2.2.2.3.cmml" xref="S3.SS3.p4.5.m5.2.2.2.4"><ci id="S3.SS3.p4.5.m5.1.1.1.1.cmml" xref="S3.SS3.p4.5.m5.1.1.1.1">𝑘</ci><ci id="S3.SS3.p4.5.m5.2.2.2.2.cmml" xref="S3.SS3.p4.5.m5.2.2.2.2">𝑡</ci></list></apply><apply id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.3.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.3.1.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.3">subscript</csymbol><ci id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.3.2.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.3.2">𝒟</ci><ci id="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.3.3.cmml" xref="S3.SS3.p4.5.m5.5.5.1.1.1.1.1.1.3.3">𝑡</ci></apply></apply></apply><apply id="S3.SS3.p4.5.m5.6.6.2.2.2.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2.2"><times id="S3.SS3.p4.5.m5.6.6.2.2.2.2.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2.2.2"></times><apply id="S3.SS3.p4.5.m5.6.6.2.2.2.3.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2.2.3"><ci id="S3.SS3.p4.5.m5.6.6.2.2.2.3.1.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2.2.3.1">^</ci><ci id="S3.SS3.p4.5.m5.6.6.2.2.2.3.2.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2.2.3.2">𝑝</ci></apply><apply id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1"><csymbol cd="ambiguous" id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.1.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1">superscript</csymbol><apply id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1"><csymbol cd="ambiguous" id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.1.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1">subscript</csymbol><ci id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.2.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.2">𝒟</ci><apply id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3"><times id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.1.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.1"></times><ci id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.2.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.2">𝑠</ci><ci id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.3.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.3">𝑖</ci><ci 
id="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.4.cmml" xref="S3.SS3.p4.5.m5.6.6.2.2.2.1.1.1.2.3.4">𝑚</ci></apply></apply><list id="S3.SS3.p4.5.m5.4.4.2.3.cmml" xref="S3.SS3.p4.5.m5.4.4.2.4"><ci id="S3.SS3.p4.5.m5.3.3.1.1.cmml" xref="S3.SS3.p4.5.m5.3.3.1.1">𝑘</ci><ci id="S3.SS3.p4.5.m5.4.4.2.2.cmml" xref="S3.SS3.p4.5.m5.4.4.2.2">𝑡</ci></list></apply></apply></interval></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.5.m5.6c">W_{\beta}\left(\hat{p}(\mathcal{D}_{sim}^{k,t}+\mathcal{D}_{t}),\hat{p}(% \mathcal{D}_{sim}^{k,t})\right)</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.5.m5.6d">italic_W start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT ( over^ start_ARG italic_p end_ARG ( caligraphic_D start_POSTSUBSCRIPT italic_s italic_i italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k , italic_t end_POSTSUPERSCRIPT + caligraphic_D start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) , over^ start_ARG italic_p end_ARG ( caligraphic_D start_POSTSUBSCRIPT italic_s italic_i italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k , italic_t end_POSTSUPERSCRIPT ) )</annotation></semantics></math>. This term encourages the policy to generate actions that produce more informative data—data that exhibit larger discrepancies from the simulated distribution while being closer to the real-world distribution. Therefore, the cost function is designed to encourage actions that are more informative and exploratory, enabling the collection of data that fully captures the distribution or characteristics of the real domain, thereby mitigating data bias when addressing the sim-to-real gap. 
When the sim-to-real gap becomes negligible (i.e., <math alttext="\text{KL}\left(\hat{p}(\mathcal{D}_{real}^{k})\parallel\hat{p}(\mathcal{D}^{k-% 1}_{sim})\right)" class="ltx_Math" display="inline" id="S3.SS3.p4.6.m6.1"><semantics id="S3.SS3.p4.6.m6.1a"><mrow id="S3.SS3.p4.6.m6.1.1" xref="S3.SS3.p4.6.m6.1.1.cmml"><mtext id="S3.SS3.p4.6.m6.1.1.3" xref="S3.SS3.p4.6.m6.1.1.3a.cmml">KL</mtext><mo id="S3.SS3.p4.6.m6.1.1.2" xref="S3.SS3.p4.6.m6.1.1.2.cmml">⁢</mo><mrow id="S3.SS3.p4.6.m6.1.1.1.1" xref="S3.SS3.p4.6.m6.1.1.1.1.1.cmml"><mo id="S3.SS3.p4.6.m6.1.1.1.1.2" xref="S3.SS3.p4.6.m6.1.1.1.1.1.cmml">(</mo><mrow id="S3.SS3.p4.6.m6.1.1.1.1.1" xref="S3.SS3.p4.6.m6.1.1.1.1.1.cmml"><mrow id="S3.SS3.p4.6.m6.1.1.1.1.1.1" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.cmml"><mover accent="true" id="S3.SS3.p4.6.m6.1.1.1.1.1.1.3" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.3.cmml"><mi id="S3.SS3.p4.6.m6.1.1.1.1.1.1.3.2" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.3.2.cmml">p</mi><mo id="S3.SS3.p4.6.m6.1.1.1.1.1.1.3.1" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.3.1.cmml">^</mo></mover><mo id="S3.SS3.p4.6.m6.1.1.1.1.1.1.2" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.2.cmml">⁢</mo><mrow id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.cmml"><mo id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.2" stretchy="false" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.cmml">(</mo><msubsup id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.2" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.2.cmml">𝒟</mi><mrow id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.cmml"><mi id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.2" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.2.cmml">r</mi><mo id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.1" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.3" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.3.cmml">e</mi><mo 
id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.1a" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.4" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.4.cmml">a</mi><mo id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.1b" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.5" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.5.cmml">l</mi></mrow><mi id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.3" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.3.cmml">k</mi></msubsup><mo id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.3" stretchy="false" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.SS3.p4.6.m6.1.1.1.1.1.3" xref="S3.SS3.p4.6.m6.1.1.1.1.1.3.cmml">∥</mo><mrow id="S3.SS3.p4.6.m6.1.1.1.1.1.2" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.cmml"><mover accent="true" id="S3.SS3.p4.6.m6.1.1.1.1.1.2.3" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.3.cmml"><mi id="S3.SS3.p4.6.m6.1.1.1.1.1.2.3.2" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.3.2.cmml">p</mi><mo id="S3.SS3.p4.6.m6.1.1.1.1.1.2.3.1" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.3.1.cmml">^</mo></mover><mo id="S3.SS3.p4.6.m6.1.1.1.1.1.2.2" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.2.cmml">⁢</mo><mrow id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.cmml"><mo id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.2" stretchy="false" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.cmml">(</mo><msubsup id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.2" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.2.cmml">𝒟</mi><mrow id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.cmml"><mi id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.2" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.2.cmml">s</mi><mo id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.1" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.3" 
xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.3.cmml">i</mi><mo id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.1a" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.4" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.4.cmml">m</mi></mrow><mrow id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3.cmml"><mi id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3.2" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3.2.cmml">k</mi><mo id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3.1" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3.1.cmml">−</mo><mn id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3.3" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3.3.cmml">1</mn></mrow></msubsup><mo id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.3" stretchy="false" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.cmml">)</mo></mrow></mrow></mrow><mo id="S3.SS3.p4.6.m6.1.1.1.1.3" xref="S3.SS3.p4.6.m6.1.1.1.1.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.6.m6.1b"><apply id="S3.SS3.p4.6.m6.1.1.cmml" xref="S3.SS3.p4.6.m6.1.1"><times id="S3.SS3.p4.6.m6.1.1.2.cmml" xref="S3.SS3.p4.6.m6.1.1.2"></times><ci id="S3.SS3.p4.6.m6.1.1.3a.cmml" xref="S3.SS3.p4.6.m6.1.1.3"><mtext id="S3.SS3.p4.6.m6.1.1.3.cmml" xref="S3.SS3.p4.6.m6.1.1.3">KL</mtext></ci><apply id="S3.SS3.p4.6.m6.1.1.1.1.1.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1"><csymbol cd="latexml" id="S3.SS3.p4.6.m6.1.1.1.1.1.3.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.3">conditional</csymbol><apply id="S3.SS3.p4.6.m6.1.1.1.1.1.1.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1"><times id="S3.SS3.p4.6.m6.1.1.1.1.1.1.2.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.2"></times><apply id="S3.SS3.p4.6.m6.1.1.1.1.1.1.3.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.3"><ci id="S3.SS3.p4.6.m6.1.1.1.1.1.1.3.1.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.3.1">^</ci><ci id="S3.SS3.p4.6.m6.1.1.1.1.1.1.3.2.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.3.2">𝑝</ci></apply><apply id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" 
id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.1.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1">superscript</csymbol><apply id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.1.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1">subscript</csymbol><ci id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.2.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.2">𝒟</ci><apply id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3"><times id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.1.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.1"></times><ci id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.2.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.2">𝑟</ci><ci id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.3.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.3">𝑒</ci><ci id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.4.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.4">𝑎</ci><ci id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.5.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.2.3.5">𝑙</ci></apply></apply><ci id="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.3.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.1.1.1.1.3">𝑘</ci></apply></apply><apply id="S3.SS3.p4.6.m6.1.1.1.1.1.2.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2"><times id="S3.SS3.p4.6.m6.1.1.1.1.1.2.2.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.2"></times><apply id="S3.SS3.p4.6.m6.1.1.1.1.1.2.3.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.3"><ci id="S3.SS3.p4.6.m6.1.1.1.1.1.2.3.1.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.3.1">^</ci><ci id="S3.SS3.p4.6.m6.1.1.1.1.1.2.3.2.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.3.2">𝑝</ci></apply><apply id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1"><csymbol cd="ambiguous" id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.1.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1">subscript</csymbol><apply id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1"><csymbol cd="ambiguous" id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.1.cmml" 
xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1">superscript</csymbol><ci id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.2.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.2">𝒟</ci><apply id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3"><minus id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3.1.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3.1"></minus><ci id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3.2.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3.2">𝑘</ci><cn id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3.3.cmml" type="integer" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.2.3.3">1</cn></apply></apply><apply id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3"><times id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.1.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.1"></times><ci id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.2.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.2">𝑠</ci><ci id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.3.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.3">𝑖</ci><ci id="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.4.cmml" xref="S3.SS3.p4.6.m6.1.1.1.1.1.2.1.1.1.3.4">𝑚</ci></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.6.m6.1c">\text{KL}\left(\hat{p}(\mathcal{D}_{real}^{k})\parallel\hat{p}(\mathcal{D}^{k-% 1}_{sim})\right)</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.6.m6.1d">KL ( over^ start_ARG italic_p end_ARG ( caligraphic_D start_POSTSUBSCRIPT italic_r italic_e italic_a italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ) ∥ over^ start_ARG italic_p end_ARG ( caligraphic_D start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_s italic_i italic_m end_POSTSUBSCRIPT ) )</annotation></semantics></math> approaches zero), this term <math alttext="\mathcal{L}_{sr}" class="ltx_Math" display="inline" id="S3.SS3.p4.7.m7.1"><semantics id="S3.SS3.p4.7.m7.1a"><msub id="S3.SS3.p4.7.m7.1.1" 
xref="S3.SS3.p4.7.m7.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.SS3.p4.7.m7.1.1.2" xref="S3.SS3.p4.7.m7.1.1.2.cmml">ℒ</mi><mrow id="S3.SS3.p4.7.m7.1.1.3" xref="S3.SS3.p4.7.m7.1.1.3.cmml"><mi id="S3.SS3.p4.7.m7.1.1.3.2" xref="S3.SS3.p4.7.m7.1.1.3.2.cmml">s</mi><mo id="S3.SS3.p4.7.m7.1.1.3.1" xref="S3.SS3.p4.7.m7.1.1.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.7.m7.1.1.3.3" xref="S3.SS3.p4.7.m7.1.1.3.3.cmml">r</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.7.m7.1b"><apply id="S3.SS3.p4.7.m7.1.1.cmml" xref="S3.SS3.p4.7.m7.1.1"><csymbol cd="ambiguous" id="S3.SS3.p4.7.m7.1.1.1.cmml" xref="S3.SS3.p4.7.m7.1.1">subscript</csymbol><ci id="S3.SS3.p4.7.m7.1.1.2.cmml" xref="S3.SS3.p4.7.m7.1.1.2">ℒ</ci><apply id="S3.SS3.p4.7.m7.1.1.3.cmml" xref="S3.SS3.p4.7.m7.1.1.3"><times id="S3.SS3.p4.7.m7.1.1.3.1.cmml" xref="S3.SS3.p4.7.m7.1.1.3.1"></times><ci id="S3.SS3.p4.7.m7.1.1.3.2.cmml" xref="S3.SS3.p4.7.m7.1.1.3.2">𝑠</ci><ci id="S3.SS3.p4.7.m7.1.1.3.3.cmml" xref="S3.SS3.p4.7.m7.1.1.3.3">𝑟</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.7.m7.1c">\mathcal{L}_{sr}</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.7.m7.1d">caligraphic_L start_POSTSUBSCRIPT italic_s italic_r end_POSTSUBSCRIPT</annotation></semantics></math> in the cost function diminishes, leaving only the task completion cost <math alttext="\mathcal{L}_{task}" class="ltx_Math" display="inline" id="S3.SS3.p4.8.m8.1"><semantics id="S3.SS3.p4.8.m8.1a"><msub id="S3.SS3.p4.8.m8.1.1" xref="S3.SS3.p4.8.m8.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S3.SS3.p4.8.m8.1.1.2" xref="S3.SS3.p4.8.m8.1.1.2.cmml">ℒ</mi><mrow id="S3.SS3.p4.8.m8.1.1.3" xref="S3.SS3.p4.8.m8.1.1.3.cmml"><mi id="S3.SS3.p4.8.m8.1.1.3.2" xref="S3.SS3.p4.8.m8.1.1.3.2.cmml">t</mi><mo id="S3.SS3.p4.8.m8.1.1.3.1" xref="S3.SS3.p4.8.m8.1.1.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.8.m8.1.1.3.3" xref="S3.SS3.p4.8.m8.1.1.3.3.cmml">a</mi><mo id="S3.SS3.p4.8.m8.1.1.3.1a" 
xref="S3.SS3.p4.8.m8.1.1.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.8.m8.1.1.3.4" xref="S3.SS3.p4.8.m8.1.1.3.4.cmml">s</mi><mo id="S3.SS3.p4.8.m8.1.1.3.1b" xref="S3.SS3.p4.8.m8.1.1.3.1.cmml">⁢</mo><mi id="S3.SS3.p4.8.m8.1.1.3.5" xref="S3.SS3.p4.8.m8.1.1.3.5.cmml">k</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.SS3.p4.8.m8.1b"><apply id="S3.SS3.p4.8.m8.1.1.cmml" xref="S3.SS3.p4.8.m8.1.1"><csymbol cd="ambiguous" id="S3.SS3.p4.8.m8.1.1.1.cmml" xref="S3.SS3.p4.8.m8.1.1">subscript</csymbol><ci id="S3.SS3.p4.8.m8.1.1.2.cmml" xref="S3.SS3.p4.8.m8.1.1.2">ℒ</ci><apply id="S3.SS3.p4.8.m8.1.1.3.cmml" xref="S3.SS3.p4.8.m8.1.1.3"><times id="S3.SS3.p4.8.m8.1.1.3.1.cmml" xref="S3.SS3.p4.8.m8.1.1.3.1"></times><ci id="S3.SS3.p4.8.m8.1.1.3.2.cmml" xref="S3.SS3.p4.8.m8.1.1.3.2">𝑡</ci><ci id="S3.SS3.p4.8.m8.1.1.3.3.cmml" xref="S3.SS3.p4.8.m8.1.1.3.3">𝑎</ci><ci id="S3.SS3.p4.8.m8.1.1.3.4.cmml" xref="S3.SS3.p4.8.m8.1.1.3.4">𝑠</ci><ci id="S3.SS3.p4.8.m8.1.1.3.5.cmml" xref="S3.SS3.p4.8.m8.1.1.3.5">𝑘</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p4.8.m8.1c">\mathcal{L}_{task}</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p4.8.m8.1d">caligraphic_L start_POSTSUBSCRIPT italic_t italic_a italic_s italic_k end_POSTSUBSCRIPT</annotation></semantics></math> to guide the actions. 
Ultimately, this approach yields a policy capable of completing the task effectively while minimizing the sim-to-real gap.</p> </div> </section> </section> <section class="ltx_section" id="S4"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">IV </span><span class="ltx_text ltx_font_smallcaps" id="S4.1.1">Experiments</span> </h2> <div class="ltx_para" id="S4.p1"> <p class="ltx_p" id="S4.p1.1">In this section, we present a series of experiments designed to evaluate and validate the effectiveness of our proposed RSR loop framework on block-pushing tasks with a 6-DOF robotic arm.</p> </div> <section class="ltx_subsection" id="S4.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S4.SS1.4.1.1">IV-A</span> </span><span class="ltx_text ltx_font_italic" id="S4.SS1.5.2">Experimental Settings</span> </h3> <div class="ltx_para" id="S4.SS1.p1"> <p class="ltx_p" id="S4.SS1.p1.1">For the simulation, we use the DISCOVERSE <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib28" title="">28</a>]</cite> simulator to model the robot’s interactions with its environment. The reinforcement learning algorithm used is PPO, implemented with the JAX computation library. The definitions of the action space, observation space, and rewards used in the RL policy are given in Appendix <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S8.SS2" title="VIII-B RL Settings ‣ VIII Appendix ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">VIII-B</span></span></a>. The simulation runs on a laptop equipped with an NVIDIA 4090 GPU and 24 GB of memory. In the real-world experiments, a 6-DOF AIRBOT Play robotic arm is used to execute object manipulation tasks. 
The robot is equipped with a Realsense D435i depth camera for visual perception, providing accurate pose estimation of the objects in the workspace. </p> </div> </section> <section class="ltx_subsection" id="S4.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S4.SS2.4.1.1">IV-B</span> </span><span class="ltx_text ltx_font_italic" id="S4.SS2.5.2">Block Pushing Experiment</span> </h3> <div class="ltx_para" id="S4.SS2.p1"> <p class="ltx_p" id="S4.SS2.p1.2">In this experiment, the robot is required to push the block to random target points within the workspace. We trained the policy in the original simulation (marked as “simulated trajectory”), transferred it to the real robot (marked as “1st PPO”), and iterated through the <math alttext="n" class="ltx_Math" display="inline" id="S4.SS2.p1.1.m1.1"><semantics id="S4.SS2.p1.1.m1.1a"><mi id="S4.SS2.p1.1.m1.1.1" xref="S4.SS2.p1.1.m1.1.1.cmml">n</mi><annotation-xml encoding="MathML-Content" id="S4.SS2.p1.1.m1.1b"><ci id="S4.SS2.p1.1.m1.1.1.cmml" xref="S4.SS2.p1.1.m1.1.1">𝑛</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p1.1.m1.1c">n</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p1.1.m1.1d">italic_n</annotation></semantics></math>-th RSR loop to obtain the corresponding trajectories in the real environment (marked as <math alttext="n" class="ltx_Math" display="inline" id="S4.SS2.p1.2.m2.1"><semantics id="S4.SS2.p1.2.m2.1a"><mi id="S4.SS2.p1.2.m2.1.1" xref="S4.SS2.p1.2.m2.1.1.cmml">n</mi><annotation-xml encoding="MathML-Content" id="S4.SS2.p1.2.m2.1b"><ci id="S4.SS2.p1.2.m2.1.1.cmml" xref="S4.SS2.p1.2.m2.1.1">𝑛</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p1.2.m2.1c">n</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p1.2.m2.1d">italic_n</annotation></semantics></math>-th RSR Trajectory). 
As seen in Table <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S4.T1" title="TABLE I ‣ IV-B Block Pushing Experiment ‣ IV Experiments ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_tag">I</span></a>, the KL divergence between the simulation and real trajectories is highest in the 1st PPO stage (16.35 in X and 36.32 in Y). This indicates a significant discrepancy between the simulation-trained policy and the real-world environment. After multiple iterations of the RSR loop, the KL divergence falls below 1 in both directions by the 4th iteration, although the decrease in X is not monotonic: it rises from 1.69 to 5.05 at the 3rd iteration before dropping to 0.97. This overall reduction suggests that the simulator is becoming more representative of real-world dynamics and that the refined policy is better aligned with real-world behavior, which supports the effectiveness of the designed InfoGap loss.</p> </div> <figure class="ltx_table" id="S4.T1"> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">TABLE I: </span>Results of Distribution Deviation</figcaption> <table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle" id="S4.T1.1"> <thead class="ltx_thead"> <tr class="ltx_tr" id="S4.T1.1.1.1"> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_l ltx_border_r ltx_border_t" id="S4.T1.1.1.1.1"><span class="ltx_text ltx_font_bold" id="S4.T1.1.1.1.1.1">Stage</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t" id="S4.T1.1.1.1.2"><span class="ltx_text ltx_font_bold" id="S4.T1.1.1.1.2.1">KL Divergence (X)</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t" id="S4.T1.1.1.1.3"><span class="ltx_text ltx_font_bold" id="S4.T1.1.1.1.3.1">KL Divergence (Y)</span></th> </tr> </thead> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S4.T1.1.2.1"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S4.T1.1.2.1.1">1st_PPO</td> <td class="ltx_td 
ltx_align_center ltx_border_r ltx_border_t" id="S4.T1.1.2.1.2">16.3509</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T1.1.2.1.3">36.3168</td> </tr> <tr class="ltx_tr" id="S4.T1.1.3.2"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S4.T1.1.3.2.1">2nd_RSR</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T1.1.3.2.2">1.6903</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T1.1.3.2.3">3.4206</td> </tr> <tr class="ltx_tr" id="S4.T1.1.4.3"> <td class="ltx_td ltx_align_center ltx_border_l ltx_border_r ltx_border_t" id="S4.T1.1.4.3.1">3rd_RSR</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T1.1.4.3.2">5.0462</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T1.1.4.3.3">2.3946</td> </tr> <tr class="ltx_tr" id="S4.T1.1.5.4"> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_l ltx_border_r ltx_border_t" id="S4.T1.1.5.4.1">4th_RSR</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T1.1.5.4.2">0.9739</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T1.1.5.4.3">0.8384</td> </tr> </tbody> </table> </figure> <figure class="ltx_figure" id="S4.F3"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="349" id="S4.F3.g1" src="x1.png" width="502"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 3: </span>Real-world pushing trajectories across different iterations.</figcaption> </figure> <div class="ltx_para" id="S4.SS2.p2"> <p class="ltx_p" id="S4.SS2.p2.1">The improvement in trajectory alignment is further visualized in Fig. 
<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S4.F3" title="Figure 3 ‣ IV-B Block Pushing Experiment ‣ IV Experiments ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_tag">3</span></a>, which shows the block’s trajectories at different training stages. The 1st PPO trajectory (blue) deviates significantly from the simulation trajectory and fails to reach the target. The low friction between the cube and the desk and the misalignment between the gripper and the cube cause the gripper to slip along the cube’s surface instead of pushing it effectively, so the cube moves in an unintended direction. In contrast, the RSR-refined trajectories (orange, yellow, and purple) show a progressive improvement in accuracy. By the 4th RSR iteration (purple), the real-world trajectory closely matches the simulation trajectory, and the block successfully reaches the target position (red cross “<math alttext="\times" class="ltx_Math" display="inline" id="S4.SS2.p2.1.m1.1"><semantics id="S4.SS2.p2.1.m1.1a"><mo id="S4.SS2.p2.1.m1.1.1" xref="S4.SS2.p2.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S4.SS2.p2.1.m1.1b"><times id="S4.SS2.p2.1.m1.1.1.cmml" xref="S4.SS2.p2.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p2.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p2.1.m1.1d">×</annotation></semantics></math>”). These results highlight the effectiveness of the RSR approach in closing the sim-to-real gap. By iteratively refining the simulation model and retraining the policy, we achieve a substantial reduction in trajectory divergence and significantly improve real-world task performance. To ensure the reliability of our results and mitigate potential biases, we repeated the experiments three times. Fig. 
<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S4.F4" title="Figure 4 ‣ IV-B Block Pushing Experiment ‣ IV Experiments ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_tag">4</span></a> presents the mean trajectory error in both the X and Y directions over time, computed as the deviation of real-world trajectories from the simulation reference, averaged over three independent runs. The error trends illustrate how the RSR framework progressively refines the simulation parameters, leading to improved alignment between simulated and real trajectories. The pattern of error reduction is similar to that observed in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S4.F3" title="Figure 3 ‣ IV-B Block Pushing Experiment ‣ IV Experiments ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_tag">3</span></a>, further supporting the effectiveness of our approach in bridging the sim-to-real gap.</p> </div> <figure class="ltx_figure" id="S4.F4"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="395" id="S4.F4.g1" src="x2.png" width="538"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 4: </span>Mean trajectory error over three trials in the X and Y directions against time for different iterations. 
</figcaption> </figure> </section> <section class="ltx_subsection" id="S4.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S4.SS3.4.1.1">IV-C</span> </span><span class="ltx_text ltx_font_italic" id="S4.SS3.5.2">T-shaped Block Pushing</span> </h3> <div class="ltx_para" id="S4.SS3.p1"> <p class="ltx_p" id="S4.SS3.p1.1">In this variant of the block-pushing task, the robot is required to push a T-shaped object to a specified target position. This experiment provides a more challenging scenario due to the object’s geometry and the increased difficulty of manipulating irregularly shaped objects. We followed the same experimental procedure as for the cube block. Three trials were conducted, and the results are shown in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S4.F5" title="Figure 5 ‣ IV-C T-shaped Block Pushing ‣ IV Experiments ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_tag">5</span></a>. The figure shows the yaw error decreasing as the RSR iterations progress. The initial PPO policy (blue) has a high yaw error, while the 2nd RSR iteration (red) starts improving but remains unstable. By the 3rd (yellow) and 4th (purple) iterations, the error is significantly reduced, demonstrating the effectiveness of the RSR pipeline in refining sim-to-real alignment. 
The similar performance on this task suggests that the proposed pipeline carries over to different types of tasks.</p> </div> <figure class="ltx_figure" id="S4.F5"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="202" id="S4.F5.g1" src="extracted/6289317/Figure/t_result.png" width="269"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 5: </span>1-<math alttext="\sigma" class="ltx_Math" display="inline" id="S4.F5.2.m1.1"><semantics id="S4.F5.2.m1.1b"><mi id="S4.F5.2.m1.1.1" xref="S4.F5.2.m1.1.1.cmml">σ</mi><annotation-xml encoding="MathML-Content" id="S4.F5.2.m1.1c"><ci id="S4.F5.2.m1.1.1.cmml" xref="S4.F5.2.m1.1.1">𝜎</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.F5.2.m1.1d">\sigma</annotation><annotation encoding="application/x-llamapun" id="S4.F5.2.m1.1e">italic_σ</annotation></semantics></math> bounds of real trajectories for the yaw angle in the T-shaped block pushing trials for different iterations, where the shaded area represents the bounds.</figcaption> </figure> </section> </section> <section class="ltx_section" id="S5"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">V </span><span class="ltx_text ltx_font_smallcaps" id="S5.1.1">Discussion on Incorporating Visual Loss</span> </h2> <div class="ltx_para" id="S5.p1"> <p class="ltx_p" id="S5.p1.1">In our experiments, we explored the possibility of incorporating a visual loss component into the sim-to-real adaptation process. Specifically, we computed the visual loss using the Structural Similarity Index (SSIM <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#bib.bib29" title="">29</a>]</cite>) between Gaussian-rendered simulation images and real-world images. 
This loss was then used as a multiplier to the original sim-to-real loss, aiming to improve domain alignment by encouraging the simulator to produce not only physically but also visually consistent data.</p> </div> <figure class="ltx_figure" id="S5.F6"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_square" height="242" id="S5.F6.g1" src="extracted/6289317/Figure/visual_loss.png" width="299"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 6: </span>Trajectories of real-robot deployment with and without visual loss.</figcaption> </figure> <div class="ltx_para" id="S5.p2"> <p class="ltx_p" id="S5.p2.1">However, as shown in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S5.F6" title="Figure 6 ‣ V Discussion on Incorporating Visual Loss ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_tag">6</span></a>, the results indicate that incorporating the visual loss in this manner does not lead to improved performance. In fact, it appears to introduce instability in the adaptation process. One possible reason for this is that while SSIM effectively captures high-level structural similarity, it does not directly correlate with the physical parameters that affect policy learning. As a result, using SSIM-based visual loss as a weight may amplify discrepancies in ways that are not meaningful for improving real-world task performance. 
Additionally, since the visual differences can be caused by lighting conditions, reflections, and camera noise—factors that do not directly impact the underlying dynamics—forcing the simulator to match real images too closely might introduce misleading optimization directions.</p> </div> <div class="ltx_para" id="S5.p3"> <p class="ltx_p" id="S5.p3.1">Another critical limitation we observed is the computational cost associated with rendering high-fidelity images for environment parameter tuning. Real-time rendering with physically realistic lighting and textures requires significant GPU resources, making it impractical for iterative training. Given these challenges, we have chosen not to include the visual loss in our current pipeline. Instead, our approach primarily focuses on optimizing the physical parameters that directly affect sim-to-real transfer.</p> </div> </section> <section class="ltx_section" id="S6"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">VI </span><span class="ltx_text ltx_font_smallcaps" id="S6.1.1">Limitations</span> </h2> <div class="ltx_para" id="S6.p1"> <p class="ltx_p" id="S6.p1.1">While the RSR framework demonstrates promising results in bridging the sim-to-real gap, there are certain limitations that need to be addressed in future research and development. Firstly, the speed of the overall algorithm is heavily dependent on the underlying simulation engine (MuJoCo MJX) and the computation framework (JAX). While MJX offers a highly accurate and flexible simulation environment, it requires substantial computational resources.</p> </div> <div class="ltx_para" id="S6.p2"> <p class="ltx_p" id="S6.p2.1">Secondly, in this study we have primarily focused on environmental effects that can be explicitly tuned, such as friction, mass, and elasticity. 
While these effects are important for sim-to-real transfer, they only represent a subset of the factors that influence the real-world performance of robotic systems. The RSR framework’s current implementation does not account for implicit environmental factors, such as dynamic ground effects or other non-observable environmental variables that may require adjustments at the physical level. In future experiments, we plan to extend our framework to aerial robots. The dynamics of aerial robots are fundamentally different from those of ground robots, as they are subject to various environmental effects such as wind, atmospheric pressure, and turbulence.</p> </div> </section> <section class="ltx_section" id="S7"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">VII </span><span class="ltx_text ltx_font_smallcaps" id="S7.1.1">Conclusion and Future Work</span> </h2> <div class="ltx_para" id="S7.p1"> <p class="ltx_p" id="S7.p1.1">In this work, we propose an RSR pipeline to facilitate smoother sim-to-real transfer by leveraging real-world data through two key components: a gap-aware loss function that can be integrated into standard reinforcement learning methods and a parameter tuning process for a differentiable simulator. The approach is evaluated using a 6-DOF robotic arm. Experimental results demonstrate that the proposed loss function effectively reduces the divergence between simulation and reality while improving the performance of simulator-trained policies in real-world deployment through iterative RSR refinements.</p> </div> <div class="ltx_para" id="S7.p2"> <p class="ltx_p" id="S7.p2.1">This work makes a significant contribution to the development of a more robust and efficient pipeline for transferring policies learned in simulation to real robots, advancing the state of the art in sim2real research. 
Note that the proposed structure is also highly adaptable and can be applied in a general policy transfer scenario, provided that data from the target environment can be collected. This opens up new possibilities for autonomous robots to learn and operate in diverse, previously unexplored environments with minimal additional training, making it a versatile tool for a wide range of robotic applications.</p> </div> <div class="ltx_para" id="S7.p3"> <p class="ltx_p" id="S7.p3.1">Future work will focus on testing the RSR framework with aerial robots, where the effects of the environment—such as wind resistance, ground effects, and turbulence—will be accounted for by implicitly adjusting simulation parameters. We also aim to improve the framework’s ability to adapt to these dynamic environmental changes during the real-time execution of the task.</p> </div> </section> <section class="ltx_bibliography" id="bib"> <h2 class="ltx_title ltx_title_bibliography">References</h2> <ul class="ltx_biblist"> <li class="ltx_bibitem" id="bib.bib1"> <span class="ltx_tag ltx_tag_bibitem">[1]</span> <span class="ltx_bibblock"> T. Goyal, S. Hussain, E. Martinez-Marroquin, N. A. T. Brown, and P. K. Jamwal, “Stiffness-observer-based adaptive control of an intrinsically compliant parallel wrist rehabilitation robot,” <em class="ltx_emph ltx_font_italic" id="bib.bib1.1.1">IEEE Transactions on Human-Machine Systems</em>, vol. 53, no. 1, pp. 65–74, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib2"> <span class="ltx_tag ltx_tag_bibitem">[2]</span> <span class="ltx_bibblock"> S. Höfer, K. Bekris, A. Handa, J. C. Gamboa, M. Mozifian, F. Golemo, C. Atkeson, D. Fox, K. Goldberg, J. Leonard, <em class="ltx_emph ltx_font_italic" id="bib.bib2.1.1">et al.</em>, “Sim2real in robotics and automation: Applications and challenges,” <em class="ltx_emph ltx_font_italic" id="bib.bib2.2.2">IEEE transactions on automation science and engineering</em>, vol. 18, no. 2, pp. 398–400, 2021. 
</span> </li> <li class="ltx_bibitem" id="bib.bib3"> <span class="ltx_tag ltx_tag_bibitem">[3]</span> <span class="ltx_bibblock"> H. Ge, R. Wang, Z.-a. Xu, H. Zhu, R. Deng, Y. Dong, Z. Pang, G. Zhou, J. Zhang, and L. Shi, “Bridging the resource gap: Deploying advanced imitation learning models onto affordable embedded platforms,” <em class="ltx_emph ltx_font_italic" id="bib.bib3.1.1">IEEE International Conference on Robotics and Biomimetics (ROBIO)</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib4"> <span class="ltx_tag ltx_tag_bibitem">[4]</span> <span class="ltx_bibblock"> E. Salvato, G. Fenu, E. Medvet, and F. A. Pellegrino, “Crossing the reality gap: A survey on sim-to-real transferability of robot controllers in reinforcement learning,” <em class="ltx_emph ltx_font_italic" id="bib.bib4.1.1">IEEE Access</em>, vol. 9, pp. 153 171–153 187, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib5"> <span class="ltx_tag ltx_tag_bibitem">[5]</span> <span class="ltx_bibblock"> W. Zhao, J. P. Queralta, and T. Westerlund, “Sim-to-real transfer in deep reinforcement learning for robotics: A survey,” in <em class="ltx_emph ltx_font_italic" id="bib.bib5.1.1">IEEE symposium series on computational intelligence (SSCI)</em>, 2020, pp. 737–744. </span> </li> <li class="ltx_bibitem" id="bib.bib6"> <span class="ltx_tag ltx_tag_bibitem">[6]</span> <span class="ltx_bibblock"> J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” in <em class="ltx_emph ltx_font_italic" id="bib.bib6.1.1">IEEE/RSJ international conference on intelligent robots and systems (IROS)</em>, 2017, pp. 23–30. </span> </li> <li class="ltx_bibitem" id="bib.bib7"> <span class="ltx_tag ltx_tag_bibitem">[7]</span> <span class="ltx_bibblock"> K. Bousmalis, A. Irpan, P. Wohlhart, Y. Bai, M. Kelcey, M. Kalakrishnan, L. Downs, J. Ibarz, P. Pastor, K. 
Konolige, <em class="ltx_emph ltx_font_italic" id="bib.bib7.1.1">et al.</em>, “Using simulation and domain adaptation to improve efficiency of deep robotic grasping,” in <em class="ltx_emph ltx_font_italic" id="bib.bib7.2.2">IEEE international conference on robotics and automation (ICRA)</em>, 2018, pp. 4243–4250. </span> </li> <li class="ltx_bibitem" id="bib.bib8"> <span class="ltx_tag ltx_tag_bibitem">[8]</span> <span class="ltx_bibblock"> Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. March, and V. Lempitsky, “Domain-adversarial training of neural networks,” <em class="ltx_emph ltx_font_italic" id="bib.bib8.1.1">Journal of machine learning research</em>, vol. 17, no. 59, pp. 1–35, 2016. </span> </li> <li class="ltx_bibitem" id="bib.bib9"> <span class="ltx_tag ltx_tag_bibitem">[9]</span> <span class="ltx_bibblock"> K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan, “Domain separation networks,” <em class="ltx_emph ltx_font_italic" id="bib.bib9.1.1">Advances in neural information processing systems</em>, vol. 29, 2016. </span> </li> <li class="ltx_bibitem" id="bib.bib10"> <span class="ltx_tag ltx_tag_bibitem">[10]</span> <span class="ltx_bibblock"> F. Golemo, A. A. Taiga, A. Courville, and P.-Y. Oudeyer, “Sim-to-real transfer with neural-augmented robot simulation,” in <em class="ltx_emph ltx_font_italic" id="bib.bib10.1.1">Conference on Robot Learning</em>.   PMLR, 2018, pp. 817–828. </span> </li> <li class="ltx_bibitem" id="bib.bib11"> <span class="ltx_tag ltx_tag_bibitem">[11]</span> <span class="ltx_bibblock"> E. Heiden, D. Millard, E. Coumans, Y. Sheng, and G. S. Sukhatme, “Neuralsim: Augmenting differentiable simulators with neural networks,” in <em class="ltx_emph ltx_font_italic" id="bib.bib11.1.1">IEEE International Conference on Robotics and Automation (ICRA)</em>, 2021, pp. 9474–9481. 
</span> </li> <li class="ltx_bibitem" id="bib.bib12"> <span class="ltx_tag ltx_tag_bibitem">[12]</span> <span class="ltx_bibblock"> N. Liu, Y. Cai, T. Lu, R. Wang, and S. Wang, “Real–sim–real transfer for real-world robot control policy learning with deep reinforcement learning,” <em class="ltx_emph ltx_font_italic" id="bib.bib12.1.1">Applied Sciences</em>, vol. 10, no. 5, p. 1555, 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib13"> <span class="ltx_tag ltx_tag_bibitem">[13]</span> <span class="ltx_bibblock"> F. Ramos, R. C. Possas, and D. Fox, “Bayessim: Adaptive domain randomization via probabilistic inference for robotics simulators,” <em class="ltx_emph ltx_font_italic" id="bib.bib13.1.1">arXiv preprint arXiv:1906.01728</em>, 2019. </span> </li> <li class="ltx_bibitem" id="bib.bib14"> <span class="ltx_tag ltx_tag_bibitem">[14]</span> <span class="ltx_bibblock"> R. Liu, A. Canberk, S. Song, and C. Vondrick, “Differentiable robot rendering,” in <em class="ltx_emph ltx_font_italic" id="bib.bib14.1.1">8th Annual Conference on Robot Learning</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib15"> <span class="ltx_tag ltx_tag_bibitem">[15]</span> <span class="ltx_bibblock"> R. Newbury, J. Collins, K. He, J. Pan, I. Posner, D. Howard, and A. Cosgun, “A review of differentiable simulators,” <em class="ltx_emph ltx_font_italic" id="bib.bib15.1.1">IEEE Access</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib16"> <span class="ltx_tag ltx_tag_bibitem">[16]</span> <span class="ltx_bibblock"> Y.-L. Qiao, J. Liang, V. Koltun, and M. C. Lin, “Efficient differentiable simulation of articulated bodies,” in <em class="ltx_emph ltx_font_italic" id="bib.bib16.1.1">International Conference on Machine Learning</em>.   PMLR, 2021, pp. 8661–8671. </span> </li> <li class="ltx_bibitem" id="bib.bib17"> <span class="ltx_tag ltx_tag_bibitem">[17]</span> <span class="ltx_bibblock"> J. K. Murthy, M. Macklin, F. Golemo, V. Voleti, L. Petrini, M. Weiss, B. 
Considine, J. Parent-Lévesque, K. Xie, K. Erleben, <em class="ltx_emph ltx_font_italic" id="bib.bib17.1.1">et al.</em>, “gradsim: Differentiable simulation for system identification and visuomotor control,” in <em class="ltx_emph ltx_font_italic" id="bib.bib17.2.2">International conference on learning representations</em>, 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib18"> <span class="ltx_tag ltx_tag_bibitem">[18]</span> <span class="ltx_bibblock"> J. Heeg, Y. Song, and D. Scaramuzza, “Learning quadrotor control from visual features using differentiable simulation,” <em class="ltx_emph ltx_font_italic" id="bib.bib18.1.1">arXiv preprint arXiv:2410.15979</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib19"> <span class="ltx_tag ltx_tag_bibitem">[19]</span> <span class="ltx_bibblock"> H. Xu, Z. Yan, J. Xuan, G. Zhang, and J. Lu, “Improving proximal policy optimization with alpha divergence,” <em class="ltx_emph ltx_font_italic" id="bib.bib19.1.1">Neurocomputing</em>, vol. 534, pp. 94–105, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib20"> <span class="ltx_tag ltx_tag_bibitem">[20]</span> <span class="ltx_bibblock"> C. Xu, J. Ma, and H. Tao, “Batch process control based on reinforcement learning with segmented prioritized experience replay,” <em class="ltx_emph ltx_font_italic" id="bib.bib20.1.1">Measurement Science and Technology</em>, vol. 35, no. 5, p. 056202, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib21"> <span class="ltx_tag ltx_tag_bibitem">[21]</span> <span class="ltx_bibblock"> C. Liao, Y. Wang, X. Ding, Y. Ren, X. Duan, and J. He, “Performance comparison of typical physics engines using robot models with multiple joints,” <em class="ltx_emph ltx_font_italic" id="bib.bib21.1.1">IEEE Robotics and Automation Letters</em>, vol. 8, no. 11, pp. 7520–7526, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib22"> <span class="ltx_tag ltx_tag_bibitem">[22]</span> <span class="ltx_bibblock"> N. L. Leech and C. 
Donovan, <em class="ltx_emph ltx_font_italic" id="bib.bib22.1.1">Mixed methods sampling and data collection</em>.   International Encyclopedia of Education, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib23"> <span class="ltx_tag ltx_tag_bibitem">[23]</span> <span class="ltx_bibblock"> A. Kumar, A. S. Siddiqui, M. S. Mustafa, E. Hussam, H. M. Aljohani, and F. A. Almulhim, “Mean estimation using an efficient class of estimators based on simple random sampling: Simulation and applications,” <em class="ltx_emph ltx_font_italic" id="bib.bib23.1.1">Alexandria Engineering Journal</em>, vol. 91, pp. 197–203, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib24"> <span class="ltx_tag ltx_tag_bibitem">[24]</span> <span class="ltx_bibblock"> S. R. Lindemann, A. Yershova, and S. M. LaValle, “Incremental grid sampling strategies in robotics,” in <em class="ltx_emph ltx_font_italic" id="bib.bib24.1.1">Algorithmic Foundations of Robotics VI</em>.   Springer, 2005, pp. 313–328. </span> </li> <li class="ltx_bibitem" id="bib.bib25"> <span class="ltx_tag ltx_tag_bibitem">[25]</span> <span class="ltx_bibblock"> R. B. Ash, <em class="ltx_emph ltx_font_italic" id="bib.bib25.1.1">Information theory</em>.   Courier Corporation, 2012. </span> </li> <li class="ltx_bibitem" id="bib.bib26"> <span class="ltx_tag ltx_tag_bibitem">[26]</span> <span class="ltx_bibblock"> J. Gao, Z. Wang, T. Jin, J. Cheng, Z. Lei, and S. Gao, “Information gain ratio-based subfeature grouping empowers particle swarm optimization for feature selection,” <em class="ltx_emph ltx_font_italic" id="bib.bib26.1.1">Knowledge-Based Systems</em>, vol. 286, p. 111380, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib27"> <span class="ltx_tag ltx_tag_bibitem">[27]</span> <span class="ltx_bibblock"> Y.-C. Chen, “A tutorial on kernel density estimation and recent advances,” <em class="ltx_emph ltx_font_italic" id="bib.bib27.1.1">Biostatistics &amp; Epidemiology</em>, vol. 1, no. 1, pp. 161–187, 2017. 
</span> </li> <li class="ltx_bibitem" id="bib.bib28"> <span class="ltx_tag ltx_tag_bibitem">[28]</span> <span class="ltx_bibblock"> Y. Jia, G. Wang, Y. Dong, J. Wu, Y. Zeng, H. Ge, K. Ding, Z. Yan, W. Gu, C. Li, Z. Wang, Y. Cheng, W. Sui, R. Huang, and G. Zhou, “DISCOVERSE: Efficient robot simulation in complex high-fidelity environments,” 2024. [Online]. Available: <span class="ltx_ref ltx_nolink ltx_url ltx_ref_self">https://air-discoverse.github.io/</span> </span> </li> <li class="ltx_bibitem" id="bib.bib29"> <span class="ltx_tag ltx_tag_bibitem">[29]</span> <span class="ltx_bibblock"> U. Sara, M. Akter, and M. S. Uddin, “Image quality assessment through FSIM, SSIM, MSE and PSNR—a comparative study,” <em class="ltx_emph ltx_font_italic" id="bib.bib29.1.1">Journal of Computer and Communications</em>, vol. 7, no. 3, pp. 8–18, 2019. </span> </li> </ul> </section> <section class="ltx_section" id="S8"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">VIII </span><span class="ltx_text ltx_font_smallcaps" id="S8.1.1">Appendix</span> </h2> <section class="ltx_subsection" id="S8.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S8.SS1.4.1.1">VIII-A</span> </span><span class="ltx_text ltx_font_italic" id="S8.SS1.5.2">Algorithm</span> </h3> <div class="ltx_para" id="S8.SS1.p1"> <p class="ltx_p" id="S8.SS1.p1.1">The complete framework is summarized in Algorithm 1:</p> </div> <figure class="ltx_float ltx_float_algorithm ltx_framed ltx_framed_top" id="alg1"> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_float"><span class="ltx_text ltx_font_bold" id="alg1.2.1.1">Algorithm 1</span> </span> The RSR Loop Framework</figcaption> <div class="ltx_listing ltx_listing" id="alg1.3"> <div class="ltx_listingline" id="alg1.l1"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l1.1.1.1" style="font-size:80%;">1:</span></span>  <span 
class="ltx_text ltx_font_bold" id="alg1.l1.2">Initialize:</span> Initial policy <math alttext="\pi_{0}" class="ltx_Math" display="inline" id="alg1.l1.m1.1"><semantics id="alg1.l1.m1.1a"><msub id="alg1.l1.m1.1.1" xref="alg1.l1.m1.1.1.cmml"><mi id="alg1.l1.m1.1.1.2" xref="alg1.l1.m1.1.1.2.cmml">π</mi><mn id="alg1.l1.m1.1.1.3" xref="alg1.l1.m1.1.1.3.cmml">0</mn></msub><annotation-xml encoding="MathML-Content" id="alg1.l1.m1.1b"><apply id="alg1.l1.m1.1.1.cmml" xref="alg1.l1.m1.1.1"><csymbol cd="ambiguous" id="alg1.l1.m1.1.1.1.cmml" xref="alg1.l1.m1.1.1">subscript</csymbol><ci id="alg1.l1.m1.1.1.2.cmml" xref="alg1.l1.m1.1.1.2">𝜋</ci><cn id="alg1.l1.m1.1.1.3.cmml" type="integer" xref="alg1.l1.m1.1.1.3">0</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l1.m1.1c">\pi_{0}</annotation><annotation encoding="application/x-llamapun" id="alg1.l1.m1.1d">italic_π start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT</annotation></semantics></math>, environment parameters <math alttext="\theta_{0}" class="ltx_Math" display="inline" id="alg1.l1.m2.1"><semantics id="alg1.l1.m2.1a"><msub id="alg1.l1.m2.1.1" xref="alg1.l1.m2.1.1.cmml"><mi id="alg1.l1.m2.1.1.2" xref="alg1.l1.m2.1.1.2.cmml">θ</mi><mn id="alg1.l1.m2.1.1.3" xref="alg1.l1.m2.1.1.3.cmml">0</mn></msub><annotation-xml encoding="MathML-Content" id="alg1.l1.m2.1b"><apply id="alg1.l1.m2.1.1.cmml" xref="alg1.l1.m2.1.1"><csymbol cd="ambiguous" id="alg1.l1.m2.1.1.1.cmml" xref="alg1.l1.m2.1.1">subscript</csymbol><ci id="alg1.l1.m2.1.1.2.cmml" xref="alg1.l1.m2.1.1.2">𝜃</ci><cn id="alg1.l1.m2.1.1.3.cmml" type="integer" xref="alg1.l1.m2.1.1.3">0</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l1.m2.1c">\theta_{0}</annotation><annotation encoding="application/x-llamapun" id="alg1.l1.m2.1d">italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT</annotation></semantics></math>, and initial real dataset <math alttext="\mathcal{D}_{real}^{0}" class="ltx_Math" display="inline" 
id="alg1.l1.m3.1"><semantics id="alg1.l1.m3.1a"><msubsup id="alg1.l1.m3.1.1" xref="alg1.l1.m3.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="alg1.l1.m3.1.1.2.2" xref="alg1.l1.m3.1.1.2.2.cmml">𝒟</mi><mrow id="alg1.l1.m3.1.1.2.3" xref="alg1.l1.m3.1.1.2.3.cmml"><mi id="alg1.l1.m3.1.1.2.3.2" xref="alg1.l1.m3.1.1.2.3.2.cmml">r</mi><mo id="alg1.l1.m3.1.1.2.3.1" xref="alg1.l1.m3.1.1.2.3.1.cmml">⁢</mo><mi id="alg1.l1.m3.1.1.2.3.3" xref="alg1.l1.m3.1.1.2.3.3.cmml">e</mi><mo id="alg1.l1.m3.1.1.2.3.1a" xref="alg1.l1.m3.1.1.2.3.1.cmml">⁢</mo><mi id="alg1.l1.m3.1.1.2.3.4" xref="alg1.l1.m3.1.1.2.3.4.cmml">a</mi><mo id="alg1.l1.m3.1.1.2.3.1b" xref="alg1.l1.m3.1.1.2.3.1.cmml">⁢</mo><mi id="alg1.l1.m3.1.1.2.3.5" xref="alg1.l1.m3.1.1.2.3.5.cmml">l</mi></mrow><mn id="alg1.l1.m3.1.1.3" xref="alg1.l1.m3.1.1.3.cmml">0</mn></msubsup><annotation-xml encoding="MathML-Content" id="alg1.l1.m3.1b"><apply id="alg1.l1.m3.1.1.cmml" xref="alg1.l1.m3.1.1"><csymbol cd="ambiguous" id="alg1.l1.m3.1.1.1.cmml" xref="alg1.l1.m3.1.1">superscript</csymbol><apply id="alg1.l1.m3.1.1.2.cmml" xref="alg1.l1.m3.1.1"><csymbol cd="ambiguous" id="alg1.l1.m3.1.1.2.1.cmml" xref="alg1.l1.m3.1.1">subscript</csymbol><ci id="alg1.l1.m3.1.1.2.2.cmml" xref="alg1.l1.m3.1.1.2.2">𝒟</ci><apply id="alg1.l1.m3.1.1.2.3.cmml" xref="alg1.l1.m3.1.1.2.3"><times id="alg1.l1.m3.1.1.2.3.1.cmml" xref="alg1.l1.m3.1.1.2.3.1"></times><ci id="alg1.l1.m3.1.1.2.3.2.cmml" xref="alg1.l1.m3.1.1.2.3.2">𝑟</ci><ci id="alg1.l1.m3.1.1.2.3.3.cmml" xref="alg1.l1.m3.1.1.2.3.3">𝑒</ci><ci id="alg1.l1.m3.1.1.2.3.4.cmml" xref="alg1.l1.m3.1.1.2.3.4">𝑎</ci><ci id="alg1.l1.m3.1.1.2.3.5.cmml" xref="alg1.l1.m3.1.1.2.3.5">𝑙</ci></apply></apply><cn id="alg1.l1.m3.1.1.3.cmml" type="integer" xref="alg1.l1.m3.1.1.3">0</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l1.m3.1c">\mathcal{D}_{real}^{0}</annotation><annotation encoding="application/x-llamapun" id="alg1.l1.m3.1d">caligraphic_D start_POSTSUBSCRIPT italic_r italic_e italic_a 
italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT</annotation></semantics></math> collected from the deployment of <math alttext="\pi_{0}" class="ltx_Math" display="inline" id="alg1.l1.m4.1"><semantics id="alg1.l1.m4.1a"><msub id="alg1.l1.m4.1.1" xref="alg1.l1.m4.1.1.cmml"><mi id="alg1.l1.m4.1.1.2" xref="alg1.l1.m4.1.1.2.cmml">π</mi><mn id="alg1.l1.m4.1.1.3" xref="alg1.l1.m4.1.1.3.cmml">0</mn></msub><annotation-xml encoding="MathML-Content" id="alg1.l1.m4.1b"><apply id="alg1.l1.m4.1.1.cmml" xref="alg1.l1.m4.1.1"><csymbol cd="ambiguous" id="alg1.l1.m4.1.1.1.cmml" xref="alg1.l1.m4.1.1">subscript</csymbol><ci id="alg1.l1.m4.1.1.2.cmml" xref="alg1.l1.m4.1.1.2">𝜋</ci><cn id="alg1.l1.m4.1.1.3.cmml" type="integer" xref="alg1.l1.m4.1.1.3">0</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l1.m4.1c">\pi_{0}</annotation><annotation encoding="application/x-llamapun" id="alg1.l1.m4.1d">italic_π start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT</annotation></semantics></math> on the real robot. 
</div> <div class="ltx_listingline" id="alg1.l2"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l2.1.1.1" style="font-size:80%;">2:</span></span>  <span class="ltx_text ltx_font_bold" id="alg1.l2.2">while</span> not converged, at iteration <math alttext="k" class="ltx_Math" display="inline" id="alg1.l2.m1.1"><semantics id="alg1.l2.m1.1a"><mi id="alg1.l2.m1.1.1" xref="alg1.l2.m1.1.1.cmml">k</mi><annotation-xml encoding="MathML-Content" id="alg1.l2.m1.1b"><ci id="alg1.l2.m1.1.1.cmml" xref="alg1.l2.m1.1.1">𝑘</ci></annotation-xml><annotation encoding="application/x-tex" id="alg1.l2.m1.1c">k</annotation><annotation encoding="application/x-llamapun" id="alg1.l2.m1.1d">italic_k</annotation></semantics></math> <span class="ltx_text ltx_font_bold" id="alg1.l2.3">do</span> </div> <div class="ltx_listingline" id="alg1.l3"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l3.1.1.1" style="font-size:80%;">3:</span></span>     <span class="ltx_text ltx_font_bold" id="alg1.l3.2">Step 1:</span> <span class="ltx_text ltx_font_bold" id="alg1.l3.3">Environment Parameter Tuning</span> </div> <div class="ltx_listingline" id="alg1.l4"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l4.1.1.1" style="font-size:80%;">4:</span></span>      Initialize <math alttext="\theta_{i=0,k-1}=\theta_{k-1}" class="ltx_Math" display="inline" id="alg1.l4.m1.2"><semantics id="alg1.l4.m1.2a"><mrow id="alg1.l4.m1.2.3" xref="alg1.l4.m1.2.3.cmml"><msub id="alg1.l4.m1.2.3.2" xref="alg1.l4.m1.2.3.2.cmml"><mi id="alg1.l4.m1.2.3.2.2" xref="alg1.l4.m1.2.3.2.2.cmml">θ</mi><mrow id="alg1.l4.m1.2.2.2" xref="alg1.l4.m1.2.2.2.cmml"><mi id="alg1.l4.m1.2.2.2.4" xref="alg1.l4.m1.2.2.2.4.cmml">i</mi><mo id="alg1.l4.m1.2.2.2.3" xref="alg1.l4.m1.2.2.2.3.cmml">=</mo><mrow id="alg1.l4.m1.2.2.2.2.1" xref="alg1.l4.m1.2.2.2.2.2.cmml"><mn id="alg1.l4.m1.1.1.1.1" xref="alg1.l4.m1.1.1.1.1.cmml">0</mn><mo id="alg1.l4.m1.2.2.2.2.1.2" 
xref="alg1.l4.m1.2.2.2.2.2.cmml">,</mo><mrow id="alg1.l4.m1.2.2.2.2.1.1" xref="alg1.l4.m1.2.2.2.2.1.1.cmml"><mi id="alg1.l4.m1.2.2.2.2.1.1.2" xref="alg1.l4.m1.2.2.2.2.1.1.2.cmml">k</mi><mo id="alg1.l4.m1.2.2.2.2.1.1.1" xref="alg1.l4.m1.2.2.2.2.1.1.1.cmml">−</mo><mn id="alg1.l4.m1.2.2.2.2.1.1.3" xref="alg1.l4.m1.2.2.2.2.1.1.3.cmml">1</mn></mrow></mrow></mrow></msub><mo id="alg1.l4.m1.2.3.1" xref="alg1.l4.m1.2.3.1.cmml">=</mo><msub id="alg1.l4.m1.2.3.3" xref="alg1.l4.m1.2.3.3.cmml"><mi id="alg1.l4.m1.2.3.3.2" xref="alg1.l4.m1.2.3.3.2.cmml">θ</mi><mrow id="alg1.l4.m1.2.3.3.3" xref="alg1.l4.m1.2.3.3.3.cmml"><mi id="alg1.l4.m1.2.3.3.3.2" xref="alg1.l4.m1.2.3.3.3.2.cmml">k</mi><mo id="alg1.l4.m1.2.3.3.3.1" xref="alg1.l4.m1.2.3.3.3.1.cmml">−</mo><mn id="alg1.l4.m1.2.3.3.3.3" xref="alg1.l4.m1.2.3.3.3.3.cmml">1</mn></mrow></msub></mrow><annotation-xml encoding="MathML-Content" id="alg1.l4.m1.2b"><apply id="alg1.l4.m1.2.3.cmml" xref="alg1.l4.m1.2.3"><eq id="alg1.l4.m1.2.3.1.cmml" xref="alg1.l4.m1.2.3.1"></eq><apply id="alg1.l4.m1.2.3.2.cmml" xref="alg1.l4.m1.2.3.2"><csymbol cd="ambiguous" id="alg1.l4.m1.2.3.2.1.cmml" xref="alg1.l4.m1.2.3.2">subscript</csymbol><ci id="alg1.l4.m1.2.3.2.2.cmml" xref="alg1.l4.m1.2.3.2.2">𝜃</ci><apply id="alg1.l4.m1.2.2.2.cmml" xref="alg1.l4.m1.2.2.2"><eq id="alg1.l4.m1.2.2.2.3.cmml" xref="alg1.l4.m1.2.2.2.3"></eq><ci id="alg1.l4.m1.2.2.2.4.cmml" xref="alg1.l4.m1.2.2.2.4">𝑖</ci><list id="alg1.l4.m1.2.2.2.2.2.cmml" xref="alg1.l4.m1.2.2.2.2.1"><cn id="alg1.l4.m1.1.1.1.1.cmml" type="integer" xref="alg1.l4.m1.1.1.1.1">0</cn><apply id="alg1.l4.m1.2.2.2.2.1.1.cmml" xref="alg1.l4.m1.2.2.2.2.1.1"><minus id="alg1.l4.m1.2.2.2.2.1.1.1.cmml" xref="alg1.l4.m1.2.2.2.2.1.1.1"></minus><ci id="alg1.l4.m1.2.2.2.2.1.1.2.cmml" xref="alg1.l4.m1.2.2.2.2.1.1.2">𝑘</ci><cn id="alg1.l4.m1.2.2.2.2.1.1.3.cmml" type="integer" xref="alg1.l4.m1.2.2.2.2.1.1.3">1</cn></apply></list></apply></apply><apply id="alg1.l4.m1.2.3.3.cmml" xref="alg1.l4.m1.2.3.3"><csymbol cd="ambiguous" 
id="alg1.l4.m1.2.3.3.1.cmml" xref="alg1.l4.m1.2.3.3">subscript</csymbol><ci id="alg1.l4.m1.2.3.3.2.cmml" xref="alg1.l4.m1.2.3.3.2">𝜃</ci><apply id="alg1.l4.m1.2.3.3.3.cmml" xref="alg1.l4.m1.2.3.3.3"><minus id="alg1.l4.m1.2.3.3.3.1.cmml" xref="alg1.l4.m1.2.3.3.3.1"></minus><ci id="alg1.l4.m1.2.3.3.3.2.cmml" xref="alg1.l4.m1.2.3.3.3.2">𝑘</ci><cn id="alg1.l4.m1.2.3.3.3.3.cmml" type="integer" xref="alg1.l4.m1.2.3.3.3.3">1</cn></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l4.m1.2c">\theta_{i=0,k-1}=\theta_{k-1}</annotation><annotation encoding="application/x-llamapun" id="alg1.l4.m1.2d">italic_θ start_POSTSUBSCRIPT italic_i = 0 , italic_k - 1 end_POSTSUBSCRIPT = italic_θ start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT</annotation></semantics></math>, compute the corresponding loss function <math alttext="\mathcal{L}(\theta_{i,k-1})" class="ltx_Math" display="inline" id="alg1.l4.m2.3"><semantics id="alg1.l4.m2.3a"><mrow id="alg1.l4.m2.3.3" xref="alg1.l4.m2.3.3.cmml"><mi class="ltx_font_mathcaligraphic" id="alg1.l4.m2.3.3.3" xref="alg1.l4.m2.3.3.3.cmml">ℒ</mi><mo id="alg1.l4.m2.3.3.2" xref="alg1.l4.m2.3.3.2.cmml">⁢</mo><mrow id="alg1.l4.m2.3.3.1.1" xref="alg1.l4.m2.3.3.1.1.1.cmml"><mo id="alg1.l4.m2.3.3.1.1.2" stretchy="false" xref="alg1.l4.m2.3.3.1.1.1.cmml">(</mo><msub id="alg1.l4.m2.3.3.1.1.1" xref="alg1.l4.m2.3.3.1.1.1.cmml"><mi id="alg1.l4.m2.3.3.1.1.1.2" xref="alg1.l4.m2.3.3.1.1.1.2.cmml">θ</mi><mrow id="alg1.l4.m2.2.2.2.2" xref="alg1.l4.m2.2.2.2.3.cmml"><mi id="alg1.l4.m2.1.1.1.1" xref="alg1.l4.m2.1.1.1.1.cmml">i</mi><mo id="alg1.l4.m2.2.2.2.2.2" xref="alg1.l4.m2.2.2.2.3.cmml">,</mo><mrow id="alg1.l4.m2.2.2.2.2.1" xref="alg1.l4.m2.2.2.2.2.1.cmml"><mi id="alg1.l4.m2.2.2.2.2.1.2" xref="alg1.l4.m2.2.2.2.2.1.2.cmml">k</mi><mo id="alg1.l4.m2.2.2.2.2.1.1" xref="alg1.l4.m2.2.2.2.2.1.1.cmml">−</mo><mn id="alg1.l4.m2.2.2.2.2.1.3" xref="alg1.l4.m2.2.2.2.2.1.3.cmml">1</mn></mrow></mrow></msub><mo id="alg1.l4.m2.3.3.1.1.3" 
stretchy="false" xref="alg1.l4.m2.3.3.1.1.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="alg1.l4.m2.3b"><apply id="alg1.l4.m2.3.3.cmml" xref="alg1.l4.m2.3.3"><times id="alg1.l4.m2.3.3.2.cmml" xref="alg1.l4.m2.3.3.2"></times><ci id="alg1.l4.m2.3.3.3.cmml" xref="alg1.l4.m2.3.3.3">ℒ</ci><apply id="alg1.l4.m2.3.3.1.1.1.cmml" xref="alg1.l4.m2.3.3.1.1"><csymbol cd="ambiguous" id="alg1.l4.m2.3.3.1.1.1.1.cmml" xref="alg1.l4.m2.3.3.1.1">subscript</csymbol><ci id="alg1.l4.m2.3.3.1.1.1.2.cmml" xref="alg1.l4.m2.3.3.1.1.1.2">𝜃</ci><list id="alg1.l4.m2.2.2.2.3.cmml" xref="alg1.l4.m2.2.2.2.2"><ci id="alg1.l4.m2.1.1.1.1.cmml" xref="alg1.l4.m2.1.1.1.1">𝑖</ci><apply id="alg1.l4.m2.2.2.2.2.1.cmml" xref="alg1.l4.m2.2.2.2.2.1"><minus id="alg1.l4.m2.2.2.2.2.1.1.cmml" xref="alg1.l4.m2.2.2.2.2.1.1"></minus><ci id="alg1.l4.m2.2.2.2.2.1.2.cmml" xref="alg1.l4.m2.2.2.2.2.1.2">𝑘</ci><cn id="alg1.l4.m2.2.2.2.2.1.3.cmml" type="integer" xref="alg1.l4.m2.2.2.2.2.1.3">1</cn></apply></list></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l4.m2.3c">\mathcal{L}(\theta_{i,k-1})</annotation><annotation encoding="application/x-llamapun" id="alg1.l4.m2.3d">caligraphic_L ( italic_θ start_POSTSUBSCRIPT italic_i , italic_k - 1 end_POSTSUBSCRIPT )</annotation></semantics></math> <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S3.E2" title="In III-B Simulation-Env Parameter Tuning Process ‣ III Method ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_tag">2</span></a> for each parameter <math alttext="\theta_{i,k-1}" class="ltx_Math" display="inline" id="alg1.l4.m3.2"><semantics id="alg1.l4.m3.2a"><msub id="alg1.l4.m3.2.3" xref="alg1.l4.m3.2.3.cmml"><mi id="alg1.l4.m3.2.3.2" xref="alg1.l4.m3.2.3.2.cmml">θ</mi><mrow id="alg1.l4.m3.2.2.2.2" xref="alg1.l4.m3.2.2.2.3.cmml"><mi id="alg1.l4.m3.1.1.1.1" xref="alg1.l4.m3.1.1.1.1.cmml">i</mi><mo 
id="alg1.l4.m3.2.2.2.2.2" xref="alg1.l4.m3.2.2.2.3.cmml">,</mo><mrow id="alg1.l4.m3.2.2.2.2.1" xref="alg1.l4.m3.2.2.2.2.1.cmml"><mi id="alg1.l4.m3.2.2.2.2.1.2" xref="alg1.l4.m3.2.2.2.2.1.2.cmml">k</mi><mo id="alg1.l4.m3.2.2.2.2.1.1" xref="alg1.l4.m3.2.2.2.2.1.1.cmml">−</mo><mn id="alg1.l4.m3.2.2.2.2.1.3" xref="alg1.l4.m3.2.2.2.2.1.3.cmml">1</mn></mrow></mrow></msub><annotation-xml encoding="MathML-Content" id="alg1.l4.m3.2b"><apply id="alg1.l4.m3.2.3.cmml" xref="alg1.l4.m3.2.3"><csymbol cd="ambiguous" id="alg1.l4.m3.2.3.1.cmml" xref="alg1.l4.m3.2.3">subscript</csymbol><ci id="alg1.l4.m3.2.3.2.cmml" xref="alg1.l4.m3.2.3.2">𝜃</ci><list id="alg1.l4.m3.2.2.2.3.cmml" xref="alg1.l4.m3.2.2.2.2"><ci id="alg1.l4.m3.1.1.1.1.cmml" xref="alg1.l4.m3.1.1.1.1">𝑖</ci><apply id="alg1.l4.m3.2.2.2.2.1.cmml" xref="alg1.l4.m3.2.2.2.2.1"><minus id="alg1.l4.m3.2.2.2.2.1.1.cmml" xref="alg1.l4.m3.2.2.2.2.1.1"></minus><ci id="alg1.l4.m3.2.2.2.2.1.2.cmml" xref="alg1.l4.m3.2.2.2.2.1.2">𝑘</ci><cn id="alg1.l4.m3.2.2.2.2.1.3.cmml" type="integer" xref="alg1.l4.m3.2.2.2.2.1.3">1</cn></apply></list></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l4.m3.2c">\theta_{i,k-1}</annotation><annotation encoding="application/x-llamapun" id="alg1.l4.m3.2d">italic_θ start_POSTSUBSCRIPT italic_i , italic_k - 1 end_POSTSUBSCRIPT</annotation></semantics></math> with the new real dataset <math alttext="\mathcal{D}_{real}^{k}" class="ltx_Math" display="inline" id="alg1.l4.m4.1"><semantics id="alg1.l4.m4.1a"><msubsup id="alg1.l4.m4.1.1" xref="alg1.l4.m4.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="alg1.l4.m4.1.1.2.2" xref="alg1.l4.m4.1.1.2.2.cmml">𝒟</mi><mrow id="alg1.l4.m4.1.1.2.3" xref="alg1.l4.m4.1.1.2.3.cmml"><mi id="alg1.l4.m4.1.1.2.3.2" xref="alg1.l4.m4.1.1.2.3.2.cmml">r</mi><mo id="alg1.l4.m4.1.1.2.3.1" xref="alg1.l4.m4.1.1.2.3.1.cmml">⁢</mo><mi id="alg1.l4.m4.1.1.2.3.3" xref="alg1.l4.m4.1.1.2.3.3.cmml">e</mi><mo id="alg1.l4.m4.1.1.2.3.1a" 
xref="alg1.l4.m4.1.1.2.3.1.cmml">⁢</mo><mi id="alg1.l4.m4.1.1.2.3.4" xref="alg1.l4.m4.1.1.2.3.4.cmml">a</mi><mo id="alg1.l4.m4.1.1.2.3.1b" xref="alg1.l4.m4.1.1.2.3.1.cmml">⁢</mo><mi id="alg1.l4.m4.1.1.2.3.5" xref="alg1.l4.m4.1.1.2.3.5.cmml">l</mi></mrow><mi id="alg1.l4.m4.1.1.3" xref="alg1.l4.m4.1.1.3.cmml">k</mi></msubsup><annotation-xml encoding="MathML-Content" id="alg1.l4.m4.1b"><apply id="alg1.l4.m4.1.1.cmml" xref="alg1.l4.m4.1.1"><csymbol cd="ambiguous" id="alg1.l4.m4.1.1.1.cmml" xref="alg1.l4.m4.1.1">superscript</csymbol><apply id="alg1.l4.m4.1.1.2.cmml" xref="alg1.l4.m4.1.1"><csymbol cd="ambiguous" id="alg1.l4.m4.1.1.2.1.cmml" xref="alg1.l4.m4.1.1">subscript</csymbol><ci id="alg1.l4.m4.1.1.2.2.cmml" xref="alg1.l4.m4.1.1.2.2">𝒟</ci><apply id="alg1.l4.m4.1.1.2.3.cmml" xref="alg1.l4.m4.1.1.2.3"><times id="alg1.l4.m4.1.1.2.3.1.cmml" xref="alg1.l4.m4.1.1.2.3.1"></times><ci id="alg1.l4.m4.1.1.2.3.2.cmml" xref="alg1.l4.m4.1.1.2.3.2">𝑟</ci><ci id="alg1.l4.m4.1.1.2.3.3.cmml" xref="alg1.l4.m4.1.1.2.3.3">𝑒</ci><ci id="alg1.l4.m4.1.1.2.3.4.cmml" xref="alg1.l4.m4.1.1.2.3.4">𝑎</ci><ci id="alg1.l4.m4.1.1.2.3.5.cmml" xref="alg1.l4.m4.1.1.2.3.5">𝑙</ci></apply></apply><ci id="alg1.l4.m4.1.1.3.cmml" xref="alg1.l4.m4.1.1.3">𝑘</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l4.m4.1c">\mathcal{D}_{real}^{k}</annotation><annotation encoding="application/x-llamapun" id="alg1.l4.m4.1d">caligraphic_D start_POSTSUBSCRIPT italic_r italic_e italic_a italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT</annotation></semantics></math>. 
Update the environment parameters <math alttext="\theta_{i,k-1}" class="ltx_Math" display="inline" id="alg1.l4.m5.2"><semantics id="alg1.l4.m5.2a"><msub id="alg1.l4.m5.2.3" xref="alg1.l4.m5.2.3.cmml"><mi id="alg1.l4.m5.2.3.2" xref="alg1.l4.m5.2.3.2.cmml">θ</mi><mrow id="alg1.l4.m5.2.2.2.2" xref="alg1.l4.m5.2.2.2.3.cmml"><mi id="alg1.l4.m5.1.1.1.1" xref="alg1.l4.m5.1.1.1.1.cmml">i</mi><mo id="alg1.l4.m5.2.2.2.2.2" xref="alg1.l4.m5.2.2.2.3.cmml">,</mo><mrow id="alg1.l4.m5.2.2.2.2.1" xref="alg1.l4.m5.2.2.2.2.1.cmml"><mi id="alg1.l4.m5.2.2.2.2.1.2" xref="alg1.l4.m5.2.2.2.2.1.2.cmml">k</mi><mo id="alg1.l4.m5.2.2.2.2.1.1" xref="alg1.l4.m5.2.2.2.2.1.1.cmml">−</mo><mn id="alg1.l4.m5.2.2.2.2.1.3" xref="alg1.l4.m5.2.2.2.2.1.3.cmml">1</mn></mrow></mrow></msub><annotation-xml encoding="MathML-Content" id="alg1.l4.m5.2b"><apply id="alg1.l4.m5.2.3.cmml" xref="alg1.l4.m5.2.3"><csymbol cd="ambiguous" id="alg1.l4.m5.2.3.1.cmml" xref="alg1.l4.m5.2.3">subscript</csymbol><ci id="alg1.l4.m5.2.3.2.cmml" xref="alg1.l4.m5.2.3.2">𝜃</ci><list id="alg1.l4.m5.2.2.2.3.cmml" xref="alg1.l4.m5.2.2.2.2"><ci id="alg1.l4.m5.1.1.1.1.cmml" xref="alg1.l4.m5.1.1.1.1">𝑖</ci><apply id="alg1.l4.m5.2.2.2.2.1.cmml" xref="alg1.l4.m5.2.2.2.2.1"><minus id="alg1.l4.m5.2.2.2.2.1.1.cmml" xref="alg1.l4.m5.2.2.2.2.1.1"></minus><ci id="alg1.l4.m5.2.2.2.2.1.2.cmml" xref="alg1.l4.m5.2.2.2.2.1.2">𝑘</ci><cn id="alg1.l4.m5.2.2.2.2.1.3.cmml" type="integer" xref="alg1.l4.m5.2.2.2.2.1.3">1</cn></apply></list></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l4.m5.2c">\theta_{i,k-1}</annotation><annotation encoding="application/x-llamapun" id="alg1.l4.m5.2d">italic_θ start_POSTSUBSCRIPT italic_i , italic_k - 1 end_POSTSUBSCRIPT</annotation></semantics></math> using gradient descent <table class="ltx_equation ltx_eqn_table" id="S8.Ex5"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math 
alttext="\theta_{i,k-1}\leftarrow\theta_{i-1,k-1}-\alpha\nabla_{\theta}\mathcal{L}(% \theta_{i-1,k-1})" class="ltx_Math" display="block" id="S8.Ex5.m1.7"><semantics id="S8.Ex5.m1.7a"><mrow id="S8.Ex5.m1.7.7" xref="S8.Ex5.m1.7.7.cmml"><msub id="S8.Ex5.m1.7.7.3" xref="S8.Ex5.m1.7.7.3.cmml"><mi id="S8.Ex5.m1.7.7.3.2" xref="S8.Ex5.m1.7.7.3.2.cmml">θ</mi><mrow id="S8.Ex5.m1.2.2.2.2" xref="S8.Ex5.m1.2.2.2.3.cmml"><mi id="S8.Ex5.m1.1.1.1.1" xref="S8.Ex5.m1.1.1.1.1.cmml">i</mi><mo id="S8.Ex5.m1.2.2.2.2.2" xref="S8.Ex5.m1.2.2.2.3.cmml">,</mo><mrow id="S8.Ex5.m1.2.2.2.2.1" xref="S8.Ex5.m1.2.2.2.2.1.cmml"><mi id="S8.Ex5.m1.2.2.2.2.1.2" xref="S8.Ex5.m1.2.2.2.2.1.2.cmml">k</mi><mo id="S8.Ex5.m1.2.2.2.2.1.1" xref="S8.Ex5.m1.2.2.2.2.1.1.cmml">−</mo><mn id="S8.Ex5.m1.2.2.2.2.1.3" xref="S8.Ex5.m1.2.2.2.2.1.3.cmml">1</mn></mrow></mrow></msub><mo id="S8.Ex5.m1.7.7.2" stretchy="false" xref="S8.Ex5.m1.7.7.2.cmml">←</mo><mrow id="S8.Ex5.m1.7.7.1" xref="S8.Ex5.m1.7.7.1.cmml"><msub id="S8.Ex5.m1.7.7.1.3" xref="S8.Ex5.m1.7.7.1.3.cmml"><mi id="S8.Ex5.m1.7.7.1.3.2" xref="S8.Ex5.m1.7.7.1.3.2.cmml">θ</mi><mrow id="S8.Ex5.m1.4.4.2.2" xref="S8.Ex5.m1.4.4.2.3.cmml"><mrow id="S8.Ex5.m1.3.3.1.1.1" xref="S8.Ex5.m1.3.3.1.1.1.cmml"><mi id="S8.Ex5.m1.3.3.1.1.1.2" xref="S8.Ex5.m1.3.3.1.1.1.2.cmml">i</mi><mo id="S8.Ex5.m1.3.3.1.1.1.1" xref="S8.Ex5.m1.3.3.1.1.1.1.cmml">−</mo><mn id="S8.Ex5.m1.3.3.1.1.1.3" xref="S8.Ex5.m1.3.3.1.1.1.3.cmml">1</mn></mrow><mo id="S8.Ex5.m1.4.4.2.2.3" xref="S8.Ex5.m1.4.4.2.3.cmml">,</mo><mrow id="S8.Ex5.m1.4.4.2.2.2" xref="S8.Ex5.m1.4.4.2.2.2.cmml"><mi id="S8.Ex5.m1.4.4.2.2.2.2" xref="S8.Ex5.m1.4.4.2.2.2.2.cmml">k</mi><mo id="S8.Ex5.m1.4.4.2.2.2.1" xref="S8.Ex5.m1.4.4.2.2.2.1.cmml">−</mo><mn id="S8.Ex5.m1.4.4.2.2.2.3" xref="S8.Ex5.m1.4.4.2.2.2.3.cmml">1</mn></mrow></mrow></msub><mo id="S8.Ex5.m1.7.7.1.2" xref="S8.Ex5.m1.7.7.1.2.cmml">−</mo><mrow id="S8.Ex5.m1.7.7.1.1" xref="S8.Ex5.m1.7.7.1.1.cmml"><mi id="S8.Ex5.m1.7.7.1.1.3" xref="S8.Ex5.m1.7.7.1.1.3.cmml">α</mi><mo 
id="S8.Ex5.m1.7.7.1.1.2" lspace="0.167em" xref="S8.Ex5.m1.7.7.1.1.2.cmml">⁢</mo><mrow id="S8.Ex5.m1.7.7.1.1.4" xref="S8.Ex5.m1.7.7.1.1.4.cmml"><msub id="S8.Ex5.m1.7.7.1.1.4.1" xref="S8.Ex5.m1.7.7.1.1.4.1.cmml"><mo id="S8.Ex5.m1.7.7.1.1.4.1.2" rspace="0.167em" xref="S8.Ex5.m1.7.7.1.1.4.1.2.cmml">∇</mo><mi id="S8.Ex5.m1.7.7.1.1.4.1.3" xref="S8.Ex5.m1.7.7.1.1.4.1.3.cmml">θ</mi></msub><mi class="ltx_font_mathcaligraphic" id="S8.Ex5.m1.7.7.1.1.4.2" xref="S8.Ex5.m1.7.7.1.1.4.2.cmml">ℒ</mi></mrow><mo id="S8.Ex5.m1.7.7.1.1.2a" xref="S8.Ex5.m1.7.7.1.1.2.cmml">⁢</mo><mrow id="S8.Ex5.m1.7.7.1.1.1.1" xref="S8.Ex5.m1.7.7.1.1.1.1.1.cmml"><mo id="S8.Ex5.m1.7.7.1.1.1.1.2" stretchy="false" xref="S8.Ex5.m1.7.7.1.1.1.1.1.cmml">(</mo><msub id="S8.Ex5.m1.7.7.1.1.1.1.1" xref="S8.Ex5.m1.7.7.1.1.1.1.1.cmml"><mi id="S8.Ex5.m1.7.7.1.1.1.1.1.2" xref="S8.Ex5.m1.7.7.1.1.1.1.1.2.cmml">θ</mi><mrow id="S8.Ex5.m1.6.6.2.2" xref="S8.Ex5.m1.6.6.2.3.cmml"><mrow id="S8.Ex5.m1.5.5.1.1.1" xref="S8.Ex5.m1.5.5.1.1.1.cmml"><mi id="S8.Ex5.m1.5.5.1.1.1.2" xref="S8.Ex5.m1.5.5.1.1.1.2.cmml">i</mi><mo id="S8.Ex5.m1.5.5.1.1.1.1" xref="S8.Ex5.m1.5.5.1.1.1.1.cmml">−</mo><mn id="S8.Ex5.m1.5.5.1.1.1.3" xref="S8.Ex5.m1.5.5.1.1.1.3.cmml">1</mn></mrow><mo id="S8.Ex5.m1.6.6.2.2.3" xref="S8.Ex5.m1.6.6.2.3.cmml">,</mo><mrow id="S8.Ex5.m1.6.6.2.2.2" xref="S8.Ex5.m1.6.6.2.2.2.cmml"><mi id="S8.Ex5.m1.6.6.2.2.2.2" xref="S8.Ex5.m1.6.6.2.2.2.2.cmml">k</mi><mo id="S8.Ex5.m1.6.6.2.2.2.1" xref="S8.Ex5.m1.6.6.2.2.2.1.cmml">−</mo><mn id="S8.Ex5.m1.6.6.2.2.2.3" xref="S8.Ex5.m1.6.6.2.2.2.3.cmml">1</mn></mrow></mrow></msub><mo id="S8.Ex5.m1.7.7.1.1.1.1.3" stretchy="false" xref="S8.Ex5.m1.7.7.1.1.1.1.1.cmml">)</mo></mrow></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S8.Ex5.m1.7b"><apply id="S8.Ex5.m1.7.7.cmml" xref="S8.Ex5.m1.7.7"><ci id="S8.Ex5.m1.7.7.2.cmml" xref="S8.Ex5.m1.7.7.2">←</ci><apply id="S8.Ex5.m1.7.7.3.cmml" xref="S8.Ex5.m1.7.7.3"><csymbol cd="ambiguous" id="S8.Ex5.m1.7.7.3.1.cmml" 
xref="S8.Ex5.m1.7.7.3">subscript</csymbol><ci id="S8.Ex5.m1.7.7.3.2.cmml" xref="S8.Ex5.m1.7.7.3.2">𝜃</ci><list id="S8.Ex5.m1.2.2.2.3.cmml" xref="S8.Ex5.m1.2.2.2.2"><ci id="S8.Ex5.m1.1.1.1.1.cmml" xref="S8.Ex5.m1.1.1.1.1">𝑖</ci><apply id="S8.Ex5.m1.2.2.2.2.1.cmml" xref="S8.Ex5.m1.2.2.2.2.1"><minus id="S8.Ex5.m1.2.2.2.2.1.1.cmml" xref="S8.Ex5.m1.2.2.2.2.1.1"></minus><ci id="S8.Ex5.m1.2.2.2.2.1.2.cmml" xref="S8.Ex5.m1.2.2.2.2.1.2">𝑘</ci><cn id="S8.Ex5.m1.2.2.2.2.1.3.cmml" type="integer" xref="S8.Ex5.m1.2.2.2.2.1.3">1</cn></apply></list></apply><apply id="S8.Ex5.m1.7.7.1.cmml" xref="S8.Ex5.m1.7.7.1"><minus id="S8.Ex5.m1.7.7.1.2.cmml" xref="S8.Ex5.m1.7.7.1.2"></minus><apply id="S8.Ex5.m1.7.7.1.3.cmml" xref="S8.Ex5.m1.7.7.1.3"><csymbol cd="ambiguous" id="S8.Ex5.m1.7.7.1.3.1.cmml" xref="S8.Ex5.m1.7.7.1.3">subscript</csymbol><ci id="S8.Ex5.m1.7.7.1.3.2.cmml" xref="S8.Ex5.m1.7.7.1.3.2">𝜃</ci><list id="S8.Ex5.m1.4.4.2.3.cmml" xref="S8.Ex5.m1.4.4.2.2"><apply id="S8.Ex5.m1.3.3.1.1.1.cmml" xref="S8.Ex5.m1.3.3.1.1.1"><minus id="S8.Ex5.m1.3.3.1.1.1.1.cmml" xref="S8.Ex5.m1.3.3.1.1.1.1"></minus><ci id="S8.Ex5.m1.3.3.1.1.1.2.cmml" xref="S8.Ex5.m1.3.3.1.1.1.2">𝑖</ci><cn id="S8.Ex5.m1.3.3.1.1.1.3.cmml" type="integer" xref="S8.Ex5.m1.3.3.1.1.1.3">1</cn></apply><apply id="S8.Ex5.m1.4.4.2.2.2.cmml" xref="S8.Ex5.m1.4.4.2.2.2"><minus id="S8.Ex5.m1.4.4.2.2.2.1.cmml" xref="S8.Ex5.m1.4.4.2.2.2.1"></minus><ci id="S8.Ex5.m1.4.4.2.2.2.2.cmml" xref="S8.Ex5.m1.4.4.2.2.2.2">𝑘</ci><cn id="S8.Ex5.m1.4.4.2.2.2.3.cmml" type="integer" xref="S8.Ex5.m1.4.4.2.2.2.3">1</cn></apply></list></apply><apply id="S8.Ex5.m1.7.7.1.1.cmml" xref="S8.Ex5.m1.7.7.1.1"><times id="S8.Ex5.m1.7.7.1.1.2.cmml" xref="S8.Ex5.m1.7.7.1.1.2"></times><ci id="S8.Ex5.m1.7.7.1.1.3.cmml" xref="S8.Ex5.m1.7.7.1.1.3">𝛼</ci><apply id="S8.Ex5.m1.7.7.1.1.4.cmml" xref="S8.Ex5.m1.7.7.1.1.4"><apply id="S8.Ex5.m1.7.7.1.1.4.1.cmml" xref="S8.Ex5.m1.7.7.1.1.4.1"><csymbol cd="ambiguous" id="S8.Ex5.m1.7.7.1.1.4.1.1.cmml" 
xref="S8.Ex5.m1.7.7.1.1.4.1">subscript</csymbol><ci id="S8.Ex5.m1.7.7.1.1.4.1.2.cmml" xref="S8.Ex5.m1.7.7.1.1.4.1.2">∇</ci><ci id="S8.Ex5.m1.7.7.1.1.4.1.3.cmml" xref="S8.Ex5.m1.7.7.1.1.4.1.3">𝜃</ci></apply><ci id="S8.Ex5.m1.7.7.1.1.4.2.cmml" xref="S8.Ex5.m1.7.7.1.1.4.2">ℒ</ci></apply><apply id="S8.Ex5.m1.7.7.1.1.1.1.1.cmml" xref="S8.Ex5.m1.7.7.1.1.1.1"><csymbol cd="ambiguous" id="S8.Ex5.m1.7.7.1.1.1.1.1.1.cmml" xref="S8.Ex5.m1.7.7.1.1.1.1">subscript</csymbol><ci id="S8.Ex5.m1.7.7.1.1.1.1.1.2.cmml" xref="S8.Ex5.m1.7.7.1.1.1.1.1.2">𝜃</ci><list id="S8.Ex5.m1.6.6.2.3.cmml" xref="S8.Ex5.m1.6.6.2.2"><apply id="S8.Ex5.m1.5.5.1.1.1.cmml" xref="S8.Ex5.m1.5.5.1.1.1"><minus id="S8.Ex5.m1.5.5.1.1.1.1.cmml" xref="S8.Ex5.m1.5.5.1.1.1.1"></minus><ci id="S8.Ex5.m1.5.5.1.1.1.2.cmml" xref="S8.Ex5.m1.5.5.1.1.1.2">𝑖</ci><cn id="S8.Ex5.m1.5.5.1.1.1.3.cmml" type="integer" xref="S8.Ex5.m1.5.5.1.1.1.3">1</cn></apply><apply id="S8.Ex5.m1.6.6.2.2.2.cmml" xref="S8.Ex5.m1.6.6.2.2.2"><minus id="S8.Ex5.m1.6.6.2.2.2.1.cmml" xref="S8.Ex5.m1.6.6.2.2.2.1"></minus><ci id="S8.Ex5.m1.6.6.2.2.2.2.cmml" xref="S8.Ex5.m1.6.6.2.2.2.2">𝑘</ci><cn id="S8.Ex5.m1.6.6.2.2.2.3.cmml" type="integer" xref="S8.Ex5.m1.6.6.2.2.2.3">1</cn></apply></list></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S8.Ex5.m1.7c">\theta_{i,k-1}\leftarrow\theta_{i-1,k-1}-\alpha\nabla_{\theta}\mathcal{L}(% \theta_{i-1,k-1})</annotation><annotation encoding="application/x-llamapun" id="S8.Ex5.m1.7d">italic_θ start_POSTSUBSCRIPT italic_i , italic_k - 1 end_POSTSUBSCRIPT ← italic_θ start_POSTSUBSCRIPT italic_i - 1 , italic_k - 1 end_POSTSUBSCRIPT - italic_α ∇ start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT caligraphic_L ( italic_θ start_POSTSUBSCRIPT italic_i - 1 , italic_k - 1 end_POSTSUBSCRIPT )</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> </tr></tbody> </table> until they converge and assign <math alttext="\theta_{k}=\theta_{i,k-1}" 
class="ltx_Math" display="inline" id="alg1.l4.m6.2"><semantics id="alg1.l4.m6.2a"><mrow id="alg1.l4.m6.2.3" xref="alg1.l4.m6.2.3.cmml"><msub id="alg1.l4.m6.2.3.2" xref="alg1.l4.m6.2.3.2.cmml"><mi id="alg1.l4.m6.2.3.2.2" xref="alg1.l4.m6.2.3.2.2.cmml">θ</mi><mi id="alg1.l4.m6.2.3.2.3" xref="alg1.l4.m6.2.3.2.3.cmml">k</mi></msub><mo id="alg1.l4.m6.2.3.1" xref="alg1.l4.m6.2.3.1.cmml">=</mo><msub id="alg1.l4.m6.2.3.3" xref="alg1.l4.m6.2.3.3.cmml"><mi id="alg1.l4.m6.2.3.3.2" xref="alg1.l4.m6.2.3.3.2.cmml">θ</mi><mrow id="alg1.l4.m6.2.2.2.2" xref="alg1.l4.m6.2.2.2.3.cmml"><mi id="alg1.l4.m6.1.1.1.1" xref="alg1.l4.m6.1.1.1.1.cmml">i</mi><mo id="alg1.l4.m6.2.2.2.2.2" xref="alg1.l4.m6.2.2.2.3.cmml">,</mo><mrow id="alg1.l4.m6.2.2.2.2.1" xref="alg1.l4.m6.2.2.2.2.1.cmml"><mi id="alg1.l4.m6.2.2.2.2.1.2" xref="alg1.l4.m6.2.2.2.2.1.2.cmml">k</mi><mo id="alg1.l4.m6.2.2.2.2.1.1" xref="alg1.l4.m6.2.2.2.2.1.1.cmml">−</mo><mn id="alg1.l4.m6.2.2.2.2.1.3" xref="alg1.l4.m6.2.2.2.2.1.3.cmml">1</mn></mrow></mrow></msub></mrow><annotation-xml encoding="MathML-Content" id="alg1.l4.m6.2b"><apply id="alg1.l4.m6.2.3.cmml" xref="alg1.l4.m6.2.3"><eq id="alg1.l4.m6.2.3.1.cmml" xref="alg1.l4.m6.2.3.1"></eq><apply id="alg1.l4.m6.2.3.2.cmml" xref="alg1.l4.m6.2.3.2"><csymbol cd="ambiguous" id="alg1.l4.m6.2.3.2.1.cmml" xref="alg1.l4.m6.2.3.2">subscript</csymbol><ci id="alg1.l4.m6.2.3.2.2.cmml" xref="alg1.l4.m6.2.3.2.2">𝜃</ci><ci id="alg1.l4.m6.2.3.2.3.cmml" xref="alg1.l4.m6.2.3.2.3">𝑘</ci></apply><apply id="alg1.l4.m6.2.3.3.cmml" xref="alg1.l4.m6.2.3.3"><csymbol cd="ambiguous" id="alg1.l4.m6.2.3.3.1.cmml" xref="alg1.l4.m6.2.3.3">subscript</csymbol><ci id="alg1.l4.m6.2.3.3.2.cmml" xref="alg1.l4.m6.2.3.3.2">𝜃</ci><list id="alg1.l4.m6.2.2.2.3.cmml" xref="alg1.l4.m6.2.2.2.2"><ci id="alg1.l4.m6.1.1.1.1.cmml" xref="alg1.l4.m6.1.1.1.1">𝑖</ci><apply id="alg1.l4.m6.2.2.2.2.1.cmml" xref="alg1.l4.m6.2.2.2.2.1"><minus id="alg1.l4.m6.2.2.2.2.1.1.cmml" xref="alg1.l4.m6.2.2.2.2.1.1"></minus><ci 
id="alg1.l4.m6.2.2.2.2.1.2.cmml" xref="alg1.l4.m6.2.2.2.2.1.2">𝑘</ci><cn id="alg1.l4.m6.2.2.2.2.1.3.cmml" type="integer" xref="alg1.l4.m6.2.2.2.2.1.3">1</cn></apply></list></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l4.m6.2c">\theta_{k}=\theta_{i,k-1}</annotation><annotation encoding="application/x-llamapun" id="alg1.l4.m6.2d">italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_θ start_POSTSUBSCRIPT italic_i , italic_k - 1 end_POSTSUBSCRIPT</annotation></semantics></math> to fix the simulation environment for the current iteration. </div> <div class="ltx_listingline" id="alg1.l5"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l5.1.1.1" style="font-size:80%;">5:</span></span>     <span class="ltx_text ltx_font_bold" id="alg1.l5.2">Step 2:</span> <span class="ltx_text ltx_font_bold" id="alg1.l5.3">Policy Training</span> </div> <div class="ltx_listingline" id="alg1.l6"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l6.1.1.1" style="font-size:80%;">6:</span></span>      Using the updated environment <math alttext="\theta_{k}" class="ltx_Math" display="inline" id="alg1.l6.m1.1"><semantics id="alg1.l6.m1.1a"><msub id="alg1.l6.m1.1.1" xref="alg1.l6.m1.1.1.cmml"><mi id="alg1.l6.m1.1.1.2" xref="alg1.l6.m1.1.1.2.cmml">θ</mi><mi id="alg1.l6.m1.1.1.3" xref="alg1.l6.m1.1.1.3.cmml">k</mi></msub><annotation-xml encoding="MathML-Content" id="alg1.l6.m1.1b"><apply id="alg1.l6.m1.1.1.cmml" xref="alg1.l6.m1.1.1"><csymbol cd="ambiguous" id="alg1.l6.m1.1.1.1.cmml" xref="alg1.l6.m1.1.1">subscript</csymbol><ci id="alg1.l6.m1.1.1.2.cmml" xref="alg1.l6.m1.1.1.2">𝜃</ci><ci id="alg1.l6.m1.1.1.3.cmml" xref="alg1.l6.m1.1.1.3">𝑘</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l6.m1.1c">\theta_{k}</annotation><annotation encoding="application/x-llamapun" id="alg1.l6.m1.1d">italic_θ start_POSTSUBSCRIPT italic_k 
end_POSTSUBSCRIPT</annotation></semantics></math>, perform reinforcement learning training for the policy <math alttext="\pi_{k}" class="ltx_Math" display="inline" id="alg1.l6.m2.1"><semantics id="alg1.l6.m2.1a"><msub id="alg1.l6.m2.1.1" xref="alg1.l6.m2.1.1.cmml"><mi id="alg1.l6.m2.1.1.2" xref="alg1.l6.m2.1.1.2.cmml">π</mi><mi id="alg1.l6.m2.1.1.3" xref="alg1.l6.m2.1.1.3.cmml">k</mi></msub><annotation-xml encoding="MathML-Content" id="alg1.l6.m2.1b"><apply id="alg1.l6.m2.1.1.cmml" xref="alg1.l6.m2.1.1"><csymbol cd="ambiguous" id="alg1.l6.m2.1.1.1.cmml" xref="alg1.l6.m2.1.1">subscript</csymbol><ci id="alg1.l6.m2.1.1.2.cmml" xref="alg1.l6.m2.1.1.2">𝜋</ci><ci id="alg1.l6.m2.1.1.3.cmml" xref="alg1.l6.m2.1.1.3">𝑘</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l6.m2.1c">\pi_{k}</annotation><annotation encoding="application/x-llamapun" id="alg1.l6.m2.1d">italic_π start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT</annotation></semantics></math> with the adaptive InfoGap cost <a class="ltx_ref" href="https://arxiv.org/html/2503.10118v2#S3.E3" title="In III-C Adaptive InfoGap Loss construction ‣ III Method ‣ An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation"><span class="ltx_text ltx_ref_tag">3</span></a> included: <table class="ltx_equation ltx_eqn_table" id="S8.Ex6"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\mathcal{L}(\pi_{k})=\mathbb{E}_{s_{t},a_{t}\sim\pi_{k}}\left[r_{t}+\gamma% \mathbb{V}_{\pi_{k}}(s_{t+1})-\mathbb{V}_{\pi_{k}}(s_{t})\right]+\mathcal{L}_{% sr}^{k}" class="ltx_Math" display="block" id="S8.Ex6.m1.4"><semantics id="S8.Ex6.m1.4a"><mrow id="S8.Ex6.m1.4.4" xref="S8.Ex6.m1.4.4.cmml"><mrow id="S8.Ex6.m1.3.3.1" xref="S8.Ex6.m1.3.3.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S8.Ex6.m1.3.3.1.3" xref="S8.Ex6.m1.3.3.1.3.cmml">ℒ</mi><mo 
id="S8.Ex6.m1.3.3.1.2" xref="S8.Ex6.m1.3.3.1.2.cmml">⁢</mo><mrow id="S8.Ex6.m1.3.3.1.1.1" xref="S8.Ex6.m1.3.3.1.1.1.1.cmml"><mo id="S8.Ex6.m1.3.3.1.1.1.2" stretchy="false" xref="S8.Ex6.m1.3.3.1.1.1.1.cmml">(</mo><msub id="S8.Ex6.m1.3.3.1.1.1.1" xref="S8.Ex6.m1.3.3.1.1.1.1.cmml"><mi id="S8.Ex6.m1.3.3.1.1.1.1.2" xref="S8.Ex6.m1.3.3.1.1.1.1.2.cmml">π</mi><mi id="S8.Ex6.m1.3.3.1.1.1.1.3" xref="S8.Ex6.m1.3.3.1.1.1.1.3.cmml">k</mi></msub><mo id="S8.Ex6.m1.3.3.1.1.1.3" stretchy="false" xref="S8.Ex6.m1.3.3.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S8.Ex6.m1.4.4.3" xref="S8.Ex6.m1.4.4.3.cmml">=</mo><mrow id="S8.Ex6.m1.4.4.2" xref="S8.Ex6.m1.4.4.2.cmml"><mrow id="S8.Ex6.m1.4.4.2.1" xref="S8.Ex6.m1.4.4.2.1.cmml"><msub id="S8.Ex6.m1.4.4.2.1.3" xref="S8.Ex6.m1.4.4.2.1.3.cmml"><mi id="S8.Ex6.m1.4.4.2.1.3.2" xref="S8.Ex6.m1.4.4.2.1.3.2.cmml">𝔼</mi><mrow id="S8.Ex6.m1.2.2.2" xref="S8.Ex6.m1.2.2.2.cmml"><mrow id="S8.Ex6.m1.2.2.2.2.2" xref="S8.Ex6.m1.2.2.2.2.3.cmml"><msub id="S8.Ex6.m1.1.1.1.1.1.1" xref="S8.Ex6.m1.1.1.1.1.1.1.cmml"><mi id="S8.Ex6.m1.1.1.1.1.1.1.2" xref="S8.Ex6.m1.1.1.1.1.1.1.2.cmml">s</mi><mi id="S8.Ex6.m1.1.1.1.1.1.1.3" xref="S8.Ex6.m1.1.1.1.1.1.1.3.cmml">t</mi></msub><mo id="S8.Ex6.m1.2.2.2.2.2.3" xref="S8.Ex6.m1.2.2.2.2.3.cmml">,</mo><msub id="S8.Ex6.m1.2.2.2.2.2.2" xref="S8.Ex6.m1.2.2.2.2.2.2.cmml"><mi id="S8.Ex6.m1.2.2.2.2.2.2.2" xref="S8.Ex6.m1.2.2.2.2.2.2.2.cmml">a</mi><mi id="S8.Ex6.m1.2.2.2.2.2.2.3" xref="S8.Ex6.m1.2.2.2.2.2.2.3.cmml">t</mi></msub></mrow><mo id="S8.Ex6.m1.2.2.2.3" xref="S8.Ex6.m1.2.2.2.3.cmml">∼</mo><msub id="S8.Ex6.m1.2.2.2.4" xref="S8.Ex6.m1.2.2.2.4.cmml"><mi id="S8.Ex6.m1.2.2.2.4.2" xref="S8.Ex6.m1.2.2.2.4.2.cmml">π</mi><mi id="S8.Ex6.m1.2.2.2.4.3" xref="S8.Ex6.m1.2.2.2.4.3.cmml">k</mi></msub></mrow></msub><mo id="S8.Ex6.m1.4.4.2.1.2" xref="S8.Ex6.m1.4.4.2.1.2.cmml">⁢</mo><mrow id="S8.Ex6.m1.4.4.2.1.1.1" xref="S8.Ex6.m1.4.4.2.1.1.2.cmml"><mo id="S8.Ex6.m1.4.4.2.1.1.1.2" xref="S8.Ex6.m1.4.4.2.1.1.2.1.cmml">[</mo><mrow 
id="S8.Ex6.m1.4.4.2.1.1.1.1" xref="S8.Ex6.m1.4.4.2.1.1.1.1.cmml"><mrow id="S8.Ex6.m1.4.4.2.1.1.1.1.1" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.cmml"><msub id="S8.Ex6.m1.4.4.2.1.1.1.1.1.3" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.3.cmml"><mi id="S8.Ex6.m1.4.4.2.1.1.1.1.1.3.2" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.3.2.cmml">r</mi><mi id="S8.Ex6.m1.4.4.2.1.1.1.1.1.3.3" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.3.3.cmml">t</mi></msub><mo id="S8.Ex6.m1.4.4.2.1.1.1.1.1.2" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.2.cmml">+</mo><mrow id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.cmml"><mi id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.3" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.3.cmml">γ</mi><mo id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.2" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.2.cmml">⁢</mo><msub id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.cmml"><mi id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.2" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.2.cmml">𝕍</mi><msub id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.3" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.3.cmml"><mi id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.3.2" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.3.2.cmml">π</mi><mi id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.3.3" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.3.3.cmml">k</mi></msub></msub><mo id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.2a" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.2.cmml">⁢</mo><mrow id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.cmml"><mo id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.2" stretchy="false" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.cmml">(</mo><msub id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.cmml"><mi id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.2" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.2.cmml">s</mi><mrow id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3.cmml"><mi id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3.2" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3.2.cmml">t</mi><mo id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3.1" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3.1.cmml">+</mo><mn 
id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3.3" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3.3.cmml">1</mn></mrow></msub><mo id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.3" stretchy="false" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.cmml">)</mo></mrow></mrow></mrow><mo id="S8.Ex6.m1.4.4.2.1.1.1.1.3" xref="S8.Ex6.m1.4.4.2.1.1.1.1.3.cmml">−</mo><mrow id="S8.Ex6.m1.4.4.2.1.1.1.1.2" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.cmml"><msub id="S8.Ex6.m1.4.4.2.1.1.1.1.2.3" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.cmml"><mi id="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.2" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.2.cmml">𝕍</mi><msub id="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.3" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.3.cmml"><mi id="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.3.2" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.3.2.cmml">π</mi><mi id="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.3.3" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.3.3.cmml">k</mi></msub></msub><mo id="S8.Ex6.m1.4.4.2.1.1.1.1.2.2" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.2.cmml">⁢</mo><mrow id="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.1.cmml"><mo id="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.2" stretchy="false" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.1.cmml">(</mo><msub id="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.1" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.1.cmml"><mi id="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.1.2" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.1.2.cmml">s</mi><mi id="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.1.3" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.1.3.cmml">t</mi></msub><mo id="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.3" stretchy="false" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.1.cmml">)</mo></mrow></mrow></mrow><mo id="S8.Ex6.m1.4.4.2.1.1.1.3" xref="S8.Ex6.m1.4.4.2.1.1.2.1.cmml">]</mo></mrow></mrow><mo id="S8.Ex6.m1.4.4.2.2" xref="S8.Ex6.m1.4.4.2.2.cmml">+</mo><msubsup id="S8.Ex6.m1.4.4.2.3" xref="S8.Ex6.m1.4.4.2.3.cmml"><mi class="ltx_font_mathcaligraphic" id="S8.Ex6.m1.4.4.2.3.2.2" xref="S8.Ex6.m1.4.4.2.3.2.2.cmml">ℒ</mi><mrow id="S8.Ex6.m1.4.4.2.3.2.3" xref="S8.Ex6.m1.4.4.2.3.2.3.cmml"><mi id="S8.Ex6.m1.4.4.2.3.2.3.2" xref="S8.Ex6.m1.4.4.2.3.2.3.2.cmml">s</mi><mo 
id="S8.Ex6.m1.4.4.2.3.2.3.1" xref="S8.Ex6.m1.4.4.2.3.2.3.1.cmml">⁢</mo><mi id="S8.Ex6.m1.4.4.2.3.2.3.3" xref="S8.Ex6.m1.4.4.2.3.2.3.3.cmml">r</mi></mrow><mi id="S8.Ex6.m1.4.4.2.3.3" xref="S8.Ex6.m1.4.4.2.3.3.cmml">k</mi></msubsup></mrow></mrow><annotation-xml encoding="MathML-Content" id="S8.Ex6.m1.4b"><apply id="S8.Ex6.m1.4.4.cmml" xref="S8.Ex6.m1.4.4"><eq id="S8.Ex6.m1.4.4.3.cmml" xref="S8.Ex6.m1.4.4.3"></eq><apply id="S8.Ex6.m1.3.3.1.cmml" xref="S8.Ex6.m1.3.3.1"><times id="S8.Ex6.m1.3.3.1.2.cmml" xref="S8.Ex6.m1.3.3.1.2"></times><ci id="S8.Ex6.m1.3.3.1.3.cmml" xref="S8.Ex6.m1.3.3.1.3">ℒ</ci><apply id="S8.Ex6.m1.3.3.1.1.1.1.cmml" xref="S8.Ex6.m1.3.3.1.1.1"><csymbol cd="ambiguous" id="S8.Ex6.m1.3.3.1.1.1.1.1.cmml" xref="S8.Ex6.m1.3.3.1.1.1">subscript</csymbol><ci id="S8.Ex6.m1.3.3.1.1.1.1.2.cmml" xref="S8.Ex6.m1.3.3.1.1.1.1.2">𝜋</ci><ci id="S8.Ex6.m1.3.3.1.1.1.1.3.cmml" xref="S8.Ex6.m1.3.3.1.1.1.1.3">𝑘</ci></apply></apply><apply id="S8.Ex6.m1.4.4.2.cmml" xref="S8.Ex6.m1.4.4.2"><plus id="S8.Ex6.m1.4.4.2.2.cmml" xref="S8.Ex6.m1.4.4.2.2"></plus><apply id="S8.Ex6.m1.4.4.2.1.cmml" xref="S8.Ex6.m1.4.4.2.1"><times id="S8.Ex6.m1.4.4.2.1.2.cmml" xref="S8.Ex6.m1.4.4.2.1.2"></times><apply id="S8.Ex6.m1.4.4.2.1.3.cmml" xref="S8.Ex6.m1.4.4.2.1.3"><csymbol cd="ambiguous" id="S8.Ex6.m1.4.4.2.1.3.1.cmml" xref="S8.Ex6.m1.4.4.2.1.3">subscript</csymbol><ci id="S8.Ex6.m1.4.4.2.1.3.2.cmml" xref="S8.Ex6.m1.4.4.2.1.3.2">𝔼</ci><apply id="S8.Ex6.m1.2.2.2.cmml" xref="S8.Ex6.m1.2.2.2"><csymbol cd="latexml" id="S8.Ex6.m1.2.2.2.3.cmml" xref="S8.Ex6.m1.2.2.2.3">similar-to</csymbol><list id="S8.Ex6.m1.2.2.2.2.3.cmml" xref="S8.Ex6.m1.2.2.2.2.2"><apply id="S8.Ex6.m1.1.1.1.1.1.1.cmml" xref="S8.Ex6.m1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S8.Ex6.m1.1.1.1.1.1.1.1.cmml" xref="S8.Ex6.m1.1.1.1.1.1.1">subscript</csymbol><ci id="S8.Ex6.m1.1.1.1.1.1.1.2.cmml" xref="S8.Ex6.m1.1.1.1.1.1.1.2">𝑠</ci><ci id="S8.Ex6.m1.1.1.1.1.1.1.3.cmml" xref="S8.Ex6.m1.1.1.1.1.1.1.3">𝑡</ci></apply><apply 
id="S8.Ex6.m1.2.2.2.2.2.2.cmml" xref="S8.Ex6.m1.2.2.2.2.2.2"><csymbol cd="ambiguous" id="S8.Ex6.m1.2.2.2.2.2.2.1.cmml" xref="S8.Ex6.m1.2.2.2.2.2.2">subscript</csymbol><ci id="S8.Ex6.m1.2.2.2.2.2.2.2.cmml" xref="S8.Ex6.m1.2.2.2.2.2.2.2">𝑎</ci><ci id="S8.Ex6.m1.2.2.2.2.2.2.3.cmml" xref="S8.Ex6.m1.2.2.2.2.2.2.3">𝑡</ci></apply></list><apply id="S8.Ex6.m1.2.2.2.4.cmml" xref="S8.Ex6.m1.2.2.2.4"><csymbol cd="ambiguous" id="S8.Ex6.m1.2.2.2.4.1.cmml" xref="S8.Ex6.m1.2.2.2.4">subscript</csymbol><ci id="S8.Ex6.m1.2.2.2.4.2.cmml" xref="S8.Ex6.m1.2.2.2.4.2">𝜋</ci><ci id="S8.Ex6.m1.2.2.2.4.3.cmml" xref="S8.Ex6.m1.2.2.2.4.3">𝑘</ci></apply></apply></apply><apply id="S8.Ex6.m1.4.4.2.1.1.2.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1"><csymbol cd="latexml" id="S8.Ex6.m1.4.4.2.1.1.2.1.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.2">delimited-[]</csymbol><apply id="S8.Ex6.m1.4.4.2.1.1.1.1.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1"><minus id="S8.Ex6.m1.4.4.2.1.1.1.1.3.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.3"></minus><apply id="S8.Ex6.m1.4.4.2.1.1.1.1.1.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1"><plus id="S8.Ex6.m1.4.4.2.1.1.1.1.1.2.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.2"></plus><apply id="S8.Ex6.m1.4.4.2.1.1.1.1.1.3.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S8.Ex6.m1.4.4.2.1.1.1.1.1.3.1.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.3">subscript</csymbol><ci id="S8.Ex6.m1.4.4.2.1.1.1.1.1.3.2.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.3.2">𝑟</ci><ci id="S8.Ex6.m1.4.4.2.1.1.1.1.1.3.3.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.3.3">𝑡</ci></apply><apply id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1"><times id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.2.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.2"></times><ci id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.3.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.3">𝛾</ci><apply id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4"><csymbol cd="ambiguous" id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.1.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4">subscript</csymbol><ci 
id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.2.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.2">𝕍</ci><apply id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.3.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.3"><csymbol cd="ambiguous" id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.3.1.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.3">subscript</csymbol><ci id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.3.2.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.3.2">𝜋</ci><ci id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.3.3.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.4.3.3">𝑘</ci></apply></apply><apply id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.1.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1">subscript</csymbol><ci id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.2.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.2">𝑠</ci><apply id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3"><plus id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3.1.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3.1"></plus><ci id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3.2.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3.2">𝑡</ci><cn id="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3.3.cmml" type="integer" xref="S8.Ex6.m1.4.4.2.1.1.1.1.1.1.1.1.1.3.3">1</cn></apply></apply></apply></apply><apply id="S8.Ex6.m1.4.4.2.1.1.1.1.2.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2"><times id="S8.Ex6.m1.4.4.2.1.1.1.1.2.2.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.2"></times><apply id="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.3"><csymbol cd="ambiguous" id="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.1.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.3">subscript</csymbol><ci id="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.2.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.2">𝕍</ci><apply id="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.3.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.3"><csymbol cd="ambiguous" id="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.3.1.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.3">subscript</csymbol><ci id="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.3.2.cmml" 
xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.3.2">𝜋</ci><ci id="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.3.3.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.3.3.3">𝑘</ci></apply></apply><apply id="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.1.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1"><csymbol cd="ambiguous" id="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.1.1.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1">subscript</csymbol><ci id="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.1.2.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.1.2">𝑠</ci><ci id="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.1.3.cmml" xref="S8.Ex6.m1.4.4.2.1.1.1.1.2.1.1.1.3">𝑡</ci></apply></apply></apply></apply></apply><apply id="S8.Ex6.m1.4.4.2.3.cmml" xref="S8.Ex6.m1.4.4.2.3"><csymbol cd="ambiguous" id="S8.Ex6.m1.4.4.2.3.1.cmml" xref="S8.Ex6.m1.4.4.2.3">superscript</csymbol><apply id="S8.Ex6.m1.4.4.2.3.2.cmml" xref="S8.Ex6.m1.4.4.2.3"><csymbol cd="ambiguous" id="S8.Ex6.m1.4.4.2.3.2.1.cmml" xref="S8.Ex6.m1.4.4.2.3">subscript</csymbol><ci id="S8.Ex6.m1.4.4.2.3.2.2.cmml" xref="S8.Ex6.m1.4.4.2.3.2.2">ℒ</ci><apply id="S8.Ex6.m1.4.4.2.3.2.3.cmml" xref="S8.Ex6.m1.4.4.2.3.2.3"><times id="S8.Ex6.m1.4.4.2.3.2.3.1.cmml" xref="S8.Ex6.m1.4.4.2.3.2.3.1"></times><ci id="S8.Ex6.m1.4.4.2.3.2.3.2.cmml" xref="S8.Ex6.m1.4.4.2.3.2.3.2">𝑠</ci><ci id="S8.Ex6.m1.4.4.2.3.2.3.3.cmml" xref="S8.Ex6.m1.4.4.2.3.2.3.3">𝑟</ci></apply></apply><ci id="S8.Ex6.m1.4.4.2.3.3.cmml" xref="S8.Ex6.m1.4.4.2.3.3">𝑘</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S8.Ex6.m1.4c">\mathcal{L}(\pi_{k})=\mathbb{E}_{s_{t},a_{t}\sim\pi_{k}}\left[r_{t}+\gamma% \mathbb{V}_{\pi_{k}}(s_{t+1})-\mathbb{V}_{\pi_{k}}(s_{t})\right]+\mathcal{L}_{% sr}^{k}</annotation><annotation encoding="application/x-llamapun" id="S8.Ex6.m1.4d">caligraphic_L ( italic_π start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) = blackboard_E start_POSTSUBSCRIPT italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∼ italic_π start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT 
end_POSTSUBSCRIPT [ italic_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_γ blackboard_V start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_s start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT ) - blackboard_V start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ] + caligraphic_L start_POSTSUBSCRIPT italic_s italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> </tr></tbody> </table> </div> <div class="ltx_listingline" id="alg1.l7"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l7.1.1.1" style="font-size:80%;">7:</span></span>     <span class="ltx_text ltx_font_bold" id="alg1.l7.2">Step 3:</span> Deploy the updated policy <math alttext="\pi_{k}" class="ltx_Math" display="inline" id="alg1.l7.m1.1"><semantics id="alg1.l7.m1.1a"><msub id="alg1.l7.m1.1.1" xref="alg1.l7.m1.1.1.cmml"><mi id="alg1.l7.m1.1.1.2" xref="alg1.l7.m1.1.1.2.cmml">π</mi><mi id="alg1.l7.m1.1.1.3" xref="alg1.l7.m1.1.1.3.cmml">k</mi></msub><annotation-xml encoding="MathML-Content" id="alg1.l7.m1.1b"><apply id="alg1.l7.m1.1.1.cmml" xref="alg1.l7.m1.1.1"><csymbol cd="ambiguous" id="alg1.l7.m1.1.1.1.cmml" xref="alg1.l7.m1.1.1">subscript</csymbol><ci id="alg1.l7.m1.1.1.2.cmml" xref="alg1.l7.m1.1.1.2">𝜋</ci><ci id="alg1.l7.m1.1.1.3.cmml" xref="alg1.l7.m1.1.1.3">𝑘</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l7.m1.1c">\pi_{k}</annotation><annotation encoding="application/x-llamapun" id="alg1.l7.m1.1d">italic_π start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT</annotation></semantics></math> on the real robot and collect new real-world data <math alttext="\mathcal{D}_{real}^{k+1}" class="ltx_Math" display="inline" id="alg1.l7.m2.1"><semantics id="alg1.l7.m2.1a"><msubsup 
id="alg1.l7.m2.1.1" xref="alg1.l7.m2.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="alg1.l7.m2.1.1.2.2" xref="alg1.l7.m2.1.1.2.2.cmml">𝒟</mi><mrow id="alg1.l7.m2.1.1.2.3" xref="alg1.l7.m2.1.1.2.3.cmml"><mi id="alg1.l7.m2.1.1.2.3.2" xref="alg1.l7.m2.1.1.2.3.2.cmml">r</mi><mo id="alg1.l7.m2.1.1.2.3.1" xref="alg1.l7.m2.1.1.2.3.1.cmml">⁢</mo><mi id="alg1.l7.m2.1.1.2.3.3" xref="alg1.l7.m2.1.1.2.3.3.cmml">e</mi><mo id="alg1.l7.m2.1.1.2.3.1a" xref="alg1.l7.m2.1.1.2.3.1.cmml">⁢</mo><mi id="alg1.l7.m2.1.1.2.3.4" xref="alg1.l7.m2.1.1.2.3.4.cmml">a</mi><mo id="alg1.l7.m2.1.1.2.3.1b" xref="alg1.l7.m2.1.1.2.3.1.cmml">⁢</mo><mi id="alg1.l7.m2.1.1.2.3.5" xref="alg1.l7.m2.1.1.2.3.5.cmml">l</mi></mrow><mrow id="alg1.l7.m2.1.1.3" xref="alg1.l7.m2.1.1.3.cmml"><mi id="alg1.l7.m2.1.1.3.2" xref="alg1.l7.m2.1.1.3.2.cmml">k</mi><mo id="alg1.l7.m2.1.1.3.1" xref="alg1.l7.m2.1.1.3.1.cmml">+</mo><mn id="alg1.l7.m2.1.1.3.3" xref="alg1.l7.m2.1.1.3.3.cmml">1</mn></mrow></msubsup><annotation-xml encoding="MathML-Content" id="alg1.l7.m2.1b"><apply id="alg1.l7.m2.1.1.cmml" xref="alg1.l7.m2.1.1"><csymbol cd="ambiguous" id="alg1.l7.m2.1.1.1.cmml" xref="alg1.l7.m2.1.1">superscript</csymbol><apply id="alg1.l7.m2.1.1.2.cmml" xref="alg1.l7.m2.1.1"><csymbol cd="ambiguous" id="alg1.l7.m2.1.1.2.1.cmml" xref="alg1.l7.m2.1.1">subscript</csymbol><ci id="alg1.l7.m2.1.1.2.2.cmml" xref="alg1.l7.m2.1.1.2.2">𝒟</ci><apply id="alg1.l7.m2.1.1.2.3.cmml" xref="alg1.l7.m2.1.1.2.3"><times id="alg1.l7.m2.1.1.2.3.1.cmml" xref="alg1.l7.m2.1.1.2.3.1"></times><ci id="alg1.l7.m2.1.1.2.3.2.cmml" xref="alg1.l7.m2.1.1.2.3.2">𝑟</ci><ci id="alg1.l7.m2.1.1.2.3.3.cmml" xref="alg1.l7.m2.1.1.2.3.3">𝑒</ci><ci id="alg1.l7.m2.1.1.2.3.4.cmml" xref="alg1.l7.m2.1.1.2.3.4">𝑎</ci><ci id="alg1.l7.m2.1.1.2.3.5.cmml" xref="alg1.l7.m2.1.1.2.3.5">𝑙</ci></apply></apply><apply id="alg1.l7.m2.1.1.3.cmml" xref="alg1.l7.m2.1.1.3"><plus id="alg1.l7.m2.1.1.3.1.cmml" xref="alg1.l7.m2.1.1.3.1"></plus><ci id="alg1.l7.m2.1.1.3.2.cmml" 
xref="alg1.l7.m2.1.1.3.2">𝑘</ci><cn id="alg1.l7.m2.1.1.3.3.cmml" type="integer" xref="alg1.l7.m2.1.1.3.3">1</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l7.m2.1c">\mathcal{D}_{real}^{k+1}</annotation><annotation encoding="application/x-llamapun" id="alg1.l7.m2.1d">caligraphic_D start_POSTSUBSCRIPT italic_r italic_e italic_a italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k + 1 end_POSTSUPERSCRIPT</annotation></semantics></math>. Repeat the process for subsequent iterations. </div> <div class="ltx_listingline" id="alg1.l8"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l8.1.1.1" style="font-size:80%;">8:</span></span>  <span class="ltx_text ltx_font_bold" id="alg1.l8.2">end</span> <span class="ltx_text ltx_font_bold" id="alg1.l8.3">while</span> </div> </div> </figure> </section> <section class="ltx_subsection" id="S8.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S8.SS2.4.1.1">VIII-B</span> </span><span class="ltx_text ltx_font_italic" id="S8.SS2.5.2">RL Settings</span> </h3> <div class="ltx_para" id="S8.SS2.p1"> <p class="ltx_p" id="S8.SS2.p1.1"><span class="ltx_text ltx_font_bold" id="S8.SS2.p1.1.1">Action Space:</span> The action space is a 6-dimensional vector of rotational displacements, one for each robot joint, chosen for precise articulated control in Cartesian-space operations.</p> </div> <div class="ltx_para" id="S8.SS2.p2"> <p class="ltx_p" id="S8.SS2.p2.1"><span class="ltx_text ltx_font_bold" id="S8.SS2.p2.1.1">Observation Space:</span> For the cube-block pushing experiment, the observation space consists of a 6-dimensional vector of joint angles, the Cartesian coordinates of the end-effector, a 3-dimensional vector for the target position, and a 3-dimensional vector for the block position, along with their relative displacement vectors. 
For the T-shaped block pushing task, the observation space is augmented to include orientation as well as position, using quaternion representations for the target and block poses.</p> </div> <div class="ltx_para" id="S8.SS2.p3"> <p class="ltx_p" id="S8.SS2.p3.1"><span class="ltx_text ltx_font_bold" id="S8.SS2.p3.1.1">Reward Setup:</span></p> <ul class="ltx_itemize" id="S8.I1"> <li class="ltx_item" id="S8.I1.ix1" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">(1)</span> <div class="ltx_para" id="S8.I1.ix1.p1"> <p class="ltx_p" id="S8.I1.ix1.p1.8"><span class="ltx_text ltx_font_bold" id="S8.I1.ix1.p1.8.1">Cube Block Experiment:</span> The total reward consists of two components. Firstly, the distance-based reward <math alttext="r_{d}" class="ltx_Math" display="inline" id="S8.I1.ix1.p1.1.m1.1"><semantics id="S8.I1.ix1.p1.1.m1.1a"><msub id="S8.I1.ix1.p1.1.m1.1.1" xref="S8.I1.ix1.p1.1.m1.1.1.cmml"><mi id="S8.I1.ix1.p1.1.m1.1.1.2" xref="S8.I1.ix1.p1.1.m1.1.1.2.cmml">r</mi><mi id="S8.I1.ix1.p1.1.m1.1.1.3" xref="S8.I1.ix1.p1.1.m1.1.1.3.cmml">d</mi></msub><annotation-xml encoding="MathML-Content" id="S8.I1.ix1.p1.1.m1.1b"><apply id="S8.I1.ix1.p1.1.m1.1.1.cmml" xref="S8.I1.ix1.p1.1.m1.1.1"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.1.m1.1.1.1.cmml" xref="S8.I1.ix1.p1.1.m1.1.1">subscript</csymbol><ci id="S8.I1.ix1.p1.1.m1.1.1.2.cmml" xref="S8.I1.ix1.p1.1.m1.1.1.2">𝑟</ci><ci id="S8.I1.ix1.p1.1.m1.1.1.3.cmml" xref="S8.I1.ix1.p1.1.m1.1.1.3">𝑑</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S8.I1.ix1.p1.1.m1.1c">r_{d}</annotation><annotation encoding="application/x-llamapun" id="S8.I1.ix1.p1.1.m1.1d">italic_r start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT</annotation></semantics></math> encourages the block to reach the target position: <math alttext="r_{d}=-\|\mathbf{x}_{b}-\mathbf{x}_{t}\|_{2}" class="ltx_Math" display="inline" id="S8.I1.ix1.p1.2.m2.1"><semantics id="S8.I1.ix1.p1.2.m2.1a"><mrow id="S8.I1.ix1.p1.2.m2.1.1" 
xref="S8.I1.ix1.p1.2.m2.1.1.cmml"><msub id="S8.I1.ix1.p1.2.m2.1.1.3" xref="S8.I1.ix1.p1.2.m2.1.1.3.cmml"><mi id="S8.I1.ix1.p1.2.m2.1.1.3.2" xref="S8.I1.ix1.p1.2.m2.1.1.3.2.cmml">r</mi><mi id="S8.I1.ix1.p1.2.m2.1.1.3.3" xref="S8.I1.ix1.p1.2.m2.1.1.3.3.cmml">d</mi></msub><mo id="S8.I1.ix1.p1.2.m2.1.1.2" xref="S8.I1.ix1.p1.2.m2.1.1.2.cmml">=</mo><mrow id="S8.I1.ix1.p1.2.m2.1.1.1" xref="S8.I1.ix1.p1.2.m2.1.1.1.cmml"><mo id="S8.I1.ix1.p1.2.m2.1.1.1a" xref="S8.I1.ix1.p1.2.m2.1.1.1.cmml">−</mo><msub id="S8.I1.ix1.p1.2.m2.1.1.1.1" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.cmml"><mrow id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.2.cmml"><mo id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.2" stretchy="false" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.2.1.cmml">‖</mo><mrow id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.cmml"><msub id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.2" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.2.cmml"><mi id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.2.2" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.2.2.cmml">𝐱</mi><mi id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.2.3" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.2.3.cmml">b</mi></msub><mo id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.1" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.1.cmml">−</mo><msub id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.3" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.3.cmml"><mi id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.3.2" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.3.2.cmml">𝐱</mi><mi id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.3.3" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.3.3.cmml">t</mi></msub></mrow><mo id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.3" stretchy="false" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.2.1.cmml">‖</mo></mrow><mn id="S8.I1.ix1.p1.2.m2.1.1.1.1.3" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.3.cmml">2</mn></msub></mrow></mrow><annotation-xml encoding="MathML-Content" id="S8.I1.ix1.p1.2.m2.1b"><apply id="S8.I1.ix1.p1.2.m2.1.1.cmml" xref="S8.I1.ix1.p1.2.m2.1.1"><eq id="S8.I1.ix1.p1.2.m2.1.1.2.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.2"></eq><apply 
id="S8.I1.ix1.p1.2.m2.1.1.3.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.3"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.2.m2.1.1.3.1.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.3">subscript</csymbol><ci id="S8.I1.ix1.p1.2.m2.1.1.3.2.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.3.2">𝑟</ci><ci id="S8.I1.ix1.p1.2.m2.1.1.3.3.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.3.3">𝑑</ci></apply><apply id="S8.I1.ix1.p1.2.m2.1.1.1.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1"><minus id="S8.I1.ix1.p1.2.m2.1.1.1.2.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1"></minus><apply id="S8.I1.ix1.p1.2.m2.1.1.1.1.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1.1"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.2.m2.1.1.1.1.2.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1.1">subscript</csymbol><apply id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.2.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1"><csymbol cd="latexml" id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.2.1.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.2">norm</csymbol><apply id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1"><minus id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.1.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.1"></minus><apply id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.2.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.2.1.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.2">subscript</csymbol><ci id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.2.2.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.2.2">𝐱</ci><ci id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.2.3.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.2.3">𝑏</ci></apply><apply id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.3.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.3.1.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.3">subscript</csymbol><ci id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.3.2.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.3.2">𝐱</ci><ci id="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.3.3.cmml" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.1.1.1.3.3">𝑡</ci></apply></apply></apply><cn id="S8.I1.ix1.p1.2.m2.1.1.1.1.3.cmml" 
type="integer" xref="S8.I1.ix1.p1.2.m2.1.1.1.1.3">2</cn></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S8.I1.ix1.p1.2.m2.1c">r_{d}=-\|\mathbf{x}_{b}-\mathbf{x}_{t}\|_{2}</annotation><annotation encoding="application/x-llamapun" id="S8.I1.ix1.p1.2.m2.1d">italic_r start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT = - ∥ bold_x start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT - bold_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT</annotation></semantics></math>, where <math alttext="\|\cdot\|_{2}" class="ltx_math_unparsed" display="inline" id="S8.I1.ix1.p1.3.m3.1"><semantics id="S8.I1.ix1.p1.3.m3.1a"><mrow id="S8.I1.ix1.p1.3.m3.1b"><mo id="S8.I1.ix1.p1.3.m3.1.1" rspace="0em">∥</mo><mo id="S8.I1.ix1.p1.3.m3.1.2" lspace="0em" rspace="0em">⋅</mo><msub id="S8.I1.ix1.p1.3.m3.1.3"><mo id="S8.I1.ix1.p1.3.m3.1.3.2" lspace="0em">∥</mo><mn id="S8.I1.ix1.p1.3.m3.1.3.3">2</mn></msub></mrow><annotation encoding="application/x-tex" id="S8.I1.ix1.p1.3.m3.1c">\|\cdot\|_{2}</annotation><annotation encoding="application/x-llamapun" id="S8.I1.ix1.p1.3.m3.1d">∥ ⋅ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT</annotation></semantics></math> represents the Euclidean norm. 
Secondly, the end-effector guidance reward <math alttext="r_{ee}" class="ltx_Math" display="inline" id="S8.I1.ix1.p1.4.m4.1"><semantics id="S8.I1.ix1.p1.4.m4.1a"><msub id="S8.I1.ix1.p1.4.m4.1.1" xref="S8.I1.ix1.p1.4.m4.1.1.cmml"><mi id="S8.I1.ix1.p1.4.m4.1.1.2" xref="S8.I1.ix1.p1.4.m4.1.1.2.cmml">r</mi><mrow id="S8.I1.ix1.p1.4.m4.1.1.3" xref="S8.I1.ix1.p1.4.m4.1.1.3.cmml"><mi id="S8.I1.ix1.p1.4.m4.1.1.3.2" xref="S8.I1.ix1.p1.4.m4.1.1.3.2.cmml">e</mi><mo id="S8.I1.ix1.p1.4.m4.1.1.3.1" xref="S8.I1.ix1.p1.4.m4.1.1.3.1.cmml">⁢</mo><mi id="S8.I1.ix1.p1.4.m4.1.1.3.3" xref="S8.I1.ix1.p1.4.m4.1.1.3.3.cmml">e</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S8.I1.ix1.p1.4.m4.1b"><apply id="S8.I1.ix1.p1.4.m4.1.1.cmml" xref="S8.I1.ix1.p1.4.m4.1.1"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.4.m4.1.1.1.cmml" xref="S8.I1.ix1.p1.4.m4.1.1">subscript</csymbol><ci id="S8.I1.ix1.p1.4.m4.1.1.2.cmml" xref="S8.I1.ix1.p1.4.m4.1.1.2">𝑟</ci><apply id="S8.I1.ix1.p1.4.m4.1.1.3.cmml" xref="S8.I1.ix1.p1.4.m4.1.1.3"><times id="S8.I1.ix1.p1.4.m4.1.1.3.1.cmml" xref="S8.I1.ix1.p1.4.m4.1.1.3.1"></times><ci id="S8.I1.ix1.p1.4.m4.1.1.3.2.cmml" xref="S8.I1.ix1.p1.4.m4.1.1.3.2">𝑒</ci><ci id="S8.I1.ix1.p1.4.m4.1.1.3.3.cmml" xref="S8.I1.ix1.p1.4.m4.1.1.3.3">𝑒</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S8.I1.ix1.p1.4.m4.1c">r_{ee}</annotation><annotation encoding="application/x-llamapun" id="S8.I1.ix1.p1.4.m4.1d">italic_r start_POSTSUBSCRIPT italic_e italic_e end_POSTSUBSCRIPT</annotation></semantics></math> encourages the manipulator to move towards the block: <math alttext="r_{ee}=-\|\mathbf{x}_{ee}-\mathbf{x}_{b}\|_{2}" class="ltx_Math" display="inline" id="S8.I1.ix1.p1.5.m5.1"><semantics id="S8.I1.ix1.p1.5.m5.1a"><mrow id="S8.I1.ix1.p1.5.m5.1.1" xref="S8.I1.ix1.p1.5.m5.1.1.cmml"><msub id="S8.I1.ix1.p1.5.m5.1.1.3" xref="S8.I1.ix1.p1.5.m5.1.1.3.cmml"><mi id="S8.I1.ix1.p1.5.m5.1.1.3.2" xref="S8.I1.ix1.p1.5.m5.1.1.3.2.cmml">r</mi><mrow 
id="S8.I1.ix1.p1.5.m5.1.1.3.3" xref="S8.I1.ix1.p1.5.m5.1.1.3.3.cmml"><mi id="S8.I1.ix1.p1.5.m5.1.1.3.3.2" xref="S8.I1.ix1.p1.5.m5.1.1.3.3.2.cmml">e</mi><mo id="S8.I1.ix1.p1.5.m5.1.1.3.3.1" xref="S8.I1.ix1.p1.5.m5.1.1.3.3.1.cmml">⁢</mo><mi id="S8.I1.ix1.p1.5.m5.1.1.3.3.3" xref="S8.I1.ix1.p1.5.m5.1.1.3.3.3.cmml">e</mi></mrow></msub><mo id="S8.I1.ix1.p1.5.m5.1.1.2" xref="S8.I1.ix1.p1.5.m5.1.1.2.cmml">=</mo><mrow id="S8.I1.ix1.p1.5.m5.1.1.1" xref="S8.I1.ix1.p1.5.m5.1.1.1.cmml"><mo id="S8.I1.ix1.p1.5.m5.1.1.1a" xref="S8.I1.ix1.p1.5.m5.1.1.1.cmml">−</mo><msub id="S8.I1.ix1.p1.5.m5.1.1.1.1" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.cmml"><mrow id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.2.cmml"><mo id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.2" stretchy="false" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.2.1.cmml">‖</mo><mrow id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.cmml"><msub id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.cmml"><mi id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.2" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.2.cmml">𝐱</mi><mrow id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3.cmml"><mi id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3.2" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3.2.cmml">e</mi><mo id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3.1" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3.1.cmml">⁢</mo><mi id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3.3" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3.3.cmml">e</mi></mrow></msub><mo id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.1" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.1.cmml">−</mo><msub id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.3" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.3.cmml"><mi id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.3.2" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.3.2.cmml">𝐱</mi><mi id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.3.3" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.3.3.cmml">b</mi></msub></mrow><mo id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.3" stretchy="false" 
xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.2.1.cmml">‖</mo></mrow><mn id="S8.I1.ix1.p1.5.m5.1.1.1.1.3" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.3.cmml">2</mn></msub></mrow></mrow><annotation-xml encoding="MathML-Content" id="S8.I1.ix1.p1.5.m5.1b"><apply id="S8.I1.ix1.p1.5.m5.1.1.cmml" xref="S8.I1.ix1.p1.5.m5.1.1"><eq id="S8.I1.ix1.p1.5.m5.1.1.2.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.2"></eq><apply id="S8.I1.ix1.p1.5.m5.1.1.3.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.3"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.5.m5.1.1.3.1.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.3">subscript</csymbol><ci id="S8.I1.ix1.p1.5.m5.1.1.3.2.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.3.2">𝑟</ci><apply id="S8.I1.ix1.p1.5.m5.1.1.3.3.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.3.3"><times id="S8.I1.ix1.p1.5.m5.1.1.3.3.1.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.3.3.1"></times><ci id="S8.I1.ix1.p1.5.m5.1.1.3.3.2.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.3.3.2">𝑒</ci><ci id="S8.I1.ix1.p1.5.m5.1.1.3.3.3.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.3.3.3">𝑒</ci></apply></apply><apply id="S8.I1.ix1.p1.5.m5.1.1.1.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1"><minus id="S8.I1.ix1.p1.5.m5.1.1.1.2.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1"></minus><apply id="S8.I1.ix1.p1.5.m5.1.1.1.1.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.5.m5.1.1.1.1.2.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1">subscript</csymbol><apply id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.2.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1"><csymbol cd="latexml" id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.2.1.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.2">norm</csymbol><apply id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1"><minus id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.1.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.1"></minus><apply id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.1.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2">subscript</csymbol><ci id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.2.cmml" 
xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.2">𝐱</ci><apply id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3"><times id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3.1.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3.1"></times><ci id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3.2.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3.2">𝑒</ci><ci id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3.3.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.2.3.3">𝑒</ci></apply></apply><apply id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.3.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.3.1.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.3">subscript</csymbol><ci id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.3.2.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.3.2">𝐱</ci><ci id="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.3.3.cmml" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.1.1.1.3.3">𝑏</ci></apply></apply></apply><cn id="S8.I1.ix1.p1.5.m5.1.1.1.1.3.cmml" type="integer" xref="S8.I1.ix1.p1.5.m5.1.1.1.1.3">2</cn></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S8.I1.ix1.p1.5.m5.1c">r_{ee}=-\|\mathbf{x}_{ee}-\mathbf{x}_{b}\|_{2}</annotation><annotation encoding="application/x-llamapun" id="S8.I1.ix1.p1.5.m5.1d">italic_r start_POSTSUBSCRIPT italic_e italic_e end_POSTSUBSCRIPT = - ∥ bold_x start_POSTSUBSCRIPT italic_e italic_e end_POSTSUBSCRIPT - bold_x start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT</annotation></semantics></math>. 
The total reward is given by: <math alttext="r=\lambda_{d}r_{d}+\lambda_{ee}r_{ee}" class="ltx_Math" display="inline" id="S8.I1.ix1.p1.6.m6.1"><semantics id="S8.I1.ix1.p1.6.m6.1a"><mrow id="S8.I1.ix1.p1.6.m6.1.1" xref="S8.I1.ix1.p1.6.m6.1.1.cmml"><mi id="S8.I1.ix1.p1.6.m6.1.1.2" xref="S8.I1.ix1.p1.6.m6.1.1.2.cmml">r</mi><mo id="S8.I1.ix1.p1.6.m6.1.1.1" xref="S8.I1.ix1.p1.6.m6.1.1.1.cmml">=</mo><mrow id="S8.I1.ix1.p1.6.m6.1.1.3" xref="S8.I1.ix1.p1.6.m6.1.1.3.cmml"><mrow id="S8.I1.ix1.p1.6.m6.1.1.3.2" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.cmml"><msub id="S8.I1.ix1.p1.6.m6.1.1.3.2.2" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.2.cmml"><mi id="S8.I1.ix1.p1.6.m6.1.1.3.2.2.2" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.2.2.cmml">λ</mi><mi id="S8.I1.ix1.p1.6.m6.1.1.3.2.2.3" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.2.3.cmml">d</mi></msub><mo id="S8.I1.ix1.p1.6.m6.1.1.3.2.1" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.1.cmml">⁢</mo><msub id="S8.I1.ix1.p1.6.m6.1.1.3.2.3" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.3.cmml"><mi id="S8.I1.ix1.p1.6.m6.1.1.3.2.3.2" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.3.2.cmml">r</mi><mi id="S8.I1.ix1.p1.6.m6.1.1.3.2.3.3" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.3.3.cmml">d</mi></msub></mrow><mo id="S8.I1.ix1.p1.6.m6.1.1.3.1" xref="S8.I1.ix1.p1.6.m6.1.1.3.1.cmml">+</mo><mrow id="S8.I1.ix1.p1.6.m6.1.1.3.3" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.cmml"><msub id="S8.I1.ix1.p1.6.m6.1.1.3.3.2" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.2.cmml"><mi id="S8.I1.ix1.p1.6.m6.1.1.3.3.2.2" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.2.2.cmml">λ</mi><mrow id="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3.cmml"><mi id="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3.2" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3.2.cmml">e</mi><mo id="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3.1" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3.1.cmml">⁢</mo><mi id="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3.3" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3.3.cmml">e</mi></mrow></msub><mo id="S8.I1.ix1.p1.6.m6.1.1.3.3.1" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.1.cmml">⁢</mo><msub id="S8.I1.ix1.p1.6.m6.1.1.3.3.3" 
xref="S8.I1.ix1.p1.6.m6.1.1.3.3.3.cmml"><mi id="S8.I1.ix1.p1.6.m6.1.1.3.3.3.2" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.3.2.cmml">r</mi><mrow id="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3.cmml"><mi id="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3.2" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3.2.cmml">e</mi><mo id="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3.1" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3.1.cmml">⁢</mo><mi id="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3.3" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3.3.cmml">e</mi></mrow></msub></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S8.I1.ix1.p1.6.m6.1b"><apply id="S8.I1.ix1.p1.6.m6.1.1.cmml" xref="S8.I1.ix1.p1.6.m6.1.1"><eq id="S8.I1.ix1.p1.6.m6.1.1.1.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.1"></eq><ci id="S8.I1.ix1.p1.6.m6.1.1.2.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.2">𝑟</ci><apply id="S8.I1.ix1.p1.6.m6.1.1.3.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3"><plus id="S8.I1.ix1.p1.6.m6.1.1.3.1.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.1"></plus><apply id="S8.I1.ix1.p1.6.m6.1.1.3.2.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.2"><times id="S8.I1.ix1.p1.6.m6.1.1.3.2.1.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.1"></times><apply id="S8.I1.ix1.p1.6.m6.1.1.3.2.2.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.2"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.6.m6.1.1.3.2.2.1.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.2">subscript</csymbol><ci id="S8.I1.ix1.p1.6.m6.1.1.3.2.2.2.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.2.2">𝜆</ci><ci id="S8.I1.ix1.p1.6.m6.1.1.3.2.2.3.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.2.3">𝑑</ci></apply><apply id="S8.I1.ix1.p1.6.m6.1.1.3.2.3.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.3"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.6.m6.1.1.3.2.3.1.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.3">subscript</csymbol><ci id="S8.I1.ix1.p1.6.m6.1.1.3.2.3.2.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.3.2">𝑟</ci><ci id="S8.I1.ix1.p1.6.m6.1.1.3.2.3.3.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.2.3.3">𝑑</ci></apply></apply><apply id="S8.I1.ix1.p1.6.m6.1.1.3.3.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3"><times 
id="S8.I1.ix1.p1.6.m6.1.1.3.3.1.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.1"></times><apply id="S8.I1.ix1.p1.6.m6.1.1.3.3.2.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.2"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.6.m6.1.1.3.3.2.1.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.2">subscript</csymbol><ci id="S8.I1.ix1.p1.6.m6.1.1.3.3.2.2.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.2.2">𝜆</ci><apply id="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3"><times id="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3.1.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3.1"></times><ci id="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3.2.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3.2">𝑒</ci><ci id="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3.3.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.2.3.3">𝑒</ci></apply></apply><apply id="S8.I1.ix1.p1.6.m6.1.1.3.3.3.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.3"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.6.m6.1.1.3.3.3.1.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.3">subscript</csymbol><ci id="S8.I1.ix1.p1.6.m6.1.1.3.3.3.2.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.3.2">𝑟</ci><apply id="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3"><times id="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3.1.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3.1"></times><ci id="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3.2.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3.2">𝑒</ci><ci id="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3.3.cmml" xref="S8.I1.ix1.p1.6.m6.1.1.3.3.3.3.3">𝑒</ci></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S8.I1.ix1.p1.6.m6.1c">r=\lambda_{d}r_{d}+\lambda_{ee}r_{ee}</annotation><annotation encoding="application/x-llamapun" id="S8.I1.ix1.p1.6.m6.1d">italic_r = italic_λ start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT italic_r start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT + italic_λ start_POSTSUBSCRIPT italic_e italic_e end_POSTSUBSCRIPT italic_r start_POSTSUBSCRIPT italic_e italic_e end_POSTSUBSCRIPT</annotation></semantics></math>, where <math alttext="\lambda_{d},\lambda_{ee}" class="ltx_Math" 
display="inline" id="S8.I1.ix1.p1.7.m7.2"><semantics id="S8.I1.ix1.p1.7.m7.2a"><mrow id="S8.I1.ix1.p1.7.m7.2.2.2" xref="S8.I1.ix1.p1.7.m7.2.2.3.cmml"><msub id="S8.I1.ix1.p1.7.m7.1.1.1.1" xref="S8.I1.ix1.p1.7.m7.1.1.1.1.cmml"><mi id="S8.I1.ix1.p1.7.m7.1.1.1.1.2" xref="S8.I1.ix1.p1.7.m7.1.1.1.1.2.cmml">λ</mi><mi id="S8.I1.ix1.p1.7.m7.1.1.1.1.3" xref="S8.I1.ix1.p1.7.m7.1.1.1.1.3.cmml">d</mi></msub><mo id="S8.I1.ix1.p1.7.m7.2.2.2.3" xref="S8.I1.ix1.p1.7.m7.2.2.3.cmml">,</mo><msub id="S8.I1.ix1.p1.7.m7.2.2.2.2" xref="S8.I1.ix1.p1.7.m7.2.2.2.2.cmml"><mi id="S8.I1.ix1.p1.7.m7.2.2.2.2.2" xref="S8.I1.ix1.p1.7.m7.2.2.2.2.2.cmml">λ</mi><mrow id="S8.I1.ix1.p1.7.m7.2.2.2.2.3" xref="S8.I1.ix1.p1.7.m7.2.2.2.2.3.cmml"><mi id="S8.I1.ix1.p1.7.m7.2.2.2.2.3.2" xref="S8.I1.ix1.p1.7.m7.2.2.2.2.3.2.cmml">e</mi><mo id="S8.I1.ix1.p1.7.m7.2.2.2.2.3.1" xref="S8.I1.ix1.p1.7.m7.2.2.2.2.3.1.cmml">⁢</mo><mi id="S8.I1.ix1.p1.7.m7.2.2.2.2.3.3" xref="S8.I1.ix1.p1.7.m7.2.2.2.2.3.3.cmml">e</mi></mrow></msub></mrow><annotation-xml encoding="MathML-Content" id="S8.I1.ix1.p1.7.m7.2b"><list id="S8.I1.ix1.p1.7.m7.2.2.3.cmml" xref="S8.I1.ix1.p1.7.m7.2.2.2"><apply id="S8.I1.ix1.p1.7.m7.1.1.1.1.cmml" xref="S8.I1.ix1.p1.7.m7.1.1.1.1"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.7.m7.1.1.1.1.1.cmml" xref="S8.I1.ix1.p1.7.m7.1.1.1.1">subscript</csymbol><ci id="S8.I1.ix1.p1.7.m7.1.1.1.1.2.cmml" xref="S8.I1.ix1.p1.7.m7.1.1.1.1.2">𝜆</ci><ci id="S8.I1.ix1.p1.7.m7.1.1.1.1.3.cmml" xref="S8.I1.ix1.p1.7.m7.1.1.1.1.3">𝑑</ci></apply><apply id="S8.I1.ix1.p1.7.m7.2.2.2.2.cmml" xref="S8.I1.ix1.p1.7.m7.2.2.2.2"><csymbol cd="ambiguous" id="S8.I1.ix1.p1.7.m7.2.2.2.2.1.cmml" xref="S8.I1.ix1.p1.7.m7.2.2.2.2">subscript</csymbol><ci id="S8.I1.ix1.p1.7.m7.2.2.2.2.2.cmml" xref="S8.I1.ix1.p1.7.m7.2.2.2.2.2">𝜆</ci><apply id="S8.I1.ix1.p1.7.m7.2.2.2.2.3.cmml" xref="S8.I1.ix1.p1.7.m7.2.2.2.2.3"><times id="S8.I1.ix1.p1.7.m7.2.2.2.2.3.1.cmml" xref="S8.I1.ix1.p1.7.m7.2.2.2.2.3.1"></times><ci id="S8.I1.ix1.p1.7.m7.2.2.2.2.3.2.cmml" 
xref="S8.I1.ix1.p1.7.m7.2.2.2.2.3.2">𝑒</ci><ci id="S8.I1.ix1.p1.7.m7.2.2.2.2.3.3.cmml" xref="S8.I1.ix1.p1.7.m7.2.2.2.2.3.3">𝑒</ci></apply></apply></list></annotation-xml><annotation encoding="application/x-tex" id="S8.I1.ix1.p1.7.m7.2c">\lambda_{d},\lambda_{ee}</annotation><annotation encoding="application/x-llamapun" id="S8.I1.ix1.p1.7.m7.2d">italic_λ start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT , italic_λ start_POSTSUBSCRIPT italic_e italic_e end_POSTSUBSCRIPT</annotation></semantics></math> are weighting coefficients, set to <math alttext="[6.0,3.0]" class="ltx_Math" display="inline" id="S8.I1.ix1.p1.8.m8.2"><semantics id="S8.I1.ix1.p1.8.m8.2a"><mrow id="S8.I1.ix1.p1.8.m8.2.3.2" xref="S8.I1.ix1.p1.8.m8.2.3.1.cmml"><mo id="S8.I1.ix1.p1.8.m8.2.3.2.1" stretchy="false" xref="S8.I1.ix1.p1.8.m8.2.3.1.cmml">[</mo><mn id="S8.I1.ix1.p1.8.m8.1.1" xref="S8.I1.ix1.p1.8.m8.1.1.cmml">6.0</mn><mo id="S8.I1.ix1.p1.8.m8.2.3.2.2" xref="S8.I1.ix1.p1.8.m8.2.3.1.cmml">,</mo><mn id="S8.I1.ix1.p1.8.m8.2.2" xref="S8.I1.ix1.p1.8.m8.2.2.cmml">3.0</mn><mo id="S8.I1.ix1.p1.8.m8.2.3.2.3" stretchy="false" xref="S8.I1.ix1.p1.8.m8.2.3.1.cmml">]</mo></mrow><annotation-xml encoding="MathML-Content" id="S8.I1.ix1.p1.8.m8.2b"><interval closure="closed" id="S8.I1.ix1.p1.8.m8.2.3.1.cmml" xref="S8.I1.ix1.p1.8.m8.2.3.2"><cn id="S8.I1.ix1.p1.8.m8.1.1.cmml" type="float" xref="S8.I1.ix1.p1.8.m8.1.1">6.0</cn><cn id="S8.I1.ix1.p1.8.m8.2.2.cmml" type="float" xref="S8.I1.ix1.p1.8.m8.2.2">3.0</cn></interval></annotation-xml><annotation encoding="application/x-tex" id="S8.I1.ix1.p1.8.m8.2c">[6.0,3.0]</annotation><annotation encoding="application/x-llamapun" id="S8.I1.ix1.p1.8.m8.2d">[ 6.0 , 3.0 ]</annotation></semantics></math> in this test.</p> </div> </li> <li class="ltx_item" id="S8.I1.ix2" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">(2)</span> <div class="ltx_para" id="S8.I1.ix2.p1"> <p class="ltx_p" id="S8.I1.ix2.p1.7"><span class="ltx_text ltx_font_bold" 
id="S8.I1.ix2.p1.7.1">T-shaped Block Experiment</span> The reward function for manipulating the T-shaped block includes both position and orientation components: <math alttext="r=\lambda_{d}r_{d}+\lambda_{o}r_{o}" class="ltx_Math" display="inline" id="S8.I1.ix2.p1.1.m1.1"><semantics id="S8.I1.ix2.p1.1.m1.1a"><mrow id="S8.I1.ix2.p1.1.m1.1.1" xref="S8.I1.ix2.p1.1.m1.1.1.cmml"><mi id="S8.I1.ix2.p1.1.m1.1.1.2" xref="S8.I1.ix2.p1.1.m1.1.1.2.cmml">r</mi><mo id="S8.I1.ix2.p1.1.m1.1.1.1" xref="S8.I1.ix2.p1.1.m1.1.1.1.cmml">=</mo><mrow id="S8.I1.ix2.p1.1.m1.1.1.3" xref="S8.I1.ix2.p1.1.m1.1.1.3.cmml"><mrow id="S8.I1.ix2.p1.1.m1.1.1.3.2" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.cmml"><msub id="S8.I1.ix2.p1.1.m1.1.1.3.2.2" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.2.cmml"><mi id="S8.I1.ix2.p1.1.m1.1.1.3.2.2.2" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.2.2.cmml">λ</mi><mi id="S8.I1.ix2.p1.1.m1.1.1.3.2.2.3" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.2.3.cmml">d</mi></msub><mo id="S8.I1.ix2.p1.1.m1.1.1.3.2.1" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.1.cmml">⁢</mo><msub id="S8.I1.ix2.p1.1.m1.1.1.3.2.3" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.3.cmml"><mi id="S8.I1.ix2.p1.1.m1.1.1.3.2.3.2" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.3.2.cmml">r</mi><mi id="S8.I1.ix2.p1.1.m1.1.1.3.2.3.3" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.3.3.cmml">d</mi></msub></mrow><mo id="S8.I1.ix2.p1.1.m1.1.1.3.1" xref="S8.I1.ix2.p1.1.m1.1.1.3.1.cmml">+</mo><mrow id="S8.I1.ix2.p1.1.m1.1.1.3.3" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.cmml"><msub id="S8.I1.ix2.p1.1.m1.1.1.3.3.2" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.2.cmml"><mi id="S8.I1.ix2.p1.1.m1.1.1.3.3.2.2" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.2.2.cmml">λ</mi><mi id="S8.I1.ix2.p1.1.m1.1.1.3.3.2.3" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.2.3.cmml">o</mi></msub><mo id="S8.I1.ix2.p1.1.m1.1.1.3.3.1" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.1.cmml">⁢</mo><msub id="S8.I1.ix2.p1.1.m1.1.1.3.3.3" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.3.cmml"><mi id="S8.I1.ix2.p1.1.m1.1.1.3.3.3.2" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.3.2.cmml">r</mi><mi id="S8.I1.ix2.p1.1.m1.1.1.3.3.3.3" 
xref="S8.I1.ix2.p1.1.m1.1.1.3.3.3.3.cmml">o</mi></msub></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S8.I1.ix2.p1.1.m1.1b"><apply id="S8.I1.ix2.p1.1.m1.1.1.cmml" xref="S8.I1.ix2.p1.1.m1.1.1"><eq id="S8.I1.ix2.p1.1.m1.1.1.1.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.1"></eq><ci id="S8.I1.ix2.p1.1.m1.1.1.2.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.2">𝑟</ci><apply id="S8.I1.ix2.p1.1.m1.1.1.3.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3"><plus id="S8.I1.ix2.p1.1.m1.1.1.3.1.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.1"></plus><apply id="S8.I1.ix2.p1.1.m1.1.1.3.2.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.2"><times id="S8.I1.ix2.p1.1.m1.1.1.3.2.1.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.1"></times><apply id="S8.I1.ix2.p1.1.m1.1.1.3.2.2.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.2"><csymbol cd="ambiguous" id="S8.I1.ix2.p1.1.m1.1.1.3.2.2.1.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.2">subscript</csymbol><ci id="S8.I1.ix2.p1.1.m1.1.1.3.2.2.2.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.2.2">𝜆</ci><ci id="S8.I1.ix2.p1.1.m1.1.1.3.2.2.3.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.2.3">𝑑</ci></apply><apply id="S8.I1.ix2.p1.1.m1.1.1.3.2.3.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.3"><csymbol cd="ambiguous" id="S8.I1.ix2.p1.1.m1.1.1.3.2.3.1.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.3">subscript</csymbol><ci id="S8.I1.ix2.p1.1.m1.1.1.3.2.3.2.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.3.2">𝑟</ci><ci id="S8.I1.ix2.p1.1.m1.1.1.3.2.3.3.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.2.3.3">𝑑</ci></apply></apply><apply id="S8.I1.ix2.p1.1.m1.1.1.3.3.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.3"><times id="S8.I1.ix2.p1.1.m1.1.1.3.3.1.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.1"></times><apply id="S8.I1.ix2.p1.1.m1.1.1.3.3.2.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.2"><csymbol cd="ambiguous" id="S8.I1.ix2.p1.1.m1.1.1.3.3.2.1.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.2">subscript</csymbol><ci id="S8.I1.ix2.p1.1.m1.1.1.3.3.2.2.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.2.2">𝜆</ci><ci id="S8.I1.ix2.p1.1.m1.1.1.3.3.2.3.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.2.3">𝑜</ci></apply><apply 
id="S8.I1.ix2.p1.1.m1.1.1.3.3.3.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.3"><csymbol cd="ambiguous" id="S8.I1.ix2.p1.1.m1.1.1.3.3.3.1.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.3">subscript</csymbol><ci id="S8.I1.ix2.p1.1.m1.1.1.3.3.3.2.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.3.2">𝑟</ci><ci id="S8.I1.ix2.p1.1.m1.1.1.3.3.3.3.cmml" xref="S8.I1.ix2.p1.1.m1.1.1.3.3.3.3">𝑜</ci></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S8.I1.ix2.p1.1.m1.1c">r=\lambda_{d}r_{d}+\lambda_{o}r_{o}</annotation><annotation encoding="application/x-llamapun" id="S8.I1.ix2.p1.1.m1.1d">italic_r = italic_λ start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT italic_r start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT + italic_λ start_POSTSUBSCRIPT italic_o end_POSTSUBSCRIPT italic_r start_POSTSUBSCRIPT italic_o end_POSTSUBSCRIPT</annotation></semantics></math>, where the orientation reward <math alttext="r_{o}" class="ltx_Math" display="inline" id="S8.I1.ix2.p1.2.m2.1"><semantics id="S8.I1.ix2.p1.2.m2.1a"><msub id="S8.I1.ix2.p1.2.m2.1.1" xref="S8.I1.ix2.p1.2.m2.1.1.cmml"><mi id="S8.I1.ix2.p1.2.m2.1.1.2" xref="S8.I1.ix2.p1.2.m2.1.1.2.cmml">r</mi><mi id="S8.I1.ix2.p1.2.m2.1.1.3" xref="S8.I1.ix2.p1.2.m2.1.1.3.cmml">o</mi></msub><annotation-xml encoding="MathML-Content" id="S8.I1.ix2.p1.2.m2.1b"><apply id="S8.I1.ix2.p1.2.m2.1.1.cmml" xref="S8.I1.ix2.p1.2.m2.1.1"><csymbol cd="ambiguous" id="S8.I1.ix2.p1.2.m2.1.1.1.cmml" xref="S8.I1.ix2.p1.2.m2.1.1">subscript</csymbol><ci id="S8.I1.ix2.p1.2.m2.1.1.2.cmml" xref="S8.I1.ix2.p1.2.m2.1.1.2">𝑟</ci><ci id="S8.I1.ix2.p1.2.m2.1.1.3.cmml" xref="S8.I1.ix2.p1.2.m2.1.1.3">𝑜</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S8.I1.ix2.p1.2.m2.1c">r_{o}</annotation><annotation encoding="application/x-llamapun" id="S8.I1.ix2.p1.2.m2.1d">italic_r start_POSTSUBSCRIPT italic_o end_POSTSUBSCRIPT</annotation></semantics></math> penalizes deviations from the desired orientation: <math 
alttext="r_{o}=-\arccos(\langle\mathbf{q}_{b},\mathbf{q}_{t}\rangle)" class="ltx_Math" display="inline" id="S8.I1.ix2.p1.3.m3.2"><semantics id="S8.I1.ix2.p1.3.m3.2a"><mrow id="S8.I1.ix2.p1.3.m3.2.2" xref="S8.I1.ix2.p1.3.m3.2.2.cmml"><msub id="S8.I1.ix2.p1.3.m3.2.2.3" xref="S8.I1.ix2.p1.3.m3.2.2.3.cmml"><mi id="S8.I1.ix2.p1.3.m3.2.2.3.2" xref="S8.I1.ix2.p1.3.m3.2.2.3.2.cmml">r</mi><mi id="S8.I1.ix2.p1.3.m3.2.2.3.3" xref="S8.I1.ix2.p1.3.m3.2.2.3.3.cmml">o</mi></msub><mo id="S8.I1.ix2.p1.3.m3.2.2.2" xref="S8.I1.ix2.p1.3.m3.2.2.2.cmml">=</mo><mrow id="S8.I1.ix2.p1.3.m3.2.2.1" xref="S8.I1.ix2.p1.3.m3.2.2.1.cmml"><mo id="S8.I1.ix2.p1.3.m3.2.2.1a" rspace="0.167em" xref="S8.I1.ix2.p1.3.m3.2.2.1.cmml">−</mo><mrow id="S8.I1.ix2.p1.3.m3.2.2.1.1.1" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.2.cmml"><mi id="S8.I1.ix2.p1.3.m3.1.1" xref="S8.I1.ix2.p1.3.m3.1.1.cmml">arccos</mi><mo id="S8.I1.ix2.p1.3.m3.2.2.1.1.1a" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.2.cmml">⁡</mo><mrow id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.2.cmml"><mo id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.2" stretchy="false" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.2.cmml">(</mo><mrow id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.3.cmml"><mo id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.3" stretchy="false" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.3.cmml">⟨</mo><msub id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.1.1" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.1.1.cmml"><mi id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.1.1.2" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.1.1.2.cmml">𝐪</mi><mi id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.1.1.3" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.1.1.3.cmml">b</mi></msub><mo id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.4" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.3.cmml">,</mo><msub id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.2" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.2.cmml"><mi id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.2.2" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.2.2.cmml">𝐪</mi><mi id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.2.3" 
xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.2.3.cmml">t</mi></msub><mo id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.5" stretchy="false" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.3.cmml">⟩</mo></mrow><mo id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.3" stretchy="false" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.2.cmml">)</mo></mrow></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S8.I1.ix2.p1.3.m3.2b"><apply id="S8.I1.ix2.p1.3.m3.2.2.cmml" xref="S8.I1.ix2.p1.3.m3.2.2"><eq id="S8.I1.ix2.p1.3.m3.2.2.2.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.2"></eq><apply id="S8.I1.ix2.p1.3.m3.2.2.3.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.3"><csymbol cd="ambiguous" id="S8.I1.ix2.p1.3.m3.2.2.3.1.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.3">subscript</csymbol><ci id="S8.I1.ix2.p1.3.m3.2.2.3.2.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.3.2">𝑟</ci><ci id="S8.I1.ix2.p1.3.m3.2.2.3.3.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.3.3">𝑜</ci></apply><apply id="S8.I1.ix2.p1.3.m3.2.2.1.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.1"><minus id="S8.I1.ix2.p1.3.m3.2.2.1.2.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.1"></minus><apply id="S8.I1.ix2.p1.3.m3.2.2.1.1.2.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1"><arccos id="S8.I1.ix2.p1.3.m3.1.1.cmml" xref="S8.I1.ix2.p1.3.m3.1.1"></arccos><list id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.3.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2"><apply id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.1.1.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.1.1.1.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.1.1">subscript</csymbol><ci id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.1.1.2.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.1.1.2">𝐪</ci><ci id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.1.1.3.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.1.1.3">𝑏</ci></apply><apply id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.2.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.2"><csymbol cd="ambiguous" id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.2.1.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.2">subscript</csymbol><ci 
id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.2.2.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.2.2">𝐪</ci><ci id="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.2.3.cmml" xref="S8.I1.ix2.p1.3.m3.2.2.1.1.1.1.1.2.2.3">𝑡</ci></apply></list></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S8.I1.ix2.p1.3.m3.2c">r_{o}=-\arccos(\langle\mathbf{q}_{b},\mathbf{q}_{t}\rangle)</annotation><annotation encoding="application/x-llamapun" id="S8.I1.ix2.p1.3.m3.2d">italic_r start_POSTSUBSCRIPT italic_o end_POSTSUBSCRIPT = - roman_arccos ( ⟨ bold_q start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT , bold_q start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ⟩ )</annotation></semantics></math>, where <math alttext="\langle\mathbf{q}_{b},\mathbf{q}_{t}\rangle" class="ltx_Math" display="inline" id="S8.I1.ix2.p1.4.m4.2"><semantics id="S8.I1.ix2.p1.4.m4.2a"><mrow id="S8.I1.ix2.p1.4.m4.2.2.2" xref="S8.I1.ix2.p1.4.m4.2.2.3.cmml"><mo id="S8.I1.ix2.p1.4.m4.2.2.2.3" stretchy="false" xref="S8.I1.ix2.p1.4.m4.2.2.3.cmml">⟨</mo><msub id="S8.I1.ix2.p1.4.m4.1.1.1.1" xref="S8.I1.ix2.p1.4.m4.1.1.1.1.cmml"><mi id="S8.I1.ix2.p1.4.m4.1.1.1.1.2" xref="S8.I1.ix2.p1.4.m4.1.1.1.1.2.cmml">𝐪</mi><mi id="S8.I1.ix2.p1.4.m4.1.1.1.1.3" xref="S8.I1.ix2.p1.4.m4.1.1.1.1.3.cmml">b</mi></msub><mo id="S8.I1.ix2.p1.4.m4.2.2.2.4" xref="S8.I1.ix2.p1.4.m4.2.2.3.cmml">,</mo><msub id="S8.I1.ix2.p1.4.m4.2.2.2.2" xref="S8.I1.ix2.p1.4.m4.2.2.2.2.cmml"><mi id="S8.I1.ix2.p1.4.m4.2.2.2.2.2" xref="S8.I1.ix2.p1.4.m4.2.2.2.2.2.cmml">𝐪</mi><mi id="S8.I1.ix2.p1.4.m4.2.2.2.2.3" xref="S8.I1.ix2.p1.4.m4.2.2.2.2.3.cmml">t</mi></msub><mo id="S8.I1.ix2.p1.4.m4.2.2.2.5" stretchy="false" xref="S8.I1.ix2.p1.4.m4.2.2.3.cmml">⟩</mo></mrow><annotation-xml encoding="MathML-Content" id="S8.I1.ix2.p1.4.m4.2b"><list id="S8.I1.ix2.p1.4.m4.2.2.3.cmml" xref="S8.I1.ix2.p1.4.m4.2.2.2"><apply id="S8.I1.ix2.p1.4.m4.1.1.1.1.cmml" xref="S8.I1.ix2.p1.4.m4.1.1.1.1"><csymbol cd="ambiguous" id="S8.I1.ix2.p1.4.m4.1.1.1.1.1.cmml" 
xref="S8.I1.ix2.p1.4.m4.1.1.1.1">subscript</csymbol><ci id="S8.I1.ix2.p1.4.m4.1.1.1.1.2.cmml" xref="S8.I1.ix2.p1.4.m4.1.1.1.1.2">𝐪</ci><ci id="S8.I1.ix2.p1.4.m4.1.1.1.1.3.cmml" xref="S8.I1.ix2.p1.4.m4.1.1.1.1.3">𝑏</ci></apply><apply id="S8.I1.ix2.p1.4.m4.2.2.2.2.cmml" xref="S8.I1.ix2.p1.4.m4.2.2.2.2"><csymbol cd="ambiguous" id="S8.I1.ix2.p1.4.m4.2.2.2.2.1.cmml" xref="S8.I1.ix2.p1.4.m4.2.2.2.2">subscript</csymbol><ci id="S8.I1.ix2.p1.4.m4.2.2.2.2.2.cmml" xref="S8.I1.ix2.p1.4.m4.2.2.2.2.2">𝐪</ci><ci id="S8.I1.ix2.p1.4.m4.2.2.2.2.3.cmml" xref="S8.I1.ix2.p1.4.m4.2.2.2.2.3">𝑡</ci></apply></list></annotation-xml><annotation encoding="application/x-tex" id="S8.I1.ix2.p1.4.m4.2c">\langle\mathbf{q}_{b},\mathbf{q}_{t}\rangle</annotation><annotation encoding="application/x-llamapun" id="S8.I1.ix2.p1.4.m4.2d">⟨ bold_q start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT , bold_q start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ⟩</annotation></semantics></math> represents the inner product of the unit quaternions, measuring the angular difference. 
<math alttext="\lambda_{d}" class="ltx_Math" display="inline" id="S8.I1.ix2.p1.5.m5.1"><semantics id="S8.I1.ix2.p1.5.m5.1a"><msub id="S8.I1.ix2.p1.5.m5.1.1" xref="S8.I1.ix2.p1.5.m5.1.1.cmml"><mi id="S8.I1.ix2.p1.5.m5.1.1.2" xref="S8.I1.ix2.p1.5.m5.1.1.2.cmml">λ</mi><mi id="S8.I1.ix2.p1.5.m5.1.1.3" xref="S8.I1.ix2.p1.5.m5.1.1.3.cmml">d</mi></msub><annotation-xml encoding="MathML-Content" id="S8.I1.ix2.p1.5.m5.1b"><apply id="S8.I1.ix2.p1.5.m5.1.1.cmml" xref="S8.I1.ix2.p1.5.m5.1.1"><csymbol cd="ambiguous" id="S8.I1.ix2.p1.5.m5.1.1.1.cmml" xref="S8.I1.ix2.p1.5.m5.1.1">subscript</csymbol><ci id="S8.I1.ix2.p1.5.m5.1.1.2.cmml" xref="S8.I1.ix2.p1.5.m5.1.1.2">𝜆</ci><ci id="S8.I1.ix2.p1.5.m5.1.1.3.cmml" xref="S8.I1.ix2.p1.5.m5.1.1.3">𝑑</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S8.I1.ix2.p1.5.m5.1c">\lambda_{d}</annotation><annotation encoding="application/x-llamapun" id="S8.I1.ix2.p1.5.m5.1d">italic_λ start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT</annotation></semantics></math> and <math alttext="\lambda_{o}" class="ltx_Math" display="inline" id="S8.I1.ix2.p1.6.m6.1"><semantics id="S8.I1.ix2.p1.6.m6.1a"><msub id="S8.I1.ix2.p1.6.m6.1.1" xref="S8.I1.ix2.p1.6.m6.1.1.cmml"><mi id="S8.I1.ix2.p1.6.m6.1.1.2" xref="S8.I1.ix2.p1.6.m6.1.1.2.cmml">λ</mi><mi id="S8.I1.ix2.p1.6.m6.1.1.3" xref="S8.I1.ix2.p1.6.m6.1.1.3.cmml">o</mi></msub><annotation-xml encoding="MathML-Content" id="S8.I1.ix2.p1.6.m6.1b"><apply id="S8.I1.ix2.p1.6.m6.1.1.cmml" xref="S8.I1.ix2.p1.6.m6.1.1"><csymbol cd="ambiguous" id="S8.I1.ix2.p1.6.m6.1.1.1.cmml" xref="S8.I1.ix2.p1.6.m6.1.1">subscript</csymbol><ci id="S8.I1.ix2.p1.6.m6.1.1.2.cmml" xref="S8.I1.ix2.p1.6.m6.1.1.2">𝜆</ci><ci id="S8.I1.ix2.p1.6.m6.1.1.3.cmml" xref="S8.I1.ix2.p1.6.m6.1.1.3">𝑜</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S8.I1.ix2.p1.6.m6.1c">\lambda_{o}</annotation><annotation encoding="application/x-llamapun" id="S8.I1.ix2.p1.6.m6.1d">italic_λ start_POSTSUBSCRIPT italic_o 
end_POSTSUBSCRIPT</annotation></semantics></math> are weighting coefficients that balance positional and orientation alignment, set to <math alttext="[3.0,6.0]" class="ltx_Math" display="inline" id="S8.I1.ix2.p1.7.m7.2"><semantics id="S8.I1.ix2.p1.7.m7.2a"><mrow id="S8.I1.ix2.p1.7.m7.2.3.2" xref="S8.I1.ix2.p1.7.m7.2.3.1.cmml"><mo id="S8.I1.ix2.p1.7.m7.2.3.2.1" stretchy="false" xref="S8.I1.ix2.p1.7.m7.2.3.1.cmml">[</mo><mn id="S8.I1.ix2.p1.7.m7.1.1" xref="S8.I1.ix2.p1.7.m7.1.1.cmml">3.0</mn><mo id="S8.I1.ix2.p1.7.m7.2.3.2.2" xref="S8.I1.ix2.p1.7.m7.2.3.1.cmml">,</mo><mn id="S8.I1.ix2.p1.7.m7.2.2" xref="S8.I1.ix2.p1.7.m7.2.2.cmml">6.0</mn><mo id="S8.I1.ix2.p1.7.m7.2.3.2.3" stretchy="false" xref="S8.I1.ix2.p1.7.m7.2.3.1.cmml">]</mo></mrow><annotation-xml encoding="MathML-Content" id="S8.I1.ix2.p1.7.m7.2b"><interval closure="closed" id="S8.I1.ix2.p1.7.m7.2.3.1.cmml" xref="S8.I1.ix2.p1.7.m7.2.3.2"><cn id="S8.I1.ix2.p1.7.m7.1.1.cmml" type="float" xref="S8.I1.ix2.p1.7.m7.1.1">3.0</cn><cn id="S8.I1.ix2.p1.7.m7.2.2.cmml" type="float" xref="S8.I1.ix2.p1.7.m7.2.2">6.0</cn></interval></annotation-xml><annotation encoding="application/x-tex" id="S8.I1.ix2.p1.7.m7.2c">[3.0,6.0]</annotation><annotation encoding="application/x-llamapun" id="S8.I1.ix2.p1.7.m7.2d">[ 3.0 , 6.0 ]</annotation></semantics></math> in this test.</p> </div> </li> </ul> </div> </section> </section> </article> </div> <footer class="ltx_page_footer"> <div class="ltx_page_logo">Generated on Tue Mar 18 07:27:46 2025 by <a class="ltx_LaTeXML_logo" href="http://dlmf.nist.gov/LaTeXML/"><span style="letter-spacing:-0.2em; margin-right:0.1em;">L<span class="ltx_font_smallcaps" style="position:relative; bottom:2.2pt;">a</span>T<span class="ltx_font_smallcaps" style="font-size:120%;position:relative; bottom:-0.2ex;">e</span></span><span style="font-size:90%; position:relative; bottom:-0.2ex;">XML</span><img alt="Mascot Sammy" src=""/></a> </div></footer> </div> </body> </html>
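
The two reward functions defined in the appendix above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names (`block_push_reward`, `orientation_reward`, `t_block_reward`) are hypothetical, positions are assumed to be plain coordinate tuples, quaternions are assumed to be unit-norm 4-tuples, and the clamp before `acos` is an added numerical safeguard that the text does not mention. The default weights are the values reported for each test.

```python
import math


def block_push_reward(x_b, x_t, x_ee, lambda_d=6.0, lambda_ee=3.0):
    """Hypothetical helper: r = lambda_d * r_d + lambda_ee * r_ee (first experiment)."""
    r_d = -math.dist(x_b, x_t)    # r_d  = -||x_b  - x_t||_2, block-to-target distance
    r_ee = -math.dist(x_ee, x_b)  # r_ee = -||x_ee - x_b||_2, end-effector-to-block distance
    return lambda_d * r_d + lambda_ee * r_ee


def orientation_reward(q_b, q_t):
    """Hypothetical helper: r_o = -arccos(<q_b, q_t>) for unit quaternions."""
    dot = sum(b * t for b, t in zip(q_b, q_t))
    # Assumption: clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    dot = max(-1.0, min(1.0, dot))
    return -math.acos(dot)


def t_block_reward(x_b, x_t, q_b, q_t, lambda_d=3.0, lambda_o=6.0):
    """Hypothetical helper: r = lambda_d * r_d + lambda_o * r_o (T-shaped block experiment)."""
    r_d = -math.dist(x_b, x_t)
    r_o = orientation_reward(q_b, q_t)
    return lambda_d * r_d + lambda_o * r_o
```

Both rewards are non-positive and reach zero only when the block sits at the target (and, for the T-shaped block, also matches the target orientation), so maximizing them drives every error term toward zero.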
