
Multi-Objective Reinforcement Learning for Power Grid Topology Control

<!DOCTYPE html> <html lang="en"> <head> <meta content="text/html; charset=utf-8" http-equiv="content-type"/> <title>Multi-Objective Reinforcement Learning for Power Grid Topology Control</title> <!--Generated on Mon Jan 27 12:20:35 2025 by LaTeXML (version 0.8.8) http://dlmf.nist.gov/LaTeXML/.--> <meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/> <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv-fonts.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/latexml_styles.css" rel="stylesheet" type="text/css"/> <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.3.3/html2canvas.min.js"></script> <script src="/static/browse/0.3.4/js/addons_new.js"></script> <script src="/static/browse/0.3.4/js/feedbackOverlay.js"></script> <meta content=" Transmission network topology control, Multi-objective reinforcement learning, Deep optimistic linear support. 
" lang="en" name="keywords"/> <base href="/html/2502.00040v1/"/></head> <body> <nav class="ltx_page_navbar"> <nav class="ltx_TOC"> <ol class="ltx_toclist"> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S1" title="In Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">I </span><span class="ltx_text ltx_font_smallcaps">Introduction</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S2" title="In Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">II </span><span class="ltx_text ltx_font_smallcaps">Methodology</span></span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S2.SS1" title="In II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-A</span> </span><span class="ltx_text ltx_font_italic">Single-Policy Multi-Objective PPO</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"> <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S2.SS2" title="In II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-B</span> </span><span class="ltx_text ltx_font_italic">Reward functions for operational objectives</span></span></a> <ol class="ltx_toclist ltx_toclist_subsection"> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S2.SS2.SSS1" title="In II-B Reward functions for operational objectives ‣ II 
Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-B</span>1 </span>Line Loading Reward</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S2.SS2.SSS2" title="In II-B Reward functions for operational objectives ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-B</span>2 </span>Topological Deviation</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S2.SS2.SSS3" title="In II-B Reward functions for operational objectives ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-B</span>3 </span>Switching Frequency</span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S2.SS3" title="In II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-C</span> </span><span class="ltx_text ltx_font_italic">Deep Optimistic Linear Support</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S2.SS4" title="In II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">II-D</span> </span><span class="ltx_text ltx_font_italic">Policy Selection</span></span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"> <a 
class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3" title="In Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">III </span><span class="ltx_text ltx_font_smallcaps">Case Studies</span></span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.SS1" title="In III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">III-A</span> </span><span class="ltx_text ltx_font_italic">Settings</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.SS2" title="In III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">III-B</span> </span><span class="ltx_text ltx_font_italic">Pareto Front Approximation</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.SS3" title="In III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">III-C</span> </span><span class="ltx_text ltx_font_italic">Robustness to N-1 Contingencies</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.SS4" title="In III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">III-D</span> </span><span class="ltx_text ltx_font_italic">Efficient 
Training</span></span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S4" title="In Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">IV </span><span class="ltx_text ltx_font_smallcaps">Discussion and Conclusion</span></span></a></li> </ol></nav> </nav> <div class="ltx_page_main"> <div class="ltx_page_content"> <article class="ltx_document ltx_authors_1line"> <h1 class="ltx_title ltx_title_document">Multi-Objective Reinforcement Learning for Power Grid Topology Control </h1> <div class="ltx_authors"> <span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Thomas Lautenbacher </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_affiliation"><span class="ltx_text ltx_font_italic" id="id3.1.id1">50Hertz Transmission GmbH <br class="ltx_break"/></span>Berlin, Germany <br class="ltx_break"/>thomasrene.lautenbacher@50Hertz.com </span>The research was carried out as part of an MSc thesis project at Delft University of Technology.</span></span> <span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Ali Rajaei </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_affiliation"><span class="ltx_text ltx_font_italic" id="id4.1.id1">Delft University of Technology <br class="ltx_break"/></span>Delft, The Netherlands <br class="ltx_break"/>a.rajaei@tudelft.nl </span></span></span> <span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Davide Barbieri <span class="ltx_text" id="id5.1.id1"></span><span class="ltx_text" id="id6.2.id2"></span> and Jan Viebahn </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_affiliation"><span class="ltx_text ltx_font_italic" 
id="id8.4.id1">TenneT TSO B.V. <br class="ltx_break"/></span>Arnhem, The Netherlands <br class="ltx_break"/>davide.barbieri@tennet.eu </span> <span class="ltx_contact ltx_role_affiliation"><span class="ltx_text ltx_font_italic" id="id9.5.id1">TenneT TSO B.V. <br class="ltx_break"/></span>Arnhem, The Netherlands <br class="ltx_break"/>jan.viebahn@tennet.eu </span></span></span> <span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Jochen L. Cremer </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_affiliation"><span class="ltx_text ltx_font_italic" id="id10.1.id1">Delft University of Technology <br class="ltx_break"/></span>Delft, The Netherlands <br class="ltx_break"/>j.l.cremer@tudelft.nl </span></span></span> </div> <div class="ltx_abstract"> <h6 class="ltx_title ltx_title_abstract">Abstract</h6> <p class="ltx_p" id="id2.2">Transmission grid congestion increases as the electrification of various sectors requires transmitting more power. Topology control, through substation reconfiguration, can reduce congestion, but its potential remains under-exploited in operations. One challenge is to model the topology control problem so that it aligns with the objectives and constraints of operators. To address this challenge, this paper investigates the application of multi-objective reinforcement learning (MORL) to integrate multiple conflicting objectives into power grid topology control. We develop a MORL approach using deep optimistic linear support (DOL) and multi-objective proximal policy optimization (MOPPO) to generate a set of Pareto-optimal policies that balance objectives such as minimizing line loading, topological deviation, and switching frequency. Initial case studies show that the MORL approach can provide valuable insights into objective trade-offs and improve the Pareto front approximation compared to a random search baseline. 
Compared to the common single-objective RL policy, the generated multi-objective RL policies are <math alttext="30" class="ltx_Math" display="inline" id="id1.1.m1.1"><semantics id="id1.1.m1.1a"><mn id="id1.1.m1.1.1" xref="id1.1.m1.1.1.cmml">30</mn><annotation-xml encoding="MathML-Content" id="id1.1.m1.1b"><cn id="id1.1.m1.1.1.cmml" type="integer" xref="id1.1.m1.1.1">30</cn></annotation-xml><annotation encoding="application/x-tex" id="id1.1.m1.1c">30</annotation><annotation encoding="application/x-llamapun" id="id1.1.m1.1d">30</annotation></semantics></math>% more successful in preventing grid failure under contingencies and <math alttext="20" class="ltx_Math" display="inline" id="id2.2.m2.1"><semantics id="id2.2.m2.1a"><mn id="id2.2.m2.1.1" xref="id2.2.m2.1.1.cmml">20</mn><annotation-xml encoding="MathML-Content" id="id2.2.m2.1b"><cn id="id2.2.m2.1.1.cmml" type="integer" xref="id2.2.m2.1.1">20</cn></annotation-xml><annotation encoding="application/x-tex" id="id2.2.m2.1c">20</annotation><annotation encoding="application/x-llamapun" id="id2.2.m2.1d">20</annotation></semantics></math>% more effective when the training budget is reduced.</p> </div> <div class="ltx_keywords"> <h6 class="ltx_title ltx_title_keywords">Index Terms: </h6> Transmission network topology control, Multi-objective reinforcement learning, Deep optimistic linear support. </div> <section class="ltx_section" id="S1"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">I </span><span class="ltx_text ltx_font_smallcaps" id="S1.1.1">Introduction</span> </h2> <div class="ltx_para" id="S1.p1"> <p class="ltx_p" id="S1.p1.1">The energy transition and the shift toward renewable energy sources are crucial steps for mitigating climate change and ensuring a sustainable energy future. However, this transition poses significant operational challenges for system operators, including congestion management. 
Transmission network topology control is an under-utilized, low-cost source of flexibility. Adjusting the network topology, for example by switching lines or modifying busbar connections within substations, can reroute power flows to prevent line overloads and mitigate cascading outages <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib1" title="">1</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib2" title="">2</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib3" title="">3</a>]</cite>. In addition to maintaining a continuous electricity supply, power systems must address other objectives, such as minimizing asset wear, reducing operational cost, and mitigating environmental impacts. Achieving these objectives requires a multi-objective approach to decision-making that maintains grid security while addressing other operational objectives <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib4" title="">4</a>]</cite>.</p> </div> <div class="ltx_para" id="S1.p2"> <p class="ltx_p" id="S1.p2.1">The transmission network topology control problem can be modeled as a mixed-integer nonlinear optimization problem, which is computationally challenging to solve because of the so-called combinatorial explosion of possible topologies and the complex nonlinear nature of power systems <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib1" title="">1</a>]</cite>. 
To address these challenges, heuristic and expert rule-based approaches, such as those in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib5" title="">5</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib6" title="">6</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib7" title="">7</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib8" title="">8</a>]</cite>, have been developed to determine corrective topological actions that relieve congestion. However, these approaches do not provide a sequence of control actions and may lead to sub-optimal solutions. To obtain sequences of actions, researchers have recently explored reinforcement learning (RL), and artificial intelligence (AI) more broadly, for topology control <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib9" title="">9</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib3" title="">3</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib10" title="">10</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib11" title="">11</a>]</cite>. 
Studies such as <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib12" title="">12</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib13" title="">13</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib14" title="">14</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib15" title="">15</a>]</cite> explore RL-based approaches, including a deep duelling Q-network (DDQN) initialized with imitation learning <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib12" title="">12</a>]</cite>, a semi-Markov actor-critic algorithm <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib13" title="">13</a>]</cite>, the cross-entropy method with importance sampling <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib14" title="">14</a>]</cite>, and proximal policy optimization (PPO) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib15" title="">15</a>]</cite>. Additionally, <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib16" title="">16</a>]</cite> develops an AlphaZero-based approach that uses Monte Carlo tree search to simulate future outcomes and guide the agent toward long-term strategies, while <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib17" title="">17</a>]</cite> presents a curriculum-based approach to improve learning efficiency and stability. 
Building on these ideas, <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib18" title="">18</a>]</cite> combines curriculum learning with tree search to benefit both from long-term strategies and from improved learning efficiency and stability. Other studies address the combinatorial explosion of the topology control problem through hierarchical RL <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib19" title="">19</a>]</cite> and multi-agent RL <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib20" title="">20</a>]</cite>. Furthermore, <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib21" title="">21</a>]</cite> proposes a reward design that combines multiple metrics to reduce overloads. However, the approaches in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib6" title="">6</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib7" title="">7</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib5" title="">5</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib8" title="">8</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib12" title="">12</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib13" title="">13</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib14" title="">14</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib15" title="">15</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib16" title="">16</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib17" title="">17</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib18" title="">18</a>, <a class="ltx_ref" 
href="https://arxiv.org/html/2502.00040v1#bib.bib19" title="">19</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib20" title="">20</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib21" title="">21</a>]</cite> focus on a single operational objective and provide only a single policy. This limits their ability to capture the trade-offs inherent in the multi-objective nature of power systems and to offer operators a set of policies to select from.</p> </div> <div class="ltx_para" id="S1.p3"> <p class="ltx_p" id="S1.p3.1">This paper proposes a multi-objective RL (MORL) approach to the network topology control problem. Although RL approaches for topology control have been studied before, to the best of the authors’ knowledge, a multi-policy MORL approach has not yet been investigated. To this end, we implement deep optimistic linear support (DOL) and multi-objective PPO (MOPPO) to generate a set of Pareto-optimal policies. Additionally, we develop custom reward functions for different operational objectives, including line loading, topological deviation, and switching frequency. The proposed MORL approach not only reveals the trade-offs between these objectives but also provides a set of policies that balance them, offering decision support for system operators. By considering multiple rewards, the approach effectively relieves grid congestion while also addressing other operational objectives.</p> </div> <div class="ltx_para" id="S1.p4"> <p class="ltx_p" id="S1.p4.1">The rest of the paper is organized as follows. <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S2" title="II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">Section</span> <span class="ltx_text ltx_ref_tag">II</span></a> presents the proposed MORL approach and the design of the reward functions. 
<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3" title="III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">Section</span> <span class="ltx_text ltx_ref_tag">III</span></a> presents the case studies, which investigate the efficiency and robustness of the proposed approach. <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S4" title="IV Discussion and Conclusion ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">Section</span> <span class="ltx_text ltx_ref_tag">IV</span></a> discusses the results and concludes the paper.</p> </div> </section> <section class="ltx_section" id="S2"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">II </span><span class="ltx_text ltx_font_smallcaps" id="S2.1.1">Methodology</span> </h2> <div class="ltx_para" id="S2.p1"> <p class="ltx_p" id="S2.p1.4">This paper aims to provide a decision-support approach that helps transmission system operators perform topology control under multiple objectives. The proposed approach combines a multi-objective adaptation of the PPO algorithm <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib22" title="">22</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib23" title="">23</a>]</cite> (MOPPO) with deep optimistic linear support (DOL) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib24" title="">24</a>]</cite>. To capture different operational objectives, we design custom reward functions that address line loading, topological deviation, and switching frequency. 
The proposed approach follows the multi-policy MORL setting <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib25" title="">25</a>]</cite>, which yields a set of optimal solutions rather than a single solution. This allows system operators to better understand the trade-offs among objectives and to select the most appropriate policy. <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S2.F1" title="In II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">Fig.</span> <span class="ltx_text ltx_ref_tag">1</span></a> depicts the proposed approach. DOL serves as the outer loop of the MORL approach and constructs the convex coverage set of solutions iteratively. In each iteration, DOL generates a new set of candidate weight vectors <span class="ltx_text ltx_markedasmath ltx_font_bold" id="S2.p1.4.1">w</span> and passes the weight vector with the highest priority to MOPPO, which is then trained in the multi-objective environment. 
MOPPO uses <span class="ltx_text ltx_markedasmath ltx_font_bold" id="S2.p1.4.2">w</span> to weight the multiple rewards <math alttext="\textbf{R}_{t}" class="ltx_Math" display="inline" id="S2.p1.3.m3.1"><semantics id="S2.p1.3.m3.1a"><msub id="S2.p1.3.m3.1.1" xref="S2.p1.3.m3.1.1.cmml"><mtext class="ltx_mathvariant_bold" id="S2.p1.3.m3.1.1.2" xref="S2.p1.3.m3.1.1.2a.cmml">R</mtext><mi id="S2.p1.3.m3.1.1.3" xref="S2.p1.3.m3.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.p1.3.m3.1b"><apply id="S2.p1.3.m3.1.1.cmml" xref="S2.p1.3.m3.1.1"><csymbol cd="ambiguous" id="S2.p1.3.m3.1.1.1.cmml" xref="S2.p1.3.m3.1.1">subscript</csymbol><ci id="S2.p1.3.m3.1.1.2a.cmml" xref="S2.p1.3.m3.1.1.2"><mtext class="ltx_mathvariant_bold" id="S2.p1.3.m3.1.1.2.cmml" xref="S2.p1.3.m3.1.1.2">R</mtext></ci><ci id="S2.p1.3.m3.1.1.3.cmml" xref="S2.p1.3.m3.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p1.3.m3.1c">\textbf{R}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.p1.3.m3.1d">R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math>. After training, the resulting MOPPO policy is evaluated. If its value vector averaged over the evaluation episodes, <math alttext="\mathbfcal{V}" class="ltx_Math" display="inline" id="S2.p1.4.m4.1"><semantics id="S2.p1.4.m4.1a"><mi id="S2.p1.4.m4.1.1" mathvariant="normal" xref="S2.p1.4.m4.1.1.cmml">𝒱</mi><annotation-xml encoding="MathML-Content" id="S2.p1.4.m4.1b"><ci id="S2.p1.4.m4.1.1.cmml" xref="S2.p1.4.m4.1.1">𝒱</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.p1.4.m4.1c">\mathbfcal{V}</annotation><annotation encoding="application/x-llamapun" id="S2.p1.4.m4.1d">roman_𝒱</annotation></semantics></math>, is Pareto optimal, it is added to the convex coverage set (CCS). 
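The bookkeeping of this outer loop can be sketched in Python. This is a simplified illustration under stated assumptions, not the authors' implementation: all objectives are treated as higher-is-better, only the Pareto-dominance test described above is applied (a strict CCS would additionally discard vectors that are not optimal for any linear weight), and DOL's priority-based choice of the next weight vector is replaced by a fixed, hypothetical list of candidate weights. The callable train_and_evaluate is likewise a hypothetical stand-in for training and evaluating MOPPO.

```python
import numpy as np


def dominates(u, v):
    """True if value vector u Pareto-dominates v (all objectives maximized)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return bool(np.all(u >= v) and np.any(u > v))


def add_if_pareto_optimal(solutions, v):
    """Add v unless an existing vector dominates or equals it.

    Vectors dominated by v are dropped. Returns (updated list, added flag).
    """
    v = np.asarray(v, dtype=float)
    if any(dominates(u, v) or np.array_equal(u, v) for u in solutions):
        return solutions, False
    kept = [u for u in solutions if not dominates(v, u)]
    kept.append(v)
    return kept, True


def outer_loop(train_and_evaluate, candidate_weights):
    """Simplified outer loop: one training run per weight vector.

    train_and_evaluate(w) is a hypothetical stand-in for training MOPPO with
    weights w and returning its average evaluation value vector. The fixed
    candidate_weights list replaces DOL's priority-based weight selection.
    """
    solutions = []
    for w in candidate_weights:
        value = train_and_evaluate(np.asarray(w, dtype=float))
        solutions, _ = add_if_pareto_optimal(solutions, value)
    return solutions


# Toy run with made-up value vectors (three objectives, higher is better).
fake_results = {
    (1.0, 0.0, 0.0): [-0.2, -5.0, -3.0],
    (0.0, 1.0, 0.0): [-0.6, -1.0, -2.0],
    (0.5, 0.5, 0.0): [-0.7, -1.5, -2.5],  # dominated by the second vector
}
front = outer_loop(lambda w: fake_results[tuple(w)], list(fake_results))
print(len(front))  # 2
```

In the toy run, the third vector is worse than the second in every objective, so it is never added and two vectors remain.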
The components of the approach are detailed in the following subsections.</p> </div> <figure class="ltx_figure" id="S2.F1"><img alt="Schematic of the proposed MORL approach with deep optimistic linear support" class="ltx_graphics ltx_centering ltx_img_square" height="405" id="S2.F1.g1" src="x1.png" width="374"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S2.F1.2.1.1" style="font-size:90%;">Figure 1</span>: </span><span class="ltx_text" id="S2.F1.3.2" style="font-size:90%;">Schematic of the proposed MORL approach with deep optimistic linear support.</span></figcaption> </figure> <section class="ltx_subsection" id="S2.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S2.SS1.5.1.1">II-A</span> </span><span class="ltx_text ltx_font_italic" id="S2.SS1.6.2">Single-Policy Multi-Objective PPO</span> </h3> <div class="ltx_para" id="S2.SS1.p1"> <p class="ltx_p" id="S2.SS1.p1.2">To learn from multiple rewards, the agent must receive a separate reward signal for each objective. 
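A separate reward signal per objective can be obtained by wrapping a scalar-reward environment so that every step returns a reward vector. The Python sketch below is hypothetical: it assumes an environment whose step(action) returns (observation, reward, done), and the two reward functions in the toy usage are placeholders, not the reward definitions used in this paper.

```python
import numpy as np


class MultiObjectiveWrapper:
    """Hypothetical wrapper returning one reward per objective at every step."""

    def __init__(self, env, reward_fns):
        self.env = env  # assumed interface: reset() and step(action) -> (obs, reward, done)
        self.reward_fns = reward_fns  # d callables, each (obs, action) -> float

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, _scalar_reward, done = self.env.step(action)  # scalar reward is discarded
        reward_vector = np.array([f(obs, action) for f in self.reward_fns], dtype=float)
        return obs, reward_vector, done


class ToyEnv:
    """Stand-in environment whose observation is a step counter; episodes last 3 steps."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 0.0, self.t >= 3


# Placeholder objectives: a penalty growing with the counter and a penalty for
# any non-zero ("switching") action. Both are illustrative only.
env = MultiObjectiveWrapper(
    ToyEnv(), [lambda obs, a: -0.1 * obs, lambda obs, a: -float(a != 0)]
)
obs = env.reset()
obs, r, done = env.step(1)
print(obs, r, done)
```

Keeping the reward functions as a list makes d, the number of objectives, simply the length of that list.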
To this end, we extend the original RL grid environment into a multi-objective environment, allowing the agent to receive a d-dimensional reward vector <math alttext="\mathbf{r}_{t}\in\mathbb{R}^{d}" class="ltx_Math" display="inline" id="S2.SS1.p1.1.m1.1"><semantics id="S2.SS1.p1.1.m1.1a"><mrow id="S2.SS1.p1.1.m1.1.1" xref="S2.SS1.p1.1.m1.1.1.cmml"><msub id="S2.SS1.p1.1.m1.1.1.2" xref="S2.SS1.p1.1.m1.1.1.2.cmml"><mi id="S2.SS1.p1.1.m1.1.1.2.2" xref="S2.SS1.p1.1.m1.1.1.2.2.cmml">𝐫</mi><mi id="S2.SS1.p1.1.m1.1.1.2.3" xref="S2.SS1.p1.1.m1.1.1.2.3.cmml">t</mi></msub><mo id="S2.SS1.p1.1.m1.1.1.1" xref="S2.SS1.p1.1.m1.1.1.1.cmml">∈</mo><msup id="S2.SS1.p1.1.m1.1.1.3" xref="S2.SS1.p1.1.m1.1.1.3.cmml"><mi id="S2.SS1.p1.1.m1.1.1.3.2" xref="S2.SS1.p1.1.m1.1.1.3.2.cmml">ℝ</mi><mi id="S2.SS1.p1.1.m1.1.1.3.3" xref="S2.SS1.p1.1.m1.1.1.3.3.cmml">d</mi></msup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.1.m1.1b"><apply id="S2.SS1.p1.1.m1.1.1.cmml" xref="S2.SS1.p1.1.m1.1.1"><in id="S2.SS1.p1.1.m1.1.1.1.cmml" xref="S2.SS1.p1.1.m1.1.1.1"></in><apply id="S2.SS1.p1.1.m1.1.1.2.cmml" xref="S2.SS1.p1.1.m1.1.1.2"><csymbol cd="ambiguous" id="S2.SS1.p1.1.m1.1.1.2.1.cmml" xref="S2.SS1.p1.1.m1.1.1.2">subscript</csymbol><ci id="S2.SS1.p1.1.m1.1.1.2.2.cmml" xref="S2.SS1.p1.1.m1.1.1.2.2">𝐫</ci><ci id="S2.SS1.p1.1.m1.1.1.2.3.cmml" xref="S2.SS1.p1.1.m1.1.1.2.3">𝑡</ci></apply><apply id="S2.SS1.p1.1.m1.1.1.3.cmml" xref="S2.SS1.p1.1.m1.1.1.3"><csymbol cd="ambiguous" id="S2.SS1.p1.1.m1.1.1.3.1.cmml" xref="S2.SS1.p1.1.m1.1.1.3">superscript</csymbol><ci id="S2.SS1.p1.1.m1.1.1.3.2.cmml" xref="S2.SS1.p1.1.m1.1.1.3.2">ℝ</ci><ci id="S2.SS1.p1.1.m1.1.1.3.3.cmml" xref="S2.SS1.p1.1.m1.1.1.3.3">𝑑</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.1.m1.1c">\mathbf{r}_{t}\in\mathbb{R}^{d}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.1.m1.1d">bold_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ blackboard_R 
start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT</annotation></semantics></math>, where <math alttext="d" class="ltx_Math" display="inline" id="S2.SS1.p1.2.m2.1"><semantics id="S2.SS1.p1.2.m2.1a"><mi id="S2.SS1.p1.2.m2.1.1" xref="S2.SS1.p1.2.m2.1.1.cmml">d</mi><annotation-xml encoding="MathML-Content" id="S2.SS1.p1.2.m2.1b"><ci id="S2.SS1.p1.2.m2.1.1.cmml" xref="S2.SS1.p1.2.m2.1.1">𝑑</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p1.2.m2.1c">d</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p1.2.m2.1d">italic_d</annotation></semantics></math> is the number of objectives. In this paper, the rewards reflect the operational objectives of reducing line loading, topological deviation, and switching frequency, which are explained in detail in Section <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S2.SS2" title="II-B Reward functions for operational objectives ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">II-B</span></span></a>.</p> </div> <div class="ltx_para" id="S2.SS1.p2"> <p class="ltx_p" id="S2.SS1.p2.4">At the core of the proposed approach is MOPPO (Algorithm <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#alg1" title="Algorithm 1 ‣ II-A Single-Policy Multi-Objective PPO ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">1</span></a>), which is trained using a reward vector <math alttext="\mathbf{r}_{t}" class="ltx_Math" display="inline" id="S2.SS1.p2.1.m1.1"><semantics id="S2.SS1.p2.1.m1.1a"><msub id="S2.SS1.p2.1.m1.1.1" xref="S2.SS1.p2.1.m1.1.1.cmml"><mi id="S2.SS1.p2.1.m1.1.1.2" xref="S2.SS1.p2.1.m1.1.1.2.cmml">𝐫</mi><mi id="S2.SS1.p2.1.m1.1.1.3" xref="S2.SS1.p2.1.m1.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p2.1.m1.1b"><apply 
id="S2.SS1.p2.1.m1.1.1.cmml" xref="S2.SS1.p2.1.m1.1.1"><csymbol cd="ambiguous" id="S2.SS1.p2.1.m1.1.1.1.cmml" xref="S2.SS1.p2.1.m1.1.1">subscript</csymbol><ci id="S2.SS1.p2.1.m1.1.1.2.cmml" xref="S2.SS1.p2.1.m1.1.1.2">𝐫</ci><ci id="S2.SS1.p2.1.m1.1.1.3.cmml" xref="S2.SS1.p2.1.m1.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p2.1.m1.1c">\mathbf{r}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p2.1.m1.1d">bold_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> from the environment and a weight vector <math alttext="\mathbf{w}" class="ltx_Math" display="inline" id="S2.SS1.p2.2.m2.1"><semantics id="S2.SS1.p2.2.m2.1a"><mi id="S2.SS1.p2.2.m2.1.1" xref="S2.SS1.p2.2.m2.1.1.cmml">𝐰</mi><annotation-xml encoding="MathML-Content" id="S2.SS1.p2.2.m2.1b"><ci id="S2.SS1.p2.2.m2.1.1.cmml" xref="S2.SS1.p2.2.m2.1.1">𝐰</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p2.2.m2.1c">\mathbf{w}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p2.2.m2.1d">bold_w</annotation></semantics></math> from DOL, which scalarizes the rewards so that MOPPO delivers a single policy <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib23" title="">23</a>]</cite>. 
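Scalarization with w can be illustrated with a short Python sketch. It assumes the scalarization is linear, a weighted sum w·r_t with higher-is-better rewards; the function names, weights, and reward values are hypothetical and not taken from the authors' code. The example also checks a property that follows from linearity: scalarizing each step and then discounting gives the same number as discounting the reward vectors first and then applying w.

```python
import numpy as np


def scalarize(reward_vector, w):
    """Linear scalarization: weighted sum of the per-objective rewards."""
    return float(np.dot(w, reward_vector))


def discounted_vector_return(rewards, gamma):
    """Discounted sum of reward vectors; rewards has shape (T, d), result shape (d,)."""
    rewards = np.asarray(rewards, dtype=float)
    discounts = gamma ** np.arange(len(rewards))  # [1, gamma, gamma**2, ...]
    return discounts @ rewards


# Hypothetical episode: T = 3 steps, d = 3 objectives (illustrative numbers only).
rewards = [[1.0, 0.0, -1.0], [0.5, -1.0, 0.0], [0.8, 0.0, -1.0]]
w = np.array([0.6, 0.3, 0.1])  # hypothetical weight vector, entries sum to 1
gamma = 0.9

vector_return = discounted_vector_return(rewards, gamma)
scalar_of_vector = scalarize(vector_return, w)
vector_of_scalars = sum(gamma**t * scalarize(r, w) for t, r in enumerate(rewards))
print(vector_return, scalar_of_vector, vector_of_scalars)
```

Both scalar quantities agree up to floating-point rounding, so a vector-valued return can be combined with any weight vector after the fact; this is one plausible reason a d-dimensional critic pairs naturally with linear weights.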
The MOPPO algorithm handles the d-dimensional reward vector <math alttext="\mathbf{r}_{t}" class="ltx_Math" display="inline" id="S2.SS1.p2.3.m3.1"><semantics id="S2.SS1.p2.3.m3.1a"><msub id="S2.SS1.p2.3.m3.1.1" xref="S2.SS1.p2.3.m3.1.1.cmml"><mi id="S2.SS1.p2.3.m3.1.1.2" xref="S2.SS1.p2.3.m3.1.1.2.cmml">𝐫</mi><mi id="S2.SS1.p2.3.m3.1.1.3" xref="S2.SS1.p2.3.m3.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS1.p2.3.m3.1b"><apply id="S2.SS1.p2.3.m3.1.1.cmml" xref="S2.SS1.p2.3.m3.1.1"><csymbol cd="ambiguous" id="S2.SS1.p2.3.m3.1.1.1.cmml" xref="S2.SS1.p2.3.m3.1.1">subscript</csymbol><ci id="S2.SS1.p2.3.m3.1.1.2.cmml" xref="S2.SS1.p2.3.m3.1.1.2">𝐫</ci><ci id="S2.SS1.p2.3.m3.1.1.3.cmml" xref="S2.SS1.p2.3.m3.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p2.3.m3.1c">\mathbf{r}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p2.3.m3.1d">bold_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> with a d-dimensional critic head, which yields a vector-valued value function for each state <math alttext="s" class="ltx_Math" display="inline" id="S2.SS1.p2.4.m4.1"><semantics id="S2.SS1.p2.4.m4.1a"><mi id="S2.SS1.p2.4.m4.1.1" xref="S2.SS1.p2.4.m4.1.1.cmml">s</mi><annotation-xml encoding="MathML-Content" id="S2.SS1.p2.4.m4.1b"><ci id="S2.SS1.p2.4.m4.1.1.cmml" xref="S2.SS1.p2.4.m4.1.1">𝑠</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p2.4.m4.1c">s</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p2.4.m4.1d">italic_s</annotation></semantics></math>:</p> <table class="ltx_equation ltx_eqn_table" id="S2.E1"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\textbf{V}^{\pi}(s)=\mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty}\gamma^{t}\mathbf{r}_{t}\,\Big{|}\,s_{0}=s\right]," class="ltx_Math" display="block" id="S2.E1.m1.1"><semantics id="S2.E1.m1.1a"><mrow><mrow><mrow><msup><mtext class="ltx_mathvariant_bold">V</mtext><mi>π</mi></msup><mrow><mo stretchy="false">(</mo><mi>s</mi><mo stretchy="false">)</mo></mrow></mrow><mo>=</mo><mrow><msub><mi>𝔼</mi><mi>π</mi></msub><mrow><mo>[</mo><mrow><mrow><munderover><mo lspace="0em" movablelimits="false">∑</mo><mrow><mi>t</mi><mo>=</mo><mn>0</mn></mrow><mi mathvariant="normal">∞</mi></munderover><mrow><msup><mi>γ</mi><mi>t</mi></msup><msub><mi>𝐫</mi><mi>t</mi></msub></mrow></mrow><mo fence="false" mathsize="160%" rspace="0.448em">|</mo><mrow><msub><mi>s</mi><mn>0</mn></msub><mo>=</mo><mi>s</mi></mrow></mrow><mo>]</mo></mrow></mrow></mrow><mo>,</mo></mrow><annotation encoding="application/x-tex" id="S2.E1.m1.1c">\textbf{V}^{\pi}(s)=\mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty}\gamma^{t}\mathbf{r}_{t}\,\Big{|}\,s_{0}=s\right],</annotation><annotation encoding="application/x-llamapun" id="S2.E1.m1.1d">V start_POSTSUPERSCRIPT italic_π end_POSTSUPERSCRIPT ( italic_s ) = blackboard_E start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT [ ∑ start_POSTSUBSCRIPT italic_t = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_γ start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT bold_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | italic_s start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = italic_s ] ,</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(1)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S2.SS1.p2.10">where <math 
alttext="\textbf{V}^{\pi}(s)\in\mathbb{R}^{d}" class="ltx_Math" display="inline" id="S2.SS1.p2.5.m1.1"><semantics id="S2.SS1.p2.5.m1.1a"><mrow id="S2.SS1.p2.5.m1.1.2" xref="S2.SS1.p2.5.m1.1.2.cmml"><mrow id="S2.SS1.p2.5.m1.1.2.2" xref="S2.SS1.p2.5.m1.1.2.2.cmml"><msup id="S2.SS1.p2.5.m1.1.2.2.2" xref="S2.SS1.p2.5.m1.1.2.2.2.cmml"><mtext class="ltx_mathvariant_bold" id="S2.SS1.p2.5.m1.1.2.2.2.2" xref="S2.SS1.p2.5.m1.1.2.2.2.2a.cmml">V</mtext><mi id="S2.SS1.p2.5.m1.1.2.2.2.3" xref="S2.SS1.p2.5.m1.1.2.2.2.3.cmml">π</mi></msup><mo id="S2.SS1.p2.5.m1.1.2.2.1" xref="S2.SS1.p2.5.m1.1.2.2.1.cmml">⁢</mo><mrow id="S2.SS1.p2.5.m1.1.2.2.3.2" xref="S2.SS1.p2.5.m1.1.2.2.cmml"><mo id="S2.SS1.p2.5.m1.1.2.2.3.2.1" stretchy="false" xref="S2.SS1.p2.5.m1.1.2.2.cmml">(</mo><mi id="S2.SS1.p2.5.m1.1.1" xref="S2.SS1.p2.5.m1.1.1.cmml">s</mi><mo id="S2.SS1.p2.5.m1.1.2.2.3.2.2" stretchy="false" xref="S2.SS1.p2.5.m1.1.2.2.cmml">)</mo></mrow></mrow><mo id="S2.SS1.p2.5.m1.1.2.1" xref="S2.SS1.p2.5.m1.1.2.1.cmml">∈</mo><msup id="S2.SS1.p2.5.m1.1.2.3" xref="S2.SS1.p2.5.m1.1.2.3.cmml"><mi id="S2.SS1.p2.5.m1.1.2.3.2" xref="S2.SS1.p2.5.m1.1.2.3.2.cmml">ℝ</mi><mi id="S2.SS1.p2.5.m1.1.2.3.3" xref="S2.SS1.p2.5.m1.1.2.3.3.cmml">d</mi></msup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p2.5.m1.1b"><apply id="S2.SS1.p2.5.m1.1.2.cmml" xref="S2.SS1.p2.5.m1.1.2"><in id="S2.SS1.p2.5.m1.1.2.1.cmml" xref="S2.SS1.p2.5.m1.1.2.1"></in><apply id="S2.SS1.p2.5.m1.1.2.2.cmml" xref="S2.SS1.p2.5.m1.1.2.2"><times id="S2.SS1.p2.5.m1.1.2.2.1.cmml" xref="S2.SS1.p2.5.m1.1.2.2.1"></times><apply id="S2.SS1.p2.5.m1.1.2.2.2.cmml" xref="S2.SS1.p2.5.m1.1.2.2.2"><csymbol cd="ambiguous" id="S2.SS1.p2.5.m1.1.2.2.2.1.cmml" xref="S2.SS1.p2.5.m1.1.2.2.2">superscript</csymbol><ci id="S2.SS1.p2.5.m1.1.2.2.2.2a.cmml" xref="S2.SS1.p2.5.m1.1.2.2.2.2"><mtext class="ltx_mathvariant_bold" id="S2.SS1.p2.5.m1.1.2.2.2.2.cmml" xref="S2.SS1.p2.5.m1.1.2.2.2.2">V</mtext></ci><ci id="S2.SS1.p2.5.m1.1.2.2.2.3.cmml" 
xref="S2.SS1.p2.5.m1.1.2.2.2.3">𝜋</ci></apply><ci id="S2.SS1.p2.5.m1.1.1.cmml" xref="S2.SS1.p2.5.m1.1.1">𝑠</ci></apply><apply id="S2.SS1.p2.5.m1.1.2.3.cmml" xref="S2.SS1.p2.5.m1.1.2.3"><csymbol cd="ambiguous" id="S2.SS1.p2.5.m1.1.2.3.1.cmml" xref="S2.SS1.p2.5.m1.1.2.3">superscript</csymbol><ci id="S2.SS1.p2.5.m1.1.2.3.2.cmml" xref="S2.SS1.p2.5.m1.1.2.3.2">ℝ</ci><ci id="S2.SS1.p2.5.m1.1.2.3.3.cmml" xref="S2.SS1.p2.5.m1.1.2.3.3">𝑑</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p2.5.m1.1c">\textbf{V}^{\pi}(s)\in\mathbb{R}^{d}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p2.5.m1.1d">V start_POSTSUPERSCRIPT italic_π end_POSTSUPERSCRIPT ( italic_s ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT</annotation></semantics></math> is the vectorized state-value function per state <math alttext="s" class="ltx_Math" display="inline" id="S2.SS1.p2.6.m2.1"><semantics id="S2.SS1.p2.6.m2.1a"><mi id="S2.SS1.p2.6.m2.1.1" xref="S2.SS1.p2.6.m2.1.1.cmml">s</mi><annotation-xml encoding="MathML-Content" id="S2.SS1.p2.6.m2.1b"><ci id="S2.SS1.p2.6.m2.1.1.cmml" xref="S2.SS1.p2.6.m2.1.1">𝑠</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p2.6.m2.1c">s</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p2.6.m2.1d">italic_s</annotation></semantics></math> under policy <math alttext="\pi" class="ltx_Math" display="inline" id="S2.SS1.p2.7.m3.1"><semantics id="S2.SS1.p2.7.m3.1a"><mi id="S2.SS1.p2.7.m3.1.1" xref="S2.SS1.p2.7.m3.1.1.cmml">π</mi><annotation-xml encoding="MathML-Content" id="S2.SS1.p2.7.m3.1b"><ci id="S2.SS1.p2.7.m3.1.1.cmml" xref="S2.SS1.p2.7.m3.1.1">𝜋</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p2.7.m3.1c">\pi</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p2.7.m3.1d">italic_π</annotation></semantics></math>, <math alttext="\mathbf{r}_{t}\in\mathbb{R}^{d}" class="ltx_Math" display="inline" 
id="S2.SS1.p2.8.m4.1"><semantics id="S2.SS1.p2.8.m4.1a"><mrow id="S2.SS1.p2.8.m4.1.1" xref="S2.SS1.p2.8.m4.1.1.cmml"><msub id="S2.SS1.p2.8.m4.1.1.2" xref="S2.SS1.p2.8.m4.1.1.2.cmml"><mi id="S2.SS1.p2.8.m4.1.1.2.2" xref="S2.SS1.p2.8.m4.1.1.2.2.cmml">𝐫</mi><mi id="S2.SS1.p2.8.m4.1.1.2.3" xref="S2.SS1.p2.8.m4.1.1.2.3.cmml">t</mi></msub><mo id="S2.SS1.p2.8.m4.1.1.1" xref="S2.SS1.p2.8.m4.1.1.1.cmml">∈</mo><msup id="S2.SS1.p2.8.m4.1.1.3" xref="S2.SS1.p2.8.m4.1.1.3.cmml"><mi id="S2.SS1.p2.8.m4.1.1.3.2" xref="S2.SS1.p2.8.m4.1.1.3.2.cmml">ℝ</mi><mi id="S2.SS1.p2.8.m4.1.1.3.3" xref="S2.SS1.p2.8.m4.1.1.3.3.cmml">d</mi></msup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p2.8.m4.1b"><apply id="S2.SS1.p2.8.m4.1.1.cmml" xref="S2.SS1.p2.8.m4.1.1"><in id="S2.SS1.p2.8.m4.1.1.1.cmml" xref="S2.SS1.p2.8.m4.1.1.1"></in><apply id="S2.SS1.p2.8.m4.1.1.2.cmml" xref="S2.SS1.p2.8.m4.1.1.2"><csymbol cd="ambiguous" id="S2.SS1.p2.8.m4.1.1.2.1.cmml" xref="S2.SS1.p2.8.m4.1.1.2">subscript</csymbol><ci id="S2.SS1.p2.8.m4.1.1.2.2.cmml" xref="S2.SS1.p2.8.m4.1.1.2.2">𝐫</ci><ci id="S2.SS1.p2.8.m4.1.1.2.3.cmml" xref="S2.SS1.p2.8.m4.1.1.2.3">𝑡</ci></apply><apply id="S2.SS1.p2.8.m4.1.1.3.cmml" xref="S2.SS1.p2.8.m4.1.1.3"><csymbol cd="ambiguous" id="S2.SS1.p2.8.m4.1.1.3.1.cmml" xref="S2.SS1.p2.8.m4.1.1.3">superscript</csymbol><ci id="S2.SS1.p2.8.m4.1.1.3.2.cmml" xref="S2.SS1.p2.8.m4.1.1.3.2">ℝ</ci><ci id="S2.SS1.p2.8.m4.1.1.3.3.cmml" xref="S2.SS1.p2.8.m4.1.1.3.3">𝑑</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p2.8.m4.1c">\mathbf{r}_{t}\in\mathbb{R}^{d}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p2.8.m4.1d">bold_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT</annotation></semantics></math> is the vector of rewards at time <math alttext="t" class="ltx_Math" display="inline" id="S2.SS1.p2.9.m5.1"><semantics id="S2.SS1.p2.9.m5.1a"><mi id="S2.SS1.p2.9.m5.1.1" 
xref="S2.SS1.p2.9.m5.1.1.cmml">t</mi><annotation-xml encoding="MathML-Content" id="S2.SS1.p2.9.m5.1b"><ci id="S2.SS1.p2.9.m5.1.1.cmml" xref="S2.SS1.p2.9.m5.1.1">𝑡</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p2.9.m5.1c">t</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p2.9.m5.1d">italic_t</annotation></semantics></math>, and <math alttext="\gamma\in[0,1)" class="ltx_Math" display="inline" id="S2.SS1.p2.10.m6.2"><semantics id="S2.SS1.p2.10.m6.2a"><mrow id="S2.SS1.p2.10.m6.2.3" xref="S2.SS1.p2.10.m6.2.3.cmml"><mi id="S2.SS1.p2.10.m6.2.3.2" xref="S2.SS1.p2.10.m6.2.3.2.cmml">γ</mi><mo id="S2.SS1.p2.10.m6.2.3.1" xref="S2.SS1.p2.10.m6.2.3.1.cmml">∈</mo><mrow id="S2.SS1.p2.10.m6.2.3.3.2" xref="S2.SS1.p2.10.m6.2.3.3.1.cmml"><mo id="S2.SS1.p2.10.m6.2.3.3.2.1" stretchy="false" xref="S2.SS1.p2.10.m6.2.3.3.1.cmml">[</mo><mn id="S2.SS1.p2.10.m6.1.1" xref="S2.SS1.p2.10.m6.1.1.cmml">0</mn><mo id="S2.SS1.p2.10.m6.2.3.3.2.2" xref="S2.SS1.p2.10.m6.2.3.3.1.cmml">,</mo><mn id="S2.SS1.p2.10.m6.2.2" xref="S2.SS1.p2.10.m6.2.2.cmml">1</mn><mo id="S2.SS1.p2.10.m6.2.3.3.2.3" stretchy="false" xref="S2.SS1.p2.10.m6.2.3.3.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p2.10.m6.2b"><apply id="S2.SS1.p2.10.m6.2.3.cmml" xref="S2.SS1.p2.10.m6.2.3"><in id="S2.SS1.p2.10.m6.2.3.1.cmml" xref="S2.SS1.p2.10.m6.2.3.1"></in><ci id="S2.SS1.p2.10.m6.2.3.2.cmml" xref="S2.SS1.p2.10.m6.2.3.2">𝛾</ci><interval closure="closed-open" id="S2.SS1.p2.10.m6.2.3.3.1.cmml" xref="S2.SS1.p2.10.m6.2.3.3.2"><cn id="S2.SS1.p2.10.m6.1.1.cmml" type="integer" xref="S2.SS1.p2.10.m6.1.1">0</cn><cn id="S2.SS1.p2.10.m6.2.2.cmml" type="integer" xref="S2.SS1.p2.10.m6.2.2">1</cn></interval></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p2.10.m6.2c">\gamma\in[0,1)</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p2.10.m6.2d">italic_γ ∈ [ 0 , 1 )</annotation></semantics></math> is the discount factor.</p> 
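<p class="ltx_p">For a single finite rollout, the vector value of Eq. (1), the vectorized advantages, and the weighted-sum scalarization of Eq. (2) below can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the MOPPO implementation of [23]: the function names are hypothetical, GAE is assumed to be applied independently to each of the d objectives (as a d-dimensional critic head suggests), the rollout is assumed to contain no terminal state, and the infinite sum in Eq. (1) is truncated at the rollout length T.</p>

```python
import numpy as np


def discounted_vector_return(rewards, gamma):
    """Truncated Monte Carlo estimate of the vector value in Eq. (1).

    rewards: array of shape (T, d), one reward vector r_t per row and one
    column per objective. Each objective is discounted independently.
    """
    rewards = np.asarray(rewards, dtype=float)
    discounts = gamma ** np.arange(rewards.shape[0])  # gamma^0, ..., gamma^(T-1)
    return discounts @ rewards  # shape (d,)


def vector_gae(rewards, values, last_value, gamma, lam):
    """GAE applied column-wise to vector rewards (hypothetical helper).

    rewards:    (T, d) reward vectors r_t
    values:     (T, d) critic outputs V(s_t) from a d-dimensional critic head
    last_value: (d,) bootstrap estimate V(s_T) for the state after the rollout
    Assumes the rollout contains no terminal state.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    bootstrap = np.asarray(last_value, dtype=float).reshape(1, -1)
    next_values = np.vstack([values[1:], bootstrap])
    deltas = rewards + gamma * next_values - values  # per-objective TD errors
    advantages = np.zeros_like(deltas)
    running = np.zeros(deltas.shape[1])
    for t in reversed(range(deltas.shape[0])):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages  # shape (T, d)


def scalarize(advantages, w):
    """Weighted-sum scalarization A_t = w^T A_t for every time step."""
    w = np.asarray(w, dtype=float)
    if not (np.all(np.greater_equal(w, 0.0)) and np.isclose(w.sum(), 1.0)):
        raise ValueError("w must be non-negative and sum to 1")
    return advantages @ w  # shape (T,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 5, 3  # 5 steps, 3 objectives
    r = rng.normal(size=(T, d))
    v = rng.normal(size=(T, d))
    w = np.array([0.5, 0.3, 0.2])  # illustrative weights, not from the paper
    adv = vector_gae(r, v, rng.normal(size=d), gamma=0.99, lam=0.95)
    print(scalarize(adv, w))
```

<p class="ltx_p">Because GAE is linear in the rewards and value estimates, projecting the vector advantages onto the weight vector gives the same result as running scalar GAE on weight-scalarized rewards and values; in this sketch the weight vector therefore enters only at the final projection step.</p>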
</div> <div class="ltx_para" id="S2.SS1.p3"> <p class="ltx_p" id="S2.SS1.p3.1">The vectorized advantages <math alttext="\mathbf{A}_{t}\in\mathbb{R}^{d}" class="ltx_Math" display="inline" id="S2.SS1.p3.1.m1.1"><semantics id="S2.SS1.p3.1.m1.1a"><mrow id="S2.SS1.p3.1.m1.1.1" xref="S2.SS1.p3.1.m1.1.1.cmml"><msub id="S2.SS1.p3.1.m1.1.1.2" xref="S2.SS1.p3.1.m1.1.1.2.cmml"><mi id="S2.SS1.p3.1.m1.1.1.2.2" xref="S2.SS1.p3.1.m1.1.1.2.2.cmml">𝐀</mi><mi id="S2.SS1.p3.1.m1.1.1.2.3" xref="S2.SS1.p3.1.m1.1.1.2.3.cmml">t</mi></msub><mo id="S2.SS1.p3.1.m1.1.1.1" xref="S2.SS1.p3.1.m1.1.1.1.cmml">∈</mo><msup id="S2.SS1.p3.1.m1.1.1.3" xref="S2.SS1.p3.1.m1.1.1.3.cmml"><mi id="S2.SS1.p3.1.m1.1.1.3.2" xref="S2.SS1.p3.1.m1.1.1.3.2.cmml">ℝ</mi><mi id="S2.SS1.p3.1.m1.1.1.3.3" xref="S2.SS1.p3.1.m1.1.1.3.3.cmml">d</mi></msup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p3.1.m1.1b"><apply id="S2.SS1.p3.1.m1.1.1.cmml" xref="S2.SS1.p3.1.m1.1.1"><in id="S2.SS1.p3.1.m1.1.1.1.cmml" xref="S2.SS1.p3.1.m1.1.1.1"></in><apply id="S2.SS1.p3.1.m1.1.1.2.cmml" xref="S2.SS1.p3.1.m1.1.1.2"><csymbol cd="ambiguous" id="S2.SS1.p3.1.m1.1.1.2.1.cmml" xref="S2.SS1.p3.1.m1.1.1.2">subscript</csymbol><ci id="S2.SS1.p3.1.m1.1.1.2.2.cmml" xref="S2.SS1.p3.1.m1.1.1.2.2">𝐀</ci><ci id="S2.SS1.p3.1.m1.1.1.2.3.cmml" xref="S2.SS1.p3.1.m1.1.1.2.3">𝑡</ci></apply><apply id="S2.SS1.p3.1.m1.1.1.3.cmml" xref="S2.SS1.p3.1.m1.1.1.3"><csymbol cd="ambiguous" id="S2.SS1.p3.1.m1.1.1.3.1.cmml" xref="S2.SS1.p3.1.m1.1.1.3">superscript</csymbol><ci id="S2.SS1.p3.1.m1.1.1.3.2.cmml" xref="S2.SS1.p3.1.m1.1.1.3.2">ℝ</ci><ci id="S2.SS1.p3.1.m1.1.1.3.3.cmml" xref="S2.SS1.p3.1.m1.1.1.3.3">𝑑</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p3.1.m1.1c">\mathbf{A}_{t}\in\mathbb{R}^{d}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p3.1.m1.1d">bold_A start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_d 
end_POSTSUPERSCRIPT</annotation></semantics></math> are computed using generalized advantage estimation (GAE) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib22" title="">22</a>]</cite>. These advantages are then scalarized with a weighted sum:</p> </div> <div class="ltx_para" id="S2.SS1.p4"> <table class="ltx_equation ltx_eqn_table" id="S2.E2"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="A_{t}=\mathbf{w}^{\top}\mathbf{A}_{t}," class="ltx_Math" display="block" id="S2.E2.m1.1"><semantics id="S2.E2.m1.1a"><mrow id="S2.E2.m1.1.1.1" xref="S2.E2.m1.1.1.1.1.cmml"><mrow id="S2.E2.m1.1.1.1.1" xref="S2.E2.m1.1.1.1.1.cmml"><msub id="S2.E2.m1.1.1.1.1.2" xref="S2.E2.m1.1.1.1.1.2.cmml"><mi id="S2.E2.m1.1.1.1.1.2.2" xref="S2.E2.m1.1.1.1.1.2.2.cmml">A</mi><mi id="S2.E2.m1.1.1.1.1.2.3" xref="S2.E2.m1.1.1.1.1.2.3.cmml">t</mi></msub><mo id="S2.E2.m1.1.1.1.1.1" xref="S2.E2.m1.1.1.1.1.1.cmml">=</mo><mrow id="S2.E2.m1.1.1.1.1.3" xref="S2.E2.m1.1.1.1.1.3.cmml"><msup id="S2.E2.m1.1.1.1.1.3.2" xref="S2.E2.m1.1.1.1.1.3.2.cmml"><mi id="S2.E2.m1.1.1.1.1.3.2.2" xref="S2.E2.m1.1.1.1.1.3.2.2.cmml">𝐰</mi><mo id="S2.E2.m1.1.1.1.1.3.2.3" xref="S2.E2.m1.1.1.1.1.3.2.3.cmml">⊤</mo></msup><mo id="S2.E2.m1.1.1.1.1.3.1" xref="S2.E2.m1.1.1.1.1.3.1.cmml">⁢</mo><msub id="S2.E2.m1.1.1.1.1.3.3" xref="S2.E2.m1.1.1.1.1.3.3.cmml"><mi id="S2.E2.m1.1.1.1.1.3.3.2" xref="S2.E2.m1.1.1.1.1.3.3.2.cmml">𝐀</mi><mi id="S2.E2.m1.1.1.1.1.3.3.3" xref="S2.E2.m1.1.1.1.1.3.3.3.cmml">t</mi></msub></mrow></mrow><mo id="S2.E2.m1.1.1.1.2" xref="S2.E2.m1.1.1.1.1.cmml">,</mo></mrow><annotation-xml encoding="MathML-Content" id="S2.E2.m1.1b"><apply id="S2.E2.m1.1.1.1.1.cmml" xref="S2.E2.m1.1.1.1"><eq id="S2.E2.m1.1.1.1.1.1.cmml" xref="S2.E2.m1.1.1.1.1.1"></eq><apply id="S2.E2.m1.1.1.1.1.2.cmml" 
xref="S2.E2.m1.1.1.1.1.2"><csymbol cd="ambiguous" id="S2.E2.m1.1.1.1.1.2.1.cmml" xref="S2.E2.m1.1.1.1.1.2">subscript</csymbol><ci id="S2.E2.m1.1.1.1.1.2.2.cmml" xref="S2.E2.m1.1.1.1.1.2.2">𝐴</ci><ci id="S2.E2.m1.1.1.1.1.2.3.cmml" xref="S2.E2.m1.1.1.1.1.2.3">𝑡</ci></apply><apply id="S2.E2.m1.1.1.1.1.3.cmml" xref="S2.E2.m1.1.1.1.1.3"><times id="S2.E2.m1.1.1.1.1.3.1.cmml" xref="S2.E2.m1.1.1.1.1.3.1"></times><apply id="S2.E2.m1.1.1.1.1.3.2.cmml" xref="S2.E2.m1.1.1.1.1.3.2"><csymbol cd="ambiguous" id="S2.E2.m1.1.1.1.1.3.2.1.cmml" xref="S2.E2.m1.1.1.1.1.3.2">superscript</csymbol><ci id="S2.E2.m1.1.1.1.1.3.2.2.cmml" xref="S2.E2.m1.1.1.1.1.3.2.2">𝐰</ci><csymbol cd="latexml" id="S2.E2.m1.1.1.1.1.3.2.3.cmml" xref="S2.E2.m1.1.1.1.1.3.2.3">top</csymbol></apply><apply id="S2.E2.m1.1.1.1.1.3.3.cmml" xref="S2.E2.m1.1.1.1.1.3.3"><csymbol cd="ambiguous" id="S2.E2.m1.1.1.1.1.3.3.1.cmml" xref="S2.E2.m1.1.1.1.1.3.3">subscript</csymbol><ci id="S2.E2.m1.1.1.1.1.3.3.2.cmml" xref="S2.E2.m1.1.1.1.1.3.3.2">𝐀</ci><ci id="S2.E2.m1.1.1.1.1.3.3.3.cmml" xref="S2.E2.m1.1.1.1.1.3.3.3">𝑡</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.E2.m1.1c">A_{t}=\mathbf{w}^{\top}\mathbf{A}_{t},</annotation><annotation encoding="application/x-llamapun" id="S2.E2.m1.1d">italic_A start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = bold_w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_A start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ,</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(2)</span></td> </tr></tbody> </table> </div> <div class="ltx_para" id="S2.SS1.p5"> <p class="ltx_p" id="S2.SS1.p5.8">where <math alttext="\mathbf{w}\in\mathbb{R}^{d}" class="ltx_Math" display="inline" id="S2.SS1.p5.1.m1.1"><semantics id="S2.SS1.p5.1.m1.1a"><mrow id="S2.SS1.p5.1.m1.1.1" 
xref="S2.SS1.p5.1.m1.1.1.cmml"><mi id="S2.SS1.p5.1.m1.1.1.2" xref="S2.SS1.p5.1.m1.1.1.2.cmml">𝐰</mi><mo id="S2.SS1.p5.1.m1.1.1.1" xref="S2.SS1.p5.1.m1.1.1.1.cmml">∈</mo><msup id="S2.SS1.p5.1.m1.1.1.3" xref="S2.SS1.p5.1.m1.1.1.3.cmml"><mi id="S2.SS1.p5.1.m1.1.1.3.2" xref="S2.SS1.p5.1.m1.1.1.3.2.cmml">ℝ</mi><mi id="S2.SS1.p5.1.m1.1.1.3.3" xref="S2.SS1.p5.1.m1.1.1.3.3.cmml">d</mi></msup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p5.1.m1.1b"><apply id="S2.SS1.p5.1.m1.1.1.cmml" xref="S2.SS1.p5.1.m1.1.1"><in id="S2.SS1.p5.1.m1.1.1.1.cmml" xref="S2.SS1.p5.1.m1.1.1.1"></in><ci id="S2.SS1.p5.1.m1.1.1.2.cmml" xref="S2.SS1.p5.1.m1.1.1.2">𝐰</ci><apply id="S2.SS1.p5.1.m1.1.1.3.cmml" xref="S2.SS1.p5.1.m1.1.1.3"><csymbol cd="ambiguous" id="S2.SS1.p5.1.m1.1.1.3.1.cmml" xref="S2.SS1.p5.1.m1.1.1.3">superscript</csymbol><ci id="S2.SS1.p5.1.m1.1.1.3.2.cmml" xref="S2.SS1.p5.1.m1.1.1.3.2">ℝ</ci><ci id="S2.SS1.p5.1.m1.1.1.3.3.cmml" xref="S2.SS1.p5.1.m1.1.1.3.3">𝑑</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p5.1.m1.1c">\mathbf{w}\in\mathbb{R}^{d}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p5.1.m1.1d">bold_w ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT</annotation></semantics></math> is the weight vector representing the scalarization preferences, satisfying <math alttext="\sum_{i=1}^{d}w_{i}=1" class="ltx_Math" display="inline" id="S2.SS1.p5.2.m2.1"><semantics id="S2.SS1.p5.2.m2.1a"><mrow><mrow><msubsup><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>d</mi></msubsup><msub><mi>w</mi><mi>i</mi></msub></mrow><mo>=</mo><mn>1</mn></mrow><annotation encoding="application/x-tex" id="S2.SS1.p5.2.m2.1c">\sum_{i=1}^{d}w_{i}=1</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p5.2.m2.1d">∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = 1</annotation></semantics></math> with <math alttext="w_{i}\geq 0" class="ltx_Math" display="inline" id="S2.SS1.p5.3.m3.1"><semantics id="S2.SS1.p5.3.m3.1a"><mrow id="S2.SS1.p5.3.m3.1.1" xref="S2.SS1.p5.3.m3.1.1.cmml"><msub id="S2.SS1.p5.3.m3.1.1.2" xref="S2.SS1.p5.3.m3.1.1.2.cmml"><mi id="S2.SS1.p5.3.m3.1.1.2.2" xref="S2.SS1.p5.3.m3.1.1.2.2.cmml">w</mi><mi id="S2.SS1.p5.3.m3.1.1.2.3" xref="S2.SS1.p5.3.m3.1.1.2.3.cmml">i</mi></msub><mo id="S2.SS1.p5.3.m3.1.1.1" xref="S2.SS1.p5.3.m3.1.1.1.cmml">≥</mo><mn id="S2.SS1.p5.3.m3.1.1.3" xref="S2.SS1.p5.3.m3.1.1.3.cmml">0</mn></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p5.3.m3.1b"><apply id="S2.SS1.p5.3.m3.1.1.cmml" xref="S2.SS1.p5.3.m3.1.1"><geq id="S2.SS1.p5.3.m3.1.1.1.cmml" xref="S2.SS1.p5.3.m3.1.1.1"></geq><apply id="S2.SS1.p5.3.m3.1.1.2.cmml" xref="S2.SS1.p5.3.m3.1.1.2"><csymbol cd="ambiguous" id="S2.SS1.p5.3.m3.1.1.2.1.cmml" xref="S2.SS1.p5.3.m3.1.1.2">subscript</csymbol><ci id="S2.SS1.p5.3.m3.1.1.2.2.cmml" xref="S2.SS1.p5.3.m3.1.1.2.2">𝑤</ci><ci id="S2.SS1.p5.3.m3.1.1.2.3.cmml" xref="S2.SS1.p5.3.m3.1.1.2.3">𝑖</ci></apply><cn id="S2.SS1.p5.3.m3.1.1.3.cmml" type="integer" xref="S2.SS1.p5.3.m3.1.1.3">0</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p5.3.m3.1c">w_{i}\geq 0</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p5.3.m3.1d">italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ≥ 
0</annotation></semantics></math>, and <math alttext="A_{t}\in\mathbb{R}" class="ltx_Math" display="inline" id="S2.SS1.p5.4.m4.1"><semantics id="S2.SS1.p5.4.m4.1a"><mrow id="S2.SS1.p5.4.m4.1.1" xref="S2.SS1.p5.4.m4.1.1.cmml"><msub id="S2.SS1.p5.4.m4.1.1.2" xref="S2.SS1.p5.4.m4.1.1.2.cmml"><mi id="S2.SS1.p5.4.m4.1.1.2.2" xref="S2.SS1.p5.4.m4.1.1.2.2.cmml">A</mi><mi id="S2.SS1.p5.4.m4.1.1.2.3" xref="S2.SS1.p5.4.m4.1.1.2.3.cmml">t</mi></msub><mo id="S2.SS1.p5.4.m4.1.1.1" xref="S2.SS1.p5.4.m4.1.1.1.cmml">∈</mo><mi id="S2.SS1.p5.4.m4.1.1.3" xref="S2.SS1.p5.4.m4.1.1.3.cmml">ℝ</mi></mrow><annotation-xml encoding="MathML-Content" id="S2.SS1.p5.4.m4.1b"><apply id="S2.SS1.p5.4.m4.1.1.cmml" xref="S2.SS1.p5.4.m4.1.1"><in id="S2.SS1.p5.4.m4.1.1.1.cmml" xref="S2.SS1.p5.4.m4.1.1.1"></in><apply id="S2.SS1.p5.4.m4.1.1.2.cmml" xref="S2.SS1.p5.4.m4.1.1.2"><csymbol cd="ambiguous" id="S2.SS1.p5.4.m4.1.1.2.1.cmml" xref="S2.SS1.p5.4.m4.1.1.2">subscript</csymbol><ci id="S2.SS1.p5.4.m4.1.1.2.2.cmml" xref="S2.SS1.p5.4.m4.1.1.2.2">𝐴</ci><ci id="S2.SS1.p5.4.m4.1.1.2.3.cmml" xref="S2.SS1.p5.4.m4.1.1.2.3">𝑡</ci></apply><ci id="S2.SS1.p5.4.m4.1.1.3.cmml" xref="S2.SS1.p5.4.m4.1.1.3">ℝ</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p5.4.m4.1c">A_{t}\in\mathbb{R}</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p5.4.m4.1d">italic_A start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ blackboard_R</annotation></semantics></math> is the scalarized advantage. Through this step, the scalarization supplied by DOL enters the training of the single-policy MOPPO. The scalarized advantage is then used to compute the policy loss, which is one component of the total loss. For details on the loss computation in MOPPO, we refer the reader to <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib23" title="">23</a>]</cite>. 
MOPPO outputs the final value vector <span class="ltx_text ltx_markedasmath ltx_font_bold" id="S2.SS1.p5.8.1">V</span> and the model <math alttext="m" class="ltx_Math" display="inline" id="S2.SS1.p5.6.m6.1"><semantics id="S2.SS1.p5.6.m6.1a"><mi id="S2.SS1.p5.6.m6.1.1" xref="S2.SS1.p5.6.m6.1.1.cmml">m</mi><annotation-xml encoding="MathML-Content" id="S2.SS1.p5.6.m6.1b"><ci id="S2.SS1.p5.6.m6.1.1.cmml" xref="S2.SS1.p5.6.m6.1.1">𝑚</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p5.6.m6.1c">m</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p5.6.m6.1d">italic_m</annotation></semantics></math>, which contains the neural network weights <math alttext="\theta" class="ltx_Math" display="inline" id="S2.SS1.p5.7.m7.1"><semantics id="S2.SS1.p5.7.m7.1a"><mi id="S2.SS1.p5.7.m7.1.1" xref="S2.SS1.p5.7.m7.1.1.cmml">θ</mi><annotation-xml encoding="MathML-Content" id="S2.SS1.p5.7.m7.1b"><ci id="S2.SS1.p5.7.m7.1.1.cmml" xref="S2.SS1.p5.7.m7.1.1">𝜃</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p5.7.m7.1c">\theta</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p5.7.m7.1d">italic_θ</annotation></semantics></math> and the learnt policy <math alttext="\pi" class="ltx_Math" display="inline" id="S2.SS1.p5.8.m8.1"><semantics id="S2.SS1.p5.8.m8.1a"><mi id="S2.SS1.p5.8.m8.1.1" xref="S2.SS1.p5.8.m8.1.1.cmml">π</mi><annotation-xml encoding="MathML-Content" id="S2.SS1.p5.8.m8.1b"><ci id="S2.SS1.p5.8.m8.1.1.cmml" xref="S2.SS1.p5.8.m8.1.1">𝜋</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS1.p5.8.m8.1c">\pi</annotation><annotation encoding="application/x-llamapun" id="S2.SS1.p5.8.m8.1d">italic_π</annotation></semantics></math>.</p> </div> <figure class="ltx_float ltx_float_algorithm ltx_framed ltx_framed_top" id="alg1"> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_float"><span class="ltx_text ltx_font_bold" id="alg1.2.1.1">Algorithm 1</span> </span> Multi-Objective PPO 
(MOPPO) Training</figcaption> <div class="ltx_listing ltx_listing" id="alg1.3"> <div class="ltx_listingline" id="alg1.l0"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l0.1.1.1" style="font-size:80%;">Require:</span></span>  Multi-Objective MDP, scalarization weight vector <span class="ltx_text ltx_markedasmath ltx_font_bold" id="alg1.l0.2">w</span>, number of update cycles </div> <div class="ltx_listingline" id="alg1.l1"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l1.1.1.1" style="font-size:80%;">1:</span></span>  initialize MOPPO network <math alttext="\theta\leftarrow\emptyset" class="ltx_Math" display="inline" id="alg1.l1.m1.1"><semantics id="alg1.l1.m1.1a"><mrow id="alg1.l1.m1.1.1" xref="alg1.l1.m1.1.1.cmml"><mi id="alg1.l1.m1.1.1.2" xref="alg1.l1.m1.1.1.2.cmml">θ</mi><mo id="alg1.l1.m1.1.1.1" stretchy="false" xref="alg1.l1.m1.1.1.1.cmml">←</mo><mi id="alg1.l1.m1.1.1.3" mathvariant="normal" xref="alg1.l1.m1.1.1.3.cmml">∅</mi></mrow><annotation-xml encoding="MathML-Content" id="alg1.l1.m1.1b"><apply id="alg1.l1.m1.1.1.cmml" xref="alg1.l1.m1.1.1"><ci id="alg1.l1.m1.1.1.1.cmml" xref="alg1.l1.m1.1.1.1">←</ci><ci id="alg1.l1.m1.1.1.2.cmml" xref="alg1.l1.m1.1.1.2">𝜃</ci><emptyset id="alg1.l1.m1.1.1.3.cmml" xref="alg1.l1.m1.1.1.3"></emptyset></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l1.m1.1c">\theta\leftarrow\emptyset</annotation><annotation encoding="application/x-llamapun" id="alg1.l1.m1.1d">italic_θ ← ∅</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg1.l2"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l2.1.1.1" style="font-size:80%;">2:</span></span>  <span class="ltx_text ltx_font_bold" id="alg1.l2.2">for</span> number of update cycles <span class="ltx_text ltx_font_bold" id="alg1.l2.3">do</span> </div> <div class="ltx_listingline" id="alg1.l3"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" 
id="alg1.l3.1.1.1" style="font-size:80%;">3:</span></span>     initialize replay buffer (<span class="ltx_text ltx_markedasmath ltx_font_typewriter" id="alg1.l3.2">B</span>) as an empty batch <math alttext="\texttt{B}\leftarrow\emptyset" class="ltx_Math" display="inline" id="alg1.l3.m2.1"><semantics id="alg1.l3.m2.1a"><mrow id="alg1.l3.m2.1.1" xref="alg1.l3.m2.1.1.cmml"><mtext class="ltx_mathvariant_monospace" id="alg1.l3.m2.1.1.2" xref="alg1.l3.m2.1.1.2a.cmml">B</mtext><mo id="alg1.l3.m2.1.1.1" stretchy="false" xref="alg1.l3.m2.1.1.1.cmml">←</mo><mi id="alg1.l3.m2.1.1.3" mathvariant="normal" xref="alg1.l3.m2.1.1.3.cmml">∅</mi></mrow><annotation-xml encoding="MathML-Content" id="alg1.l3.m2.1b"><apply id="alg1.l3.m2.1.1.cmml" xref="alg1.l3.m2.1.1"><ci id="alg1.l3.m2.1.1.1.cmml" xref="alg1.l3.m2.1.1.1">←</ci><ci id="alg1.l3.m2.1.1.2a.cmml" xref="alg1.l3.m2.1.1.2"><mtext class="ltx_mathvariant_monospace" id="alg1.l3.m2.1.1.2.cmml" xref="alg1.l3.m2.1.1.2">B</mtext></ci><emptyset id="alg1.l3.m2.1.1.3.cmml" xref="alg1.l3.m2.1.1.3"></emptyset></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l3.m2.1c">\texttt{B}\leftarrow\emptyset</annotation><annotation encoding="application/x-llamapun" id="alg1.l3.m2.1d">B ← ∅</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg1.l4"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l4.1.1.1" style="font-size:80%;">4:</span></span>     <math alttext="(\textbf{V}_{t},\textbf{R}_{t})_{\texttt{B}}\leftarrow\texttt{collect samples}% (\texttt{MOPPO}_{\theta})" class="ltx_Math" display="inline" id="alg1.l4.m1.3"><semantics id="alg1.l4.m1.3a"><mrow id="alg1.l4.m1.3.3" xref="alg1.l4.m1.3.3.cmml"><msub id="alg1.l4.m1.2.2.2" xref="alg1.l4.m1.2.2.2.cmml"><mrow id="alg1.l4.m1.2.2.2.2.2" xref="alg1.l4.m1.2.2.2.2.3.cmml"><mo id="alg1.l4.m1.2.2.2.2.2.3" stretchy="false" xref="alg1.l4.m1.2.2.2.2.3.cmml">(</mo><msub id="alg1.l4.m1.1.1.1.1.1.1" xref="alg1.l4.m1.1.1.1.1.1.1.cmml"><mtext 
class="ltx_mathvariant_bold" id="alg1.l4.m1.1.1.1.1.1.1.2" xref="alg1.l4.m1.1.1.1.1.1.1.2a.cmml">V</mtext><mi id="alg1.l4.m1.1.1.1.1.1.1.3" xref="alg1.l4.m1.1.1.1.1.1.1.3.cmml">t</mi></msub><mo id="alg1.l4.m1.2.2.2.2.2.4" xref="alg1.l4.m1.2.2.2.2.3.cmml">,</mo><msub id="alg1.l4.m1.2.2.2.2.2.2" xref="alg1.l4.m1.2.2.2.2.2.2.cmml"><mtext class="ltx_mathvariant_bold" id="alg1.l4.m1.2.2.2.2.2.2.2" xref="alg1.l4.m1.2.2.2.2.2.2.2a.cmml">R</mtext><mi id="alg1.l4.m1.2.2.2.2.2.2.3" xref="alg1.l4.m1.2.2.2.2.2.2.3.cmml">t</mi></msub><mo id="alg1.l4.m1.2.2.2.2.2.5" stretchy="false" xref="alg1.l4.m1.2.2.2.2.3.cmml">)</mo></mrow><mtext class="ltx_mathvariant_monospace" id="alg1.l4.m1.2.2.2.4" xref="alg1.l4.m1.2.2.2.4a.cmml">B</mtext></msub><mo id="alg1.l4.m1.3.3.4" stretchy="false" xref="alg1.l4.m1.3.3.4.cmml">←</mo><mrow id="alg1.l4.m1.3.3.3" xref="alg1.l4.m1.3.3.3.cmml"><mtext class="ltx_mathvariant_monospace" id="alg1.l4.m1.3.3.3.3" xref="alg1.l4.m1.3.3.3.3a.cmml">collect samples</mtext><mo id="alg1.l4.m1.3.3.3.2" xref="alg1.l4.m1.3.3.3.2.cmml">⁢</mo><mrow id="alg1.l4.m1.3.3.3.1.1" xref="alg1.l4.m1.3.3.3.1.1.1.cmml"><mo id="alg1.l4.m1.3.3.3.1.1.2" stretchy="false" xref="alg1.l4.m1.3.3.3.1.1.1.cmml">(</mo><msub id="alg1.l4.m1.3.3.3.1.1.1" xref="alg1.l4.m1.3.3.3.1.1.1.cmml"><mtext class="ltx_mathvariant_monospace" id="alg1.l4.m1.3.3.3.1.1.1.2" xref="alg1.l4.m1.3.3.3.1.1.1.2a.cmml">MOPPO</mtext><mi id="alg1.l4.m1.3.3.3.1.1.1.3" xref="alg1.l4.m1.3.3.3.1.1.1.3.cmml">θ</mi></msub><mo id="alg1.l4.m1.3.3.3.1.1.3" stretchy="false" xref="alg1.l4.m1.3.3.3.1.1.1.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="alg1.l4.m1.3b"><apply id="alg1.l4.m1.3.3.cmml" xref="alg1.l4.m1.3.3"><ci id="alg1.l4.m1.3.3.4.cmml" xref="alg1.l4.m1.3.3.4">←</ci><apply id="alg1.l4.m1.2.2.2.cmml" xref="alg1.l4.m1.2.2.2"><csymbol cd="ambiguous" id="alg1.l4.m1.2.2.2.3.cmml" xref="alg1.l4.m1.2.2.2">subscript</csymbol><interval closure="open" id="alg1.l4.m1.2.2.2.2.3.cmml" 
xref="alg1.l4.m1.2.2.2.2.2"><apply id="alg1.l4.m1.1.1.1.1.1.1.cmml" xref="alg1.l4.m1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="alg1.l4.m1.1.1.1.1.1.1.1.cmml" xref="alg1.l4.m1.1.1.1.1.1.1">subscript</csymbol><ci id="alg1.l4.m1.1.1.1.1.1.1.2a.cmml" xref="alg1.l4.m1.1.1.1.1.1.1.2"><mtext class="ltx_mathvariant_bold" id="alg1.l4.m1.1.1.1.1.1.1.2.cmml" xref="alg1.l4.m1.1.1.1.1.1.1.2">V</mtext></ci><ci id="alg1.l4.m1.1.1.1.1.1.1.3.cmml" xref="alg1.l4.m1.1.1.1.1.1.1.3">𝑡</ci></apply><apply id="alg1.l4.m1.2.2.2.2.2.2.cmml" xref="alg1.l4.m1.2.2.2.2.2.2"><csymbol cd="ambiguous" id="alg1.l4.m1.2.2.2.2.2.2.1.cmml" xref="alg1.l4.m1.2.2.2.2.2.2">subscript</csymbol><ci id="alg1.l4.m1.2.2.2.2.2.2.2a.cmml" xref="alg1.l4.m1.2.2.2.2.2.2.2"><mtext class="ltx_mathvariant_bold" id="alg1.l4.m1.2.2.2.2.2.2.2.cmml" xref="alg1.l4.m1.2.2.2.2.2.2.2">R</mtext></ci><ci id="alg1.l4.m1.2.2.2.2.2.2.3.cmml" xref="alg1.l4.m1.2.2.2.2.2.2.3">𝑡</ci></apply></interval><ci id="alg1.l4.m1.2.2.2.4a.cmml" xref="alg1.l4.m1.2.2.2.4"><mtext class="ltx_mathvariant_monospace" id="alg1.l4.m1.2.2.2.4.cmml" mathsize="70%" xref="alg1.l4.m1.2.2.2.4">B</mtext></ci></apply><apply id="alg1.l4.m1.3.3.3.cmml" xref="alg1.l4.m1.3.3.3"><times id="alg1.l4.m1.3.3.3.2.cmml" xref="alg1.l4.m1.3.3.3.2"></times><ci id="alg1.l4.m1.3.3.3.3a.cmml" xref="alg1.l4.m1.3.3.3.3"><mtext class="ltx_mathvariant_monospace" id="alg1.l4.m1.3.3.3.3.cmml" xref="alg1.l4.m1.3.3.3.3">collect samples</mtext></ci><apply id="alg1.l4.m1.3.3.3.1.1.1.cmml" xref="alg1.l4.m1.3.3.3.1.1"><csymbol cd="ambiguous" id="alg1.l4.m1.3.3.3.1.1.1.1.cmml" xref="alg1.l4.m1.3.3.3.1.1">subscript</csymbol><ci id="alg1.l4.m1.3.3.3.1.1.1.2a.cmml" xref="alg1.l4.m1.3.3.3.1.1.1.2"><mtext class="ltx_mathvariant_monospace" id="alg1.l4.m1.3.3.3.1.1.1.2.cmml" xref="alg1.l4.m1.3.3.3.1.1.1.2">MOPPO</mtext></ci><ci id="alg1.l4.m1.3.3.3.1.1.1.3.cmml" xref="alg1.l4.m1.3.3.3.1.1.1.3">𝜃</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" 
id="alg1.l4.m1.3c">(\textbf{V}_{t},\textbf{R}_{t})_{\texttt{B}}\leftarrow\texttt{collect samples}% (\texttt{MOPPO}_{\theta})</annotation><annotation encoding="application/x-llamapun" id="alg1.l4.m1.3d">( V start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT B end_POSTSUBSCRIPT ← collect samples ( MOPPO start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT )</annotation></semantics></math> {Fill replay buffer <span class="ltx_text ltx_markedasmath ltx_font_typewriter" id="alg1.l4.2">B</span> with samples collected from the environment: batch-size tuples of vector-valued value estimates and vector-valued rewards <math alttext="(\textbf{V}_{t},\textbf{R}_{t})_{\texttt{B}}" class="ltx_Math" display="inline" id="alg1.l4.m3.2"><semantics id="alg1.l4.m3.2a"><msub id="alg1.l4.m3.2.2" xref="alg1.l4.m3.2.2.cmml"><mrow id="alg1.l4.m3.2.2.2.2" xref="alg1.l4.m3.2.2.2.3.cmml"><mo id="alg1.l4.m3.2.2.2.2.3" stretchy="false" xref="alg1.l4.m3.2.2.2.3.cmml">(</mo><msub id="alg1.l4.m3.1.1.1.1.1" xref="alg1.l4.m3.1.1.1.1.1.cmml"><mtext class="ltx_mathvariant_bold" id="alg1.l4.m3.1.1.1.1.1.2" xref="alg1.l4.m3.1.1.1.1.1.2a.cmml">V</mtext><mi id="alg1.l4.m3.1.1.1.1.1.3" xref="alg1.l4.m3.1.1.1.1.1.3.cmml">t</mi></msub><mo id="alg1.l4.m3.2.2.2.2.4" xref="alg1.l4.m3.2.2.2.3.cmml">,</mo><msub id="alg1.l4.m3.2.2.2.2.2" xref="alg1.l4.m3.2.2.2.2.2.cmml"><mtext class="ltx_mathvariant_bold" id="alg1.l4.m3.2.2.2.2.2.2" xref="alg1.l4.m3.2.2.2.2.2.2a.cmml">R</mtext><mi id="alg1.l4.m3.2.2.2.2.2.3" xref="alg1.l4.m3.2.2.2.2.2.3.cmml">t</mi></msub><mo id="alg1.l4.m3.2.2.2.2.5" stretchy="false" xref="alg1.l4.m3.2.2.2.3.cmml">)</mo></mrow><mtext class="ltx_mathvariant_monospace" id="alg1.l4.m3.2.2.4" xref="alg1.l4.m3.2.2.4a.cmml">B</mtext></msub><annotation-xml encoding="MathML-Content" id="alg1.l4.m3.2b"><apply id="alg1.l4.m3.2.2.cmml" xref="alg1.l4.m3.2.2"><csymbol cd="ambiguous" id="alg1.l4.m3.2.2.3.cmml" 
xref="alg1.l4.m3.2.2">subscript</csymbol><interval closure="open" id="alg1.l4.m3.2.2.2.3.cmml" xref="alg1.l4.m3.2.2.2.2"><apply id="alg1.l4.m3.1.1.1.1.1.cmml" xref="alg1.l4.m3.1.1.1.1.1"><csymbol cd="ambiguous" id="alg1.l4.m3.1.1.1.1.1.1.cmml" xref="alg1.l4.m3.1.1.1.1.1">subscript</csymbol><ci id="alg1.l4.m3.1.1.1.1.1.2a.cmml" xref="alg1.l4.m3.1.1.1.1.1.2"><mtext class="ltx_mathvariant_bold" id="alg1.l4.m3.1.1.1.1.1.2.cmml" xref="alg1.l4.m3.1.1.1.1.1.2">V</mtext></ci><ci id="alg1.l4.m3.1.1.1.1.1.3.cmml" xref="alg1.l4.m3.1.1.1.1.1.3">𝑡</ci></apply><apply id="alg1.l4.m3.2.2.2.2.2.cmml" xref="alg1.l4.m3.2.2.2.2.2"><csymbol cd="ambiguous" id="alg1.l4.m3.2.2.2.2.2.1.cmml" xref="alg1.l4.m3.2.2.2.2.2">subscript</csymbol><ci id="alg1.l4.m3.2.2.2.2.2.2a.cmml" xref="alg1.l4.m3.2.2.2.2.2.2"><mtext class="ltx_mathvariant_bold" id="alg1.l4.m3.2.2.2.2.2.2.cmml" xref="alg1.l4.m3.2.2.2.2.2.2">R</mtext></ci><ci id="alg1.l4.m3.2.2.2.2.2.3.cmml" xref="alg1.l4.m3.2.2.2.2.2.3">𝑡</ci></apply></interval><ci id="alg1.l4.m3.2.2.4a.cmml" xref="alg1.l4.m3.2.2.4"><mtext class="ltx_mathvariant_monospace" id="alg1.l4.m3.2.2.4.cmml" mathsize="70%" xref="alg1.l4.m3.2.2.4">B</mtext></ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l4.m3.2c">(\textbf{V}_{t},\textbf{R}_{t})_{\texttt{B}}</annotation><annotation encoding="application/x-llamapun" id="alg1.l4.m3.2d">( V start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT B end_POSTSUBSCRIPT</annotation></semantics></math>} </div> <div class="ltx_listingline" id="alg1.l5"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l5.1.1.1" style="font-size:80%;">5:</span></span>     <math alttext="\textbf{A}_{t}\leftarrow\texttt{compute advantages}((\textbf{V}_{t},\textbf{R}% _{t})\texttt{B})" class="ltx_Math" display="inline" id="alg1.l5.m1.1"><semantics id="alg1.l5.m1.1a"><mrow id="alg1.l5.m1.1.1" xref="alg1.l5.m1.1.1.cmml"><msub 
id="alg1.l5.m1.1.1.3" xref="alg1.l5.m1.1.1.3.cmml"><mtext class="ltx_mathvariant_bold" id="alg1.l5.m1.1.1.3.2" xref="alg1.l5.m1.1.1.3.2a.cmml">A</mtext><mi id="alg1.l5.m1.1.1.3.3" xref="alg1.l5.m1.1.1.3.3.cmml">t</mi></msub><mo id="alg1.l5.m1.1.1.2" stretchy="false" xref="alg1.l5.m1.1.1.2.cmml">←</mo><mrow id="alg1.l5.m1.1.1.1" xref="alg1.l5.m1.1.1.1.cmml"><mtext class="ltx_mathvariant_monospace" id="alg1.l5.m1.1.1.1.3" xref="alg1.l5.m1.1.1.1.3a.cmml">compute advantages</mtext><mo id="alg1.l5.m1.1.1.1.2" xref="alg1.l5.m1.1.1.1.2.cmml">⁢</mo><mrow id="alg1.l5.m1.1.1.1.1.1" xref="alg1.l5.m1.1.1.1.1.1.1.cmml"><mo id="alg1.l5.m1.1.1.1.1.1.2" stretchy="false" xref="alg1.l5.m1.1.1.1.1.1.1.cmml">(</mo><mrow id="alg1.l5.m1.1.1.1.1.1.1" xref="alg1.l5.m1.1.1.1.1.1.1.cmml"><mrow id="alg1.l5.m1.1.1.1.1.1.1.2.2" xref="alg1.l5.m1.1.1.1.1.1.1.2.3.cmml"><mo id="alg1.l5.m1.1.1.1.1.1.1.2.2.3" stretchy="false" xref="alg1.l5.m1.1.1.1.1.1.1.2.3.cmml">(</mo><msub id="alg1.l5.m1.1.1.1.1.1.1.1.1.1" xref="alg1.l5.m1.1.1.1.1.1.1.1.1.1.cmml"><mtext class="ltx_mathvariant_bold" id="alg1.l5.m1.1.1.1.1.1.1.1.1.1.2" xref="alg1.l5.m1.1.1.1.1.1.1.1.1.1.2a.cmml">V</mtext><mi id="alg1.l5.m1.1.1.1.1.1.1.1.1.1.3" xref="alg1.l5.m1.1.1.1.1.1.1.1.1.1.3.cmml">t</mi></msub><mo id="alg1.l5.m1.1.1.1.1.1.1.2.2.4" xref="alg1.l5.m1.1.1.1.1.1.1.2.3.cmml">,</mo><msub id="alg1.l5.m1.1.1.1.1.1.1.2.2.2" xref="alg1.l5.m1.1.1.1.1.1.1.2.2.2.cmml"><mtext class="ltx_mathvariant_bold" id="alg1.l5.m1.1.1.1.1.1.1.2.2.2.2" xref="alg1.l5.m1.1.1.1.1.1.1.2.2.2.2a.cmml">R</mtext><mi id="alg1.l5.m1.1.1.1.1.1.1.2.2.2.3" xref="alg1.l5.m1.1.1.1.1.1.1.2.2.2.3.cmml">t</mi></msub><mo id="alg1.l5.m1.1.1.1.1.1.1.2.2.5" stretchy="false" xref="alg1.l5.m1.1.1.1.1.1.1.2.3.cmml">)</mo></mrow><mo id="alg1.l5.m1.1.1.1.1.1.1.3" xref="alg1.l5.m1.1.1.1.1.1.1.3.cmml">⁢</mo><mtext class="ltx_mathvariant_monospace" id="alg1.l5.m1.1.1.1.1.1.1.4" xref="alg1.l5.m1.1.1.1.1.1.1.4a.cmml">B</mtext></mrow><mo id="alg1.l5.m1.1.1.1.1.1.3" stretchy="false" 
xref="alg1.l5.m1.1.1.1.1.1.1.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="alg1.l5.m1.1b"><apply id="alg1.l5.m1.1.1.cmml" xref="alg1.l5.m1.1.1"><ci id="alg1.l5.m1.1.1.2.cmml" xref="alg1.l5.m1.1.1.2">←</ci><apply id="alg1.l5.m1.1.1.3.cmml" xref="alg1.l5.m1.1.1.3"><csymbol cd="ambiguous" id="alg1.l5.m1.1.1.3.1.cmml" xref="alg1.l5.m1.1.1.3">subscript</csymbol><ci id="alg1.l5.m1.1.1.3.2a.cmml" xref="alg1.l5.m1.1.1.3.2"><mtext class="ltx_mathvariant_bold" id="alg1.l5.m1.1.1.3.2.cmml" xref="alg1.l5.m1.1.1.3.2">A</mtext></ci><ci id="alg1.l5.m1.1.1.3.3.cmml" xref="alg1.l5.m1.1.1.3.3">𝑡</ci></apply><apply id="alg1.l5.m1.1.1.1.cmml" xref="alg1.l5.m1.1.1.1"><times id="alg1.l5.m1.1.1.1.2.cmml" xref="alg1.l5.m1.1.1.1.2"></times><ci id="alg1.l5.m1.1.1.1.3a.cmml" xref="alg1.l5.m1.1.1.1.3"><mtext class="ltx_mathvariant_monospace" id="alg1.l5.m1.1.1.1.3.cmml" xref="alg1.l5.m1.1.1.1.3">compute advantages</mtext></ci><apply id="alg1.l5.m1.1.1.1.1.1.1.cmml" xref="alg1.l5.m1.1.1.1.1.1"><times id="alg1.l5.m1.1.1.1.1.1.1.3.cmml" xref="alg1.l5.m1.1.1.1.1.1.1.3"></times><interval closure="open" id="alg1.l5.m1.1.1.1.1.1.1.2.3.cmml" xref="alg1.l5.m1.1.1.1.1.1.1.2.2"><apply id="alg1.l5.m1.1.1.1.1.1.1.1.1.1.cmml" xref="alg1.l5.m1.1.1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="alg1.l5.m1.1.1.1.1.1.1.1.1.1.1.cmml" xref="alg1.l5.m1.1.1.1.1.1.1.1.1.1">subscript</csymbol><ci id="alg1.l5.m1.1.1.1.1.1.1.1.1.1.2a.cmml" xref="alg1.l5.m1.1.1.1.1.1.1.1.1.1.2"><mtext class="ltx_mathvariant_bold" id="alg1.l5.m1.1.1.1.1.1.1.1.1.1.2.cmml" xref="alg1.l5.m1.1.1.1.1.1.1.1.1.1.2">V</mtext></ci><ci id="alg1.l5.m1.1.1.1.1.1.1.1.1.1.3.cmml" xref="alg1.l5.m1.1.1.1.1.1.1.1.1.1.3">𝑡</ci></apply><apply id="alg1.l5.m1.1.1.1.1.1.1.2.2.2.cmml" xref="alg1.l5.m1.1.1.1.1.1.1.2.2.2"><csymbol cd="ambiguous" id="alg1.l5.m1.1.1.1.1.1.1.2.2.2.1.cmml" xref="alg1.l5.m1.1.1.1.1.1.1.2.2.2">subscript</csymbol><ci id="alg1.l5.m1.1.1.1.1.1.1.2.2.2.2a.cmml" xref="alg1.l5.m1.1.1.1.1.1.1.2.2.2.2"><mtext 
class="ltx_mathvariant_bold" id="alg1.l5.m1.1.1.1.1.1.1.2.2.2.2.cmml" xref="alg1.l5.m1.1.1.1.1.1.1.2.2.2.2">R</mtext></ci><ci id="alg1.l5.m1.1.1.1.1.1.1.2.2.2.3.cmml" xref="alg1.l5.m1.1.1.1.1.1.1.2.2.2.3">𝑡</ci></apply></interval><ci id="alg1.l5.m1.1.1.1.1.1.1.4a.cmml" xref="alg1.l5.m1.1.1.1.1.1.1.4"><mtext class="ltx_mathvariant_monospace" id="alg1.l5.m1.1.1.1.1.1.1.4.cmml" xref="alg1.l5.m1.1.1.1.1.1.1.4">B</mtext></ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l5.m1.1c">\textbf{A}_{t}\leftarrow\texttt{compute advantages}((\textbf{V}_{t},\textbf{R}% _{t})\texttt{B})</annotation><annotation encoding="application/x-llamapun" id="alg1.l5.m1.1d">A start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ← compute advantages ( ( V start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) B )</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg1.l6"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l6.1.1.1" style="font-size:80%;">6:</span></span>     <math alttext="A_{t}=\textbf{w}^{\top}\textbf{A}_{t}" class="ltx_Math" display="inline" id="alg1.l6.m1.1"><semantics id="alg1.l6.m1.1a"><mrow id="alg1.l6.m1.1.1" xref="alg1.l6.m1.1.1.cmml"><msub id="alg1.l6.m1.1.1.2" xref="alg1.l6.m1.1.1.2.cmml"><mi id="alg1.l6.m1.1.1.2.2" xref="alg1.l6.m1.1.1.2.2.cmml">A</mi><mi id="alg1.l6.m1.1.1.2.3" xref="alg1.l6.m1.1.1.2.3.cmml">t</mi></msub><mo id="alg1.l6.m1.1.1.1" xref="alg1.l6.m1.1.1.1.cmml">=</mo><mrow id="alg1.l6.m1.1.1.3" xref="alg1.l6.m1.1.1.3.cmml"><msup id="alg1.l6.m1.1.1.3.2" xref="alg1.l6.m1.1.1.3.2.cmml"><mtext class="ltx_mathvariant_bold" id="alg1.l6.m1.1.1.3.2.2" xref="alg1.l6.m1.1.1.3.2.2a.cmml">w</mtext><mo id="alg1.l6.m1.1.1.3.2.3" xref="alg1.l6.m1.1.1.3.2.3.cmml">⊤</mo></msup><mo id="alg1.l6.m1.1.1.3.1" xref="alg1.l6.m1.1.1.3.1.cmml">⁢</mo><msub id="alg1.l6.m1.1.1.3.3" xref="alg1.l6.m1.1.1.3.3.cmml"><mtext class="ltx_mathvariant_bold" 
id="alg1.l6.m1.1.1.3.3.2" xref="alg1.l6.m1.1.1.3.3.2a.cmml">A</mtext><mi id="alg1.l6.m1.1.1.3.3.3" xref="alg1.l6.m1.1.1.3.3.3.cmml">t</mi></msub></mrow></mrow><annotation-xml encoding="MathML-Content" id="alg1.l6.m1.1b"><apply id="alg1.l6.m1.1.1.cmml" xref="alg1.l6.m1.1.1"><eq id="alg1.l6.m1.1.1.1.cmml" xref="alg1.l6.m1.1.1.1"></eq><apply id="alg1.l6.m1.1.1.2.cmml" xref="alg1.l6.m1.1.1.2"><csymbol cd="ambiguous" id="alg1.l6.m1.1.1.2.1.cmml" xref="alg1.l6.m1.1.1.2">subscript</csymbol><ci id="alg1.l6.m1.1.1.2.2.cmml" xref="alg1.l6.m1.1.1.2.2">𝐴</ci><ci id="alg1.l6.m1.1.1.2.3.cmml" xref="alg1.l6.m1.1.1.2.3">𝑡</ci></apply><apply id="alg1.l6.m1.1.1.3.cmml" xref="alg1.l6.m1.1.1.3"><times id="alg1.l6.m1.1.1.3.1.cmml" xref="alg1.l6.m1.1.1.3.1"></times><apply id="alg1.l6.m1.1.1.3.2.cmml" xref="alg1.l6.m1.1.1.3.2"><csymbol cd="ambiguous" id="alg1.l6.m1.1.1.3.2.1.cmml" xref="alg1.l6.m1.1.1.3.2">superscript</csymbol><ci id="alg1.l6.m1.1.1.3.2.2a.cmml" xref="alg1.l6.m1.1.1.3.2.2"><mtext class="ltx_mathvariant_bold" id="alg1.l6.m1.1.1.3.2.2.cmml" xref="alg1.l6.m1.1.1.3.2.2">w</mtext></ci><csymbol cd="latexml" id="alg1.l6.m1.1.1.3.2.3.cmml" xref="alg1.l6.m1.1.1.3.2.3">top</csymbol></apply><apply id="alg1.l6.m1.1.1.3.3.cmml" xref="alg1.l6.m1.1.1.3.3"><csymbol cd="ambiguous" id="alg1.l6.m1.1.1.3.3.1.cmml" xref="alg1.l6.m1.1.1.3.3">subscript</csymbol><ci id="alg1.l6.m1.1.1.3.3.2a.cmml" xref="alg1.l6.m1.1.1.3.3.2"><mtext class="ltx_mathvariant_bold" id="alg1.l6.m1.1.1.3.3.2.cmml" xref="alg1.l6.m1.1.1.3.3.2">A</mtext></ci><ci id="alg1.l6.m1.1.1.3.3.3.cmml" xref="alg1.l6.m1.1.1.3.3.3">𝑡</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l6.m1.1c">A_{t}=\textbf{w}^{\top}\textbf{A}_{t}</annotation><annotation encoding="application/x-llamapun" id="alg1.l6.m1.1d">italic_A start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT A start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> 
{Scalarize advantages (<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S2.E2" title="Equation 2 ‣ II-A Single-Policy Multi-Objective PPO ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">2</span></a>)} </div> <div class="ltx_listingline" id="alg1.l7"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l7.1.1.1" style="font-size:80%;">7:</span></span>     <math alttext="\theta\leftarrow\texttt{update}(A_{t})" class="ltx_Math" display="inline" id="alg1.l7.m1.1"><semantics id="alg1.l7.m1.1a"><mrow id="alg1.l7.m1.1.1" xref="alg1.l7.m1.1.1.cmml"><mi id="alg1.l7.m1.1.1.3" xref="alg1.l7.m1.1.1.3.cmml">θ</mi><mo id="alg1.l7.m1.1.1.2" stretchy="false" xref="alg1.l7.m1.1.1.2.cmml">←</mo><mrow id="alg1.l7.m1.1.1.1" xref="alg1.l7.m1.1.1.1.cmml"><mtext class="ltx_mathvariant_monospace" id="alg1.l7.m1.1.1.1.3" xref="alg1.l7.m1.1.1.1.3a.cmml">update</mtext><mo id="alg1.l7.m1.1.1.1.2" xref="alg1.l7.m1.1.1.1.2.cmml">⁢</mo><mrow id="alg1.l7.m1.1.1.1.1.1" xref="alg1.l7.m1.1.1.1.1.1.1.cmml"><mo id="alg1.l7.m1.1.1.1.1.1.2" stretchy="false" xref="alg1.l7.m1.1.1.1.1.1.1.cmml">(</mo><msub id="alg1.l7.m1.1.1.1.1.1.1" xref="alg1.l7.m1.1.1.1.1.1.1.cmml"><mi id="alg1.l7.m1.1.1.1.1.1.1.2" xref="alg1.l7.m1.1.1.1.1.1.1.2.cmml">A</mi><mi id="alg1.l7.m1.1.1.1.1.1.1.3" xref="alg1.l7.m1.1.1.1.1.1.1.3.cmml">t</mi></msub><mo id="alg1.l7.m1.1.1.1.1.1.3" stretchy="false" xref="alg1.l7.m1.1.1.1.1.1.1.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="alg1.l7.m1.1b"><apply id="alg1.l7.m1.1.1.cmml" xref="alg1.l7.m1.1.1"><ci id="alg1.l7.m1.1.1.2.cmml" xref="alg1.l7.m1.1.1.2">←</ci><ci id="alg1.l7.m1.1.1.3.cmml" xref="alg1.l7.m1.1.1.3">𝜃</ci><apply id="alg1.l7.m1.1.1.1.cmml" xref="alg1.l7.m1.1.1.1"><times id="alg1.l7.m1.1.1.1.2.cmml" xref="alg1.l7.m1.1.1.1.2"></times><ci id="alg1.l7.m1.1.1.1.3a.cmml" xref="alg1.l7.m1.1.1.1.3"><mtext class="ltx_mathvariant_monospace" 
id="alg1.l7.m1.1.1.1.3.cmml" xref="alg1.l7.m1.1.1.1.3">update</mtext></ci><apply id="alg1.l7.m1.1.1.1.1.1.1.cmml" xref="alg1.l7.m1.1.1.1.1.1"><csymbol cd="ambiguous" id="alg1.l7.m1.1.1.1.1.1.1.1.cmml" xref="alg1.l7.m1.1.1.1.1.1">subscript</csymbol><ci id="alg1.l7.m1.1.1.1.1.1.1.2.cmml" xref="alg1.l7.m1.1.1.1.1.1.1.2">𝐴</ci><ci id="alg1.l7.m1.1.1.1.1.1.1.3.cmml" xref="alg1.l7.m1.1.1.1.1.1.1.3">𝑡</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l7.m1.1c">\theta\leftarrow\texttt{update}(A_{t})</annotation><annotation encoding="application/x-llamapun" id="alg1.l7.m1.1d">italic_θ ← update ( italic_A start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT )</annotation></semantics></math> {Update the neural network weights} </div> <div class="ltx_listingline" id="alg1.l8"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l8.1.1.1" style="font-size:80%;">8:</span></span>  <span class="ltx_text ltx_font_bold" id="alg1.l8.2">end</span> <span class="ltx_text ltx_font_bold" id="alg1.l8.3">for</span> </div> </div> </figure> </section> <section class="ltx_subsection" id="S2.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S2.SS2.5.1.1">II-B</span> </span><span class="ltx_text ltx_font_italic" id="S2.SS2.6.2">Reward functions for operational objectives</span> </h3> <section class="ltx_subsubsection" id="S2.SS2.SSS1"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection"><span class="ltx_text" id="S2.SS2.SSS1.5.1.1">II-B</span>1 </span>Line Loading Reward</h4> <div class="ltx_para" id="S2.SS2.SSS1.p1"> <p class="ltx_p" id="S2.SS2.SSS1.p1.1">The <span class="ltx_text ltx_font_italic" id="S2.SS2.SSS1.p1.1.1">L2RPNReward</span>, referred to here as <span class="ltx_text ltx_font_italic" id="S2.SS2.SSS1.p1.1.2">Line Loading Reward</span>, is the conventional reward used in the L2RPN competition <cite class="ltx_cite 
ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib3" title="">3</a>]</cite>, and it has been widely adopted as a default reward in the literature <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib19" title="">19</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib20" title="">20</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib8" title="">8</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib7" title="">7</a>]</cite>. This reward emphasizes maintaining adequate thermal loading margins on power lines to ensure grid security. For each power line <math alttext="l" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.1.m1.1"><semantics id="S2.SS2.SSS1.p1.1.m1.1a"><mi id="S2.SS2.SSS1.p1.1.m1.1.1" xref="S2.SS2.SSS1.p1.1.m1.1.1.cmml">l</mi><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.1.m1.1b"><ci id="S2.SS2.SSS1.p1.1.m1.1.1.cmml" xref="S2.SS2.SSS1.p1.1.m1.1.1">𝑙</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.1.m1.1c">l</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.1.m1.1d">italic_l</annotation></semantics></math>, the thermal loading margin at time step t is defined as:</p> <table class="ltx_equation ltx_eqn_table" id="S2.E3"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\text{Margin}_{l,t}=\begin{cases}\displaystyle\frac{\overline{F}_{l}-|F_{l,t}|% }{\overline{F}_{l}}&amp;\text{if }|F_{l,t}|\leq\overline{F}_{l}\\[10.0pt] 0&amp;\text{if }|F_{l,t}|&gt;\overline{F}_{l}\end{cases}" class="ltx_Math" display="block" id="S2.E3.m1.6"><semantics id="S2.E3.m1.6a"><mrow id="S2.E3.m1.6.7" xref="S2.E3.m1.6.7.cmml"><msub id="S2.E3.m1.6.7.2" xref="S2.E3.m1.6.7.2.cmml"><mtext id="S2.E3.m1.6.7.2.2" xref="S2.E3.m1.6.7.2.2a.cmml">Margin</mtext><mrow 
id="S2.E3.m1.6.6.2.4" xref="S2.E3.m1.6.6.2.3.cmml"><mi id="S2.E3.m1.5.5.1.1" xref="S2.E3.m1.5.5.1.1.cmml">l</mi><mo id="S2.E3.m1.6.6.2.4.1" xref="S2.E3.m1.6.6.2.3.cmml">,</mo><mi id="S2.E3.m1.6.6.2.2" xref="S2.E3.m1.6.6.2.2.cmml">t</mi></mrow></msub><mo id="S2.E3.m1.6.7.1" xref="S2.E3.m1.6.7.1.cmml">=</mo><mrow id="S2.E3.m1.4.4" xref="S2.E3.m1.6.7.3.1.cmml"><mo id="S2.E3.m1.4.4.5" xref="S2.E3.m1.6.7.3.1.1.cmml">{</mo><mtable columnspacing="5pt" displaystyle="true" id="S2.E3.m1.4.4.4" rowspacing="0pt" xref="S2.E3.m1.6.7.3.1.cmml"><mtr id="S2.E3.m1.4.4.4a" xref="S2.E3.m1.6.7.3.1.cmml"><mtd class="ltx_align_left" columnalign="left" id="S2.E3.m1.4.4.4b" xref="S2.E3.m1.6.7.3.1.cmml"><mfrac id="S2.E3.m1.1.1.1.1.1.1" xref="S2.E3.m1.1.1.1.1.1.1.cmml"><mrow id="S2.E3.m1.1.1.1.1.1.1.3.3" xref="S2.E3.m1.1.1.1.1.1.1.3.3.cmml"><msub id="S2.E3.m1.1.1.1.1.1.1.3.3.5" xref="S2.E3.m1.1.1.1.1.1.1.3.3.5.cmml"><mover accent="true" id="S2.E3.m1.1.1.1.1.1.1.3.3.5.2" xref="S2.E3.m1.1.1.1.1.1.1.3.3.5.2.cmml"><mi id="S2.E3.m1.1.1.1.1.1.1.3.3.5.2.2" xref="S2.E3.m1.1.1.1.1.1.1.3.3.5.2.2.cmml">F</mi><mo id="S2.E3.m1.1.1.1.1.1.1.3.3.5.2.1" xref="S2.E3.m1.1.1.1.1.1.1.3.3.5.2.1.cmml">¯</mo></mover><mi id="S2.E3.m1.1.1.1.1.1.1.3.3.5.3" xref="S2.E3.m1.1.1.1.1.1.1.3.3.5.3.cmml">l</mi></msub><mo id="S2.E3.m1.1.1.1.1.1.1.3.3.4" xref="S2.E3.m1.1.1.1.1.1.1.3.3.4.cmml">−</mo><mrow id="S2.E3.m1.1.1.1.1.1.1.3.3.3.1" xref="S2.E3.m1.1.1.1.1.1.1.3.3.3.2.cmml"><mo id="S2.E3.m1.1.1.1.1.1.1.3.3.3.1.2" stretchy="false" xref="S2.E3.m1.1.1.1.1.1.1.3.3.3.2.1.cmml">|</mo><msub id="S2.E3.m1.1.1.1.1.1.1.3.3.3.1.1" xref="S2.E3.m1.1.1.1.1.1.1.3.3.3.1.1.cmml"><mi id="S2.E3.m1.1.1.1.1.1.1.3.3.3.1.1.2" xref="S2.E3.m1.1.1.1.1.1.1.3.3.3.1.1.2.cmml">F</mi><mrow id="S2.E3.m1.1.1.1.1.1.1.2.2.2.2.4" xref="S2.E3.m1.1.1.1.1.1.1.2.2.2.2.3.cmml"><mi id="S2.E3.m1.1.1.1.1.1.1.1.1.1.1.1" xref="S2.E3.m1.1.1.1.1.1.1.1.1.1.1.1.cmml">l</mi><mo id="S2.E3.m1.1.1.1.1.1.1.2.2.2.2.4.1" xref="S2.E3.m1.1.1.1.1.1.1.2.2.2.2.3.cmml">,</mo><mi 
id="S2.E3.m1.1.1.1.1.1.1.2.2.2.2.2" xref="S2.E3.m1.1.1.1.1.1.1.2.2.2.2.2.cmml">t</mi></mrow></msub><mo id="S2.E3.m1.1.1.1.1.1.1.3.3.3.1.3" stretchy="false" xref="S2.E3.m1.1.1.1.1.1.1.3.3.3.2.1.cmml">|</mo></mrow></mrow><msub id="S2.E3.m1.1.1.1.1.1.1.5" xref="S2.E3.m1.1.1.1.1.1.1.5.cmml"><mover accent="true" id="S2.E3.m1.1.1.1.1.1.1.5.2" xref="S2.E3.m1.1.1.1.1.1.1.5.2.cmml"><mi id="S2.E3.m1.1.1.1.1.1.1.5.2.2" xref="S2.E3.m1.1.1.1.1.1.1.5.2.2.cmml">F</mi><mo id="S2.E3.m1.1.1.1.1.1.1.5.2.1" xref="S2.E3.m1.1.1.1.1.1.1.5.2.1.cmml">¯</mo></mover><mi id="S2.E3.m1.1.1.1.1.1.1.5.3" xref="S2.E3.m1.1.1.1.1.1.1.5.3.cmml">l</mi></msub></mfrac></mtd><mtd class="ltx_align_left" columnalign="left" id="S2.E3.m1.4.4.4c" xref="S2.E3.m1.6.7.3.1.cmml"><mrow id="S2.E3.m1.2.2.2.2.2.1" xref="S2.E3.m1.2.2.2.2.2.1.cmml"><mrow id="S2.E3.m1.2.2.2.2.2.1.3" xref="S2.E3.m1.2.2.2.2.2.1.3.cmml"><mtext id="S2.E3.m1.2.2.2.2.2.1.3.3" xref="S2.E3.m1.2.2.2.2.2.1.3.3a.cmml">if </mtext><mo id="S2.E3.m1.2.2.2.2.2.1.3.2" xref="S2.E3.m1.2.2.2.2.2.1.3.2.cmml">⁢</mo><mrow id="S2.E3.m1.2.2.2.2.2.1.3.1.1" xref="S2.E3.m1.2.2.2.2.2.1.3.1.2.cmml"><mo id="S2.E3.m1.2.2.2.2.2.1.3.1.1.2" stretchy="false" xref="S2.E3.m1.2.2.2.2.2.1.3.1.2.1.cmml">|</mo><msub id="S2.E3.m1.2.2.2.2.2.1.3.1.1.1" xref="S2.E3.m1.2.2.2.2.2.1.3.1.1.1.cmml"><mi id="S2.E3.m1.2.2.2.2.2.1.3.1.1.1.2" xref="S2.E3.m1.2.2.2.2.2.1.3.1.1.1.2.cmml">F</mi><mrow id="S2.E3.m1.2.2.2.2.2.1.2.2.4" xref="S2.E3.m1.2.2.2.2.2.1.2.2.3.cmml"><mi id="S2.E3.m1.2.2.2.2.2.1.1.1.1" xref="S2.E3.m1.2.2.2.2.2.1.1.1.1.cmml">l</mi><mo id="S2.E3.m1.2.2.2.2.2.1.2.2.4.1" xref="S2.E3.m1.2.2.2.2.2.1.2.2.3.cmml">,</mo><mi id="S2.E3.m1.2.2.2.2.2.1.2.2.2" xref="S2.E3.m1.2.2.2.2.2.1.2.2.2.cmml">t</mi></mrow></msub><mo id="S2.E3.m1.2.2.2.2.2.1.3.1.1.3" stretchy="false" xref="S2.E3.m1.2.2.2.2.2.1.3.1.2.1.cmml">|</mo></mrow></mrow><mo id="S2.E3.m1.2.2.2.2.2.1.4" xref="S2.E3.m1.2.2.2.2.2.1.4.cmml">≤</mo><msub id="S2.E3.m1.2.2.2.2.2.1.5" xref="S2.E3.m1.2.2.2.2.2.1.5.cmml"><mover 
accent="true" id="S2.E3.m1.2.2.2.2.2.1.5.2" xref="S2.E3.m1.2.2.2.2.2.1.5.2.cmml"><mi id="S2.E3.m1.2.2.2.2.2.1.5.2.2" xref="S2.E3.m1.2.2.2.2.2.1.5.2.2.cmml">F</mi><mo id="S2.E3.m1.2.2.2.2.2.1.5.2.1" xref="S2.E3.m1.2.2.2.2.2.1.5.2.1.cmml">¯</mo></mover><mi id="S2.E3.m1.2.2.2.2.2.1.5.3" xref="S2.E3.m1.2.2.2.2.2.1.5.3.cmml">l</mi></msub></mrow></mtd></mtr><mtr id="S2.E3.m1.4.4.4d" xref="S2.E3.m1.6.7.3.1.cmml"><mtd class="ltx_align_left" columnalign="left" id="S2.E3.m1.4.4.4e" xref="S2.E3.m1.6.7.3.1.cmml"><mn id="S2.E3.m1.3.3.3.3.1.1" xref="S2.E3.m1.3.3.3.3.1.1.cmml">0</mn></mtd><mtd class="ltx_align_left" columnalign="left" id="S2.E3.m1.4.4.4f" xref="S2.E3.m1.6.7.3.1.cmml"><mrow id="S2.E3.m1.4.4.4.4.2.1" xref="S2.E3.m1.4.4.4.4.2.1.cmml"><mrow id="S2.E3.m1.4.4.4.4.2.1.3" xref="S2.E3.m1.4.4.4.4.2.1.3.cmml"><mtext id="S2.E3.m1.4.4.4.4.2.1.3.3" xref="S2.E3.m1.4.4.4.4.2.1.3.3a.cmml">if </mtext><mo id="S2.E3.m1.4.4.4.4.2.1.3.2" xref="S2.E3.m1.4.4.4.4.2.1.3.2.cmml">⁢</mo><mrow id="S2.E3.m1.4.4.4.4.2.1.3.1.1" xref="S2.E3.m1.4.4.4.4.2.1.3.1.2.cmml"><mo id="S2.E3.m1.4.4.4.4.2.1.3.1.1.2" stretchy="false" xref="S2.E3.m1.4.4.4.4.2.1.3.1.2.1.cmml">|</mo><msub id="S2.E3.m1.4.4.4.4.2.1.3.1.1.1" xref="S2.E3.m1.4.4.4.4.2.1.3.1.1.1.cmml"><mi id="S2.E3.m1.4.4.4.4.2.1.3.1.1.1.2" xref="S2.E3.m1.4.4.4.4.2.1.3.1.1.1.2.cmml">F</mi><mrow id="S2.E3.m1.4.4.4.4.2.1.2.2.4" xref="S2.E3.m1.4.4.4.4.2.1.2.2.3.cmml"><mi id="S2.E3.m1.4.4.4.4.2.1.1.1.1" xref="S2.E3.m1.4.4.4.4.2.1.1.1.1.cmml">l</mi><mo id="S2.E3.m1.4.4.4.4.2.1.2.2.4.1" xref="S2.E3.m1.4.4.4.4.2.1.2.2.3.cmml">,</mo><mi id="S2.E3.m1.4.4.4.4.2.1.2.2.2" xref="S2.E3.m1.4.4.4.4.2.1.2.2.2.cmml">t</mi></mrow></msub><mo id="S2.E3.m1.4.4.4.4.2.1.3.1.1.3" stretchy="false" xref="S2.E3.m1.4.4.4.4.2.1.3.1.2.1.cmml">|</mo></mrow></mrow><mo id="S2.E3.m1.4.4.4.4.2.1.4" xref="S2.E3.m1.4.4.4.4.2.1.4.cmml">&gt;</mo><msub id="S2.E3.m1.4.4.4.4.2.1.5" xref="S2.E3.m1.4.4.4.4.2.1.5.cmml"><mover accent="true" id="S2.E3.m1.4.4.4.4.2.1.5.2" 
xref="S2.E3.m1.4.4.4.4.2.1.5.2.cmml"><mi id="S2.E3.m1.4.4.4.4.2.1.5.2.2" xref="S2.E3.m1.4.4.4.4.2.1.5.2.2.cmml">F</mi><mo id="S2.E3.m1.4.4.4.4.2.1.5.2.1" xref="S2.E3.m1.4.4.4.4.2.1.5.2.1.cmml">¯</mo></mover><mi id="S2.E3.m1.4.4.4.4.2.1.5.3" xref="S2.E3.m1.4.4.4.4.2.1.5.3.cmml">l</mi></msub></mrow></mtd></mtr></mtable></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.E3.m1.6b"><apply id="S2.E3.m1.6.7.cmml" xref="S2.E3.m1.6.7"><eq id="S2.E3.m1.6.7.1.cmml" xref="S2.E3.m1.6.7.1"></eq><apply id="S2.E3.m1.6.7.2.cmml" xref="S2.E3.m1.6.7.2"><csymbol cd="ambiguous" id="S2.E3.m1.6.7.2.1.cmml" xref="S2.E3.m1.6.7.2">subscript</csymbol><ci id="S2.E3.m1.6.7.2.2a.cmml" xref="S2.E3.m1.6.7.2.2"><mtext id="S2.E3.m1.6.7.2.2.cmml" xref="S2.E3.m1.6.7.2.2">Margin</mtext></ci><list id="S2.E3.m1.6.6.2.3.cmml" xref="S2.E3.m1.6.6.2.4"><ci id="S2.E3.m1.5.5.1.1.cmml" xref="S2.E3.m1.5.5.1.1">𝑙</ci><ci id="S2.E3.m1.6.6.2.2.cmml" xref="S2.E3.m1.6.6.2.2">𝑡</ci></list></apply><apply id="S2.E3.m1.6.7.3.1.cmml" xref="S2.E3.m1.4.4"><csymbol cd="latexml" id="S2.E3.m1.6.7.3.1.1.cmml" xref="S2.E3.m1.4.4.5">cases</csymbol><apply id="S2.E3.m1.1.1.1.1.1.1.cmml" xref="S2.E3.m1.1.1.1.1.1.1"><divide id="S2.E3.m1.1.1.1.1.1.1.4.cmml" xref="S2.E3.m1.1.1.1.1.1.1"></divide><apply id="S2.E3.m1.1.1.1.1.1.1.3.3.cmml" xref="S2.E3.m1.1.1.1.1.1.1.3.3"><minus id="S2.E3.m1.1.1.1.1.1.1.3.3.4.cmml" xref="S2.E3.m1.1.1.1.1.1.1.3.3.4"></minus><apply id="S2.E3.m1.1.1.1.1.1.1.3.3.5.cmml" xref="S2.E3.m1.1.1.1.1.1.1.3.3.5"><csymbol cd="ambiguous" id="S2.E3.m1.1.1.1.1.1.1.3.3.5.1.cmml" xref="S2.E3.m1.1.1.1.1.1.1.3.3.5">subscript</csymbol><apply id="S2.E3.m1.1.1.1.1.1.1.3.3.5.2.cmml" xref="S2.E3.m1.1.1.1.1.1.1.3.3.5.2"><ci id="S2.E3.m1.1.1.1.1.1.1.3.3.5.2.1.cmml" xref="S2.E3.m1.1.1.1.1.1.1.3.3.5.2.1">¯</ci><ci id="S2.E3.m1.1.1.1.1.1.1.3.3.5.2.2.cmml" xref="S2.E3.m1.1.1.1.1.1.1.3.3.5.2.2">𝐹</ci></apply><ci id="S2.E3.m1.1.1.1.1.1.1.3.3.5.3.cmml" xref="S2.E3.m1.1.1.1.1.1.1.3.3.5.3">𝑙</ci></apply><apply 
id="S2.E3.m1.1.1.1.1.1.1.3.3.3.2.cmml" xref="S2.E3.m1.1.1.1.1.1.1.3.3.3.1"><abs id="S2.E3.m1.1.1.1.1.1.1.3.3.3.2.1.cmml" xref="S2.E3.m1.1.1.1.1.1.1.3.3.3.1.2"></abs><apply id="S2.E3.m1.1.1.1.1.1.1.3.3.3.1.1.cmml" xref="S2.E3.m1.1.1.1.1.1.1.3.3.3.1.1"><csymbol cd="ambiguous" id="S2.E3.m1.1.1.1.1.1.1.3.3.3.1.1.1.cmml" xref="S2.E3.m1.1.1.1.1.1.1.3.3.3.1.1">subscript</csymbol><ci id="S2.E3.m1.1.1.1.1.1.1.3.3.3.1.1.2.cmml" xref="S2.E3.m1.1.1.1.1.1.1.3.3.3.1.1.2">𝐹</ci><list id="S2.E3.m1.1.1.1.1.1.1.2.2.2.2.3.cmml" xref="S2.E3.m1.1.1.1.1.1.1.2.2.2.2.4"><ci id="S2.E3.m1.1.1.1.1.1.1.1.1.1.1.1.cmml" xref="S2.E3.m1.1.1.1.1.1.1.1.1.1.1.1">𝑙</ci><ci id="S2.E3.m1.1.1.1.1.1.1.2.2.2.2.2.cmml" xref="S2.E3.m1.1.1.1.1.1.1.2.2.2.2.2">𝑡</ci></list></apply></apply></apply><apply id="S2.E3.m1.1.1.1.1.1.1.5.cmml" xref="S2.E3.m1.1.1.1.1.1.1.5"><csymbol cd="ambiguous" id="S2.E3.m1.1.1.1.1.1.1.5.1.cmml" xref="S2.E3.m1.1.1.1.1.1.1.5">subscript</csymbol><apply id="S2.E3.m1.1.1.1.1.1.1.5.2.cmml" xref="S2.E3.m1.1.1.1.1.1.1.5.2"><ci id="S2.E3.m1.1.1.1.1.1.1.5.2.1.cmml" xref="S2.E3.m1.1.1.1.1.1.1.5.2.1">¯</ci><ci id="S2.E3.m1.1.1.1.1.1.1.5.2.2.cmml" xref="S2.E3.m1.1.1.1.1.1.1.5.2.2">𝐹</ci></apply><ci id="S2.E3.m1.1.1.1.1.1.1.5.3.cmml" xref="S2.E3.m1.1.1.1.1.1.1.5.3">𝑙</ci></apply></apply><apply id="S2.E3.m1.2.2.2.2.2.1.cmml" xref="S2.E3.m1.2.2.2.2.2.1"><leq id="S2.E3.m1.2.2.2.2.2.1.4.cmml" xref="S2.E3.m1.2.2.2.2.2.1.4"></leq><apply id="S2.E3.m1.2.2.2.2.2.1.3.cmml" xref="S2.E3.m1.2.2.2.2.2.1.3"><times id="S2.E3.m1.2.2.2.2.2.1.3.2.cmml" xref="S2.E3.m1.2.2.2.2.2.1.3.2"></times><ci id="S2.E3.m1.2.2.2.2.2.1.3.3a.cmml" xref="S2.E3.m1.2.2.2.2.2.1.3.3"><mtext id="S2.E3.m1.2.2.2.2.2.1.3.3.cmml" xref="S2.E3.m1.2.2.2.2.2.1.3.3">if </mtext></ci><apply id="S2.E3.m1.2.2.2.2.2.1.3.1.2.cmml" xref="S2.E3.m1.2.2.2.2.2.1.3.1.1"><abs id="S2.E3.m1.2.2.2.2.2.1.3.1.2.1.cmml" xref="S2.E3.m1.2.2.2.2.2.1.3.1.1.2"></abs><apply id="S2.E3.m1.2.2.2.2.2.1.3.1.1.1.cmml" xref="S2.E3.m1.2.2.2.2.2.1.3.1.1.1"><csymbol 
cd="ambiguous" id="S2.E3.m1.2.2.2.2.2.1.3.1.1.1.1.cmml" xref="S2.E3.m1.2.2.2.2.2.1.3.1.1.1">subscript</csymbol><ci id="S2.E3.m1.2.2.2.2.2.1.3.1.1.1.2.cmml" xref="S2.E3.m1.2.2.2.2.2.1.3.1.1.1.2">𝐹</ci><list id="S2.E3.m1.2.2.2.2.2.1.2.2.3.cmml" xref="S2.E3.m1.2.2.2.2.2.1.2.2.4"><ci id="S2.E3.m1.2.2.2.2.2.1.1.1.1.cmml" xref="S2.E3.m1.2.2.2.2.2.1.1.1.1">𝑙</ci><ci id="S2.E3.m1.2.2.2.2.2.1.2.2.2.cmml" xref="S2.E3.m1.2.2.2.2.2.1.2.2.2">𝑡</ci></list></apply></apply></apply><apply id="S2.E3.m1.2.2.2.2.2.1.5.cmml" xref="S2.E3.m1.2.2.2.2.2.1.5"><csymbol cd="ambiguous" id="S2.E3.m1.2.2.2.2.2.1.5.1.cmml" xref="S2.E3.m1.2.2.2.2.2.1.5">subscript</csymbol><apply id="S2.E3.m1.2.2.2.2.2.1.5.2.cmml" xref="S2.E3.m1.2.2.2.2.2.1.5.2"><ci id="S2.E3.m1.2.2.2.2.2.1.5.2.1.cmml" xref="S2.E3.m1.2.2.2.2.2.1.5.2.1">¯</ci><ci id="S2.E3.m1.2.2.2.2.2.1.5.2.2.cmml" xref="S2.E3.m1.2.2.2.2.2.1.5.2.2">𝐹</ci></apply><ci id="S2.E3.m1.2.2.2.2.2.1.5.3.cmml" xref="S2.E3.m1.2.2.2.2.2.1.5.3">𝑙</ci></apply></apply><cn id="S2.E3.m1.3.3.3.3.1.1.cmml" type="integer" xref="S2.E3.m1.3.3.3.3.1.1">0</cn><apply id="S2.E3.m1.4.4.4.4.2.1.cmml" xref="S2.E3.m1.4.4.4.4.2.1"><gt id="S2.E3.m1.4.4.4.4.2.1.4.cmml" xref="S2.E3.m1.4.4.4.4.2.1.4"></gt><apply id="S2.E3.m1.4.4.4.4.2.1.3.cmml" xref="S2.E3.m1.4.4.4.4.2.1.3"><times id="S2.E3.m1.4.4.4.4.2.1.3.2.cmml" xref="S2.E3.m1.4.4.4.4.2.1.3.2"></times><ci id="S2.E3.m1.4.4.4.4.2.1.3.3a.cmml" xref="S2.E3.m1.4.4.4.4.2.1.3.3"><mtext id="S2.E3.m1.4.4.4.4.2.1.3.3.cmml" xref="S2.E3.m1.4.4.4.4.2.1.3.3">if </mtext></ci><apply id="S2.E3.m1.4.4.4.4.2.1.3.1.2.cmml" xref="S2.E3.m1.4.4.4.4.2.1.3.1.1"><abs id="S2.E3.m1.4.4.4.4.2.1.3.1.2.1.cmml" xref="S2.E3.m1.4.4.4.4.2.1.3.1.1.2"></abs><apply id="S2.E3.m1.4.4.4.4.2.1.3.1.1.1.cmml" xref="S2.E3.m1.4.4.4.4.2.1.3.1.1.1"><csymbol cd="ambiguous" id="S2.E3.m1.4.4.4.4.2.1.3.1.1.1.1.cmml" xref="S2.E3.m1.4.4.4.4.2.1.3.1.1.1">subscript</csymbol><ci id="S2.E3.m1.4.4.4.4.2.1.3.1.1.1.2.cmml" xref="S2.E3.m1.4.4.4.4.2.1.3.1.1.1.2">𝐹</ci><list 
id="S2.E3.m1.4.4.4.4.2.1.2.2.3.cmml" xref="S2.E3.m1.4.4.4.4.2.1.2.2.4"><ci id="S2.E3.m1.4.4.4.4.2.1.1.1.1.cmml" xref="S2.E3.m1.4.4.4.4.2.1.1.1.1">𝑙</ci><ci id="S2.E3.m1.4.4.4.4.2.1.2.2.2.cmml" xref="S2.E3.m1.4.4.4.4.2.1.2.2.2">𝑡</ci></list></apply></apply></apply><apply id="S2.E3.m1.4.4.4.4.2.1.5.cmml" xref="S2.E3.m1.4.4.4.4.2.1.5"><csymbol cd="ambiguous" id="S2.E3.m1.4.4.4.4.2.1.5.1.cmml" xref="S2.E3.m1.4.4.4.4.2.1.5">subscript</csymbol><apply id="S2.E3.m1.4.4.4.4.2.1.5.2.cmml" xref="S2.E3.m1.4.4.4.4.2.1.5.2"><ci id="S2.E3.m1.4.4.4.4.2.1.5.2.1.cmml" xref="S2.E3.m1.4.4.4.4.2.1.5.2.1">¯</ci><ci id="S2.E3.m1.4.4.4.4.2.1.5.2.2.cmml" xref="S2.E3.m1.4.4.4.4.2.1.5.2.2">𝐹</ci></apply><ci id="S2.E3.m1.4.4.4.4.2.1.5.3.cmml" xref="S2.E3.m1.4.4.4.4.2.1.5.3">𝑙</ci></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.E3.m1.6c">\text{Margin}_{l,t}=\begin{cases}\displaystyle\frac{\overline{F}_{l}-|F_{l,t}|% }{\overline{F}_{l}}&amp;\text{if }|F_{l,t}|\leq\overline{F}_{l}\\[10.0pt] 0&amp;\text{if }|F_{l,t}|&gt;\overline{F}_{l}\end{cases}</annotation><annotation encoding="application/x-llamapun" id="S2.E3.m1.6d">Margin start_POSTSUBSCRIPT italic_l , italic_t end_POSTSUBSCRIPT = { start_ROW start_CELL divide start_ARG over¯ start_ARG italic_F end_ARG start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT - | italic_F start_POSTSUBSCRIPT italic_l , italic_t end_POSTSUBSCRIPT | end_ARG start_ARG over¯ start_ARG italic_F end_ARG start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_ARG end_CELL start_CELL if | italic_F start_POSTSUBSCRIPT italic_l , italic_t end_POSTSUBSCRIPT | ≤ over¯ start_ARG italic_F end_ARG start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL if | italic_F start_POSTSUBSCRIPT italic_l , italic_t end_POSTSUBSCRIPT | &gt; over¯ start_ARG italic_F end_ARG start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_CELL end_ROW</annotation></semantics></math></td> <td class="ltx_eqn_cell 
ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(3)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S2.SS2.SSS1.p1.5">where <math alttext="\overline{F}_{l}" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.2.m1.1"><semantics id="S2.SS2.SSS1.p1.2.m1.1a"><msub id="S2.SS2.SSS1.p1.2.m1.1.1" xref="S2.SS2.SSS1.p1.2.m1.1.1.cmml"><mover accent="true" id="S2.SS2.SSS1.p1.2.m1.1.1.2" xref="S2.SS2.SSS1.p1.2.m1.1.1.2.cmml"><mi id="S2.SS2.SSS1.p1.2.m1.1.1.2.2" xref="S2.SS2.SSS1.p1.2.m1.1.1.2.2.cmml">F</mi><mo id="S2.SS2.SSS1.p1.2.m1.1.1.2.1" xref="S2.SS2.SSS1.p1.2.m1.1.1.2.1.cmml">¯</mo></mover><mi id="S2.SS2.SSS1.p1.2.m1.1.1.3" xref="S2.SS2.SSS1.p1.2.m1.1.1.3.cmml">l</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.2.m1.1b"><apply id="S2.SS2.SSS1.p1.2.m1.1.1.cmml" xref="S2.SS2.SSS1.p1.2.m1.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS1.p1.2.m1.1.1.1.cmml" xref="S2.SS2.SSS1.p1.2.m1.1.1">subscript</csymbol><apply id="S2.SS2.SSS1.p1.2.m1.1.1.2.cmml" xref="S2.SS2.SSS1.p1.2.m1.1.1.2"><ci id="S2.SS2.SSS1.p1.2.m1.1.1.2.1.cmml" xref="S2.SS2.SSS1.p1.2.m1.1.1.2.1">¯</ci><ci id="S2.SS2.SSS1.p1.2.m1.1.1.2.2.cmml" xref="S2.SS2.SSS1.p1.2.m1.1.1.2.2">𝐹</ci></apply><ci id="S2.SS2.SSS1.p1.2.m1.1.1.3.cmml" xref="S2.SS2.SSS1.p1.2.m1.1.1.3">𝑙</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.2.m1.1c">\overline{F}_{l}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.2.m1.1d">over¯ start_ARG italic_F end_ARG start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT</annotation></semantics></math> is the thermal limit (ampacity) of line <math alttext="l" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.3.m2.1"><semantics id="S2.SS2.SSS1.p1.3.m2.1a"><mi id="S2.SS2.SSS1.p1.3.m2.1.1" xref="S2.SS2.SSS1.p1.3.m2.1.1.cmml">l</mi><annotation-xml encoding="MathML-Content" 
id="S2.SS2.SSS1.p1.3.m2.1b"><ci id="S2.SS2.SSS1.p1.3.m2.1.1.cmml" xref="S2.SS2.SSS1.p1.3.m2.1.1">𝑙</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.3.m2.1c">l</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.3.m2.1d">italic_l</annotation></semantics></math>, and <math alttext="|F_{l,t}|" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.4.m3.3"><semantics id="S2.SS2.SSS1.p1.4.m3.3a"><mrow id="S2.SS2.SSS1.p1.4.m3.3.3.1" xref="S2.SS2.SSS1.p1.4.m3.3.3.2.cmml"><mo id="S2.SS2.SSS1.p1.4.m3.3.3.1.2" stretchy="false" xref="S2.SS2.SSS1.p1.4.m3.3.3.2.1.cmml">|</mo><msub id="S2.SS2.SSS1.p1.4.m3.3.3.1.1" xref="S2.SS2.SSS1.p1.4.m3.3.3.1.1.cmml"><mi id="S2.SS2.SSS1.p1.4.m3.3.3.1.1.2" xref="S2.SS2.SSS1.p1.4.m3.3.3.1.1.2.cmml">F</mi><mrow id="S2.SS2.SSS1.p1.4.m3.2.2.2.4" xref="S2.SS2.SSS1.p1.4.m3.2.2.2.3.cmml"><mi id="S2.SS2.SSS1.p1.4.m3.1.1.1.1" xref="S2.SS2.SSS1.p1.4.m3.1.1.1.1.cmml">l</mi><mo id="S2.SS2.SSS1.p1.4.m3.2.2.2.4.1" xref="S2.SS2.SSS1.p1.4.m3.2.2.2.3.cmml">,</mo><mi id="S2.SS2.SSS1.p1.4.m3.2.2.2.2" xref="S2.SS2.SSS1.p1.4.m3.2.2.2.2.cmml">t</mi></mrow></msub><mo id="S2.SS2.SSS1.p1.4.m3.3.3.1.3" stretchy="false" xref="S2.SS2.SSS1.p1.4.m3.3.3.2.1.cmml">|</mo></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.4.m3.3b"><apply id="S2.SS2.SSS1.p1.4.m3.3.3.2.cmml" xref="S2.SS2.SSS1.p1.4.m3.3.3.1"><abs id="S2.SS2.SSS1.p1.4.m3.3.3.2.1.cmml" xref="S2.SS2.SSS1.p1.4.m3.3.3.1.2"></abs><apply id="S2.SS2.SSS1.p1.4.m3.3.3.1.1.cmml" xref="S2.SS2.SSS1.p1.4.m3.3.3.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS1.p1.4.m3.3.3.1.1.1.cmml" xref="S2.SS2.SSS1.p1.4.m3.3.3.1.1">subscript</csymbol><ci id="S2.SS2.SSS1.p1.4.m3.3.3.1.1.2.cmml" xref="S2.SS2.SSS1.p1.4.m3.3.3.1.1.2">𝐹</ci><list id="S2.SS2.SSS1.p1.4.m3.2.2.2.3.cmml" xref="S2.SS2.SSS1.p1.4.m3.2.2.2.4"><ci id="S2.SS2.SSS1.p1.4.m3.1.1.1.1.cmml" xref="S2.SS2.SSS1.p1.4.m3.1.1.1.1">𝑙</ci><ci id="S2.SS2.SSS1.p1.4.m3.2.2.2.2.cmml" 
xref="S2.SS2.SSS1.p1.4.m3.2.2.2.2">𝑡</ci></list></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.4.m3.3c">|F_{l,t}|</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.4.m3.3d">| italic_F start_POSTSUBSCRIPT italic_l , italic_t end_POSTSUBSCRIPT |</annotation></semantics></math> is the absolute value of the current flow in amps. The line loading reward <math alttext="R^{L}_{t}" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.5.m4.1"><semantics id="S2.SS2.SSS1.p1.5.m4.1a"><msubsup id="S2.SS2.SSS1.p1.5.m4.1.1" xref="S2.SS2.SSS1.p1.5.m4.1.1.cmml"><mi id="S2.SS2.SSS1.p1.5.m4.1.1.2.2" xref="S2.SS2.SSS1.p1.5.m4.1.1.2.2.cmml">R</mi><mi id="S2.SS2.SSS1.p1.5.m4.1.1.3" xref="S2.SS2.SSS1.p1.5.m4.1.1.3.cmml">t</mi><mi id="S2.SS2.SSS1.p1.5.m4.1.1.2.3" xref="S2.SS2.SSS1.p1.5.m4.1.1.2.3.cmml">L</mi></msubsup><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.5.m4.1b"><apply id="S2.SS2.SSS1.p1.5.m4.1.1.cmml" xref="S2.SS2.SSS1.p1.5.m4.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS1.p1.5.m4.1.1.1.cmml" xref="S2.SS2.SSS1.p1.5.m4.1.1">subscript</csymbol><apply id="S2.SS2.SSS1.p1.5.m4.1.1.2.cmml" xref="S2.SS2.SSS1.p1.5.m4.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS1.p1.5.m4.1.1.2.1.cmml" xref="S2.SS2.SSS1.p1.5.m4.1.1">superscript</csymbol><ci id="S2.SS2.SSS1.p1.5.m4.1.1.2.2.cmml" xref="S2.SS2.SSS1.p1.5.m4.1.1.2.2">𝑅</ci><ci id="S2.SS2.SSS1.p1.5.m4.1.1.2.3.cmml" xref="S2.SS2.SSS1.p1.5.m4.1.1.2.3">𝐿</ci></apply><ci id="S2.SS2.SSS1.p1.5.m4.1.1.3.cmml" xref="S2.SS2.SSS1.p1.5.m4.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.5.m4.1c">R^{L}_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.5.m4.1d">italic_R start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> at each time step is calculated as:</p> <table class="ltx_equation ltx_eqn_table" id="S2.E4"> 
<tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="R^{L}_{t}=\sum_{l=1}^{L}\left(\text{Margin}_{l,t}\right)^{2}" class="ltx_Math" display="block" id="S2.E4.m1.3"><semantics id="S2.E4.m1.3a"><mrow id="S2.E4.m1.3.3" xref="S2.E4.m1.3.3.cmml"><msubsup id="S2.E4.m1.3.3.3" xref="S2.E4.m1.3.3.3.cmml"><mi id="S2.E4.m1.3.3.3.2.2" xref="S2.E4.m1.3.3.3.2.2.cmml">R</mi><mi id="S2.E4.m1.3.3.3.3" xref="S2.E4.m1.3.3.3.3.cmml">t</mi><mi id="S2.E4.m1.3.3.3.2.3" xref="S2.E4.m1.3.3.3.2.3.cmml">L</mi></msubsup><mo id="S2.E4.m1.3.3.2" rspace="0.111em" xref="S2.E4.m1.3.3.2.cmml">=</mo><mrow id="S2.E4.m1.3.3.1" xref="S2.E4.m1.3.3.1.cmml"><munderover id="S2.E4.m1.3.3.1.2" xref="S2.E4.m1.3.3.1.2.cmml"><mo id="S2.E4.m1.3.3.1.2.2.2" movablelimits="false" rspace="0em" xref="S2.E4.m1.3.3.1.2.2.2.cmml">∑</mo><mrow id="S2.E4.m1.3.3.1.2.2.3" xref="S2.E4.m1.3.3.1.2.2.3.cmml"><mi id="S2.E4.m1.3.3.1.2.2.3.2" xref="S2.E4.m1.3.3.1.2.2.3.2.cmml">l</mi><mo id="S2.E4.m1.3.3.1.2.2.3.1" xref="S2.E4.m1.3.3.1.2.2.3.1.cmml">=</mo><mn id="S2.E4.m1.3.3.1.2.2.3.3" xref="S2.E4.m1.3.3.1.2.2.3.3.cmml">1</mn></mrow><mi id="S2.E4.m1.3.3.1.2.3" xref="S2.E4.m1.3.3.1.2.3.cmml">L</mi></munderover><msup id="S2.E4.m1.3.3.1.1" xref="S2.E4.m1.3.3.1.1.cmml"><mrow id="S2.E4.m1.3.3.1.1.1.1" xref="S2.E4.m1.3.3.1.1.1.1.1.cmml"><mo id="S2.E4.m1.3.3.1.1.1.1.2" xref="S2.E4.m1.3.3.1.1.1.1.1.cmml">(</mo><msub id="S2.E4.m1.3.3.1.1.1.1.1" xref="S2.E4.m1.3.3.1.1.1.1.1.cmml"><mtext id="S2.E4.m1.3.3.1.1.1.1.1.2" xref="S2.E4.m1.3.3.1.1.1.1.1.2a.cmml">Margin</mtext><mrow id="S2.E4.m1.2.2.2.4" xref="S2.E4.m1.2.2.2.3.cmml"><mi id="S2.E4.m1.1.1.1.1" xref="S2.E4.m1.1.1.1.1.cmml">l</mi><mo id="S2.E4.m1.2.2.2.4.1" xref="S2.E4.m1.2.2.2.3.cmml">,</mo><mi id="S2.E4.m1.2.2.2.2" xref="S2.E4.m1.2.2.2.2.cmml">t</mi></mrow></msub><mo id="S2.E4.m1.3.3.1.1.1.1.3" xref="S2.E4.m1.3.3.1.1.1.1.1.cmml">)</mo></mrow><mn 
id="S2.E4.m1.3.3.1.1.3" xref="S2.E4.m1.3.3.1.1.3.cmml">2</mn></msup></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.E4.m1.3b"><apply id="S2.E4.m1.3.3.cmml" xref="S2.E4.m1.3.3"><eq id="S2.E4.m1.3.3.2.cmml" xref="S2.E4.m1.3.3.2"></eq><apply id="S2.E4.m1.3.3.3.cmml" xref="S2.E4.m1.3.3.3"><csymbol cd="ambiguous" id="S2.E4.m1.3.3.3.1.cmml" xref="S2.E4.m1.3.3.3">subscript</csymbol><apply id="S2.E4.m1.3.3.3.2.cmml" xref="S2.E4.m1.3.3.3"><csymbol cd="ambiguous" id="S2.E4.m1.3.3.3.2.1.cmml" xref="S2.E4.m1.3.3.3">superscript</csymbol><ci id="S2.E4.m1.3.3.3.2.2.cmml" xref="S2.E4.m1.3.3.3.2.2">𝑅</ci><ci id="S2.E4.m1.3.3.3.2.3.cmml" xref="S2.E4.m1.3.3.3.2.3">𝐿</ci></apply><ci id="S2.E4.m1.3.3.3.3.cmml" xref="S2.E4.m1.3.3.3.3">𝑡</ci></apply><apply id="S2.E4.m1.3.3.1.cmml" xref="S2.E4.m1.3.3.1"><apply id="S2.E4.m1.3.3.1.2.cmml" xref="S2.E4.m1.3.3.1.2"><csymbol cd="ambiguous" id="S2.E4.m1.3.3.1.2.1.cmml" xref="S2.E4.m1.3.3.1.2">superscript</csymbol><apply id="S2.E4.m1.3.3.1.2.2.cmml" xref="S2.E4.m1.3.3.1.2"><csymbol cd="ambiguous" id="S2.E4.m1.3.3.1.2.2.1.cmml" xref="S2.E4.m1.3.3.1.2">subscript</csymbol><sum id="S2.E4.m1.3.3.1.2.2.2.cmml" xref="S2.E4.m1.3.3.1.2.2.2"></sum><apply id="S2.E4.m1.3.3.1.2.2.3.cmml" xref="S2.E4.m1.3.3.1.2.2.3"><eq id="S2.E4.m1.3.3.1.2.2.3.1.cmml" xref="S2.E4.m1.3.3.1.2.2.3.1"></eq><ci id="S2.E4.m1.3.3.1.2.2.3.2.cmml" xref="S2.E4.m1.3.3.1.2.2.3.2">𝑙</ci><cn id="S2.E4.m1.3.3.1.2.2.3.3.cmml" type="integer" xref="S2.E4.m1.3.3.1.2.2.3.3">1</cn></apply></apply><ci id="S2.E4.m1.3.3.1.2.3.cmml" xref="S2.E4.m1.3.3.1.2.3">𝐿</ci></apply><apply id="S2.E4.m1.3.3.1.1.cmml" xref="S2.E4.m1.3.3.1.1"><csymbol cd="ambiguous" id="S2.E4.m1.3.3.1.1.2.cmml" xref="S2.E4.m1.3.3.1.1">superscript</csymbol><apply id="S2.E4.m1.3.3.1.1.1.1.1.cmml" xref="S2.E4.m1.3.3.1.1.1.1"><csymbol cd="ambiguous" id="S2.E4.m1.3.3.1.1.1.1.1.1.cmml" xref="S2.E4.m1.3.3.1.1.1.1">subscript</csymbol><ci id="S2.E4.m1.3.3.1.1.1.1.1.2a.cmml" xref="S2.E4.m1.3.3.1.1.1.1.1.2"><mtext 
id="S2.E4.m1.3.3.1.1.1.1.1.2.cmml" xref="S2.E4.m1.3.3.1.1.1.1.1.2">Margin</mtext></ci><list id="S2.E4.m1.2.2.2.3.cmml" xref="S2.E4.m1.2.2.2.4"><ci id="S2.E4.m1.1.1.1.1.cmml" xref="S2.E4.m1.1.1.1.1">𝑙</ci><ci id="S2.E4.m1.2.2.2.2.cmml" xref="S2.E4.m1.2.2.2.2">𝑡</ci></list></apply><cn id="S2.E4.m1.3.3.1.1.3.cmml" type="integer" xref="S2.E4.m1.3.3.1.1.3">2</cn></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.E4.m1.3c">R^{L}_{t}=\sum_{l=1}^{L}\left(\text{Margin}_{l,t}\right)^{2}</annotation><annotation encoding="application/x-llamapun" id="S2.E4.m1.3d">italic_R start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT ( Margin start_POSTSUBSCRIPT italic_l , italic_t end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(4)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S2.SS2.SSS1.p1.6">where <math alttext="L" class="ltx_Math" display="inline" id="S2.SS2.SSS1.p1.6.m1.1"><semantics id="S2.SS2.SSS1.p1.6.m1.1a"><mi id="S2.SS2.SSS1.p1.6.m1.1.1" xref="S2.SS2.SSS1.p1.6.m1.1.1.cmml">L</mi><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS1.p1.6.m1.1b"><ci id="S2.SS2.SSS1.p1.6.m1.1.1.cmml" xref="S2.SS2.SSS1.p1.6.m1.1.1">𝐿</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS1.p1.6.m1.1c">L</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS1.p1.6.m1.1d">italic_L</annotation></semantics></math> represents the total number of power lines. 
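As a minimal sketch of Eqs. (3) and (4), the per-line margin and the line loading reward can be computed as follows. The sketch assumes that line flows and thermal limits are supplied as plain Python lists in amperes, with one entry per line. The function names are hypothetical and are not taken from the authors' implementation.

```python
from typing import Sequence


def line_margin(flow_amps: float, limit_amps: float) -> float:
    """Margin_{l,t} (Eq. 3): relative headroom below the thermal limit, 0 if overloaded."""
    abs_flow = abs(flow_amps)  # |F_{l,t}|: flow direction does not matter
    if abs_flow <= limit_amps:
        return (limit_amps - abs_flow) / limit_amps
    return 0.0  # at or above the limit there is no remaining headroom


def line_loading_reward(flows_amps: Sequence[float], limits_amps: Sequence[float]) -> float:
    """R^L_t (Eq. 4): sum over all L lines of the squared margins."""
    if len(flows_amps) != len(limits_amps):
        raise ValueError("need exactly one thermal limit per line")
    return sum(line_margin(f, lim) ** 2 for f, lim in zip(flows_amps, limits_amps))


# Hypothetical example: three lines loaded at 50 %, 100 % and 120 % of their limits.
flows = [-200.0, 300.0, 600.0]
limits = [400.0, 300.0, 500.0]
print(line_loading_reward(flows, limits))  # 0.25
```

A line exactly at its limit and an overloaded line both contribute zero. As a result, the reward signals lost headroom but does not become more negative as an overload grows.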
This reward function encourages the agent to keep power flows well below the thermal limits of the lines: each line contributes more as its headroom grows, while a line at or above its limit contributes nothing.</p> </div> </section> <section class="ltx_subsubsection" id="S2.SS2.SSS2"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection"><span class="ltx_text" id="S2.SS2.SSS2.5.1.1">II-B</span>2 </span>Topological Deviation</h4> <div class="ltx_para" id="S2.SS2.SSS2.p1"> <p class="ltx_p" id="S2.SS2.SSS2.p1.1">This reward assigns a penalty based on how far the current grid topology deviates from its initial, default configuration <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib26" title="">26</a>]</cite>. In the default state, all busbar coupler switches are closed, resulting in a fully meshed configuration with a single electrical node per substation. The reward decreases in steps as the topology deviates further from this configuration, encouraging the agent not to stray too far from the original topology <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib4" title="">4</a>]</cite>.</p> </div> <div class="ltx_para" id="S2.SS2.SSS2.p2"> <p class="ltx_p" id="S2.SS2.SSS2.p2.2">At each time step <math alttext="t" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p2.1.m1.1"><semantics id="S2.SS2.SSS2.p2.1.m1.1a"><mi id="S2.SS2.SSS2.p2.1.m1.1.1" xref="S2.SS2.SSS2.p2.1.m1.1.1.cmml">t</mi><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p2.1.m1.1b"><ci id="S2.SS2.SSS2.p2.1.m1.1.1.cmml" xref="S2.SS2.SSS2.p2.1.m1.1.1">𝑡</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p2.1.m1.1c">t</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p2.1.m1.1d">italic_t</annotation></semantics></math>, we define the topological deviation for a substation <math alttext="i" class="ltx_Math" display="inline" 
id="S2.SS2.SSS2.p2.2.m2.1"><semantics id="S2.SS2.SSS2.p2.2.m2.1a"><mi id="S2.SS2.SSS2.p2.2.m2.1.1" xref="S2.SS2.SSS2.p2.2.m2.1.1.cmml">i</mi><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p2.2.m2.1b"><ci id="S2.SS2.SSS2.p2.2.m2.1.1.cmml" xref="S2.SS2.SSS2.p2.2.m2.1.1">𝑖</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p2.2.m2.1c">i</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p2.2.m2.1d">italic_i</annotation></semantics></math> as:</p> <table class="ltx_equation ltx_eqn_table" id="S2.E5"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="D_{i,t}=\begin{cases}1,&amp;\text{if any element in substation }i\\ &amp;\text{ is assigned to a different bus,}\\ 0,&amp;\text{otherwise.}\end{cases}" class="ltx_Math" display="block" id="S2.E5.m1.7"><semantics id="S2.E5.m1.7a"><mrow id="S2.E5.m1.7.8" xref="S2.E5.m1.7.8.cmml"><msub id="S2.E5.m1.7.8.2" xref="S2.E5.m1.7.8.2.cmml"><mi id="S2.E5.m1.7.8.2.2" xref="S2.E5.m1.7.8.2.2.cmml">D</mi><mrow id="S2.E5.m1.7.7.2.4" xref="S2.E5.m1.7.7.2.3.cmml"><mi id="S2.E5.m1.6.6.1.1" xref="S2.E5.m1.6.6.1.1.cmml">i</mi><mo id="S2.E5.m1.7.7.2.4.1" xref="S2.E5.m1.7.7.2.3.cmml">,</mo><mi id="S2.E5.m1.7.7.2.2" xref="S2.E5.m1.7.7.2.2.cmml">t</mi></mrow></msub><mo id="S2.E5.m1.7.8.1" xref="S2.E5.m1.7.8.1.cmml">=</mo><mrow id="S2.E5.m1.5.5" xref="S2.E5.m1.7.8.3.1.cmml"><mo id="S2.E5.m1.5.5.6" xref="S2.E5.m1.7.8.3.1.1.cmml">{</mo><mtable columnspacing="5pt" displaystyle="true" id="S2.E5.m1.5.5.5" rowspacing="0pt" xref="S2.E5.m1.7.8.3.1.cmml"><mtr id="S2.E5.m1.5.5.5a" xref="S2.E5.m1.7.8.3.1.cmml"><mtd class="ltx_align_left" columnalign="left" id="S2.E5.m1.5.5.5b" xref="S2.E5.m1.7.8.3.1.cmml"><mrow id="S2.E5.m1.1.1.1.1.1.1.3" xref="S2.E5.m1.7.8.3.1.cmml"><mn id="S2.E5.m1.1.1.1.1.1.1.1" xref="S2.E5.m1.1.1.1.1.1.1.1.cmml">1</mn><mo id="S2.E5.m1.1.1.1.1.1.1.3.1" 
xref="S2.E5.m1.7.8.3.1.cmml">,</mo></mrow></mtd><mtd class="ltx_align_left" columnalign="left" id="S2.E5.m1.5.5.5c" xref="S2.E5.m1.7.8.3.1.cmml"><mrow id="S2.E5.m1.2.2.2.2.2.1" xref="S2.E5.m1.2.2.2.2.2.1.cmml"><mtext id="S2.E5.m1.2.2.2.2.2.1.2" xref="S2.E5.m1.2.2.2.2.2.1.2a.cmml">if any element in substation </mtext><mo id="S2.E5.m1.2.2.2.2.2.1.1" xref="S2.E5.m1.2.2.2.2.2.1.1.cmml">⁢</mo><mi id="S2.E5.m1.2.2.2.2.2.1.3" xref="S2.E5.m1.2.2.2.2.2.1.3.cmml">i</mi></mrow></mtd></mtr><mtr id="S2.E5.m1.5.5.5d" xref="S2.E5.m1.7.8.3.1.cmml"><mtd id="S2.E5.m1.5.5.5e" xref="S2.E5.m1.7.8.3.1.1.cmml"></mtd><mtd class="ltx_align_left" columnalign="left" id="S2.E5.m1.5.5.5f" xref="S2.E5.m1.7.8.3.1.cmml"><mtext id="S2.E5.m1.3.3.3.3.1.1" xref="S2.E5.m1.3.3.3.3.1.1a.cmml"> is assigned to a different bus,</mtext></mtd></mtr><mtr id="S2.E5.m1.5.5.5g" xref="S2.E5.m1.7.8.3.1.cmml"><mtd class="ltx_align_left" columnalign="left" id="S2.E5.m1.5.5.5h" xref="S2.E5.m1.7.8.3.1.cmml"><mrow id="S2.E5.m1.4.4.4.4.1.1.3" xref="S2.E5.m1.7.8.3.1.cmml"><mn id="S2.E5.m1.4.4.4.4.1.1.1" xref="S2.E5.m1.4.4.4.4.1.1.1.cmml">0</mn><mo id="S2.E5.m1.4.4.4.4.1.1.3.1" xref="S2.E5.m1.7.8.3.1.cmml">,</mo></mrow></mtd><mtd class="ltx_align_left" columnalign="left" id="S2.E5.m1.5.5.5i" xref="S2.E5.m1.7.8.3.1.cmml"><mtext id="S2.E5.m1.5.5.5.5.2.1" xref="S2.E5.m1.5.5.5.5.2.1a.cmml">otherwise.</mtext></mtd></mtr></mtable></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.E5.m1.7b"><apply id="S2.E5.m1.7.8.cmml" xref="S2.E5.m1.7.8"><eq id="S2.E5.m1.7.8.1.cmml" xref="S2.E5.m1.7.8.1"></eq><apply id="S2.E5.m1.7.8.2.cmml" xref="S2.E5.m1.7.8.2"><csymbol cd="ambiguous" id="S2.E5.m1.7.8.2.1.cmml" xref="S2.E5.m1.7.8.2">subscript</csymbol><ci id="S2.E5.m1.7.8.2.2.cmml" xref="S2.E5.m1.7.8.2.2">𝐷</ci><list id="S2.E5.m1.7.7.2.3.cmml" xref="S2.E5.m1.7.7.2.4"><ci id="S2.E5.m1.6.6.1.1.cmml" xref="S2.E5.m1.6.6.1.1">𝑖</ci><ci id="S2.E5.m1.7.7.2.2.cmml" xref="S2.E5.m1.7.7.2.2">𝑡</ci></list></apply><apply 
id="S2.E5.m1.7.8.3.1.cmml" xref="S2.E5.m1.5.5"><csymbol cd="latexml" id="S2.E5.m1.7.8.3.1.1.cmml" xref="S2.E5.m1.5.5.6">cases</csymbol><cn id="S2.E5.m1.1.1.1.1.1.1.1.cmml" type="integer" xref="S2.E5.m1.1.1.1.1.1.1.1">1</cn><apply id="S2.E5.m1.2.2.2.2.2.1.cmml" xref="S2.E5.m1.2.2.2.2.2.1"><times id="S2.E5.m1.2.2.2.2.2.1.1.cmml" xref="S2.E5.m1.2.2.2.2.2.1.1"></times><ci id="S2.E5.m1.2.2.2.2.2.1.2a.cmml" xref="S2.E5.m1.2.2.2.2.2.1.2"><mtext id="S2.E5.m1.2.2.2.2.2.1.2.cmml" xref="S2.E5.m1.2.2.2.2.2.1.2">if any element in substation </mtext></ci><ci id="S2.E5.m1.2.2.2.2.2.1.3.cmml" xref="S2.E5.m1.2.2.2.2.2.1.3">𝑖</ci></apply><ci id="S2.E5.m1.7.8.3.1.4a.cmml" xref="S2.E5.m1.5.5"><mtext class="ltx_mathvariant_italic" id="S2.E5.m1.7.8.3.1.4.cmml" xref="S2.E5.m1.5.5.6">otherwise</mtext></ci><ci id="S2.E5.m1.3.3.3.3.1.1a.cmml" xref="S2.E5.m1.3.3.3.3.1.1"><mtext id="S2.E5.m1.3.3.3.3.1.1.cmml" xref="S2.E5.m1.3.3.3.3.1.1"> is assigned to a different bus,</mtext></ci><cn id="S2.E5.m1.4.4.4.4.1.1.1.cmml" type="integer" xref="S2.E5.m1.4.4.4.4.1.1.1">0</cn><ci id="S2.E5.m1.5.5.5.5.2.1a.cmml" xref="S2.E5.m1.5.5.5.5.2.1"><mtext id="S2.E5.m1.5.5.5.5.2.1.cmml" xref="S2.E5.m1.5.5.5.5.2.1">otherwise.</mtext></ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.E5.m1.7c">D_{i,t}=\begin{cases}1,&amp;\text{if any element in substation }i\\ &amp;\text{ is assigned to a different bus,}\\ 0,&amp;\text{otherwise.}\end{cases}</annotation><annotation encoding="application/x-llamapun" id="S2.E5.m1.7d">italic_D start_POSTSUBSCRIPT italic_i , italic_t end_POSTSUBSCRIPT = { start_ROW start_CELL 1 , end_CELL start_CELL if any element in substation italic_i end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL is assigned to a different bus, end_CELL end_ROW start_ROW start_CELL 0 , end_CELL start_CELL otherwise. 
end_CELL end_ROW</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(5)</span></td> </tr></tbody> </table> </div> <div class="ltx_para" id="S2.SS2.SSS2.p3"> <p class="ltx_p" id="S2.SS2.SSS2.p3.1">The total topological deviation at time <math alttext="t" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p3.1.m1.1"><semantics id="S2.SS2.SSS2.p3.1.m1.1a"><mi id="S2.SS2.SSS2.p3.1.m1.1.1" xref="S2.SS2.SSS2.p3.1.m1.1.1.cmml">t</mi><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p3.1.m1.1b"><ci id="S2.SS2.SSS2.p3.1.m1.1.1.cmml" xref="S2.SS2.SSS2.p3.1.m1.1.1">𝑡</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p3.1.m1.1c">t</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p3.1.m1.1d">italic_t</annotation></semantics></math>, i.e., the number of substations that deviate from their default configuration, is:</p> <table class="ltx_equation ltx_eqn_table" id="S2.E6"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="D_{t}=\sum_{i=1}^{N}D_{i,t}" class="ltx_Math" display="block" id="S2.E6.m1.2"><semantics id="S2.E6.m1.2a"><mrow id="S2.E6.m1.2.3" xref="S2.E6.m1.2.3.cmml"><msub id="S2.E6.m1.2.3.2" xref="S2.E6.m1.2.3.2.cmml"><mi id="S2.E6.m1.2.3.2.2" xref="S2.E6.m1.2.3.2.2.cmml">D</mi><mi id="S2.E6.m1.2.3.2.3" xref="S2.E6.m1.2.3.2.3.cmml">t</mi></msub><mo id="S2.E6.m1.2.3.1" rspace="0.111em" xref="S2.E6.m1.2.3.1.cmml">=</mo><mrow id="S2.E6.m1.2.3.3" xref="S2.E6.m1.2.3.3.cmml"><munderover id="S2.E6.m1.2.3.3.1" xref="S2.E6.m1.2.3.3.1.cmml"><mo id="S2.E6.m1.2.3.3.1.2.2" movablelimits="false" xref="S2.E6.m1.2.3.3.1.2.2.cmml">∑</mo><mrow id="S2.E6.m1.2.3.3.1.2.3" xref="S2.E6.m1.2.3.3.1.2.3.cmml"><mi id="S2.E6.m1.2.3.3.1.2.3.2" xref="S2.E6.m1.2.3.3.1.2.3.2.cmml">i</mi><mo id="S2.E6.m1.2.3.3.1.2.3.1" 
xref="S2.E6.m1.2.3.3.1.2.3.1.cmml">=</mo><mn id="S2.E6.m1.2.3.3.1.2.3.3" xref="S2.E6.m1.2.3.3.1.2.3.3.cmml">1</mn></mrow><mi id="S2.E6.m1.2.3.3.1.3" xref="S2.E6.m1.2.3.3.1.3.cmml">N</mi></munderover><msub id="S2.E6.m1.2.3.3.2" xref="S2.E6.m1.2.3.3.2.cmml"><mi id="S2.E6.m1.2.3.3.2.2" xref="S2.E6.m1.2.3.3.2.2.cmml">D</mi><mrow id="S2.E6.m1.2.2.2.4" xref="S2.E6.m1.2.2.2.3.cmml"><mi id="S2.E6.m1.1.1.1.1" xref="S2.E6.m1.1.1.1.1.cmml">i</mi><mo id="S2.E6.m1.2.2.2.4.1" xref="S2.E6.m1.2.2.2.3.cmml">,</mo><mi id="S2.E6.m1.2.2.2.2" xref="S2.E6.m1.2.2.2.2.cmml">t</mi></mrow></msub></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.E6.m1.2b"><apply id="S2.E6.m1.2.3.cmml" xref="S2.E6.m1.2.3"><eq id="S2.E6.m1.2.3.1.cmml" xref="S2.E6.m1.2.3.1"></eq><apply id="S2.E6.m1.2.3.2.cmml" xref="S2.E6.m1.2.3.2"><csymbol cd="ambiguous" id="S2.E6.m1.2.3.2.1.cmml" xref="S2.E6.m1.2.3.2">subscript</csymbol><ci id="S2.E6.m1.2.3.2.2.cmml" xref="S2.E6.m1.2.3.2.2">𝐷</ci><ci id="S2.E6.m1.2.3.2.3.cmml" xref="S2.E6.m1.2.3.2.3">𝑡</ci></apply><apply id="S2.E6.m1.2.3.3.cmml" xref="S2.E6.m1.2.3.3"><apply id="S2.E6.m1.2.3.3.1.cmml" xref="S2.E6.m1.2.3.3.1"><csymbol cd="ambiguous" id="S2.E6.m1.2.3.3.1.1.cmml" xref="S2.E6.m1.2.3.3.1">superscript</csymbol><apply id="S2.E6.m1.2.3.3.1.2.cmml" xref="S2.E6.m1.2.3.3.1"><csymbol cd="ambiguous" id="S2.E6.m1.2.3.3.1.2.1.cmml" xref="S2.E6.m1.2.3.3.1">subscript</csymbol><sum id="S2.E6.m1.2.3.3.1.2.2.cmml" xref="S2.E6.m1.2.3.3.1.2.2"></sum><apply id="S2.E6.m1.2.3.3.1.2.3.cmml" xref="S2.E6.m1.2.3.3.1.2.3"><eq id="S2.E6.m1.2.3.3.1.2.3.1.cmml" xref="S2.E6.m1.2.3.3.1.2.3.1"></eq><ci id="S2.E6.m1.2.3.3.1.2.3.2.cmml" xref="S2.E6.m1.2.3.3.1.2.3.2">𝑖</ci><cn id="S2.E6.m1.2.3.3.1.2.3.3.cmml" type="integer" xref="S2.E6.m1.2.3.3.1.2.3.3">1</cn></apply></apply><ci id="S2.E6.m1.2.3.3.1.3.cmml" xref="S2.E6.m1.2.3.3.1.3">𝑁</ci></apply><apply id="S2.E6.m1.2.3.3.2.cmml" xref="S2.E6.m1.2.3.3.2"><csymbol cd="ambiguous" id="S2.E6.m1.2.3.3.2.1.cmml" 
xref="S2.E6.m1.2.3.3.2">subscript</csymbol><ci id="S2.E6.m1.2.3.3.2.2.cmml" xref="S2.E6.m1.2.3.3.2.2">𝐷</ci><list id="S2.E6.m1.2.2.2.3.cmml" xref="S2.E6.m1.2.2.2.4"><ci id="S2.E6.m1.1.1.1.1.cmml" xref="S2.E6.m1.1.1.1.1">𝑖</ci><ci id="S2.E6.m1.2.2.2.2.cmml" xref="S2.E6.m1.2.2.2.2">𝑡</ci></list></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.E6.m1.2c">D_{t}=\sum_{i=1}^{N}D_{i,t}</annotation><annotation encoding="application/x-llamapun" id="S2.E6.m1.2d">italic_D start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_D start_POSTSUBSCRIPT italic_i , italic_t end_POSTSUBSCRIPT</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(6)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S2.SS2.SSS2.p3.2">where <math alttext="N" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p3.2.m1.1"><semantics id="S2.SS2.SSS2.p3.2.m1.1a"><mi id="S2.SS2.SSS2.p3.2.m1.1.1" xref="S2.SS2.SSS2.p3.2.m1.1.1.cmml">N</mi><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p3.2.m1.1b"><ci id="S2.SS2.SSS2.p3.2.m1.1.1.cmml" xref="S2.SS2.SSS2.p3.2.m1.1.1">𝑁</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p3.2.m1.1c">N</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p3.2.m1.1d">italic_N</annotation></semantics></math> denotes the total number of substations.</p> </div> <div class="ltx_para" id="S2.SS2.SSS2.p4"> <p class="ltx_p" id="S2.SS2.SSS2.p4.1">The Deviation reward at time <math alttext="t" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p4.1.m1.1"><semantics id="S2.SS2.SSS2.p4.1.m1.1a"><mi id="S2.SS2.SSS2.p4.1.m1.1.1" xref="S2.SS2.SSS2.p4.1.m1.1.1.cmml">t</mi><annotation-xml encoding="MathML-Content" 
id="S2.SS2.SSS2.p4.1.m1.1b"><ci id="S2.SS2.SSS2.p4.1.m1.1.1.cmml" xref="S2.SS2.SSS2.p4.1.m1.1.1">𝑡</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p4.1.m1.1c">t</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p4.1.m1.1d">italic_t</annotation></semantics></math> is:</p> <table class="ltx_equation ltx_eqn_table" id="S2.E7"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="R^{D}_{t}=\begin{cases}r_{\text{default}},&amp;\text{if }D_{t}=0,\\ r_{\text{minor}},&amp;\text{if }D_{t}\leq d_{\text{threshold}},\\ r_{\text{major}},&amp;\text{otherwise,}\end{cases}" class="ltx_Math" display="block" id="S2.E7.m1.6"><semantics id="S2.E7.m1.6a"><mrow id="S2.E7.m1.6.7" xref="S2.E7.m1.6.7.cmml"><msubsup id="S2.E7.m1.6.7.2" xref="S2.E7.m1.6.7.2.cmml"><mi id="S2.E7.m1.6.7.2.2.2" xref="S2.E7.m1.6.7.2.2.2.cmml">R</mi><mi id="S2.E7.m1.6.7.2.3" xref="S2.E7.m1.6.7.2.3.cmml">t</mi><mi id="S2.E7.m1.6.7.2.2.3" xref="S2.E7.m1.6.7.2.2.3.cmml">D</mi></msubsup><mo id="S2.E7.m1.6.7.1" xref="S2.E7.m1.6.7.1.cmml">=</mo><mrow id="S2.E7.m1.6.6" xref="S2.E7.m1.6.7.3.1.cmml"><mo id="S2.E7.m1.6.6.7" xref="S2.E7.m1.6.7.3.1.1.cmml">{</mo><mtable columnspacing="5pt" displaystyle="true" id="S2.E7.m1.6.6.6" rowspacing="0pt" xref="S2.E7.m1.6.7.3.1.cmml"><mtr id="S2.E7.m1.6.6.6a" xref="S2.E7.m1.6.7.3.1.cmml"><mtd class="ltx_align_left" columnalign="left" id="S2.E7.m1.6.6.6b" xref="S2.E7.m1.6.7.3.1.cmml"><mrow id="S2.E7.m1.1.1.1.1.1.1.1" xref="S2.E7.m1.1.1.1.1.1.1.1.1.cmml"><msub id="S2.E7.m1.1.1.1.1.1.1.1.1" xref="S2.E7.m1.1.1.1.1.1.1.1.1.cmml"><mi id="S2.E7.m1.1.1.1.1.1.1.1.1.2" xref="S2.E7.m1.1.1.1.1.1.1.1.1.2.cmml">r</mi><mtext id="S2.E7.m1.1.1.1.1.1.1.1.1.3" xref="S2.E7.m1.1.1.1.1.1.1.1.1.3a.cmml">default</mtext></msub><mo id="S2.E7.m1.1.1.1.1.1.1.1.2" xref="S2.E7.m1.1.1.1.1.1.1.1.1.cmml">,</mo></mrow></mtd><mtd 
class="ltx_align_left" columnalign="left" id="S2.E7.m1.6.6.6c" xref="S2.E7.m1.6.7.3.1.cmml"><mrow id="S2.E7.m1.2.2.2.2.2.1.1" xref="S2.E7.m1.2.2.2.2.2.1.1.1.cmml"><mrow id="S2.E7.m1.2.2.2.2.2.1.1.1" xref="S2.E7.m1.2.2.2.2.2.1.1.1.cmml"><mrow id="S2.E7.m1.2.2.2.2.2.1.1.1.2" xref="S2.E7.m1.2.2.2.2.2.1.1.1.2.cmml"><mtext id="S2.E7.m1.2.2.2.2.2.1.1.1.2.2" xref="S2.E7.m1.2.2.2.2.2.1.1.1.2.2a.cmml">if </mtext><mo id="S2.E7.m1.2.2.2.2.2.1.1.1.2.1" xref="S2.E7.m1.2.2.2.2.2.1.1.1.2.1.cmml">⁢</mo><msub id="S2.E7.m1.2.2.2.2.2.1.1.1.2.3" xref="S2.E7.m1.2.2.2.2.2.1.1.1.2.3.cmml"><mi id="S2.E7.m1.2.2.2.2.2.1.1.1.2.3.2" xref="S2.E7.m1.2.2.2.2.2.1.1.1.2.3.2.cmml">D</mi><mi id="S2.E7.m1.2.2.2.2.2.1.1.1.2.3.3" xref="S2.E7.m1.2.2.2.2.2.1.1.1.2.3.3.cmml">t</mi></msub></mrow><mo id="S2.E7.m1.2.2.2.2.2.1.1.1.1" xref="S2.E7.m1.2.2.2.2.2.1.1.1.1.cmml">=</mo><mn id="S2.E7.m1.2.2.2.2.2.1.1.1.3" xref="S2.E7.m1.2.2.2.2.2.1.1.1.3.cmml">0</mn></mrow><mo id="S2.E7.m1.2.2.2.2.2.1.1.2" xref="S2.E7.m1.2.2.2.2.2.1.1.1.cmml">,</mo></mrow></mtd></mtr><mtr id="S2.E7.m1.6.6.6d" xref="S2.E7.m1.6.7.3.1.cmml"><mtd class="ltx_align_left" columnalign="left" id="S2.E7.m1.6.6.6e" xref="S2.E7.m1.6.7.3.1.cmml"><mrow id="S2.E7.m1.3.3.3.3.1.1.1" xref="S2.E7.m1.3.3.3.3.1.1.1.1.cmml"><msub id="S2.E7.m1.3.3.3.3.1.1.1.1" xref="S2.E7.m1.3.3.3.3.1.1.1.1.cmml"><mi id="S2.E7.m1.3.3.3.3.1.1.1.1.2" xref="S2.E7.m1.3.3.3.3.1.1.1.1.2.cmml">r</mi><mtext id="S2.E7.m1.3.3.3.3.1.1.1.1.3" xref="S2.E7.m1.3.3.3.3.1.1.1.1.3a.cmml">minor</mtext></msub><mo id="S2.E7.m1.3.3.3.3.1.1.1.2" xref="S2.E7.m1.3.3.3.3.1.1.1.1.cmml">,</mo></mrow></mtd><mtd class="ltx_align_left" columnalign="left" id="S2.E7.m1.6.6.6f" xref="S2.E7.m1.6.7.3.1.cmml"><mrow id="S2.E7.m1.4.4.4.4.2.1.1" xref="S2.E7.m1.4.4.4.4.2.1.1.1.cmml"><mrow id="S2.E7.m1.4.4.4.4.2.1.1.1" xref="S2.E7.m1.4.4.4.4.2.1.1.1.cmml"><mrow id="S2.E7.m1.4.4.4.4.2.1.1.1.2" xref="S2.E7.m1.4.4.4.4.2.1.1.1.2.cmml"><mtext id="S2.E7.m1.4.4.4.4.2.1.1.1.2.2" xref="S2.E7.m1.4.4.4.4.2.1.1.1.2.2a.cmml">if 
</mtext><mo id="S2.E7.m1.4.4.4.4.2.1.1.1.2.1" xref="S2.E7.m1.4.4.4.4.2.1.1.1.2.1.cmml">⁢</mo><msub id="S2.E7.m1.4.4.4.4.2.1.1.1.2.3" xref="S2.E7.m1.4.4.4.4.2.1.1.1.2.3.cmml"><mi id="S2.E7.m1.4.4.4.4.2.1.1.1.2.3.2" xref="S2.E7.m1.4.4.4.4.2.1.1.1.2.3.2.cmml">D</mi><mi id="S2.E7.m1.4.4.4.4.2.1.1.1.2.3.3" xref="S2.E7.m1.4.4.4.4.2.1.1.1.2.3.3.cmml">t</mi></msub></mrow><mo id="S2.E7.m1.4.4.4.4.2.1.1.1.1" xref="S2.E7.m1.4.4.4.4.2.1.1.1.1.cmml">≤</mo><msub id="S2.E7.m1.4.4.4.4.2.1.1.1.3" xref="S2.E7.m1.4.4.4.4.2.1.1.1.3.cmml"><mi id="S2.E7.m1.4.4.4.4.2.1.1.1.3.2" xref="S2.E7.m1.4.4.4.4.2.1.1.1.3.2.cmml">d</mi><mtext id="S2.E7.m1.4.4.4.4.2.1.1.1.3.3" xref="S2.E7.m1.4.4.4.4.2.1.1.1.3.3a.cmml">threshold</mtext></msub></mrow><mo id="S2.E7.m1.4.4.4.4.2.1.1.2" xref="S2.E7.m1.4.4.4.4.2.1.1.1.cmml">,</mo></mrow></mtd></mtr><mtr id="S2.E7.m1.6.6.6g" xref="S2.E7.m1.6.7.3.1.cmml"><mtd class="ltx_align_left" columnalign="left" id="S2.E7.m1.6.6.6h" xref="S2.E7.m1.6.7.3.1.cmml"><mrow id="S2.E7.m1.5.5.5.5.1.1.1" xref="S2.E7.m1.5.5.5.5.1.1.1.1.cmml"><msub id="S2.E7.m1.5.5.5.5.1.1.1.1" xref="S2.E7.m1.5.5.5.5.1.1.1.1.cmml"><mi id="S2.E7.m1.5.5.5.5.1.1.1.1.2" xref="S2.E7.m1.5.5.5.5.1.1.1.1.2.cmml">r</mi><mtext id="S2.E7.m1.5.5.5.5.1.1.1.1.3" xref="S2.E7.m1.5.5.5.5.1.1.1.1.3a.cmml">major</mtext></msub><mo id="S2.E7.m1.5.5.5.5.1.1.1.2" xref="S2.E7.m1.5.5.5.5.1.1.1.1.cmml">,</mo></mrow></mtd><mtd class="ltx_align_left" columnalign="left" id="S2.E7.m1.6.6.6i" xref="S2.E7.m1.6.7.3.1.cmml"><mtext id="S2.E7.m1.6.6.6.6.2.1" xref="S2.E7.m1.6.6.6.6.2.1a.cmml">otherwise,</mtext></mtd></mtr></mtable></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.E7.m1.6b"><apply id="S2.E7.m1.6.7.cmml" xref="S2.E7.m1.6.7"><eq id="S2.E7.m1.6.7.1.cmml" xref="S2.E7.m1.6.7.1"></eq><apply id="S2.E7.m1.6.7.2.cmml" xref="S2.E7.m1.6.7.2"><csymbol cd="ambiguous" id="S2.E7.m1.6.7.2.1.cmml" xref="S2.E7.m1.6.7.2">subscript</csymbol><apply id="S2.E7.m1.6.7.2.2.cmml" xref="S2.E7.m1.6.7.2"><csymbol cd="ambiguous" 
id="S2.E7.m1.6.7.2.2.1.cmml" xref="S2.E7.m1.6.7.2">superscript</csymbol><ci id="S2.E7.m1.6.7.2.2.2.cmml" xref="S2.E7.m1.6.7.2.2.2">𝑅</ci><ci id="S2.E7.m1.6.7.2.2.3.cmml" xref="S2.E7.m1.6.7.2.2.3">𝐷</ci></apply><ci id="S2.E7.m1.6.7.2.3.cmml" xref="S2.E7.m1.6.7.2.3">𝑡</ci></apply><apply id="S2.E7.m1.6.7.3.1.cmml" xref="S2.E7.m1.6.6"><csymbol cd="latexml" id="S2.E7.m1.6.7.3.1.1.cmml" xref="S2.E7.m1.6.6.7">cases</csymbol><apply id="S2.E7.m1.1.1.1.1.1.1.1.1.cmml" xref="S2.E7.m1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.E7.m1.1.1.1.1.1.1.1.1.1.cmml" xref="S2.E7.m1.1.1.1.1.1.1.1">subscript</csymbol><ci id="S2.E7.m1.1.1.1.1.1.1.1.1.2.cmml" xref="S2.E7.m1.1.1.1.1.1.1.1.1.2">𝑟</ci><ci id="S2.E7.m1.1.1.1.1.1.1.1.1.3a.cmml" xref="S2.E7.m1.1.1.1.1.1.1.1.1.3"><mtext id="S2.E7.m1.1.1.1.1.1.1.1.1.3.cmml" mathsize="70%" xref="S2.E7.m1.1.1.1.1.1.1.1.1.3">default</mtext></ci></apply><apply id="S2.E7.m1.2.2.2.2.2.1.1.1.cmml" xref="S2.E7.m1.2.2.2.2.2.1.1"><eq id="S2.E7.m1.2.2.2.2.2.1.1.1.1.cmml" xref="S2.E7.m1.2.2.2.2.2.1.1.1.1"></eq><apply id="S2.E7.m1.2.2.2.2.2.1.1.1.2.cmml" xref="S2.E7.m1.2.2.2.2.2.1.1.1.2"><times id="S2.E7.m1.2.2.2.2.2.1.1.1.2.1.cmml" xref="S2.E7.m1.2.2.2.2.2.1.1.1.2.1"></times><ci id="S2.E7.m1.2.2.2.2.2.1.1.1.2.2a.cmml" xref="S2.E7.m1.2.2.2.2.2.1.1.1.2.2"><mtext id="S2.E7.m1.2.2.2.2.2.1.1.1.2.2.cmml" xref="S2.E7.m1.2.2.2.2.2.1.1.1.2.2">if </mtext></ci><apply id="S2.E7.m1.2.2.2.2.2.1.1.1.2.3.cmml" xref="S2.E7.m1.2.2.2.2.2.1.1.1.2.3"><csymbol cd="ambiguous" id="S2.E7.m1.2.2.2.2.2.1.1.1.2.3.1.cmml" xref="S2.E7.m1.2.2.2.2.2.1.1.1.2.3">subscript</csymbol><ci id="S2.E7.m1.2.2.2.2.2.1.1.1.2.3.2.cmml" xref="S2.E7.m1.2.2.2.2.2.1.1.1.2.3.2">𝐷</ci><ci id="S2.E7.m1.2.2.2.2.2.1.1.1.2.3.3.cmml" xref="S2.E7.m1.2.2.2.2.2.1.1.1.2.3.3">𝑡</ci></apply></apply><cn id="S2.E7.m1.2.2.2.2.2.1.1.1.3.cmml" type="integer" xref="S2.E7.m1.2.2.2.2.2.1.1.1.3">0</cn></apply><apply id="S2.E7.m1.3.3.3.3.1.1.1.1.cmml" xref="S2.E7.m1.3.3.3.3.1.1.1"><csymbol cd="ambiguous" 
id="S2.E7.m1.3.3.3.3.1.1.1.1.1.cmml" xref="S2.E7.m1.3.3.3.3.1.1.1">subscript</csymbol><ci id="S2.E7.m1.3.3.3.3.1.1.1.1.2.cmml" xref="S2.E7.m1.3.3.3.3.1.1.1.1.2">𝑟</ci><ci id="S2.E7.m1.3.3.3.3.1.1.1.1.3a.cmml" xref="S2.E7.m1.3.3.3.3.1.1.1.1.3"><mtext id="S2.E7.m1.3.3.3.3.1.1.1.1.3.cmml" mathsize="70%" xref="S2.E7.m1.3.3.3.3.1.1.1.1.3">minor</mtext></ci></apply><apply id="S2.E7.m1.4.4.4.4.2.1.1.1.cmml" xref="S2.E7.m1.4.4.4.4.2.1.1"><leq id="S2.E7.m1.4.4.4.4.2.1.1.1.1.cmml" xref="S2.E7.m1.4.4.4.4.2.1.1.1.1"></leq><apply id="S2.E7.m1.4.4.4.4.2.1.1.1.2.cmml" xref="S2.E7.m1.4.4.4.4.2.1.1.1.2"><times id="S2.E7.m1.4.4.4.4.2.1.1.1.2.1.cmml" xref="S2.E7.m1.4.4.4.4.2.1.1.1.2.1"></times><ci id="S2.E7.m1.4.4.4.4.2.1.1.1.2.2a.cmml" xref="S2.E7.m1.4.4.4.4.2.1.1.1.2.2"><mtext id="S2.E7.m1.4.4.4.4.2.1.1.1.2.2.cmml" xref="S2.E7.m1.4.4.4.4.2.1.1.1.2.2">if </mtext></ci><apply id="S2.E7.m1.4.4.4.4.2.1.1.1.2.3.cmml" xref="S2.E7.m1.4.4.4.4.2.1.1.1.2.3"><csymbol cd="ambiguous" id="S2.E7.m1.4.4.4.4.2.1.1.1.2.3.1.cmml" xref="S2.E7.m1.4.4.4.4.2.1.1.1.2.3">subscript</csymbol><ci id="S2.E7.m1.4.4.4.4.2.1.1.1.2.3.2.cmml" xref="S2.E7.m1.4.4.4.4.2.1.1.1.2.3.2">𝐷</ci><ci id="S2.E7.m1.4.4.4.4.2.1.1.1.2.3.3.cmml" xref="S2.E7.m1.4.4.4.4.2.1.1.1.2.3.3">𝑡</ci></apply></apply><apply id="S2.E7.m1.4.4.4.4.2.1.1.1.3.cmml" xref="S2.E7.m1.4.4.4.4.2.1.1.1.3"><csymbol cd="ambiguous" id="S2.E7.m1.4.4.4.4.2.1.1.1.3.1.cmml" xref="S2.E7.m1.4.4.4.4.2.1.1.1.3">subscript</csymbol><ci id="S2.E7.m1.4.4.4.4.2.1.1.1.3.2.cmml" xref="S2.E7.m1.4.4.4.4.2.1.1.1.3.2">𝑑</ci><ci id="S2.E7.m1.4.4.4.4.2.1.1.1.3.3a.cmml" xref="S2.E7.m1.4.4.4.4.2.1.1.1.3.3"><mtext id="S2.E7.m1.4.4.4.4.2.1.1.1.3.3.cmml" mathsize="70%" xref="S2.E7.m1.4.4.4.4.2.1.1.1.3.3">threshold</mtext></ci></apply></apply><apply id="S2.E7.m1.5.5.5.5.1.1.1.1.cmml" xref="S2.E7.m1.5.5.5.5.1.1.1"><csymbol cd="ambiguous" id="S2.E7.m1.5.5.5.5.1.1.1.1.1.cmml" xref="S2.E7.m1.5.5.5.5.1.1.1">subscript</csymbol><ci id="S2.E7.m1.5.5.5.5.1.1.1.1.2.cmml" 
xref="S2.E7.m1.5.5.5.5.1.1.1.1.2">𝑟</ci><ci id="S2.E7.m1.5.5.5.5.1.1.1.1.3a.cmml" xref="S2.E7.m1.5.5.5.5.1.1.1.1.3"><mtext id="S2.E7.m1.5.5.5.5.1.1.1.1.3.cmml" mathsize="70%" xref="S2.E7.m1.5.5.5.5.1.1.1.1.3">major</mtext></ci></apply><ci id="S2.E7.m1.6.6.6.6.2.1a.cmml" xref="S2.E7.m1.6.6.6.6.2.1"><mtext id="S2.E7.m1.6.6.6.6.2.1.cmml" xref="S2.E7.m1.6.6.6.6.2.1">otherwise,</mtext></ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.E7.m1.6c">R^{D}_{t}=\begin{cases}r_{\text{default}},&amp;\text{if }D_{t}=0,\\ r_{\text{minor}},&amp;\text{if }D_{t}\leq d_{\text{threshold}},\\ r_{\text{major}},&amp;\text{otherwise,}\end{cases}</annotation><annotation encoding="application/x-llamapun" id="S2.E7.m1.6d">italic_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = { start_ROW start_CELL italic_r start_POSTSUBSCRIPT default end_POSTSUBSCRIPT , end_CELL start_CELL if italic_D start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = 0 , end_CELL end_ROW start_ROW start_CELL italic_r start_POSTSUBSCRIPT minor end_POSTSUBSCRIPT , end_CELL start_CELL if italic_D start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ≤ italic_d start_POSTSUBSCRIPT threshold end_POSTSUBSCRIPT , end_CELL end_ROW start_ROW start_CELL italic_r start_POSTSUBSCRIPT major end_POSTSUBSCRIPT , end_CELL start_CELL otherwise, end_CELL end_ROW</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(7)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S2.SS2.SSS2.p4.5">where <math alttext="r_{\text{default}}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p4.2.m1.1"><semantics id="S2.SS2.SSS2.p4.2.m1.1a"><msub id="S2.SS2.SSS2.p4.2.m1.1.1" xref="S2.SS2.SSS2.p4.2.m1.1.1.cmml"><mi id="S2.SS2.SSS2.p4.2.m1.1.1.2" xref="S2.SS2.SSS2.p4.2.m1.1.1.2.cmml">r</mi><mtext 
id="S2.SS2.SSS2.p4.2.m1.1.1.3" xref="S2.SS2.SSS2.p4.2.m1.1.1.3a.cmml">default</mtext></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p4.2.m1.1b"><apply id="S2.SS2.SSS2.p4.2.m1.1.1.cmml" xref="S2.SS2.SSS2.p4.2.m1.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p4.2.m1.1.1.1.cmml" xref="S2.SS2.SSS2.p4.2.m1.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p4.2.m1.1.1.2.cmml" xref="S2.SS2.SSS2.p4.2.m1.1.1.2">𝑟</ci><ci id="S2.SS2.SSS2.p4.2.m1.1.1.3a.cmml" xref="S2.SS2.SSS2.p4.2.m1.1.1.3"><mtext id="S2.SS2.SSS2.p4.2.m1.1.1.3.cmml" mathsize="70%" xref="S2.SS2.SSS2.p4.2.m1.1.1.3">default</mtext></ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p4.2.m1.1c">r_{\text{default}}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p4.2.m1.1d">italic_r start_POSTSUBSCRIPT default end_POSTSUBSCRIPT</annotation></semantics></math> is a positive reward for maintaining the default configuration, <math alttext="r_{\text{minor}}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p4.3.m2.1"><semantics id="S2.SS2.SSS2.p4.3.m2.1a"><msub id="S2.SS2.SSS2.p4.3.m2.1.1" xref="S2.SS2.SSS2.p4.3.m2.1.1.cmml"><mi id="S2.SS2.SSS2.p4.3.m2.1.1.2" xref="S2.SS2.SSS2.p4.3.m2.1.1.2.cmml">r</mi><mtext id="S2.SS2.SSS2.p4.3.m2.1.1.3" xref="S2.SS2.SSS2.p4.3.m2.1.1.3a.cmml">minor</mtext></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p4.3.m2.1b"><apply id="S2.SS2.SSS2.p4.3.m2.1.1.cmml" xref="S2.SS2.SSS2.p4.3.m2.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p4.3.m2.1.1.1.cmml" xref="S2.SS2.SSS2.p4.3.m2.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p4.3.m2.1.1.2.cmml" xref="S2.SS2.SSS2.p4.3.m2.1.1.2">𝑟</ci><ci id="S2.SS2.SSS2.p4.3.m2.1.1.3a.cmml" xref="S2.SS2.SSS2.p4.3.m2.1.1.3"><mtext id="S2.SS2.SSS2.p4.3.m2.1.1.3.cmml" mathsize="70%" xref="S2.SS2.SSS2.p4.3.m2.1.1.3">minor</mtext></ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p4.3.m2.1c">r_{\text{minor}}</annotation><annotation 
encoding="application/x-llamapun" id="S2.SS2.SSS2.p4.3.m2.1d">italic_r start_POSTSUBSCRIPT minor end_POSTSUBSCRIPT</annotation></semantics></math> is a small penalty for minor deviations from the default configuration, <math alttext="r_{\text{major}}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p4.4.m3.1"><semantics id="S2.SS2.SSS2.p4.4.m3.1a"><msub id="S2.SS2.SSS2.p4.4.m3.1.1" xref="S2.SS2.SSS2.p4.4.m3.1.1.cmml"><mi id="S2.SS2.SSS2.p4.4.m3.1.1.2" xref="S2.SS2.SSS2.p4.4.m3.1.1.2.cmml">r</mi><mtext id="S2.SS2.SSS2.p4.4.m3.1.1.3" xref="S2.SS2.SSS2.p4.4.m3.1.1.3a.cmml">major</mtext></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p4.4.m3.1b"><apply id="S2.SS2.SSS2.p4.4.m3.1.1.cmml" xref="S2.SS2.SSS2.p4.4.m3.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS2.p4.4.m3.1.1.1.cmml" xref="S2.SS2.SSS2.p4.4.m3.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p4.4.m3.1.1.2.cmml" xref="S2.SS2.SSS2.p4.4.m3.1.1.2">𝑟</ci><ci id="S2.SS2.SSS2.p4.4.m3.1.1.3a.cmml" xref="S2.SS2.SSS2.p4.4.m3.1.1.3"><mtext id="S2.SS2.SSS2.p4.4.m3.1.1.3.cmml" mathsize="70%" xref="S2.SS2.SSS2.p4.4.m3.1.1.3">major</mtext></ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p4.4.m3.1c">r_{\text{major}}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p4.4.m3.1d">italic_r start_POSTSUBSCRIPT major end_POSTSUBSCRIPT</annotation></semantics></math> is a larger penalty for significant deviations, and <math alttext="d_{\text{threshold}}" class="ltx_Math" display="inline" id="S2.SS2.SSS2.p4.5.m4.1"><semantics id="S2.SS2.SSS2.p4.5.m4.1a"><msub id="S2.SS2.SSS2.p4.5.m4.1.1" xref="S2.SS2.SSS2.p4.5.m4.1.1.cmml"><mi id="S2.SS2.SSS2.p4.5.m4.1.1.2" xref="S2.SS2.SSS2.p4.5.m4.1.1.2.cmml">d</mi><mtext id="S2.SS2.SSS2.p4.5.m4.1.1.3" xref="S2.SS2.SSS2.p4.5.m4.1.1.3a.cmml">threshold</mtext></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS2.p4.5.m4.1b"><apply id="S2.SS2.SSS2.p4.5.m4.1.1.cmml" xref="S2.SS2.SSS2.p4.5.m4.1.1"><csymbol cd="ambiguous" 
id="S2.SS2.SSS2.p4.5.m4.1.1.1.cmml" xref="S2.SS2.SSS2.p4.5.m4.1.1">subscript</csymbol><ci id="S2.SS2.SSS2.p4.5.m4.1.1.2.cmml" xref="S2.SS2.SSS2.p4.5.m4.1.1.2">𝑑</ci><ci id="S2.SS2.SSS2.p4.5.m4.1.1.3a.cmml" xref="S2.SS2.SSS2.p4.5.m4.1.1.3"><mtext id="S2.SS2.SSS2.p4.5.m4.1.1.3.cmml" mathsize="70%" xref="S2.SS2.SSS2.p4.5.m4.1.1.3">threshold</mtext></ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS2.p4.5.m4.1c">d_{\text{threshold}}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS2.p4.5.m4.1d">italic_d start_POSTSUBSCRIPT threshold end_POSTSUBSCRIPT</annotation></semantics></math> is the predefined threshold separating minor from significant deviations.</p> </div> <div class="ltx_para" id="S2.SS2.SSS2.p5"> <p class="ltx_p" id="S2.SS2.SSS2.p5.1">Transmission networks operate more securely and are more resilient to disturbances in a fully meshed configuration with minimal or no changes to the topology <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib3" title="">3</a>]</cite>. Consequently, the piecewise-constant design of this reward function incentivizes the agent to minimize deviation from the default configuration by imposing disproportionately higher penalties for significant alterations to the grid topology.</p> </div> </section> <section class="ltx_subsubsection" id="S2.SS2.SSS3"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection"><span class="ltx_text" id="S2.SS2.SSS3.5.1.1">II-B</span>3 </span>Switching Frequency</h4> <div class="ltx_para" id="S2.SS2.SSS3.p1"> <p class="ltx_p" id="S2.SS2.SSS3.p1.4">This reward penalizes the number of switching actions within a specified time interval, aiming to discourage the instability caused by frequent adjustments <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib4" title="">4</a>]</cite>. 
To keep the reward bounded, we count only the switching actions accumulated within the current time interval. Each episode is therefore divided into intervals <math alttext="\{\mathcal{T}_{m}\}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.1.m1.1"><semantics id="S2.SS2.SSS3.p1.1.m1.1a"><mrow id="S2.SS2.SSS3.p1.1.m1.1.1.1" xref="S2.SS2.SSS3.p1.1.m1.1.1.2.cmml"><mo id="S2.SS2.SSS3.p1.1.m1.1.1.1.2" stretchy="false" xref="S2.SS2.SSS3.p1.1.m1.1.1.2.cmml">{</mo><msub id="S2.SS2.SSS3.p1.1.m1.1.1.1.1" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.2" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.2.cmml">𝒯</mi><mi id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.3" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.3.cmml">m</mi></msub><mo id="S2.SS2.SSS3.p1.1.m1.1.1.1.3" stretchy="false" xref="S2.SS2.SSS3.p1.1.m1.1.1.2.cmml">}</mo></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.1.m1.1b"><set id="S2.SS2.SSS3.p1.1.m1.1.1.2.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1"><apply id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.1.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1">subscript</csymbol><ci id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.2.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.2">𝒯</ci><ci id="S2.SS2.SSS3.p1.1.m1.1.1.1.1.3.cmml" xref="S2.SS2.SSS3.p1.1.m1.1.1.1.1.3">𝑚</ci></apply></set></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.1.m1.1c">\{\mathcal{T}_{m}\}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.1.m1.1d">{ caligraphic_T start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT }</annotation></semantics></math> (e.g., the time steps within one hour or within one day). 
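The interval-based counting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes fixed-length intervals of a hypothetical steps_per_interval RL steps, so that step t belongs to interval m = floor(t / steps_per_interval), and the class IntervalSwitchCounter is likewise hypothetical. Its update method returns the number of switching actions accumulated in the current interval up to and including step t, resetting to zero whenever a new interval begins.

```python
from dataclasses import dataclass


@dataclass
class IntervalSwitchCounter:
    """Hypothetical helper that tracks F_t, the switching actions in the current interval.

    Assumption: intervals have a fixed length of `steps_per_interval` RL steps,
    so step t falls into interval m = t // steps_per_interval.
    """

    steps_per_interval: int
    current_interval: int = -1  # no interval seen yet
    count: int = 0

    def update(self, t: int, n_switches: int) -> int:
        """Register the n_switches applied at step t and return F_t."""
        m = t // self.steps_per_interval
        if m != self.current_interval:
            # A new interval has started: discard the previous interval's count.
            self.current_interval = m
            self.count = 0
        self.count += n_switches
        return self.count


# Three steps per interval; switching actions applied at steps 0..5.
counter = IntervalSwitchCounter(steps_per_interval=3)
history = [counter.update(t, n) for t, n in enumerate([1, 0, 2, 0, 1, 1])]
print(history)  # [1, 1, 3, 0, 1, 2]
```

Because the count resets at every interval boundary, F_t can never exceed the number of switching actions possible within a single interval, which is what keeps the resulting reward bounded.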
At each RL time step <math alttext="t" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.2.m2.1"><semantics id="S2.SS2.SSS3.p1.2.m2.1a"><mi id="S2.SS2.SSS3.p1.2.m2.1.1" xref="S2.SS2.SSS3.p1.2.m2.1.1.cmml">t</mi><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.2.m2.1b"><ci id="S2.SS2.SSS3.p1.2.m2.1.1.cmml" xref="S2.SS2.SSS3.p1.2.m2.1.1">𝑡</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.2.m2.1c">t</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.2.m2.1d">italic_t</annotation></semantics></math>, the cumulative number of switching actions <math alttext="F_{t}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.3.m3.1"><semantics id="S2.SS2.SSS3.p1.3.m3.1a"><msub id="S2.SS2.SSS3.p1.3.m3.1.1" xref="S2.SS2.SSS3.p1.3.m3.1.1.cmml"><mi id="S2.SS2.SSS3.p1.3.m3.1.1.2" xref="S2.SS2.SSS3.p1.3.m3.1.1.2.cmml">F</mi><mi id="S2.SS2.SSS3.p1.3.m3.1.1.3" xref="S2.SS2.SSS3.p1.3.m3.1.1.3.cmml">t</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.3.m3.1b"><apply id="S2.SS2.SSS3.p1.3.m3.1.1.cmml" xref="S2.SS2.SSS3.p1.3.m3.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.3.m3.1.1.1.cmml" xref="S2.SS2.SSS3.p1.3.m3.1.1">subscript</csymbol><ci id="S2.SS2.SSS3.p1.3.m3.1.1.2.cmml" xref="S2.SS2.SSS3.p1.3.m3.1.1.2">𝐹</ci><ci id="S2.SS2.SSS3.p1.3.m3.1.1.3.cmml" xref="S2.SS2.SSS3.p1.3.m3.1.1.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.3.m3.1c">F_{t}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.3.m3.1d">italic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT</annotation></semantics></math> in interval <math alttext="\mathcal{T}_{m}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.4.m4.1"><semantics id="S2.SS2.SSS3.p1.4.m4.1a"><msub id="S2.SS2.SSS3.p1.4.m4.1.1.1.1" xref="S2.SS2.SSS3.p1.4.m4.1.1.1.1.cmml"><mi class="ltx_font_mathcaligraphic" id="S2.SS2.SSS3.p1.4.m4.1.1.1.1.2" xref="S2.SS2.SSS3.p1.4.m4.1.1.1.1.2.cmml">𝒯</mi><mi id="S2.SS2.SSS3.p1.4.m4.1.1.1.1.3" xref="S2.SS2.SSS3.p1.4.m4.1.1.1.1.3.cmml">m</mi></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.4.m4.1b"><apply id="S2.SS2.SSS3.p1.4.m4.1.1.1.1.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.4.m4.1.1.1.1.1.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1.1.1">subscript</csymbol><ci id="S2.SS2.SSS3.p1.4.m4.1.1.1.1.2.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1.1.1.2">𝒯</ci><ci id="S2.SS2.SSS3.p1.4.m4.1.1.1.1.3.cmml" xref="S2.SS2.SSS3.p1.4.m4.1.1.1.1.3">𝑚</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.4.m4.1c">\mathcal{T}_{m}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.4.m4.1d">caligraphic_T start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT</annotation></semantics></math> is computed as:</p> <table class="ltx_equation ltx_eqn_table" id="S2.E8"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="F_{t}=\sum_{\tau\in\mathcal{T}_{m},\,\tau\leq t}\text{number of switching actions at step $\tau$}" class="ltx_Math" display="block" id="S2.E8.m1.2"><semantics id="S2.E8.m1.2a"><mrow id="S2.E8.m1.2.3" xref="S2.E8.m1.2.3.cmml"><msub id="S2.E8.m1.2.3.2" xref="S2.E8.m1.2.3.2.cmml"><mi id="S2.E8.m1.2.3.2.2" xref="S2.E8.m1.2.3.2.2.cmml">F</mi><mi id="S2.E8.m1.2.3.2.3" xref="S2.E8.m1.2.3.2.3.cmml">t</mi></msub><mo id="S2.E8.m1.2.3.1" rspace="0.111em" xref="S2.E8.m1.2.3.1.cmml">=</mo><mrow id="S2.E8.m1.2.3.3" xref="S2.E8.m1.2.3.3.cmml"><munder id="S2.E8.m1.2.3.3.1" xref="S2.E8.m1.2.3.3.1.cmml"><mo id="S2.E8.m1.2.3.3.1.2" movablelimits="false" xref="S2.E8.m1.2.3.3.1.2.cmml">∑</mo><mrow id="S2.E8.m1.2.3.3.1.3" xref="S2.E8.m1.2.3.3.1.3.cmml"><mrow id="S2.E8.m1.2.3.3.1.3.4" xref="S2.E8.m1.2.3.3.1.3.4.cmml"><mi id="S2.E8.m1.2.3.3.1.3.2" xref="S2.E8.m1.2.3.3.1.3.2.cmml">τ</mi><mo id="S2.E8.m1.2.3.3.1.3.1" xref="S2.E8.m1.2.3.3.1.3.1.cmml">∈</mo><msub id="S2.E8.m1.2.3.3.1.3.3" xref="S2.E8.m1.2.3.3.1.3.3.cmml"><mi class="ltx_font_mathcaligraphic" id="S2.E8.m1.2.3.3.1.3.3.2" xref="S2.E8.m1.2.3.3.1.3.3.2.cmml">𝒯</mi><mi id="S2.E8.m1.2.3.3.1.3.3.3" xref="S2.E8.m1.2.3.3.1.3.3.3.cmml">m</mi></msub></mrow><mo id="S2.E8.m1.2.3.3.1.3.5" rspace="0.337em">,</mo><mrow id="S2.E8.m1.2.3.3.1.3.6" xref="S2.E8.m1.2.3.3.1.3.6.cmml"><mi id="S2.E8.m1.2.3.3.1.3.6.2" xref="S2.E8.m1.2.3.3.1.3.6.2.cmml">τ</mi><mo id="S2.E8.m1.2.3.3.1.3.6.1" xref="S2.E8.m1.2.3.3.1.3.6.1.cmml">≤</mo><mi id="S2.E8.m1.2.3.3.1.3.6.3" xref="S2.E8.m1.2.3.3.1.3.6.3.cmml">t</mi></mrow></mrow></munder><mrow id="S2.E8.m1.2.2.2" xref="S2.E8.m1.2.2.2c.cmml"><mtext id="S2.E8.m1.2.2.2a" xref="S2.E8.m1.2.2.2c.cmml">number of switching actions at step </mtext><mi id="S2.E8.m1.2.2.2.m2.1.1" xref="S2.E8.m1.2.2.2.m2.1.1.cmml">τ</mi></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.E8.m1.2b"><apply id="S2.E8.m1.2.3.cmml" xref="S2.E8.m1.2.3"><eq id="S2.E8.m1.2.3.1.cmml" xref="S2.E8.m1.2.3.1"></eq><apply id="S2.E8.m1.2.3.2.cmml" xref="S2.E8.m1.2.3.2"><csymbol cd="ambiguous" id="S2.E8.m1.2.3.2.1.cmml" xref="S2.E8.m1.2.3.2">subscript</csymbol><ci id="S2.E8.m1.2.3.2.2.cmml" xref="S2.E8.m1.2.3.2.2">𝐹</ci><ci id="S2.E8.m1.2.3.2.3.cmml" xref="S2.E8.m1.2.3.2.3">𝑡</ci></apply><apply id="S2.E8.m1.2.3.3.cmml" xref="S2.E8.m1.2.3.3"><apply id="S2.E8.m1.2.3.3.1.cmml" xref="S2.E8.m1.2.3.3.1"><csymbol cd="ambiguous" id="S2.E8.m1.2.3.3.1.1.cmml" xref="S2.E8.m1.2.3.3.1">subscript</csymbol><sum id="S2.E8.m1.2.3.3.1.2.cmml" xref="S2.E8.m1.2.3.3.1.2"></sum><list id="S2.E8.m1.2.3.3.1.3.cmml" xref="S2.E8.m1.2.3.3.1.3"><apply id="S2.E8.m1.2.3.3.1.3.4.cmml" xref="S2.E8.m1.2.3.3.1.3.4"><in id="S2.E8.m1.2.3.3.1.3.1.cmml" xref="S2.E8.m1.2.3.3.1.3.1"></in><ci id="S2.E8.m1.2.3.3.1.3.2.cmml" xref="S2.E8.m1.2.3.3.1.3.2">𝜏</ci><apply id="S2.E8.m1.2.3.3.1.3.3.cmml" xref="S2.E8.m1.2.3.3.1.3.3"><csymbol cd="ambiguous" id="S2.E8.m1.2.3.3.1.3.3.1.cmml" xref="S2.E8.m1.2.3.3.1.3.3">subscript</csymbol><ci id="S2.E8.m1.2.3.3.1.3.3.2.cmml" xref="S2.E8.m1.2.3.3.1.3.3.2">𝒯</ci><ci id="S2.E8.m1.2.3.3.1.3.3.3.cmml" xref="S2.E8.m1.2.3.3.1.3.3.3">𝑚</ci></apply></apply><apply id="S2.E8.m1.2.3.3.1.3.6.cmml" xref="S2.E8.m1.2.3.3.1.3.6"><leq id="S2.E8.m1.2.3.3.1.3.6.1.cmml" xref="S2.E8.m1.2.3.3.1.3.6.1"></leq><ci id="S2.E8.m1.2.3.3.1.3.6.2.cmml" xref="S2.E8.m1.2.3.3.1.3.6.2">𝜏</ci><ci id="S2.E8.m1.2.3.3.1.3.6.3.cmml" xref="S2.E8.m1.2.3.3.1.3.6.3">𝑡</ci></apply></list></apply><ci id="S2.E8.m1.2.2.2c.cmml" xref="S2.E8.m1.2.2.2"><mrow id="S2.E8.m1.2.2.2.cmml" xref="S2.E8.m1.2.2.2"><mtext id="S2.E8.m1.2.2.2a.cmml" xref="S2.E8.m1.2.2.2">number of switching actions at step </mtext><mi id="S2.E8.m1.2.2.2.m2.1.1.cmml" xref="S2.E8.m1.2.2.2.m2.1.1">τ</mi></mrow></ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.E8.m1.2c">F_{t}=\sum_{\tau\in\mathcal{T}_{m},\,\tau\leq t}\text{number of switching actions at step $\tau$}</annotation><annotation encoding="application/x-llamapun" id="S2.E8.m1.2d">italic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_τ ∈ caligraphic_T start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_τ ≤ italic_t end_POSTSUBSCRIPT number of switching actions at step italic_τ</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(8)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S2.SS2.SSS3.p1.9">The switching frequency reward is defined as:</p> <table class="ltx_equation ltx_eqn_table" id="S2.E9"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="R^{F}_{t}=\begin{cases}r_{\text{DoNothing}},&amp;\text{if }F_{t}=0,\\ r_{\text{low}},&amp;\text{if }F_{t}\leq F_{\text{low}}^{\text{th}},\\ r_{\text{high}},&amp;\text{if }F_{t}\geq F_{\text{high}}^{\text{th}},\end{cases}" class="ltx_Math" display="block" 
id="S2.E9.m1.6"><semantics id="S2.E9.m1.6a"><mrow id="S2.E9.m1.6.7" xref="S2.E9.m1.6.7.cmml"><msubsup id="S2.E9.m1.6.7.2" xref="S2.E9.m1.6.7.2.cmml"><mi id="S2.E9.m1.6.7.2.2.2" xref="S2.E9.m1.6.7.2.2.2.cmml">R</mi><mi id="S2.E9.m1.6.7.2.3" xref="S2.E9.m1.6.7.2.3.cmml">t</mi><mi id="S2.E9.m1.6.7.2.2.3" xref="S2.E9.m1.6.7.2.2.3.cmml">F</mi></msubsup><mo id="S2.E9.m1.6.7.1" xref="S2.E9.m1.6.7.1.cmml">=</mo><mrow id="S2.E9.m1.6.6" xref="S2.E9.m1.6.7.3.1.cmml"><mo id="S2.E9.m1.6.6.7" xref="S2.E9.m1.6.7.3.1.1.cmml">{</mo><mtable columnspacing="5pt" displaystyle="true" id="S2.E9.m1.6.6.6" rowspacing="0pt" xref="S2.E9.m1.6.7.3.1.cmml"><mtr id="S2.E9.m1.6.6.6a" xref="S2.E9.m1.6.7.3.1.cmml"><mtd class="ltx_align_left" columnalign="left" id="S2.E9.m1.6.6.6b" xref="S2.E9.m1.6.7.3.1.cmml"><mrow id="S2.E9.m1.1.1.1.1.1.1.1" xref="S2.E9.m1.1.1.1.1.1.1.1.1.cmml"><msub id="S2.E9.m1.1.1.1.1.1.1.1.1" xref="S2.E9.m1.1.1.1.1.1.1.1.1.cmml"><mi id="S2.E9.m1.1.1.1.1.1.1.1.1.2" xref="S2.E9.m1.1.1.1.1.1.1.1.1.2.cmml">r</mi><mtext id="S2.E9.m1.1.1.1.1.1.1.1.1.3" xref="S2.E9.m1.1.1.1.1.1.1.1.1.3a.cmml">DoNothing</mtext></msub><mo id="S2.E9.m1.1.1.1.1.1.1.1.2" xref="S2.E9.m1.1.1.1.1.1.1.1.1.cmml">,</mo></mrow></mtd><mtd class="ltx_align_left" columnalign="left" id="S2.E9.m1.6.6.6c" xref="S2.E9.m1.6.7.3.1.cmml"><mrow id="S2.E9.m1.2.2.2.2.2.1.1" xref="S2.E9.m1.2.2.2.2.2.1.1.1.cmml"><mrow id="S2.E9.m1.2.2.2.2.2.1.1.1" xref="S2.E9.m1.2.2.2.2.2.1.1.1.cmml"><mrow id="S2.E9.m1.2.2.2.2.2.1.1.1.2" xref="S2.E9.m1.2.2.2.2.2.1.1.1.2.cmml"><mtext id="S2.E9.m1.2.2.2.2.2.1.1.1.2.2" xref="S2.E9.m1.2.2.2.2.2.1.1.1.2.2a.cmml">if </mtext><mo id="S2.E9.m1.2.2.2.2.2.1.1.1.2.1" xref="S2.E9.m1.2.2.2.2.2.1.1.1.2.1.cmml">⁢</mo><msub id="S2.E9.m1.2.2.2.2.2.1.1.1.2.3" xref="S2.E9.m1.2.2.2.2.2.1.1.1.2.3.cmml"><mi id="S2.E9.m1.2.2.2.2.2.1.1.1.2.3.2" xref="S2.E9.m1.2.2.2.2.2.1.1.1.2.3.2.cmml">F</mi><mi id="S2.E9.m1.2.2.2.2.2.1.1.1.2.3.3" xref="S2.E9.m1.2.2.2.2.2.1.1.1.2.3.3.cmml">t</mi></msub></mrow><mo 
id="S2.E9.m1.2.2.2.2.2.1.1.1.1" xref="S2.E9.m1.2.2.2.2.2.1.1.1.1.cmml">=</mo><mn id="S2.E9.m1.2.2.2.2.2.1.1.1.3" xref="S2.E9.m1.2.2.2.2.2.1.1.1.3.cmml">0</mn></mrow><mo id="S2.E9.m1.2.2.2.2.2.1.1.2" xref="S2.E9.m1.2.2.2.2.2.1.1.1.cmml">,</mo></mrow></mtd></mtr><mtr id="S2.E9.m1.6.6.6d" xref="S2.E9.m1.6.7.3.1.cmml"><mtd class="ltx_align_left" columnalign="left" id="S2.E9.m1.6.6.6e" xref="S2.E9.m1.6.7.3.1.cmml"><mrow id="S2.E9.m1.3.3.3.3.1.1.1" xref="S2.E9.m1.3.3.3.3.1.1.1.1.cmml"><msub id="S2.E9.m1.3.3.3.3.1.1.1.1" xref="S2.E9.m1.3.3.3.3.1.1.1.1.cmml"><mi id="S2.E9.m1.3.3.3.3.1.1.1.1.2" xref="S2.E9.m1.3.3.3.3.1.1.1.1.2.cmml">r</mi><mtext id="S2.E9.m1.3.3.3.3.1.1.1.1.3" xref="S2.E9.m1.3.3.3.3.1.1.1.1.3a.cmml">low</mtext></msub><mo id="S2.E9.m1.3.3.3.3.1.1.1.2" xref="S2.E9.m1.3.3.3.3.1.1.1.1.cmml">,</mo></mrow></mtd><mtd class="ltx_align_left" columnalign="left" id="S2.E9.m1.6.6.6f" xref="S2.E9.m1.6.7.3.1.cmml"><mrow id="S2.E9.m1.4.4.4.4.2.1.1" xref="S2.E9.m1.4.4.4.4.2.1.1.1.cmml"><mrow id="S2.E9.m1.4.4.4.4.2.1.1.1" xref="S2.E9.m1.4.4.4.4.2.1.1.1.cmml"><mrow id="S2.E9.m1.4.4.4.4.2.1.1.1.2" xref="S2.E9.m1.4.4.4.4.2.1.1.1.2.cmml"><mtext id="S2.E9.m1.4.4.4.4.2.1.1.1.2.2" xref="S2.E9.m1.4.4.4.4.2.1.1.1.2.2a.cmml">if </mtext><mo id="S2.E9.m1.4.4.4.4.2.1.1.1.2.1" xref="S2.E9.m1.4.4.4.4.2.1.1.1.2.1.cmml">⁢</mo><msub id="S2.E9.m1.4.4.4.4.2.1.1.1.2.3" xref="S2.E9.m1.4.4.4.4.2.1.1.1.2.3.cmml"><mi id="S2.E9.m1.4.4.4.4.2.1.1.1.2.3.2" xref="S2.E9.m1.4.4.4.4.2.1.1.1.2.3.2.cmml">F</mi><mi id="S2.E9.m1.4.4.4.4.2.1.1.1.2.3.3" xref="S2.E9.m1.4.4.4.4.2.1.1.1.2.3.3.cmml">t</mi></msub></mrow><mo id="S2.E9.m1.4.4.4.4.2.1.1.1.1" xref="S2.E9.m1.4.4.4.4.2.1.1.1.1.cmml">≤</mo><msubsup id="S2.E9.m1.4.4.4.4.2.1.1.1.3" xref="S2.E9.m1.4.4.4.4.2.1.1.1.3.cmml"><mi id="S2.E9.m1.4.4.4.4.2.1.1.1.3.2.2" xref="S2.E9.m1.4.4.4.4.2.1.1.1.3.2.2.cmml">F</mi><mtext id="S2.E9.m1.4.4.4.4.2.1.1.1.3.2.3" xref="S2.E9.m1.4.4.4.4.2.1.1.1.3.2.3a.cmml">low</mtext><mtext id="S2.E9.m1.4.4.4.4.2.1.1.1.3.3" 
xref="S2.E9.m1.4.4.4.4.2.1.1.1.3.3a.cmml">th</mtext></msubsup></mrow><mo id="S2.E9.m1.4.4.4.4.2.1.1.2" xref="S2.E9.m1.4.4.4.4.2.1.1.1.cmml">,</mo></mrow></mtd></mtr><mtr id="S2.E9.m1.6.6.6g" xref="S2.E9.m1.6.7.3.1.cmml"><mtd class="ltx_align_left" columnalign="left" id="S2.E9.m1.6.6.6h" xref="S2.E9.m1.6.7.3.1.cmml"><mrow id="S2.E9.m1.5.5.5.5.1.1.1" xref="S2.E9.m1.5.5.5.5.1.1.1.1.cmml"><msub id="S2.E9.m1.5.5.5.5.1.1.1.1" xref="S2.E9.m1.5.5.5.5.1.1.1.1.cmml"><mi id="S2.E9.m1.5.5.5.5.1.1.1.1.2" xref="S2.E9.m1.5.5.5.5.1.1.1.1.2.cmml">r</mi><mtext id="S2.E9.m1.5.5.5.5.1.1.1.1.3" xref="S2.E9.m1.5.5.5.5.1.1.1.1.3a.cmml">high</mtext></msub><mo id="S2.E9.m1.5.5.5.5.1.1.1.2" xref="S2.E9.m1.5.5.5.5.1.1.1.1.cmml">,</mo></mrow></mtd><mtd class="ltx_align_left" columnalign="left" id="S2.E9.m1.6.6.6i" xref="S2.E9.m1.6.7.3.1.cmml"><mrow id="S2.E9.m1.6.6.6.6.2.1.1" xref="S2.E9.m1.6.6.6.6.2.1.1.1.cmml"><mrow id="S2.E9.m1.6.6.6.6.2.1.1.1" xref="S2.E9.m1.6.6.6.6.2.1.1.1.cmml"><mrow id="S2.E9.m1.6.6.6.6.2.1.1.1.2" xref="S2.E9.m1.6.6.6.6.2.1.1.1.2.cmml"><mtext id="S2.E9.m1.6.6.6.6.2.1.1.1.2.2" xref="S2.E9.m1.6.6.6.6.2.1.1.1.2.2a.cmml">if </mtext><mo id="S2.E9.m1.6.6.6.6.2.1.1.1.2.1" xref="S2.E9.m1.6.6.6.6.2.1.1.1.2.1.cmml">⁢</mo><msub id="S2.E9.m1.6.6.6.6.2.1.1.1.2.3" xref="S2.E9.m1.6.6.6.6.2.1.1.1.2.3.cmml"><mi id="S2.E9.m1.6.6.6.6.2.1.1.1.2.3.2" xref="S2.E9.m1.6.6.6.6.2.1.1.1.2.3.2.cmml">F</mi><mi id="S2.E9.m1.6.6.6.6.2.1.1.1.2.3.3" xref="S2.E9.m1.6.6.6.6.2.1.1.1.2.3.3.cmml">t</mi></msub></mrow><mo id="S2.E9.m1.6.6.6.6.2.1.1.1.1" xref="S2.E9.m1.6.6.6.6.2.1.1.1.1.cmml">≥</mo><msubsup id="S2.E9.m1.6.6.6.6.2.1.1.1.3" xref="S2.E9.m1.6.6.6.6.2.1.1.1.3.cmml"><mi id="S2.E9.m1.6.6.6.6.2.1.1.1.3.2.2" xref="S2.E9.m1.6.6.6.6.2.1.1.1.3.2.2.cmml">F</mi><mtext id="S2.E9.m1.6.6.6.6.2.1.1.1.3.2.3" xref="S2.E9.m1.6.6.6.6.2.1.1.1.3.2.3a.cmml">high</mtext><mtext id="S2.E9.m1.6.6.6.6.2.1.1.1.3.3" xref="S2.E9.m1.6.6.6.6.2.1.1.1.3.3a.cmml">th</mtext></msubsup></mrow><mo id="S2.E9.m1.6.6.6.6.2.1.1.2" 
xref="S2.E9.m1.6.6.6.6.2.1.1.1.cmml">,</mo></mrow></mtd></mtr></mtable></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.E9.m1.6b"><apply id="S2.E9.m1.6.7.cmml" xref="S2.E9.m1.6.7"><eq id="S2.E9.m1.6.7.1.cmml" xref="S2.E9.m1.6.7.1"></eq><apply id="S2.E9.m1.6.7.2.cmml" xref="S2.E9.m1.6.7.2"><csymbol cd="ambiguous" id="S2.E9.m1.6.7.2.1.cmml" xref="S2.E9.m1.6.7.2">subscript</csymbol><apply id="S2.E9.m1.6.7.2.2.cmml" xref="S2.E9.m1.6.7.2"><csymbol cd="ambiguous" id="S2.E9.m1.6.7.2.2.1.cmml" xref="S2.E9.m1.6.7.2">superscript</csymbol><ci id="S2.E9.m1.6.7.2.2.2.cmml" xref="S2.E9.m1.6.7.2.2.2">𝑅</ci><ci id="S2.E9.m1.6.7.2.2.3.cmml" xref="S2.E9.m1.6.7.2.2.3">𝐹</ci></apply><ci id="S2.E9.m1.6.7.2.3.cmml" xref="S2.E9.m1.6.7.2.3">𝑡</ci></apply><apply id="S2.E9.m1.6.7.3.1.cmml" xref="S2.E9.m1.6.6"><csymbol cd="latexml" id="S2.E9.m1.6.7.3.1.1.cmml" xref="S2.E9.m1.6.6.7">cases</csymbol><apply id="S2.E9.m1.1.1.1.1.1.1.1.1.cmml" xref="S2.E9.m1.1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.E9.m1.1.1.1.1.1.1.1.1.1.cmml" xref="S2.E9.m1.1.1.1.1.1.1.1">subscript</csymbol><ci id="S2.E9.m1.1.1.1.1.1.1.1.1.2.cmml" xref="S2.E9.m1.1.1.1.1.1.1.1.1.2">𝑟</ci><ci id="S2.E9.m1.1.1.1.1.1.1.1.1.3a.cmml" xref="S2.E9.m1.1.1.1.1.1.1.1.1.3"><mtext id="S2.E9.m1.1.1.1.1.1.1.1.1.3.cmml" mathsize="70%" xref="S2.E9.m1.1.1.1.1.1.1.1.1.3">DoNothing</mtext></ci></apply><apply id="S2.E9.m1.2.2.2.2.2.1.1.1.cmml" xref="S2.E9.m1.2.2.2.2.2.1.1"><eq id="S2.E9.m1.2.2.2.2.2.1.1.1.1.cmml" xref="S2.E9.m1.2.2.2.2.2.1.1.1.1"></eq><apply id="S2.E9.m1.2.2.2.2.2.1.1.1.2.cmml" xref="S2.E9.m1.2.2.2.2.2.1.1.1.2"><times id="S2.E9.m1.2.2.2.2.2.1.1.1.2.1.cmml" xref="S2.E9.m1.2.2.2.2.2.1.1.1.2.1"></times><ci id="S2.E9.m1.2.2.2.2.2.1.1.1.2.2a.cmml" xref="S2.E9.m1.2.2.2.2.2.1.1.1.2.2"><mtext id="S2.E9.m1.2.2.2.2.2.1.1.1.2.2.cmml" xref="S2.E9.m1.2.2.2.2.2.1.1.1.2.2">if </mtext></ci><apply id="S2.E9.m1.2.2.2.2.2.1.1.1.2.3.cmml" xref="S2.E9.m1.2.2.2.2.2.1.1.1.2.3"><csymbol cd="ambiguous" 
id="S2.E9.m1.2.2.2.2.2.1.1.1.2.3.1.cmml" xref="S2.E9.m1.2.2.2.2.2.1.1.1.2.3">subscript</csymbol><ci id="S2.E9.m1.2.2.2.2.2.1.1.1.2.3.2.cmml" xref="S2.E9.m1.2.2.2.2.2.1.1.1.2.3.2">𝐹</ci><ci id="S2.E9.m1.2.2.2.2.2.1.1.1.2.3.3.cmml" xref="S2.E9.m1.2.2.2.2.2.1.1.1.2.3.3">𝑡</ci></apply></apply><cn id="S2.E9.m1.2.2.2.2.2.1.1.1.3.cmml" type="integer" xref="S2.E9.m1.2.2.2.2.2.1.1.1.3">0</cn></apply><apply id="S2.E9.m1.3.3.3.3.1.1.1.1.cmml" xref="S2.E9.m1.3.3.3.3.1.1.1"><csymbol cd="ambiguous" id="S2.E9.m1.3.3.3.3.1.1.1.1.1.cmml" xref="S2.E9.m1.3.3.3.3.1.1.1">subscript</csymbol><ci id="S2.E9.m1.3.3.3.3.1.1.1.1.2.cmml" xref="S2.E9.m1.3.3.3.3.1.1.1.1.2">𝑟</ci><ci id="S2.E9.m1.3.3.3.3.1.1.1.1.3a.cmml" xref="S2.E9.m1.3.3.3.3.1.1.1.1.3"><mtext id="S2.E9.m1.3.3.3.3.1.1.1.1.3.cmml" mathsize="70%" xref="S2.E9.m1.3.3.3.3.1.1.1.1.3">low</mtext></ci></apply><apply id="S2.E9.m1.4.4.4.4.2.1.1.1.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1"><leq id="S2.E9.m1.4.4.4.4.2.1.1.1.1.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1.1.1"></leq><apply id="S2.E9.m1.4.4.4.4.2.1.1.1.2.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1.1.2"><times id="S2.E9.m1.4.4.4.4.2.1.1.1.2.1.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1.1.2.1"></times><ci id="S2.E9.m1.4.4.4.4.2.1.1.1.2.2a.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1.1.2.2"><mtext id="S2.E9.m1.4.4.4.4.2.1.1.1.2.2.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1.1.2.2">if </mtext></ci><apply id="S2.E9.m1.4.4.4.4.2.1.1.1.2.3.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1.1.2.3"><csymbol cd="ambiguous" id="S2.E9.m1.4.4.4.4.2.1.1.1.2.3.1.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1.1.2.3">subscript</csymbol><ci id="S2.E9.m1.4.4.4.4.2.1.1.1.2.3.2.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1.1.2.3.2">𝐹</ci><ci id="S2.E9.m1.4.4.4.4.2.1.1.1.2.3.3.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1.1.2.3.3">𝑡</ci></apply></apply><apply id="S2.E9.m1.4.4.4.4.2.1.1.1.3.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1.1.3"><csymbol cd="ambiguous" id="S2.E9.m1.4.4.4.4.2.1.1.1.3.1.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1.1.3">superscript</csymbol><apply id="S2.E9.m1.4.4.4.4.2.1.1.1.3.2.cmml" 
xref="S2.E9.m1.4.4.4.4.2.1.1.1.3"><csymbol cd="ambiguous" id="S2.E9.m1.4.4.4.4.2.1.1.1.3.2.1.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1.1.3">subscript</csymbol><ci id="S2.E9.m1.4.4.4.4.2.1.1.1.3.2.2.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1.1.3.2.2">𝐹</ci><ci id="S2.E9.m1.4.4.4.4.2.1.1.1.3.2.3a.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1.1.3.2.3"><mtext id="S2.E9.m1.4.4.4.4.2.1.1.1.3.2.3.cmml" mathsize="70%" xref="S2.E9.m1.4.4.4.4.2.1.1.1.3.2.3">low</mtext></ci></apply><ci id="S2.E9.m1.4.4.4.4.2.1.1.1.3.3a.cmml" xref="S2.E9.m1.4.4.4.4.2.1.1.1.3.3"><mtext id="S2.E9.m1.4.4.4.4.2.1.1.1.3.3.cmml" mathsize="70%" xref="S2.E9.m1.4.4.4.4.2.1.1.1.3.3">th</mtext></ci></apply></apply><apply id="S2.E9.m1.5.5.5.5.1.1.1.1.cmml" xref="S2.E9.m1.5.5.5.5.1.1.1"><csymbol cd="ambiguous" id="S2.E9.m1.5.5.5.5.1.1.1.1.1.cmml" xref="S2.E9.m1.5.5.5.5.1.1.1">subscript</csymbol><ci id="S2.E9.m1.5.5.5.5.1.1.1.1.2.cmml" xref="S2.E9.m1.5.5.5.5.1.1.1.1.2">𝑟</ci><ci id="S2.E9.m1.5.5.5.5.1.1.1.1.3a.cmml" xref="S2.E9.m1.5.5.5.5.1.1.1.1.3"><mtext id="S2.E9.m1.5.5.5.5.1.1.1.1.3.cmml" mathsize="70%" xref="S2.E9.m1.5.5.5.5.1.1.1.1.3">high</mtext></ci></apply><apply id="S2.E9.m1.6.6.6.6.2.1.1.1.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1"><geq id="S2.E9.m1.6.6.6.6.2.1.1.1.1.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1.1.1"></geq><apply id="S2.E9.m1.6.6.6.6.2.1.1.1.2.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1.1.2"><times id="S2.E9.m1.6.6.6.6.2.1.1.1.2.1.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1.1.2.1"></times><ci id="S2.E9.m1.6.6.6.6.2.1.1.1.2.2a.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1.1.2.2"><mtext id="S2.E9.m1.6.6.6.6.2.1.1.1.2.2.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1.1.2.2">if </mtext></ci><apply id="S2.E9.m1.6.6.6.6.2.1.1.1.2.3.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1.1.2.3"><csymbol cd="ambiguous" id="S2.E9.m1.6.6.6.6.2.1.1.1.2.3.1.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1.1.2.3">subscript</csymbol><ci id="S2.E9.m1.6.6.6.6.2.1.1.1.2.3.2.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1.1.2.3.2">𝐹</ci><ci id="S2.E9.m1.6.6.6.6.2.1.1.1.2.3.3.cmml" 
xref="S2.E9.m1.6.6.6.6.2.1.1.1.2.3.3">𝑡</ci></apply></apply><apply id="S2.E9.m1.6.6.6.6.2.1.1.1.3.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1.1.3"><csymbol cd="ambiguous" id="S2.E9.m1.6.6.6.6.2.1.1.1.3.1.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1.1.3">superscript</csymbol><apply id="S2.E9.m1.6.6.6.6.2.1.1.1.3.2.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1.1.3"><csymbol cd="ambiguous" id="S2.E9.m1.6.6.6.6.2.1.1.1.3.2.1.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1.1.3">subscript</csymbol><ci id="S2.E9.m1.6.6.6.6.2.1.1.1.3.2.2.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1.1.3.2.2">𝐹</ci><ci id="S2.E9.m1.6.6.6.6.2.1.1.1.3.2.3a.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1.1.3.2.3"><mtext id="S2.E9.m1.6.6.6.6.2.1.1.1.3.2.3.cmml" mathsize="70%" xref="S2.E9.m1.6.6.6.6.2.1.1.1.3.2.3">high</mtext></ci></apply><ci id="S2.E9.m1.6.6.6.6.2.1.1.1.3.3a.cmml" xref="S2.E9.m1.6.6.6.6.2.1.1.1.3.3"><mtext id="S2.E9.m1.6.6.6.6.2.1.1.1.3.3.cmml" mathsize="70%" xref="S2.E9.m1.6.6.6.6.2.1.1.1.3.3">th</mtext></ci></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.E9.m1.6c">R^{F}_{t}=\begin{cases}r_{\text{DoNothing}},&amp;\text{if }F_{t}=0,\\ r_{\text{low}},&amp;\text{if }F_{t}\leq F_{\text{low}}^{\text{th}},\\ r_{\text{high}},&amp;\text{if }F_{t}\geq F_{\text{high}}^{\text{th}},\end{cases}</annotation><annotation encoding="application/x-llamapun" id="S2.E9.m1.6d">italic_R start_POSTSUPERSCRIPT italic_F end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = { start_ROW start_CELL italic_r start_POSTSUBSCRIPT DoNothing end_POSTSUBSCRIPT , end_CELL start_CELL if italic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = 0 , end_CELL end_ROW start_ROW start_CELL italic_r start_POSTSUBSCRIPT low end_POSTSUBSCRIPT , end_CELL start_CELL if italic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ≤ italic_F start_POSTSUBSCRIPT low end_POSTSUBSCRIPT start_POSTSUPERSCRIPT th end_POSTSUPERSCRIPT , end_CELL end_ROW start_ROW start_CELL italic_r start_POSTSUBSCRIPT high end_POSTSUBSCRIPT , end_CELL 
start_CELL if italic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ≥ italic_F start_POSTSUBSCRIPT high end_POSTSUBSCRIPT start_POSTSUPERSCRIPT th end_POSTSUPERSCRIPT , end_CELL end_ROW</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(9)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S2.SS2.SSS3.p1.8">where <math alttext="r_{\text{DoNothing}}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.5.m1.1"><semantics id="S2.SS2.SSS3.p1.5.m1.1a"><msub id="S2.SS2.SSS3.p1.5.m1.1.1" xref="S2.SS2.SSS3.p1.5.m1.1.1.cmml"><mi id="S2.SS2.SSS3.p1.5.m1.1.1.2" xref="S2.SS2.SSS3.p1.5.m1.1.1.2.cmml">r</mi><mtext id="S2.SS2.SSS3.p1.5.m1.1.1.3" xref="S2.SS2.SSS3.p1.5.m1.1.1.3a.cmml">DoNothing</mtext></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.5.m1.1b"><apply id="S2.SS2.SSS3.p1.5.m1.1.1.cmml" xref="S2.SS2.SSS3.p1.5.m1.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.5.m1.1.1.1.cmml" xref="S2.SS2.SSS3.p1.5.m1.1.1">subscript</csymbol><ci id="S2.SS2.SSS3.p1.5.m1.1.1.2.cmml" xref="S2.SS2.SSS3.p1.5.m1.1.1.2">𝑟</ci><ci id="S2.SS2.SSS3.p1.5.m1.1.1.3a.cmml" xref="S2.SS2.SSS3.p1.5.m1.1.1.3"><mtext id="S2.SS2.SSS3.p1.5.m1.1.1.3.cmml" mathsize="70%" xref="S2.SS2.SSS3.p1.5.m1.1.1.3">DoNothing</mtext></ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.5.m1.1c">r_{\text{DoNothing}}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.5.m1.1d">italic_r start_POSTSUBSCRIPT DoNothing end_POSTSUBSCRIPT</annotation></semantics></math> is the baseline reward assigned when no switching action is taken, <math alttext="r_{\text{low}}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.6.m2.1"><semantics id="S2.SS2.SSS3.p1.6.m2.1a"><msub id="S2.SS2.SSS3.p1.6.m2.1.1" xref="S2.SS2.SSS3.p1.6.m2.1.1.cmml"><mi id="S2.SS2.SSS3.p1.6.m2.1.1.2" 
xref="S2.SS2.SSS3.p1.6.m2.1.1.2.cmml">r</mi><mtext id="S2.SS2.SSS3.p1.6.m2.1.1.3" xref="S2.SS2.SSS3.p1.6.m2.1.1.3a.cmml">low</mtext></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.6.m2.1b"><apply id="S2.SS2.SSS3.p1.6.m2.1.1.cmml" xref="S2.SS2.SSS3.p1.6.m2.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.6.m2.1.1.1.cmml" xref="S2.SS2.SSS3.p1.6.m2.1.1">subscript</csymbol><ci id="S2.SS2.SSS3.p1.6.m2.1.1.2.cmml" xref="S2.SS2.SSS3.p1.6.m2.1.1.2">𝑟</ci><ci id="S2.SS2.SSS3.p1.6.m2.1.1.3a.cmml" xref="S2.SS2.SSS3.p1.6.m2.1.1.3"><mtext id="S2.SS2.SSS3.p1.6.m2.1.1.3.cmml" mathsize="70%" xref="S2.SS2.SSS3.p1.6.m2.1.1.3">low</mtext></ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.6.m2.1c">r_{\text{low}}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.6.m2.1d">italic_r start_POSTSUBSCRIPT low end_POSTSUBSCRIPT</annotation></semantics></math> is a penalty for low switching frequency, <math alttext="r_{\text{high}}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.7.m3.1"><semantics id="S2.SS2.SSS3.p1.7.m3.1a"><msub id="S2.SS2.SSS3.p1.7.m3.1.1" xref="S2.SS2.SSS3.p1.7.m3.1.1.cmml"><mi id="S2.SS2.SSS3.p1.7.m3.1.1.2" xref="S2.SS2.SSS3.p1.7.m3.1.1.2.cmml">r</mi><mtext id="S2.SS2.SSS3.p1.7.m3.1.1.3" xref="S2.SS2.SSS3.p1.7.m3.1.1.3a.cmml">high</mtext></msub><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.7.m3.1b"><apply id="S2.SS2.SSS3.p1.7.m3.1.1.cmml" xref="S2.SS2.SSS3.p1.7.m3.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.7.m3.1.1.1.cmml" xref="S2.SS2.SSS3.p1.7.m3.1.1">subscript</csymbol><ci id="S2.SS2.SSS3.p1.7.m3.1.1.2.cmml" xref="S2.SS2.SSS3.p1.7.m3.1.1.2">𝑟</ci><ci id="S2.SS2.SSS3.p1.7.m3.1.1.3a.cmml" xref="S2.SS2.SSS3.p1.7.m3.1.1.3"><mtext id="S2.SS2.SSS3.p1.7.m3.1.1.3.cmml" mathsize="70%" xref="S2.SS2.SSS3.p1.7.m3.1.1.3">high</mtext></ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.7.m3.1c">r_{\text{high}}</annotation><annotation 
encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.7.m3.1d">italic_r start_POSTSUBSCRIPT high end_POSTSUBSCRIPT</annotation></semantics></math> is a penalty for high switching frequency, and <math alttext="F_{\text{low}}^{\text{th}},F_{\text{high}}^{\text{th}}" class="ltx_Math" display="inline" id="S2.SS2.SSS3.p1.8.m4.2"><semantics id="S2.SS2.SSS3.p1.8.m4.2a"><mrow id="S2.SS2.SSS3.p1.8.m4.2.2.2" xref="S2.SS2.SSS3.p1.8.m4.2.2.3.cmml"><msubsup id="S2.SS2.SSS3.p1.8.m4.1.1.1.1" xref="S2.SS2.SSS3.p1.8.m4.1.1.1.1.cmml"><mi id="S2.SS2.SSS3.p1.8.m4.1.1.1.1.2.2" xref="S2.SS2.SSS3.p1.8.m4.1.1.1.1.2.2.cmml">F</mi><mtext id="S2.SS2.SSS3.p1.8.m4.1.1.1.1.2.3" xref="S2.SS2.SSS3.p1.8.m4.1.1.1.1.2.3a.cmml">low</mtext><mtext id="S2.SS2.SSS3.p1.8.m4.1.1.1.1.3" xref="S2.SS2.SSS3.p1.8.m4.1.1.1.1.3a.cmml">th</mtext></msubsup><mo id="S2.SS2.SSS3.p1.8.m4.2.2.2.3" xref="S2.SS2.SSS3.p1.8.m4.2.2.3.cmml">,</mo><msubsup id="S2.SS2.SSS3.p1.8.m4.2.2.2.2" xref="S2.SS2.SSS3.p1.8.m4.2.2.2.2.cmml"><mi id="S2.SS2.SSS3.p1.8.m4.2.2.2.2.2.2" xref="S2.SS2.SSS3.p1.8.m4.2.2.2.2.2.2.cmml">F</mi><mtext id="S2.SS2.SSS3.p1.8.m4.2.2.2.2.2.3" xref="S2.SS2.SSS3.p1.8.m4.2.2.2.2.2.3a.cmml">high</mtext><mtext id="S2.SS2.SSS3.p1.8.m4.2.2.2.2.3" xref="S2.SS2.SSS3.p1.8.m4.2.2.2.2.3a.cmml">th</mtext></msubsup></mrow><annotation-xml encoding="MathML-Content" id="S2.SS2.SSS3.p1.8.m4.2b"><list id="S2.SS2.SSS3.p1.8.m4.2.2.3.cmml" xref="S2.SS2.SSS3.p1.8.m4.2.2.2"><apply id="S2.SS2.SSS3.p1.8.m4.1.1.1.1.cmml" xref="S2.SS2.SSS3.p1.8.m4.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.8.m4.1.1.1.1.1.cmml" xref="S2.SS2.SSS3.p1.8.m4.1.1.1.1">superscript</csymbol><apply id="S2.SS2.SSS3.p1.8.m4.1.1.1.1.2.cmml" xref="S2.SS2.SSS3.p1.8.m4.1.1.1.1"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.8.m4.1.1.1.1.2.1.cmml" xref="S2.SS2.SSS3.p1.8.m4.1.1.1.1">subscript</csymbol><ci id="S2.SS2.SSS3.p1.8.m4.1.1.1.1.2.2.cmml" xref="S2.SS2.SSS3.p1.8.m4.1.1.1.1.2.2">𝐹</ci><ci id="S2.SS2.SSS3.p1.8.m4.1.1.1.1.2.3a.cmml" 
xref="S2.SS2.SSS3.p1.8.m4.1.1.1.1.2.3"><mtext id="S2.SS2.SSS3.p1.8.m4.1.1.1.1.2.3.cmml" mathsize="70%" xref="S2.SS2.SSS3.p1.8.m4.1.1.1.1.2.3">low</mtext></ci></apply><ci id="S2.SS2.SSS3.p1.8.m4.1.1.1.1.3a.cmml" xref="S2.SS2.SSS3.p1.8.m4.1.1.1.1.3"><mtext id="S2.SS2.SSS3.p1.8.m4.1.1.1.1.3.cmml" mathsize="70%" xref="S2.SS2.SSS3.p1.8.m4.1.1.1.1.3">th</mtext></ci></apply><apply id="S2.SS2.SSS3.p1.8.m4.2.2.2.2.cmml" xref="S2.SS2.SSS3.p1.8.m4.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.8.m4.2.2.2.2.1.cmml" xref="S2.SS2.SSS3.p1.8.m4.2.2.2.2">superscript</csymbol><apply id="S2.SS2.SSS3.p1.8.m4.2.2.2.2.2.cmml" xref="S2.SS2.SSS3.p1.8.m4.2.2.2.2"><csymbol cd="ambiguous" id="S2.SS2.SSS3.p1.8.m4.2.2.2.2.2.1.cmml" xref="S2.SS2.SSS3.p1.8.m4.2.2.2.2">subscript</csymbol><ci id="S2.SS2.SSS3.p1.8.m4.2.2.2.2.2.2.cmml" xref="S2.SS2.SSS3.p1.8.m4.2.2.2.2.2.2">𝐹</ci><ci id="S2.SS2.SSS3.p1.8.m4.2.2.2.2.2.3a.cmml" xref="S2.SS2.SSS3.p1.8.m4.2.2.2.2.2.3"><mtext id="S2.SS2.SSS3.p1.8.m4.2.2.2.2.2.3.cmml" mathsize="70%" xref="S2.SS2.SSS3.p1.8.m4.2.2.2.2.2.3">high</mtext></ci></apply><ci id="S2.SS2.SSS3.p1.8.m4.2.2.2.2.3a.cmml" xref="S2.SS2.SSS3.p1.8.m4.2.2.2.2.3"><mtext id="S2.SS2.SSS3.p1.8.m4.2.2.2.2.3.cmml" mathsize="70%" xref="S2.SS2.SSS3.p1.8.m4.2.2.2.2.3">th</mtext></ci></apply></list></annotation-xml><annotation encoding="application/x-tex" id="S2.SS2.SSS3.p1.8.m4.2c">F_{\text{low}}^{\text{th}},F_{\text{high}}^{\text{th}}</annotation><annotation encoding="application/x-llamapun" id="S2.SS2.SSS3.p1.8.m4.2d">italic_F start_POSTSUBSCRIPT low end_POSTSUBSCRIPT start_POSTSUPERSCRIPT th end_POSTSUPERSCRIPT , italic_F start_POSTSUBSCRIPT high end_POSTSUBSCRIPT start_POSTSUPERSCRIPT th end_POSTSUPERSCRIPT</annotation></semantics></math> are the predefined thresholds for low and high switching frequencies, respectively.</p> </div> </section> </section> <section class="ltx_subsection" id="S2.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span 
class="ltx_text" id="S2.SS3.5.1.1">II-C</span> </span><span class="ltx_text ltx_font_italic" id="S2.SS3.6.2">Deep Optimistic Linear Support</span> </h3> <figure class="ltx_float ltx_float_algorithm ltx_framed ltx_framed_top" id="alg2"> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_float"><span class="ltx_text ltx_font_bold" id="alg2.2.1.1">Algorithm 2</span> </span> Deep Optimistic Linear Support (DOL)</figcaption> <div class="ltx_listing ltx_listing" id="alg2.3"> <div class="ltx_listingline" id="alg2.l0"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l0.1.1.1" style="font-size:80%;">Require:</span></span>  single-policy RL algorithm <span class="ltx_text ltx_font_typewriter" id="alg2.l0.2">MO PPO</span>, maximum number of iterations <math alttext="k^{max}" class="ltx_Math" display="inline" id="alg2.l0.m1.1"><semantics id="alg2.l0.m1.1a"><msup id="alg2.l0.m1.1.1" xref="alg2.l0.m1.1.1.cmml"><mi id="alg2.l0.m1.1.1.2" xref="alg2.l0.m1.1.1.2.cmml">k</mi><mrow id="alg2.l0.m1.1.1.3" xref="alg2.l0.m1.1.1.3.cmml"><mi id="alg2.l0.m1.1.1.3.2" xref="alg2.l0.m1.1.1.3.2.cmml">m</mi><mo id="alg2.l0.m1.1.1.3.1" xref="alg2.l0.m1.1.1.3.1.cmml">⁢</mo><mi id="alg2.l0.m1.1.1.3.3" xref="alg2.l0.m1.1.1.3.3.cmml">a</mi><mo id="alg2.l0.m1.1.1.3.1a" xref="alg2.l0.m1.1.1.3.1.cmml">⁢</mo><mi id="alg2.l0.m1.1.1.3.4" xref="alg2.l0.m1.1.1.3.4.cmml">x</mi></mrow></msup><annotation-xml encoding="MathML-Content" id="alg2.l0.m1.1b"><apply id="alg2.l0.m1.1.1.cmml" xref="alg2.l0.m1.1.1"><csymbol cd="ambiguous" id="alg2.l0.m1.1.1.1.cmml" xref="alg2.l0.m1.1.1">superscript</csymbol><ci id="alg2.l0.m1.1.1.2.cmml" xref="alg2.l0.m1.1.1.2">𝑘</ci><apply id="alg2.l0.m1.1.1.3.cmml" xref="alg2.l0.m1.1.1.3"><times id="alg2.l0.m1.1.1.3.1.cmml" xref="alg2.l0.m1.1.1.3.1"></times><ci id="alg2.l0.m1.1.1.3.2.cmml" xref="alg2.l0.m1.1.1.3.2">𝑚</ci><ci id="alg2.l0.m1.1.1.3.3.cmml" xref="alg2.l0.m1.1.1.3.3">𝑎</ci><ci id="alg2.l0.m1.1.1.3.4.cmml" 
xref="alg2.l0.m1.1.1.3.4">𝑥</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l0.m1.1c">k^{max}</annotation><annotation encoding="application/x-llamapun" id="alg2.l0.m1.1d">italic_k start_POSTSUPERSCRIPT italic_m italic_a italic_x end_POSTSUPERSCRIPT</annotation></semantics></math>, reuse option (<span class="ltx_text ltx_font_typewriter" id="alg2.l0.3">no reuse</span>, <span class="ltx_text ltx_font_typewriter" id="alg2.l0.4">full reuse</span>, <span class="ltx_text ltx_font_typewriter" id="alg2.l0.5">partial reuse</span>) </div> <div class="ltx_listingline" id="alg2.l1"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l1.1.1.1" style="font-size:80%;">1:</span></span>  Initialize partial CCS <math alttext="\Omega^{s}\leftarrow\emptyset" class="ltx_Math" display="inline" id="alg2.l1.m1.1"><semantics id="alg2.l1.m1.1a"><mrow id="alg2.l1.m1.1.1" xref="alg2.l1.m1.1.1.cmml"><msup id="alg2.l1.m1.1.1.2" xref="alg2.l1.m1.1.1.2.cmml"><mi id="alg2.l1.m1.1.1.2.2" mathvariant="normal" xref="alg2.l1.m1.1.1.2.2.cmml">Ω</mi><mi id="alg2.l1.m1.1.1.2.3" xref="alg2.l1.m1.1.1.2.3.cmml">s</mi></msup><mo id="alg2.l1.m1.1.1.1" stretchy="false" xref="alg2.l1.m1.1.1.1.cmml">←</mo><mi id="alg2.l1.m1.1.1.3" mathvariant="normal" xref="alg2.l1.m1.1.1.3.cmml">∅</mi></mrow><annotation-xml encoding="MathML-Content" id="alg2.l1.m1.1b"><apply id="alg2.l1.m1.1.1.cmml" xref="alg2.l1.m1.1.1"><ci id="alg2.l1.m1.1.1.1.cmml" xref="alg2.l1.m1.1.1.1">←</ci><apply id="alg2.l1.m1.1.1.2.cmml" xref="alg2.l1.m1.1.1.2"><csymbol cd="ambiguous" id="alg2.l1.m1.1.1.2.1.cmml" xref="alg2.l1.m1.1.1.2">superscript</csymbol><ci id="alg2.l1.m1.1.1.2.2.cmml" xref="alg2.l1.m1.1.1.2.2">Ω</ci><ci id="alg2.l1.m1.1.1.2.3.cmml" xref="alg2.l1.m1.1.1.2.3">𝑠</ci></apply><emptyset id="alg2.l1.m1.1.1.3.cmml" xref="alg2.l1.m1.1.1.3"></emptyset></apply></annotation-xml><annotation encoding="application/x-tex" 
id="alg2.l1.m1.1c">\Omega^{s}\leftarrow\emptyset</annotation><annotation encoding="application/x-llamapun" id="alg2.l1.m1.1d">roman_Ω start_POSTSUPERSCRIPT italic_s end_POSTSUPERSCRIPT ← ∅</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg2.l2"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l2.1.1.1" style="font-size:80%;">2:</span></span>  Initialize set of visited weights <math alttext="W\leftarrow\emptyset" class="ltx_Math" display="inline" id="alg2.l2.m1.1"><semantics id="alg2.l2.m1.1a"><mrow id="alg2.l2.m1.1.1" xref="alg2.l2.m1.1.1.cmml"><mi id="alg2.l2.m1.1.1.2" xref="alg2.l2.m1.1.1.2.cmml">W</mi><mo id="alg2.l2.m1.1.1.1" stretchy="false" xref="alg2.l2.m1.1.1.1.cmml">←</mo><mi id="alg2.l2.m1.1.1.3" mathvariant="normal" xref="alg2.l2.m1.1.1.3.cmml">∅</mi></mrow><annotation-xml encoding="MathML-Content" id="alg2.l2.m1.1b"><apply id="alg2.l2.m1.1.1.cmml" xref="alg2.l2.m1.1.1"><ci id="alg2.l2.m1.1.1.1.cmml" xref="alg2.l2.m1.1.1.1">←</ci><ci id="alg2.l2.m1.1.1.2.cmml" xref="alg2.l2.m1.1.1.2">𝑊</ci><emptyset id="alg2.l2.m1.1.1.3.cmml" xref="alg2.l2.m1.1.1.3"></emptyset></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l2.m1.1c">W\leftarrow\emptyset</annotation><annotation encoding="application/x-llamapun" id="alg2.l2.m1.1d">italic_W ← ∅</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg2.l3"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l3.1.1.1" style="font-size:80%;">3:</span></span>  Initialize priority queue <math alttext="Q\leftarrow\emptyset" class="ltx_Math" display="inline" id="alg2.l3.m1.1"><semantics id="alg2.l3.m1.1a"><mrow id="alg2.l3.m1.1.1" xref="alg2.l3.m1.1.1.cmml"><mi id="alg2.l3.m1.1.1.2" xref="alg2.l3.m1.1.1.2.cmml">Q</mi><mo id="alg2.l3.m1.1.1.1" stretchy="false" xref="alg2.l3.m1.1.1.1.cmml">←</mo><mi id="alg2.l3.m1.1.1.3" mathvariant="normal" xref="alg2.l3.m1.1.1.3.cmml">∅</mi></mrow><annotation-xml 
encoding="MathML-Content" id="alg2.l3.m1.1b"><apply id="alg2.l3.m1.1.1.cmml" xref="alg2.l3.m1.1.1"><ci id="alg2.l3.m1.1.1.1.cmml" xref="alg2.l3.m1.1.1.1">←</ci><ci id="alg2.l3.m1.1.1.2.cmml" xref="alg2.l3.m1.1.1.2">𝑄</ci><emptyset id="alg2.l3.m1.1.1.3.cmml" xref="alg2.l3.m1.1.1.3"></emptyset></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l3.m1.1c">Q\leftarrow\emptyset</annotation><annotation encoding="application/x-llamapun" id="alg2.l3.m1.1d">italic_Q ← ∅</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg2.l4"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l4.1.1.1" style="font-size:80%;">4:</span></span>  Initialize model repository <math alttext="\text{Models}\leftarrow\emptyset" class="ltx_Math" display="inline" id="alg2.l4.m1.1"><semantics id="alg2.l4.m1.1a"><mrow id="alg2.l4.m1.1.1" xref="alg2.l4.m1.1.1.cmml"><mtext id="alg2.l4.m1.1.1.2" xref="alg2.l4.m1.1.1.2a.cmml">Models</mtext><mo id="alg2.l4.m1.1.1.1" stretchy="false" xref="alg2.l4.m1.1.1.1.cmml">←</mo><mi id="alg2.l4.m1.1.1.3" mathvariant="normal" xref="alg2.l4.m1.1.1.3.cmml">∅</mi></mrow><annotation-xml encoding="MathML-Content" id="alg2.l4.m1.1b"><apply id="alg2.l4.m1.1.1.cmml" xref="alg2.l4.m1.1.1"><ci id="alg2.l4.m1.1.1.1.cmml" xref="alg2.l4.m1.1.1.1">←</ci><ci id="alg2.l4.m1.1.1.2a.cmml" xref="alg2.l4.m1.1.1.2"><mtext id="alg2.l4.m1.1.1.2.cmml" xref="alg2.l4.m1.1.1.2">Models</mtext></ci><emptyset id="alg2.l4.m1.1.1.3.cmml" xref="alg2.l4.m1.1.1.3"></emptyset></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l4.m1.1c">\text{Models}\leftarrow\emptyset</annotation><annotation encoding="application/x-llamapun" id="alg2.l4.m1.1d">Models ← ∅</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg2.l5"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l5.1.1.1" style="font-size:80%;">5:</span></span>  Initialize iteration <math alttext="k" class="ltx_Math" 
display="inline" id="alg2.l5.m1.1"><semantics id="alg2.l5.m1.1a"><mi id="alg2.l5.m1.1.1" xref="alg2.l5.m1.1.1.cmml">k</mi><annotation-xml encoding="MathML-Content" id="alg2.l5.m1.1b"><ci id="alg2.l5.m1.1.1.cmml" xref="alg2.l5.m1.1.1">𝑘</ci></annotation-xml><annotation encoding="application/x-tex" id="alg2.l5.m1.1c">k</annotation><annotation encoding="application/x-llamapun" id="alg2.l5.m1.1d">italic_k</annotation></semantics></math> to 0. </div> <div class="ltx_listingline" id="alg2.l6"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l6.1.1.1" style="font-size:80%;">6:</span></span>  <span class="ltx_text ltx_font_bold" id="alg2.l6.2">for all</span> extreme weights <math alttext="w_{e}" class="ltx_Math" display="inline" id="alg2.l6.m1.1"><semantics id="alg2.l6.m1.1a"><msub id="alg2.l6.m1.1.1" xref="alg2.l6.m1.1.1.cmml"><mi id="alg2.l6.m1.1.1.2" xref="alg2.l6.m1.1.1.2.cmml">w</mi><mi id="alg2.l6.m1.1.1.3" xref="alg2.l6.m1.1.1.3.cmml">e</mi></msub><annotation-xml encoding="MathML-Content" id="alg2.l6.m1.1b"><apply id="alg2.l6.m1.1.1.cmml" xref="alg2.l6.m1.1.1"><csymbol cd="ambiguous" id="alg2.l6.m1.1.1.1.cmml" xref="alg2.l6.m1.1.1">subscript</csymbol><ci id="alg2.l6.m1.1.1.2.cmml" xref="alg2.l6.m1.1.1.2">𝑤</ci><ci id="alg2.l6.m1.1.1.3.cmml" xref="alg2.l6.m1.1.1.3">𝑒</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l6.m1.1c">w_{e}</annotation><annotation encoding="application/x-llamapun" id="alg2.l6.m1.1d">italic_w start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT</annotation></semantics></math> of the weight simplex <span class="ltx_text ltx_font_bold" id="alg2.l6.3">do</span> </div> <div class="ltx_listingline" id="alg2.l7"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l7.1.1.1" style="font-size:80%;">7:</span></span>     Add <math alttext="(w_{e},\infty)" class="ltx_Math" display="inline" id="alg2.l7.m1.2"><semantics id="alg2.l7.m1.2a"><mrow id="alg2.l7.m1.2.2.1" 
xref="alg2.l7.m1.2.2.2.cmml"><mo id="alg2.l7.m1.2.2.1.2" stretchy="false" xref="alg2.l7.m1.2.2.2.cmml">(</mo><msub id="alg2.l7.m1.2.2.1.1" xref="alg2.l7.m1.2.2.1.1.cmml"><mi id="alg2.l7.m1.2.2.1.1.2" xref="alg2.l7.m1.2.2.1.1.2.cmml">w</mi><mi id="alg2.l7.m1.2.2.1.1.3" xref="alg2.l7.m1.2.2.1.1.3.cmml">e</mi></msub><mo id="alg2.l7.m1.2.2.1.3" xref="alg2.l7.m1.2.2.2.cmml">,</mo><mi id="alg2.l7.m1.1.1" mathvariant="normal" xref="alg2.l7.m1.1.1.cmml">∞</mi><mo id="alg2.l7.m1.2.2.1.4" stretchy="false" xref="alg2.l7.m1.2.2.2.cmml">)</mo></mrow><annotation-xml encoding="MathML-Content" id="alg2.l7.m1.2b"><interval closure="open" id="alg2.l7.m1.2.2.2.cmml" xref="alg2.l7.m1.2.2.1"><apply id="alg2.l7.m1.2.2.1.1.cmml" xref="alg2.l7.m1.2.2.1.1"><csymbol cd="ambiguous" id="alg2.l7.m1.2.2.1.1.1.cmml" xref="alg2.l7.m1.2.2.1.1">subscript</csymbol><ci id="alg2.l7.m1.2.2.1.1.2.cmml" xref="alg2.l7.m1.2.2.1.1.2">𝑤</ci><ci id="alg2.l7.m1.2.2.1.1.3.cmml" xref="alg2.l7.m1.2.2.1.1.3">𝑒</ci></apply><infinity id="alg2.l7.m1.1.1.cmml" xref="alg2.l7.m1.1.1"></infinity></interval></annotation-xml><annotation encoding="application/x-tex" id="alg2.l7.m1.2c">(w_{e},\infty)</annotation><annotation encoding="application/x-llamapun" id="alg2.l7.m1.2d">( italic_w start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT , ∞ )</annotation></semantics></math> to <math alttext="Q" class="ltx_Math" display="inline" id="alg2.l7.m2.1"><semantics id="alg2.l7.m2.1a"><mi id="alg2.l7.m2.1.1" xref="alg2.l7.m2.1.1.cmml">Q</mi><annotation-xml encoding="MathML-Content" id="alg2.l7.m2.1b"><ci id="alg2.l7.m2.1.1.cmml" xref="alg2.l7.m2.1.1">𝑄</ci></annotation-xml><annotation encoding="application/x-tex" id="alg2.l7.m2.1c">Q</annotation><annotation encoding="application/x-llamapun" id="alg2.l7.m2.1d">italic_Q</annotation></semantics></math> {Add extrema with infinite priority} </div> <div class="ltx_listingline" id="alg2.l8"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l8.1.1.1" 
style="font-size:80%;">8:</span></span>  <span class="ltx_text ltx_font_bold" id="alg2.l8.2">end</span> <span class="ltx_text ltx_font_bold" id="alg2.l8.3">for</span> </div> <div class="ltx_listingline" id="alg2.l9"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l9.1.1.1" style="font-size:80%;">9:</span></span>  <span class="ltx_text ltx_font_bold" id="alg2.l9.2">while</span> <math alttext="Q" class="ltx_Math" display="inline" id="alg2.l9.m1.1"><semantics id="alg2.l9.m1.1a"><mi id="alg2.l9.m1.1.1" xref="alg2.l9.m1.1.1.cmml">Q</mi><annotation-xml encoding="MathML-Content" id="alg2.l9.m1.1b"><ci id="alg2.l9.m1.1.1.cmml" xref="alg2.l9.m1.1.1">𝑄</ci></annotation-xml><annotation encoding="application/x-tex" id="alg2.l9.m1.1c">Q</annotation><annotation encoding="application/x-llamapun" id="alg2.l9.m1.1d">italic_Q</annotation></semantics></math> is not empty <span class="ltx_text ltx_font_bold" id="alg2.l9.3">and</span> <math alttext="k&lt;k^{max}" class="ltx_Math" display="inline" id="alg2.l9.m2.1"><semantics id="alg2.l9.m2.1a"><mrow id="alg2.l9.m2.1.1" xref="alg2.l9.m2.1.1.cmml"><mi id="alg2.l9.m2.1.1.2" xref="alg2.l9.m2.1.1.2.cmml">k</mi><mo id="alg2.l9.m2.1.1.1" xref="alg2.l9.m2.1.1.1.cmml">&lt;</mo><msup id="alg2.l9.m2.1.1.3" xref="alg2.l9.m2.1.1.3.cmml"><mi id="alg2.l9.m2.1.1.3.2" xref="alg2.l9.m2.1.1.3.2.cmml">k</mi><mrow id="alg2.l9.m2.1.1.3.3" xref="alg2.l9.m2.1.1.3.3.cmml"><mi id="alg2.l9.m2.1.1.3.3.2" xref="alg2.l9.m2.1.1.3.3.2.cmml">m</mi><mo id="alg2.l9.m2.1.1.3.3.1" xref="alg2.l9.m2.1.1.3.3.1.cmml">⁢</mo><mi id="alg2.l9.m2.1.1.3.3.3" xref="alg2.l9.m2.1.1.3.3.3.cmml">a</mi><mo id="alg2.l9.m2.1.1.3.3.1a" xref="alg2.l9.m2.1.1.3.3.1.cmml">⁢</mo><mi id="alg2.l9.m2.1.1.3.3.4" xref="alg2.l9.m2.1.1.3.3.4.cmml">x</mi></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="alg2.l9.m2.1b"><apply id="alg2.l9.m2.1.1.cmml" xref="alg2.l9.m2.1.1"><lt id="alg2.l9.m2.1.1.1.cmml" xref="alg2.l9.m2.1.1.1"></lt><ci id="alg2.l9.m2.1.1.2.cmml" 
xref="alg2.l9.m2.1.1.2">𝑘</ci><apply id="alg2.l9.m2.1.1.3.cmml" xref="alg2.l9.m2.1.1.3"><csymbol cd="ambiguous" id="alg2.l9.m2.1.1.3.1.cmml" xref="alg2.l9.m2.1.1.3">superscript</csymbol><ci id="alg2.l9.m2.1.1.3.2.cmml" xref="alg2.l9.m2.1.1.3.2">𝑘</ci><apply id="alg2.l9.m2.1.1.3.3.cmml" xref="alg2.l9.m2.1.1.3.3"><times id="alg2.l9.m2.1.1.3.3.1.cmml" xref="alg2.l9.m2.1.1.3.3.1"></times><ci id="alg2.l9.m2.1.1.3.3.2.cmml" xref="alg2.l9.m2.1.1.3.3.2">𝑚</ci><ci id="alg2.l9.m2.1.1.3.3.3.cmml" xref="alg2.l9.m2.1.1.3.3.3">𝑎</ci><ci id="alg2.l9.m2.1.1.3.3.4.cmml" xref="alg2.l9.m2.1.1.3.3.4">𝑥</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l9.m2.1c">k&lt;k^{max}</annotation><annotation encoding="application/x-llamapun" id="alg2.l9.m2.1d">italic_k &lt; italic_k start_POSTSUPERSCRIPT italic_m italic_a italic_x end_POSTSUPERSCRIPT</annotation></semantics></math> <span class="ltx_text ltx_font_bold" id="alg2.l9.4">do</span> </div> <div class="ltx_listingline" id="alg2.l10"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l10.1.1.1" style="font-size:80%;">10:</span></span>     Pop <math alttext="w" class="ltx_Math" display="inline" id="alg2.l10.m1.1"><semantics id="alg2.l10.m1.1a"><mi id="alg2.l10.m1.1.1" xref="alg2.l10.m1.1.1.cmml">w</mi><annotation-xml encoding="MathML-Content" id="alg2.l10.m1.1b"><ci id="alg2.l10.m1.1.1.cmml" xref="alg2.l10.m1.1.1">𝑤</ci></annotation-xml><annotation encoding="application/x-tex" id="alg2.l10.m1.1c">w</annotation><annotation encoding="application/x-llamapun" id="alg2.l10.m1.1d">italic_w</annotation></semantics></math> from <math alttext="Q" class="ltx_Math" display="inline" id="alg2.l10.m2.1"><semantics id="alg2.l10.m2.1a"><mi id="alg2.l10.m2.1.1" xref="alg2.l10.m2.1.1.cmml">Q</mi><annotation-xml encoding="MathML-Content" id="alg2.l10.m2.1b"><ci id="alg2.l10.m2.1.1.cmml" xref="alg2.l10.m2.1.1">𝑄</ci></annotation-xml><annotation encoding="application/x-tex" 
id="alg2.l10.m2.1c">Q</annotation><annotation encoding="application/x-llamapun" id="alg2.l10.m2.1d">italic_Q</annotation></semantics></math> with the highest priority <math alttext="q_{\text{max}}" class="ltx_Math" display="inline" id="alg2.l10.m3.1"><semantics id="alg2.l10.m3.1a"><msub id="alg2.l10.m3.1.1" xref="alg2.l10.m3.1.1.cmml"><mi id="alg2.l10.m3.1.1.2" xref="alg2.l10.m3.1.1.2.cmml">q</mi><mtext id="alg2.l10.m3.1.1.3" xref="alg2.l10.m3.1.1.3a.cmml">max</mtext></msub><annotation-xml encoding="MathML-Content" id="alg2.l10.m3.1b"><apply id="alg2.l10.m3.1.1.cmml" xref="alg2.l10.m3.1.1"><csymbol cd="ambiguous" id="alg2.l10.m3.1.1.1.cmml" xref="alg2.l10.m3.1.1">subscript</csymbol><ci id="alg2.l10.m3.1.1.2.cmml" xref="alg2.l10.m3.1.1.2">𝑞</ci><ci id="alg2.l10.m3.1.1.3a.cmml" xref="alg2.l10.m3.1.1.3"><mtext id="alg2.l10.m3.1.1.3.cmml" mathsize="70%" xref="alg2.l10.m3.1.1.3">max</mtext></ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l10.m3.1c">q_{\text{max}}</annotation><annotation encoding="application/x-llamapun" id="alg2.l10.m3.1d">italic_q start_POSTSUBSCRIPT max end_POSTSUBSCRIPT</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg2.l11"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l11.1.1.1" style="font-size:80%;">11:</span></span>     <span class="ltx_text ltx_font_bold" id="alg2.l11.2">if</span> reuse option is <span class="ltx_text ltx_font_typewriter" id="alg2.l11.3">no reuse</span> <span class="ltx_text ltx_font_bold" id="alg2.l11.4">or</span> <span class="ltx_text ltx_markedasmath" id="alg2.l11.5">Models</span> is empty <span class="ltx_text ltx_font_bold" id="alg2.l11.6">then</span> </div> <div class="ltx_listingline" id="alg2.l12"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l12.1.1.1" style="font-size:80%;">12:</span></span>        Initialize model parameters <math alttext="\theta" class="ltx_Math" display="inline" 
id="alg2.l12.m1.1"><semantics id="alg2.l12.m1.1a"><mi id="alg2.l12.m1.1.1" xref="alg2.l12.m1.1.1.cmml">θ</mi><annotation-xml encoding="MathML-Content" id="alg2.l12.m1.1b"><ci id="alg2.l12.m1.1.1.cmml" xref="alg2.l12.m1.1.1">𝜃</ci></annotation-xml><annotation encoding="application/x-tex" id="alg2.l12.m1.1c">\theta</annotation><annotation encoding="application/x-llamapun" id="alg2.l12.m1.1d">italic_θ</annotation></semantics></math> randomly </div> <div class="ltx_listingline" id="alg2.l13"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l13.1.1.1" style="font-size:80%;">13:</span></span>     <span class="ltx_text ltx_font_bold" id="alg2.l13.2">else</span> </div> <div class="ltx_listingline" id="alg2.l14"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l14.1.1.1" style="font-size:80%;">14:</span></span>        Find the closest weight <math alttext="w_{\text{closest}}" class="ltx_Math" display="inline" id="alg2.l14.m1.1"><semantics id="alg2.l14.m1.1a"><msub id="alg2.l14.m1.1.1" xref="alg2.l14.m1.1.1.cmml"><mi id="alg2.l14.m1.1.1.2" xref="alg2.l14.m1.1.1.2.cmml">w</mi><mtext id="alg2.l14.m1.1.1.3" xref="alg2.l14.m1.1.1.3a.cmml">closest</mtext></msub><annotation-xml encoding="MathML-Content" id="alg2.l14.m1.1b"><apply id="alg2.l14.m1.1.1.cmml" xref="alg2.l14.m1.1.1"><csymbol cd="ambiguous" id="alg2.l14.m1.1.1.1.cmml" xref="alg2.l14.m1.1.1">subscript</csymbol><ci id="alg2.l14.m1.1.1.2.cmml" xref="alg2.l14.m1.1.1.2">𝑤</ci><ci id="alg2.l14.m1.1.1.3a.cmml" xref="alg2.l14.m1.1.1.3"><mtext id="alg2.l14.m1.1.1.3.cmml" mathsize="70%" xref="alg2.l14.m1.1.1.3">closest</mtext></ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l14.m1.1c">w_{\text{closest}}</annotation><annotation encoding="application/x-llamapun" id="alg2.l14.m1.1d">italic_w start_POSTSUBSCRIPT closest end_POSTSUBSCRIPT</annotation></semantics></math> in <math alttext="W" class="ltx_Math" display="inline" id="alg2.l14.m2.1"><semantics 
id="alg2.l14.m2.1a"><mi id="alg2.l14.m2.1.1" xref="alg2.l14.m2.1.1.cmml">W</mi><annotation-xml encoding="MathML-Content" id="alg2.l14.m2.1b"><ci id="alg2.l14.m2.1.1.cmml" xref="alg2.l14.m2.1.1">𝑊</ci></annotation-xml><annotation encoding="application/x-tex" id="alg2.l14.m2.1c">W</annotation><annotation encoding="application/x-llamapun" id="alg2.l14.m2.1d">italic_W</annotation></semantics></math> to <math alttext="w" class="ltx_Math" display="inline" id="alg2.l14.m3.1"><semantics id="alg2.l14.m3.1a"><mi id="alg2.l14.m3.1.1" xref="alg2.l14.m3.1.1.cmml">w</mi><annotation-xml encoding="MathML-Content" id="alg2.l14.m3.1b"><ci id="alg2.l14.m3.1.1.cmml" xref="alg2.l14.m3.1.1">𝑤</ci></annotation-xml><annotation encoding="application/x-tex" id="alg2.l14.m3.1c">w</annotation><annotation encoding="application/x-llamapun" id="alg2.l14.m3.1d">italic_w</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg2.l15"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l15.1.1.1" style="font-size:80%;">15:</span></span>        Initialize model parameters <math alttext="\theta" class="ltx_Math" display="inline" id="alg2.l15.m1.1"><semantics id="alg2.l15.m1.1a"><mi id="alg2.l15.m1.1.1" xref="alg2.l15.m1.1.1.cmml">θ</mi><annotation-xml encoding="MathML-Content" id="alg2.l15.m1.1b"><ci id="alg2.l15.m1.1.1.cmml" xref="alg2.l15.m1.1.1">𝜃</ci></annotation-xml><annotation encoding="application/x-tex" id="alg2.l15.m1.1c">\theta</annotation><annotation encoding="application/x-llamapun" id="alg2.l15.m1.1d">italic_θ</annotation></semantics></math> with parameters from <math alttext="\text{Models}[w_{\text{closest}}]" class="ltx_Math" display="inline" id="alg2.l15.m2.1"><semantics id="alg2.l15.m2.1a"><mrow id="alg2.l15.m2.1.1" xref="alg2.l15.m2.1.1.cmml"><mtext id="alg2.l15.m2.1.1.3" xref="alg2.l15.m2.1.1.3a.cmml">Models</mtext><mo id="alg2.l15.m2.1.1.2" xref="alg2.l15.m2.1.1.2.cmml">⁢</mo><mrow id="alg2.l15.m2.1.1.1.1" 
xref="alg2.l15.m2.1.1.1.2.cmml"><mo id="alg2.l15.m2.1.1.1.1.2" stretchy="false" xref="alg2.l15.m2.1.1.1.2.1.cmml">[</mo><msub id="alg2.l15.m2.1.1.1.1.1" xref="alg2.l15.m2.1.1.1.1.1.cmml"><mi id="alg2.l15.m2.1.1.1.1.1.2" xref="alg2.l15.m2.1.1.1.1.1.2.cmml">w</mi><mtext id="alg2.l15.m2.1.1.1.1.1.3" xref="alg2.l15.m2.1.1.1.1.1.3a.cmml">closest</mtext></msub><mo id="alg2.l15.m2.1.1.1.1.3" stretchy="false" xref="alg2.l15.m2.1.1.1.2.1.cmml">]</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="alg2.l15.m2.1b"><apply id="alg2.l15.m2.1.1.cmml" xref="alg2.l15.m2.1.1"><times id="alg2.l15.m2.1.1.2.cmml" xref="alg2.l15.m2.1.1.2"></times><ci id="alg2.l15.m2.1.1.3a.cmml" xref="alg2.l15.m2.1.1.3"><mtext id="alg2.l15.m2.1.1.3.cmml" xref="alg2.l15.m2.1.1.3">Models</mtext></ci><apply id="alg2.l15.m2.1.1.1.2.cmml" xref="alg2.l15.m2.1.1.1.1"><csymbol cd="latexml" id="alg2.l15.m2.1.1.1.2.1.cmml" xref="alg2.l15.m2.1.1.1.1.2">delimited-[]</csymbol><apply id="alg2.l15.m2.1.1.1.1.1.cmml" xref="alg2.l15.m2.1.1.1.1.1"><csymbol cd="ambiguous" id="alg2.l15.m2.1.1.1.1.1.1.cmml" xref="alg2.l15.m2.1.1.1.1.1">subscript</csymbol><ci id="alg2.l15.m2.1.1.1.1.1.2.cmml" xref="alg2.l15.m2.1.1.1.1.1.2">𝑤</ci><ci id="alg2.l15.m2.1.1.1.1.1.3a.cmml" xref="alg2.l15.m2.1.1.1.1.1.3"><mtext id="alg2.l15.m2.1.1.1.1.1.3.cmml" mathsize="70%" xref="alg2.l15.m2.1.1.1.1.1.3">closest</mtext></ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l15.m2.1c">\text{Models}[w_{\text{closest}}]</annotation><annotation encoding="application/x-llamapun" id="alg2.l15.m2.1d">Models [ italic_w start_POSTSUBSCRIPT closest end_POSTSUBSCRIPT ]</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg2.l16"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l16.1.1.1" style="font-size:80%;">16:</span></span>        <span class="ltx_text ltx_font_bold" id="alg2.l16.2">if</span> reuse option is <span class="ltx_text ltx_font_typewriter" 
id="alg2.l16.3">partial reuse</span> <span class="ltx_text ltx_font_bold" id="alg2.l16.4">then</span> </div> <div class="ltx_listingline" id="alg2.l17"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l17.1.1.1" style="font-size:80%;">17:</span></span>           Randomly reinitialize the last layer of the model </div> <div class="ltx_listingline" id="alg2.l18"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l18.1.1.1" style="font-size:80%;">18:</span></span>        <span class="ltx_text ltx_font_bold" id="alg2.l18.2">end</span> <span class="ltx_text ltx_font_bold" id="alg2.l18.3">if</span> </div> <div class="ltx_listingline" id="alg2.l19"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l19.1.1.1" style="font-size:80%;">19:</span></span>     <span class="ltx_text ltx_font_bold" id="alg2.l19.2">end</span> <span class="ltx_text ltx_font_bold" id="alg2.l19.3">if</span> </div> <div class="ltx_listingline" id="alg2.l20"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l20.1.1.1" style="font-size:80%;">20:</span></span>     <math alttext="\theta_{\text{new}}\leftarrow\texttt{MO PPO}(m,w,\theta)" class="ltx_Math" display="inline" id="alg2.l20.m1.3"><semantics id="alg2.l20.m1.3a"><mrow id="alg2.l20.m1.3.4" xref="alg2.l20.m1.3.4.cmml"><msub id="alg2.l20.m1.3.4.2" xref="alg2.l20.m1.3.4.2.cmml"><mi id="alg2.l20.m1.3.4.2.2" xref="alg2.l20.m1.3.4.2.2.cmml">θ</mi><mtext id="alg2.l20.m1.3.4.2.3" xref="alg2.l20.m1.3.4.2.3a.cmml">new</mtext></msub><mo id="alg2.l20.m1.3.4.1" stretchy="false" xref="alg2.l20.m1.3.4.1.cmml">←</mo><mrow id="alg2.l20.m1.3.4.3" xref="alg2.l20.m1.3.4.3.cmml"><mtext class="ltx_mathvariant_monospace" id="alg2.l20.m1.3.4.3.2" xref="alg2.l20.m1.3.4.3.2a.cmml">MO PPO</mtext><mo id="alg2.l20.m1.3.4.3.1" xref="alg2.l20.m1.3.4.3.1.cmml">⁢</mo><mrow id="alg2.l20.m1.3.4.3.3.2" xref="alg2.l20.m1.3.4.3.3.1.cmml"><mo id="alg2.l20.m1.3.4.3.3.2.1" stretchy="false" 
xref="alg2.l20.m1.3.4.3.3.1.cmml">(</mo><mi id="alg2.l20.m1.1.1" xref="alg2.l20.m1.1.1.cmml">m</mi><mo id="alg2.l20.m1.3.4.3.3.2.2" xref="alg2.l20.m1.3.4.3.3.1.cmml">,</mo><mi id="alg2.l20.m1.2.2" xref="alg2.l20.m1.2.2.cmml">w</mi><mo id="alg2.l20.m1.3.4.3.3.2.3" xref="alg2.l20.m1.3.4.3.3.1.cmml">,</mo><mi id="alg2.l20.m1.3.3" xref="alg2.l20.m1.3.3.cmml">θ</mi><mo id="alg2.l20.m1.3.4.3.3.2.4" stretchy="false" xref="alg2.l20.m1.3.4.3.3.1.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="alg2.l20.m1.3b"><apply id="alg2.l20.m1.3.4.cmml" xref="alg2.l20.m1.3.4"><ci id="alg2.l20.m1.3.4.1.cmml" xref="alg2.l20.m1.3.4.1">←</ci><apply id="alg2.l20.m1.3.4.2.cmml" xref="alg2.l20.m1.3.4.2"><csymbol cd="ambiguous" id="alg2.l20.m1.3.4.2.1.cmml" xref="alg2.l20.m1.3.4.2">subscript</csymbol><ci id="alg2.l20.m1.3.4.2.2.cmml" xref="alg2.l20.m1.3.4.2.2">𝜃</ci><ci id="alg2.l20.m1.3.4.2.3a.cmml" xref="alg2.l20.m1.3.4.2.3"><mtext id="alg2.l20.m1.3.4.2.3.cmml" mathsize="70%" xref="alg2.l20.m1.3.4.2.3">new</mtext></ci></apply><apply id="alg2.l20.m1.3.4.3.cmml" xref="alg2.l20.m1.3.4.3"><times id="alg2.l20.m1.3.4.3.1.cmml" xref="alg2.l20.m1.3.4.3.1"></times><ci id="alg2.l20.m1.3.4.3.2a.cmml" xref="alg2.l20.m1.3.4.3.2"><mtext class="ltx_mathvariant_monospace" id="alg2.l20.m1.3.4.3.2.cmml" xref="alg2.l20.m1.3.4.3.2">MO PPO</mtext></ci><vector id="alg2.l20.m1.3.4.3.3.1.cmml" xref="alg2.l20.m1.3.4.3.3.2"><ci id="alg2.l20.m1.1.1.cmml" xref="alg2.l20.m1.1.1">𝑚</ci><ci id="alg2.l20.m1.2.2.cmml" xref="alg2.l20.m1.2.2">𝑤</ci><ci id="alg2.l20.m1.3.3.cmml" xref="alg2.l20.m1.3.3">𝜃</ci></vector></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l20.m1.3c">\theta_{\text{new}}\leftarrow\texttt{MO PPO}(m,w,\theta)</annotation><annotation encoding="application/x-llamapun" id="alg2.l20.m1.3d">italic_θ start_POSTSUBSCRIPT new end_POSTSUBSCRIPT ← MO PPO ( italic_m , italic_w , italic_θ )</annotation></semantics></math> {Train RL algorithm with weight 
<math alttext="w" class="ltx_Math" display="inline" id="alg2.l20.m2.1"><semantics id="alg2.l20.m2.1a"><mi id="alg2.l20.m2.1.1" xref="alg2.l20.m2.1.1.cmml">w</mi><annotation-xml encoding="MathML-Content" id="alg2.l20.m2.1b"><ci id="alg2.l20.m2.1.1.cmml" xref="alg2.l20.m2.1.1">𝑤</ci></annotation-xml><annotation encoding="application/x-tex" id="alg2.l20.m2.1c">w</annotation><annotation encoding="application/x-llamapun" id="alg2.l20.m2.1d">italic_w</annotation></semantics></math>} </div> <div class="ltx_listingline" id="alg2.l21"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l21.1.1.1" style="font-size:80%;">21:</span></span>     <math alttext="\mathbfcal{V}\leftarrow\texttt{MO PPO}(m,w,\theta_{\text{new}})" class="ltx_Math" display="inline" id="alg2.l21.m1.1"><semantics id="alg2.l21.m1.1a"><mrow id="alg2.l21.m1.1.1" xref="alg2.l21.m1.1.1.cmml"><mi id="alg2.l21.m1.1.1.2" mathvariant="normal" xref="alg2.l21.m1.1.1.2.cmml">𝒱</mi><mo id="alg2.l21.m1.1.1.1" stretchy="false" xref="alg2.l21.m1.1.1.1.cmml">←</mo><mrow id="alg2.l21.m1.1.1.3" xref="alg2.l21.m1.1.1.3.cmml"><mtext class="ltx_mathvariant_monospace" id="alg2.l21.m1.1.1.3.2" xref="alg2.l21.m1.1.1.3.2a.cmml">MO PPO</mtext><mo id="alg2.l21.m1.1.1.3.1" xref="alg2.l21.m1.1.1.3.1.cmml">⁢</mo><mrow id="alg2.l21.m1.1.1.3.3" xref="alg2.l21.m1.1.1.3.3.cmml"><mo id="alg2.l21.m1.1.1.3.3.1" stretchy="false" xref="alg2.l21.m1.1.1.3.3.cmml">(</mo><mi id="alg2.l21.m1.1.1.3.3.2" xref="alg2.l21.m1.1.1.3.3.2.cmml">m</mi><mo id="alg2.l21.m1.1.1.3.3.3" xref="alg2.l21.m1.1.1.3.3.cmml">,</mo><mi id="alg2.l21.m1.1.1.3.3.4" xref="alg2.l21.m1.1.1.3.3.4.cmml">w</mi><mo id="alg2.l21.m1.1.1.3.3.5" xref="alg2.l21.m1.1.1.3.3.cmml">,</mo><msub id="alg2.l21.m1.1.1.3.8" xref="alg2.l21.m1.1.1.3.8.cmml"><mi id="alg2.l21.m1.1.1.3.8.2" xref="alg2.l21.m1.1.1.3.8.2.cmml">θ</mi><mtext id="alg2.l21.m1.1.1.3.8.3" xref="alg2.l21.m1.1.1.3.8.3a.cmml">new</mtext></msub><mo id="alg2.l21.m1.1.1.3.3.6" stretchy="false" xref="alg2.l21.m1.1.1.3.3.cmml">)</mo></mrow></mrow></mrow>
<annotation-xml encoding="MathML-Content" id="alg2.l21.m1.1b"><apply id="alg2.l21.m1.1.1.cmml" xref="alg2.l21.m1.1.1"><ci id="alg2.l21.m1.1.1.1.cmml" xref="alg2.l21.m1.1.1.1">←</ci><ci id="alg2.l21.m1.1.1.2.cmml" xref="alg2.l21.m1.1.1.2">𝒱</ci><apply id="alg2.l21.m1.1.1.3.cmml" xref="alg2.l21.m1.1.1.3"><times id="alg2.l21.m1.1.1.3.1.cmml" xref="alg2.l21.m1.1.1.3.1"></times><ci id="alg2.l21.m1.1.1.3.2a.cmml" xref="alg2.l21.m1.1.1.3.2"><mtext class="ltx_mathvariant_monospace" id="alg2.l21.m1.1.1.3.2.cmml" xref="alg2.l21.m1.1.1.3.2">MO PPO</mtext></ci><vector id="alg2.l21.m1.1.1.3.3.cmml" xref="alg2.l21.m1.1.1.3.3"><ci id="alg2.l21.m1.1.1.3.3.2.cmml" xref="alg2.l21.m1.1.1.3.3.2">𝑚</ci><ci id="alg2.l21.m1.1.1.3.3.4.cmml" xref="alg2.l21.m1.1.1.3.3.4">𝑤</ci><apply id="alg2.l21.m1.1.1.3.8.cmml" xref="alg2.l21.m1.1.1.3.8"><csymbol cd="ambiguous" id="alg2.l21.m1.1.1.3.8.1.cmml" xref="alg2.l21.m1.1.1.3.8">subscript</csymbol><ci id="alg2.l21.m1.1.1.3.8.2.cmml" xref="alg2.l21.m1.1.1.3.8.2">𝜃</ci><ci id="alg2.l21.m1.1.1.3.8.3a.cmml" xref="alg2.l21.m1.1.1.3.8.3"><mtext id="alg2.l21.m1.1.1.3.8.3.cmml" mathsize="70%" xref="alg2.l21.m1.1.1.3.8.3">new</mtext></ci></apply></vector></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l21.m1.1c">\mathbfcal{V}\leftarrow\texttt{MO PPO}(m,w,\theta_{\text{new}})</annotation><annotation encoding="application/x-llamapun" id="alg2.l21.m1.1d">roman_𝒱 ← MO PPO ( italic_m , italic_w , italic_θ start_POSTSUBSCRIPT new end_POSTSUBSCRIPT )</annotation></semantics></math>
 {Evaluate RL algorithm with weight <math alttext="w" class="ltx_Math" display="inline" id="alg2.l21.m2.1"><semantics id="alg2.l21.m2.1a"><mi id="alg2.l21.m2.1.1" xref="alg2.l21.m2.1.1.cmml">w</mi><annotation-xml encoding="MathML-Content" id="alg2.l21.m2.1b"><ci id="alg2.l21.m2.1.1.cmml" xref="alg2.l21.m2.1.1">𝑤</ci></annotation-xml><annotation encoding="application/x-tex" id="alg2.l21.m2.1c">w</annotation><annotation encoding="application/x-llamapun" id="alg2.l21.m2.1d">italic_w</annotation></semantics></math>} </div> <div class="ltx_listingline" id="alg2.l22"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l22.1.1.1" style="font-size:80%;">22:</span></span>     <math alttext="W\leftarrow W\cup\{w\}" class="ltx_Math" display="inline" id="alg2.l22.m1.1"><semantics id="alg2.l22.m1.1a"><mrow id="alg2.l22.m1.1.2" xref="alg2.l22.m1.1.2.cmml"><mi id="alg2.l22.m1.1.2.2" xref="alg2.l22.m1.1.2.2.cmml">W</mi><mo id="alg2.l22.m1.1.2.1" stretchy="false" xref="alg2.l22.m1.1.2.1.cmml">←</mo><mrow id="alg2.l22.m1.1.2.3" xref="alg2.l22.m1.1.2.3.cmml"><mi id="alg2.l22.m1.1.2.3.2" xref="alg2.l22.m1.1.2.3.2.cmml">W</mi><mo id="alg2.l22.m1.1.2.3.1" xref="alg2.l22.m1.1.2.3.1.cmml">∪</mo><mrow id="alg2.l22.m1.1.2.3.3.2" xref="alg2.l22.m1.1.2.3.3.1.cmml"><mo id="alg2.l22.m1.1.2.3.3.2.1" stretchy="false" xref="alg2.l22.m1.1.2.3.3.1.cmml">{</mo><mi id="alg2.l22.m1.1.1" xref="alg2.l22.m1.1.1.cmml">w</mi><mo id="alg2.l22.m1.1.2.3.3.2.2" stretchy="false" xref="alg2.l22.m1.1.2.3.3.1.cmml">}</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="alg2.l22.m1.1b"><apply id="alg2.l22.m1.1.2.cmml" xref="alg2.l22.m1.1.2"><ci id="alg2.l22.m1.1.2.1.cmml" xref="alg2.l22.m1.1.2.1">←</ci><ci id="alg2.l22.m1.1.2.2.cmml" 
xref="alg2.l22.m1.1.2.2">𝑊</ci><apply id="alg2.l22.m1.1.2.3.cmml" xref="alg2.l22.m1.1.2.3"><union id="alg2.l22.m1.1.2.3.1.cmml" xref="alg2.l22.m1.1.2.3.1"></union><ci id="alg2.l22.m1.1.2.3.2.cmml" xref="alg2.l22.m1.1.2.3.2">𝑊</ci><set id="alg2.l22.m1.1.2.3.3.1.cmml" xref="alg2.l22.m1.1.2.3.3.2"><ci id="alg2.l22.m1.1.1.cmml" xref="alg2.l22.m1.1.1">𝑤</ci></set></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l22.m1.1c">W\leftarrow W\cup\{w\}</annotation><annotation encoding="application/x-llamapun" id="alg2.l22.m1.1d">italic_W ← italic_W ∪ { italic_w }</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg2.l23"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l23.1.1.1" style="font-size:80%;">23:</span></span>     <span class="ltx_text ltx_font_bold" id="alg2.l23.2">if</span> there exists <math alttext="w^{\prime}" class="ltx_Math" display="inline" id="alg2.l23.m1.1"><semantics id="alg2.l23.m1.1a"><msup id="alg2.l23.m1.1.1" xref="alg2.l23.m1.1.1.cmml"><mi id="alg2.l23.m1.1.1.2" xref="alg2.l23.m1.1.1.2.cmml">w</mi><mo id="alg2.l23.m1.1.1.3" xref="alg2.l23.m1.1.1.3.cmml">′</mo></msup><annotation-xml encoding="MathML-Content" id="alg2.l23.m1.1b"><apply id="alg2.l23.m1.1.1.cmml" xref="alg2.l23.m1.1.1"><csymbol cd="ambiguous" id="alg2.l23.m1.1.1.1.cmml" xref="alg2.l23.m1.1.1">superscript</csymbol><ci id="alg2.l23.m1.1.1.2.cmml" xref="alg2.l23.m1.1.1.2">𝑤</ci><ci id="alg2.l23.m1.1.1.3.cmml" xref="alg2.l23.m1.1.1.3">′</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l23.m1.1c">w^{\prime}</annotation><annotation encoding="application/x-llamapun" id="alg2.l23.m1.1d">italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT</annotation></semantics></math> such that <math alttext="w^{\prime}\cdot\mathbfcal{V}&gt;\max\limits_{U\in\Omega^{s}}w^{\prime}\cdot U" class="ltx_Math" display="inline" id="alg2.l23.m2.1"><semantics id="alg2.l23.m2.1a"><mrow id="alg2.l23.m2.1.1" 
xref="alg2.l23.m2.1.1.cmml"><mrow id="alg2.l23.m2.1.1.2" xref="alg2.l23.m2.1.1.2.cmml"><msup id="alg2.l23.m2.1.1.2.2" xref="alg2.l23.m2.1.1.2.2.cmml"><mi id="alg2.l23.m2.1.1.2.2.2" xref="alg2.l23.m2.1.1.2.2.2.cmml">w</mi><mo id="alg2.l23.m2.1.1.2.2.3" xref="alg2.l23.m2.1.1.2.2.3.cmml">′</mo></msup><mo id="alg2.l23.m2.1.1.2.1" lspace="0.222em" rspace="0.222em" xref="alg2.l23.m2.1.1.2.1.cmml">⋅</mo><mi id="alg2.l23.m2.1.1.2.3" mathvariant="normal" xref="alg2.l23.m2.1.1.2.3.cmml">𝒱</mi></mrow><mo id="alg2.l23.m2.1.1.1" xref="alg2.l23.m2.1.1.1.cmml">&gt;</mo><mrow id="alg2.l23.m2.1.1.4" xref="alg2.l23.m2.1.1.4.cmml"><munder id="alg2.l23.m2.1.1.4.1" xref="alg2.l23.m2.1.1.4.1.cmml"><mi id="alg2.l23.m2.1.1.4.1.2" xref="alg2.l23.m2.1.1.4.1.2.cmml">max</mi><mrow id="alg2.l23.m2.1.1.4.1.3" xref="alg2.l23.m2.1.1.4.1.3.cmml"><mi id="alg2.l23.m2.1.1.4.1.3.2" xref="alg2.l23.m2.1.1.4.1.3.2.cmml">U</mi><mo id="alg2.l23.m2.1.1.4.1.3.1" xref="alg2.l23.m2.1.1.4.1.3.1.cmml">∈</mo><msup id="alg2.l23.m2.1.1.4.1.3.3" xref="alg2.l23.m2.1.1.4.1.3.3.cmml"><mi id="alg2.l23.m2.1.1.4.1.3.3.2" mathvariant="normal" xref="alg2.l23.m2.1.1.4.1.3.3.2.cmml">Ω</mi><mi id="alg2.l23.m2.1.1.4.1.3.3.3" xref="alg2.l23.m2.1.1.4.1.3.3.3.cmml">s</mi></msup></mrow></munder><mo id="alg2.l23.m2.1.1.4a" lspace="0.167em" xref="alg2.l23.m2.1.1.4.cmml">⁡</mo><mrow id="alg2.l23.m2.1.1.4.2" xref="alg2.l23.m2.1.1.4.2.cmml"><msup id="alg2.l23.m2.1.1.4.2.2" xref="alg2.l23.m2.1.1.4.2.2.cmml"><mi id="alg2.l23.m2.1.1.4.2.2.2" xref="alg2.l23.m2.1.1.4.2.2.2.cmml">w</mi><mo id="alg2.l23.m2.1.1.4.2.2.3" xref="alg2.l23.m2.1.1.4.2.2.3.cmml">′</mo></msup><mo id="alg2.l23.m2.1.1.4.2.1" lspace="0.222em" rspace="0.222em" xref="alg2.l23.m2.1.1.4.2.1.cmml">⋅</mo><mi id="alg2.l23.m2.1.1.4.2.3" xref="alg2.l23.m2.1.1.4.2.3.cmml">U</mi></mrow></mrow></mrow>
<annotation-xml encoding="MathML-Content" id="alg2.l23.m2.1b"><apply id="alg2.l23.m2.1.1.cmml" xref="alg2.l23.m2.1.1"><gt id="alg2.l23.m2.1.1.1.cmml" xref="alg2.l23.m2.1.1.1"></gt><apply id="alg2.l23.m2.1.1.2.cmml" xref="alg2.l23.m2.1.1.2"><ci id="alg2.l23.m2.1.1.2.1.cmml" xref="alg2.l23.m2.1.1.2.1">⋅</ci><apply id="alg2.l23.m2.1.1.2.2.cmml" xref="alg2.l23.m2.1.1.2.2"><csymbol cd="ambiguous" id="alg2.l23.m2.1.1.2.2.1.cmml" xref="alg2.l23.m2.1.1.2.2">superscript</csymbol><ci id="alg2.l23.m2.1.1.2.2.2.cmml" xref="alg2.l23.m2.1.1.2.2.2">𝑤</ci><ci id="alg2.l23.m2.1.1.2.2.3.cmml" xref="alg2.l23.m2.1.1.2.2.3">′</ci></apply><ci id="alg2.l23.m2.1.1.2.3.cmml" xref="alg2.l23.m2.1.1.2.3">𝒱</ci></apply><apply id="alg2.l23.m2.1.1.4.cmml" xref="alg2.l23.m2.1.1.4"><apply id="alg2.l23.m2.1.1.4.1.cmml" xref="alg2.l23.m2.1.1.4.1"><csymbol cd="ambiguous" id="alg2.l23.m2.1.1.4.1.1.cmml" xref="alg2.l23.m2.1.1.4.1">subscript</csymbol><max id="alg2.l23.m2.1.1.4.1.2.cmml" xref="alg2.l23.m2.1.1.4.1.2"></max><apply id="alg2.l23.m2.1.1.4.1.3.cmml" xref="alg2.l23.m2.1.1.4.1.3"><in id="alg2.l23.m2.1.1.4.1.3.1.cmml" xref="alg2.l23.m2.1.1.4.1.3.1"></in><ci id="alg2.l23.m2.1.1.4.1.3.2.cmml" xref="alg2.l23.m2.1.1.4.1.3.2">𝑈</ci><apply id="alg2.l23.m2.1.1.4.1.3.3.cmml" xref="alg2.l23.m2.1.1.4.1.3.3"><csymbol cd="ambiguous" id="alg2.l23.m2.1.1.4.1.3.3.1.cmml" xref="alg2.l23.m2.1.1.4.1.3.3">superscript</csymbol><ci id="alg2.l23.m2.1.1.4.1.3.3.2.cmml" xref="alg2.l23.m2.1.1.4.1.3.3.2">Ω</ci><ci id="alg2.l23.m2.1.1.4.1.3.3.3.cmml" xref="alg2.l23.m2.1.1.4.1.3.3.3">𝑠</ci></apply></apply></apply><apply id="alg2.l23.m2.1.1.4.2.cmml" xref="alg2.l23.m2.1.1.4.2"><ci id="alg2.l23.m2.1.1.4.2.1.cmml" xref="alg2.l23.m2.1.1.4.2.1">⋅</ci><apply id="alg2.l23.m2.1.1.4.2.2.cmml" xref="alg2.l23.m2.1.1.4.2.2"><csymbol cd="ambiguous" id="alg2.l23.m2.1.1.4.2.2.1.cmml" xref="alg2.l23.m2.1.1.4.2.2">superscript</csymbol><ci id="alg2.l23.m2.1.1.4.2.2.2.cmml" xref="alg2.l23.m2.1.1.4.2.2.2">𝑤</ci><ci id="alg2.l23.m2.1.1.4.2.2.3.cmml" xref="alg2.l23.m2.1.1.4.2.2.3">′</ci></apply><ci id="alg2.l23.m2.1.1.4.2.3.cmml" xref="alg2.l23.m2.1.1.4.2.3">𝑈</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l23.m2.1c">w^{\prime}\cdot\mathbfcal{V}&gt;\max\limits_{U\in\Omega^{s}}w^{\prime}\cdot U</annotation><annotation encoding="application/x-llamapun" id="alg2.l23.m2.1d">italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⋅ roman_𝒱 &gt; roman_max start_POSTSUBSCRIPT italic_U ∈ roman_Ω start_POSTSUPERSCRIPT italic_s end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⋅ italic_U</annotation></semantics></math>
 <span class="ltx_text ltx_font_bold" id="alg2.l23.3">then</span> </div> <div class="ltx_listingline" id="alg2.l24"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l24.1.1.1" style="font-size:80%;">24:</span></span>        Remove corner weights made obsolete by <math alttext="\mathbfcal{V}" class="ltx_Math" display="inline" id="alg2.l24.m1.1"><semantics id="alg2.l24.m1.1a"><mi id="alg2.l24.m1.1.1" mathvariant="normal" xref="alg2.l24.m1.1.1.cmml">𝒱</mi><annotation-xml encoding="MathML-Content" id="alg2.l24.m1.1b"><ci id="alg2.l24.m1.1.1.cmml" xref="alg2.l24.m1.1.1">𝒱</ci></annotation-xml><annotation encoding="application/x-tex" id="alg2.l24.m1.1c">\mathbfcal{V}</annotation><annotation encoding="application/x-llamapun" id="alg2.l24.m1.1d">roman_𝒱</annotation></semantics></math> from <math alttext="Q" class="ltx_Math" display="inline" id="alg2.l24.m2.1"><semantics id="alg2.l24.m2.1a"><mi id="alg2.l24.m2.1.1" xref="alg2.l24.m2.1.1.cmml">Q</mi><annotation-xml encoding="MathML-Content" id="alg2.l24.m2.1b"><ci id="alg2.l24.m2.1.1.cmml" xref="alg2.l24.m2.1.1">𝑄</ci></annotation-xml><annotation encoding="application/x-tex" 
id="alg2.l24.m2.1c">Q</annotation><annotation encoding="application/x-llamapun" id="alg2.l24.m2.1d">italic_Q</annotation></semantics></math>, store them in <math alttext="W_{\text{del}}" class="ltx_Math" display="inline" id="alg2.l24.m3.1"><semantics id="alg2.l24.m3.1a"><msub id="alg2.l24.m3.1.1" xref="alg2.l24.m3.1.1.cmml"><mi id="alg2.l24.m3.1.1.2" xref="alg2.l24.m3.1.1.2.cmml">W</mi><mtext id="alg2.l24.m3.1.1.3" xref="alg2.l24.m3.1.1.3a.cmml">del</mtext></msub><annotation-xml encoding="MathML-Content" id="alg2.l24.m3.1b"><apply id="alg2.l24.m3.1.1.cmml" xref="alg2.l24.m3.1.1"><csymbol cd="ambiguous" id="alg2.l24.m3.1.1.1.cmml" xref="alg2.l24.m3.1.1">subscript</csymbol><ci id="alg2.l24.m3.1.1.2.cmml" xref="alg2.l24.m3.1.1.2">𝑊</ci><ci id="alg2.l24.m3.1.1.3a.cmml" xref="alg2.l24.m3.1.1.3"><mtext id="alg2.l24.m3.1.1.3.cmml" mathsize="70%" xref="alg2.l24.m3.1.1.3">del</mtext></ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l24.m3.1c">W_{\text{del}}</annotation><annotation encoding="application/x-llamapun" id="alg2.l24.m3.1d">italic_W start_POSTSUBSCRIPT del end_POSTSUBSCRIPT</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg2.l25"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l25.1.1.1" style="font-size:80%;">25:</span></span>        <math alttext="W_{\text{del}}\leftarrow W_{\text{del}}\cup\{w\}" class="ltx_Math" display="inline" id="alg2.l25.m1.1"><semantics id="alg2.l25.m1.1a"><mrow id="alg2.l25.m1.1.2" xref="alg2.l25.m1.1.2.cmml"><msub id="alg2.l25.m1.1.2.2" xref="alg2.l25.m1.1.2.2.cmml"><mi id="alg2.l25.m1.1.2.2.2" xref="alg2.l25.m1.1.2.2.2.cmml">W</mi><mtext id="alg2.l25.m1.1.2.2.3" xref="alg2.l25.m1.1.2.2.3a.cmml">del</mtext></msub><mo id="alg2.l25.m1.1.2.1" stretchy="false" xref="alg2.l25.m1.1.2.1.cmml">←</mo><mrow id="alg2.l25.m1.1.2.3" xref="alg2.l25.m1.1.2.3.cmml"><msub id="alg2.l25.m1.1.2.3.2" xref="alg2.l25.m1.1.2.3.2.cmml"><mi id="alg2.l25.m1.1.2.3.2.2" 
xref="alg2.l25.m1.1.2.3.2.2.cmml">W</mi><mtext id="alg2.l25.m1.1.2.3.2.3" xref="alg2.l25.m1.1.2.3.2.3a.cmml">del</mtext></msub><mo id="alg2.l25.m1.1.2.3.1" xref="alg2.l25.m1.1.2.3.1.cmml">∪</mo><mrow id="alg2.l25.m1.1.2.3.3.2" xref="alg2.l25.m1.1.2.3.3.1.cmml"><mo id="alg2.l25.m1.1.2.3.3.2.1" stretchy="false" xref="alg2.l25.m1.1.2.3.3.1.cmml">{</mo><mi id="alg2.l25.m1.1.1" xref="alg2.l25.m1.1.1.cmml">w</mi><mo id="alg2.l25.m1.1.2.3.3.2.2" stretchy="false" xref="alg2.l25.m1.1.2.3.3.1.cmml">}</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="alg2.l25.m1.1b"><apply id="alg2.l25.m1.1.2.cmml" xref="alg2.l25.m1.1.2"><ci id="alg2.l25.m1.1.2.1.cmml" xref="alg2.l25.m1.1.2.1">←</ci><apply id="alg2.l25.m1.1.2.2.cmml" xref="alg2.l25.m1.1.2.2"><csymbol cd="ambiguous" id="alg2.l25.m1.1.2.2.1.cmml" xref="alg2.l25.m1.1.2.2">subscript</csymbol><ci id="alg2.l25.m1.1.2.2.2.cmml" xref="alg2.l25.m1.1.2.2.2">𝑊</ci><ci id="alg2.l25.m1.1.2.2.3a.cmml" xref="alg2.l25.m1.1.2.2.3"><mtext id="alg2.l25.m1.1.2.2.3.cmml" mathsize="70%" xref="alg2.l25.m1.1.2.2.3">del</mtext></ci></apply><apply id="alg2.l25.m1.1.2.3.cmml" xref="alg2.l25.m1.1.2.3"><union id="alg2.l25.m1.1.2.3.1.cmml" xref="alg2.l25.m1.1.2.3.1"></union><apply id="alg2.l25.m1.1.2.3.2.cmml" xref="alg2.l25.m1.1.2.3.2"><csymbol cd="ambiguous" id="alg2.l25.m1.1.2.3.2.1.cmml" xref="alg2.l25.m1.1.2.3.2">subscript</csymbol><ci id="alg2.l25.m1.1.2.3.2.2.cmml" xref="alg2.l25.m1.1.2.3.2.2">𝑊</ci><ci id="alg2.l25.m1.1.2.3.2.3a.cmml" xref="alg2.l25.m1.1.2.3.2.3"><mtext id="alg2.l25.m1.1.2.3.2.3.cmml" mathsize="70%" xref="alg2.l25.m1.1.2.3.2.3">del</mtext></ci></apply><set id="alg2.l25.m1.1.2.3.3.1.cmml" xref="alg2.l25.m1.1.2.3.3.2"><ci id="alg2.l25.m1.1.1.cmml" xref="alg2.l25.m1.1.1">𝑤</ci></set></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l25.m1.1c">W_{\text{del}}\leftarrow W_{\text{del}}\cup\{w\}</annotation><annotation encoding="application/x-llamapun" id="alg2.l25.m1.1d">italic_W 
start_POSTSUBSCRIPT del end_POSTSUBSCRIPT ← italic_W start_POSTSUBSCRIPT del end_POSTSUBSCRIPT ∪ { italic_w }</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg2.l26"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l26.1.1.1" style="font-size:80%;">26:</span></span>        <math alttext="\Omega^{s}\leftarrow\Omega^{s}\cup\{\mathbfcal{V}\}" class="ltx_Math" display="inline" id="alg2.l26.m1.1"><semantics id="alg2.l26.m1.1a"><mrow id="alg2.l26.m1.1.2" xref="alg2.l26.m1.1.2.cmml"><msup id="alg2.l26.m1.1.2.2" xref="alg2.l26.m1.1.2.2.cmml"><mi id="alg2.l26.m1.1.2.2.2" mathvariant="normal" xref="alg2.l26.m1.1.2.2.2.cmml">Ω</mi><mi id="alg2.l26.m1.1.2.2.3" xref="alg2.l26.m1.1.2.2.3.cmml">s</mi></msup><mo id="alg2.l26.m1.1.2.1" stretchy="false" xref="alg2.l26.m1.1.2.1.cmml">←</mo><mrow id="alg2.l26.m1.1.2.3" xref="alg2.l26.m1.1.2.3.cmml"><msup id="alg2.l26.m1.1.2.3.2" xref="alg2.l26.m1.1.2.3.2.cmml"><mi id="alg2.l26.m1.1.2.3.2.2" mathvariant="normal" xref="alg2.l26.m1.1.2.3.2.2.cmml">Ω</mi><mi id="alg2.l26.m1.1.2.3.2.3" xref="alg2.l26.m1.1.2.3.2.3.cmml">s</mi></msup><mo id="alg2.l26.m1.1.2.3.1" xref="alg2.l26.m1.1.2.3.1.cmml">∪</mo><mrow id="alg2.l26.m1.1.2.3.3.2" xref="alg2.l26.m1.1.2.3.3.1.cmml"><mo id="alg2.l26.m1.1.2.3.3.2.1" stretchy="false" xref="alg2.l26.m1.1.2.3.3.1.cmml">{</mo><mi id="alg2.l26.m1.1.1" mathvariant="normal" xref="alg2.l26.m1.1.1.cmml">𝒱</mi><mo id="alg2.l26.m1.1.2.3.3.2.2" stretchy="false" xref="alg2.l26.m1.1.2.3.3.1.cmml">}</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="alg2.l26.m1.1b"><apply id="alg2.l26.m1.1.2.cmml" xref="alg2.l26.m1.1.2"><ci id="alg2.l26.m1.1.2.1.cmml" xref="alg2.l26.m1.1.2.1">←</ci><apply id="alg2.l26.m1.1.2.2.cmml" xref="alg2.l26.m1.1.2.2"><csymbol cd="ambiguous" id="alg2.l26.m1.1.2.2.1.cmml" xref="alg2.l26.m1.1.2.2">superscript</csymbol><ci id="alg2.l26.m1.1.2.2.2.cmml" xref="alg2.l26.m1.1.2.2.2">Ω</ci><ci id="alg2.l26.m1.1.2.2.3.cmml" 
xref="alg2.l26.m1.1.2.2.3">𝑠</ci></apply><apply id="alg2.l26.m1.1.2.3.cmml" xref="alg2.l26.m1.1.2.3"><union id="alg2.l26.m1.1.2.3.1.cmml" xref="alg2.l26.m1.1.2.3.1"></union><apply id="alg2.l26.m1.1.2.3.2.cmml" xref="alg2.l26.m1.1.2.3.2"><csymbol cd="ambiguous" id="alg2.l26.m1.1.2.3.2.1.cmml" xref="alg2.l26.m1.1.2.3.2">superscript</csymbol><ci id="alg2.l26.m1.1.2.3.2.2.cmml" xref="alg2.l26.m1.1.2.3.2.2">Ω</ci><ci id="alg2.l26.m1.1.2.3.2.3.cmml" xref="alg2.l26.m1.1.2.3.2.3">𝑠</ci></apply><set id="alg2.l26.m1.1.2.3.3.1.cmml" xref="alg2.l26.m1.1.2.3.3.2"><ci id="alg2.l26.m1.1.1.cmml" xref="alg2.l26.m1.1.1">𝒱</ci></set></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l26.m1.1c">\Omega^{s}\leftarrow\Omega^{s}\cup\{\mathbfcal{V}\}</annotation><annotation encoding="application/x-llamapun" id="alg2.l26.m1.1d">roman_Ω start_POSTSUPERSCRIPT italic_s end_POSTSUPERSCRIPT ← roman_Ω start_POSTSUPERSCRIPT italic_s end_POSTSUPERSCRIPT ∪ { roman_𝒱 }</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg2.l27"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l27.1.1.1" style="font-size:80%;">27:</span></span>        Remove vectors from <math alttext="\Omega^{s}" class="ltx_Math" display="inline" id="alg2.l27.m1.1"><semantics id="alg2.l27.m1.1a"><msup id="alg2.l27.m1.1.1" xref="alg2.l27.m1.1.1.cmml"><mi id="alg2.l27.m1.1.1.2" mathvariant="normal" xref="alg2.l27.m1.1.1.2.cmml">Ω</mi><mi id="alg2.l27.m1.1.1.3" xref="alg2.l27.m1.1.1.3.cmml">s</mi></msup><annotation-xml encoding="MathML-Content" id="alg2.l27.m1.1b"><apply id="alg2.l27.m1.1.1.cmml" xref="alg2.l27.m1.1.1"><csymbol cd="ambiguous" id="alg2.l27.m1.1.1.1.cmml" xref="alg2.l27.m1.1.1">superscript</csymbol><ci id="alg2.l27.m1.1.1.2.cmml" xref="alg2.l27.m1.1.1.2">Ω</ci><ci id="alg2.l27.m1.1.1.3.cmml" xref="alg2.l27.m1.1.1.3">𝑠</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l27.m1.1c">\Omega^{s}</annotation><annotation 
encoding="application/x-llamapun" id="alg2.l27.m1.1d">roman_Ω start_POSTSUPERSCRIPT italic_s end_POSTSUPERSCRIPT</annotation></semantics></math> that are no longer optimal after adding <math alttext="\mathbfcal{V}" class="ltx_Math" display="inline" id="alg2.l27.m2.1"><semantics id="alg2.l27.m2.1a"><mi id="alg2.l27.m2.1.1" mathvariant="normal" xref="alg2.l27.m2.1.1.cmml">𝒱</mi><annotation-xml encoding="MathML-Content" id="alg2.l27.m2.1b"><ci id="alg2.l27.m2.1.1.cmml" xref="alg2.l27.m2.1.1">𝒱</ci></annotation-xml><annotation encoding="application/x-tex" id="alg2.l27.m2.1c">\mathbfcal{V}</annotation><annotation encoding="application/x-llamapun" id="alg2.l27.m2.1d">roman_𝒱</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg2.l28"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l28.1.1.1" style="font-size:80%;">28:</span></span>        <math alttext="\text{Models}[w]\leftarrow\theta_{\text{new}}" class="ltx_Math" display="inline" id="alg2.l28.m1.1"><semantics id="alg2.l28.m1.1a"><mrow id="alg2.l28.m1.1.2" xref="alg2.l28.m1.1.2.cmml"><mrow id="alg2.l28.m1.1.2.2" xref="alg2.l28.m1.1.2.2.cmml"><mtext id="alg2.l28.m1.1.2.2.2" xref="alg2.l28.m1.1.2.2.2a.cmml">Models</mtext><mo id="alg2.l28.m1.1.2.2.1" xref="alg2.l28.m1.1.2.2.1.cmml">⁢</mo><mrow id="alg2.l28.m1.1.2.2.3.2" xref="alg2.l28.m1.1.2.2.3.1.cmml"><mo id="alg2.l28.m1.1.2.2.3.2.1" stretchy="false" xref="alg2.l28.m1.1.2.2.3.1.1.cmml">[</mo><mi id="alg2.l28.m1.1.1" xref="alg2.l28.m1.1.1.cmml">w</mi><mo id="alg2.l28.m1.1.2.2.3.2.2" stretchy="false" xref="alg2.l28.m1.1.2.2.3.1.1.cmml">]</mo></mrow></mrow><mo id="alg2.l28.m1.1.2.1" stretchy="false" xref="alg2.l28.m1.1.2.1.cmml">←</mo><msub id="alg2.l28.m1.1.2.3" xref="alg2.l28.m1.1.2.3.cmml"><mi id="alg2.l28.m1.1.2.3.2" xref="alg2.l28.m1.1.2.3.2.cmml">θ</mi><mtext id="alg2.l28.m1.1.2.3.3" xref="alg2.l28.m1.1.2.3.3a.cmml">new</mtext></msub></mrow><annotation-xml encoding="MathML-Content" id="alg2.l28.m1.1b"><apply 
id="alg2.l28.m1.1.2.cmml" xref="alg2.l28.m1.1.2"><ci id="alg2.l28.m1.1.2.1.cmml" xref="alg2.l28.m1.1.2.1">←</ci><apply id="alg2.l28.m1.1.2.2.cmml" xref="alg2.l28.m1.1.2.2"><times id="alg2.l28.m1.1.2.2.1.cmml" xref="alg2.l28.m1.1.2.2.1"></times><ci id="alg2.l28.m1.1.2.2.2a.cmml" xref="alg2.l28.m1.1.2.2.2"><mtext id="alg2.l28.m1.1.2.2.2.cmml" xref="alg2.l28.m1.1.2.2.2">Models</mtext></ci><apply id="alg2.l28.m1.1.2.2.3.1.cmml" xref="alg2.l28.m1.1.2.2.3.2"><csymbol cd="latexml" id="alg2.l28.m1.1.2.2.3.1.1.cmml" xref="alg2.l28.m1.1.2.2.3.2.1">delimited-[]</csymbol><ci id="alg2.l28.m1.1.1.cmml" xref="alg2.l28.m1.1.1">𝑤</ci></apply></apply><apply id="alg2.l28.m1.1.2.3.cmml" xref="alg2.l28.m1.1.2.3"><csymbol cd="ambiguous" id="alg2.l28.m1.1.2.3.1.cmml" xref="alg2.l28.m1.1.2.3">subscript</csymbol><ci id="alg2.l28.m1.1.2.3.2.cmml" xref="alg2.l28.m1.1.2.3.2">𝜃</ci><ci id="alg2.l28.m1.1.2.3.3a.cmml" xref="alg2.l28.m1.1.2.3.3"><mtext id="alg2.l28.m1.1.2.3.3.cmml" mathsize="70%" xref="alg2.l28.m1.1.2.3.3">new</mtext></ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l28.m1.1c">\text{Models}[w]\leftarrow\theta_{\text{new}}</annotation><annotation encoding="application/x-llamapun" id="alg2.l28.m1.1d">Models [ italic_w ] ← italic_θ start_POSTSUBSCRIPT new end_POSTSUBSCRIPT</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg2.l29"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l29.1.1.1" style="font-size:80%;">29:</span></span>        <math alttext="(W_{\text{new}},q_{\text{new}})\leftarrow\texttt{newCornerWeights}(\Omega^{s},V)" class="ltx_Math" display="inline" id="alg2.l29.m1.4"><semantics id="alg2.l29.m1.4a"><mrow id="alg2.l29.m1.4.4" xref="alg2.l29.m1.4.4.cmml"><mrow id="alg2.l29.m1.3.3.2.2" xref="alg2.l29.m1.3.3.2.3.cmml"><mo id="alg2.l29.m1.3.3.2.2.3" stretchy="false" xref="alg2.l29.m1.3.3.2.3.cmml">(</mo><msub id="alg2.l29.m1.2.2.1.1.1" xref="alg2.l29.m1.2.2.1.1.1.cmml"><mi 
id="alg2.l29.m1.2.2.1.1.1.2" xref="alg2.l29.m1.2.2.1.1.1.2.cmml">W</mi><mtext id="alg2.l29.m1.2.2.1.1.1.3" xref="alg2.l29.m1.2.2.1.1.1.3a.cmml">new</mtext></msub><mo id="alg2.l29.m1.3.3.2.2.4" xref="alg2.l29.m1.3.3.2.3.cmml">,</mo><msub id="alg2.l29.m1.3.3.2.2.2" xref="alg2.l29.m1.3.3.2.2.2.cmml"><mi id="alg2.l29.m1.3.3.2.2.2.2" xref="alg2.l29.m1.3.3.2.2.2.2.cmml">q</mi><mtext id="alg2.l29.m1.3.3.2.2.2.3" xref="alg2.l29.m1.3.3.2.2.2.3a.cmml">new</mtext></msub><mo id="alg2.l29.m1.3.3.2.2.5" stretchy="false" xref="alg2.l29.m1.3.3.2.3.cmml">)</mo></mrow><mo id="alg2.l29.m1.4.4.4" stretchy="false" xref="alg2.l29.m1.4.4.4.cmml">←</mo><mrow id="alg2.l29.m1.4.4.3" xref="alg2.l29.m1.4.4.3.cmml"><mtext class="ltx_mathvariant_monospace" id="alg2.l29.m1.4.4.3.3" xref="alg2.l29.m1.4.4.3.3a.cmml">newCornerWeights</mtext><mo id="alg2.l29.m1.4.4.3.2" xref="alg2.l29.m1.4.4.3.2.cmml">⁢</mo><mrow id="alg2.l29.m1.4.4.3.1.1" xref="alg2.l29.m1.4.4.3.1.2.cmml"><mo id="alg2.l29.m1.4.4.3.1.1.2" stretchy="false" xref="alg2.l29.m1.4.4.3.1.2.cmml">(</mo><msup id="alg2.l29.m1.4.4.3.1.1.1" xref="alg2.l29.m1.4.4.3.1.1.1.cmml"><mi id="alg2.l29.m1.4.4.3.1.1.1.2" mathvariant="normal" xref="alg2.l29.m1.4.4.3.1.1.1.2.cmml">Ω</mi><mi id="alg2.l29.m1.4.4.3.1.1.1.3" xref="alg2.l29.m1.4.4.3.1.1.1.3.cmml">s</mi></msup><mo id="alg2.l29.m1.4.4.3.1.1.3" xref="alg2.l29.m1.4.4.3.1.2.cmml">,</mo><mi id="alg2.l29.m1.1.1" xref="alg2.l29.m1.1.1.cmml">V</mi><mo id="alg2.l29.m1.4.4.3.1.1.4" stretchy="false" xref="alg2.l29.m1.4.4.3.1.2.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="alg2.l29.m1.4b"><apply id="alg2.l29.m1.4.4.cmml" xref="alg2.l29.m1.4.4"><ci id="alg2.l29.m1.4.4.4.cmml" xref="alg2.l29.m1.4.4.4">←</ci><interval closure="open" id="alg2.l29.m1.3.3.2.3.cmml" xref="alg2.l29.m1.3.3.2.2"><apply id="alg2.l29.m1.2.2.1.1.1.cmml" xref="alg2.l29.m1.2.2.1.1.1"><csymbol cd="ambiguous" id="alg2.l29.m1.2.2.1.1.1.1.cmml" xref="alg2.l29.m1.2.2.1.1.1">subscript</csymbol><ci 
id="alg2.l29.m1.2.2.1.1.1.2.cmml" xref="alg2.l29.m1.2.2.1.1.1.2">𝑊</ci><ci id="alg2.l29.m1.2.2.1.1.1.3a.cmml" xref="alg2.l29.m1.2.2.1.1.1.3"><mtext id="alg2.l29.m1.2.2.1.1.1.3.cmml" mathsize="70%" xref="alg2.l29.m1.2.2.1.1.1.3">new</mtext></ci></apply><apply id="alg2.l29.m1.3.3.2.2.2.cmml" xref="alg2.l29.m1.3.3.2.2.2"><csymbol cd="ambiguous" id="alg2.l29.m1.3.3.2.2.2.1.cmml" xref="alg2.l29.m1.3.3.2.2.2">subscript</csymbol><ci id="alg2.l29.m1.3.3.2.2.2.2.cmml" xref="alg2.l29.m1.3.3.2.2.2.2">𝑞</ci><ci id="alg2.l29.m1.3.3.2.2.2.3a.cmml" xref="alg2.l29.m1.3.3.2.2.2.3"><mtext id="alg2.l29.m1.3.3.2.2.2.3.cmml" mathsize="70%" xref="alg2.l29.m1.3.3.2.2.2.3">new</mtext></ci></apply></interval><apply id="alg2.l29.m1.4.4.3.cmml" xref="alg2.l29.m1.4.4.3"><times id="alg2.l29.m1.4.4.3.2.cmml" xref="alg2.l29.m1.4.4.3.2"></times><ci id="alg2.l29.m1.4.4.3.3a.cmml" xref="alg2.l29.m1.4.4.3.3"><mtext class="ltx_mathvariant_monospace" id="alg2.l29.m1.4.4.3.3.cmml" xref="alg2.l29.m1.4.4.3.3">newCornerWeights</mtext></ci><interval closure="open" id="alg2.l29.m1.4.4.3.1.2.cmml" xref="alg2.l29.m1.4.4.3.1.1"><apply id="alg2.l29.m1.4.4.3.1.1.1.cmml" xref="alg2.l29.m1.4.4.3.1.1.1"><csymbol cd="ambiguous" id="alg2.l29.m1.4.4.3.1.1.1.1.cmml" xref="alg2.l29.m1.4.4.3.1.1.1">superscript</csymbol><ci id="alg2.l29.m1.4.4.3.1.1.1.2.cmml" xref="alg2.l29.m1.4.4.3.1.1.1.2">Ω</ci><ci id="alg2.l29.m1.4.4.3.1.1.1.3.cmml" xref="alg2.l29.m1.4.4.3.1.1.1.3">𝑠</ci></apply><ci id="alg2.l29.m1.1.1.cmml" xref="alg2.l29.m1.1.1">𝑉</ci></interval></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l29.m1.4c">(W_{\text{new}},q_{\text{new}})\leftarrow\texttt{newCornerWeights}(\Omega^{s},V)</annotation><annotation encoding="application/x-llamapun" id="alg2.l29.m1.4d">( italic_W start_POSTSUBSCRIPT new end_POSTSUBSCRIPT , italic_q start_POSTSUBSCRIPT new end_POSTSUBSCRIPT ) ← newCornerWeights ( roman_Ω start_POSTSUPERSCRIPT italic_s end_POSTSUPERSCRIPT , italic_V 
)</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg2.l30"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l30.2.1.1" style="font-size:80%;">30:</span></span>     <span class="ltx_text ltx_font_bold" id="alg2.l30.3">end</span> <span class="ltx_text ltx_font_bold" id="alg2.l30.1">if<math alttext="k+=1" class="ltx_Math" display="inline" id="alg2.l30.1.m1.1"><semantics id="alg2.l30.1.m1.1a"><mrow id="alg2.l30.1.m1.1.1" xref="alg2.l30.1.m1.1.1.cmml"><mrow id="alg2.l30.1.m1.1.1.2" xref="alg2.l30.1.m1.1.1.2.cmml"><mi id="alg2.l30.1.m1.1.1.2.2" xref="alg2.l30.1.m1.1.1.2.2.cmml">k</mi><mo id="alg2.l30.1.m1.1.1.2.3" rspace="0em" xref="alg2.l30.1.m1.1.1.2.3.cmml">+</mo></mrow><mo id="alg2.l30.1.m1.1.1.1" lspace="0em" xref="alg2.l30.1.m1.1.1.1.cmml">=</mo><mn id="alg2.l30.1.m1.1.1.3" xref="alg2.l30.1.m1.1.1.3.cmml">1</mn></mrow><annotation-xml encoding="MathML-Content" id="alg2.l30.1.m1.1b"><apply id="alg2.l30.1.m1.1.1.cmml" xref="alg2.l30.1.m1.1.1"><eq id="alg2.l30.1.m1.1.1.1.cmml" xref="alg2.l30.1.m1.1.1.1"></eq><apply id="alg2.l30.1.m1.1.1.2.cmml" xref="alg2.l30.1.m1.1.1.2"><csymbol cd="latexml" id="alg2.l30.1.m1.1.1.2.1.cmml" xref="alg2.l30.1.m1.1.1.2">limit-from</csymbol><ci id="alg2.l30.1.m1.1.1.2.2.cmml" xref="alg2.l30.1.m1.1.1.2.2">𝑘</ci><plus id="alg2.l30.1.m1.1.1.2.3.cmml" xref="alg2.l30.1.m1.1.1.2.3"></plus></apply><cn id="alg2.l30.1.m1.1.1.3.cmml" type="integer" xref="alg2.l30.1.m1.1.1.3">1</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l30.1.m1.1c">k+=1</annotation><annotation encoding="application/x-llamapun" id="alg2.l30.1.m1.1d">italic_k + = 1</annotation></semantics></math></span> </div> <div class="ltx_listingline" id="alg2.l31"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l31.1.1.1" style="font-size:80%;">31:</span></span>  <span class="ltx_text ltx_font_bold" id="alg2.l31.2">end</span> <span class="ltx_text ltx_font_bold" 
id="alg2.l31.3">while</span> </div> <div class="ltx_listingline" id="alg2.l32"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg2.l32.1.1.1" style="font-size:80%;">32:</span></span>  <span class="ltx_text ltx_font_bold" id="alg2.l32.2">return</span>  <math alttext="\Omega^{s}" class="ltx_Math" display="inline" id="alg2.l32.m1.1"><semantics id="alg2.l32.m1.1a"><msup id="alg2.l32.m1.1.1" xref="alg2.l32.m1.1.1.cmml"><mi id="alg2.l32.m1.1.1.2" mathvariant="normal" xref="alg2.l32.m1.1.1.2.cmml">Ω</mi><mi id="alg2.l32.m1.1.1.3" xref="alg2.l32.m1.1.1.3.cmml">s</mi></msup><annotation-xml encoding="MathML-Content" id="alg2.l32.m1.1b"><apply id="alg2.l32.m1.1.1.cmml" xref="alg2.l32.m1.1.1"><csymbol cd="ambiguous" id="alg2.l32.m1.1.1.1.cmml" xref="alg2.l32.m1.1.1">superscript</csymbol><ci id="alg2.l32.m1.1.1.2.cmml" xref="alg2.l32.m1.1.1.2">Ω</ci><ci id="alg2.l32.m1.1.1.3.cmml" xref="alg2.l32.m1.1.1.3">𝑠</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg2.l32.m1.1c">\Omega^{s}</annotation><annotation encoding="application/x-llamapun" id="alg2.l32.m1.1d">roman_Ω start_POSTSUPERSCRIPT italic_s end_POSTSUPERSCRIPT</annotation></semantics></math> and the models in <span class="ltx_text ltx_markedasmath" id="alg2.l32.3">Models</span> {models correspond to the policies implicitly integrated in the neural network weights} </div> </div> <br class="ltx_break ltx_break"/> </figure> <div class="ltx_para" id="S2.SS3.p1"> <p class="ltx_p" id="S2.SS3.p1.2">To generate a set of policies, we employ the deep optimistic linear support (DOL) algorithm to iteratively construct the optimal solution set of policies <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib24" title="">24</a>]</cite>. 
DOL functions as an outer-loop MORL approach <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib25" title="">25</a>]</cite>, leveraging optimistic linear support (OLS) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib27" title="">27</a>]</cite> to iteratively construct the convex coverage set (CCS) of a multi-objective problem by generating scalarization functions in the form of weight vectors <math alttext="\mathbf{w}" class="ltx_Math" display="inline" id="S2.SS3.p1.1.m1.1"><semantics id="S2.SS3.p1.1.m1.1a"><mi id="S2.SS3.p1.1.m1.1.1" xref="S2.SS3.p1.1.m1.1.1.cmml">𝐰</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.1.m1.1b"><ci id="S2.SS3.p1.1.m1.1.1.cmml" xref="S2.SS3.p1.1.m1.1.1">𝐰</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.1.m1.1c">\mathbf{w}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.1.m1.1d">bold_w</annotation></semantics></math> and updating the solution set <math alttext="\Omega^{s}" class="ltx_Math" display="inline" id="S2.SS3.p1.2.m2.1"><semantics id="S2.SS3.p1.2.m2.1a"><msup id="S2.SS3.p1.2.m2.1.1" xref="S2.SS3.p1.2.m2.1.1.cmml"><mi id="S2.SS3.p1.2.m2.1.1.2" mathvariant="normal" xref="S2.SS3.p1.2.m2.1.1.2.cmml">Ω</mi><mi id="S2.SS3.p1.2.m2.1.1.3" xref="S2.SS3.p1.2.m2.1.1.3.cmml">s</mi></msup><annotation-xml encoding="MathML-Content" id="S2.SS3.p1.2.m2.1b"><apply id="S2.SS3.p1.2.m2.1.1.cmml" xref="S2.SS3.p1.2.m2.1.1"><csymbol cd="ambiguous" id="S2.SS3.p1.2.m2.1.1.1.cmml" xref="S2.SS3.p1.2.m2.1.1">superscript</csymbol><ci id="S2.SS3.p1.2.m2.1.1.2.cmml" xref="S2.SS3.p1.2.m2.1.1.2">Ω</ci><ci id="S2.SS3.p1.2.m2.1.1.3.cmml" xref="S2.SS3.p1.2.m2.1.1.3">𝑠</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p1.2.m2.1c">\Omega^{s}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p1.2.m2.1d">roman_Ω start_POSTSUPERSCRIPT 
italic_s end_POSTSUPERSCRIPT</annotation></semantics></math> accordingly <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib25" title="">25</a>]</cite>. In addition, DOL allows MOPPO to be integrated as the single-policy multi-objective RL algorithm in its inner loop.</p> </div> <div class="ltx_para" id="S2.SS3.p2"> <p class="ltx_p" id="S2.SS3.p2.9">Algorithm <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#alg2" title="Algorithm 2 ‣ II-C Deep Optimistic Linear Support ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">2</span></a> provides details of the proposed DOL approach. In the first iteration, DOL assigns the extreme weights (one per objective) to the queue with infinite priority, ensuring that MOPPO processes these weights first (line <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#alg2.l7" title="In Algorithm 2 ‣ II-C Deep Optimistic Linear Support ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">7</span></a>). 
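</p>
<p class="ltx_p">The queue handling described above can be illustrated with a minimal Python sketch. It assumes weights are stored as tuples and emulates a max-priority queue with the standard-library <span class="ltx_text ltx_font_typewriter">heapq</span> min-heap by negating priorities; <span class="ltx_text ltx_font_typewriter">WeightQueue</span> and <span class="ltx_text ltx_font_typewriter">init_queue</span> are hypothetical names, not taken from Algorithm 2.</p>

```python
import heapq
import itertools
import math


class WeightQueue:
    """Max-priority queue of weight vectors built on heapq (a min-heap).

    Priorities are stored negated; an insertion counter breaks ties so that
    weight tuples are never compared with each other.
    """

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, weight, priority):
        heapq.heappush(self._heap, (-priority, next(self._counter), tuple(weight)))

    def pop(self):
        """Remove and return (weight, priority) with the highest priority."""
        neg_priority, _, weight = heapq.heappop(self._heap)
        return weight, -neg_priority

    def __len__(self):
        return len(self._heap)


def init_queue(num_objectives):
    """Enqueue the extreme weights (unit vectors) with infinite priority."""
    queue = WeightQueue()
    for i in range(num_objectives):
        w = [0.0] * num_objectives
        w[i] = 1.0
        queue.push(w, math.inf)
    return queue


queue = init_queue(2)
queue.push([0.5, 0.5], 0.3)  # e.g. a later corner weight with finite priority
w, q = queue.pop()
print(w, q)  # (1.0, 0.0) inf
```

<p class="ltx_p">The insertion counter breaks ties, so the extreme weights, which all share an infinite priority, are popped in insertion order and the weight tuples themselves are never compared.</p>
<p class="ltx_p">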
In each iteration of DOL, the weight vector with the highest priority <math alttext="q_{max}" class="ltx_Math" display="inline" id="S2.SS3.p2.1.m1.1"><semantics id="S2.SS3.p2.1.m1.1a"><msub id="S2.SS3.p2.1.m1.1.1" xref="S2.SS3.p2.1.m1.1.1.cmml"><mi id="S2.SS3.p2.1.m1.1.1.2" xref="S2.SS3.p2.1.m1.1.1.2.cmml">q</mi><mrow id="S2.SS3.p2.1.m1.1.1.3" xref="S2.SS3.p2.1.m1.1.1.3.cmml"><mi id="S2.SS3.p2.1.m1.1.1.3.2" xref="S2.SS3.p2.1.m1.1.1.3.2.cmml">m</mi><mo id="S2.SS3.p2.1.m1.1.1.3.1" xref="S2.SS3.p2.1.m1.1.1.3.1.cmml">⁢</mo><mi id="S2.SS3.p2.1.m1.1.1.3.3" xref="S2.SS3.p2.1.m1.1.1.3.3.cmml">a</mi><mo id="S2.SS3.p2.1.m1.1.1.3.1a" xref="S2.SS3.p2.1.m1.1.1.3.1.cmml">⁢</mo><mi id="S2.SS3.p2.1.m1.1.1.3.4" xref="S2.SS3.p2.1.m1.1.1.3.4.cmml">x</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.1.m1.1b"><apply id="S2.SS3.p2.1.m1.1.1.cmml" xref="S2.SS3.p2.1.m1.1.1"><csymbol cd="ambiguous" id="S2.SS3.p2.1.m1.1.1.1.cmml" xref="S2.SS3.p2.1.m1.1.1">subscript</csymbol><ci id="S2.SS3.p2.1.m1.1.1.2.cmml" xref="S2.SS3.p2.1.m1.1.1.2">𝑞</ci><apply id="S2.SS3.p2.1.m1.1.1.3.cmml" xref="S2.SS3.p2.1.m1.1.1.3"><times id="S2.SS3.p2.1.m1.1.1.3.1.cmml" xref="S2.SS3.p2.1.m1.1.1.3.1"></times><ci id="S2.SS3.p2.1.m1.1.1.3.2.cmml" xref="S2.SS3.p2.1.m1.1.1.3.2">𝑚</ci><ci id="S2.SS3.p2.1.m1.1.1.3.3.cmml" xref="S2.SS3.p2.1.m1.1.1.3.3">𝑎</ci><ci id="S2.SS3.p2.1.m1.1.1.3.4.cmml" xref="S2.SS3.p2.1.m1.1.1.3.4">𝑥</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.1.m1.1c">q_{max}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.1.m1.1d">italic_q start_POSTSUBSCRIPT italic_m italic_a italic_x end_POSTSUBSCRIPT</annotation></semantics></math> (i.e., the largest expected improvement) is passed to MOPPO (line <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#alg2.l10" title="In Algorithm 2 ‣ II-C Deep Optimistic Linear Support ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid
Topology Control"><span class="ltx_text ltx_ref_tag">10</span></a>). MOPPO, described in Algorithm <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#alg1" title="Algorithm 1 ‣ II-A Single-Policy Multi-Objective PPO ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">1</span></a>, is then trained with <math alttext="\mathbf{w}" class="ltx_Math" display="inline" id="S2.SS3.p2.2.m2.1"><semantics id="S2.SS3.p2.2.m2.1a"><mi id="S2.SS3.p2.2.m2.1.1" xref="S2.SS3.p2.2.m2.1.1.cmml">𝐰</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.2.m2.1b"><ci id="S2.SS3.p2.2.m2.1.1.cmml" xref="S2.SS3.p2.2.m2.1.1">𝐰</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.2.m2.1c">\mathbf{w}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.2.m2.1d">bold_w</annotation></semantics></math>. The trained agent with parameters <math alttext="\theta_{new}" class="ltx_Math" display="inline" id="S2.SS3.p2.3.m3.1"><semantics id="S2.SS3.p2.3.m3.1a"><msub id="S2.SS3.p2.3.m3.1.1" xref="S2.SS3.p2.3.m3.1.1.cmml"><mi id="S2.SS3.p2.3.m3.1.1.2" xref="S2.SS3.p2.3.m3.1.1.2.cmml">θ</mi><mrow id="S2.SS3.p2.3.m3.1.1.3" xref="S2.SS3.p2.3.m3.1.1.3.cmml"><mi id="S2.SS3.p2.3.m3.1.1.3.2" xref="S2.SS3.p2.3.m3.1.1.3.2.cmml">n</mi><mo id="S2.SS3.p2.3.m3.1.1.3.1" xref="S2.SS3.p2.3.m3.1.1.3.1.cmml">⁢</mo><mi id="S2.SS3.p2.3.m3.1.1.3.3" xref="S2.SS3.p2.3.m3.1.1.3.3.cmml">e</mi><mo id="S2.SS3.p2.3.m3.1.1.3.1a" xref="S2.SS3.p2.3.m3.1.1.3.1.cmml">⁢</mo><mi id="S2.SS3.p2.3.m3.1.1.3.4" xref="S2.SS3.p2.3.m3.1.1.3.4.cmml">w</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.3.m3.1b"><apply id="S2.SS3.p2.3.m3.1.1.cmml" xref="S2.SS3.p2.3.m3.1.1"><csymbol cd="ambiguous" id="S2.SS3.p2.3.m3.1.1.1.cmml" xref="S2.SS3.p2.3.m3.1.1">subscript</csymbol><ci id="S2.SS3.p2.3.m3.1.1.2.cmml" xref="S2.SS3.p2.3.m3.1.1.2">𝜃</ci><apply id="S2.SS3.p2.3.m3.1.1.3.cmml"
xref="S2.SS3.p2.3.m3.1.1.3"><times id="S2.SS3.p2.3.m3.1.1.3.1.cmml" xref="S2.SS3.p2.3.m3.1.1.3.1"></times><ci id="S2.SS3.p2.3.m3.1.1.3.2.cmml" xref="S2.SS3.p2.3.m3.1.1.3.2">𝑛</ci><ci id="S2.SS3.p2.3.m3.1.1.3.3.cmml" xref="S2.SS3.p2.3.m3.1.1.3.3">𝑒</ci><ci id="S2.SS3.p2.3.m3.1.1.3.4.cmml" xref="S2.SS3.p2.3.m3.1.1.3.4">𝑤</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.3.m3.1c">\theta_{new}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.3.m3.1d">italic_θ start_POSTSUBSCRIPT italic_n italic_e italic_w end_POSTSUBSCRIPT</annotation></semantics></math> is evaluated and produces <math alttext="\mathbfcal{V}" class="ltx_Math" display="inline" id="S2.SS3.p2.4.m4.1"><semantics id="S2.SS3.p2.4.m4.1a"><mi id="S2.SS3.p2.4.m4.1.1" mathvariant="normal" xref="S2.SS3.p2.4.m4.1.1.cmml">𝒱</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.4.m4.1b"><ci id="S2.SS3.p2.4.m4.1.1.cmml" xref="S2.SS3.p2.4.m4.1.1">𝒱</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.4.m4.1c">\mathbfcal{V}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.4.m4.1d">roman_𝒱</annotation></semantics></math> as the average value vector across the evaluation episodes (line <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#alg2.l21" title="In Algorithm 2 ‣ II-C Deep Optimistic Linear Support ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">21</span></a>). 
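</p>
<p class="ltx_p">A minimal sketch of how such an average value vector could be computed from evaluation rollouts follows. It assumes each episode is stored as a sequence of reward vectors (one entry per objective) and introduces a discount factor <span class="ltx_text ltx_font_typewriter">gamma</span> as an assumption (<span class="ltx_text ltx_font_typewriter">gamma=1.0</span> gives undiscounted sums); <span class="ltx_text ltx_font_typewriter">vector_return</span> and <span class="ltx_text ltx_font_typewriter">average_value_vector</span> are hypothetical NumPy-based helpers.</p>

```python
import numpy as np


def vector_return(rewards, gamma=1.0):
    """Discounted vector return of one episode.

    rewards has shape (T, m): one reward vector with m objectives per step.
    """
    rewards = np.asarray(rewards, dtype=float)
    discounts = gamma ** np.arange(len(rewards))
    return discounts @ rewards  # shape (m,)


def average_value_vector(episodes, gamma=1.0):
    """Average the per-episode vector returns over all evaluation episodes."""
    returns = [vector_return(ep, gamma) for ep in episodes]
    return np.mean(returns, axis=0)


# Two evaluation episodes of different length, two objectives each.
episodes = [
    [[1.0, 0.5], [1.0, 0.0]],
    [[1.0, 1.0], [1.0, 0.5], [1.0, 0.5]],
]
print(average_value_vector(episodes).tolist())  # [2.5, 1.25]
```

<p class="ltx_p">Each episode is summed separately before averaging, so episodes that end early, for example after a grid failure, simply contribute shorter sums. Because the estimate is built from sampled returns rather than from a learned value function, it can deviate from a learned estimate of V(s).</p>
<p class="ltx_p">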
<math alttext="\mathbfcal{V}" class="ltx_Math" display="inline" id="S2.SS3.p2.5.m5.1"><semantics id="S2.SS3.p2.5.m5.1a"><mi id="S2.SS3.p2.5.m5.1.1" mathvariant="normal" xref="S2.SS3.p2.5.m5.1.1.cmml">𝒱</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.5.m5.1b"><ci id="S2.SS3.p2.5.m5.1.1.cmml" xref="S2.SS3.p2.5.m5.1.1">𝒱</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.5.m5.1c">\mathbfcal{V}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.5.m5.1d">roman_𝒱</annotation></semantics></math> is used to determine whether the trained model contributes to the CCS. <math alttext="\mathbfcal{V}" class="ltx_Math" display="inline" id="S2.SS3.p2.6.m6.1"><semantics id="S2.SS3.p2.6.m6.1a"><mi id="S2.SS3.p2.6.m6.1.1" mathvariant="normal" xref="S2.SS3.p2.6.m6.1.1.cmml">𝒱</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.6.m6.1b"><ci id="S2.SS3.p2.6.m6.1.1.cmml" xref="S2.SS3.p2.6.m6.1.1">𝒱</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.6.m6.1c">\mathbfcal{V}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.6.m6.1d">roman_𝒱</annotation></semantics></math> is a sample-based estimate of the vector value <math alttext="\mathbf{V}(s)" class="ltx_Math" display="inline" id="S2.SS3.p2.7.m7.1"><semantics id="S2.SS3.p2.7.m7.1a"><mrow id="S2.SS3.p2.7.m7.1.2" xref="S2.SS3.p2.7.m7.1.2.cmml"><mi id="S2.SS3.p2.7.m7.1.2.2" xref="S2.SS3.p2.7.m7.1.2.2.cmml">𝐕</mi><mo id="S2.SS3.p2.7.m7.1.2.1" xref="S2.SS3.p2.7.m7.1.2.1.cmml">⁢</mo><mrow id="S2.SS3.p2.7.m7.1.2.3.2" xref="S2.SS3.p2.7.m7.1.2.cmml"><mo id="S2.SS3.p2.7.m7.1.2.3.2.1" stretchy="false" xref="S2.SS3.p2.7.m7.1.2.cmml">(</mo><mi id="S2.SS3.p2.7.m7.1.1" xref="S2.SS3.p2.7.m7.1.1.cmml">s</mi><mo id="S2.SS3.p2.7.m7.1.2.3.2.2" stretchy="false" xref="S2.SS3.p2.7.m7.1.2.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.7.m7.1b"><apply id="S2.SS3.p2.7.m7.1.2.cmml" xref="S2.SS3.p2.7.m7.1.2"><times
id="S2.SS3.p2.7.m7.1.2.1.cmml" xref="S2.SS3.p2.7.m7.1.2.1"></times><ci id="S2.SS3.p2.7.m7.1.2.2.cmml" xref="S2.SS3.p2.7.m7.1.2.2">𝐕</ci><ci id="S2.SS3.p2.7.m7.1.1.cmml" xref="S2.SS3.p2.7.m7.1.1">𝑠</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.7.m7.1c">\mathbf{V}(s)</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.7.m7.1d">bold_V ( italic_s )</annotation></semantics></math> over the evaluation episodes, computed from the received rewards; hence, <math alttext="\mathbfcal{V}" class="ltx_Math" display="inline" id="S2.SS3.p2.8.m8.1"><semantics id="S2.SS3.p2.8.m8.1a"><mi id="S2.SS3.p2.8.m8.1.1" mathvariant="normal" xref="S2.SS3.p2.8.m8.1.1.cmml">𝒱</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.8.m8.1b"><ci id="S2.SS3.p2.8.m8.1.1.cmml" xref="S2.SS3.p2.8.m8.1.1">𝒱</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.8.m8.1c">\mathbfcal{V}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.8.m8.1d">roman_𝒱</annotation></semantics></math> does not necessarily reflect the vector-valued, state-dependent value function <math alttext="\mathbf{V}(s)" class="ltx_Math" display="inline" id="S2.SS3.p2.9.m9.1"><semantics id="S2.SS3.p2.9.m9.1a"><mrow id="S2.SS3.p2.9.m9.1.2" xref="S2.SS3.p2.9.m9.1.2.cmml"><mi id="S2.SS3.p2.9.m9.1.2.2" xref="S2.SS3.p2.9.m9.1.2.2.cmml">𝐕</mi><mo id="S2.SS3.p2.9.m9.1.2.1" xref="S2.SS3.p2.9.m9.1.2.1.cmml">⁢</mo><mrow id="S2.SS3.p2.9.m9.1.2.3.2" xref="S2.SS3.p2.9.m9.1.2.cmml"><mo id="S2.SS3.p2.9.m9.1.2.3.2.1" stretchy="false" xref="S2.SS3.p2.9.m9.1.2.cmml">(</mo><mi id="S2.SS3.p2.9.m9.1.1" xref="S2.SS3.p2.9.m9.1.1.cmml">s</mi><mo id="S2.SS3.p2.9.m9.1.2.3.2.2" stretchy="false" xref="S2.SS3.p2.9.m9.1.2.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.9.m9.1b"><apply id="S2.SS3.p2.9.m9.1.2.cmml" xref="S2.SS3.p2.9.m9.1.2"><times id="S2.SS3.p2.9.m9.1.2.1.cmml" xref="S2.SS3.p2.9.m9.1.2.1"></times><ci
id="S2.SS3.p2.9.m9.1.2.2.cmml" xref="S2.SS3.p2.9.m9.1.2.2">𝐕</ci><ci id="S2.SS3.p2.9.m9.1.1.cmml" xref="S2.SS3.p2.9.m9.1.1">𝑠</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.9.m9.1c">\mathbf{V}(s)</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.9.m9.1d">bold_V ( italic_s )</annotation></semantics></math> (<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S2.E1" title="Equation 1 ‣ II-A Single-Policy Multi-Objective PPO ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">1</span></a>).</p> </div> <div class="ltx_para" id="S2.SS3.p3"> <p class="ltx_p" id="S2.SS3.p3.5">If the new value vector improves the coverage set <math alttext="\Omega^{s}" class="ltx_Math" display="inline" id="S2.SS3.p3.1.m1.1"><semantics id="S2.SS3.p3.1.m1.1a"><msup id="S2.SS3.p3.1.m1.1.1" xref="S2.SS3.p3.1.m1.1.1.cmml"><mi id="S2.SS3.p3.1.m1.1.1.2" mathvariant="normal" xref="S2.SS3.p3.1.m1.1.1.2.cmml">Ω</mi><mi id="S2.SS3.p3.1.m1.1.1.3" xref="S2.SS3.p3.1.m1.1.1.3.cmml">s</mi></msup><annotation-xml encoding="MathML-Content" id="S2.SS3.p3.1.m1.1b"><apply id="S2.SS3.p3.1.m1.1.1.cmml" xref="S2.SS3.p3.1.m1.1.1"><csymbol cd="ambiguous" id="S2.SS3.p3.1.m1.1.1.1.cmml" xref="S2.SS3.p3.1.m1.1.1">superscript</csymbol><ci id="S2.SS3.p3.1.m1.1.1.2.cmml" xref="S2.SS3.p3.1.m1.1.1.2">Ω</ci><ci id="S2.SS3.p3.1.m1.1.1.3.cmml" xref="S2.SS3.p3.1.m1.1.1.3">𝑠</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p3.1.m1.1c">\Omega^{s}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p3.1.m1.1d">roman_Ω start_POSTSUPERSCRIPT italic_s end_POSTSUPERSCRIPT</annotation></semantics></math>, obsolete corner weights are deleted, the new value vector is added to the set, the model is saved and new corner weights are calculated (line <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#alg2.l23" title="In Algorithm 2 ‣ II-C Deep 
Optimistic Linear Support ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">23</span></a>-<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#alg2.l29" title="In Algorithm 2 ‣ II-C Deep Optimistic Linear Support ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">29</span></a>). To determine the next weight vectors, corner weights are identified as the weights at which the piecewise-linear convex (PWLC) surface of <math alttext="\mathbfcal{V}(w)" class="ltx_Math" display="inline" id="S2.SS3.p3.2.m2.1"><semantics id="S2.SS3.p3.2.m2.1a"><mrow id="S2.SS3.p3.2.m2.1.1" xref="S2.SS3.p3.2.m2.1.1.cmml"><mi id="S2.SS3.p3.2.m2.1.1.2" mathvariant="normal" xref="S2.SS3.p3.2.m2.1.1.2.cmml">𝒱</mi><mo id="S2.SS3.p3.2.m2.1.1.1" xref="S2.SS3.p3.2.m2.1.1.1.cmml">⁢</mo><mrow id="S2.SS3.p3.2.m2.1.1.3"><mo id="S2.SS3.p3.2.m2.1.1.3.1" stretchy="false">(</mo><mi id="S2.SS3.p3.2.m2.1.1.4" xref="S2.SS3.p3.2.m2.1.1.4.cmml">w</mi><mo id="S2.SS3.p3.2.m2.1.1.3.2" stretchy="false">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS3.p3.2.m2.1b"><apply id="S2.SS3.p3.2.m2.1.1.cmml" xref="S2.SS3.p3.2.m2.1.1"><times id="S2.SS3.p3.2.m2.1.1.1.cmml" xref="S2.SS3.p3.2.m2.1.1.1"></times><ci id="S2.SS3.p3.2.m2.1.1.2.cmml" xref="S2.SS3.p3.2.m2.1.1.2">𝒱</ci><ci id="S2.SS3.p3.2.m2.1.1.4.cmml" xref="S2.SS3.p3.2.m2.1.1.4">𝑤</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p3.2.m2.1c">\mathbfcal{V}(w)</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p3.2.m2.1d">roman_𝒱 ( italic_w )</annotation></semantics></math> changes slope (line <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#alg2.l29" title="In Algorithm 2 ‣ II-C Deep Optimistic Linear Support ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">29</span></a>). Specifically, these corner weights are the vertices of the polyhedral subspace above <math alttext="\mathbfcal{V}(w)" class="ltx_Math" display="inline" id="S2.SS3.p3.3.m3.1"><semantics id="S2.SS3.p3.3.m3.1a"><mrow id="S2.SS3.p3.3.m3.1.1" xref="S2.SS3.p3.3.m3.1.1.cmml"><mi id="S2.SS3.p3.3.m3.1.1.2" mathvariant="normal" xref="S2.SS3.p3.3.m3.1.1.2.cmml">𝒱</mi><mo id="S2.SS3.p3.3.m3.1.1.1" xref="S2.SS3.p3.3.m3.1.1.1.cmml">⁢</mo><mrow id="S2.SS3.p3.3.m3.1.1.3"><mo id="S2.SS3.p3.3.m3.1.1.3.1" stretchy="false">(</mo><mi id="S2.SS3.p3.3.m3.1.1.4" xref="S2.SS3.p3.3.m3.1.1.4.cmml">w</mi><mo id="S2.SS3.p3.3.m3.1.1.3.2" stretchy="false">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS3.p3.3.m3.1b"><apply id="S2.SS3.p3.3.m3.1.1.cmml" xref="S2.SS3.p3.3.m3.1.1"><times id="S2.SS3.p3.3.m3.1.1.1.cmml" xref="S2.SS3.p3.3.m3.1.1.1"></times><ci id="S2.SS3.p3.3.m3.1.1.2.cmml" xref="S2.SS3.p3.3.m3.1.1.2">𝒱</ci><ci id="S2.SS3.p3.3.m3.1.1.4.cmml" xref="S2.SS3.p3.3.m3.1.1.4">𝑤</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p3.3.m3.1c">\mathbfcal{V}(w)</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p3.3.m3.1d">roman_𝒱 ( italic_w )</annotation></semantics></math>. 
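</p>
<p class="ltx_p">For the two-objective case, the corner-weight computation can be sketched in plain Python. The sketch assumes weights of the form w = (w1, 1 - w1) with w1 in [0, 1], so the PWLC surface is the upper envelope of the lines w1 ↦ w·V over the stored value vectors. It is a simplified illustration under that assumption, not the general <span class="ltx_text ltx_font_typewriter">newCornerWeights</span> procedure of [27]; all function names are hypothetical.</p>

```python
def scalarize(v, w1):
    """Linear scalarization w.v with w = (w1, 1 - w1)."""
    return w1 * v[0] + (1.0 - w1) * v[1]


def envelope(vectors, w1):
    """PWLC upper envelope: best scalarized value over the set at w1."""
    return max(scalarize(v, w1) for v in vectors)


def corner_weights(vectors, tol=1e-9):
    """Interior values of w1 where the envelope changes slope (two objectives)."""
    corners = set()
    for i, a in enumerate(vectors):
        for b in vectors[i + 1:]:
            d0, d1 = a[0] - b[0], a[1] - b[1]
            denom = d0 - d1
            if abs(denom) > tol:  # parallel lines never cross
                w1 = -d1 / denom  # solves scalarize(a, w1) == scalarize(b, w1)
                on_envelope = scalarize(a, w1) >= envelope(vectors, w1) - tol
                if 1.0 - tol > w1 > tol and on_envelope:
                    corners.add(round(w1, 9))
    return sorted(corners)


def prune(vectors, tol=1e-9):
    """Keep vectors that attain the envelope at an extreme or corner weight."""
    checkpoints = [0.0, 1.0] + corner_weights(vectors, tol)
    return [
        v for v in vectors
        if any(scalarize(v, w1) >= envelope(vectors, w1) - tol for w1 in checkpoints)
    ]


vectors = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.2, 0.2)]
print(corner_weights(vectors))  # [0.4, 0.6]
print(prune(vectors))           # [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
```

<p class="ltx_p">Each vector scores linearly in w1, so the region where it is optimal is an interval whose endpoints are extreme or corner weights; checking only those points is therefore enough to prune, and in the example (0.2, 0.2) is removed because it is optimal for no weight.</p>
<p class="ltx_p">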
The priority of each new corner weight is computed from its distance to the optimistic upper bound of the CCS. Details on the <span class="ltx_text ltx_markedasmath ltx_font_typewriter" id="S2.SS3.p3.5.1">newCornerWeights</span> procedure are provided in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib27" title="">27</a>]</cite>. The original stopping criterion of DOL, which depends on a minimum improvement of the CCS <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib24" title="">24</a>]</cite>, is replaced by a predefined maximum number of iterations, <math alttext="k^{\text{max}}" class="ltx_Math" display="inline" id="S2.SS3.p3.5.m5.1"><semantics id="S2.SS3.p3.5.m5.1a"><msup id="S2.SS3.p3.5.m5.1.1" xref="S2.SS3.p3.5.m5.1.1.cmml"><mi id="S2.SS3.p3.5.m5.1.1.2" xref="S2.SS3.p3.5.m5.1.1.2.cmml">k</mi><mtext id="S2.SS3.p3.5.m5.1.1.3" xref="S2.SS3.p3.5.m5.1.1.3a.cmml">max</mtext></msup><annotation-xml encoding="MathML-Content" id="S2.SS3.p3.5.m5.1b"><apply id="S2.SS3.p3.5.m5.1.1.cmml" xref="S2.SS3.p3.5.m5.1.1"><csymbol cd="ambiguous" id="S2.SS3.p3.5.m5.1.1.1.cmml" xref="S2.SS3.p3.5.m5.1.1">superscript</csymbol><ci id="S2.SS3.p3.5.m5.1.1.2.cmml" xref="S2.SS3.p3.5.m5.1.1.2">𝑘</ci><ci id="S2.SS3.p3.5.m5.1.1.3a.cmml" xref="S2.SS3.p3.5.m5.1.1.3"><mtext id="S2.SS3.p3.5.m5.1.1.3.cmml" mathsize="70%" xref="S2.SS3.p3.5.m5.1.1.3">max</mtext></ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p3.5.m5.1c">k^{\text{max}}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p3.5.m5.1d">italic_k start_POSTSUPERSCRIPT max end_POSTSUPERSCRIPT</annotation></semantics></math> (line <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#alg2.l9" title="In Algorithm 2 ‣ II-C Deep Optimistic Linear Support ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span
class="ltx_text ltx_ref_tag">9</span></a>). Fixing the number of iterations in advance allows the training effort to be estimated before training starts.</p> </div> <div class="ltx_para" id="S2.SS3.p4"> <p class="ltx_p" id="S2.SS3.p4.2">DOL can reuse the parameters of previously trained models (lines <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#alg2.l11" title="In Algorithm 2 ‣ II-C Deep Optimistic Linear Support ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">11</span></a>-<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#alg2.l19" title="In Algorithm 2 ‣ II-C Deep Optimistic Linear Support ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">19</span></a>): the neural network weights <math alttext="\theta" class="ltx_Math" display="inline" id="S2.SS3.p4.1.m1.1"><semantics id="S2.SS3.p4.1.m1.1a"><mi id="S2.SS3.p4.1.m1.1.1" xref="S2.SS3.p4.1.m1.1.1.cmml">θ</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p4.1.m1.1b"><ci id="S2.SS3.p4.1.m1.1.1.cmml" xref="S2.SS3.p4.1.m1.1.1">𝜃</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p4.1.m1.1c">\theta</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p4.1.m1.1d">italic_θ</annotation></semantics></math> of the model with the closest <span class="ltx_text ltx_markedasmath ltx_font_bold" id="S2.SS3.p4.2.1">w</span> are used to initialize the next MOPPO run. 
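</p>
<p class="ltx_p">A two-objective sketch of the priority computation and of the warm-start choice is given below. It assumes, as in OLS, that the priority is the maximal possible improvement: the optimistic bound max w·v subject to w_j·v ≤ V*(w_j) at every weight w_j already passed to MOPPO, minus the current envelope value. For two objectives, LP duality reduces this bound to the smallest linear interpolation between two trained weights that bracket w1. Measuring closeness of weight vectors by Euclidean distance is our assumption; <span class="ltx_text ltx_font_typewriter">optimistic_bound</span>, <span class="ltx_text ltx_font_typewriter">priority</span> and <span class="ltx_text ltx_font_typewriter">closest_model</span> are hypothetical names.</p>

```python
import math


def optimistic_bound(w1, trained):
    """Optimistic bound on the scalarized CCS value at w = (w1, 1 - w1).

    trained maps each weight w1_j already passed to MOPPO (including the
    extremes 0.0 and 1.0) to the best scalarized value found there.  For two
    objectives the OLS linear program reduces, by LP duality, to the smallest
    linear interpolation between two trained weights that bracket w1.
    """
    best = math.inf
    points = sorted(trained.items())
    for xa, ya in points:
        for xb, yb in points:
            if xb >= w1 >= xa:
                if xb == xa:  # w1 coincides with a trained weight
                    interp = ya
                else:
                    t = (w1 - xa) / (xb - xa)
                    interp = (1.0 - t) * ya + t * yb
                best = min(best, interp)
    return best


def priority(w1, trained, current_value):
    """Maximal possible improvement: optimistic bound minus current envelope."""
    return optimistic_bound(w1, trained) - current_value


def closest_model(w, models):
    """Parameters of the stored model whose weight vector is nearest to w."""
    return min(models.items(), key=lambda item: math.dist(item[0], w))[1]


# Extreme weights trained; the envelope of {(1, 0), (0, 1)} at w1 = 0.5 is 0.5.
trained = {0.0: 1.0, 1.0: 1.0}
print(priority(0.5, trained, current_value=0.5))  # 0.5

models = {(1.0, 0.0): "theta_A", (0.0, 1.0): "theta_B"}
print(closest_model((0.7, 0.3), models))  # theta_A
```

<p class="ltx_p">With only the extreme weights trained, the bound equals the scalarized value of the ideal point, which is why the first interior corner weights receive a large priority; each further trained weight tightens the bound around it.</p>
<p class="ltx_p">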
For a detailed description of DOL and OLS, we refer the reader to <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib24" title="">24</a>]</cite> and <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib27" title="">27</a>]</cite>, respectively.</p> </div> </section> <section class="ltx_subsection" id="S2.SS4"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S2.SS4.5.1.1">II-D</span> </span><span class="ltx_text ltx_font_italic" id="S2.SS4.6.2">Policy Selection</span> </h3> <div class="ltx_para" id="S2.SS4.p1"> <p class="ltx_p" id="S2.SS4.p1.4">Despite the multi-objective nature of the problem, the ultimate goal of the system operator is to maintain secure system operation for as long as possible <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib4" title="">4</a>]</cite>. To this end, we propose Algorithm <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#alg3" title="Algorithm 3 ‣ II-D Policy Selection ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">3</span></a> to identify the best-performing policy, which is then recommended to the system operator. 
We assess policy quality with a reward-independent metric, the <span class="ltx_text ltx_font_italic" id="S2.SS4.p1.4.1">Episode Duration</span> (<math alttext="E" class="ltx_Math" display="inline" id="S2.SS4.p1.1.m1.1"><semantics id="S2.SS4.p1.1.m1.1a"><mi id="S2.SS4.p1.1.m1.1.1" xref="S2.SS4.p1.1.m1.1.1.cmml">E</mi><annotation-xml encoding="MathML-Content" id="S2.SS4.p1.1.m1.1b"><ci id="S2.SS4.p1.1.m1.1.1.cmml" xref="S2.SS4.p1.1.m1.1.1">𝐸</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS4.p1.1.m1.1c">E</annotation><annotation encoding="application/x-llamapun" id="S2.SS4.p1.1.m1.1d">italic_E</annotation></semantics></math>) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib26" title="">26</a>]</cite>. <math alttext="E" class="ltx_Math" display="inline" id="S2.SS4.p1.2.m2.1"><semantics id="S2.SS4.p1.2.m2.1a"><mi id="S2.SS4.p1.2.m2.1.1" xref="S2.SS4.p1.2.m2.1.1.cmml">E</mi><annotation-xml encoding="MathML-Content" id="S2.SS4.p1.2.m2.1b"><ci id="S2.SS4.p1.2.m2.1.1.cmml" xref="S2.SS4.p1.2.m2.1.1">𝐸</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS4.p1.2.m2.1c">E</annotation><annotation encoding="application/x-llamapun" id="S2.SS4.p1.2.m2.1d">italic_E</annotation></semantics></math> measures, in time steps, how long the agent keeps the power system in operation before a premature grid failure occurs. 
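The selection procedure of Algorithm 3 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: each policy is identified by the weight vector it was trained on, its episode duration E has already been measured, and the extreme weights are assumed to be the one-hot corners of the weight simplex. The function names are hypothetical and the durations in the usage example are toy values, not results.

```python
def select_best_policy(policies, durations, extreme_weights):
    """Sketch of Algorithm 3: return the policy with the longest episode
    duration E, skipping policies trained on extreme weights.

    policies: weight vectors identifying each trained policy (assumed ID).
    durations: episode durations E_i, aligned with policies.
    extreme_weights: set of weight tuples to exclude (assumed one-hot).
    """
    best_policy = None  # pi^MO := empty
    best_duration = 0   # E^max := 0
    for policy, duration in zip(policies, durations):
        # Strict '>' keeps the first policy among equal durations.
        if tuple(policy) not in extreme_weights and duration > best_duration:
            best_duration = duration
            best_policy = policy
    return best_policy, best_duration


def one_hot_corners(n_objectives):
    """Corners of the weight simplex, e.g. (1, 0, 0), (0, 1, 0), (0, 0, 1)."""
    return {tuple(1.0 if j == i else 0.0 for j in range(n_objectives))
            for i in range(n_objectives)}


# Toy values for illustration only.
weights = [(1.0, 0.0, 0.0), (0.5, 0.25, 0.25), (0.2, 0.4, 0.4), (0.0, 0.0, 1.0)]
E = [12, 7, 9, 12]
print(select_best_policy(weights, E, one_hot_corners(3)))
# ((0.2, 0.4, 0.4), 9)
```

Note that the corner policies are ignored even when they survive longest, which mirrors the exclusion rule in the text.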
After generating the complete set of policies, we select for each seed the best-performing policy <math alttext="(\pi^{MO})" class="ltx_Math" display="inline" id="S2.SS4.p1.3.m3.1"><semantics id="S2.SS4.p1.3.m3.1a"><mrow id="S2.SS4.p1.3.m3.1.1.1" xref="S2.SS4.p1.3.m3.1.1.1.1.cmml"><mo id="S2.SS4.p1.3.m3.1.1.1.2" stretchy="false" xref="S2.SS4.p1.3.m3.1.1.1.1.cmml">(</mo><msup id="S2.SS4.p1.3.m3.1.1.1.1" xref="S2.SS4.p1.3.m3.1.1.1.1.cmml"><mi id="S2.SS4.p1.3.m3.1.1.1.1.2" xref="S2.SS4.p1.3.m3.1.1.1.1.2.cmml">π</mi><mrow id="S2.SS4.p1.3.m3.1.1.1.1.3" xref="S2.SS4.p1.3.m3.1.1.1.1.3.cmml"><mi id="S2.SS4.p1.3.m3.1.1.1.1.3.2" xref="S2.SS4.p1.3.m3.1.1.1.1.3.2.cmml">M</mi><mo id="S2.SS4.p1.3.m3.1.1.1.1.3.1" xref="S2.SS4.p1.3.m3.1.1.1.1.3.1.cmml">⁢</mo><mi id="S2.SS4.p1.3.m3.1.1.1.1.3.3" xref="S2.SS4.p1.3.m3.1.1.1.1.3.3.cmml">O</mi></mrow></msup><mo id="S2.SS4.p1.3.m3.1.1.1.3" stretchy="false" xref="S2.SS4.p1.3.m3.1.1.1.1.cmml">)</mo></mrow><annotation-xml encoding="MathML-Content" id="S2.SS4.p1.3.m3.1b"><apply id="S2.SS4.p1.3.m3.1.1.1.1.cmml" xref="S2.SS4.p1.3.m3.1.1.1"><csymbol cd="ambiguous" id="S2.SS4.p1.3.m3.1.1.1.1.1.cmml" xref="S2.SS4.p1.3.m3.1.1.1">superscript</csymbol><ci id="S2.SS4.p1.3.m3.1.1.1.1.2.cmml" xref="S2.SS4.p1.3.m3.1.1.1.1.2">𝜋</ci><apply id="S2.SS4.p1.3.m3.1.1.1.1.3.cmml" xref="S2.SS4.p1.3.m3.1.1.1.1.3"><times id="S2.SS4.p1.3.m3.1.1.1.1.3.1.cmml" xref="S2.SS4.p1.3.m3.1.1.1.1.3.1"></times><ci id="S2.SS4.p1.3.m3.1.1.1.1.3.2.cmml" xref="S2.SS4.p1.3.m3.1.1.1.1.3.2">𝑀</ci><ci id="S2.SS4.p1.3.m3.1.1.1.1.3.3.cmml" xref="S2.SS4.p1.3.m3.1.1.1.1.3.3">𝑂</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS4.p1.3.m3.1c">(\pi^{MO})</annotation><annotation encoding="application/x-llamapun" id="S2.SS4.p1.3.m3.1d">( italic_π start_POSTSUPERSCRIPT italic_M italic_O end_POSTSUPERSCRIPT )</annotation></semantics></math>, i.e., the policy with the highest <math alttext="E" class="ltx_Math" display="inline" id="S2.SS4.p1.4.m4.1"><semantics 
id="S2.SS4.p1.4.m4.1a"><mi id="S2.SS4.p1.4.m4.1.1" xref="S2.SS4.p1.4.m4.1.1.cmml">E</mi><annotation-xml encoding="MathML-Content" id="S2.SS4.p1.4.m4.1b"><ci id="S2.SS4.p1.4.m4.1.1.cmml" xref="S2.SS4.p1.4.m4.1.1">𝐸</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS4.p1.4.m4.1c">E</annotation><annotation encoding="application/x-llamapun" id="S2.SS4.p1.4.m4.1d">italic_E</annotation></semantics></math>. Policies trained solely on extreme weights are excluded from consideration, since the aim is to recommend a policy that balances multiple rewards.</p> </div> <figure class="ltx_float ltx_float_algorithm ltx_framed ltx_framed_top" id="alg3"> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_float"><span class="ltx_text ltx_font_bold" id="alg3.2.1.1">Algorithm 3</span> </span> Selection Process for the Best-Performing Policy</figcaption> <div class="ltx_listing ltx_listing" id="alg3.3"> <div class="ltx_listingline" id="alg3.l0"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg3.l0.1.1.1" style="font-size:80%;">0:</span></span>  Set of policies <math alttext="\{\pi_{1},\pi_{2},\ldots,\pi_{n}\}" class="ltx_Math" display="inline" id="alg3.l0.m1.4"><semantics id="alg3.l0.m1.4a"><mrow id="alg3.l0.m1.4.4.3" xref="alg3.l0.m1.4.4.4.cmml"><mo id="alg3.l0.m1.4.4.3.4" stretchy="false" xref="alg3.l0.m1.4.4.4.cmml">{</mo><msub id="alg3.l0.m1.2.2.1.1" xref="alg3.l0.m1.2.2.1.1.cmml"><mi id="alg3.l0.m1.2.2.1.1.2" xref="alg3.l0.m1.2.2.1.1.2.cmml">π</mi><mn id="alg3.l0.m1.2.2.1.1.3" xref="alg3.l0.m1.2.2.1.1.3.cmml">1</mn></msub><mo id="alg3.l0.m1.4.4.3.5" xref="alg3.l0.m1.4.4.4.cmml">,</mo><msub id="alg3.l0.m1.3.3.2.2" xref="alg3.l0.m1.3.3.2.2.cmml"><mi id="alg3.l0.m1.3.3.2.2.2" xref="alg3.l0.m1.3.3.2.2.2.cmml">π</mi><mn id="alg3.l0.m1.3.3.2.2.3" xref="alg3.l0.m1.3.3.2.2.3.cmml">2</mn></msub><mo id="alg3.l0.m1.4.4.3.6" xref="alg3.l0.m1.4.4.4.cmml">,</mo><mi id="alg3.l0.m1.1.1" mathvariant="normal" xref="alg3.l0.m1.1.1.cmml">…</mi><mo 
id="alg3.l0.m1.4.4.3.7" xref="alg3.l0.m1.4.4.4.cmml">,</mo><msub id="alg3.l0.m1.4.4.3.3" xref="alg3.l0.m1.4.4.3.3.cmml"><mi id="alg3.l0.m1.4.4.3.3.2" xref="alg3.l0.m1.4.4.3.3.2.cmml">π</mi><mi id="alg3.l0.m1.4.4.3.3.3" xref="alg3.l0.m1.4.4.3.3.3.cmml">n</mi></msub><mo id="alg3.l0.m1.4.4.3.8" stretchy="false" xref="alg3.l0.m1.4.4.4.cmml">}</mo></mrow><annotation-xml encoding="MathML-Content" id="alg3.l0.m1.4b"><set id="alg3.l0.m1.4.4.4.cmml" xref="alg3.l0.m1.4.4.3"><apply id="alg3.l0.m1.2.2.1.1.cmml" xref="alg3.l0.m1.2.2.1.1"><csymbol cd="ambiguous" id="alg3.l0.m1.2.2.1.1.1.cmml" xref="alg3.l0.m1.2.2.1.1">subscript</csymbol><ci id="alg3.l0.m1.2.2.1.1.2.cmml" xref="alg3.l0.m1.2.2.1.1.2">𝜋</ci><cn id="alg3.l0.m1.2.2.1.1.3.cmml" type="integer" xref="alg3.l0.m1.2.2.1.1.3">1</cn></apply><apply id="alg3.l0.m1.3.3.2.2.cmml" xref="alg3.l0.m1.3.3.2.2"><csymbol cd="ambiguous" id="alg3.l0.m1.3.3.2.2.1.cmml" xref="alg3.l0.m1.3.3.2.2">subscript</csymbol><ci id="alg3.l0.m1.3.3.2.2.2.cmml" xref="alg3.l0.m1.3.3.2.2.2">𝜋</ci><cn id="alg3.l0.m1.3.3.2.2.3.cmml" type="integer" xref="alg3.l0.m1.3.3.2.2.3">2</cn></apply><ci id="alg3.l0.m1.1.1.cmml" xref="alg3.l0.m1.1.1">…</ci><apply id="alg3.l0.m1.4.4.3.3.cmml" xref="alg3.l0.m1.4.4.3.3"><csymbol cd="ambiguous" id="alg3.l0.m1.4.4.3.3.1.cmml" xref="alg3.l0.m1.4.4.3.3">subscript</csymbol><ci id="alg3.l0.m1.4.4.3.3.2.cmml" xref="alg3.l0.m1.4.4.3.3.2">𝜋</ci><ci id="alg3.l0.m1.4.4.3.3.3.cmml" xref="alg3.l0.m1.4.4.3.3.3">𝑛</ci></apply></set></annotation-xml><annotation encoding="application/x-tex" id="alg3.l0.m1.4c">\{\pi_{1},\pi_{2},\ldots,\pi_{n}\}</annotation><annotation encoding="application/x-llamapun" id="alg3.l0.m1.4d">{ italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_π start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_π start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT }</annotation></semantics></math>, set of episode durations <math alttext="\{E_{1},E_{2},\ldots,E_{n}\}" class="ltx_Math" display="inline" 
id="alg3.l0.m2.4"><semantics id="alg3.l0.m2.4a"><mrow id="alg3.l0.m2.4.4.3" xref="alg3.l0.m2.4.4.4.cmml"><mo id="alg3.l0.m2.4.4.3.4" stretchy="false" xref="alg3.l0.m2.4.4.4.cmml">{</mo><msub id="alg3.l0.m2.2.2.1.1" xref="alg3.l0.m2.2.2.1.1.cmml"><mi id="alg3.l0.m2.2.2.1.1.2" xref="alg3.l0.m2.2.2.1.1.2.cmml">E</mi><mn id="alg3.l0.m2.2.2.1.1.3" xref="alg3.l0.m2.2.2.1.1.3.cmml">1</mn></msub><mo id="alg3.l0.m2.4.4.3.5" xref="alg3.l0.m2.4.4.4.cmml">,</mo><msub id="alg3.l0.m2.3.3.2.2" xref="alg3.l0.m2.3.3.2.2.cmml"><mi id="alg3.l0.m2.3.3.2.2.2" xref="alg3.l0.m2.3.3.2.2.2.cmml">E</mi><mn id="alg3.l0.m2.3.3.2.2.3" xref="alg3.l0.m2.3.3.2.2.3.cmml">2</mn></msub><mo id="alg3.l0.m2.4.4.3.6" xref="alg3.l0.m2.4.4.4.cmml">,</mo><mi id="alg3.l0.m2.1.1" mathvariant="normal" xref="alg3.l0.m2.1.1.cmml">…</mi><mo id="alg3.l0.m2.4.4.3.7" xref="alg3.l0.m2.4.4.4.cmml">,</mo><msub id="alg3.l0.m2.4.4.3.3" xref="alg3.l0.m2.4.4.3.3.cmml"><mi id="alg3.l0.m2.4.4.3.3.2" xref="alg3.l0.m2.4.4.3.3.2.cmml">E</mi><mi id="alg3.l0.m2.4.4.3.3.3" xref="alg3.l0.m2.4.4.3.3.3.cmml">n</mi></msub><mo id="alg3.l0.m2.4.4.3.8" stretchy="false" xref="alg3.l0.m2.4.4.4.cmml">}</mo></mrow><annotation-xml encoding="MathML-Content" id="alg3.l0.m2.4b"><set id="alg3.l0.m2.4.4.4.cmml" xref="alg3.l0.m2.4.4.3"><apply id="alg3.l0.m2.2.2.1.1.cmml" xref="alg3.l0.m2.2.2.1.1"><csymbol cd="ambiguous" id="alg3.l0.m2.2.2.1.1.1.cmml" xref="alg3.l0.m2.2.2.1.1">subscript</csymbol><ci id="alg3.l0.m2.2.2.1.1.2.cmml" xref="alg3.l0.m2.2.2.1.1.2">𝐸</ci><cn id="alg3.l0.m2.2.2.1.1.3.cmml" type="integer" xref="alg3.l0.m2.2.2.1.1.3">1</cn></apply><apply id="alg3.l0.m2.3.3.2.2.cmml" xref="alg3.l0.m2.3.3.2.2"><csymbol cd="ambiguous" id="alg3.l0.m2.3.3.2.2.1.cmml" xref="alg3.l0.m2.3.3.2.2">subscript</csymbol><ci id="alg3.l0.m2.3.3.2.2.2.cmml" xref="alg3.l0.m2.3.3.2.2.2">𝐸</ci><cn id="alg3.l0.m2.3.3.2.2.3.cmml" type="integer" xref="alg3.l0.m2.3.3.2.2.3">2</cn></apply><ci id="alg3.l0.m2.1.1.cmml" xref="alg3.l0.m2.1.1">…</ci><apply 
id="alg3.l0.m2.4.4.3.3.cmml" xref="alg3.l0.m2.4.4.3.3"><csymbol cd="ambiguous" id="alg3.l0.m2.4.4.3.3.1.cmml" xref="alg3.l0.m2.4.4.3.3">subscript</csymbol><ci id="alg3.l0.m2.4.4.3.3.2.cmml" xref="alg3.l0.m2.4.4.3.3.2">𝐸</ci><ci id="alg3.l0.m2.4.4.3.3.3.cmml" xref="alg3.l0.m2.4.4.3.3.3">𝑛</ci></apply></set></annotation-xml><annotation encoding="application/x-tex" id="alg3.l0.m2.4c">\{E_{1},E_{2},\ldots,E_{n}\}</annotation><annotation encoding="application/x-llamapun" id="alg3.l0.m2.4d">{ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_E start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT }</annotation></semantics></math>, extrema weights <math alttext="\{w_{e}\}" class="ltx_Math" display="inline" id="alg3.l0.m3.1"><semantics id="alg3.l0.m3.1a"><mrow id="alg3.l0.m3.1.1.1" xref="alg3.l0.m3.1.1.2.cmml"><mo id="alg3.l0.m3.1.1.1.2" stretchy="false" xref="alg3.l0.m3.1.1.2.cmml">{</mo><msub id="alg3.l0.m3.1.1.1.1" xref="alg3.l0.m3.1.1.1.1.cmml"><mi id="alg3.l0.m3.1.1.1.1.2" xref="alg3.l0.m3.1.1.1.1.2.cmml">w</mi><mi id="alg3.l0.m3.1.1.1.1.3" xref="alg3.l0.m3.1.1.1.1.3.cmml">e</mi></msub><mo id="alg3.l0.m3.1.1.1.3" stretchy="false" xref="alg3.l0.m3.1.1.2.cmml">}</mo></mrow><annotation-xml encoding="MathML-Content" id="alg3.l0.m3.1b"><set id="alg3.l0.m3.1.1.2.cmml" xref="alg3.l0.m3.1.1.1"><apply id="alg3.l0.m3.1.1.1.1.cmml" xref="alg3.l0.m3.1.1.1.1"><csymbol cd="ambiguous" id="alg3.l0.m3.1.1.1.1.1.cmml" xref="alg3.l0.m3.1.1.1.1">subscript</csymbol><ci id="alg3.l0.m3.1.1.1.1.2.cmml" xref="alg3.l0.m3.1.1.1.1.2">𝑤</ci><ci id="alg3.l0.m3.1.1.1.1.3.cmml" xref="alg3.l0.m3.1.1.1.1.3">𝑒</ci></apply></set></annotation-xml><annotation encoding="application/x-tex" id="alg3.l0.m3.1c">\{w_{e}\}</annotation><annotation encoding="application/x-llamapun" id="alg3.l0.m3.1d">{ italic_w start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT }</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg3.l1"> <span 
class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg3.l1.1.1.1" style="font-size:80%;">1:</span></span>  Initialize best policy <math alttext="\pi^{MO}\leftarrow\emptyset" class="ltx_Math" display="inline" id="alg3.l1.m1.1"><semantics id="alg3.l1.m1.1a"><mrow id="alg3.l1.m1.1.1" xref="alg3.l1.m1.1.1.cmml"><msup id="alg3.l1.m1.1.1.2" xref="alg3.l1.m1.1.1.2.cmml"><mi id="alg3.l1.m1.1.1.2.2" xref="alg3.l1.m1.1.1.2.2.cmml">π</mi><mrow id="alg3.l1.m1.1.1.2.3" xref="alg3.l1.m1.1.1.2.3.cmml"><mi id="alg3.l1.m1.1.1.2.3.2" xref="alg3.l1.m1.1.1.2.3.2.cmml">M</mi><mo id="alg3.l1.m1.1.1.2.3.1" xref="alg3.l1.m1.1.1.2.3.1.cmml">⁢</mo><mi id="alg3.l1.m1.1.1.2.3.3" xref="alg3.l1.m1.1.1.2.3.3.cmml">O</mi></mrow></msup><mo id="alg3.l1.m1.1.1.1" stretchy="false" xref="alg3.l1.m1.1.1.1.cmml">←</mo><mi id="alg3.l1.m1.1.1.3" mathvariant="normal" xref="alg3.l1.m1.1.1.3.cmml">∅</mi></mrow><annotation-xml encoding="MathML-Content" id="alg3.l1.m1.1b"><apply id="alg3.l1.m1.1.1.cmml" xref="alg3.l1.m1.1.1"><ci id="alg3.l1.m1.1.1.1.cmml" xref="alg3.l1.m1.1.1.1">←</ci><apply id="alg3.l1.m1.1.1.2.cmml" xref="alg3.l1.m1.1.1.2"><csymbol cd="ambiguous" id="alg3.l1.m1.1.1.2.1.cmml" xref="alg3.l1.m1.1.1.2">superscript</csymbol><ci id="alg3.l1.m1.1.1.2.2.cmml" xref="alg3.l1.m1.1.1.2.2">𝜋</ci><apply id="alg3.l1.m1.1.1.2.3.cmml" xref="alg3.l1.m1.1.1.2.3"><times id="alg3.l1.m1.1.1.2.3.1.cmml" xref="alg3.l1.m1.1.1.2.3.1"></times><ci id="alg3.l1.m1.1.1.2.3.2.cmml" xref="alg3.l1.m1.1.1.2.3.2">𝑀</ci><ci id="alg3.l1.m1.1.1.2.3.3.cmml" xref="alg3.l1.m1.1.1.2.3.3">𝑂</ci></apply></apply><emptyset id="alg3.l1.m1.1.1.3.cmml" xref="alg3.l1.m1.1.1.3"></emptyset></apply></annotation-xml><annotation encoding="application/x-tex" id="alg3.l1.m1.1c">\pi^{MO}\leftarrow\emptyset</annotation><annotation encoding="application/x-llamapun" id="alg3.l1.m1.1d">italic_π start_POSTSUPERSCRIPT italic_M italic_O end_POSTSUPERSCRIPT ← ∅</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg3.l2"> 
<span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg3.l2.1.1.1" style="font-size:80%;">2:</span></span>  Initialize maximum episode duration <math alttext="E^{\text{max}}\leftarrow 0" class="ltx_Math" display="inline" id="alg3.l2.m1.1"><semantics id="alg3.l2.m1.1a"><mrow id="alg3.l2.m1.1.1" xref="alg3.l2.m1.1.1.cmml"><msup id="alg3.l2.m1.1.1.2" xref="alg3.l2.m1.1.1.2.cmml"><mi id="alg3.l2.m1.1.1.2.2" xref="alg3.l2.m1.1.1.2.2.cmml">E</mi><mtext id="alg3.l2.m1.1.1.2.3" xref="alg3.l2.m1.1.1.2.3a.cmml">max</mtext></msup><mo id="alg3.l2.m1.1.1.1" stretchy="false" xref="alg3.l2.m1.1.1.1.cmml">←</mo><mn id="alg3.l2.m1.1.1.3" xref="alg3.l2.m1.1.1.3.cmml">0</mn></mrow><annotation-xml encoding="MathML-Content" id="alg3.l2.m1.1b"><apply id="alg3.l2.m1.1.1.cmml" xref="alg3.l2.m1.1.1"><ci id="alg3.l2.m1.1.1.1.cmml" xref="alg3.l2.m1.1.1.1">←</ci><apply id="alg3.l2.m1.1.1.2.cmml" xref="alg3.l2.m1.1.1.2"><csymbol cd="ambiguous" id="alg3.l2.m1.1.1.2.1.cmml" xref="alg3.l2.m1.1.1.2">superscript</csymbol><ci id="alg3.l2.m1.1.1.2.2.cmml" xref="alg3.l2.m1.1.1.2.2">𝐸</ci><ci id="alg3.l2.m1.1.1.2.3a.cmml" xref="alg3.l2.m1.1.1.2.3"><mtext id="alg3.l2.m1.1.1.2.3.cmml" mathsize="70%" xref="alg3.l2.m1.1.1.2.3">max</mtext></ci></apply><cn id="alg3.l2.m1.1.1.3.cmml" type="integer" xref="alg3.l2.m1.1.1.3">0</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="alg3.l2.m1.1c">E^{\text{max}}\leftarrow 0</annotation><annotation encoding="application/x-llamapun" id="alg3.l2.m1.1d">italic_E start_POSTSUPERSCRIPT max end_POSTSUPERSCRIPT ← 0</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg3.l3"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg3.l3.1.1.1" style="font-size:80%;">3:</span></span>  <span class="ltx_text ltx_font_bold" id="alg3.l3.2">for all</span> <math alttext="i\in\{1,2,\ldots,n\}" class="ltx_Math" display="inline" id="alg3.l3.m1.4"><semantics id="alg3.l3.m1.4a"><mrow id="alg3.l3.m1.4.5" 
xref="alg3.l3.m1.4.5.cmml"><mi id="alg3.l3.m1.4.5.2" xref="alg3.l3.m1.4.5.2.cmml">i</mi><mo id="alg3.l3.m1.4.5.1" xref="alg3.l3.m1.4.5.1.cmml">∈</mo><mrow id="alg3.l3.m1.4.5.3.2" xref="alg3.l3.m1.4.5.3.1.cmml"><mo id="alg3.l3.m1.4.5.3.2.1" stretchy="false" xref="alg3.l3.m1.4.5.3.1.cmml">{</mo><mn id="alg3.l3.m1.1.1" xref="alg3.l3.m1.1.1.cmml">1</mn><mo id="alg3.l3.m1.4.5.3.2.2" xref="alg3.l3.m1.4.5.3.1.cmml">,</mo><mn id="alg3.l3.m1.2.2" xref="alg3.l3.m1.2.2.cmml">2</mn><mo id="alg3.l3.m1.4.5.3.2.3" xref="alg3.l3.m1.4.5.3.1.cmml">,</mo><mi id="alg3.l3.m1.3.3" mathvariant="normal" xref="alg3.l3.m1.3.3.cmml">…</mi><mo id="alg3.l3.m1.4.5.3.2.4" xref="alg3.l3.m1.4.5.3.1.cmml">,</mo><mi id="alg3.l3.m1.4.4" xref="alg3.l3.m1.4.4.cmml">n</mi><mo id="alg3.l3.m1.4.5.3.2.5" stretchy="false" xref="alg3.l3.m1.4.5.3.1.cmml">}</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="alg3.l3.m1.4b"><apply id="alg3.l3.m1.4.5.cmml" xref="alg3.l3.m1.4.5"><in id="alg3.l3.m1.4.5.1.cmml" xref="alg3.l3.m1.4.5.1"></in><ci id="alg3.l3.m1.4.5.2.cmml" xref="alg3.l3.m1.4.5.2">𝑖</ci><set id="alg3.l3.m1.4.5.3.1.cmml" xref="alg3.l3.m1.4.5.3.2"><cn id="alg3.l3.m1.1.1.cmml" type="integer" xref="alg3.l3.m1.1.1">1</cn><cn id="alg3.l3.m1.2.2.cmml" type="integer" xref="alg3.l3.m1.2.2">2</cn><ci id="alg3.l3.m1.3.3.cmml" xref="alg3.l3.m1.3.3">…</ci><ci id="alg3.l3.m1.4.4.cmml" xref="alg3.l3.m1.4.4">𝑛</ci></set></apply></annotation-xml><annotation encoding="application/x-tex" id="alg3.l3.m1.4c">i\in\{1,2,\ldots,n\}</annotation><annotation encoding="application/x-llamapun" id="alg3.l3.m1.4d">italic_i ∈ { 1 , 2 , … , italic_n }</annotation></semantics></math> <span class="ltx_text ltx_font_bold" id="alg3.l3.3">do</span> </div> <div class="ltx_listingline" id="alg3.l4"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg3.l4.1.1.1" style="font-size:80%;">4:</span></span>     Retrieve policy <math alttext="\pi_{i}" class="ltx_Math" display="inline" id="alg3.l4.m1.1"><semantics 
id="alg3.l4.m1.1a"><msub id="alg3.l4.m1.1.1" xref="alg3.l4.m1.1.1.cmml"><mi id="alg3.l4.m1.1.1.2" xref="alg3.l4.m1.1.1.2.cmml">π</mi><mi id="alg3.l4.m1.1.1.3" xref="alg3.l4.m1.1.1.3.cmml">i</mi></msub><annotation-xml encoding="MathML-Content" id="alg3.l4.m1.1b"><apply id="alg3.l4.m1.1.1.cmml" xref="alg3.l4.m1.1.1"><csymbol cd="ambiguous" id="alg3.l4.m1.1.1.1.cmml" xref="alg3.l4.m1.1.1">subscript</csymbol><ci id="alg3.l4.m1.1.1.2.cmml" xref="alg3.l4.m1.1.1.2">𝜋</ci><ci id="alg3.l4.m1.1.1.3.cmml" xref="alg3.l4.m1.1.1.3">𝑖</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg3.l4.m1.1c">\pi_{i}</annotation><annotation encoding="application/x-llamapun" id="alg3.l4.m1.1d">italic_π start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT</annotation></semantics></math> and corresponding episode duration <math alttext="E_{i}" class="ltx_Math" display="inline" id="alg3.l4.m2.1"><semantics id="alg3.l4.m2.1a"><msub id="alg3.l4.m2.1.1" xref="alg3.l4.m2.1.1.cmml"><mi id="alg3.l4.m2.1.1.2" xref="alg3.l4.m2.1.1.2.cmml">E</mi><mi id="alg3.l4.m2.1.1.3" xref="alg3.l4.m2.1.1.3.cmml">i</mi></msub><annotation-xml encoding="MathML-Content" id="alg3.l4.m2.1b"><apply id="alg3.l4.m2.1.1.cmml" xref="alg3.l4.m2.1.1"><csymbol cd="ambiguous" id="alg3.l4.m2.1.1.1.cmml" xref="alg3.l4.m2.1.1">subscript</csymbol><ci id="alg3.l4.m2.1.1.2.cmml" xref="alg3.l4.m2.1.1.2">𝐸</ci><ci id="alg3.l4.m2.1.1.3.cmml" xref="alg3.l4.m2.1.1.3">𝑖</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg3.l4.m2.1c">E_{i}</annotation><annotation encoding="application/x-llamapun" id="alg3.l4.m2.1d">italic_E start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg3.l5"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg3.l5.1.1.1" style="font-size:80%;">5:</span></span>     <span class="ltx_text ltx_font_bold" id="alg3.l5.2">if</span> <math alttext="\pi_{i}" class="ltx_Math" display="inline" 
id="alg3.l5.m1.1"><semantics id="alg3.l5.m1.1a"><msub id="alg3.l5.m1.1.1" xref="alg3.l5.m1.1.1.cmml"><mi id="alg3.l5.m1.1.1.2" xref="alg3.l5.m1.1.1.2.cmml">π</mi><mi id="alg3.l5.m1.1.1.3" xref="alg3.l5.m1.1.1.3.cmml">i</mi></msub><annotation-xml encoding="MathML-Content" id="alg3.l5.m1.1b"><apply id="alg3.l5.m1.1.1.cmml" xref="alg3.l5.m1.1.1"><csymbol cd="ambiguous" id="alg3.l5.m1.1.1.1.cmml" xref="alg3.l5.m1.1.1">subscript</csymbol><ci id="alg3.l5.m1.1.1.2.cmml" xref="alg3.l5.m1.1.1.2">𝜋</ci><ci id="alg3.l5.m1.1.1.3.cmml" xref="alg3.l5.m1.1.1.3">𝑖</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg3.l5.m1.1c">\pi_{i}</annotation><annotation encoding="application/x-llamapun" id="alg3.l5.m1.1d">italic_π start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT</annotation></semantics></math> is not trained on extrema weights <math alttext="\{w_{e}\}" class="ltx_Math" display="inline" id="alg3.l5.m2.1"><semantics id="alg3.l5.m2.1a"><mrow id="alg3.l5.m2.1.1.1" xref="alg3.l5.m2.1.1.2.cmml"><mo id="alg3.l5.m2.1.1.1.2" stretchy="false" xref="alg3.l5.m2.1.1.2.cmml">{</mo><msub id="alg3.l5.m2.1.1.1.1" xref="alg3.l5.m2.1.1.1.1.cmml"><mi id="alg3.l5.m2.1.1.1.1.2" xref="alg3.l5.m2.1.1.1.1.2.cmml">w</mi><mi id="alg3.l5.m2.1.1.1.1.3" xref="alg3.l5.m2.1.1.1.1.3.cmml">e</mi></msub><mo id="alg3.l5.m2.1.1.1.3" stretchy="false" xref="alg3.l5.m2.1.1.2.cmml">}</mo></mrow><annotation-xml encoding="MathML-Content" id="alg3.l5.m2.1b"><set id="alg3.l5.m2.1.1.2.cmml" xref="alg3.l5.m2.1.1.1"><apply id="alg3.l5.m2.1.1.1.1.cmml" xref="alg3.l5.m2.1.1.1.1"><csymbol cd="ambiguous" id="alg3.l5.m2.1.1.1.1.1.cmml" xref="alg3.l5.m2.1.1.1.1">subscript</csymbol><ci id="alg3.l5.m2.1.1.1.1.2.cmml" xref="alg3.l5.m2.1.1.1.1.2">𝑤</ci><ci id="alg3.l5.m2.1.1.1.1.3.cmml" xref="alg3.l5.m2.1.1.1.1.3">𝑒</ci></apply></set></annotation-xml><annotation encoding="application/x-tex" id="alg3.l5.m2.1c">\{w_{e}\}</annotation><annotation encoding="application/x-llamapun" id="alg3.l5.m2.1d">{ italic_w 
start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT }</annotation></semantics></math> <span class="ltx_text ltx_font_bold" id="alg3.l5.3">and</span> <math alttext="E_{i}&gt;E^{\text{max}}" class="ltx_Math" display="inline" id="alg3.l5.m3.1"><semantics id="alg3.l5.m3.1a"><mrow id="alg3.l5.m3.1.1" xref="alg3.l5.m3.1.1.cmml"><msub id="alg3.l5.m3.1.1.2" xref="alg3.l5.m3.1.1.2.cmml"><mi id="alg3.l5.m3.1.1.2.2" xref="alg3.l5.m3.1.1.2.2.cmml">E</mi><mi id="alg3.l5.m3.1.1.2.3" xref="alg3.l5.m3.1.1.2.3.cmml">i</mi></msub><mo id="alg3.l5.m3.1.1.1" xref="alg3.l5.m3.1.1.1.cmml">&gt;</mo><msup id="alg3.l5.m3.1.1.3" xref="alg3.l5.m3.1.1.3.cmml"><mi id="alg3.l5.m3.1.1.3.2" xref="alg3.l5.m3.1.1.3.2.cmml">E</mi><mtext id="alg3.l5.m3.1.1.3.3" xref="alg3.l5.m3.1.1.3.3a.cmml">max</mtext></msup></mrow><annotation-xml encoding="MathML-Content" id="alg3.l5.m3.1b"><apply id="alg3.l5.m3.1.1.cmml" xref="alg3.l5.m3.1.1"><gt id="alg3.l5.m3.1.1.1.cmml" xref="alg3.l5.m3.1.1.1"></gt><apply id="alg3.l5.m3.1.1.2.cmml" xref="alg3.l5.m3.1.1.2"><csymbol cd="ambiguous" id="alg3.l5.m3.1.1.2.1.cmml" xref="alg3.l5.m3.1.1.2">subscript</csymbol><ci id="alg3.l5.m3.1.1.2.2.cmml" xref="alg3.l5.m3.1.1.2.2">𝐸</ci><ci id="alg3.l5.m3.1.1.2.3.cmml" xref="alg3.l5.m3.1.1.2.3">𝑖</ci></apply><apply id="alg3.l5.m3.1.1.3.cmml" xref="alg3.l5.m3.1.1.3"><csymbol cd="ambiguous" id="alg3.l5.m3.1.1.3.1.cmml" xref="alg3.l5.m3.1.1.3">superscript</csymbol><ci id="alg3.l5.m3.1.1.3.2.cmml" xref="alg3.l5.m3.1.1.3.2">𝐸</ci><ci id="alg3.l5.m3.1.1.3.3a.cmml" xref="alg3.l5.m3.1.1.3.3"><mtext id="alg3.l5.m3.1.1.3.3.cmml" mathsize="70%" xref="alg3.l5.m3.1.1.3.3">max</mtext></ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg3.l5.m3.1c">E_{i}&gt;E^{\text{max}}</annotation><annotation encoding="application/x-llamapun" id="alg3.l5.m3.1d">italic_E start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT &gt; italic_E start_POSTSUPERSCRIPT max end_POSTSUPERSCRIPT</annotation></semantics></math> <span class="ltx_text 
ltx_font_bold" id="alg3.l5.4">then</span> </div> <div class="ltx_listingline" id="alg3.l6"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg3.l6.1.1.1" style="font-size:80%;">6:</span></span>        <math alttext="E^{\text{max}}\leftarrow E_{i}" class="ltx_Math" display="inline" id="alg3.l6.m1.1"><semantics id="alg3.l6.m1.1a"><mrow id="alg3.l6.m1.1.1" xref="alg3.l6.m1.1.1.cmml"><msup id="alg3.l6.m1.1.1.2" xref="alg3.l6.m1.1.1.2.cmml"><mi id="alg3.l6.m1.1.1.2.2" xref="alg3.l6.m1.1.1.2.2.cmml">E</mi><mtext id="alg3.l6.m1.1.1.2.3" xref="alg3.l6.m1.1.1.2.3a.cmml">max</mtext></msup><mo id="alg3.l6.m1.1.1.1" stretchy="false" xref="alg3.l6.m1.1.1.1.cmml">←</mo><msub id="alg3.l6.m1.1.1.3" xref="alg3.l6.m1.1.1.3.cmml"><mi id="alg3.l6.m1.1.1.3.2" xref="alg3.l6.m1.1.1.3.2.cmml">E</mi><mi id="alg3.l6.m1.1.1.3.3" xref="alg3.l6.m1.1.1.3.3.cmml">i</mi></msub></mrow><annotation-xml encoding="MathML-Content" id="alg3.l6.m1.1b"><apply id="alg3.l6.m1.1.1.cmml" xref="alg3.l6.m1.1.1"><ci id="alg3.l6.m1.1.1.1.cmml" xref="alg3.l6.m1.1.1.1">←</ci><apply id="alg3.l6.m1.1.1.2.cmml" xref="alg3.l6.m1.1.1.2"><csymbol cd="ambiguous" id="alg3.l6.m1.1.1.2.1.cmml" xref="alg3.l6.m1.1.1.2">superscript</csymbol><ci id="alg3.l6.m1.1.1.2.2.cmml" xref="alg3.l6.m1.1.1.2.2">𝐸</ci><ci id="alg3.l6.m1.1.1.2.3a.cmml" xref="alg3.l6.m1.1.1.2.3"><mtext id="alg3.l6.m1.1.1.2.3.cmml" mathsize="70%" xref="alg3.l6.m1.1.1.2.3">max</mtext></ci></apply><apply id="alg3.l6.m1.1.1.3.cmml" xref="alg3.l6.m1.1.1.3"><csymbol cd="ambiguous" id="alg3.l6.m1.1.1.3.1.cmml" xref="alg3.l6.m1.1.1.3">subscript</csymbol><ci id="alg3.l6.m1.1.1.3.2.cmml" xref="alg3.l6.m1.1.1.3.2">𝐸</ci><ci id="alg3.l6.m1.1.1.3.3.cmml" xref="alg3.l6.m1.1.1.3.3">𝑖</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg3.l6.m1.1c">E^{\text{max}}\leftarrow E_{i}</annotation><annotation encoding="application/x-llamapun" id="alg3.l6.m1.1d">italic_E start_POSTSUPERSCRIPT max end_POSTSUPERSCRIPT ← italic_E 
start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg3.l7"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg3.l7.1.1.1" style="font-size:80%;">7:</span></span>        <math alttext="\pi^{MO}\leftarrow\pi_{i}" class="ltx_Math" display="inline" id="alg3.l7.m1.1"><semantics id="alg3.l7.m1.1a"><mrow id="alg3.l7.m1.1.1" xref="alg3.l7.m1.1.1.cmml"><msup id="alg3.l7.m1.1.1.2" xref="alg3.l7.m1.1.1.2.cmml"><mi id="alg3.l7.m1.1.1.2.2" xref="alg3.l7.m1.1.1.2.2.cmml">π</mi><mrow id="alg3.l7.m1.1.1.2.3" xref="alg3.l7.m1.1.1.2.3.cmml"><mi id="alg3.l7.m1.1.1.2.3.2" xref="alg3.l7.m1.1.1.2.3.2.cmml">M</mi><mo id="alg3.l7.m1.1.1.2.3.1" xref="alg3.l7.m1.1.1.2.3.1.cmml">⁢</mo><mi id="alg3.l7.m1.1.1.2.3.3" xref="alg3.l7.m1.1.1.2.3.3.cmml">O</mi></mrow></msup><mo id="alg3.l7.m1.1.1.1" stretchy="false" xref="alg3.l7.m1.1.1.1.cmml">←</mo><msub id="alg3.l7.m1.1.1.3" xref="alg3.l7.m1.1.1.3.cmml"><mi id="alg3.l7.m1.1.1.3.2" xref="alg3.l7.m1.1.1.3.2.cmml">π</mi><mi id="alg3.l7.m1.1.1.3.3" xref="alg3.l7.m1.1.1.3.3.cmml">i</mi></msub></mrow><annotation-xml encoding="MathML-Content" id="alg3.l7.m1.1b"><apply id="alg3.l7.m1.1.1.cmml" xref="alg3.l7.m1.1.1"><ci id="alg3.l7.m1.1.1.1.cmml" xref="alg3.l7.m1.1.1.1">←</ci><apply id="alg3.l7.m1.1.1.2.cmml" xref="alg3.l7.m1.1.1.2"><csymbol cd="ambiguous" id="alg3.l7.m1.1.1.2.1.cmml" xref="alg3.l7.m1.1.1.2">superscript</csymbol><ci id="alg3.l7.m1.1.1.2.2.cmml" xref="alg3.l7.m1.1.1.2.2">𝜋</ci><apply id="alg3.l7.m1.1.1.2.3.cmml" xref="alg3.l7.m1.1.1.2.3"><times id="alg3.l7.m1.1.1.2.3.1.cmml" xref="alg3.l7.m1.1.1.2.3.1"></times><ci id="alg3.l7.m1.1.1.2.3.2.cmml" xref="alg3.l7.m1.1.1.2.3.2">𝑀</ci><ci id="alg3.l7.m1.1.1.2.3.3.cmml" xref="alg3.l7.m1.1.1.2.3.3">𝑂</ci></apply></apply><apply id="alg3.l7.m1.1.1.3.cmml" xref="alg3.l7.m1.1.1.3"><csymbol cd="ambiguous" id="alg3.l7.m1.1.1.3.1.cmml" xref="alg3.l7.m1.1.1.3">subscript</csymbol><ci id="alg3.l7.m1.1.1.3.2.cmml" 
xref="alg3.l7.m1.1.1.3.2">𝜋</ci><ci id="alg3.l7.m1.1.1.3.3.cmml" xref="alg3.l7.m1.1.1.3.3">𝑖</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg3.l7.m1.1c">\pi^{MO}\leftarrow\pi_{i}</annotation><annotation encoding="application/x-llamapun" id="alg3.l7.m1.1d">italic_π start_POSTSUPERSCRIPT italic_M italic_O end_POSTSUPERSCRIPT ← italic_π start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg3.l8"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg3.l8.1.1.1" style="font-size:80%;">8:</span></span>     <span class="ltx_text ltx_font_bold" id="alg3.l8.2">end</span> <span class="ltx_text ltx_font_bold" id="alg3.l8.3">if</span> </div> <div class="ltx_listingline" id="alg3.l9"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg3.l9.1.1.1" style="font-size:80%;">9:</span></span>  <span class="ltx_text ltx_font_bold" id="alg3.l9.2">end</span> <span class="ltx_text ltx_font_bold" id="alg3.l9.3">for</span> </div> <div class="ltx_listingline" id="alg3.l10"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg3.l10.1.1.1" style="font-size:80%;">10:</span></span>  <span class="ltx_text ltx_font_bold" id="alg3.l10.2">return</span>  <math alttext="\pi^{MO}" class="ltx_Math" display="inline" id="alg3.l10.m1.1"><semantics id="alg3.l10.m1.1a"><msup id="alg3.l10.m1.1.1" xref="alg3.l10.m1.1.1.cmml"><mi id="alg3.l10.m1.1.1.2" xref="alg3.l10.m1.1.1.2.cmml">π</mi><mrow id="alg3.l10.m1.1.1.3" xref="alg3.l10.m1.1.1.3.cmml"><mi id="alg3.l10.m1.1.1.3.2" xref="alg3.l10.m1.1.1.3.2.cmml">M</mi><mo id="alg3.l10.m1.1.1.3.1" xref="alg3.l10.m1.1.1.3.1.cmml">⁢</mo><mi id="alg3.l10.m1.1.1.3.3" xref="alg3.l10.m1.1.1.3.3.cmml">O</mi></mrow></msup><annotation-xml encoding="MathML-Content" id="alg3.l10.m1.1b"><apply id="alg3.l10.m1.1.1.cmml" xref="alg3.l10.m1.1.1"><csymbol cd="ambiguous" id="alg3.l10.m1.1.1.1.cmml" 
xref="alg3.l10.m1.1.1">superscript</csymbol><ci id="alg3.l10.m1.1.1.2.cmml" xref="alg3.l10.m1.1.1.2">𝜋</ci><apply id="alg3.l10.m1.1.1.3.cmml" xref="alg3.l10.m1.1.1.3"><times id="alg3.l10.m1.1.1.3.1.cmml" xref="alg3.l10.m1.1.1.3.1"></times><ci id="alg3.l10.m1.1.1.3.2.cmml" xref="alg3.l10.m1.1.1.3.2">𝑀</ci><ci id="alg3.l10.m1.1.1.3.3.cmml" xref="alg3.l10.m1.1.1.3.3">𝑂</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg3.l10.m1.1c">\pi^{MO}</annotation><annotation encoding="application/x-llamapun" id="alg3.l10.m1.1d">italic_π start_POSTSUPERSCRIPT italic_M italic_O end_POSTSUPERSCRIPT</annotation></semantics></math> </div> </div> </figure> </section> </section> <section class="ltx_section" id="S3"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">III </span><span class="ltx_text ltx_font_smallcaps" id="S3.1.1">Case Studies</span> </h2> <section class="ltx_subsection" id="S3.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S3.SS1.5.1.1">III-A</span> </span><span class="ltx_text ltx_font_italic" id="S3.SS1.6.2">Settings</span> </h3> <figure class="ltx_figure" id="S3.F2"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="369" id="S3.F2.g1" src="x2.png" width="478"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S3.F2.4.2.1" style="font-size:90%;">Figure 2</span>: </span><span class="ltx_text" id="S3.F2.2.1" style="font-size:90%;">Schematic of the RTE <math alttext="5" class="ltx_Math" display="inline" id="S3.F2.2.1.m1.1"><semantics id="S3.F2.2.1.m1.1b"><mn id="S3.F2.2.1.m1.1.1" xref="S3.F2.2.1.m1.1.1.cmml">5</mn><annotation-xml encoding="MathML-Content" id="S3.F2.2.1.m1.1c"><cn id="S3.F2.2.1.m1.1.1.cmml" type="integer" xref="S3.F2.2.1.m1.1.1">5</cn></annotation-xml><annotation encoding="application/x-tex" 
id="S3.F2.2.1.m1.1d">5</annotation><annotation encoding="application/x-llamapun" id="S3.F2.2.1.m1.1e">5</annotation></semantics></math>-bus system with busbar splitting on substation 3.</span></figcaption> </figure> <div class="ltx_para" id="S3.SS1.p1"> <p class="ltx_p" id="S3.SS1.p1.2">All case studies are performed on the RTE 5-bus system in the Grid2Op environment <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib26" title="">26</a>]</cite> to gain initial insights into a MORL approach for topology control. Figure <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.F2" title="Figure 2 ‣ III-A Settings ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">2</span></a> shows the 5-bus system. Each environment scenario spans one week at a 5-min resolution (2016 time steps), which is therefore the maximum attainable episode duration (<math alttext="E" class="ltx_Math" display="inline" id="S3.SS1.p1.1.m1.1"><semantics id="S3.SS1.p1.1.m1.1a"><mi id="S3.SS1.p1.1.m1.1.1" xref="S3.SS1.p1.1.m1.1.1.cmml">E</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.1.m1.1b"><ci id="S3.SS1.p1.1.m1.1.1.cmml" xref="S3.SS1.p1.1.m1.1.1">𝐸</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.1.m1.1c">E</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.1.m1.1d">italic_E</annotation></semantics></math>). We use 16 scenarios for training, 2 for validation, and 2 for testing. Experiments are repeated with up to 20 random seeds used to initialize the environment. The PPO neural network consists of 2 fully connected layers with 64 hidden units each. 
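The maximum episode duration follows directly from the scenario length and resolution (7 days x 24 h x 12 steps per hour), and E can be measured by counting steps until the episode ends. The sketch below assumes a Gymnasium-style interface, where `reset()` returns `(obs, info)` and `step()` returns `(obs, reward, terminated, truncated, info)`; `ToyGridEnv` is a hypothetical stand-in for a Grid2Op scenario, not its API, and counting the failing step itself is an assumed convention.

```python
def max_episode_steps(days=7, resolution_min=5):
    """Time steps in one scenario: 7 days at 5-min resolution -> 2016."""
    return days * 24 * 60 // resolution_min


def episode_duration(env, policy, max_steps):
    """Count the steps executed until termination (grid failure) or
    truncation, capped at max_steps; the step that ends the episode counts."""
    obs, _ = env.reset()
    for t in range(1, max_steps + 1):
        obs, _, terminated, truncated, _ = env.step(policy(obs))
        if terminated or truncated:
            return t
    return max_steps


class ToyGridEnv:
    """Hypothetical environment whose 'grid' fails after fail_at steps."""

    def __init__(self, fail_at):
        self.fail_at = fail_at
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t == self.fail_at
        return self.t, 0.0, terminated, False, {}


T = max_episode_steps()
print(T)  # 2016
print(episode_duration(ToyGridEnv(fail_at=300), lambda obs: 0, T))  # 300
print(episode_duration(ToyGridEnv(fail_at=10_000), lambda obs: 0, T))  # 2016
```

A policy that survives the full scenario therefore reaches E = 2016, the upper bound used when comparing policies.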
Training is performed using the Adam optimizer with a learning rate of <math alttext="5\times 10^{-4}" class="ltx_Math" display="inline" id="S3.SS1.p1.2.m2.1"><semantics id="S3.SS1.p1.2.m2.1a"><mrow id="S3.SS1.p1.2.m2.1.1" xref="S3.SS1.p1.2.m2.1.1.cmml"><mn id="S3.SS1.p1.2.m2.1.1.2" xref="S3.SS1.p1.2.m2.1.1.2.cmml">5</mn><mo id="S3.SS1.p1.2.m2.1.1.1" lspace="0.222em" rspace="0.222em" xref="S3.SS1.p1.2.m2.1.1.1.cmml">×</mo><msup id="S3.SS1.p1.2.m2.1.1.3" xref="S3.SS1.p1.2.m2.1.1.3.cmml"><mn id="S3.SS1.p1.2.m2.1.1.3.2" xref="S3.SS1.p1.2.m2.1.1.3.2.cmml">10</mn><mrow id="S3.SS1.p1.2.m2.1.1.3.3" xref="S3.SS1.p1.2.m2.1.1.3.3.cmml"><mo id="S3.SS1.p1.2.m2.1.1.3.3a" xref="S3.SS1.p1.2.m2.1.1.3.3.cmml">−</mo><mn id="S3.SS1.p1.2.m2.1.1.3.3.2" xref="S3.SS1.p1.2.m2.1.1.3.3.2.cmml">4</mn></mrow></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.2.m2.1b"><apply id="S3.SS1.p1.2.m2.1.1.cmml" xref="S3.SS1.p1.2.m2.1.1"><times id="S3.SS1.p1.2.m2.1.1.1.cmml" xref="S3.SS1.p1.2.m2.1.1.1"></times><cn id="S3.SS1.p1.2.m2.1.1.2.cmml" type="integer" xref="S3.SS1.p1.2.m2.1.1.2">5</cn><apply id="S3.SS1.p1.2.m2.1.1.3.cmml" xref="S3.SS1.p1.2.m2.1.1.3"><csymbol cd="ambiguous" id="S3.SS1.p1.2.m2.1.1.3.1.cmml" xref="S3.SS1.p1.2.m2.1.1.3">superscript</csymbol><cn id="S3.SS1.p1.2.m2.1.1.3.2.cmml" type="integer" xref="S3.SS1.p1.2.m2.1.1.3.2">10</cn><apply id="S3.SS1.p1.2.m2.1.1.3.3.cmml" xref="S3.SS1.p1.2.m2.1.1.3.3"><minus id="S3.SS1.p1.2.m2.1.1.3.3.1.cmml" xref="S3.SS1.p1.2.m2.1.1.3.3"></minus><cn id="S3.SS1.p1.2.m2.1.1.3.3.2.cmml" type="integer" xref="S3.SS1.p1.2.m2.1.1.3.3.2">4</cn></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.2.m2.1c">5\times 10^{-4}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.2.m2.1d">5 × 10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT</annotation></semantics></math> and a batch size of 512. The MOPPO algorithm uses 4 update cycles. 
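</p> </div> <div class="ltx_para"> <p class="ltx_p">The settings listed so far can be gathered into a single configuration object. The following Python sketch is illustrative only: the class and field names are hypothetical and not taken from the authors' released code, and it reads "2 fully connected layers with 64-dimensional hidden features" as two hidden layers of 64 units each. It also checks that one week at a 5-min resolution yields the 2016 steps of the maximum episode duration E.</p>
<pre class="ltx_verbatim">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    # Hypothetical container; field names are illustrative only.
    episode_days: int = 7            # one week per scenario
    resolution_min: int = 5          # 5-min time steps
    n_train: int = 16                # training scenarios
    n_val: int = 2                   # validation scenarios
    n_test: int = 2                  # test scenarios
    hidden_layers: tuple = (64, 64)  # assumed: 2 hidden layers of 64 units
    learning_rate: float = 5e-4      # Adam learning rate
    batch_size: int = 512
    update_cycles: int = 4           # MOPPO update cycles

    @property
    def max_steps(self):
        # Maximum episode duration E in time steps.
        return self.episode_days * 24 * 60 // self.resolution_min


cfg = ExperimentConfig()
print(cfg.max_steps)                          # 2016
print(cfg.n_train + cfg.n_val + cfg.n_test)   # 20
```
</pre>
</div> <div class="ltx_para"> <p class="ltx_p">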
In the case study on robustness to contingencies (<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.SS3" title="III-C Robustness to N-1 Contingencies ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">Section</span> <span class="ltx_text ltx_ref_tag"><span class="ltx_text">III-C</span></span></a>), an adversarial agent simulates N-1 contingency states by randomly disconnecting power lines. In addition, a set of common expert rules is applied to improve performance and ensure safety, as detailed in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib14" title="">14</a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib8" title="">8</a>]</cite>. All computations are performed on the DelftBlue supercomputer, equipped with Intel XEON E5-6248R 24C 3.0GHz CPU cores <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib28" title="">28</a>]</cite>. We use Grid2Op 1.10, LightSim2Grid 0.8, pandapower 2.14, gymnasium 0.29, mo-gymnasium 1.1 and morl-baselines 1.0. The code for this study is publicly available <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib29" title="">29</a>]</cite>.</p> </div> <div class="ltx_para" id="S3.SS1.p2"> <p class="ltx_p" id="S3.SS1.p2.1">In Section <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.SS2" title="III-B Pareto Front Approximation ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">III-B</span></span></a>, we compare our approach with a random sampling (RS) benchmark, which replaces the DOL component with weight vectors drawn at random from a uniform distribution. 
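</p> </div> <div class="ltx_para"> <p class="ltx_p">A minimal Python sketch of how such an RS weight schedule could be generated, assuming the weights are non-negative and sum to one (so that "uniform" means uniform on the probability simplex), that the extrema weights are the one-hot vectors, and that each weight vector w turns the vector reward r into the linear scalarization w·r, as in optimistic linear support. All function names are hypothetical; the released implementation may sample differently.</p>
<pre class="ltx_verbatim">
```python
import random


def sample_simplex_weights(n_objectives, rng):
    # Normalized i.i.d. Exp(1) draws follow a Dirichlet(1, ..., 1)
    # distribution, i.e. they are uniform on the probability simplex.
    draws = [rng.expovariate(1.0) for _ in range(n_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def extrema_weights(n_objectives):
    # One-hot weight vectors: all weight on a single objective.
    return [[1.0 if i == j else 0.0 for j in range(n_objectives)]
            for i in range(n_objectives)]


def rs_weight_schedule(n_objectives, n_random, seed=0):
    # Hypothetical RS schedule: extrema weights first, then uniform samples.
    rng = random.Random(seed)
    weights = extrema_weights(n_objectives)
    weights += [sample_simplex_weights(n_objectives, rng) for _ in range(n_random)]
    return weights


def scalarize(reward_vector, weights):
    # Linear scalarization w . r of a vector-valued reward.
    return sum(w * r for w, r in zip(weights, reward_vector))


# Three objectives, e.g. line loading, topological deviation, switching.
schedule = rs_weight_schedule(n_objectives=3, n_random=5, seed=42)
for w in schedule:
    print([round(x, 3) for x in w])
print(scalarize([0.8, -0.2, -0.1], schedule[0]))  # 0.8
```
</pre>
<p class="ltx_p">Each weight vector in the schedule would then be handed to the single-policy learner in turn; only the source of the weights distinguishes this baseline from DOL.</p> </div> <div class="ltx_para"> <p class="ltx_p">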
The randomly sampled weights are then passed to MOPPO in the same way as in the DOL-based approach. To strengthen this baseline, we also include the extrema weights <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib27" title="">27</a>]</cite>, i.e., the weight vectors that place all weight on a single objective, since exploring these weights is expected to yield significant gains in the objective space.</p> </div> </section> <section class="ltx_subsection" id="S3.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S3.SS2.5.1.1">III-B</span> </span><span class="ltx_text ltx_font_italic" id="S3.SS2.6.2">Pareto Front Approximation</span> </h3> <div class="ltx_para" id="S3.SS2.p1"> <p class="ltx_p" id="S3.SS2.p1.1"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.T1" title="In III-B Pareto Front Approximation ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">Table</span> <span class="ltx_text ltx_ref_tag">I</span></a> reports the hypervolume (HV), sparsity and inverted generational distance (IGD) of the proposed DOL approach and of RS. The hypervolume measures the volume of the objective space dominated by the solution set with respect to a reference point, reflecting both the quality and the spread of the solutions; the sparsity measures the average squared gap between neighboring solutions, so lower values indicate a denser solution set; and IGD measures how accurately the generated solution set approximates the true Pareto front <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib25" title="">25</a>]</cite>. DOL attains a higher mean hypervolume than RS (44.62 vs. 37.35), although the large standard deviations (27.32 and 31.67) make this difference inconclusive. DOL's mean sparsity, however, is 50% lower than that of RS (0.11 vs. 0.22), indicating a much denser coverage of the Pareto front, which is more useful for a decision support tool. 
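</p> </div> <div class="ltx_para"> <p class="ltx_p">For reference, the sparsity and IGD values in Table I can be computed as sketched below. This assumes the commonly used definitions (as implemented, for example, in morl-baselines): sparsity sorts the solutions along each objective, sums the squared gaps between consecutive values and divides by the number of solutions minus one, while IGD averages, over the points of a reference front (here the assumed true CCS), the Euclidean distance to the nearest generated solution. The toy fronts are invented for illustration.</p>
<pre class="ltx_verbatim">
```python
import math


def sparsity(front):
    # Sort along each objective, sum squared gaps between consecutive
    # values, and divide by (number of points - 1).
    if len(front) in (0, 1):
        return 0.0
    total = 0.0
    for dim in range(len(front[0])):
        values = sorted(point[dim] for point in front)
        total += sum((b - a) ** 2 for a, b in zip(values, values[1:]))
    return total / (len(front) - 1)


def igd(front, reference_front):
    # Mean distance from each reference point to its nearest solution.
    nearest = [min(math.dist(r, p) for p in front) for r in reference_front]
    return sum(nearest) / len(nearest)


# Toy 2-objective fronts (made-up values).
dense = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
coarse = [(0.0, 1.0), (1.0, 0.0)]
print(sparsity(dense), sparsity(coarse))  # 0.5 2.0
print(igd(dense, dense))                  # 0.0
print(round(igd(coarse, dense), 4))       # 0.2357
```
</pre>
<p class="ltx_p">The denser toy front has the lower sparsity, and a front that misses a reference point is penalized by IGD, which mirrors how the two metrics are read in Table I.</p> </div> <div class="ltx_para"> <p class="ltx_p">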
DOL also reduces the mean IGD by about 62% (0.84 vs. 2.22), indicating a better approximation of the assumed true convex coverage set (CCS).</p> </div> <figure class="ltx_table" id="S3.T1"> <div class="ltx_inline-block ltx_align_center ltx_transformed_outer" id="S3.T1.2" style="width:195.1pt;height:53.7pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="transform:translate(-33.3pt,9.2pt) scale(0.745511891888394,0.745511891888394) ;"> <table class="ltx_tabular ltx_guessed_headers ltx_align_middle" id="S3.T1.2.1"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S3.T1.2.1.1.1"> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_th_row ltx_border_r ltx_border_t" id="S3.T1.2.1.1.1.1" rowspan="2" style="padding-top:1.5pt;padding-bottom:1.5pt;"><span class="ltx_text" id="S3.T1.2.1.1.1.1.1">Approach</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S3.T1.2.1.1.1.2" style="padding-top:1.5pt;padding-bottom:1.5pt;">HV</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S3.T1.2.1.1.1.3" style="padding-top:1.5pt;padding-bottom:1.5pt;">HV</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S3.T1.2.1.1.1.4" style="padding-top:1.5pt;padding-bottom:1.5pt;">Spa</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S3.T1.2.1.1.1.5" style="padding-top:1.5pt;padding-bottom:1.5pt;">Spa</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S3.T1.2.1.1.1.6" style="padding-top:1.5pt;padding-bottom:1.5pt;">IGD</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S3.T1.2.1.1.1.7" style="padding-top:1.5pt;padding-bottom:1.5pt;">IGD</th> </tr> <tr class="ltx_tr" id="S3.T1.2.1.2.2"> <td class="ltx_td ltx_align_center" id="S3.T1.2.1.2.2.1" style="padding-top:1.5pt;padding-bottom:1.5pt;">Mean</td> <td class="ltx_td ltx_align_center" id="S3.T1.2.1.2.2.2" 
style="padding-top:1.5pt;padding-bottom:1.5pt;">Std</td> <td class="ltx_td ltx_align_center" id="S3.T1.2.1.2.2.3" style="padding-top:1.5pt;padding-bottom:1.5pt;">Mean</td> <td class="ltx_td ltx_align_center" id="S3.T1.2.1.2.2.4" style="padding-top:1.5pt;padding-bottom:1.5pt;">Std</td> <td class="ltx_td ltx_align_center" id="S3.T1.2.1.2.2.5" style="padding-top:1.5pt;padding-bottom:1.5pt;">Mean</td> <td class="ltx_td ltx_align_center" id="S3.T1.2.1.2.2.6" style="padding-top:1.5pt;padding-bottom:1.5pt;">Std</td> </tr> <tr class="ltx_tr" id="S3.T1.2.1.3.3"> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_th_row ltx_border_r ltx_border_t" id="S3.T1.2.1.3.3.1" style="padding-top:1.5pt;padding-bottom:1.5pt;">DOL</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S3.T1.2.1.3.3.2" style="padding-top:1.5pt;padding-bottom:1.5pt;">44.62</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S3.T1.2.1.3.3.3" style="padding-top:1.5pt;padding-bottom:1.5pt;">27.32</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S3.T1.2.1.3.3.4" style="padding-top:1.5pt;padding-bottom:1.5pt;">0.11</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S3.T1.2.1.3.3.5" style="padding-top:1.5pt;padding-bottom:1.5pt;">0.04</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S3.T1.2.1.3.3.6" style="padding-top:1.5pt;padding-bottom:1.5pt;">0.84</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S3.T1.2.1.3.3.7" style="padding-top:1.5pt;padding-bottom:1.5pt;">0.20</th> </tr> <tr class="ltx_tr" id="S3.T1.2.1.4.4"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_b ltx_border_r" id="S3.T1.2.1.4.4.1" style="padding-top:1.5pt;padding-bottom:1.5pt;">RS</th> <td class="ltx_td ltx_align_center ltx_border_b" id="S3.T1.2.1.4.4.2" style="padding-top:1.5pt;padding-bottom:1.5pt;">37.35</td> <td class="ltx_td ltx_align_center 
ltx_border_b" id="S3.T1.2.1.4.4.3" style="padding-top:1.5pt;padding-bottom:1.5pt;">31.67</td> <td class="ltx_td ltx_align_center ltx_border_b" id="S3.T1.2.1.4.4.4" style="padding-top:1.5pt;padding-bottom:1.5pt;">0.22</td> <td class="ltx_td ltx_align_center ltx_border_b" id="S3.T1.2.1.4.4.5" style="padding-top:1.5pt;padding-bottom:1.5pt;">0.06</td> <td class="ltx_td ltx_align_center ltx_border_b" id="S3.T1.2.1.4.4.6" style="padding-top:1.5pt;padding-bottom:1.5pt;">2.22</td> <td class="ltx_td ltx_align_center ltx_border_b" id="S3.T1.2.1.4.4.7" style="padding-top:1.5pt;padding-bottom:1.5pt;">0.12</td> </tr> </tbody> </table> </span></div> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S3.T1.3.1.1" style="font-size:90%;">TABLE I</span>: </span><span class="ltx_text" id="S3.T1.4.2" style="font-size:90%;">Hypervolume (HV), sparsity (Spa) and inverted generational distance (IGD) for the DOL and RS methods.</span></figcaption> </figure> <figure class="ltx_figure" id="S3.F3"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="556" id="S3.F3.g1" src="x3.png" width="747"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S3.F3.2.1.1" style="font-size:90%;">Figure 3</span>: </span><span class="ltx_text" id="S3.F3.3.2" style="font-size:90%;">2D projection of the super CCS: line loading reward vs. topological deviation reward.</span></figcaption> </figure> <figure class="ltx_figure" id="S3.F4"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="555" id="S3.F4.g1" src="x4.png" width="747"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S3.F4.2.1.1" style="font-size:90%;">Figure 4</span>: </span><span class="ltx_text" id="S3.F4.3.2" style="font-size:90%;">2D projection of the super CCS: line loading reward vs. switching frequency 
reward.</span></figcaption> </figure> <figure class="ltx_figure" id="S3.F5"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="555" id="S3.F5.g1" src="x5.png" width="747"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S3.F5.2.1.1" style="font-size:90%;">Figure 5</span>: </span><span class="ltx_text" id="S3.F5.3.2" style="font-size:90%;">2D projection of the super CCS: topological deviation reward vs. switching frequency reward.</span></figcaption> </figure> <div class="ltx_para" id="S3.SS2.p2"> <p class="ltx_p" id="S3.SS2.p2.1"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.F3" title="In III-B Pareto Front Approximation ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">Figs.</span> <span class="ltx_text ltx_ref_tag">3</span></a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.F4" title="Figure 4 ‣ III-B Pareto Front Approximation ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">4</span></a> and <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.F5" title="Figure 5 ‣ III-B Pareto Front Approximation ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">5</span></a> show 2D projections of the super CCS obtained from the DOL and RS runs across five seeds. The super CCS is constructed as the convex coverage set of the union of all solution sets generated by DOL and RS across all seeds, and here serves as a proxy for the assumed true CCS. DOL contributes more points to the super CCS than the RS benchmark. 
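</p> </div> <div class="ltx_para"> <p class="ltx_p">The super CCS construction can be approximated as follows: pool the solution sets of every seed and both methods, then keep each point that maximizes the linear utility w·v for at least one weight vector w on a regular grid over the weight simplex. This is an illustrative Python sketch rather than the authors' procedure; an exact CCS would test each point with a linear program instead of a finite weight grid, and the reward vectors below are invented toy values.</p>
<pre class="ltx_verbatim">
```python
def compositions(total, parts):
    # All tuples of `parts` non-negative integers that sum to `total`.
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def simplex_grid(n_objectives, steps):
    # Weight vectors whose entries are multiples of 1/steps and sum to one.
    for combo in compositions(steps, n_objectives):
        yield tuple(c / steps for c in combo)


def approximate_ccs(points, steps=20):
    # Keep every point that maximizes w . v for at least one grid weight w.
    kept = set()
    for w in simplex_grid(len(points[0]), steps):
        scores = [sum(wi * vi for wi, vi in zip(w, v)) for v in points]
        kept.add(max(range(len(points)), key=scores.__getitem__))
    return [points[i] for i in sorted(kept)]


# Toy 2-objective solution sets, one list per seed (made-up values).
dol_runs = [[(1.0, 0.0), (0.6, 0.6)], [(0.0, 1.0)]]
rs_runs = [[(0.5, 0.5), (0.2, 0.2)]]
pooled = [p for run in dol_runs + rs_runs for p in run]
super_ccs = approximate_ccs(pooled)
from_dol = sum(p in super_ccs for run in dol_runs for p in run)
from_rs = sum(p in super_ccs for run in rs_runs for p in run)
print(super_ccs)          # [(1.0, 0.0), (0.6, 0.6), (0.0, 1.0)]
print(from_dol, from_rs)  # 3 0
```
</pre>
<p class="ltx_p">Counting how many points of each method survive in the pooled set is one simple way to quantify which method contributes more to the super CCS.</p> </div> <div class="ltx_para"> <p class="ltx_p">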
If the super CCS reflects the true trade-offs in the objective space, DOL approximates these trade-offs more closely than RS.</p> </div> <div class="ltx_para" id="S3.SS2.p3"> <p class="ltx_p" id="S3.SS2.p3.2"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.F3" title="In III-B Pareto Front Approximation ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">Figs.</span> <span class="ltx_text ltx_ref_tag">3</span></a>, <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.F4" title="Figure 4 ‣ III-B Pareto Front Approximation ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">4</span></a> and <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.F5" title="Figure 5 ‣ III-B Pareto Front Approximation ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">5</span></a> illustrate key trade-offs among the objectives. 
In <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.F3" title="In III-B Pareto Front Approximation ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">Fig.</span> <span class="ltx_text ltx_ref_tag">3</span></a>, a clear conflict is observed between <math alttext="R^{L}" class="ltx_Math" display="inline" id="S3.SS2.p3.1.m1.1"><semantics id="S3.SS2.p3.1.m1.1a"><msup id="S3.SS2.p3.1.m1.1.1" xref="S3.SS2.p3.1.m1.1.1.cmml"><mi id="S3.SS2.p3.1.m1.1.1.2" xref="S3.SS2.p3.1.m1.1.1.2.cmml">R</mi><mi id="S3.SS2.p3.1.m1.1.1.3" xref="S3.SS2.p3.1.m1.1.1.3.cmml">L</mi></msup><annotation-xml encoding="MathML-Content" id="S3.SS2.p3.1.m1.1b"><apply id="S3.SS2.p3.1.m1.1.1.cmml" xref="S3.SS2.p3.1.m1.1.1"><csymbol cd="ambiguous" id="S3.SS2.p3.1.m1.1.1.1.cmml" xref="S3.SS2.p3.1.m1.1.1">superscript</csymbol><ci id="S3.SS2.p3.1.m1.1.1.2.cmml" xref="S3.SS2.p3.1.m1.1.1.2">𝑅</ci><ci id="S3.SS2.p3.1.m1.1.1.3.cmml" xref="S3.SS2.p3.1.m1.1.1.3">𝐿</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p3.1.m1.1c">R^{L}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p3.1.m1.1d">italic_R start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT</annotation></semantics></math> and <math alttext="R^{D}" class="ltx_Math" display="inline" id="S3.SS2.p3.2.m2.1"><semantics id="S3.SS2.p3.2.m2.1a"><msup id="S3.SS2.p3.2.m2.1.1" xref="S3.SS2.p3.2.m2.1.1.cmml"><mi id="S3.SS2.p3.2.m2.1.1.2" xref="S3.SS2.p3.2.m2.1.1.2.cmml">R</mi><mi id="S3.SS2.p3.2.m2.1.1.3" xref="S3.SS2.p3.2.m2.1.1.3.cmml">D</mi></msup><annotation-xml encoding="MathML-Content" id="S3.SS2.p3.2.m2.1b"><apply id="S3.SS2.p3.2.m2.1.1.cmml" xref="S3.SS2.p3.2.m2.1.1"><csymbol cd="ambiguous" id="S3.SS2.p3.2.m2.1.1.1.cmml" xref="S3.SS2.p3.2.m2.1.1">superscript</csymbol><ci id="S3.SS2.p3.2.m2.1.1.2.cmml" xref="S3.SS2.p3.2.m2.1.1.2">𝑅</ci><ci id="S3.SS2.p3.2.m2.1.1.3.cmml" 
xref="S3.SS2.p3.2.m2.1.1.3">𝐷</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p3.2.m2.1c">R^{D}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p3.2.m2.1d">italic_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT</annotation></semantics></math>, where reducing topological deviation often comes at the cost of higher line loading. This indicates that topology changes are sometimes necessary to maintain grid security, trading off topological deviation for improved line loading.</p> </div> <div class="ltx_para" id="S3.SS2.p4"> <p class="ltx_p" id="S3.SS2.p4.3">In <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.F4" title="In III-B Pareto Front Approximation ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">Fig.</span> <span class="ltx_text ltx_ref_tag">4</span></a>, low <math alttext="R^{L}" class="ltx_Math" display="inline" id="S3.SS2.p4.1.m1.1"><semantics id="S3.SS2.p4.1.m1.1a"><msup id="S3.SS2.p4.1.m1.1.1" xref="S3.SS2.p4.1.m1.1.1.cmml"><mi id="S3.SS2.p4.1.m1.1.1.2" xref="S3.SS2.p4.1.m1.1.1.2.cmml">R</mi><mi id="S3.SS2.p4.1.m1.1.1.3" xref="S3.SS2.p4.1.m1.1.1.3.cmml">L</mi></msup><annotation-xml encoding="MathML-Content" id="S3.SS2.p4.1.m1.1b"><apply id="S3.SS2.p4.1.m1.1.1.cmml" xref="S3.SS2.p4.1.m1.1.1"><csymbol cd="ambiguous" id="S3.SS2.p4.1.m1.1.1.1.cmml" xref="S3.SS2.p4.1.m1.1.1">superscript</csymbol><ci id="S3.SS2.p4.1.m1.1.1.2.cmml" xref="S3.SS2.p4.1.m1.1.1.2">𝑅</ci><ci id="S3.SS2.p4.1.m1.1.1.3.cmml" xref="S3.SS2.p4.1.m1.1.1.3">𝐿</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p4.1.m1.1c">R^{L}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p4.1.m1.1d">italic_R start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT</annotation></semantics></math> corresponds to low <math alttext="R^{S}" class="ltx_Math" display="inline" id="S3.SS2.p4.2.m2.1"><semantics 
id="S3.SS2.p4.2.m2.1a"><msup id="S3.SS2.p4.2.m2.1.1" xref="S3.SS2.p4.2.m2.1.1.cmml"><mi id="S3.SS2.p4.2.m2.1.1.2" xref="S3.SS2.p4.2.m2.1.1.2.cmml">R</mi><mi id="S3.SS2.p4.2.m2.1.1.3" xref="S3.SS2.p4.2.m2.1.1.3.cmml">S</mi></msup><annotation-xml encoding="MathML-Content" id="S3.SS2.p4.2.m2.1b"><apply id="S3.SS2.p4.2.m2.1.1.cmml" xref="S3.SS2.p4.2.m2.1.1"><csymbol cd="ambiguous" id="S3.SS2.p4.2.m2.1.1.1.cmml" xref="S3.SS2.p4.2.m2.1.1">superscript</csymbol><ci id="S3.SS2.p4.2.m2.1.1.2.cmml" xref="S3.SS2.p4.2.m2.1.1.2">𝑅</ci><ci id="S3.SS2.p4.2.m2.1.1.3.cmml" xref="S3.SS2.p4.2.m2.1.1.3">𝑆</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p4.2.m2.1c">R^{S}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p4.2.m2.1d">italic_R start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT</annotation></semantics></math>, indicating that minimal switching interaction allows the grid to operate securely. However, some switching is necessary to further increase <math alttext="R^{L}" class="ltx_Math" display="inline" id="S3.SS2.p4.3.m3.1"><semantics id="S3.SS2.p4.3.m3.1a"><msup id="S3.SS2.p4.3.m3.1.1" xref="S3.SS2.p4.3.m3.1.1.cmml"><mi id="S3.SS2.p4.3.m3.1.1.2" xref="S3.SS2.p4.3.m3.1.1.2.cmml">R</mi><mi id="S3.SS2.p4.3.m3.1.1.3" xref="S3.SS2.p4.3.m3.1.1.3.cmml">L</mi></msup><annotation-xml encoding="MathML-Content" id="S3.SS2.p4.3.m3.1b"><apply id="S3.SS2.p4.3.m3.1.1.cmml" xref="S3.SS2.p4.3.m3.1.1"><csymbol cd="ambiguous" id="S3.SS2.p4.3.m3.1.1.1.cmml" xref="S3.SS2.p4.3.m3.1.1">superscript</csymbol><ci id="S3.SS2.p4.3.m3.1.1.2.cmml" xref="S3.SS2.p4.3.m3.1.1.2">𝑅</ci><ci id="S3.SS2.p4.3.m3.1.1.3.cmml" xref="S3.SS2.p4.3.m3.1.1.3">𝐿</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p4.3.m3.1c">R^{L}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p4.3.m3.1d">italic_R start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT</annotation></semantics></math>. 
The trade-off solutions in the top middle of <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.F4" title="In III-B Pareto Front Approximation ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">Fig.</span> <span class="ltx_text ltx_ref_tag">4</span></a> may appeal to power system operators who seek an RL policy balancing low switching frequency with low line loading.</p> </div> <div class="ltx_para" id="S3.SS2.p5"> <p class="ltx_p" id="S3.SS2.p5.2">In <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.F5" title="In III-B Pareto Front Approximation ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">Fig.</span> <span class="ltx_text ltx_ref_tag">5</span></a>, <math alttext="R^{D}" class="ltx_Math" display="inline" id="S3.SS2.p5.1.m1.1"><semantics id="S3.SS2.p5.1.m1.1a"><msup id="S3.SS2.p5.1.m1.1.1" xref="S3.SS2.p5.1.m1.1.1.cmml"><mi id="S3.SS2.p5.1.m1.1.1.2" xref="S3.SS2.p5.1.m1.1.1.2.cmml">R</mi><mi id="S3.SS2.p5.1.m1.1.1.3" xref="S3.SS2.p5.1.m1.1.1.3.cmml">D</mi></msup><annotation-xml encoding="MathML-Content" id="S3.SS2.p5.1.m1.1b"><apply id="S3.SS2.p5.1.m1.1.1.cmml" xref="S3.SS2.p5.1.m1.1.1"><csymbol cd="ambiguous" id="S3.SS2.p5.1.m1.1.1.1.cmml" xref="S3.SS2.p5.1.m1.1.1">superscript</csymbol><ci id="S3.SS2.p5.1.m1.1.1.2.cmml" xref="S3.SS2.p5.1.m1.1.1.2">𝑅</ci><ci id="S3.SS2.p5.1.m1.1.1.3.cmml" xref="S3.SS2.p5.1.m1.1.1.3">𝐷</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p5.1.m1.1c">R^{D}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p5.1.m1.1d">italic_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT</annotation></semantics></math> and <math alttext="R^{S}" class="ltx_Math" display="inline" id="S3.SS2.p5.2.m2.1"><semantics id="S3.SS2.p5.2.m2.1a"><msup id="S3.SS2.p5.2.m2.1.1" xref="S3.SS2.p5.2.m2.1.1.cmml"><mi 
id="S3.SS2.p5.2.m2.1.1.2" xref="S3.SS2.p5.2.m2.1.1.2.cmml">R</mi><mi id="S3.SS2.p5.2.m2.1.1.3" xref="S3.SS2.p5.2.m2.1.1.3.cmml">S</mi></msup><annotation-xml encoding="MathML-Content" id="S3.SS2.p5.2.m2.1b"><apply id="S3.SS2.p5.2.m2.1.1.cmml" xref="S3.SS2.p5.2.m2.1.1"><csymbol cd="ambiguous" id="S3.SS2.p5.2.m2.1.1.1.cmml" xref="S3.SS2.p5.2.m2.1.1">superscript</csymbol><ci id="S3.SS2.p5.2.m2.1.1.2.cmml" xref="S3.SS2.p5.2.m2.1.1.2">𝑅</ci><ci id="S3.SS2.p5.2.m2.1.1.3.cmml" xref="S3.SS2.p5.2.m2.1.1.3">𝑆</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p5.2.m2.1c">R^{S}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p5.2.m2.1d">italic_R start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT</annotation></semantics></math> exhibit a trade-off, as lower switching frequency can lead to higher topological deviation. This occurs because deviations in topology persist longer without switching actions, leaving the grid in a deviated state for extended periods.</p> </div> </section> <section class="ltx_subsection" id="S3.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S3.SS3.5.1.1">III-C</span> </span><span class="ltx_text ltx_font_italic" id="S3.SS3.6.2">Robustness to N-1 Contingencies</span> </h3> <div class="ltx_para" id="S3.SS3.p1"> <p class="ltx_p" id="S3.SS3.p1.5">This case study compares the robustness of MO policies, which are trained on multiple rewards, with that of SO policies under N-1 contingency states. The contingency states are generated by an adversarial attacker, which disconnects power lines at random. We use the same settings for the adversarial attacks as in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#bib.bib19" title="">19</a>]</cite>. 
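</p> </div> <div class="ltx_para"> <p class="ltx_p">To illustrate the contingency scenarios of this case study, the Python sketch below draws a random attack schedule in which, on each day, up to a given number of line disconnections (two or four in the scenarios listed below) hit uniformly chosen lines at random 5-min steps. It is a simplified, hypothetical stand-in for the adversarial attacker: the actual settings follow [19] and Grid2Op's opponent mechanism, which also models attack duration and an attack budget, both ignored here. The line count of 8 is an illustrative assumption.</p>
<pre class="ltx_verbatim">
```python
import random

STEPS_PER_DAY = 24 * 60 // 5  # 288 five-minute steps per day


def attack_schedule(n_lines, n_days, max_attacks_per_day, seed=0):
    # Hypothetical simplified attacker: each day, draw between 0 and
    # max_attacks_per_day attacks at distinct random time steps, each
    # disconnecting a uniformly chosen power line.
    rng = random.Random(seed)
    schedule = []
    for day in range(n_days):
        n_attacks = rng.randint(0, max_attacks_per_day)
        for step in sorted(rng.sample(range(STEPS_PER_DAY), n_attacks)):
            schedule.append((day * STEPS_PER_DAY + step, rng.randrange(n_lines)))
    return schedule


# One-week episodes on an illustrative 8-line grid.
moderate = attack_schedule(n_lines=8, n_days=7, max_attacks_per_day=2, seed=1)
high = attack_schedule(n_lines=8, n_days=7, max_attacks_per_day=4, seed=1)
for t, line in moderate:
    print(f"step {t}: disconnect line {line}")
```
</pre>
</div> <div class="ltx_para"> <p class="ltx_p">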
The multi-objective policies (MO policies) are trained on <math alttext="R^{L}" class="ltx_Math" display="inline" id="S3.SS3.p1.1.m1.1"><semantics id="S3.SS3.p1.1.m1.1a"><msup id="S3.SS3.p1.1.m1.1.1" xref="S3.SS3.p1.1.m1.1.1.cmml"><mi id="S3.SS3.p1.1.m1.1.1.2" xref="S3.SS3.p1.1.m1.1.1.2.cmml">R</mi><mi id="S3.SS3.p1.1.m1.1.1.3" xref="S3.SS3.p1.1.m1.1.1.3.cmml">L</mi></msup><annotation-xml encoding="MathML-Content" id="S3.SS3.p1.1.m1.1b"><apply id="S3.SS3.p1.1.m1.1.1.cmml" xref="S3.SS3.p1.1.m1.1.1"><csymbol cd="ambiguous" id="S3.SS3.p1.1.m1.1.1.1.cmml" xref="S3.SS3.p1.1.m1.1.1">superscript</csymbol><ci id="S3.SS3.p1.1.m1.1.1.2.cmml" xref="S3.SS3.p1.1.m1.1.1.2">𝑅</ci><ci id="S3.SS3.p1.1.m1.1.1.3.cmml" xref="S3.SS3.p1.1.m1.1.1.3">𝐿</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p1.1.m1.1c">R^{L}</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p1.1.m1.1d">italic_R start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT</annotation></semantics></math>, <math alttext="R^{D}" class="ltx_Math" display="inline" id="S3.SS3.p1.2.m2.1"><semantics id="S3.SS3.p1.2.m2.1a"><msup id="S3.SS3.p1.2.m2.1.1" xref="S3.SS3.p1.2.m2.1.1.cmml"><mi id="S3.SS3.p1.2.m2.1.1.2" xref="S3.SS3.p1.2.m2.1.1.2.cmml">R</mi><mi id="S3.SS3.p1.2.m2.1.1.3" xref="S3.SS3.p1.2.m2.1.1.3.cmml">D</mi></msup><annotation-xml encoding="MathML-Content" id="S3.SS3.p1.2.m2.1b"><apply id="S3.SS3.p1.2.m2.1.1.cmml" xref="S3.SS3.p1.2.m2.1.1"><csymbol cd="ambiguous" id="S3.SS3.p1.2.m2.1.1.1.cmml" xref="S3.SS3.p1.2.m2.1.1">superscript</csymbol><ci id="S3.SS3.p1.2.m2.1.1.2.cmml" xref="S3.SS3.p1.2.m2.1.1.2">𝑅</ci><ci id="S3.SS3.p1.2.m2.1.1.3.cmml" xref="S3.SS3.p1.2.m2.1.1.3">𝐷</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p1.2.m2.1c">R^{D}</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p1.2.m2.1d">italic_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT</annotation></semantics></math>, and <math alttext="R^{F}" 
class="ltx_Math" display="inline" id="S3.SS3.p1.3.m3.1"><semantics id="S3.SS3.p1.3.m3.1a"><msup id="S3.SS3.p1.3.m3.1.1" xref="S3.SS3.p1.3.m3.1.1.cmml"><mi id="S3.SS3.p1.3.m3.1.1.2" xref="S3.SS3.p1.3.m3.1.1.2.cmml">R</mi><mi id="S3.SS3.p1.3.m3.1.1.3" xref="S3.SS3.p1.3.m3.1.1.3.cmml">F</mi></msup><annotation-xml encoding="MathML-Content" id="S3.SS3.p1.3.m3.1b"><apply id="S3.SS3.p1.3.m3.1.1.cmml" xref="S3.SS3.p1.3.m3.1.1"><csymbol cd="ambiguous" id="S3.SS3.p1.3.m3.1.1.1.cmml" xref="S3.SS3.p1.3.m3.1.1">superscript</csymbol><ci id="S3.SS3.p1.3.m3.1.1.2.cmml" xref="S3.SS3.p1.3.m3.1.1.2">𝑅</ci><ci id="S3.SS3.p1.3.m3.1.1.3.cmml" xref="S3.SS3.p1.3.m3.1.1.3">𝐹</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p1.3.m3.1c">R^{F}</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p1.3.m3.1d">italic_R start_POSTSUPERSCRIPT italic_F end_POSTSUPERSCRIPT</annotation></semantics></math> rewards, while the SO policy is trained on the common <math alttext="R^{L}" class="ltx_Math" display="inline" id="S3.SS3.p1.4.m4.1"><semantics id="S3.SS3.p1.4.m4.1a"><msup id="S3.SS3.p1.4.m4.1.1" xref="S3.SS3.p1.4.m4.1.1.cmml"><mi id="S3.SS3.p1.4.m4.1.1.2" xref="S3.SS3.p1.4.m4.1.1.2.cmml">R</mi><mi id="S3.SS3.p1.4.m4.1.1.3" xref="S3.SS3.p1.4.m4.1.1.3.cmml">L</mi></msup><annotation-xml encoding="MathML-Content" id="S3.SS3.p1.4.m4.1b"><apply id="S3.SS3.p1.4.m4.1.1.cmml" xref="S3.SS3.p1.4.m4.1.1"><csymbol cd="ambiguous" id="S3.SS3.p1.4.m4.1.1.1.cmml" xref="S3.SS3.p1.4.m4.1.1">superscript</csymbol><ci id="S3.SS3.p1.4.m4.1.1.2.cmml" xref="S3.SS3.p1.4.m4.1.1.2">𝑅</ci><ci id="S3.SS3.p1.4.m4.1.1.3.cmml" xref="S3.SS3.p1.4.m4.1.1.3">𝐿</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p1.4.m4.1c">R^{L}</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p1.4.m4.1d">italic_R start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT</annotation></semantics></math> reward. 
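</p> </div> <div class="ltx_para"> <p class="ltx_p">The episode duration E serves as the selection criterion and is reported as a percentage in Table II. Assuming it is the number of steps an agent operates the grid before a game over, divided by the maximum of 2016 steps and averaged over test episodes, it could be computed as in this minimal Python sketch (the helper name is hypothetical).</p>
<pre class="ltx_verbatim">
```python
MAX_STEPS = 2016  # one week at 5-min resolution


def episode_duration_pct(survived_steps):
    # Hypothetical helper: mean fraction (in %) of the maximum episode
    # length survived over a list of test episodes.
    clipped = [min(s, MAX_STEPS) for s in survived_steps]
    return 100.0 * sum(clipped) / (MAX_STEPS * len(clipped))


print(episode_duration_pct([2016, 1008]))  # 75.0
```
</pre>
</div> <div class="ltx_para"> <p class="ltx_p">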
The MO policies are selected with Algorithm <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#alg3" title="Algorithm 3 ‣ II-D Policy Selection ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">3</span></a>, using the episode duration metric <math alttext="E" class="ltx_Math" display="inline" id="S3.SS3.p1.5.m5.1"><semantics id="S3.SS3.p1.5.m5.1a"><mi id="S3.SS3.p1.5.m5.1.1" xref="S3.SS3.p1.5.m5.1.1.cmml">E</mi><annotation-xml encoding="MathML-Content" id="S3.SS3.p1.5.m5.1b"><ci id="S3.SS3.p1.5.m5.1.1.cmml" xref="S3.SS3.p1.5.m5.1.1">𝐸</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p1.5.m5.1c">E</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p1.5.m5.1d">italic_E</annotation></semantics></math> as the selection criterion. The following scenarios are considered:</p> <ul class="ltx_itemize" id="S3.I1"> <li class="ltx_item" id="S3.I1.i1" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I1.i1.p1"> <p class="ltx_p" id="S3.I1.i1.p1.1">No Contingencies: the environment contains no unplanned contingencies (baseline).</p> </div> </li> <li class="ltx_item" id="S3.I1.i2" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I1.i2.p1"> <p class="ltx_p" id="S3.I1.i2.p1.1">Moderately Frequent Contingencies: random line disconnections, at most twice a day.</p> </div> </li> <li class="ltx_item" id="S3.I1.i3" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I1.i3.p1"> <p class="ltx_p" id="S3.I1.i3.p1.1">Highly Frequent Contingencies: random line disconnections, at most four times a day.</p> </div> </li> </ul> </div> <figure class="ltx_table" id="S3.T2"> <table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle" id="S3.T2.3"> <tbody class="ltx_tbody"> <tr class="ltx_tr" 
id="S3.T2.3.4.1"> <td class="ltx_td ltx_border_t" id="S3.T2.3.4.1.1" style="padding-top:1.5pt;padding-bottom:1.5pt;"></td> <th class="ltx_td ltx_th ltx_th_column ltx_border_t" id="S3.T2.3.4.1.2" style="padding-top:1.5pt;padding-bottom:1.5pt;"></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S3.T2.3.4.1.3" style="padding-top:1.5pt;padding-bottom:1.5pt;">N-1 Contingency Frequency</th> <td class="ltx_td ltx_border_t" id="S3.T2.3.4.1.4" style="padding-top:1.5pt;padding-bottom:1.5pt;"></td> </tr> <tr class="ltx_tr" id="S3.T2.3.5.2"> <td class="ltx_td" id="S3.T2.3.5.2.1" style="padding-top:1.5pt;padding-bottom:1.5pt;"></td> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S3.T2.3.5.2.2" style="padding-top:1.5pt;padding-bottom:1.5pt;">No</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S3.T2.3.5.2.3" style="padding-top:1.5pt;padding-bottom:1.5pt;">Moderate</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S3.T2.3.5.2.4" style="padding-top:1.5pt;padding-bottom:1.5pt;">High</th> </tr> <tr class="ltx_tr" id="S3.T2.1.1"> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.1.1.1" style="padding-top:1.5pt;padding-bottom:1.5pt;"> <math alttext="E^{MO}" class="ltx_Math" display="inline" id="S3.T2.1.1.1.m1.1"><semantics id="S3.T2.1.1.1.m1.1a"><msup id="S3.T2.1.1.1.m1.1.1" xref="S3.T2.1.1.1.m1.1.1.cmml"><mi id="S3.T2.1.1.1.m1.1.1.2" xref="S3.T2.1.1.1.m1.1.1.2.cmml">E</mi><mrow id="S3.T2.1.1.1.m1.1.1.3" xref="S3.T2.1.1.1.m1.1.1.3.cmml"><mi id="S3.T2.1.1.1.m1.1.1.3.2" xref="S3.T2.1.1.1.m1.1.1.3.2.cmml">M</mi><mo id="S3.T2.1.1.1.m1.1.1.3.1" xref="S3.T2.1.1.1.m1.1.1.3.1.cmml">⁢</mo><mi id="S3.T2.1.1.1.m1.1.1.3.3" xref="S3.T2.1.1.1.m1.1.1.3.3.cmml">O</mi></mrow></msup><annotation-xml encoding="MathML-Content" id="S3.T2.1.1.1.m1.1b"><apply id="S3.T2.1.1.1.m1.1.1.cmml" xref="S3.T2.1.1.1.m1.1.1"><csymbol cd="ambiguous" id="S3.T2.1.1.1.m1.1.1.1.cmml" xref="S3.T2.1.1.1.m1.1.1">superscript</csymbol><ci 
id="S3.T2.1.1.1.m1.1.1.2.cmml" xref="S3.T2.1.1.1.m1.1.1.2">𝐸</ci><apply id="S3.T2.1.1.1.m1.1.1.3.cmml" xref="S3.T2.1.1.1.m1.1.1.3"><times id="S3.T2.1.1.1.m1.1.1.3.1.cmml" xref="S3.T2.1.1.1.m1.1.1.3.1"></times><ci id="S3.T2.1.1.1.m1.1.1.3.2.cmml" xref="S3.T2.1.1.1.m1.1.1.3.2">𝑀</ci><ci id="S3.T2.1.1.1.m1.1.1.3.3.cmml" xref="S3.T2.1.1.1.m1.1.1.3.3">𝑂</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.T2.1.1.1.m1.1c">E^{MO}</annotation><annotation encoding="application/x-llamapun" id="S3.T2.1.1.1.m1.1d">italic_E start_POSTSUPERSCRIPT italic_M italic_O end_POSTSUPERSCRIPT</annotation></semantics></math>(%)</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.1.1.2" style="padding-top:1.5pt;padding-bottom:1.5pt;">94.83</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.1.1.3" style="padding-top:1.5pt;padding-bottom:1.5pt;">97.68</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.1.1.4" style="padding-top:1.5pt;padding-bottom:1.5pt;">90.33</td> </tr> <tr class="ltx_tr" id="S3.T2.2.2"> <td class="ltx_td ltx_align_center" id="S3.T2.2.2.1" style="padding-top:1.5pt;padding-bottom:1.5pt;"> <math alttext="E^{SO}" class="ltx_Math" display="inline" id="S3.T2.2.2.1.m1.1"><semantics id="S3.T2.2.2.1.m1.1a"><msup id="S3.T2.2.2.1.m1.1.1" xref="S3.T2.2.2.1.m1.1.1.cmml"><mi id="S3.T2.2.2.1.m1.1.1.2" xref="S3.T2.2.2.1.m1.1.1.2.cmml">E</mi><mrow id="S3.T2.2.2.1.m1.1.1.3" xref="S3.T2.2.2.1.m1.1.1.3.cmml"><mi id="S3.T2.2.2.1.m1.1.1.3.2" xref="S3.T2.2.2.1.m1.1.1.3.2.cmml">S</mi><mo id="S3.T2.2.2.1.m1.1.1.3.1" xref="S3.T2.2.2.1.m1.1.1.3.1.cmml">⁢</mo><mi id="S3.T2.2.2.1.m1.1.1.3.3" xref="S3.T2.2.2.1.m1.1.1.3.3.cmml">O</mi></mrow></msup><annotation-xml encoding="MathML-Content" id="S3.T2.2.2.1.m1.1b"><apply id="S3.T2.2.2.1.m1.1.1.cmml" xref="S3.T2.2.2.1.m1.1.1"><csymbol cd="ambiguous" id="S3.T2.2.2.1.m1.1.1.1.cmml" xref="S3.T2.2.2.1.m1.1.1">superscript</csymbol><ci id="S3.T2.2.2.1.m1.1.1.2.cmml" 
xref="S3.T2.2.2.1.m1.1.1.2">𝐸</ci><apply id="S3.T2.2.2.1.m1.1.1.3.cmml" xref="S3.T2.2.2.1.m1.1.1.3"><times id="S3.T2.2.2.1.m1.1.1.3.1.cmml" xref="S3.T2.2.2.1.m1.1.1.3.1"></times><ci id="S3.T2.2.2.1.m1.1.1.3.2.cmml" xref="S3.T2.2.2.1.m1.1.1.3.2">𝑆</ci><ci id="S3.T2.2.2.1.m1.1.1.3.3.cmml" xref="S3.T2.2.2.1.m1.1.1.3.3">𝑂</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.T2.2.2.1.m1.1c">E^{SO}</annotation><annotation encoding="application/x-llamapun" id="S3.T2.2.2.1.m1.1d">italic_E start_POSTSUPERSCRIPT italic_S italic_O end_POSTSUPERSCRIPT</annotation></semantics></math>(%)</td> <td class="ltx_td ltx_align_center" id="S3.T2.2.2.2" style="padding-top:1.5pt;padding-bottom:1.5pt;">82.66</td> <td class="ltx_td ltx_align_center" id="S3.T2.2.2.3" style="padding-top:1.5pt;padding-bottom:1.5pt;">58.61</td> <td class="ltx_td ltx_align_center" id="S3.T2.2.2.4" style="padding-top:1.5pt;padding-bottom:1.5pt;">66.47</td> </tr> <tr class="ltx_tr" id="S3.T2.3.3"> <td class="ltx_td ltx_align_center ltx_border_b" id="S3.T2.3.3.1" style="padding-top:1.5pt;padding-bottom:1.5pt;"> <math alttext="\Delta E" class="ltx_Math" display="inline" id="S3.T2.3.3.1.m1.1"><semantics id="S3.T2.3.3.1.m1.1a"><mrow id="S3.T2.3.3.1.m1.1.1" xref="S3.T2.3.3.1.m1.1.1.cmml"><mi id="S3.T2.3.3.1.m1.1.1.2" mathvariant="normal" xref="S3.T2.3.3.1.m1.1.1.2.cmml">Δ</mi><mo id="S3.T2.3.3.1.m1.1.1.1" xref="S3.T2.3.3.1.m1.1.1.1.cmml">⁢</mo><mi id="S3.T2.3.3.1.m1.1.1.3" xref="S3.T2.3.3.1.m1.1.1.3.cmml">E</mi></mrow><annotation-xml encoding="MathML-Content" id="S3.T2.3.3.1.m1.1b"><apply id="S3.T2.3.3.1.m1.1.1.cmml" xref="S3.T2.3.3.1.m1.1.1"><times id="S3.T2.3.3.1.m1.1.1.1.cmml" xref="S3.T2.3.3.1.m1.1.1.1"></times><ci id="S3.T2.3.3.1.m1.1.1.2.cmml" xref="S3.T2.3.3.1.m1.1.1.2">Δ</ci><ci id="S3.T2.3.3.1.m1.1.1.3.cmml" xref="S3.T2.3.3.1.m1.1.1.3">𝐸</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.T2.3.3.1.m1.1c">\Delta E</annotation><annotation 
encoding="application/x-llamapun" id="S3.T2.3.3.1.m1.1d">roman_Δ italic_E</annotation></semantics></math> (%)</td> <td class="ltx_td ltx_align_center ltx_border_b" id="S3.T2.3.3.2" style="padding-top:1.5pt;padding-bottom:1.5pt;">10.17</td> <td class="ltx_td ltx_align_center ltx_border_b" id="S3.T2.3.3.3" style="padding-top:1.5pt;padding-bottom:1.5pt;">39.07</td> <td class="ltx_td ltx_align_center ltx_border_b" id="S3.T2.3.3.4" style="padding-top:1.5pt;padding-bottom:1.5pt;">23.86</td> </tr> </tbody> </table> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S3.T2.7.2.1" style="font-size:90%;">TABLE II</span>: </span><span class="ltx_text" id="S3.T2.5.1" style="font-size:90%;">Comparison of episode duration (<math alttext="E" class="ltx_Math" display="inline" id="S3.T2.5.1.m1.1"><semantics id="S3.T2.5.1.m1.1b"><mi id="S3.T2.5.1.m1.1.1" xref="S3.T2.5.1.m1.1.1.cmml">E</mi><annotation-xml encoding="MathML-Content" id="S3.T2.5.1.m1.1c"><ci id="S3.T2.5.1.m1.1.1.cmml" xref="S3.T2.5.1.m1.1.1">𝐸</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.T2.5.1.m1.1d">E</annotation><annotation encoding="application/x-llamapun" id="S3.T2.5.1.m1.1e">italic_E</annotation></semantics></math>) for multi-objective and single-objective policies under N-1 contingencies.</span></figcaption> </figure> <div class="ltx_para" id="S3.SS3.p2"> <p class="ltx_p" id="S3.SS3.p2.2">Table <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.T2" title="Table II ‣ III-C Robustness to N-1 Contingencies ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">II</span></a> compares the mean episode duration (<math alttext="E" class="ltx_Math" display="inline" id="S3.SS3.p2.1.m1.1"><semantics id="S3.SS3.p2.1.m1.1a"><mi id="S3.SS3.p2.1.m1.1.1" xref="S3.SS3.p2.1.m1.1.1.cmml">E</mi><annotation-xml encoding="MathML-Content" id="S3.SS3.p2.1.m1.1b"><ci 
id="S3.SS3.p2.1.m1.1.1.cmml" xref="S3.SS3.p2.1.m1.1.1">𝐸</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p2.1.m1.1c">E</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p2.1.m1.1d">italic_E</annotation></semantics></math>) and the improvement in episode duration (<math alttext="\Delta E" class="ltx_Math" display="inline" id="S3.SS3.p2.2.m2.1"><semantics id="S3.SS3.p2.2.m2.1a"><mrow id="S3.SS3.p2.2.m2.1.1" xref="S3.SS3.p2.2.m2.1.1.cmml"><mi id="S3.SS3.p2.2.m2.1.1.2" mathvariant="normal" xref="S3.SS3.p2.2.m2.1.1.2.cmml">Δ</mi><mo id="S3.SS3.p2.2.m2.1.1.1" xref="S3.SS3.p2.2.m2.1.1.1.cmml">⁢</mo><mi id="S3.SS3.p2.2.m2.1.1.3" xref="S3.SS3.p2.2.m2.1.1.3.cmml">E</mi></mrow><annotation-xml encoding="MathML-Content" id="S3.SS3.p2.2.m2.1b"><apply id="S3.SS3.p2.2.m2.1.1.cmml" xref="S3.SS3.p2.2.m2.1.1"><times id="S3.SS3.p2.2.m2.1.1.1.cmml" xref="S3.SS3.p2.2.m2.1.1.1"></times><ci id="S3.SS3.p2.2.m2.1.1.2.cmml" xref="S3.SS3.p2.2.m2.1.1.2">Δ</ci><ci id="S3.SS3.p2.2.m2.1.1.3.cmml" xref="S3.SS3.p2.2.m2.1.1.3">𝐸</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS3.p2.2.m2.1c">\Delta E</annotation><annotation encoding="application/x-llamapun" id="S3.SS3.p2.2.m2.1d">roman_Δ italic_E</annotation></semantics></math>) of the MO policies over the SO policies, with durations normalized by the total number of possible steps. In all three scenarios, the MO RL policies achieve a higher average episode duration. For instance, under highly frequent contingencies, the MO policies increase the normalized episode duration by 23.86 percentage points compared to the SO policies. Under moderately frequent contingencies, the MO policies outperform the SO policies by 39.07 percentage points. 
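</p> </div> <div class="ltx_para"> <p class="ltx_p">The following minimal sketch shows how the quantities in Table II could be computed. It assumes that <em>E</em> is the mean number of steps survived per evaluation episode divided by the maximum possible number of steps, in percent, and that Δ<em>E</em> is the difference <em>E</em><sup>MO</sup> − <em>E</em><sup>SO</sup> in percentage points, which is consistent with, e.g., the High column (90.33 − 66.47 = 23.86). The function names <code class="ltx_verbatim ltx_font_typewriter">normalized_episode_duration</code> and <code class="ltx_verbatim ltx_font_typewriter">duration_gain</code>, as well as the episode lengths in the last line, are hypothetical and serve only to illustrate the normalization.</p> </div>
<pre class="ltx_verbatim ltx_font_typewriter">
```python
from statistics import mean


def normalized_episode_duration(steps_survived, max_steps):
    """Mean episode duration E, in percent of the maximum possible steps.

    Hypothetical helper. Assumption: `steps_survived` holds how many steps
    each evaluation episode lasted before it terminated or reached the
    horizon, and every episode can last at most `max_steps` steps.
    """
    return 100.0 * mean(steps_survived) / max_steps


def duration_gain(e_mo, e_so):
    """Improvement Delta E in percentage points (assumed E^MO - E^SO)."""
    return e_mo - e_so


# (E^MO, E^SO) in percent for the two contingency scenarios of Table II.
table_ii = {
    "Moderate": (97.68, 58.61),
    "High": (90.33, 66.47),
}

for scenario, (e_mo, e_so) in table_ii.items():
    print(f"{scenario}: Delta E = {duration_gain(e_mo, e_so):.2f} percentage points")

# Hypothetical run: three episodes, each capped at 100 steps.
print(f"E = {normalized_episode_duration([100, 50, 75], 100):.2f} %")  # prints "E = 75.00 %"
```
</pre>
<div class="ltx_para"> <p class="ltx_p">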
By learning to reduce both topological deviation and switching frequency, MO agents appear to develop more robust strategies that perform better under contingencies.</p> </div> </section> <section class="ltx_subsection" id="S3.SS4"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S3.SS4.5.1.1">III-D</span> </span><span class="ltx_text ltx_font_italic" id="S3.SS4.6.2">Efficient Training</span> </h3> <div class="ltx_para" id="S3.SS4.p1"> <p class="ltx_p" id="S3.SS4.p1.1">This case study investigates the training efficiency of MO and SO policies when computational resources are limited. As in the previous case study, we evaluate performance using the average episode duration and select the best MO policies according to Algorithm <a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#alg3" title="Algorithm 3 ‣ II-D Policy Selection ‣ II Methodology ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">3</span></a>. 
<a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.T3" title="In III-D Efficient Training ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">Table</span> <span class="ltx_text ltx_ref_tag">III</span></a> compares MO and SO policies under the following training scenarios:</p> <ul class="ltx_itemize" id="S3.I2"> <li class="ltx_item" id="S3.I2.i1" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I2.i1.p1"> <p class="ltx_p" id="S3.I2.i1.p1.1">Full Training Budget: The agent is trained with the default number of interactions (2048 training samples, 4 update cycles).</p> </div> </li> <li class="ltx_item" id="S3.I2.i2" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I2.i2.p1"> <p class="ltx_p" id="S3.I2.i2.p1.1">Moderate Training Budget: The agent is trained with 75% of the full budget (1536 training samples, 3 update cycles).</p> </div> </li> <li class="ltx_item" id="S3.I2.i3" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">•</span> <div class="ltx_para" id="S3.I2.i3.p1"> <p class="ltx_p" id="S3.I2.i3.p1.1">Low Training Budget: The agent is trained with 50% of the full budget (1024 training samples, 2 update cycles).</p> </div> </li> </ul> </div> <figure class="ltx_table" id="S3.T3"> <table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle" id="S3.T3.3"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S3.T3.3.4.1"> <td class="ltx_td ltx_border_t" id="S3.T3.3.4.1.1" style="padding-top:1.5pt;padding-bottom:1.5pt;"></td> <th class="ltx_td ltx_th ltx_th_column ltx_border_t" id="S3.T3.3.4.1.2" style="padding-top:1.5pt;padding-bottom:1.5pt;"></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S3.T3.3.4.1.3" style="padding-top:1.5pt;padding-bottom:1.5pt;">Training Budget</th> <td class="ltx_td ltx_border_t" 
id="S3.T3.3.4.1.4" style="padding-top:1.5pt;padding-bottom:1.5pt;"></td> </tr> <tr class="ltx_tr" id="S3.T3.3.5.2"> <td class="ltx_td" id="S3.T3.3.5.2.1" style="padding-top:1.5pt;padding-bottom:1.5pt;"></td> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S3.T3.3.5.2.2" style="padding-top:1.5pt;padding-bottom:1.5pt;">Low</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S3.T3.3.5.2.3" style="padding-top:1.5pt;padding-bottom:1.5pt;">Moderate</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S3.T3.3.5.2.4" style="padding-top:1.5pt;padding-bottom:1.5pt;">Full</th> </tr> <tr class="ltx_tr" id="S3.T3.1.1"> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.1.1.1" style="padding-top:1.5pt;padding-bottom:1.5pt;"> <math alttext="E^{MO}" class="ltx_Math" display="inline" id="S3.T3.1.1.1.m1.1"><semantics id="S3.T3.1.1.1.m1.1a"><msup id="S3.T3.1.1.1.m1.1.1" xref="S3.T3.1.1.1.m1.1.1.cmml"><mi id="S3.T3.1.1.1.m1.1.1.2" xref="S3.T3.1.1.1.m1.1.1.2.cmml">E</mi><mrow id="S3.T3.1.1.1.m1.1.1.3" xref="S3.T3.1.1.1.m1.1.1.3.cmml"><mi id="S3.T3.1.1.1.m1.1.1.3.2" xref="S3.T3.1.1.1.m1.1.1.3.2.cmml">M</mi><mo id="S3.T3.1.1.1.m1.1.1.3.1" xref="S3.T3.1.1.1.m1.1.1.3.1.cmml">⁢</mo><mi id="S3.T3.1.1.1.m1.1.1.3.3" xref="S3.T3.1.1.1.m1.1.1.3.3.cmml">O</mi></mrow></msup><annotation-xml encoding="MathML-Content" id="S3.T3.1.1.1.m1.1b"><apply id="S3.T3.1.1.1.m1.1.1.cmml" xref="S3.T3.1.1.1.m1.1.1"><csymbol cd="ambiguous" id="S3.T3.1.1.1.m1.1.1.1.cmml" xref="S3.T3.1.1.1.m1.1.1">superscript</csymbol><ci id="S3.T3.1.1.1.m1.1.1.2.cmml" xref="S3.T3.1.1.1.m1.1.1.2">𝐸</ci><apply id="S3.T3.1.1.1.m1.1.1.3.cmml" xref="S3.T3.1.1.1.m1.1.1.3"><times id="S3.T3.1.1.1.m1.1.1.3.1.cmml" xref="S3.T3.1.1.1.m1.1.1.3.1"></times><ci id="S3.T3.1.1.1.m1.1.1.3.2.cmml" xref="S3.T3.1.1.1.m1.1.1.3.2">𝑀</ci><ci id="S3.T3.1.1.1.m1.1.1.3.3.cmml" xref="S3.T3.1.1.1.m1.1.1.3.3">𝑂</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" 
id="S3.T3.1.1.1.m1.1c">E^{MO}</annotation><annotation encoding="application/x-llamapun" id="S3.T3.1.1.1.m1.1d">italic_E start_POSTSUPERSCRIPT italic_M italic_O end_POSTSUPERSCRIPT</annotation></semantics></math> (%)</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.1.1.2" style="padding-top:1.5pt;padding-bottom:1.5pt;">95.27</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.1.1.3" style="padding-top:1.5pt;padding-bottom:1.5pt;">90.44</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T3.1.1.4" style="padding-top:1.5pt;padding-bottom:1.5pt;">94.83</td> </tr> <tr class="ltx_tr" id="S3.T3.2.2"> <td class="ltx_td ltx_align_center" id="S3.T3.2.2.1" style="padding-top:1.5pt;padding-bottom:1.5pt;"> <math alttext="E^{SO}" class="ltx_Math" display="inline" id="S3.T3.2.2.1.m1.1"><semantics id="S3.T3.2.2.1.m1.1a"><msup id="S3.T3.2.2.1.m1.1.1" xref="S3.T3.2.2.1.m1.1.1.cmml"><mi id="S3.T3.2.2.1.m1.1.1.2" xref="S3.T3.2.2.1.m1.1.1.2.cmml">E</mi><mrow id="S3.T3.2.2.1.m1.1.1.3" xref="S3.T3.2.2.1.m1.1.1.3.cmml"><mi id="S3.T3.2.2.1.m1.1.1.3.2" xref="S3.T3.2.2.1.m1.1.1.3.2.cmml">S</mi><mo id="S3.T3.2.2.1.m1.1.1.3.1" xref="S3.T3.2.2.1.m1.1.1.3.1.cmml">⁢</mo><mi id="S3.T3.2.2.1.m1.1.1.3.3" xref="S3.T3.2.2.1.m1.1.1.3.3.cmml">O</mi></mrow></msup><annotation-xml encoding="MathML-Content" id="S3.T3.2.2.1.m1.1b"><apply id="S3.T3.2.2.1.m1.1.1.cmml" xref="S3.T3.2.2.1.m1.1.1"><csymbol cd="ambiguous" id="S3.T3.2.2.1.m1.1.1.1.cmml" xref="S3.T3.2.2.1.m1.1.1">superscript</csymbol><ci id="S3.T3.2.2.1.m1.1.1.2.cmml" xref="S3.T3.2.2.1.m1.1.1.2">𝐸</ci><apply id="S3.T3.2.2.1.m1.1.1.3.cmml" xref="S3.T3.2.2.1.m1.1.1.3"><times id="S3.T3.2.2.1.m1.1.1.3.1.cmml" xref="S3.T3.2.2.1.m1.1.1.3.1"></times><ci id="S3.T3.2.2.1.m1.1.1.3.2.cmml" xref="S3.T3.2.2.1.m1.1.1.3.2">𝑆</ci><ci id="S3.T3.2.2.1.m1.1.1.3.3.cmml" xref="S3.T3.2.2.1.m1.1.1.3.3">𝑂</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.2.2.1.m1.1c">E^{SO}</annotation><annotation 
encoding="application/x-llamapun" id="S3.T3.2.2.1.m1.1d">italic_E start_POSTSUPERSCRIPT italic_S italic_O end_POSTSUPERSCRIPT</annotation></semantics></math> (%)</td> <td class="ltx_td ltx_align_center" id="S3.T3.2.2.2" style="padding-top:1.5pt;padding-bottom:1.5pt;">73.39</td> <td class="ltx_td ltx_align_center" id="S3.T3.2.2.3" style="padding-top:1.5pt;padding-bottom:1.5pt;">84.95</td> <td class="ltx_td ltx_align_center" id="S3.T3.2.2.4" style="padding-top:1.5pt;padding-bottom:1.5pt;">82.66</td> </tr> <tr class="ltx_tr" id="S3.T3.3.3"> <td class="ltx_td ltx_align_center ltx_border_b" id="S3.T3.3.3.1" style="padding-top:1.5pt;padding-bottom:1.5pt;"> <math alttext="\Delta E" class="ltx_Math" display="inline" id="S3.T3.3.3.1.m1.1"><semantics id="S3.T3.3.3.1.m1.1a"><mrow id="S3.T3.3.3.1.m1.1.1" xref="S3.T3.3.3.1.m1.1.1.cmml"><mi id="S3.T3.3.3.1.m1.1.1.2" mathvariant="normal" xref="S3.T3.3.3.1.m1.1.1.2.cmml">Δ</mi><mo id="S3.T3.3.3.1.m1.1.1.1" xref="S3.T3.3.3.1.m1.1.1.1.cmml">⁢</mo><mi id="S3.T3.3.3.1.m1.1.1.3" xref="S3.T3.3.3.1.m1.1.1.3.cmml">E</mi></mrow><annotation-xml encoding="MathML-Content" id="S3.T3.3.3.1.m1.1b"><apply id="S3.T3.3.3.1.m1.1.1.cmml" xref="S3.T3.3.3.1.m1.1.1"><times id="S3.T3.3.3.1.m1.1.1.1.cmml" xref="S3.T3.3.3.1.m1.1.1.1"></times><ci id="S3.T3.3.3.1.m1.1.1.2.cmml" xref="S3.T3.3.3.1.m1.1.1.2">Δ</ci><ci id="S3.T3.3.3.1.m1.1.1.3.cmml" xref="S3.T3.3.3.1.m1.1.1.3">𝐸</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.3.3.1.m1.1c">\Delta E</annotation><annotation encoding="application/x-llamapun" id="S3.T3.3.3.1.m1.1d">roman_Δ italic_E</annotation></semantics></math> (%)</td> <td class="ltx_td ltx_align_center ltx_border_b" id="S3.T3.3.3.2" style="padding-top:1.5pt;padding-bottom:1.5pt;">21.88</td> <td class="ltx_td ltx_align_center ltx_border_b" id="S3.T3.3.3.3" style="padding-top:1.5pt;padding-bottom:1.5pt;">5.49</td> <td class="ltx_td ltx_align_center ltx_border_b" id="S3.T3.3.3.4" 
style="padding-top:1.5pt;padding-bottom:1.5pt;">10.17</td> </tr> </tbody> </table> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S3.T3.7.2.1" style="font-size:90%;">TABLE III</span>: </span><span class="ltx_text" id="S3.T3.5.1" style="font-size:90%;">Comparison of episode duration (<math alttext="E" class="ltx_Math" display="inline" id="S3.T3.5.1.m1.1"><semantics id="S3.T3.5.1.m1.1b"><mi id="S3.T3.5.1.m1.1.1" xref="S3.T3.5.1.m1.1.1.cmml">E</mi><annotation-xml encoding="MathML-Content" id="S3.T3.5.1.m1.1c"><ci id="S3.T3.5.1.m1.1.1.cmml" xref="S3.T3.5.1.m1.1.1">𝐸</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.T3.5.1.m1.1d">E</annotation><annotation encoding="application/x-llamapun" id="S3.T3.5.1.m1.1e">italic_E</annotation></semantics></math>) for multi-objective and single-objective policies under constrained training budgets.</span></figcaption> </figure> <div class="ltx_para" id="S3.SS4.p2"> <p class="ltx_p" id="S3.SS4.p2.1"><a class="ltx_ref" href="https://arxiv.org/html/2502.00040v1#S3.T3" title="In III-D Efficient Training ‣ III Case Studies ‣ Multi-Objective Reinforcement Learning for Power Grid Topology Control"><span class="ltx_text ltx_ref_tag">Table</span> <span class="ltx_text ltx_ref_tag">III</span></a> shows that MO policies outperform SO policies at every training budget, with the largest gap at the lowest budget. Notably, with the low training budget, MO policies achieve an average episode duration 21.88 percentage points higher than that of SO policies. By focusing early in training on reducing switching frequency and staying close to the original topology, MO policies appear to develop effective strategies at an earlier stage. 
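</p> </div> <div class="ltx_para"> <p class="ltx_p">The following sketch makes the training budgets explicit. Each reduced budget scales both the number of training samples and the number of update cycles of the full budget (2048 samples, 4 update cycles) by the same fraction. Assuming, as in PPO-style training, that every collected sample is reused once per update cycle, the total number of sample updates shrinks quadratically with that fraction, so the Low budget would correspond to 25% and the Moderate budget to 56.25% of the full budget’s sample updates. The sketch also recomputes Δ<em>E</em> for the two reduced budgets in Table III as <em>E</em><sup>MO</sup> − <em>E</em><sup>SO</sup>. The helper <code class="ltx_verbatim ltx_font_typewriter">budget</code> and the sample-reuse assumption are hypothetical and not taken from the paper’s implementation.</p> </div>
<pre class="ltx_verbatim ltx_font_typewriter">
```python
# Full budget from Section III-D: 2048 training samples, 4 update cycles.
FULL_SAMPLES = 2048
FULL_CYCLES = 4


def budget(fraction):
    """Return (samples, update cycles, share of the full budget's sample updates).

    Hypothetical helper. Assumption: both quantities are scaled by
    `fraction`, and each sample is reused once per update cycle, so the
    work of a budget is proportional to samples * cycles.
    """
    samples = round(FULL_SAMPLES * fraction)
    cycles = round(FULL_CYCLES * fraction)
    share = samples * cycles / (FULL_SAMPLES * FULL_CYCLES)
    return samples, cycles, share


# (fraction, E^MO, E^SO) in percent for the two reduced budgets of Table III.
table_iii = {
    "Low": (0.50, 95.27, 73.39),
    "Moderate": (0.75, 90.44, 84.95),
}

for name, (fraction, e_mo, e_so) in table_iii.items():
    samples, cycles, share = budget(fraction)
    print(
        f"{name}: {samples} samples x {cycles} cycles = {share:.2%} of full "
        f"sample updates, Delta E = {e_mo - e_so:.2f} percentage points"
    )
```
</pre>
<div class="ltx_para"> <p class="ltx_p">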
Overall, MO policies learn faster and more efficiently, an advantage that could become critical for larger grids, where computational complexity increases exponentially.</p> </div> </section> </section> <section class="ltx_section" id="S4"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">IV </span><span class="ltx_text ltx_font_smallcaps" id="S4.1.1">Discussion and Conclusion</span> </h2> <div class="ltx_para" id="S4.p1"> <p class="ltx_p" id="S4.p1.1">This paper presents the first investigation into multi-objective reinforcement learning (MORL) for power grid topology control. We demonstrated that trade-offs exist among conflicting operational objectives in topology control. The initial case studies suggest that the approach is promising for alleviating challenges in modeling this problem; in other words, its inherently multi-objective nature can be addressed through learning in a more principled way. The case studies also show that the proposed DOL approach generates higher-quality solution sets than random sampling, with higher density and a closer approximation of the Pareto frontier, thereby offering enhanced decision support and a more comprehensive set of policies for operators to select from. Additionally, by simultaneously reducing topological deviation, switching frequency, and line loading, multi-objective policies achieve an average episode duration about 24 percentage points higher than single-objective policies under highly frequent contingencies. 
The results also show that multi-objective policies improve training efficiency: with <math alttext="50" class="ltx_Math" display="inline" id="S4.p1.1.m1.1"><semantics id="S4.p1.1.m1.1a"><mn id="S4.p1.1.m1.1.1" xref="S4.p1.1.m1.1.1.cmml">50</mn><annotation-xml encoding="MathML-Content" id="S4.p1.1.m1.1b"><cn id="S4.p1.1.m1.1.1.cmml" type="integer" xref="S4.p1.1.m1.1.1">50</cn></annotation-xml><annotation encoding="application/x-tex" id="S4.p1.1.m1.1c">50</annotation><annotation encoding="application/x-llamapun" id="S4.p1.1.m1.1d">50</annotation></semantics></math>% of the training budget, MO policies achieve an episode duration about 22 percentage points higher than single-objective policies. The study nevertheless has limitations. The 5-bus system used in this research does not fully reflect the complexity of real-world power systems; future work should apply the proposed approach to larger grids to assess its practical applicability. In addition, this study considers a limited set of operational objectives, including line loading and switching frequency; future research should explore other objectives, such as operational cost and environmental impact. Extending MORL for topology control to combine topological actions with generator re-dispatch is another direction for future research.</p> </div> </section> <section class="ltx_section" id="Sx1"> <h2 class="ltx_title ltx_font_smallcaps ltx_title_section">Acknowledgment</h2> <div class="ltx_para" id="Sx1.p1"> <p class="ltx_p" id="Sx1.p1.1">AI4REALNET has received funding from the European Union’s Horizon Europe Research and Innovation programme under the Grant Agreement No 101119527. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union. 
Neither the European Union nor the granting authority can be held responsible for them.</p> </div> </section> <section class="ltx_bibliography" id="bib"> <h2 class="ltx_title ltx_title_bibliography">References</h2> <ul class="ltx_biblist"> <li class="ltx_bibitem" id="bib.bib1"> <span class="ltx_tag ltx_tag_bibitem">[1]</span> <span class="ltx_bibblock"> M. Heidarifar, P. Andrianesis, P. Ruiz, M. C. Caramanis, and I. C. Paschalidis, “An optimal transmission line switching and bus splitting heuristic incorporating AC and N-1 contingency constraints,” <em class="ltx_emph ltx_font_italic" id="bib.bib1.1.1">International Journal of Electrical Power &amp; Energy Systems</em>, vol. 133, p. 107278, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib2"> <span class="ltx_tag ltx_tag_bibitem">[2]</span> <span class="ltx_bibblock"> A. Ewerszumrode, N. Erle, S. Krahl, and A. Moser, “An iterative approach to grid topology and redispatch optimization in congestion management,” <em class="ltx_emph ltx_font_italic" id="bib.bib2.1.1">Electric Power Systems Research</em>, vol. 234, p. 110700, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib3"> <span class="ltx_tag ltx_tag_bibitem">[3]</span> <span class="ltx_bibblock"> A. Marot, B. Donnot, C. Romero, B. Donon, M. Lerousseau, L. Veyrin-Forrer, and I. Guyon, “Learning to run a power network challenge for training topology controllers,” <em class="ltx_emph ltx_font_italic" id="bib.bib3.1.1">Electric Power Systems Research</em>, vol. 189, p. 106635, 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib4"> <span class="ltx_tag ltx_tag_bibitem">[4]</span> <span class="ltx_bibblock"> J. Viebahn, S. Kop, J. v. Dijk, H. Budaya, M. Streefland, D. Barbieri, P. Champion, M. Jothy, V. Renault, and S. Tindemans, “GridOptions tool: Real-world day-ahead congestion management using topological remedial actions,” <em class="ltx_emph ltx_font_italic" id="bib.bib4.1.1">CIGRE</em>, 2024. 
</span> </li> <li class="ltx_bibitem" id="bib.bib5"> <span class="ltx_tag ltx_tag_bibitem">[5]</span> <span class="ltx_bibblock"> S. Babaeinejadsarookolaee, B. Park, B. Lesieutre, and C. L. DeMarco, “Transmission congestion management via node-breaker topology control,” <em class="ltx_emph ltx_font_italic" id="bib.bib5.1.1">IEEE Systems Journal</em>, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib6"> <span class="ltx_tag ltx_tag_bibitem">[6]</span> <span class="ltx_bibblock"> A. Marot, B. Donnot, S. Tazi, and P. Panciatici, “Expert system for topological remedial action discovery in smart grids,” in <em class="ltx_emph ltx_font_italic" id="bib.bib6.1.1">Mediterranean Conference on Power Generation, Transmission, Distribution and Energy Conversion (MEDPOWER 2018)</em>.   IET, 2018, pp. 1–6. </span> </li> <li class="ltx_bibitem" id="bib.bib7"> <span class="ltx_tag ltx_tag_bibitem">[7]</span> <span class="ltx_bibblock"> I. Hrgović and I. Pavić, “Substation reconfiguration selection algorithm based on PTDFs for congestion management and RL approach,” <em class="ltx_emph ltx_font_italic" id="bib.bib7.1.1">Expert Systems with Applications</em>, vol. 257, p. 125017, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib8"> <span class="ltx_tag ltx_tag_bibitem">[8]</span> <span class="ltx_bibblock"> M. Lehna, J. Viebahn, A. Marot, S. Tomforde, and C. Scholz, “Managing power grids through topology actions: A comparative study between advanced rule-based and reinforcement learning agents,” <em class="ltx_emph ltx_font_italic" id="bib.bib8.1.1">Energy and AI</em>, vol. 14, p. 100276, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib9"> <span class="ltx_tag ltx_tag_bibitem">[9]</span> <span class="ltx_bibblock"> A. Kelly, A. O’Sullivan, P. de Mars, and A. Marot, “Reinforcement learning for electricity network operation,” <em class="ltx_emph ltx_font_italic" id="bib.bib9.1.1">arXiv preprint arXiv:2003.07339</em>, 2020. 
</span> </li> <li class="ltx_bibitem" id="bib.bib10"> <span class="ltx_tag ltx_tag_bibitem">[10]</span> <span class="ltx_bibblock"> A. Marot, B. Donnot, G. Dulac-Arnold, A. Kelly, A. O’Sullivan, J. Viebahn, M. Awad, I. Guyon, P. Panciatici, and C. Romero, “Learning to run a power network challenge: a retrospective analysis,” in <em class="ltx_emph ltx_font_italic" id="bib.bib10.1.1">NeurIPS 2020 Competition and Demonstration Track</em>.   PMLR, 2021, pp. 112–132. </span> </li> <li class="ltx_bibitem" id="bib.bib11"> <span class="ltx_tag ltx_tag_bibitem">[11]</span> <span class="ltx_bibblock"> A. Marot, B. Donnot, K. Chaouache, A. Kelly, Q. Huang, R.-R. Hossain, and J. L. Cremer, “Learning to run a power network with trust,” <em class="ltx_emph ltx_font_italic" id="bib.bib11.1.1">Electric Power Systems Research</em>, vol. 212, p. 108487, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib12"> <span class="ltx_tag ltx_tag_bibitem">[12]</span> <span class="ltx_bibblock"> T. Lan, J. Duan, B. Zhang, D. Shi, Z. Wang, R. Diao, and X. Zhang, “AI-based autonomous line flow control via topology adjustment for maximizing time-series ATCs,” in <em class="ltx_emph ltx_font_italic" id="bib.bib12.1.1">2020 IEEE Power &amp; Energy Society General Meeting (PESGM)</em>.   IEEE, 2020, pp. 1–5. </span> </li> <li class="ltx_bibitem" id="bib.bib13"> <span class="ltx_tag ltx_tag_bibitem">[13]</span> <span class="ltx_bibblock"> D. Yoon, S. Hong, B.-J. Lee, and K.-E. Kim, “Winning the L2RPN challenge: Power grid management via semi-Markov afterstate actor-critic,” in <em class="ltx_emph ltx_font_italic" id="bib.bib13.1.1">International Conference on Learning Representations</em>, 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib14"> <span class="ltx_tag ltx_tag_bibitem">[14]</span> <span class="ltx_bibblock"> M. Subramanian, J. Viebahn, S. H. Tindemans, B. Donnot, and A. 
Marot, “Exploring grid topology reconfiguration using a simple deep reinforcement learning approach,” in <em class="ltx_emph ltx_font_italic" id="bib.bib14.1.1">2021 IEEE Madrid PowerTech</em>.   IEEE, 2021, pp. 1–6. </span> </li> <li class="ltx_bibitem" id="bib.bib15"> <span class="ltx_tag ltx_tag_bibitem">[15]</span> <span class="ltx_bibblock"> A. Chauhan, M. Baranwal, and A. Basumatary, “PowRL: A reinforcement learning framework for robust management of power networks,” in <em class="ltx_emph ltx_font_italic" id="bib.bib15.1.1">Proceedings of the AAAI Conference on Artificial Intelligence</em>, vol. 37, no. 12, 2023, pp. 14757–14764. </span> </li> <li class="ltx_bibitem" id="bib.bib16"> <span class="ltx_tag ltx_tag_bibitem">[16]</span> <span class="ltx_bibblock"> M. Dorfer, A. R. Fuxjäger, K. Kozak, P. M. Blies, and M. Wasserer, “Power grid congestion management via topology optimization with AlphaZero,” <em class="ltx_emph ltx_font_italic" id="bib.bib16.1.1">arXiv preprint arXiv:2211.05612</em>, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib17"> <span class="ltx_tag ltx_tag_bibitem">[17]</span> <span class="ltx_bibblock"> A. R. R. Matavalam, K. P. Guddanti, Y. Weng, and V. Ajjarapu, “Curriculum based reinforcement learning of grid topology controllers to prevent thermal cascading,” <em class="ltx_emph ltx_font_italic" id="bib.bib17.1.1">IEEE Transactions on Power Systems</em>, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib18"> <span class="ltx_tag ltx_tag_bibitem">[18]</span> <span class="ltx_bibblock"> G. J. Meppelink, A. Rajaei, and J. L. Cremer, “A hybrid curriculum learning and tree search approach for network topology control,” <em class="ltx_emph ltx_font_italic" id="bib.bib18.1.1">Electric Power Systems Research</em>, 2025, accepted. </span> </li> <li class="ltx_bibitem" id="bib.bib19"> <span class="ltx_tag ltx_tag_bibitem">[19]</span> <span class="ltx_bibblock"> B. Manczak, J. Viebahn, and H. 
van Hoof, “Hierarchical reinforcement learning for power network topology control.” </span> </li> <li class="ltx_bibitem" id="bib.bib20"> <span class="ltx_tag ltx_tag_bibitem">[20]</span> <span class="ltx_bibblock"> E. van der Sar, A. Zocca, and S. Bhulai, “Multi-agent reinforcement learning for power grid topology optimization.” </span> </li> <li class="ltx_bibitem" id="bib.bib21"> <span class="ltx_tag ltx_tag_bibitem">[21]</span> <span class="ltx_bibblock"> I. Hrgović and I. Pavić, “Reward design for intelligent deep reinforcement learning based power flow control using topology optimization,” <em class="ltx_emph ltx_font_italic" id="bib.bib21.1.1">Sustainable Energy, Grids and Networks</em>, p. 101580, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib22"> <span class="ltx_tag ltx_tag_bibitem">[22]</span> <span class="ltx_bibblock"> J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” <em class="ltx_emph ltx_font_italic" id="bib.bib22.1.1">arXiv preprint arXiv:1707.06347</em>, 2017. </span> </li> <li class="ltx_bibitem" id="bib.bib23"> <span class="ltx_tag ltx_tag_bibitem">[23]</span> <span class="ltx_bibblock"> F. Felten, L. N. Alegre, A. Nowé, A. Bazzan, E. G. Talbi, G. Danoy, and B. C. da Silva, “A toolkit for reliable benchmarking and research in multi-objective reinforcement learning,” <em class="ltx_emph ltx_font_italic" id="bib.bib23.1.1">Advances in Neural Information Processing Systems</em>, vol. 36, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib24"> <span class="ltx_tag ltx_tag_bibitem">[24]</span> <span class="ltx_bibblock"> H. Mossalam, Y. M. Assael, D. M. Roijers, and S. Whiteson, “Multi-objective deep reinforcement learning,” <em class="ltx_emph ltx_font_italic" id="bib.bib24.1.1">arXiv preprint arXiv:1610.02707</em>, 2016. </span> </li> <li class="ltx_bibitem" id="bib.bib25"> <span class="ltx_tag ltx_tag_bibitem">[25]</span> <span class="ltx_bibblock"> C. F. Hayes, R. Rădulescu, E. 
Bargiacchi, J. Källström, M. Macfarlane, M. Reymond, T. Verstraeten, L. M. Zintgraf, R. Dazeley, F. Heintz, E. Howley, A. A. Irissappane, P. Mannion, A. Nowé, G. Ramos, M. Restelli, P. Vamplew, and D. M. Roijers, “A practical guide to multi-objective reinforcement learning and planning,” <em class="ltx_emph ltx_font_italic" id="bib.bib25.1.1">Autonomous Agents and Multi-Agent Systems</em>, vol. 36, no. 1, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib26"> <span class="ltx_tag ltx_tag_bibitem">[26]</span> <span class="ltx_bibblock"> RTE France, “Grid2op,” 2020. [Online]. Available: <a class="ltx_ref ltx_url ltx_font_typewriter" href="https://github.com/rte-france/Grid2Op" title="">https://github.com/rte-france/Grid2Op</a> </span> </li> <li class="ltx_bibitem" id="bib.bib27"> <span class="ltx_tag ltx_tag_bibitem">[27]</span> <span class="ltx_bibblock"> D. M. Roijers, “Multi-objective decision-theoretic planning,” <em class="ltx_emph ltx_font_italic" id="bib.bib27.1.1">AI Matters</em>, vol. 2, no. 4, pp. 11–12, 2016. </span> </li> <li class="ltx_bibitem" id="bib.bib28"> <span class="ltx_tag ltx_tag_bibitem">[28]</span> <span class="ltx_bibblock"> Delft High Performance Computing Centre (DHPC), “DelftBlue Supercomputer (Phase 2),” <a class="ltx_ref ltx_url ltx_font_typewriter" href="https://www.tudelft.nl/dhpc/ark:/44463/DelftBluePhase2" title="">https://www.tudelft.nl/dhpc/ark:/44463/DelftBluePhase2</a>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib29"> <span class="ltx_tag ltx_tag_bibitem">[29]</span> <span class="ltx_bibblock"> T. R. Lautenbacher, A. Rajaei, J. Viebahn, D. Barbieri, and J. Cremer, “Implementation of multi-objective reinforcement learning for power grid topology control,” 2025. [Online]. 
Available: <a class="ltx_ref ltx_url ltx_font_typewriter" href="https://github.com/TU-Delft-AI-Energy-Lab/TOPGRID_MORL" title="">https://github.com/TU-Delft-AI-Energy-Lab/TOPGRID_MORL</a> </span> </li> </ul> </section> </article> </div> <footer class="ltx_page_footer"> <div class="ltx_page_logo">Generated on Mon Jan 27 12:20:35 2025 by <a class="ltx_LaTeXML_logo" href="http://dlmf.nist.gov/LaTeXML/"><span style="letter-spacing:-0.2em; margin-right:0.1em;">L<span class="ltx_font_smallcaps" style="position:relative; bottom:2.2pt;">a</span>T<span class="ltx_font_smallcaps" style="font-size:120%;position:relative; bottom:-0.2ex;">e</span></span><span style="font-size:90%; position:relative; bottom:-0.2ex;">XML</span><img alt="Mascot Sammy" src=""/></a> </div></footer> </div> </body> </html>