
<!DOCTYPE html> <html lang="en"> <head> <meta content="text/html; charset=utf-8" http-equiv="content-type"/> <title>Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy.</title>  <meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/> <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv-fonts.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/latexml_styles.css" rel="stylesheet" type="text/css"/> <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.3.3/html2canvas.min.js"></script> <script src="/static/browse/0.3.4/js/addons_new.js"></script> <script src="/static/browse/0.3.4/js/feedbackOverlay.js"></script> <meta content=" Speaker selection mechanism, neural network, audio beamforming, cocktail party problem " lang="en" name="keywords"/> <base href="/html/2503.18590v1/"/></head> <body> <nav class="ltx_page_navbar"> <nav class="ltx_TOC"> <ol class="ltx_toclist"> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S1" title="In Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a 
Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">I </span><span class="ltx_text ltx_font_smallcaps">Introduction</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S2" title="In Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">II </span><span class="ltx_text ltx_font_smallcaps">Preliminaries</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S3" title="In Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">III </span><span class="ltx_text ltx_font_smallcaps">Speaker selection mechanism</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S4" title="In Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology 
and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">IV </span><span class="ltx_text ltx_font_smallcaps">Model and simulation framework</span></span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S4.SS1" title="In IV Model and simulation framework ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">IV-A</span> </span><span class="ltx_text ltx_font_italic">Audio beamforming model</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S4.SS2" title="In IV Model and simulation framework ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">IV-B</span> </span><span class="ltx_text ltx_font_italic">Acoustic simulation setup</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" 
href="https://arxiv.org/html/2503.18590v1#S4.SS3" title="In IV Model and simulation framework ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">IV-C</span> </span><span class="ltx_text ltx_font_italic">Data</span></span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S5" title="In Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">V </span><span class="ltx_text ltx_font_smallcaps">Results and discussion</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S6" title="In Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">VI </span><span class="ltx_text 
ltx_font_smallcaps">Conclusion</span></span></a></li> </ol></nav> </nav> <div class="ltx_page_main"> <div class="ltx_page_content"> <article class="ltx_document ltx_authors_1line"> <h1 class="ltx_title ltx_title_document">Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios <br class="ltx_break"/><span class="ltx_note ltx_role_thanks" id="id1.id1"><sup class="ltx_note_mark">†</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">†</sup><span class="ltx_note_type">thanks: </span>This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy.</span></span></span> </h1> <div class="ltx_authors"> <span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Luan Vinícius Fiorio </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_affiliation"><span class="ltx_text ltx_font_italic" id="id2.1.id1">Department of Electrical Engineering</span> <br class="ltx_break"/><span class="ltx_text ltx_font_italic" id="id3.2.id2">Eindhoven University of Technology <br class="ltx_break"/></span>Eindhoven, The Netherlands <br class="ltx_break"/>l.v.fiorio@tue.nl </span></span></span> <span class="ltx_author_before"> </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Bruno Defraene </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_affiliation"> <span class="ltx_text ltx_font_italic" id="id4.1.id1">NXP Semiconductors</span> <br class="ltx_break"/>Leuven, Belgium <br class="ltx_break"/>bruno.defraene@nxp.com </span></span></span> <span class="ltx_author_before"> </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Johan David </span><span class="ltx_author_notes"> <span 
class="ltx_contact ltx_role_affiliation"> <span class="ltx_text ltx_font_italic" id="id5.1.id1">NXP Semiconductors</span> <br class="ltx_break"/>Leuven, Belgium <br class="ltx_break"/>j.david@nxp.com </span></span></span> <span class="ltx_author_before"> </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Alex Young </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_affiliation"> <span class="ltx_text ltx_font_italic" id="id6.1.id1">NXP Semiconductors</span> <br class="ltx_break"/>Eindhoven, The Netherlands <br class="ltx_break"/>alex.young@nxp.com </span></span></span> <span class="ltx_author_before"> </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Frans Widdershoven </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_affiliation"> <span class="ltx_text ltx_font_italic" id="id7.1.id1">NXP Semiconductors</span> <br class="ltx_break"/>Eindhoven, The Netherlands <br class="ltx_break"/>frans.widdershoven@nxp.com </span></span></span> <span class="ltx_author_before"> </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Wim van Houtum </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_affiliation"> <span class="ltx_text ltx_font_italic" id="id8.1.id1">NXP Semiconductors</span> <br class="ltx_break"/>Eindhoven, The Netherlands <br class="ltx_break"/>wim.van.houtum@nxp.com </span></span></span> <span class="ltx_author_before"> </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Ronald M. 
Aarts </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_affiliation"><span class="ltx_text ltx_font_italic" id="id9.1.id1">Department of Electrical Engineering</span> <br class="ltx_break"/><span class="ltx_text ltx_font_italic" id="id10.2.id2">Eindhoven University of Technology <br class="ltx_break"/></span>Eindhoven, The Netherlands <br class="ltx_break"/>R.M.Aarts@tue.nl </span></span></span> </div> <div class="ltx_abstract"> <h6 class="ltx_title ltx_title_abstract">Abstract</h6> <p class="ltx_p" id="id11.id1">We propose a speaker selection mechanism (SSM) for training an end-to-end beamforming neural network, based on recent findings that a listener usually looks toward the target speaker with a certain undershot angle. During training, the mechanism allows the neural network model to learn which speaker to focus on in a multi-speaker scenario, based on the positions of the listener and the speakers. However, only audio information is necessary during inference. We perform acoustic simulations demonstrating the feasibility and performance of a model trained with the SSM. The results show significant improvements in speech intelligibility, quality, and distortion metrics when compared to the minimum variance distortionless response filter and to the same neural network model trained without the SSM. 
The success of the proposed method is a significant step toward solving the cocktail party problem.</p> </div> <div class="ltx_keywords"> <h6 class="ltx_title ltx_title_keywords">Index Terms: </h6> Speaker selection mechanism, neural network, audio beamforming, cocktail party problem </div> <section class="ltx_section" id="S1"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">I </span><span class="ltx_text ltx_font_smallcaps" id="S1.1.1">Introduction</span> </h2> <div class="ltx_para" id="S1.p1"> <p class="ltx_p" id="S1.p1.1">“How do we recognize what one person is saying when others are speaking at the same time?” <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib1" title="">1</a>, p. 117]</cite>. This simple question formulates the <em class="ltx_emph ltx_font_italic" id="S1.p1.1.1">cocktail party problem</em>, which refers to the ability of human hearing to separate voices that are mixed in frequency and time. While such an ability is present in normal hearing, hearing-impaired listeners might face difficulty in segregating auditory streams <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib2" title="">2</a>]</cite>.</p> </div> <div class="ltx_para" id="S1.p2"> <p class="ltx_p" id="S1.p2.1">Hearing-impaired listeners frequently rely on hearing aids, sound-amplifying devices that employ beamforming strategies. Such devices usually beamform toward the front of the listener, while recent findings show that the listener’s head tends to undershoot the target speaker’s position <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib3" title="">3</a>]</cite>. 
Beamforming algorithms help improve speech intelligibility and sound quality <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib4" title="">4</a>]</cite>; however, in reverberant multi-speaker scenarios, the performance of algorithms such as the minimum variance distortionless response (MVDR) filter is reduced <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib5" title="">5</a>]</cite>. More recently, audio beamforming has been implemented with neural networks (NNs), either end-to-end <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib6" title="">6</a>]</cite> or by estimating the signals fed into a beamforming filter <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib7" title="">7</a>]</cite>. Such approaches usually do not take multi-speaker scenarios into account, limiting their application, or employ additional sensors (e.g., cameras) for guiding the beam, which can be prohibitive for most hearing aid devices.</p> </div> <div class="ltx_para" id="S1.p3"> <p class="ltx_p" id="S1.p3.1">Inspired by the findings of <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib3" title="">3</a>]</cite> regarding the presence of an undershot angle between the listener’s head and the speaker direction, we propose a speaker selection mechanism for the training of beamforming neural networks. The mechanism teaches the model to focus on the target speaker based on the smallest undershot angle, requiring only audio information during inference. 
Through acoustic simulations, we show that a neural network trained with the mechanism outperforms both the baseline model, trained without it, and the MVDR filter <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib8" title="">8</a>]</cite>. We also show that the proposed algorithm is robust to changes in the number and position of speakers, which represents significant progress toward solving the cocktail party problem.</p> </div> </section> <section class="ltx_section" id="S2"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">II </span><span class="ltx_text ltx_font_smallcaps" id="S2.1.1">Preliminaries</span> </h2> <div class="ltx_para" id="S2.p1"> <p class="ltx_p" id="S2.p1.1">The problem we tackle consists of multi-microphone audio beamforming in a multi-speaker scenario, where the microphones are positioned so as to simulate hearing aid devices worn by a listener. <math alttext="N" class="ltx_Math" display="inline" id="S2.p1.1.m1.1"><semantics id="S2.p1.1.m1.1a"><mi id="S2.p1.1.m1.1.1" xref="S2.p1.1.m1.1.1.cmml">N</mi><annotation-xml encoding="MathML-Content" id="S2.p1.1.m1.1b"><ci id="S2.p1.1.m1.1.1.cmml" xref="S2.p1.1.m1.1.1">𝑁</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.p1.1.m1.1c">N</annotation><annotation encoding="application/x-llamapun" id="S2.p1.1.m1.1d">italic_N</annotation></semantics></math> speakers and a listener are randomly positioned in a reverberant room. The listener can look toward one of the speakers directly, or with an <em class="ltx_emph ltx_font_italic" id="S2.p1.1.1">undershot</em> azimuth angle, as pointed out in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib3" title="">3</a>]</cite>. For generality, we also consider the overshot case, in which the listener looks beyond the desired speaker angle (though we prioritize the term undershot for readability). 
Our objective is to extract the clean reverberant speech of the desired speaker using only audio information.</p> </div> <div class="ltx_para" id="S2.p2"> <p class="ltx_p" id="S2.p2.10">An example can be seen in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S2.F1" title="Figure 1 ‣ II Preliminaries ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_tag">1</span></a>, where, in a reverberant room, a listener <math alttext="L" class="ltx_Math" display="inline" id="S2.p2.1.m1.1"><semantics id="S2.p2.1.m1.1a"><mi id="S2.p2.1.m1.1.1" xref="S2.p2.1.m1.1.1.cmml">L</mi><annotation-xml encoding="MathML-Content" id="S2.p2.1.m1.1b"><ci id="S2.p2.1.m1.1.1.cmml" xref="S2.p2.1.m1.1.1">𝐿</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.p2.1.m1.1c">L</annotation><annotation encoding="application/x-llamapun" id="S2.p2.1.m1.1d">italic_L</annotation></semantics></math> looks toward a speaker <math alttext="S_{2}" class="ltx_Math" display="inline" id="S2.p2.2.m2.1"><semantics id="S2.p2.2.m2.1a"><msub id="S2.p2.2.m2.1.1" xref="S2.p2.2.m2.1.1.cmml"><mi id="S2.p2.2.m2.1.1.2" xref="S2.p2.2.m2.1.1.2.cmml">S</mi><mn id="S2.p2.2.m2.1.1.3" xref="S2.p2.2.m2.1.1.3.cmml">2</mn></msub><annotation-xml encoding="MathML-Content" id="S2.p2.2.m2.1b"><apply id="S2.p2.2.m2.1.1.cmml" xref="S2.p2.2.m2.1.1"><csymbol cd="ambiguous" id="S2.p2.2.m2.1.1.1.cmml" xref="S2.p2.2.m2.1.1">subscript</csymbol><ci id="S2.p2.2.m2.1.1.2.cmml" xref="S2.p2.2.m2.1.1.2">𝑆</ci><cn id="S2.p2.2.m2.1.1.3.cmml" type="integer" xref="S2.p2.2.m2.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" 
id="S2.p2.2.m2.1c">S_{2}</annotation><annotation encoding="application/x-llamapun" id="S2.p2.2.m2.1d">italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT</annotation></semantics></math> with an undershot angle <math alttext="\theta_{u}" class="ltx_Math" display="inline" id="S2.p2.3.m3.1"><semantics id="S2.p2.3.m3.1a"><msub id="S2.p2.3.m3.1.1" xref="S2.p2.3.m3.1.1.cmml"><mi id="S2.p2.3.m3.1.1.2" xref="S2.p2.3.m3.1.1.2.cmml">θ</mi><mi id="S2.p2.3.m3.1.1.3" xref="S2.p2.3.m3.1.1.3.cmml">u</mi></msub><annotation-xml encoding="MathML-Content" id="S2.p2.3.m3.1b"><apply id="S2.p2.3.m3.1.1.cmml" xref="S2.p2.3.m3.1.1"><csymbol cd="ambiguous" id="S2.p2.3.m3.1.1.1.cmml" xref="S2.p2.3.m3.1.1">subscript</csymbol><ci id="S2.p2.3.m3.1.1.2.cmml" xref="S2.p2.3.m3.1.1.2">𝜃</ci><ci id="S2.p2.3.m3.1.1.3.cmml" xref="S2.p2.3.m3.1.1.3">𝑢</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p2.3.m3.1c">\theta_{u}</annotation><annotation encoding="application/x-llamapun" id="S2.p2.3.m3.1d">italic_θ start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT</annotation></semantics></math>. 
In this case, <math alttext="S_{2}" class="ltx_Math" display="inline" id="S2.p2.4.m4.1"><semantics id="S2.p2.4.m4.1a"><msub id="S2.p2.4.m4.1.1" xref="S2.p2.4.m4.1.1.cmml"><mi id="S2.p2.4.m4.1.1.2" xref="S2.p2.4.m4.1.1.2.cmml">S</mi><mn id="S2.p2.4.m4.1.1.3" xref="S2.p2.4.m4.1.1.3.cmml">2</mn></msub><annotation-xml encoding="MathML-Content" id="S2.p2.4.m4.1b"><apply id="S2.p2.4.m4.1.1.cmml" xref="S2.p2.4.m4.1.1"><csymbol cd="ambiguous" id="S2.p2.4.m4.1.1.1.cmml" xref="S2.p2.4.m4.1.1">subscript</csymbol><ci id="S2.p2.4.m4.1.1.2.cmml" xref="S2.p2.4.m4.1.1.2">𝑆</ci><cn id="S2.p2.4.m4.1.1.3.cmml" type="integer" xref="S2.p2.4.m4.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p2.4.m4.1c">S_{2}</annotation><annotation encoding="application/x-llamapun" id="S2.p2.4.m4.1d">italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT</annotation></semantics></math> is the desired speaker while <math alttext="S_{1}" class="ltx_Math" display="inline" id="S2.p2.5.m5.1"><semantics id="S2.p2.5.m5.1a"><msub id="S2.p2.5.m5.1.1" xref="S2.p2.5.m5.1.1.cmml"><mi id="S2.p2.5.m5.1.1.2" xref="S2.p2.5.m5.1.1.2.cmml">S</mi><mn id="S2.p2.5.m5.1.1.3" xref="S2.p2.5.m5.1.1.3.cmml">1</mn></msub><annotation-xml encoding="MathML-Content" id="S2.p2.5.m5.1b"><apply id="S2.p2.5.m5.1.1.cmml" xref="S2.p2.5.m5.1.1"><csymbol cd="ambiguous" id="S2.p2.5.m5.1.1.1.cmml" xref="S2.p2.5.m5.1.1">subscript</csymbol><ci id="S2.p2.5.m5.1.1.2.cmml" xref="S2.p2.5.m5.1.1.2">𝑆</ci><cn id="S2.p2.5.m5.1.1.3.cmml" type="integer" xref="S2.p2.5.m5.1.1.3">1</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p2.5.m5.1c">S_{1}</annotation><annotation encoding="application/x-llamapun" id="S2.p2.5.m5.1d">italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT</annotation></semantics></math> is undesired. 
The undershot angle can be described in terms of the listener’s head center axis angle <math alttext="\theta_{h}" class="ltx_Math" display="inline" id="S2.p2.6.m6.1"><semantics id="S2.p2.6.m6.1a"><msub id="S2.p2.6.m6.1.1" xref="S2.p2.6.m6.1.1.cmml"><mi id="S2.p2.6.m6.1.1.2" xref="S2.p2.6.m6.1.1.2.cmml">θ</mi><mi id="S2.p2.6.m6.1.1.3" xref="S2.p2.6.m6.1.1.3.cmml">h</mi></msub><annotation-xml encoding="MathML-Content" id="S2.p2.6.m6.1b"><apply id="S2.p2.6.m6.1.1.cmml" xref="S2.p2.6.m6.1.1"><csymbol cd="ambiguous" id="S2.p2.6.m6.1.1.1.cmml" xref="S2.p2.6.m6.1.1">subscript</csymbol><ci id="S2.p2.6.m6.1.1.2.cmml" xref="S2.p2.6.m6.1.1.2">𝜃</ci><ci id="S2.p2.6.m6.1.1.3.cmml" xref="S2.p2.6.m6.1.1.3">ℎ</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p2.6.m6.1c">\theta_{h}</annotation><annotation encoding="application/x-llamapun" id="S2.p2.6.m6.1d">italic_θ start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT</annotation></semantics></math> and the angle of the desired speaker <math alttext="\theta_{S_{2}}" class="ltx_Math" display="inline" id="S2.p2.7.m7.1"><semantics id="S2.p2.7.m7.1a"><msub id="S2.p2.7.m7.1.1" xref="S2.p2.7.m7.1.1.cmml"><mi id="S2.p2.7.m7.1.1.2" xref="S2.p2.7.m7.1.1.2.cmml">θ</mi><msub id="S2.p2.7.m7.1.1.3" xref="S2.p2.7.m7.1.1.3.cmml"><mi id="S2.p2.7.m7.1.1.3.2" xref="S2.p2.7.m7.1.1.3.2.cmml">S</mi><mn id="S2.p2.7.m7.1.1.3.3" xref="S2.p2.7.m7.1.1.3.3.cmml">2</mn></msub></msub><annotation-xml encoding="MathML-Content" id="S2.p2.7.m7.1b"><apply id="S2.p2.7.m7.1.1.cmml" xref="S2.p2.7.m7.1.1"><csymbol cd="ambiguous" id="S2.p2.7.m7.1.1.1.cmml" xref="S2.p2.7.m7.1.1">subscript</csymbol><ci id="S2.p2.7.m7.1.1.2.cmml" xref="S2.p2.7.m7.1.1.2">𝜃</ci><apply id="S2.p2.7.m7.1.1.3.cmml" xref="S2.p2.7.m7.1.1.3"><csymbol cd="ambiguous" id="S2.p2.7.m7.1.1.3.1.cmml" xref="S2.p2.7.m7.1.1.3">subscript</csymbol><ci id="S2.p2.7.m7.1.1.3.2.cmml" xref="S2.p2.7.m7.1.1.3.2">𝑆</ci><cn id="S2.p2.7.m7.1.1.3.3.cmml" type="integer" 
xref="S2.p2.7.m7.1.1.3.3">2</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p2.7.m7.1c">\theta_{S_{2}}</annotation><annotation encoding="application/x-llamapun" id="S2.p2.7.m7.1d">italic_θ start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT</annotation></semantics></math> (<math alttext="\theta_{S_{1}}" class="ltx_Math" display="inline" id="S2.p2.8.m8.1"><semantics id="S2.p2.8.m8.1a"><msub id="S2.p2.8.m8.1.1" xref="S2.p2.8.m8.1.1.cmml"><mi id="S2.p2.8.m8.1.1.2" xref="S2.p2.8.m8.1.1.2.cmml">θ</mi><msub id="S2.p2.8.m8.1.1.3" xref="S2.p2.8.m8.1.1.3.cmml"><mi id="S2.p2.8.m8.1.1.3.2" xref="S2.p2.8.m8.1.1.3.2.cmml">S</mi><mn id="S2.p2.8.m8.1.1.3.3" xref="S2.p2.8.m8.1.1.3.3.cmml">1</mn></msub></msub><annotation-xml encoding="MathML-Content" id="S2.p2.8.m8.1b"><apply id="S2.p2.8.m8.1.1.cmml" xref="S2.p2.8.m8.1.1"><csymbol cd="ambiguous" id="S2.p2.8.m8.1.1.1.cmml" xref="S2.p2.8.m8.1.1">subscript</csymbol><ci id="S2.p2.8.m8.1.1.2.cmml" xref="S2.p2.8.m8.1.1.2">𝜃</ci><apply id="S2.p2.8.m8.1.1.3.cmml" xref="S2.p2.8.m8.1.1.3"><csymbol cd="ambiguous" id="S2.p2.8.m8.1.1.3.1.cmml" xref="S2.p2.8.m8.1.1.3">subscript</csymbol><ci id="S2.p2.8.m8.1.1.3.2.cmml" xref="S2.p2.8.m8.1.1.3.2">𝑆</ci><cn id="S2.p2.8.m8.1.1.3.3.cmml" type="integer" xref="S2.p2.8.m8.1.1.3.3">1</cn></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p2.8.m8.1c">\theta_{S_{1}}</annotation><annotation encoding="application/x-llamapun" id="S2.p2.8.m8.1d">italic_θ start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT</annotation></semantics></math> for the undesired speaker), in relation to the listener’s x-axis, as <math alttext="|\theta_{u}|=|\theta_{h}-\theta_{S_{2}}|" class="ltx_Math" display="inline" id="S2.p2.9.m9.2"><semantics id="S2.p2.9.m9.2a"><mrow id="S2.p2.9.m9.2.2" xref="S2.p2.9.m9.2.2.cmml"><mrow id="S2.p2.9.m9.1.1.1.1" xref="S2.p2.9.m9.1.1.1.2.cmml"><mo 
id="S2.p2.9.m9.1.1.1.1.2" stretchy="false" xref="S2.p2.9.m9.1.1.1.2.1.cmml">|</mo><msub id="S2.p2.9.m9.1.1.1.1.1" xref="S2.p2.9.m9.1.1.1.1.1.cmml"><mi id="S2.p2.9.m9.1.1.1.1.1.2" xref="S2.p2.9.m9.1.1.1.1.1.2.cmml">θ</mi><mi id="S2.p2.9.m9.1.1.1.1.1.3" xref="S2.p2.9.m9.1.1.1.1.1.3.cmml">u</mi></msub><mo id="S2.p2.9.m9.1.1.1.1.3" stretchy="false" xref="S2.p2.9.m9.1.1.1.2.1.cmml">|</mo></mrow><mo id="S2.p2.9.m9.2.2.3" xref="S2.p2.9.m9.2.2.3.cmml">=</mo><mrow id="S2.p2.9.m9.2.2.2.1" xref="S2.p2.9.m9.2.2.2.2.cmml"><mo id="S2.p2.9.m9.2.2.2.1.2" stretchy="false" xref="S2.p2.9.m9.2.2.2.2.1.cmml">|</mo><mrow id="S2.p2.9.m9.2.2.2.1.1" xref="S2.p2.9.m9.2.2.2.1.1.cmml"><msub id="S2.p2.9.m9.2.2.2.1.1.2" xref="S2.p2.9.m9.2.2.2.1.1.2.cmml"><mi id="S2.p2.9.m9.2.2.2.1.1.2.2" xref="S2.p2.9.m9.2.2.2.1.1.2.2.cmml">θ</mi><mi id="S2.p2.9.m9.2.2.2.1.1.2.3" xref="S2.p2.9.m9.2.2.2.1.1.2.3.cmml">h</mi></msub><mo id="S2.p2.9.m9.2.2.2.1.1.1" xref="S2.p2.9.m9.2.2.2.1.1.1.cmml">−</mo><msub id="S2.p2.9.m9.2.2.2.1.1.3" xref="S2.p2.9.m9.2.2.2.1.1.3.cmml"><mi id="S2.p2.9.m9.2.2.2.1.1.3.2" xref="S2.p2.9.m9.2.2.2.1.1.3.2.cmml">θ</mi><msub id="S2.p2.9.m9.2.2.2.1.1.3.3" xref="S2.p2.9.m9.2.2.2.1.1.3.3.cmml"><mi id="S2.p2.9.m9.2.2.2.1.1.3.3.2" xref="S2.p2.9.m9.2.2.2.1.1.3.3.2.cmml">S</mi><mn id="S2.p2.9.m9.2.2.2.1.1.3.3.3" xref="S2.p2.9.m9.2.2.2.1.1.3.3.3.cmml">2</mn></msub></msub></mrow><mo id="S2.p2.9.m9.2.2.2.1.3" stretchy="false" xref="S2.p2.9.m9.2.2.2.2.1.cmml">|</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.p2.9.m9.2b"><apply id="S2.p2.9.m9.2.2.cmml" xref="S2.p2.9.m9.2.2"><eq id="S2.p2.9.m9.2.2.3.cmml" xref="S2.p2.9.m9.2.2.3"></eq><apply id="S2.p2.9.m9.1.1.1.2.cmml" xref="S2.p2.9.m9.1.1.1.1"><abs id="S2.p2.9.m9.1.1.1.2.1.cmml" xref="S2.p2.9.m9.1.1.1.1.2"></abs><apply id="S2.p2.9.m9.1.1.1.1.1.cmml" xref="S2.p2.9.m9.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.p2.9.m9.1.1.1.1.1.1.cmml" xref="S2.p2.9.m9.1.1.1.1.1">subscript</csymbol><ci id="S2.p2.9.m9.1.1.1.1.1.2.cmml" 
xref="S2.p2.9.m9.1.1.1.1.1.2">𝜃</ci><ci id="S2.p2.9.m9.1.1.1.1.1.3.cmml" xref="S2.p2.9.m9.1.1.1.1.1.3">𝑢</ci></apply></apply><apply id="S2.p2.9.m9.2.2.2.2.cmml" xref="S2.p2.9.m9.2.2.2.1"><abs id="S2.p2.9.m9.2.2.2.2.1.cmml" xref="S2.p2.9.m9.2.2.2.1.2"></abs><apply id="S2.p2.9.m9.2.2.2.1.1.cmml" xref="S2.p2.9.m9.2.2.2.1.1"><minus id="S2.p2.9.m9.2.2.2.1.1.1.cmml" xref="S2.p2.9.m9.2.2.2.1.1.1"></minus><apply id="S2.p2.9.m9.2.2.2.1.1.2.cmml" xref="S2.p2.9.m9.2.2.2.1.1.2"><csymbol cd="ambiguous" id="S2.p2.9.m9.2.2.2.1.1.2.1.cmml" xref="S2.p2.9.m9.2.2.2.1.1.2">subscript</csymbol><ci id="S2.p2.9.m9.2.2.2.1.1.2.2.cmml" xref="S2.p2.9.m9.2.2.2.1.1.2.2">𝜃</ci><ci id="S2.p2.9.m9.2.2.2.1.1.2.3.cmml" xref="S2.p2.9.m9.2.2.2.1.1.2.3">ℎ</ci></apply><apply id="S2.p2.9.m9.2.2.2.1.1.3.cmml" xref="S2.p2.9.m9.2.2.2.1.1.3"><csymbol cd="ambiguous" id="S2.p2.9.m9.2.2.2.1.1.3.1.cmml" xref="S2.p2.9.m9.2.2.2.1.1.3">subscript</csymbol><ci id="S2.p2.9.m9.2.2.2.1.1.3.2.cmml" xref="S2.p2.9.m9.2.2.2.1.1.3.2">𝜃</ci><apply id="S2.p2.9.m9.2.2.2.1.1.3.3.cmml" xref="S2.p2.9.m9.2.2.2.1.1.3.3"><csymbol cd="ambiguous" id="S2.p2.9.m9.2.2.2.1.1.3.3.1.cmml" xref="S2.p2.9.m9.2.2.2.1.1.3.3">subscript</csymbol><ci id="S2.p2.9.m9.2.2.2.1.1.3.3.2.cmml" xref="S2.p2.9.m9.2.2.2.1.1.3.3.2">𝑆</ci><cn id="S2.p2.9.m9.2.2.2.1.1.3.3.3.cmml" type="integer" xref="S2.p2.9.m9.2.2.2.1.1.3.3.3">2</cn></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p2.9.m9.2c">|\theta_{u}|=|\theta_{h}-\theta_{S_{2}}|</annotation><annotation encoding="application/x-llamapun" id="S2.p2.9.m9.2d">| italic_θ start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT | = | italic_θ start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT - italic_θ start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT |</annotation></semantics></math>. 
In this example, the objective would be to extract the reverberant speech of speaker <math alttext="S_{2}" class="ltx_Math" display="inline" id="S2.p2.10.m10.1"><semantics id="S2.p2.10.m10.1a"><msub id="S2.p2.10.m10.1.1" xref="S2.p2.10.m10.1.1.cmml"><mi id="S2.p2.10.m10.1.1.2" xref="S2.p2.10.m10.1.1.2.cmml">S</mi><mn id="S2.p2.10.m10.1.1.3" xref="S2.p2.10.m10.1.1.3.cmml">2</mn></msub><annotation-xml encoding="MathML-Content" id="S2.p2.10.m10.1b"><apply id="S2.p2.10.m10.1.1.cmml" xref="S2.p2.10.m10.1.1"><csymbol cd="ambiguous" id="S2.p2.10.m10.1.1.1.cmml" xref="S2.p2.10.m10.1.1">subscript</csymbol><ci id="S2.p2.10.m10.1.1.2.cmml" xref="S2.p2.10.m10.1.1.2">𝑆</ci><cn id="S2.p2.10.m10.1.1.3.cmml" type="integer" xref="S2.p2.10.m10.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p2.10.m10.1c">S_{2}</annotation><annotation encoding="application/x-llamapun" id="S2.p2.10.m10.1d">italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT</annotation></semantics></math> as received by a reference microphone.</p> </div> <div class="ltx_para" id="S2.p3"> <p class="ltx_p" id="S2.p3.8">The output <math alttext="y_{m}(t)" class="ltx_Math" display="inline" id="S2.p3.1.m1.1"><semantics id="S2.p3.1.m1.1a"><mrow id="S2.p3.1.m1.1.2" xref="S2.p3.1.m1.1.2.cmml"><msub id="S2.p3.1.m1.1.2.2" xref="S2.p3.1.m1.1.2.2.cmml"><mi id="S2.p3.1.m1.1.2.2.2" xref="S2.p3.1.m1.1.2.2.2.cmml">y</mi><mi id="S2.p3.1.m1.1.2.2.3" xref="S2.p3.1.m1.1.2.2.3.cmml">m</mi></msub><mo id="S2.p3.1.m1.1.2.1" xref="S2.p3.1.m1.1.2.1.cmml">⁢</mo><mrow id="S2.p3.1.m1.1.2.3.2" xref="S2.p3.1.m1.1.2.cmml"><mo id="S2.p3.1.m1.1.2.3.2.1" stretchy="false" xref="S2.p3.1.m1.1.2.cmml">(</mo><mi id="S2.p3.1.m1.1.1" xref="S2.p3.1.m1.1.1.cmml">t</mi><mo id="S2.p3.1.m1.1.2.3.2.2" stretchy="false" xref="S2.p3.1.m1.1.2.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.p3.1.m1.1b"><apply id="S2.p3.1.m1.1.2.cmml" xref="S2.p3.1.m1.1.2"><times id="S2.p3.1.m1.1.2.1.cmml" 
xref="S2.p3.1.m1.1.2.1"></times><apply id="S2.p3.1.m1.1.2.2.cmml" xref="S2.p3.1.m1.1.2.2"><csymbol cd="ambiguous" id="S2.p3.1.m1.1.2.2.1.cmml" xref="S2.p3.1.m1.1.2.2">subscript</csymbol><ci id="S2.p3.1.m1.1.2.2.2.cmml" xref="S2.p3.1.m1.1.2.2.2">𝑦</ci><ci id="S2.p3.1.m1.1.2.2.3.cmml" xref="S2.p3.1.m1.1.2.2.3">𝑚</ci></apply><ci id="S2.p3.1.m1.1.1.cmml" xref="S2.p3.1.m1.1.1">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p3.1.m1.1c">y_{m}(t)</annotation><annotation encoding="application/x-llamapun" id="S2.p3.1.m1.1d">italic_y start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( italic_t )</annotation></semantics></math> of each microphone <math alttext="m" class="ltx_Math" display="inline" id="S2.p3.2.m2.1"><semantics id="S2.p3.2.m2.1a"><mi id="S2.p3.2.m2.1.1" xref="S2.p3.2.m2.1.1.cmml">m</mi><annotation-xml encoding="MathML-Content" id="S2.p3.2.m2.1b"><ci id="S2.p3.2.m2.1.1.cmml" xref="S2.p3.2.m2.1.1">𝑚</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.p3.2.m2.1c">m</annotation><annotation encoding="application/x-llamapun" id="S2.p3.2.m2.1d">italic_m</annotation></semantics></math> is defined by the speech fragment <math alttext="s_{n}(t)" class="ltx_Math" display="inline" id="S2.p3.3.m3.1"><semantics id="S2.p3.3.m3.1a"><mrow id="S2.p3.3.m3.1.2" xref="S2.p3.3.m3.1.2.cmml"><msub id="S2.p3.3.m3.1.2.2" xref="S2.p3.3.m3.1.2.2.cmml"><mi id="S2.p3.3.m3.1.2.2.2" xref="S2.p3.3.m3.1.2.2.2.cmml">s</mi><mi id="S2.p3.3.m3.1.2.2.3" xref="S2.p3.3.m3.1.2.2.3.cmml">n</mi></msub><mo id="S2.p3.3.m3.1.2.1" xref="S2.p3.3.m3.1.2.1.cmml">⁢</mo><mrow id="S2.p3.3.m3.1.2.3.2" xref="S2.p3.3.m3.1.2.cmml"><mo id="S2.p3.3.m3.1.2.3.2.1" stretchy="false" xref="S2.p3.3.m3.1.2.cmml">(</mo><mi id="S2.p3.3.m3.1.1" xref="S2.p3.3.m3.1.1.cmml">t</mi><mo id="S2.p3.3.m3.1.2.3.2.2" stretchy="false" xref="S2.p3.3.m3.1.2.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.p3.3.m3.1b"><apply id="S2.p3.3.m3.1.2.cmml" 
xref="S2.p3.3.m3.1.2"><times id="S2.p3.3.m3.1.2.1.cmml" xref="S2.p3.3.m3.1.2.1"></times><apply id="S2.p3.3.m3.1.2.2.cmml" xref="S2.p3.3.m3.1.2.2"><csymbol cd="ambiguous" id="S2.p3.3.m3.1.2.2.1.cmml" xref="S2.p3.3.m3.1.2.2">subscript</csymbol><ci id="S2.p3.3.m3.1.2.2.2.cmml" xref="S2.p3.3.m3.1.2.2.2">𝑠</ci><ci id="S2.p3.3.m3.1.2.2.3.cmml" xref="S2.p3.3.m3.1.2.2.3">𝑛</ci></apply><ci id="S2.p3.3.m3.1.1.cmml" xref="S2.p3.3.m3.1.1">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p3.3.m3.1c">s_{n}(t)</annotation><annotation encoding="application/x-llamapun" id="S2.p3.3.m3.1d">italic_s start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_t )</annotation></semantics></math> of speaker <math alttext="n" class="ltx_Math" display="inline" id="S2.p3.4.m4.1"><semantics id="S2.p3.4.m4.1a"><mi id="S2.p3.4.m4.1.1" xref="S2.p3.4.m4.1.1.cmml">n</mi><annotation-xml encoding="MathML-Content" id="S2.p3.4.m4.1b"><ci id="S2.p3.4.m4.1.1.cmml" xref="S2.p3.4.m4.1.1">𝑛</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.p3.4.m4.1c">n</annotation><annotation encoding="application/x-llamapun" id="S2.p3.4.m4.1d">italic_n</annotation></semantics></math>, convolved (<math alttext="\ast" class="ltx_Math" display="inline" id="S2.p3.5.m5.1"><semantics id="S2.p3.5.m5.1a"><mo id="S2.p3.5.m5.1.1" xref="S2.p3.5.m5.1.1.cmml">∗</mo><annotation-xml encoding="MathML-Content" id="S2.p3.5.m5.1b"><ci id="S2.p3.5.m5.1.1.cmml" xref="S2.p3.5.m5.1.1">∗</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.p3.5.m5.1c">\ast</annotation><annotation encoding="application/x-llamapun" id="S2.p3.5.m5.1d">∗</annotation></semantics></math>) with a room impulse response (RIR) <math alttext="g_{nm}(t)" class="ltx_Math" display="inline" id="S2.p3.6.m6.1"><semantics id="S2.p3.6.m6.1a"><mrow id="S2.p3.6.m6.1.2" xref="S2.p3.6.m6.1.2.cmml"><msub id="S2.p3.6.m6.1.2.2" xref="S2.p3.6.m6.1.2.2.cmml"><mi id="S2.p3.6.m6.1.2.2.2" 
xref="S2.p3.6.m6.1.2.2.2.cmml">g</mi><mrow id="S2.p3.6.m6.1.2.2.3" xref="S2.p3.6.m6.1.2.2.3.cmml"><mi id="S2.p3.6.m6.1.2.2.3.2" xref="S2.p3.6.m6.1.2.2.3.2.cmml">n</mi><mo id="S2.p3.6.m6.1.2.2.3.1" xref="S2.p3.6.m6.1.2.2.3.1.cmml">⁢</mo><mi id="S2.p3.6.m6.1.2.2.3.3" xref="S2.p3.6.m6.1.2.2.3.3.cmml">m</mi></mrow></msub><mo id="S2.p3.6.m6.1.2.1" xref="S2.p3.6.m6.1.2.1.cmml">⁢</mo><mrow id="S2.p3.6.m6.1.2.3.2" xref="S2.p3.6.m6.1.2.cmml"><mo id="S2.p3.6.m6.1.2.3.2.1" stretchy="false" xref="S2.p3.6.m6.1.2.cmml">(</mo><mi id="S2.p3.6.m6.1.1" xref="S2.p3.6.m6.1.1.cmml">t</mi><mo id="S2.p3.6.m6.1.2.3.2.2" stretchy="false" xref="S2.p3.6.m6.1.2.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.p3.6.m6.1b"><apply id="S2.p3.6.m6.1.2.cmml" xref="S2.p3.6.m6.1.2"><times id="S2.p3.6.m6.1.2.1.cmml" xref="S2.p3.6.m6.1.2.1"></times><apply id="S2.p3.6.m6.1.2.2.cmml" xref="S2.p3.6.m6.1.2.2"><csymbol cd="ambiguous" id="S2.p3.6.m6.1.2.2.1.cmml" xref="S2.p3.6.m6.1.2.2">subscript</csymbol><ci id="S2.p3.6.m6.1.2.2.2.cmml" xref="S2.p3.6.m6.1.2.2.2">𝑔</ci><apply id="S2.p3.6.m6.1.2.2.3.cmml" xref="S2.p3.6.m6.1.2.2.3"><times id="S2.p3.6.m6.1.2.2.3.1.cmml" xref="S2.p3.6.m6.1.2.2.3.1"></times><ci id="S2.p3.6.m6.1.2.2.3.2.cmml" xref="S2.p3.6.m6.1.2.2.3.2">𝑛</ci><ci id="S2.p3.6.m6.1.2.2.3.3.cmml" xref="S2.p3.6.m6.1.2.2.3.3">𝑚</ci></apply></apply><ci id="S2.p3.6.m6.1.1.cmml" xref="S2.p3.6.m6.1.1">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p3.6.m6.1c">g_{nm}(t)</annotation><annotation encoding="application/x-llamapun" id="S2.p3.6.m6.1d">italic_g start_POSTSUBSCRIPT italic_n italic_m end_POSTSUBSCRIPT ( italic_t )</annotation></semantics></math> from speaker <math alttext="n" class="ltx_Math" display="inline" id="S2.p3.7.m7.1"><semantics id="S2.p3.7.m7.1a"><mi id="S2.p3.7.m7.1.1" xref="S2.p3.7.m7.1.1.cmml">n</mi><annotation-xml encoding="MathML-Content" id="S2.p3.7.m7.1b"><ci id="S2.p3.7.m7.1.1.cmml" 
xref="S2.p3.7.m7.1.1">𝑛</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.p3.7.m7.1c">n</annotation><annotation encoding="application/x-llamapun" id="S2.p3.7.m7.1d">italic_n</annotation></semantics></math> to microphone <math alttext="m" class="ltx_Math" display="inline" id="S2.p3.8.m8.1"><semantics id="S2.p3.8.m8.1a"><mi id="S2.p3.8.m8.1.1" xref="S2.p3.8.m8.1.1.cmml">m</mi><annotation-xml encoding="MathML-Content" id="S2.p3.8.m8.1b"><ci id="S2.p3.8.m8.1.1.cmml" xref="S2.p3.8.m8.1.1">𝑚</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.p3.8.m8.1c">m</annotation><annotation encoding="application/x-llamapun" id="S2.p3.8.m8.1d">italic_m</annotation></semantics></math>, summed over all speakers. This is expressed as</p> <table class="ltx_equation ltx_eqn_table" id="S2.E1"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="y_{m}(t)=\sum_{n}s_{n}(t)\ast g_{n,m}(t)." 
class="ltx_Math" display="block" id="S2.E1.m1.6"><semantics id="S2.E1.m1.6a"><mrow id="S2.E1.m1.6.6.1" xref="S2.E1.m1.6.6.1.1.cmml"><mrow id="S2.E1.m1.6.6.1.1" xref="S2.E1.m1.6.6.1.1.cmml"><mrow id="S2.E1.m1.6.6.1.1.2" xref="S2.E1.m1.6.6.1.1.2.cmml"><msub id="S2.E1.m1.6.6.1.1.2.2" xref="S2.E1.m1.6.6.1.1.2.2.cmml"><mi id="S2.E1.m1.6.6.1.1.2.2.2" xref="S2.E1.m1.6.6.1.1.2.2.2.cmml">y</mi><mi id="S2.E1.m1.6.6.1.1.2.2.3" xref="S2.E1.m1.6.6.1.1.2.2.3.cmml">m</mi></msub><mo id="S2.E1.m1.6.6.1.1.2.1" xref="S2.E1.m1.6.6.1.1.2.1.cmml">⁢</mo><mrow id="S2.E1.m1.6.6.1.1.2.3.2" xref="S2.E1.m1.6.6.1.1.2.cmml"><mo id="S2.E1.m1.6.6.1.1.2.3.2.1" stretchy="false" xref="S2.E1.m1.6.6.1.1.2.cmml">(</mo><mi id="S2.E1.m1.3.3" xref="S2.E1.m1.3.3.cmml">t</mi><mo id="S2.E1.m1.6.6.1.1.2.3.2.2" stretchy="false" xref="S2.E1.m1.6.6.1.1.2.cmml">)</mo></mrow></mrow><mo id="S2.E1.m1.6.6.1.1.1" rspace="0.111em" xref="S2.E1.m1.6.6.1.1.1.cmml">=</mo><mrow id="S2.E1.m1.6.6.1.1.3" xref="S2.E1.m1.6.6.1.1.3.cmml"><munder id="S2.E1.m1.6.6.1.1.3.1" xref="S2.E1.m1.6.6.1.1.3.1.cmml"><mo id="S2.E1.m1.6.6.1.1.3.1.2" movablelimits="false" xref="S2.E1.m1.6.6.1.1.3.1.2.cmml">∑</mo><mi id="S2.E1.m1.6.6.1.1.3.1.3" xref="S2.E1.m1.6.6.1.1.3.1.3.cmml">n</mi></munder><mrow id="S2.E1.m1.6.6.1.1.3.2" xref="S2.E1.m1.6.6.1.1.3.2.cmml"><mrow id="S2.E1.m1.6.6.1.1.3.2.2" xref="S2.E1.m1.6.6.1.1.3.2.2.cmml"><mrow id="S2.E1.m1.6.6.1.1.3.2.2.2" xref="S2.E1.m1.6.6.1.1.3.2.2.2.cmml"><msub id="S2.E1.m1.6.6.1.1.3.2.2.2.2" xref="S2.E1.m1.6.6.1.1.3.2.2.2.2.cmml"><mi id="S2.E1.m1.6.6.1.1.3.2.2.2.2.2" xref="S2.E1.m1.6.6.1.1.3.2.2.2.2.2.cmml">s</mi><mi id="S2.E1.m1.6.6.1.1.3.2.2.2.2.3" xref="S2.E1.m1.6.6.1.1.3.2.2.2.2.3.cmml">n</mi></msub><mo id="S2.E1.m1.6.6.1.1.3.2.2.2.1" xref="S2.E1.m1.6.6.1.1.3.2.2.2.1.cmml">⁢</mo><mrow id="S2.E1.m1.6.6.1.1.3.2.2.2.3.2" xref="S2.E1.m1.6.6.1.1.3.2.2.2.cmml"><mo id="S2.E1.m1.6.6.1.1.3.2.2.2.3.2.1" stretchy="false" xref="S2.E1.m1.6.6.1.1.3.2.2.2.cmml">(</mo><mi id="S2.E1.m1.4.4" 
xref="S2.E1.m1.4.4.cmml">t</mi><mo id="S2.E1.m1.6.6.1.1.3.2.2.2.3.2.2" rspace="0.055em" stretchy="false" xref="S2.E1.m1.6.6.1.1.3.2.2.2.cmml">)</mo></mrow></mrow><mo id="S2.E1.m1.6.6.1.1.3.2.2.1" rspace="0.222em" xref="S2.E1.m1.6.6.1.1.3.2.2.1.cmml">∗</mo><msub id="S2.E1.m1.6.6.1.1.3.2.2.3" xref="S2.E1.m1.6.6.1.1.3.2.2.3.cmml"><mi id="S2.E1.m1.6.6.1.1.3.2.2.3.2" xref="S2.E1.m1.6.6.1.1.3.2.2.3.2.cmml">g</mi><mrow id="S2.E1.m1.2.2.2.4" xref="S2.E1.m1.2.2.2.3.cmml"><mi id="S2.E1.m1.1.1.1.1" xref="S2.E1.m1.1.1.1.1.cmml">n</mi><mo id="S2.E1.m1.2.2.2.4.1" xref="S2.E1.m1.2.2.2.3.cmml">,</mo><mi id="S2.E1.m1.2.2.2.2" xref="S2.E1.m1.2.2.2.2.cmml">m</mi></mrow></msub></mrow><mo id="S2.E1.m1.6.6.1.1.3.2.1" xref="S2.E1.m1.6.6.1.1.3.2.1.cmml">⁢</mo><mrow id="S2.E1.m1.6.6.1.1.3.2.3.2" xref="S2.E1.m1.6.6.1.1.3.2.cmml"><mo id="S2.E1.m1.6.6.1.1.3.2.3.2.1" stretchy="false" xref="S2.E1.m1.6.6.1.1.3.2.cmml">(</mo><mi id="S2.E1.m1.5.5" xref="S2.E1.m1.5.5.cmml">t</mi><mo id="S2.E1.m1.6.6.1.1.3.2.3.2.2" stretchy="false" xref="S2.E1.m1.6.6.1.1.3.2.cmml">)</mo></mrow></mrow></mrow></mrow><mo id="S2.E1.m1.6.6.1.2" lspace="0em" xref="S2.E1.m1.6.6.1.1.cmml">.</mo></mrow><annotation-xml encoding="MathML-Content" id="S2.E1.m1.6b"><apply id="S2.E1.m1.6.6.1.1.cmml" xref="S2.E1.m1.6.6.1"><eq id="S2.E1.m1.6.6.1.1.1.cmml" xref="S2.E1.m1.6.6.1.1.1"></eq><apply id="S2.E1.m1.6.6.1.1.2.cmml" xref="S2.E1.m1.6.6.1.1.2"><times id="S2.E1.m1.6.6.1.1.2.1.cmml" xref="S2.E1.m1.6.6.1.1.2.1"></times><apply id="S2.E1.m1.6.6.1.1.2.2.cmml" xref="S2.E1.m1.6.6.1.1.2.2"><csymbol cd="ambiguous" id="S2.E1.m1.6.6.1.1.2.2.1.cmml" xref="S2.E1.m1.6.6.1.1.2.2">subscript</csymbol><ci id="S2.E1.m1.6.6.1.1.2.2.2.cmml" xref="S2.E1.m1.6.6.1.1.2.2.2">𝑦</ci><ci id="S2.E1.m1.6.6.1.1.2.2.3.cmml" xref="S2.E1.m1.6.6.1.1.2.2.3">𝑚</ci></apply><ci id="S2.E1.m1.3.3.cmml" xref="S2.E1.m1.3.3">𝑡</ci></apply><apply id="S2.E1.m1.6.6.1.1.3.cmml" xref="S2.E1.m1.6.6.1.1.3"><apply id="S2.E1.m1.6.6.1.1.3.1.cmml" xref="S2.E1.m1.6.6.1.1.3.1"><csymbol 
cd="ambiguous" id="S2.E1.m1.6.6.1.1.3.1.1.cmml" xref="S2.E1.m1.6.6.1.1.3.1">subscript</csymbol><sum id="S2.E1.m1.6.6.1.1.3.1.2.cmml" xref="S2.E1.m1.6.6.1.1.3.1.2"></sum><ci id="S2.E1.m1.6.6.1.1.3.1.3.cmml" xref="S2.E1.m1.6.6.1.1.3.1.3">𝑛</ci></apply><apply id="S2.E1.m1.6.6.1.1.3.2.cmml" xref="S2.E1.m1.6.6.1.1.3.2"><times id="S2.E1.m1.6.6.1.1.3.2.1.cmml" xref="S2.E1.m1.6.6.1.1.3.2.1"></times><apply id="S2.E1.m1.6.6.1.1.3.2.2.cmml" xref="S2.E1.m1.6.6.1.1.3.2.2"><ci id="S2.E1.m1.6.6.1.1.3.2.2.1.cmml" xref="S2.E1.m1.6.6.1.1.3.2.2.1">∗</ci><apply id="S2.E1.m1.6.6.1.1.3.2.2.2.cmml" xref="S2.E1.m1.6.6.1.1.3.2.2.2"><times id="S2.E1.m1.6.6.1.1.3.2.2.2.1.cmml" xref="S2.E1.m1.6.6.1.1.3.2.2.2.1"></times><apply id="S2.E1.m1.6.6.1.1.3.2.2.2.2.cmml" xref="S2.E1.m1.6.6.1.1.3.2.2.2.2"><csymbol cd="ambiguous" id="S2.E1.m1.6.6.1.1.3.2.2.2.2.1.cmml" xref="S2.E1.m1.6.6.1.1.3.2.2.2.2">subscript</csymbol><ci id="S2.E1.m1.6.6.1.1.3.2.2.2.2.2.cmml" xref="S2.E1.m1.6.6.1.1.3.2.2.2.2.2">𝑠</ci><ci id="S2.E1.m1.6.6.1.1.3.2.2.2.2.3.cmml" xref="S2.E1.m1.6.6.1.1.3.2.2.2.2.3">𝑛</ci></apply><ci id="S2.E1.m1.4.4.cmml" xref="S2.E1.m1.4.4">𝑡</ci></apply><apply id="S2.E1.m1.6.6.1.1.3.2.2.3.cmml" xref="S2.E1.m1.6.6.1.1.3.2.2.3"><csymbol cd="ambiguous" id="S2.E1.m1.6.6.1.1.3.2.2.3.1.cmml" xref="S2.E1.m1.6.6.1.1.3.2.2.3">subscript</csymbol><ci id="S2.E1.m1.6.6.1.1.3.2.2.3.2.cmml" xref="S2.E1.m1.6.6.1.1.3.2.2.3.2">𝑔</ci><list id="S2.E1.m1.2.2.2.3.cmml" xref="S2.E1.m1.2.2.2.4"><ci id="S2.E1.m1.1.1.1.1.cmml" xref="S2.E1.m1.1.1.1.1">𝑛</ci><ci id="S2.E1.m1.2.2.2.2.cmml" xref="S2.E1.m1.2.2.2.2">𝑚</ci></list></apply></apply><ci id="S2.E1.m1.5.5.cmml" xref="S2.E1.m1.5.5">𝑡</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.E1.m1.6c">y_{m}(t)=\sum_{n}s_{n}(t)\ast g_{n,m}(t).</annotation><annotation encoding="application/x-llamapun" id="S2.E1.m1.6d">italic_y start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( italic_t ) = ∑ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT 
italic_s start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_t ) ∗ italic_g start_POSTSUBSCRIPT italic_n , italic_m end_POSTSUBSCRIPT ( italic_t ) .</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(1)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S2.p3.11">Our objective is to extract the desired (subscript <math alttext="d" class="ltx_Math" display="inline" id="S2.p3.9.m1.1"><semantics id="S2.p3.9.m1.1a"><mi id="S2.p3.9.m1.1.1" xref="S2.p3.9.m1.1.1.cmml">d</mi><annotation-xml encoding="MathML-Content" id="S2.p3.9.m1.1b"><ci id="S2.p3.9.m1.1.1.cmml" xref="S2.p3.9.m1.1.1">𝑑</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.p3.9.m1.1c">d</annotation><annotation encoding="application/x-llamapun" id="S2.p3.9.m1.1d">italic_d</annotation></semantics></math>) reverberant speech at the reference microphone <math alttext="s_{n=d}(t)\ast g_{n=d,m=ref.}(t)" class="ltx_Math" display="inline" id="S2.p3.10.m2.3"><semantics id="S2.p3.10.m2.3a"><mrow id="S2.p3.10.m2.3.4" xref="S2.p3.10.m2.3.4.cmml"><mrow id="S2.p3.10.m2.3.4.2" xref="S2.p3.10.m2.3.4.2.cmml"><mrow id="S2.p3.10.m2.3.4.2.2" xref="S2.p3.10.m2.3.4.2.2.cmml"><msub id="S2.p3.10.m2.3.4.2.2.2" xref="S2.p3.10.m2.3.4.2.2.2.cmml"><mi id="S2.p3.10.m2.3.4.2.2.2.2" xref="S2.p3.10.m2.3.4.2.2.2.2.cmml">s</mi><mrow id="S2.p3.10.m2.3.4.2.2.2.3" xref="S2.p3.10.m2.3.4.2.2.2.3.cmml"><mi id="S2.p3.10.m2.3.4.2.2.2.3.2" xref="S2.p3.10.m2.3.4.2.2.2.3.2.cmml">n</mi><mo id="S2.p3.10.m2.3.4.2.2.2.3.1" xref="S2.p3.10.m2.3.4.2.2.2.3.1.cmml">=</mo><mi id="S2.p3.10.m2.3.4.2.2.2.3.3" xref="S2.p3.10.m2.3.4.2.2.2.3.3.cmml">d</mi></mrow></msub><mo id="S2.p3.10.m2.3.4.2.2.1" xref="S2.p3.10.m2.3.4.2.2.1.cmml">⁢</mo><mrow id="S2.p3.10.m2.3.4.2.2.3.2" xref="S2.p3.10.m2.3.4.2.2.cmml"><mo id="S2.p3.10.m2.3.4.2.2.3.2.1" stretchy="false" 
xref="S2.p3.10.m2.3.4.2.2.cmml">(</mo><mi id="S2.p3.10.m2.2.2" xref="S2.p3.10.m2.2.2.cmml">t</mi><mo id="S2.p3.10.m2.3.4.2.2.3.2.2" rspace="0.055em" stretchy="false" xref="S2.p3.10.m2.3.4.2.2.cmml">)</mo></mrow></mrow><mo id="S2.p3.10.m2.3.4.2.1" rspace="0.222em" xref="S2.p3.10.m2.3.4.2.1.cmml">∗</mo><msub id="S2.p3.10.m2.3.4.2.3" xref="S2.p3.10.m2.3.4.2.3.cmml"><mi id="S2.p3.10.m2.3.4.2.3.2" xref="S2.p3.10.m2.3.4.2.3.2.cmml">g</mi><mrow id="S2.p3.10.m2.1.1.1.1" xref="S2.p3.10.m2.1.1.1.2.cmml"><mrow id="S2.p3.10.m2.1.1.1.1.1.2" xref="S2.p3.10.m2.1.1.1.1.1.3.cmml"><mrow id="S2.p3.10.m2.1.1.1.1.1.1.1" xref="S2.p3.10.m2.1.1.1.1.1.1.1.cmml"><mi id="S2.p3.10.m2.1.1.1.1.1.1.1.2" xref="S2.p3.10.m2.1.1.1.1.1.1.1.2.cmml">n</mi><mo id="S2.p3.10.m2.1.1.1.1.1.1.1.1" xref="S2.p3.10.m2.1.1.1.1.1.1.1.1.cmml">=</mo><mi id="S2.p3.10.m2.1.1.1.1.1.1.1.3" xref="S2.p3.10.m2.1.1.1.1.1.1.1.3.cmml">d</mi></mrow><mo id="S2.p3.10.m2.1.1.1.1.1.2.3" xref="S2.p3.10.m2.1.1.1.1.1.3a.cmml">,</mo><mrow id="S2.p3.10.m2.1.1.1.1.1.2.2" xref="S2.p3.10.m2.1.1.1.1.1.2.2.cmml"><mi id="S2.p3.10.m2.1.1.1.1.1.2.2.2" xref="S2.p3.10.m2.1.1.1.1.1.2.2.2.cmml">m</mi><mo id="S2.p3.10.m2.1.1.1.1.1.2.2.1" xref="S2.p3.10.m2.1.1.1.1.1.2.2.1.cmml">=</mo><mrow id="S2.p3.10.m2.1.1.1.1.1.2.2.3" xref="S2.p3.10.m2.1.1.1.1.1.2.2.3.cmml"><mi id="S2.p3.10.m2.1.1.1.1.1.2.2.3.2" xref="S2.p3.10.m2.1.1.1.1.1.2.2.3.2.cmml">r</mi><mo id="S2.p3.10.m2.1.1.1.1.1.2.2.3.1" xref="S2.p3.10.m2.1.1.1.1.1.2.2.3.1.cmml">⁢</mo><mi id="S2.p3.10.m2.1.1.1.1.1.2.2.3.3" xref="S2.p3.10.m2.1.1.1.1.1.2.2.3.3.cmml">e</mi><mo id="S2.p3.10.m2.1.1.1.1.1.2.2.3.1a" xref="S2.p3.10.m2.1.1.1.1.1.2.2.3.1.cmml">⁢</mo><mi id="S2.p3.10.m2.1.1.1.1.1.2.2.3.4" xref="S2.p3.10.m2.1.1.1.1.1.2.2.3.4.cmml">f</mi></mrow></mrow></mrow><mo id="S2.p3.10.m2.1.1.1.1.2" lspace="0em" xref="S2.p3.10.m2.1.1.1.2.cmml">.</mo></mrow></msub></mrow><mo id="S2.p3.10.m2.3.4.1" xref="S2.p3.10.m2.3.4.1.cmml">⁢</mo><mrow id="S2.p3.10.m2.3.4.3.2" xref="S2.p3.10.m2.3.4.cmml"><mo 
id="S2.p3.10.m2.3.4.3.2.1" stretchy="false" xref="S2.p3.10.m2.3.4.cmml">(</mo><mi id="S2.p3.10.m2.3.3" xref="S2.p3.10.m2.3.3.cmml">t</mi><mo id="S2.p3.10.m2.3.4.3.2.2" stretchy="false" xref="S2.p3.10.m2.3.4.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.p3.10.m2.3b"><apply id="S2.p3.10.m2.3.4.cmml" xref="S2.p3.10.m2.3.4"><times id="S2.p3.10.m2.3.4.1.cmml" xref="S2.p3.10.m2.3.4.1"></times><apply id="S2.p3.10.m2.3.4.2.cmml" xref="S2.p3.10.m2.3.4.2"><ci id="S2.p3.10.m2.3.4.2.1.cmml" xref="S2.p3.10.m2.3.4.2.1">∗</ci><apply id="S2.p3.10.m2.3.4.2.2.cmml" xref="S2.p3.10.m2.3.4.2.2"><times id="S2.p3.10.m2.3.4.2.2.1.cmml" xref="S2.p3.10.m2.3.4.2.2.1"></times><apply id="S2.p3.10.m2.3.4.2.2.2.cmml" xref="S2.p3.10.m2.3.4.2.2.2"><csymbol cd="ambiguous" id="S2.p3.10.m2.3.4.2.2.2.1.cmml" xref="S2.p3.10.m2.3.4.2.2.2">subscript</csymbol><ci id="S2.p3.10.m2.3.4.2.2.2.2.cmml" xref="S2.p3.10.m2.3.4.2.2.2.2">𝑠</ci><apply id="S2.p3.10.m2.3.4.2.2.2.3.cmml" xref="S2.p3.10.m2.3.4.2.2.2.3"><eq id="S2.p3.10.m2.3.4.2.2.2.3.1.cmml" xref="S2.p3.10.m2.3.4.2.2.2.3.1"></eq><ci id="S2.p3.10.m2.3.4.2.2.2.3.2.cmml" xref="S2.p3.10.m2.3.4.2.2.2.3.2">𝑛</ci><ci id="S2.p3.10.m2.3.4.2.2.2.3.3.cmml" xref="S2.p3.10.m2.3.4.2.2.2.3.3">𝑑</ci></apply></apply><ci id="S2.p3.10.m2.2.2.cmml" xref="S2.p3.10.m2.2.2">𝑡</ci></apply><apply id="S2.p3.10.m2.3.4.2.3.cmml" xref="S2.p3.10.m2.3.4.2.3"><csymbol cd="ambiguous" id="S2.p3.10.m2.3.4.2.3.1.cmml" xref="S2.p3.10.m2.3.4.2.3">subscript</csymbol><ci id="S2.p3.10.m2.3.4.2.3.2.cmml" xref="S2.p3.10.m2.3.4.2.3.2">𝑔</ci><list id="S2.p3.10.m2.1.1.1.2.cmml" xref="S2.p3.10.m2.1.1.1.1"><apply id="S2.p3.10.m2.1.1.1.1.1.3.cmml" xref="S2.p3.10.m2.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S2.p3.10.m2.1.1.1.1.1.3a.cmml" xref="S2.p3.10.m2.1.1.1.1.1.2.3">formulae-sequence</csymbol><apply id="S2.p3.10.m2.1.1.1.1.1.1.1.cmml" xref="S2.p3.10.m2.1.1.1.1.1.1.1"><eq id="S2.p3.10.m2.1.1.1.1.1.1.1.1.cmml" xref="S2.p3.10.m2.1.1.1.1.1.1.1.1"></eq><ci 
id="S2.p3.10.m2.1.1.1.1.1.1.1.2.cmml" xref="S2.p3.10.m2.1.1.1.1.1.1.1.2">𝑛</ci><ci id="S2.p3.10.m2.1.1.1.1.1.1.1.3.cmml" xref="S2.p3.10.m2.1.1.1.1.1.1.1.3">𝑑</ci></apply><apply id="S2.p3.10.m2.1.1.1.1.1.2.2.cmml" xref="S2.p3.10.m2.1.1.1.1.1.2.2"><eq id="S2.p3.10.m2.1.1.1.1.1.2.2.1.cmml" xref="S2.p3.10.m2.1.1.1.1.1.2.2.1"></eq><ci id="S2.p3.10.m2.1.1.1.1.1.2.2.2.cmml" xref="S2.p3.10.m2.1.1.1.1.1.2.2.2">𝑚</ci><apply id="S2.p3.10.m2.1.1.1.1.1.2.2.3.cmml" xref="S2.p3.10.m2.1.1.1.1.1.2.2.3"><times id="S2.p3.10.m2.1.1.1.1.1.2.2.3.1.cmml" xref="S2.p3.10.m2.1.1.1.1.1.2.2.3.1"></times><ci id="S2.p3.10.m2.1.1.1.1.1.2.2.3.2.cmml" xref="S2.p3.10.m2.1.1.1.1.1.2.2.3.2">𝑟</ci><ci id="S2.p3.10.m2.1.1.1.1.1.2.2.3.3.cmml" xref="S2.p3.10.m2.1.1.1.1.1.2.2.3.3">𝑒</ci><ci id="S2.p3.10.m2.1.1.1.1.1.2.2.3.4.cmml" xref="S2.p3.10.m2.1.1.1.1.1.2.2.3.4">𝑓</ci></apply></apply></apply></list></apply></apply><ci id="S2.p3.10.m2.3.3.cmml" xref="S2.p3.10.m2.3.3">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p3.10.m2.3c">s_{n=d}(t)\ast g_{n=d,m=ref.}(t)</annotation><annotation encoding="application/x-llamapun" id="S2.p3.10.m2.3d">italic_s start_POSTSUBSCRIPT italic_n = italic_d end_POSTSUBSCRIPT ( italic_t ) ∗ italic_g start_POSTSUBSCRIPT italic_n = italic_d , italic_m = italic_r italic_e italic_f . 
end_POSTSUBSCRIPT ( italic_t )</annotation></semantics></math>, solely given microphone outputs <math alttext="y_{m}(t),\ \forall\ m\in[1,\ ...,\ M]" class="ltx_Math" display="inline" id="S2.p3.11.m3.6"><semantics id="S2.p3.11.m3.6a"><mrow id="S2.p3.11.m3.6.6" xref="S2.p3.11.m3.6.6.cmml"><mrow id="S2.p3.11.m3.6.6.2.2" xref="S2.p3.11.m3.6.6.2.3.cmml"><mrow id="S2.p3.11.m3.5.5.1.1.1" xref="S2.p3.11.m3.5.5.1.1.1.cmml"><msub id="S2.p3.11.m3.5.5.1.1.1.2" xref="S2.p3.11.m3.5.5.1.1.1.2.cmml"><mi id="S2.p3.11.m3.5.5.1.1.1.2.2" xref="S2.p3.11.m3.5.5.1.1.1.2.2.cmml">y</mi><mi id="S2.p3.11.m3.5.5.1.1.1.2.3" xref="S2.p3.11.m3.5.5.1.1.1.2.3.cmml">m</mi></msub><mo id="S2.p3.11.m3.5.5.1.1.1.1" xref="S2.p3.11.m3.5.5.1.1.1.1.cmml">⁢</mo><mrow id="S2.p3.11.m3.5.5.1.1.1.3.2" xref="S2.p3.11.m3.5.5.1.1.1.cmml"><mo id="S2.p3.11.m3.5.5.1.1.1.3.2.1" stretchy="false" xref="S2.p3.11.m3.5.5.1.1.1.cmml">(</mo><mi id="S2.p3.11.m3.1.1" xref="S2.p3.11.m3.1.1.cmml">t</mi><mo id="S2.p3.11.m3.5.5.1.1.1.3.2.2" stretchy="false" xref="S2.p3.11.m3.5.5.1.1.1.cmml">)</mo></mrow></mrow><mo id="S2.p3.11.m3.6.6.2.2.3" rspace="0.667em" xref="S2.p3.11.m3.6.6.2.3.cmml">,</mo><mrow id="S2.p3.11.m3.6.6.2.2.2" xref="S2.p3.11.m3.6.6.2.2.2.cmml"><mo id="S2.p3.11.m3.6.6.2.2.2.1" rspace="0.667em" xref="S2.p3.11.m3.6.6.2.2.2.1.cmml">∀</mo><mi id="S2.p3.11.m3.6.6.2.2.2.2" xref="S2.p3.11.m3.6.6.2.2.2.2.cmml">m</mi></mrow></mrow><mo id="S2.p3.11.m3.6.6.3" xref="S2.p3.11.m3.6.6.3.cmml">∈</mo><mrow id="S2.p3.11.m3.6.6.4.2" xref="S2.p3.11.m3.6.6.4.1.cmml"><mo id="S2.p3.11.m3.6.6.4.2.1" stretchy="false" xref="S2.p3.11.m3.6.6.4.1.cmml">[</mo><mn id="S2.p3.11.m3.2.2" xref="S2.p3.11.m3.2.2.cmml">1</mn><mo id="S2.p3.11.m3.6.6.4.2.2" rspace="0.667em" xref="S2.p3.11.m3.6.6.4.1.cmml">,</mo><mi id="S2.p3.11.m3.3.3" mathvariant="normal" xref="S2.p3.11.m3.3.3.cmml">…</mi><mo id="S2.p3.11.m3.6.6.4.2.3" rspace="0.667em" xref="S2.p3.11.m3.6.6.4.1.cmml">,</mo><mi id="S2.p3.11.m3.4.4" xref="S2.p3.11.m3.4.4.cmml">M</mi><mo 
id="S2.p3.11.m3.6.6.4.2.4" stretchy="false" xref="S2.p3.11.m3.6.6.4.1.cmml">]</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.p3.11.m3.6b"><apply id="S2.p3.11.m3.6.6.cmml" xref="S2.p3.11.m3.6.6"><in id="S2.p3.11.m3.6.6.3.cmml" xref="S2.p3.11.m3.6.6.3"></in><list id="S2.p3.11.m3.6.6.2.3.cmml" xref="S2.p3.11.m3.6.6.2.2"><apply id="S2.p3.11.m3.5.5.1.1.1.cmml" xref="S2.p3.11.m3.5.5.1.1.1"><times id="S2.p3.11.m3.5.5.1.1.1.1.cmml" xref="S2.p3.11.m3.5.5.1.1.1.1"></times><apply id="S2.p3.11.m3.5.5.1.1.1.2.cmml" xref="S2.p3.11.m3.5.5.1.1.1.2"><csymbol cd="ambiguous" id="S2.p3.11.m3.5.5.1.1.1.2.1.cmml" xref="S2.p3.11.m3.5.5.1.1.1.2">subscript</csymbol><ci id="S2.p3.11.m3.5.5.1.1.1.2.2.cmml" xref="S2.p3.11.m3.5.5.1.1.1.2.2">𝑦</ci><ci id="S2.p3.11.m3.5.5.1.1.1.2.3.cmml" xref="S2.p3.11.m3.5.5.1.1.1.2.3">𝑚</ci></apply><ci id="S2.p3.11.m3.1.1.cmml" xref="S2.p3.11.m3.1.1">𝑡</ci></apply><apply id="S2.p3.11.m3.6.6.2.2.2.cmml" xref="S2.p3.11.m3.6.6.2.2.2"><csymbol cd="latexml" id="S2.p3.11.m3.6.6.2.2.2.1.cmml" xref="S2.p3.11.m3.6.6.2.2.2.1">for-all</csymbol><ci id="S2.p3.11.m3.6.6.2.2.2.2.cmml" xref="S2.p3.11.m3.6.6.2.2.2.2">𝑚</ci></apply></list><list id="S2.p3.11.m3.6.6.4.1.cmml" xref="S2.p3.11.m3.6.6.4.2"><cn id="S2.p3.11.m3.2.2.cmml" type="integer" xref="S2.p3.11.m3.2.2">1</cn><ci id="S2.p3.11.m3.3.3.cmml" xref="S2.p3.11.m3.3.3">…</ci><ci id="S2.p3.11.m3.4.4.cmml" xref="S2.p3.11.m3.4.4">𝑀</ci></list></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p3.11.m3.6c">y_{m}(t),\ \forall\ m\in[1,\ ...,\ M]</annotation><annotation encoding="application/x-llamapun" id="S2.p3.11.m3.6d">italic_y start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( italic_t ) , ∀ italic_m ∈ [ 1 , … , italic_M ]</annotation></semantics></math>. 
Additional noise is not considered in order to facilitate the demonstration of the proposed method.</p> </div> <div class="ltx_para" id="S2.p4"> <p class="ltx_p" id="S2.p4.3">Additionally, the system we use in this work operates in the time-frequency domain, where <math alttext="Y_{m}(t,f)" class="ltx_Math" display="inline" id="S2.p4.1.m1.2"><semantics id="S2.p4.1.m1.2a"><mrow id="S2.p4.1.m1.2.3" xref="S2.p4.1.m1.2.3.cmml"><msub id="S2.p4.1.m1.2.3.2" xref="S2.p4.1.m1.2.3.2.cmml"><mi id="S2.p4.1.m1.2.3.2.2" xref="S2.p4.1.m1.2.3.2.2.cmml">Y</mi><mi id="S2.p4.1.m1.2.3.2.3" xref="S2.p4.1.m1.2.3.2.3.cmml">m</mi></msub><mo id="S2.p4.1.m1.2.3.1" xref="S2.p4.1.m1.2.3.1.cmml">⁢</mo><mrow id="S2.p4.1.m1.2.3.3.2" xref="S2.p4.1.m1.2.3.3.1.cmml"><mo id="S2.p4.1.m1.2.3.3.2.1" stretchy="false" xref="S2.p4.1.m1.2.3.3.1.cmml">(</mo><mi id="S2.p4.1.m1.1.1" xref="S2.p4.1.m1.1.1.cmml">t</mi><mo id="S2.p4.1.m1.2.3.3.2.2" xref="S2.p4.1.m1.2.3.3.1.cmml">,</mo><mi id="S2.p4.1.m1.2.2" xref="S2.p4.1.m1.2.2.cmml">f</mi><mo id="S2.p4.1.m1.2.3.3.2.3" stretchy="false" xref="S2.p4.1.m1.2.3.3.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.p4.1.m1.2b"><apply id="S2.p4.1.m1.2.3.cmml" xref="S2.p4.1.m1.2.3"><times id="S2.p4.1.m1.2.3.1.cmml" xref="S2.p4.1.m1.2.3.1"></times><apply id="S2.p4.1.m1.2.3.2.cmml" xref="S2.p4.1.m1.2.3.2"><csymbol cd="ambiguous" id="S2.p4.1.m1.2.3.2.1.cmml" xref="S2.p4.1.m1.2.3.2">subscript</csymbol><ci id="S2.p4.1.m1.2.3.2.2.cmml" xref="S2.p4.1.m1.2.3.2.2">𝑌</ci><ci id="S2.p4.1.m1.2.3.2.3.cmml" xref="S2.p4.1.m1.2.3.2.3">𝑚</ci></apply><interval closure="open" id="S2.p4.1.m1.2.3.3.1.cmml" xref="S2.p4.1.m1.2.3.3.2"><ci id="S2.p4.1.m1.1.1.cmml" xref="S2.p4.1.m1.1.1">𝑡</ci><ci id="S2.p4.1.m1.2.2.cmml" xref="S2.p4.1.m1.2.2">𝑓</ci></interval></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p4.1.m1.2c">Y_{m}(t,f)</annotation><annotation encoding="application/x-llamapun" id="S2.p4.1.m1.2d">italic_Y start_POSTSUBSCRIPT italic_m 
end_POSTSUBSCRIPT ( italic_t , italic_f )</annotation></semantics></math> and <math alttext="S_{n}(t,f)" class="ltx_Math" display="inline" id="S2.p4.2.m2.2"><semantics id="S2.p4.2.m2.2a"><mrow id="S2.p4.2.m2.2.3" xref="S2.p4.2.m2.2.3.cmml"><msub id="S2.p4.2.m2.2.3.2" xref="S2.p4.2.m2.2.3.2.cmml"><mi id="S2.p4.2.m2.2.3.2.2" xref="S2.p4.2.m2.2.3.2.2.cmml">S</mi><mi id="S2.p4.2.m2.2.3.2.3" xref="S2.p4.2.m2.2.3.2.3.cmml">n</mi></msub><mo id="S2.p4.2.m2.2.3.1" xref="S2.p4.2.m2.2.3.1.cmml">⁢</mo><mrow id="S2.p4.2.m2.2.3.3.2" xref="S2.p4.2.m2.2.3.3.1.cmml"><mo id="S2.p4.2.m2.2.3.3.2.1" stretchy="false" xref="S2.p4.2.m2.2.3.3.1.cmml">(</mo><mi id="S2.p4.2.m2.1.1" xref="S2.p4.2.m2.1.1.cmml">t</mi><mo id="S2.p4.2.m2.2.3.3.2.2" xref="S2.p4.2.m2.2.3.3.1.cmml">,</mo><mi id="S2.p4.2.m2.2.2" xref="S2.p4.2.m2.2.2.cmml">f</mi><mo id="S2.p4.2.m2.2.3.3.2.3" stretchy="false" xref="S2.p4.2.m2.2.3.3.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.p4.2.m2.2b"><apply id="S2.p4.2.m2.2.3.cmml" xref="S2.p4.2.m2.2.3"><times id="S2.p4.2.m2.2.3.1.cmml" xref="S2.p4.2.m2.2.3.1"></times><apply id="S2.p4.2.m2.2.3.2.cmml" xref="S2.p4.2.m2.2.3.2"><csymbol cd="ambiguous" id="S2.p4.2.m2.2.3.2.1.cmml" xref="S2.p4.2.m2.2.3.2">subscript</csymbol><ci id="S2.p4.2.m2.2.3.2.2.cmml" xref="S2.p4.2.m2.2.3.2.2">𝑆</ci><ci id="S2.p4.2.m2.2.3.2.3.cmml" xref="S2.p4.2.m2.2.3.2.3">𝑛</ci></apply><interval closure="open" id="S2.p4.2.m2.2.3.3.1.cmml" xref="S2.p4.2.m2.2.3.3.2"><ci id="S2.p4.2.m2.1.1.cmml" xref="S2.p4.2.m2.1.1">𝑡</ci><ci id="S2.p4.2.m2.2.2.cmml" xref="S2.p4.2.m2.2.2">𝑓</ci></interval></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p4.2.m2.2c">S_{n}(t,f)</annotation><annotation encoding="application/x-llamapun" id="S2.p4.2.m2.2d">italic_S start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_t , italic_f )</annotation></semantics></math> are, respectively, the short-time Fourier transform (STFT) of the microphone outputs and of the reverberant speech 
signal <math alttext="s_{n}(t)" class="ltx_Math" display="inline" id="S2.p4.3.m3.1"><semantics id="S2.p4.3.m3.1a"><mrow id="S2.p4.3.m3.1.2" xref="S2.p4.3.m3.1.2.cmml"><msub id="S2.p4.3.m3.1.2.2" xref="S2.p4.3.m3.1.2.2.cmml"><mi id="S2.p4.3.m3.1.2.2.2" xref="S2.p4.3.m3.1.2.2.2.cmml">s</mi><mi id="S2.p4.3.m3.1.2.2.3" xref="S2.p4.3.m3.1.2.2.3.cmml">n</mi></msub><mo id="S2.p4.3.m3.1.2.1" xref="S2.p4.3.m3.1.2.1.cmml">⁢</mo><mrow id="S2.p4.3.m3.1.2.3.2" xref="S2.p4.3.m3.1.2.cmml"><mo id="S2.p4.3.m3.1.2.3.2.1" stretchy="false" xref="S2.p4.3.m3.1.2.cmml">(</mo><mi id="S2.p4.3.m3.1.1" xref="S2.p4.3.m3.1.1.cmml">t</mi><mo id="S2.p4.3.m3.1.2.3.2.2" stretchy="false" xref="S2.p4.3.m3.1.2.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.p4.3.m3.1b"><apply id="S2.p4.3.m3.1.2.cmml" xref="S2.p4.3.m3.1.2"><times id="S2.p4.3.m3.1.2.1.cmml" xref="S2.p4.3.m3.1.2.1"></times><apply id="S2.p4.3.m3.1.2.2.cmml" xref="S2.p4.3.m3.1.2.2"><csymbol cd="ambiguous" id="S2.p4.3.m3.1.2.2.1.cmml" xref="S2.p4.3.m3.1.2.2">subscript</csymbol><ci id="S2.p4.3.m3.1.2.2.2.cmml" xref="S2.p4.3.m3.1.2.2.2">𝑠</ci><ci id="S2.p4.3.m3.1.2.2.3.cmml" xref="S2.p4.3.m3.1.2.2.3">𝑛</ci></apply><ci id="S2.p4.3.m3.1.1.cmml" xref="S2.p4.3.m3.1.1">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.p4.3.m3.1c">s_{n}(t)</annotation><annotation encoding="application/x-llamapun" id="S2.p4.3.m3.1d">italic_s start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_t )</annotation></semantics></math> captured by a reference microphone.</p> </div> <figure class="ltx_figure" id="S2.F1"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_square" height="736" id="S2.F1.g1" src="x1.png" width="705"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S2.F1.8.4.1" style="font-size:90%;">Figure 1</span>: </span><span class="ltx_text" id="S2.F1.6.3" style="font-size:90%;">Example scenario of the considered 
problem. The arrows indicate: (i) listener <math alttext="L" class="ltx_Math" display="inline" id="S2.F1.4.1.m1.1"><semantics id="S2.F1.4.1.m1.1b"><mi id="S2.F1.4.1.m1.1.1" xref="S2.F1.4.1.m1.1.1.cmml">L</mi><annotation-xml encoding="MathML-Content" id="S2.F1.4.1.m1.1c"><ci id="S2.F1.4.1.m1.1.1.cmml" xref="S2.F1.4.1.m1.1.1">𝐿</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.F1.4.1.m1.1d">L</annotation><annotation encoding="application/x-llamapun" id="S2.F1.4.1.m1.1e">italic_L</annotation></semantics></math>; (ii) speaker <math alttext="S_{1}" class="ltx_Math" display="inline" id="S2.F1.5.2.m2.1"><semantics id="S2.F1.5.2.m2.1b"><msub id="S2.F1.5.2.m2.1.1" xref="S2.F1.5.2.m2.1.1.cmml"><mi id="S2.F1.5.2.m2.1.1.2" xref="S2.F1.5.2.m2.1.1.2.cmml">S</mi><mn id="S2.F1.5.2.m2.1.1.3" xref="S2.F1.5.2.m2.1.1.3.cmml">1</mn></msub><annotation-xml encoding="MathML-Content" id="S2.F1.5.2.m2.1c"><apply id="S2.F1.5.2.m2.1.1.cmml" xref="S2.F1.5.2.m2.1.1"><csymbol cd="ambiguous" id="S2.F1.5.2.m2.1.1.1.cmml" xref="S2.F1.5.2.m2.1.1">subscript</csymbol><ci id="S2.F1.5.2.m2.1.1.2.cmml" xref="S2.F1.5.2.m2.1.1.2">𝑆</ci><cn id="S2.F1.5.2.m2.1.1.3.cmml" type="integer" xref="S2.F1.5.2.m2.1.1.3">1</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.F1.5.2.m2.1d">S_{1}</annotation><annotation encoding="application/x-llamapun" id="S2.F1.5.2.m2.1e">italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT</annotation></semantics></math>; (iii) speaker <math alttext="S_{2}" class="ltx_Math" display="inline" id="S2.F1.6.3.m3.1"><semantics id="S2.F1.6.3.m3.1b"><msub id="S2.F1.6.3.m3.1.1" xref="S2.F1.6.3.m3.1.1.cmml"><mi id="S2.F1.6.3.m3.1.1.2" xref="S2.F1.6.3.m3.1.1.2.cmml">S</mi><mn id="S2.F1.6.3.m3.1.1.3" xref="S2.F1.6.3.m3.1.1.3.cmml">2</mn></msub><annotation-xml encoding="MathML-Content" id="S2.F1.6.3.m3.1c"><apply id="S2.F1.6.3.m3.1.1.cmml" xref="S2.F1.6.3.m3.1.1"><csymbol cd="ambiguous" id="S2.F1.6.3.m3.1.1.1.cmml" 
xref="S2.F1.6.3.m3.1.1">subscript</csymbol><ci id="S2.F1.6.3.m3.1.1.2.cmml" xref="S2.F1.6.3.m3.1.1.2">𝑆</ci><cn id="S2.F1.6.3.m3.1.1.3.cmml" type="integer" xref="S2.F1.6.3.m3.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.F1.6.3.m3.1d">S_{2}</annotation><annotation encoding="application/x-llamapun" id="S2.F1.6.3.m3.1e">italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT</annotation></semantics></math>; (iv) wall; and (v) reverberation.</span></figcaption> </figure> <figure class="ltx_figure" id="S2.F2"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="287" id="S2.F2.g1" src="x2.png" width="706"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S2.F2.2.1.1" style="font-size:90%;">Figure 2</span>: </span><span class="ltx_text" id="S2.F2.3.2" style="font-size:90%;">End-to-end beamforming neural network training system employing speaker selection mechanism.</span></figcaption> </figure> <figure class="ltx_float ltx_float_algorithm ltx_framed ltx_framed_top" id="alg1"> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_float"><span class="ltx_text ltx_font_bold" id="alg1.2.1.1">Algorithm 1</span> </span> Speaker selection mechanism for two speakers</figcaption> <div class="ltx_listing ltx_listing" id="alg1.3"> <div class="ltx_listingline" id="alg1.l1"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l1.1.1.1" style="font-size:80%;">1:</span></span><span class="ltx_text ltx_font_bold" id="alg1.l1.2">procedure</span> <span class="ltx_text ltx_font_smallcaps" id="alg1.l1.3">SSM</span>(positions, <math alttext="|\theta_{u}^{\max}|" class="ltx_Math" display="inline" id="alg1.l1.m1.1"><semantics id="alg1.l1.m1.1a"><mrow id="alg1.l1.m1.1.1.1" xref="alg1.l1.m1.1.1.2.cmml"><mo id="alg1.l1.m1.1.1.1.2" stretchy="false" xref="alg1.l1.m1.1.1.2.1.cmml">|</mo><msubsup id="alg1.l1.m1.1.1.1.1" 
xref="alg1.l1.m1.1.1.1.1.cmml"><mi id="alg1.l1.m1.1.1.1.1.2.2" xref="alg1.l1.m1.1.1.1.1.2.2.cmml">θ</mi><mi id="alg1.l1.m1.1.1.1.1.2.3" xref="alg1.l1.m1.1.1.1.1.2.3.cmml">u</mi><mi id="alg1.l1.m1.1.1.1.1.3" xref="alg1.l1.m1.1.1.1.1.3.cmml">max</mi></msubsup><mo id="alg1.l1.m1.1.1.1.3" stretchy="false" xref="alg1.l1.m1.1.1.2.1.cmml">|</mo></mrow><annotation-xml encoding="MathML-Content" id="alg1.l1.m1.1b"><apply id="alg1.l1.m1.1.1.2.cmml" xref="alg1.l1.m1.1.1.1"><abs id="alg1.l1.m1.1.1.2.1.cmml" xref="alg1.l1.m1.1.1.1.2"></abs><apply id="alg1.l1.m1.1.1.1.1.cmml" xref="alg1.l1.m1.1.1.1.1"><csymbol cd="ambiguous" id="alg1.l1.m1.1.1.1.1.1.cmml" xref="alg1.l1.m1.1.1.1.1">superscript</csymbol><apply id="alg1.l1.m1.1.1.1.1.2.cmml" xref="alg1.l1.m1.1.1.1.1"><csymbol cd="ambiguous" id="alg1.l1.m1.1.1.1.1.2.1.cmml" xref="alg1.l1.m1.1.1.1.1">subscript</csymbol><ci id="alg1.l1.m1.1.1.1.1.2.2.cmml" xref="alg1.l1.m1.1.1.1.1.2.2">𝜃</ci><ci id="alg1.l1.m1.1.1.1.1.2.3.cmml" xref="alg1.l1.m1.1.1.1.1.2.3">𝑢</ci></apply><max id="alg1.l1.m1.1.1.1.1.3.cmml" xref="alg1.l1.m1.1.1.1.1.3"></max></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l1.m1.1c">|\theta_{u}^{\max}|</annotation><annotation encoding="application/x-llamapun" id="alg1.l1.m1.1d">| italic_θ start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_max end_POSTSUPERSCRIPT |</annotation></semantics></math>) </div> <div class="ltx_listingline" id="alg1.l2"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l2.1.1.1" style="font-size:80%;">2:</span></span> <span class="ltx_text ltx_font_bold" id="alg1.l2.2">Input:</span> 2-dimensional position of listener <math alttext="[a_{L},b_{L}]" class="ltx_Math" display="inline" id="alg1.l2.m1.2"><semantics id="alg1.l2.m1.2a"><mrow id="alg1.l2.m1.2.2.2" xref="alg1.l2.m1.2.2.3.cmml"><mo id="alg1.l2.m1.2.2.2.3" stretchy="false" xref="alg1.l2.m1.2.2.3.cmml">[</mo><msub id="alg1.l2.m1.1.1.1.1" 
xref="alg1.l2.m1.1.1.1.1.cmml"><mi id="alg1.l2.m1.1.1.1.1.2" xref="alg1.l2.m1.1.1.1.1.2.cmml">a</mi><mi id="alg1.l2.m1.1.1.1.1.3" xref="alg1.l2.m1.1.1.1.1.3.cmml">L</mi></msub><mo id="alg1.l2.m1.2.2.2.4" xref="alg1.l2.m1.2.2.3.cmml">,</mo><msub id="alg1.l2.m1.2.2.2.2" xref="alg1.l2.m1.2.2.2.2.cmml"><mi id="alg1.l2.m1.2.2.2.2.2" xref="alg1.l2.m1.2.2.2.2.2.cmml">b</mi><mi id="alg1.l2.m1.2.2.2.2.3" xref="alg1.l2.m1.2.2.2.2.3.cmml">L</mi></msub><mo id="alg1.l2.m1.2.2.2.5" stretchy="false" xref="alg1.l2.m1.2.2.3.cmml">]</mo></mrow><annotation-xml encoding="MathML-Content" id="alg1.l2.m1.2b"><interval closure="closed" id="alg1.l2.m1.2.2.3.cmml" xref="alg1.l2.m1.2.2.2"><apply id="alg1.l2.m1.1.1.1.1.cmml" xref="alg1.l2.m1.1.1.1.1"><csymbol cd="ambiguous" id="alg1.l2.m1.1.1.1.1.1.cmml" xref="alg1.l2.m1.1.1.1.1">subscript</csymbol><ci id="alg1.l2.m1.1.1.1.1.2.cmml" xref="alg1.l2.m1.1.1.1.1.2">𝑎</ci><ci id="alg1.l2.m1.1.1.1.1.3.cmml" xref="alg1.l2.m1.1.1.1.1.3">𝐿</ci></apply><apply id="alg1.l2.m1.2.2.2.2.cmml" xref="alg1.l2.m1.2.2.2.2"><csymbol cd="ambiguous" id="alg1.l2.m1.2.2.2.2.1.cmml" xref="alg1.l2.m1.2.2.2.2">subscript</csymbol><ci id="alg1.l2.m1.2.2.2.2.2.cmml" xref="alg1.l2.m1.2.2.2.2.2">𝑏</ci><ci id="alg1.l2.m1.2.2.2.2.3.cmml" xref="alg1.l2.m1.2.2.2.2.3">𝐿</ci></apply></interval></annotation-xml><annotation encoding="application/x-tex" id="alg1.l2.m1.2c">[a_{L},b_{L}]</annotation><annotation encoding="application/x-llamapun" id="alg1.l2.m1.2d">[ italic_a start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ]</annotation></semantics></math> and speakers <math alttext="\Bigl{[}\bigl{[}a_{S_{1}},b_{S_{1}}\bigr{]},\ \bigl{[}a_{S_{2}},b_{S_{2}}\bigr% {]}\Bigr{]}" class="ltx_Math" display="inline" id="alg1.l2.m2.2"><semantics id="alg1.l2.m2.2a"><mrow id="alg1.l2.m2.2.2.2" xref="alg1.l2.m2.2.2.3.cmml"><mo id="alg1.l2.m2.2.2.2.3" maxsize="160%" minsize="160%" xref="alg1.l2.m2.2.2.3.cmml">[</mo><mrow 
id="alg1.l2.m2.1.1.1.1.2" xref="alg1.l2.m2.1.1.1.1.3.cmml"><mo id="alg1.l2.m2.1.1.1.1.2.3" maxsize="120%" minsize="120%" xref="alg1.l2.m2.1.1.1.1.3.cmml">[</mo><msub id="alg1.l2.m2.1.1.1.1.1.1" xref="alg1.l2.m2.1.1.1.1.1.1.cmml"><mi id="alg1.l2.m2.1.1.1.1.1.1.2" xref="alg1.l2.m2.1.1.1.1.1.1.2.cmml">a</mi><msub id="alg1.l2.m2.1.1.1.1.1.1.3" xref="alg1.l2.m2.1.1.1.1.1.1.3.cmml"><mi id="alg1.l2.m2.1.1.1.1.1.1.3.2" xref="alg1.l2.m2.1.1.1.1.1.1.3.2.cmml">S</mi><mn id="alg1.l2.m2.1.1.1.1.1.1.3.3" xref="alg1.l2.m2.1.1.1.1.1.1.3.3.cmml">1</mn></msub></msub><mo id="alg1.l2.m2.1.1.1.1.2.4" xref="alg1.l2.m2.1.1.1.1.3.cmml">,</mo><msub id="alg1.l2.m2.1.1.1.1.2.2" xref="alg1.l2.m2.1.1.1.1.2.2.cmml"><mi id="alg1.l2.m2.1.1.1.1.2.2.2" xref="alg1.l2.m2.1.1.1.1.2.2.2.cmml">b</mi><msub id="alg1.l2.m2.1.1.1.1.2.2.3" xref="alg1.l2.m2.1.1.1.1.2.2.3.cmml"><mi id="alg1.l2.m2.1.1.1.1.2.2.3.2" xref="alg1.l2.m2.1.1.1.1.2.2.3.2.cmml">S</mi><mn id="alg1.l2.m2.1.1.1.1.2.2.3.3" xref="alg1.l2.m2.1.1.1.1.2.2.3.3.cmml">1</mn></msub></msub><mo id="alg1.l2.m2.1.1.1.1.2.5" maxsize="120%" minsize="120%" xref="alg1.l2.m2.1.1.1.1.3.cmml">]</mo></mrow><mo id="alg1.l2.m2.2.2.2.4" rspace="0.667em" xref="alg1.l2.m2.2.2.3.cmml">,</mo><mrow id="alg1.l2.m2.2.2.2.2.2" xref="alg1.l2.m2.2.2.2.2.3.cmml"><mo id="alg1.l2.m2.2.2.2.2.2.3" maxsize="120%" minsize="120%" xref="alg1.l2.m2.2.2.2.2.3.cmml">[</mo><msub id="alg1.l2.m2.2.2.2.2.1.1" xref="alg1.l2.m2.2.2.2.2.1.1.cmml"><mi id="alg1.l2.m2.2.2.2.2.1.1.2" xref="alg1.l2.m2.2.2.2.2.1.1.2.cmml">a</mi><msub id="alg1.l2.m2.2.2.2.2.1.1.3" xref="alg1.l2.m2.2.2.2.2.1.1.3.cmml"><mi id="alg1.l2.m2.2.2.2.2.1.1.3.2" xref="alg1.l2.m2.2.2.2.2.1.1.3.2.cmml">S</mi><mn id="alg1.l2.m2.2.2.2.2.1.1.3.3" xref="alg1.l2.m2.2.2.2.2.1.1.3.3.cmml">2</mn></msub></msub><mo id="alg1.l2.m2.2.2.2.2.2.4" xref="alg1.l2.m2.2.2.2.2.3.cmml">,</mo><msub id="alg1.l2.m2.2.2.2.2.2.2" xref="alg1.l2.m2.2.2.2.2.2.2.cmml"><mi id="alg1.l2.m2.2.2.2.2.2.2.2" xref="alg1.l2.m2.2.2.2.2.2.2.2.cmml">b</mi><msub 
id="alg1.l2.m2.2.2.2.2.2.2.3" xref="alg1.l2.m2.2.2.2.2.2.2.3.cmml"><mi id="alg1.l2.m2.2.2.2.2.2.2.3.2" xref="alg1.l2.m2.2.2.2.2.2.2.3.2.cmml">S</mi><mn id="alg1.l2.m2.2.2.2.2.2.2.3.3" xref="alg1.l2.m2.2.2.2.2.2.2.3.3.cmml">2</mn></msub></msub><mo id="alg1.l2.m2.2.2.2.2.2.5" maxsize="120%" minsize="120%" xref="alg1.l2.m2.2.2.2.2.3.cmml">]</mo></mrow><mo id="alg1.l2.m2.2.2.2.5" maxsize="160%" minsize="160%" xref="alg1.l2.m2.2.2.3.cmml">]</mo></mrow><annotation-xml encoding="MathML-Content" id="alg1.l2.m2.2b"><interval closure="closed" id="alg1.l2.m2.2.2.3.cmml" xref="alg1.l2.m2.2.2.2"><interval closure="closed" id="alg1.l2.m2.1.1.1.1.3.cmml" xref="alg1.l2.m2.1.1.1.1.2"><apply id="alg1.l2.m2.1.1.1.1.1.1.cmml" xref="alg1.l2.m2.1.1.1.1.1.1"><csymbol cd="ambiguous" id="alg1.l2.m2.1.1.1.1.1.1.1.cmml" xref="alg1.l2.m2.1.1.1.1.1.1">subscript</csymbol><ci id="alg1.l2.m2.1.1.1.1.1.1.2.cmml" xref="alg1.l2.m2.1.1.1.1.1.1.2">𝑎</ci><apply id="alg1.l2.m2.1.1.1.1.1.1.3.cmml" xref="alg1.l2.m2.1.1.1.1.1.1.3"><csymbol cd="ambiguous" id="alg1.l2.m2.1.1.1.1.1.1.3.1.cmml" xref="alg1.l2.m2.1.1.1.1.1.1.3">subscript</csymbol><ci id="alg1.l2.m2.1.1.1.1.1.1.3.2.cmml" xref="alg1.l2.m2.1.1.1.1.1.1.3.2">𝑆</ci><cn id="alg1.l2.m2.1.1.1.1.1.1.3.3.cmml" type="integer" xref="alg1.l2.m2.1.1.1.1.1.1.3.3">1</cn></apply></apply><apply id="alg1.l2.m2.1.1.1.1.2.2.cmml" xref="alg1.l2.m2.1.1.1.1.2.2"><csymbol cd="ambiguous" id="alg1.l2.m2.1.1.1.1.2.2.1.cmml" xref="alg1.l2.m2.1.1.1.1.2.2">subscript</csymbol><ci id="alg1.l2.m2.1.1.1.1.2.2.2.cmml" xref="alg1.l2.m2.1.1.1.1.2.2.2">𝑏</ci><apply id="alg1.l2.m2.1.1.1.1.2.2.3.cmml" xref="alg1.l2.m2.1.1.1.1.2.2.3"><csymbol cd="ambiguous" id="alg1.l2.m2.1.1.1.1.2.2.3.1.cmml" xref="alg1.l2.m2.1.1.1.1.2.2.3">subscript</csymbol><ci id="alg1.l2.m2.1.1.1.1.2.2.3.2.cmml" xref="alg1.l2.m2.1.1.1.1.2.2.3.2">𝑆</ci><cn id="alg1.l2.m2.1.1.1.1.2.2.3.3.cmml" type="integer" xref="alg1.l2.m2.1.1.1.1.2.2.3.3">1</cn></apply></apply></interval><interval closure="closed" 
id="alg1.l2.m2.2.2.2.2.3.cmml" xref="alg1.l2.m2.2.2.2.2.2"><apply id="alg1.l2.m2.2.2.2.2.1.1.cmml" xref="alg1.l2.m2.2.2.2.2.1.1"><csymbol cd="ambiguous" id="alg1.l2.m2.2.2.2.2.1.1.1.cmml" xref="alg1.l2.m2.2.2.2.2.1.1">subscript</csymbol><ci id="alg1.l2.m2.2.2.2.2.1.1.2.cmml" xref="alg1.l2.m2.2.2.2.2.1.1.2">𝑎</ci><apply id="alg1.l2.m2.2.2.2.2.1.1.3.cmml" xref="alg1.l2.m2.2.2.2.2.1.1.3"><csymbol cd="ambiguous" id="alg1.l2.m2.2.2.2.2.1.1.3.1.cmml" xref="alg1.l2.m2.2.2.2.2.1.1.3">subscript</csymbol><ci id="alg1.l2.m2.2.2.2.2.1.1.3.2.cmml" xref="alg1.l2.m2.2.2.2.2.1.1.3.2">𝑆</ci><cn id="alg1.l2.m2.2.2.2.2.1.1.3.3.cmml" type="integer" xref="alg1.l2.m2.2.2.2.2.1.1.3.3">2</cn></apply></apply><apply id="alg1.l2.m2.2.2.2.2.2.2.cmml" xref="alg1.l2.m2.2.2.2.2.2.2"><csymbol cd="ambiguous" id="alg1.l2.m2.2.2.2.2.2.2.1.cmml" xref="alg1.l2.m2.2.2.2.2.2.2">subscript</csymbol><ci id="alg1.l2.m2.2.2.2.2.2.2.2.cmml" xref="alg1.l2.m2.2.2.2.2.2.2.2">𝑏</ci><apply id="alg1.l2.m2.2.2.2.2.2.2.3.cmml" xref="alg1.l2.m2.2.2.2.2.2.2.3"><csymbol cd="ambiguous" id="alg1.l2.m2.2.2.2.2.2.2.3.1.cmml" xref="alg1.l2.m2.2.2.2.2.2.2.3">subscript</csymbol><ci id="alg1.l2.m2.2.2.2.2.2.2.3.2.cmml" xref="alg1.l2.m2.2.2.2.2.2.2.3.2">𝑆</ci><cn id="alg1.l2.m2.2.2.2.2.2.2.3.3.cmml" type="integer" xref="alg1.l2.m2.2.2.2.2.2.2.3.3">2</cn></apply></apply></interval></interval></annotation-xml><annotation encoding="application/x-tex" id="alg1.l2.m2.2c">\Bigl{[}\bigl{[}a_{S_{1}},b_{S_{1}}\bigr{]},\ \bigl{[}a_{S_{2}},b_{S_{2}}\bigr% {]}\Bigr{]}</annotation><annotation encoding="application/x-llamapun" id="alg1.l2.m2.2d">[ [ italic_a start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ] , [ italic_a start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ] 
]</annotation></semantics></math> of an audio utterance </div> <div class="ltx_listingline" id="alg1.l3"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l3.1.1.1" style="font-size:80%;">3:</span></span> <span class="ltx_text ltx_font_bold" id="alg1.l3.2">Parameter:</span> Maximum undershot angle <math alttext="|\theta_{u}^{\max}|" class="ltx_Math" display="inline" id="alg1.l3.m1.1"><semantics id="alg1.l3.m1.1a"><mrow id="alg1.l3.m1.1.1.1" xref="alg1.l3.m1.1.1.2.cmml"><mo id="alg1.l3.m1.1.1.1.2" stretchy="false" xref="alg1.l3.m1.1.1.2.1.cmml">|</mo><msubsup id="alg1.l3.m1.1.1.1.1" xref="alg1.l3.m1.1.1.1.1.cmml"><mi id="alg1.l3.m1.1.1.1.1.2.2" xref="alg1.l3.m1.1.1.1.1.2.2.cmml">θ</mi><mi id="alg1.l3.m1.1.1.1.1.2.3" xref="alg1.l3.m1.1.1.1.1.2.3.cmml">u</mi><mi id="alg1.l3.m1.1.1.1.1.3" xref="alg1.l3.m1.1.1.1.1.3.cmml">max</mi></msubsup><mo id="alg1.l3.m1.1.1.1.3" stretchy="false" xref="alg1.l3.m1.1.1.2.1.cmml">|</mo></mrow><annotation-xml encoding="MathML-Content" id="alg1.l3.m1.1b"><apply id="alg1.l3.m1.1.1.2.cmml" xref="alg1.l3.m1.1.1.1"><abs id="alg1.l3.m1.1.1.2.1.cmml" xref="alg1.l3.m1.1.1.1.2"></abs><apply id="alg1.l3.m1.1.1.1.1.cmml" xref="alg1.l3.m1.1.1.1.1"><csymbol cd="ambiguous" id="alg1.l3.m1.1.1.1.1.1.cmml" xref="alg1.l3.m1.1.1.1.1">superscript</csymbol><apply id="alg1.l3.m1.1.1.1.1.2.cmml" xref="alg1.l3.m1.1.1.1.1"><csymbol cd="ambiguous" id="alg1.l3.m1.1.1.1.1.2.1.cmml" xref="alg1.l3.m1.1.1.1.1">subscript</csymbol><ci id="alg1.l3.m1.1.1.1.1.2.2.cmml" xref="alg1.l3.m1.1.1.1.1.2.2">𝜃</ci><ci id="alg1.l3.m1.1.1.1.1.2.3.cmml" xref="alg1.l3.m1.1.1.1.1.2.3">𝑢</ci></apply><max id="alg1.l3.m1.1.1.1.1.3.cmml" xref="alg1.l3.m1.1.1.1.1.3"></max></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l3.m1.1c">|\theta_{u}^{\max}|</annotation><annotation encoding="application/x-llamapun" id="alg1.l3.m1.1d">| italic_θ start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_max end_POSTSUPERSCRIPT 
|</annotation></semantics></math> </div> <div class="ltx_listingline" id="alg1.l4"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l4.1.1.1" style="font-size:80%;">4:</span></span> <span class="ltx_text ltx_font_bold" id="alg1.l4.2">Output:</span> Index of desired speaker </div> <div class="ltx_listingline" id="alg1.l5"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l5.1.1.1" style="font-size:80%;">5:</span></span> <span class="ltx_text ltx_font_bold" id="alg1.l5.2">Calculate the speakers’ angles relative to the listener’s x-axis:</span> <table class="ltx_equation ltx_eqn_table" id="S2.Ex1"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\theta_{S_{1}}=\operatorname{atan2}(b_{S_{1}}-b_{L},\;a_{S_{1}}-a_{L})" class="ltx_Math" display="block" id="S2.Ex1.m1.3"><semantics id="S2.Ex1.m1.3a"><mrow id="S2.Ex1.m1.3.3" xref="S2.Ex1.m1.3.3.cmml"><msub id="S2.Ex1.m1.3.3.4" xref="S2.Ex1.m1.3.3.4.cmml"><mi id="S2.Ex1.m1.3.3.4.2" xref="S2.Ex1.m1.3.3.4.2.cmml">θ</mi><msub id="S2.Ex1.m1.3.3.4.3" xref="S2.Ex1.m1.3.3.4.3.cmml"><mi id="S2.Ex1.m1.3.3.4.3.2" xref="S2.Ex1.m1.3.3.4.3.2.cmml">S</mi><mn id="S2.Ex1.m1.3.3.4.3.3" xref="S2.Ex1.m1.3.3.4.3.3.cmml">1</mn></msub></msub><mo id="S2.Ex1.m1.3.3.3" xref="S2.Ex1.m1.3.3.3.cmml">=</mo><mrow id="S2.Ex1.m1.3.3.2.2" xref="S2.Ex1.m1.3.3.2.3.cmml"><mi id="S2.Ex1.m1.1.1" xref="S2.Ex1.m1.1.1.cmml">atan2</mi><mo id="S2.Ex1.m1.3.3.2.2a" xref="S2.Ex1.m1.3.3.2.3.cmml">⁡</mo><mrow id="S2.Ex1.m1.3.3.2.2.2" xref="S2.Ex1.m1.3.3.2.3.cmml"><mo id="S2.Ex1.m1.3.3.2.2.2.3" stretchy="false" xref="S2.Ex1.m1.3.3.2.3.cmml">(</mo><mrow id="S2.Ex1.m1.2.2.1.1.1.1" xref="S2.Ex1.m1.2.2.1.1.1.1.cmml"><msub id="S2.Ex1.m1.2.2.1.1.1.1.2" xref="S2.Ex1.m1.2.2.1.1.1.1.2.cmml"><mi id="S2.Ex1.m1.2.2.1.1.1.1.2.2" xref="S2.Ex1.m1.2.2.1.1.1.1.2.2.cmml">b</mi><msub id="S2.Ex1.m1.2.2.1.1.1.1.2.3" 
xref="S2.Ex1.m1.2.2.1.1.1.1.2.3.cmml"><mi id="S2.Ex1.m1.2.2.1.1.1.1.2.3.2" xref="S2.Ex1.m1.2.2.1.1.1.1.2.3.2.cmml">S</mi><mn id="S2.Ex1.m1.2.2.1.1.1.1.2.3.3" xref="S2.Ex1.m1.2.2.1.1.1.1.2.3.3.cmml">1</mn></msub></msub><mo id="S2.Ex1.m1.2.2.1.1.1.1.1" xref="S2.Ex1.m1.2.2.1.1.1.1.1.cmml">−</mo><msub id="S2.Ex1.m1.2.2.1.1.1.1.3" xref="S2.Ex1.m1.2.2.1.1.1.1.3.cmml"><mi id="S2.Ex1.m1.2.2.1.1.1.1.3.2" xref="S2.Ex1.m1.2.2.1.1.1.1.3.2.cmml">b</mi><mi id="S2.Ex1.m1.2.2.1.1.1.1.3.3" xref="S2.Ex1.m1.2.2.1.1.1.1.3.3.cmml">L</mi></msub></mrow><mo id="S2.Ex1.m1.3.3.2.2.2.4" rspace="0.447em" xref="S2.Ex1.m1.3.3.2.3.cmml">,</mo><mrow id="S2.Ex1.m1.3.3.2.2.2.2" xref="S2.Ex1.m1.3.3.2.2.2.2.cmml"><msub id="S2.Ex1.m1.3.3.2.2.2.2.2" xref="S2.Ex1.m1.3.3.2.2.2.2.2.cmml"><mi id="S2.Ex1.m1.3.3.2.2.2.2.2.2" xref="S2.Ex1.m1.3.3.2.2.2.2.2.2.cmml">a</mi><msub id="S2.Ex1.m1.3.3.2.2.2.2.2.3" xref="S2.Ex1.m1.3.3.2.2.2.2.2.3.cmml"><mi id="S2.Ex1.m1.3.3.2.2.2.2.2.3.2" xref="S2.Ex1.m1.3.3.2.2.2.2.2.3.2.cmml">S</mi><mn id="S2.Ex1.m1.3.3.2.2.2.2.2.3.3" xref="S2.Ex1.m1.3.3.2.2.2.2.2.3.3.cmml">1</mn></msub></msub><mo id="S2.Ex1.m1.3.3.2.2.2.2.1" xref="S2.Ex1.m1.3.3.2.2.2.2.1.cmml">−</mo><msub id="S2.Ex1.m1.3.3.2.2.2.2.3" xref="S2.Ex1.m1.3.3.2.2.2.2.3.cmml"><mi id="S2.Ex1.m1.3.3.2.2.2.2.3.2" xref="S2.Ex1.m1.3.3.2.2.2.2.3.2.cmml">a</mi><mi id="S2.Ex1.m1.3.3.2.2.2.2.3.3" xref="S2.Ex1.m1.3.3.2.2.2.2.3.3.cmml">L</mi></msub></mrow><mo id="S2.Ex1.m1.3.3.2.2.2.5" stretchy="false" xref="S2.Ex1.m1.3.3.2.3.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.Ex1.m1.3b"><apply id="S2.Ex1.m1.3.3.cmml" xref="S2.Ex1.m1.3.3"><eq id="S2.Ex1.m1.3.3.3.cmml" xref="S2.Ex1.m1.3.3.3"></eq><apply id="S2.Ex1.m1.3.3.4.cmml" xref="S2.Ex1.m1.3.3.4"><csymbol cd="ambiguous" id="S2.Ex1.m1.3.3.4.1.cmml" xref="S2.Ex1.m1.3.3.4">subscript</csymbol><ci id="S2.Ex1.m1.3.3.4.2.cmml" xref="S2.Ex1.m1.3.3.4.2">𝜃</ci><apply id="S2.Ex1.m1.3.3.4.3.cmml" xref="S2.Ex1.m1.3.3.4.3"><csymbol cd="ambiguous" 
id="S2.Ex1.m1.3.3.4.3.1.cmml" xref="S2.Ex1.m1.3.3.4.3">subscript</csymbol><ci id="S2.Ex1.m1.3.3.4.3.2.cmml" xref="S2.Ex1.m1.3.3.4.3.2">𝑆</ci><cn id="S2.Ex1.m1.3.3.4.3.3.cmml" type="integer" xref="S2.Ex1.m1.3.3.4.3.3">1</cn></apply></apply><apply id="S2.Ex1.m1.3.3.2.3.cmml" xref="S2.Ex1.m1.3.3.2.2"><ci id="S2.Ex1.m1.1.1.cmml" xref="S2.Ex1.m1.1.1">atan2</ci><apply id="S2.Ex1.m1.2.2.1.1.1.1.cmml" xref="S2.Ex1.m1.2.2.1.1.1.1"><minus id="S2.Ex1.m1.2.2.1.1.1.1.1.cmml" xref="S2.Ex1.m1.2.2.1.1.1.1.1"></minus><apply id="S2.Ex1.m1.2.2.1.1.1.1.2.cmml" xref="S2.Ex1.m1.2.2.1.1.1.1.2"><csymbol cd="ambiguous" id="S2.Ex1.m1.2.2.1.1.1.1.2.1.cmml" xref="S2.Ex1.m1.2.2.1.1.1.1.2">subscript</csymbol><ci id="S2.Ex1.m1.2.2.1.1.1.1.2.2.cmml" xref="S2.Ex1.m1.2.2.1.1.1.1.2.2">𝑏</ci><apply id="S2.Ex1.m1.2.2.1.1.1.1.2.3.cmml" xref="S2.Ex1.m1.2.2.1.1.1.1.2.3"><csymbol cd="ambiguous" id="S2.Ex1.m1.2.2.1.1.1.1.2.3.1.cmml" xref="S2.Ex1.m1.2.2.1.1.1.1.2.3">subscript</csymbol><ci id="S2.Ex1.m1.2.2.1.1.1.1.2.3.2.cmml" xref="S2.Ex1.m1.2.2.1.1.1.1.2.3.2">𝑆</ci><cn id="S2.Ex1.m1.2.2.1.1.1.1.2.3.3.cmml" type="integer" xref="S2.Ex1.m1.2.2.1.1.1.1.2.3.3">1</cn></apply></apply><apply id="S2.Ex1.m1.2.2.1.1.1.1.3.cmml" xref="S2.Ex1.m1.2.2.1.1.1.1.3"><csymbol cd="ambiguous" id="S2.Ex1.m1.2.2.1.1.1.1.3.1.cmml" xref="S2.Ex1.m1.2.2.1.1.1.1.3">subscript</csymbol><ci id="S2.Ex1.m1.2.2.1.1.1.1.3.2.cmml" xref="S2.Ex1.m1.2.2.1.1.1.1.3.2">𝑏</ci><ci id="S2.Ex1.m1.2.2.1.1.1.1.3.3.cmml" xref="S2.Ex1.m1.2.2.1.1.1.1.3.3">𝐿</ci></apply></apply><apply id="S2.Ex1.m1.3.3.2.2.2.2.cmml" xref="S2.Ex1.m1.3.3.2.2.2.2"><minus id="S2.Ex1.m1.3.3.2.2.2.2.1.cmml" xref="S2.Ex1.m1.3.3.2.2.2.2.1"></minus><apply id="S2.Ex1.m1.3.3.2.2.2.2.2.cmml" xref="S2.Ex1.m1.3.3.2.2.2.2.2"><csymbol cd="ambiguous" id="S2.Ex1.m1.3.3.2.2.2.2.2.1.cmml" xref="S2.Ex1.m1.3.3.2.2.2.2.2">subscript</csymbol><ci id="S2.Ex1.m1.3.3.2.2.2.2.2.2.cmml" xref="S2.Ex1.m1.3.3.2.2.2.2.2.2">𝑎</ci><apply id="S2.Ex1.m1.3.3.2.2.2.2.2.3.cmml" 
xref="S2.Ex1.m1.3.3.2.2.2.2.2.3"><csymbol cd="ambiguous" id="S2.Ex1.m1.3.3.2.2.2.2.2.3.1.cmml" xref="S2.Ex1.m1.3.3.2.2.2.2.2.3">subscript</csymbol><ci id="S2.Ex1.m1.3.3.2.2.2.2.2.3.2.cmml" xref="S2.Ex1.m1.3.3.2.2.2.2.2.3.2">𝑆</ci><cn id="S2.Ex1.m1.3.3.2.2.2.2.2.3.3.cmml" type="integer" xref="S2.Ex1.m1.3.3.2.2.2.2.2.3.3">1</cn></apply></apply><apply id="S2.Ex1.m1.3.3.2.2.2.2.3.cmml" xref="S2.Ex1.m1.3.3.2.2.2.2.3"><csymbol cd="ambiguous" id="S2.Ex1.m1.3.3.2.2.2.2.3.1.cmml" xref="S2.Ex1.m1.3.3.2.2.2.2.3">subscript</csymbol><ci id="S2.Ex1.m1.3.3.2.2.2.2.3.2.cmml" xref="S2.Ex1.m1.3.3.2.2.2.2.3.2">𝑎</ci><ci id="S2.Ex1.m1.3.3.2.2.2.2.3.3.cmml" xref="S2.Ex1.m1.3.3.2.2.2.2.3.3">𝐿</ci></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.Ex1.m1.3c">\theta_{S_{1}}=\operatorname{atan2}(b_{S_{1}}-b_{L},\;a_{S_{1}}-a_{L})</annotation><annotation encoding="application/x-llamapun" id="S2.Ex1.m1.3d">italic_θ start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT = atan2 ( italic_b start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_b start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_a start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT )</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> </tr></tbody> </table> <table class="ltx_equation ltx_eqn_table" id="S2.Ex2"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\theta_{S_{2}}=\operatorname{atan2}(b_{S_{2}}-b_{L},\;a_{S_{2}}-a_{L})" class="ltx_Math" display="block" id="S2.Ex2.m1.3"><semantics id="S2.Ex2.m1.3a"><mrow id="S2.Ex2.m1.3.3" xref="S2.Ex2.m1.3.3.cmml"><msub id="S2.Ex2.m1.3.3.4" xref="S2.Ex2.m1.3.3.4.cmml"><mi id="S2.Ex2.m1.3.3.4.2" 
xref="S2.Ex2.m1.3.3.4.2.cmml">θ</mi><msub id="S2.Ex2.m1.3.3.4.3" xref="S2.Ex2.m1.3.3.4.3.cmml"><mi id="S2.Ex2.m1.3.3.4.3.2" xref="S2.Ex2.m1.3.3.4.3.2.cmml">S</mi><mn id="S2.Ex2.m1.3.3.4.3.3" xref="S2.Ex2.m1.3.3.4.3.3.cmml">2</mn></msub></msub><mo id="S2.Ex2.m1.3.3.3" xref="S2.Ex2.m1.3.3.3.cmml">=</mo><mrow id="S2.Ex2.m1.3.3.2.2" xref="S2.Ex2.m1.3.3.2.3.cmml"><mi id="S2.Ex2.m1.1.1" xref="S2.Ex2.m1.1.1.cmml">atan2</mi><mo id="S2.Ex2.m1.3.3.2.2a" xref="S2.Ex2.m1.3.3.2.3.cmml">⁡</mo><mrow id="S2.Ex2.m1.3.3.2.2.2" xref="S2.Ex2.m1.3.3.2.3.cmml"><mo id="S2.Ex2.m1.3.3.2.2.2.3" stretchy="false" xref="S2.Ex2.m1.3.3.2.3.cmml">(</mo><mrow id="S2.Ex2.m1.2.2.1.1.1.1" xref="S2.Ex2.m1.2.2.1.1.1.1.cmml"><msub id="S2.Ex2.m1.2.2.1.1.1.1.2" xref="S2.Ex2.m1.2.2.1.1.1.1.2.cmml"><mi id="S2.Ex2.m1.2.2.1.1.1.1.2.2" xref="S2.Ex2.m1.2.2.1.1.1.1.2.2.cmml">b</mi><msub id="S2.Ex2.m1.2.2.1.1.1.1.2.3" xref="S2.Ex2.m1.2.2.1.1.1.1.2.3.cmml"><mi id="S2.Ex2.m1.2.2.1.1.1.1.2.3.2" xref="S2.Ex2.m1.2.2.1.1.1.1.2.3.2.cmml">S</mi><mn id="S2.Ex2.m1.2.2.1.1.1.1.2.3.3" xref="S2.Ex2.m1.2.2.1.1.1.1.2.3.3.cmml">2</mn></msub></msub><mo id="S2.Ex2.m1.2.2.1.1.1.1.1" xref="S2.Ex2.m1.2.2.1.1.1.1.1.cmml">−</mo><msub id="S2.Ex2.m1.2.2.1.1.1.1.3" xref="S2.Ex2.m1.2.2.1.1.1.1.3.cmml"><mi id="S2.Ex2.m1.2.2.1.1.1.1.3.2" xref="S2.Ex2.m1.2.2.1.1.1.1.3.2.cmml">b</mi><mi id="S2.Ex2.m1.2.2.1.1.1.1.3.3" xref="S2.Ex2.m1.2.2.1.1.1.1.3.3.cmml">L</mi></msub></mrow><mo id="S2.Ex2.m1.3.3.2.2.2.4" rspace="0.447em" xref="S2.Ex2.m1.3.3.2.3.cmml">,</mo><mrow id="S2.Ex2.m1.3.3.2.2.2.2" xref="S2.Ex2.m1.3.3.2.2.2.2.cmml"><msub id="S2.Ex2.m1.3.3.2.2.2.2.2" xref="S2.Ex2.m1.3.3.2.2.2.2.2.cmml"><mi id="S2.Ex2.m1.3.3.2.2.2.2.2.2" xref="S2.Ex2.m1.3.3.2.2.2.2.2.2.cmml">a</mi><msub id="S2.Ex2.m1.3.3.2.2.2.2.2.3" xref="S2.Ex2.m1.3.3.2.2.2.2.2.3.cmml"><mi id="S2.Ex2.m1.3.3.2.2.2.2.2.3.2" xref="S2.Ex2.m1.3.3.2.2.2.2.2.3.2.cmml">S</mi><mn id="S2.Ex2.m1.3.3.2.2.2.2.2.3.3" xref="S2.Ex2.m1.3.3.2.2.2.2.2.3.3.cmml">2</mn></msub></msub><mo 
id="S2.Ex2.m1.3.3.2.2.2.2.1" xref="S2.Ex2.m1.3.3.2.2.2.2.1.cmml">−</mo><msub id="S2.Ex2.m1.3.3.2.2.2.2.3" xref="S2.Ex2.m1.3.3.2.2.2.2.3.cmml"><mi id="S2.Ex2.m1.3.3.2.2.2.2.3.2" xref="S2.Ex2.m1.3.3.2.2.2.2.3.2.cmml">a</mi><mi id="S2.Ex2.m1.3.3.2.2.2.2.3.3" xref="S2.Ex2.m1.3.3.2.2.2.2.3.3.cmml">L</mi></msub></mrow><mo id="S2.Ex2.m1.3.3.2.2.2.5" stretchy="false" xref="S2.Ex2.m1.3.3.2.3.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.Ex2.m1.3b"><apply id="S2.Ex2.m1.3.3.cmml" xref="S2.Ex2.m1.3.3"><eq id="S2.Ex2.m1.3.3.3.cmml" xref="S2.Ex2.m1.3.3.3"></eq><apply id="S2.Ex2.m1.3.3.4.cmml" xref="S2.Ex2.m1.3.3.4"><csymbol cd="ambiguous" id="S2.Ex2.m1.3.3.4.1.cmml" xref="S2.Ex2.m1.3.3.4">subscript</csymbol><ci id="S2.Ex2.m1.3.3.4.2.cmml" xref="S2.Ex2.m1.3.3.4.2">𝜃</ci><apply id="S2.Ex2.m1.3.3.4.3.cmml" xref="S2.Ex2.m1.3.3.4.3"><csymbol cd="ambiguous" id="S2.Ex2.m1.3.3.4.3.1.cmml" xref="S2.Ex2.m1.3.3.4.3">subscript</csymbol><ci id="S2.Ex2.m1.3.3.4.3.2.cmml" xref="S2.Ex2.m1.3.3.4.3.2">𝑆</ci><cn id="S2.Ex2.m1.3.3.4.3.3.cmml" type="integer" xref="S2.Ex2.m1.3.3.4.3.3">2</cn></apply></apply><apply id="S2.Ex2.m1.3.3.2.3.cmml" xref="S2.Ex2.m1.3.3.2.2"><ci id="S2.Ex2.m1.1.1.cmml" xref="S2.Ex2.m1.1.1">atan2</ci><apply id="S2.Ex2.m1.2.2.1.1.1.1.cmml" xref="S2.Ex2.m1.2.2.1.1.1.1"><minus id="S2.Ex2.m1.2.2.1.1.1.1.1.cmml" xref="S2.Ex2.m1.2.2.1.1.1.1.1"></minus><apply id="S2.Ex2.m1.2.2.1.1.1.1.2.cmml" xref="S2.Ex2.m1.2.2.1.1.1.1.2"><csymbol cd="ambiguous" id="S2.Ex2.m1.2.2.1.1.1.1.2.1.cmml" xref="S2.Ex2.m1.2.2.1.1.1.1.2">subscript</csymbol><ci id="S2.Ex2.m1.2.2.1.1.1.1.2.2.cmml" xref="S2.Ex2.m1.2.2.1.1.1.1.2.2">𝑏</ci><apply id="S2.Ex2.m1.2.2.1.1.1.1.2.3.cmml" xref="S2.Ex2.m1.2.2.1.1.1.1.2.3"><csymbol cd="ambiguous" id="S2.Ex2.m1.2.2.1.1.1.1.2.3.1.cmml" xref="S2.Ex2.m1.2.2.1.1.1.1.2.3">subscript</csymbol><ci id="S2.Ex2.m1.2.2.1.1.1.1.2.3.2.cmml" xref="S2.Ex2.m1.2.2.1.1.1.1.2.3.2">𝑆</ci><cn id="S2.Ex2.m1.2.2.1.1.1.1.2.3.3.cmml" type="integer" 
xref="S2.Ex2.m1.2.2.1.1.1.1.2.3.3">2</cn></apply></apply><apply id="S2.Ex2.m1.2.2.1.1.1.1.3.cmml" xref="S2.Ex2.m1.2.2.1.1.1.1.3"><csymbol cd="ambiguous" id="S2.Ex2.m1.2.2.1.1.1.1.3.1.cmml" xref="S2.Ex2.m1.2.2.1.1.1.1.3">subscript</csymbol><ci id="S2.Ex2.m1.2.2.1.1.1.1.3.2.cmml" xref="S2.Ex2.m1.2.2.1.1.1.1.3.2">𝑏</ci><ci id="S2.Ex2.m1.2.2.1.1.1.1.3.3.cmml" xref="S2.Ex2.m1.2.2.1.1.1.1.3.3">𝐿</ci></apply></apply><apply id="S2.Ex2.m1.3.3.2.2.2.2.cmml" xref="S2.Ex2.m1.3.3.2.2.2.2"><minus id="S2.Ex2.m1.3.3.2.2.2.2.1.cmml" xref="S2.Ex2.m1.3.3.2.2.2.2.1"></minus><apply id="S2.Ex2.m1.3.3.2.2.2.2.2.cmml" xref="S2.Ex2.m1.3.3.2.2.2.2.2"><csymbol cd="ambiguous" id="S2.Ex2.m1.3.3.2.2.2.2.2.1.cmml" xref="S2.Ex2.m1.3.3.2.2.2.2.2">subscript</csymbol><ci id="S2.Ex2.m1.3.3.2.2.2.2.2.2.cmml" xref="S2.Ex2.m1.3.3.2.2.2.2.2.2">𝑎</ci><apply id="S2.Ex2.m1.3.3.2.2.2.2.2.3.cmml" xref="S2.Ex2.m1.3.3.2.2.2.2.2.3"><csymbol cd="ambiguous" id="S2.Ex2.m1.3.3.2.2.2.2.2.3.1.cmml" xref="S2.Ex2.m1.3.3.2.2.2.2.2.3">subscript</csymbol><ci id="S2.Ex2.m1.3.3.2.2.2.2.2.3.2.cmml" xref="S2.Ex2.m1.3.3.2.2.2.2.2.3.2">𝑆</ci><cn id="S2.Ex2.m1.3.3.2.2.2.2.2.3.3.cmml" type="integer" xref="S2.Ex2.m1.3.3.2.2.2.2.2.3.3">2</cn></apply></apply><apply id="S2.Ex2.m1.3.3.2.2.2.2.3.cmml" xref="S2.Ex2.m1.3.3.2.2.2.2.3"><csymbol cd="ambiguous" id="S2.Ex2.m1.3.3.2.2.2.2.3.1.cmml" xref="S2.Ex2.m1.3.3.2.2.2.2.3">subscript</csymbol><ci id="S2.Ex2.m1.3.3.2.2.2.2.3.2.cmml" xref="S2.Ex2.m1.3.3.2.2.2.2.3.2">𝑎</ci><ci id="S2.Ex2.m1.3.3.2.2.2.2.3.3.cmml" xref="S2.Ex2.m1.3.3.2.2.2.2.3.3">𝐿</ci></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.Ex2.m1.3c">\theta_{S_{2}}=\operatorname{atan2}(b_{S_{2}}-b_{L},\;a_{S_{2}}-a_{L})</annotation><annotation encoding="application/x-llamapun" id="S2.Ex2.m1.3d">italic_θ start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT = atan2 ( italic_b start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 
end_POSTSUBSCRIPT - italic_b start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_a start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT )</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> </tr></tbody> </table> </div> <div class="ltx_listingline" id="alg1.l6"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l6.2.1.1" style="font-size:80%;">6:</span></span> <span class="ltx_text ltx_font_bold" id="alg1.l6.1">Determine the admissible range for <math alttext="\theta_{h}" class="ltx_Math" display="inline" id="alg1.l6.1.m1.1"><semantics id="alg1.l6.1.m1.1a"><msub id="alg1.l6.1.m1.1.1" xref="alg1.l6.1.m1.1.1.cmml"><mi id="alg1.l6.1.m1.1.1.2" xref="alg1.l6.1.m1.1.1.2.cmml">θ</mi><mi id="alg1.l6.1.m1.1.1.3" xref="alg1.l6.1.m1.1.1.3.cmml">h</mi></msub><annotation-xml encoding="MathML-Content" id="alg1.l6.1.m1.1b"><apply id="alg1.l6.1.m1.1.1.cmml" xref="alg1.l6.1.m1.1.1"><csymbol cd="ambiguous" id="alg1.l6.1.m1.1.1.1.cmml" xref="alg1.l6.1.m1.1.1">subscript</csymbol><ci id="alg1.l6.1.m1.1.1.2.cmml" xref="alg1.l6.1.m1.1.1.2">𝜃</ci><ci id="alg1.l6.1.m1.1.1.3.cmml" xref="alg1.l6.1.m1.1.1.3">ℎ</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l6.1.m1.1c">\theta_{h}</annotation><annotation encoding="application/x-llamapun" id="alg1.l6.1.m1.1d">italic_θ start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT</annotation></semantics></math>:</span> </div> <div class="ltx_listingline" id="alg1.l7"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l7.1.1.1" style="font-size:80%;">7:</span></span> Ensure that the listener’s head angle lies no more than <math alttext="|\theta_{u}^{\max}|" class="ltx_Math" display="inline" id="alg1.l7.m1.1"><semantics id="alg1.l7.m1.1a"><mrow id="alg1.l7.m1.1.1.1" xref="alg1.l7.m1.1.1.2.cmml"><mo id="alg1.l7.m1.1.1.1.2" stretchy="false" 
xref="alg1.l7.m1.1.1.2.1.cmml">|</mo><msubsup id="alg1.l7.m1.1.1.1.1" xref="alg1.l7.m1.1.1.1.1.cmml"><mi id="alg1.l7.m1.1.1.1.1.2.2" xref="alg1.l7.m1.1.1.1.1.2.2.cmml">θ</mi><mi id="alg1.l7.m1.1.1.1.1.2.3" xref="alg1.l7.m1.1.1.1.1.2.3.cmml">u</mi><mi id="alg1.l7.m1.1.1.1.1.3" xref="alg1.l7.m1.1.1.1.1.3.cmml">max</mi></msubsup><mo id="alg1.l7.m1.1.1.1.3" stretchy="false" xref="alg1.l7.m1.1.1.2.1.cmml">|</mo></mrow><annotation-xml encoding="MathML-Content" id="alg1.l7.m1.1b"><apply id="alg1.l7.m1.1.1.2.cmml" xref="alg1.l7.m1.1.1.1"><abs id="alg1.l7.m1.1.1.2.1.cmml" xref="alg1.l7.m1.1.1.1.2"></abs><apply id="alg1.l7.m1.1.1.1.1.cmml" xref="alg1.l7.m1.1.1.1.1"><csymbol cd="ambiguous" id="alg1.l7.m1.1.1.1.1.1.cmml" xref="alg1.l7.m1.1.1.1.1">superscript</csymbol><apply id="alg1.l7.m1.1.1.1.1.2.cmml" xref="alg1.l7.m1.1.1.1.1"><csymbol cd="ambiguous" id="alg1.l7.m1.1.1.1.1.2.1.cmml" xref="alg1.l7.m1.1.1.1.1">subscript</csymbol><ci id="alg1.l7.m1.1.1.1.1.2.2.cmml" xref="alg1.l7.m1.1.1.1.1.2.2">𝜃</ci><ci id="alg1.l7.m1.1.1.1.1.2.3.cmml" xref="alg1.l7.m1.1.1.1.1.2.3">𝑢</ci></apply><max id="alg1.l7.m1.1.1.1.1.3.cmml" xref="alg1.l7.m1.1.1.1.1.3"></max></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l7.m1.1c">|\theta_{u}^{\max}|</annotation><annotation encoding="application/x-llamapun" id="alg1.l7.m1.1d">| italic_θ start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_max end_POSTSUPERSCRIPT |</annotation></semantics></math> outside the angular span of the two speakers: <table class="ltx_equation ltx_eqn_table" id="S2.Ex3"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\theta_{h}^{\min}=\min\{\theta_{S_{1}},\theta_{S_{2}}\}-|\theta_{u}^{\max}|" class="ltx_Math" display="block" id="S2.Ex3.m1.4"><semantics id="S2.Ex3.m1.4a"><mrow id="S2.Ex3.m1.4.4" xref="S2.Ex3.m1.4.4.cmml"><msubsup id="S2.Ex3.m1.4.4.5" 
xref="S2.Ex3.m1.4.4.5.cmml"><mi id="S2.Ex3.m1.4.4.5.2.2" xref="S2.Ex3.m1.4.4.5.2.2.cmml">θ</mi><mi id="S2.Ex3.m1.4.4.5.2.3" xref="S2.Ex3.m1.4.4.5.2.3.cmml">h</mi><mi id="S2.Ex3.m1.4.4.5.3" xref="S2.Ex3.m1.4.4.5.3.cmml">min</mi></msubsup><mo id="S2.Ex3.m1.4.4.4" xref="S2.Ex3.m1.4.4.4.cmml">=</mo><mrow id="S2.Ex3.m1.4.4.3" xref="S2.Ex3.m1.4.4.3.cmml"><mrow id="S2.Ex3.m1.3.3.2.2.2" xref="S2.Ex3.m1.3.3.2.2.3.cmml"><mi id="S2.Ex3.m1.1.1" xref="S2.Ex3.m1.1.1.cmml">min</mi><mo id="S2.Ex3.m1.3.3.2.2.2a" xref="S2.Ex3.m1.3.3.2.2.3.cmml">⁡</mo><mrow id="S2.Ex3.m1.3.3.2.2.2.2" xref="S2.Ex3.m1.3.3.2.2.3.cmml"><mo id="S2.Ex3.m1.3.3.2.2.2.2.3" stretchy="false" xref="S2.Ex3.m1.3.3.2.2.3.cmml">{</mo><msub id="S2.Ex3.m1.2.2.1.1.1.1.1" xref="S2.Ex3.m1.2.2.1.1.1.1.1.cmml"><mi id="S2.Ex3.m1.2.2.1.1.1.1.1.2" xref="S2.Ex3.m1.2.2.1.1.1.1.1.2.cmml">θ</mi><msub id="S2.Ex3.m1.2.2.1.1.1.1.1.3" xref="S2.Ex3.m1.2.2.1.1.1.1.1.3.cmml"><mi id="S2.Ex3.m1.2.2.1.1.1.1.1.3.2" xref="S2.Ex3.m1.2.2.1.1.1.1.1.3.2.cmml">S</mi><mn id="S2.Ex3.m1.2.2.1.1.1.1.1.3.3" xref="S2.Ex3.m1.2.2.1.1.1.1.1.3.3.cmml">1</mn></msub></msub><mo id="S2.Ex3.m1.3.3.2.2.2.2.4" xref="S2.Ex3.m1.3.3.2.2.3.cmml">,</mo><msub id="S2.Ex3.m1.3.3.2.2.2.2.2" xref="S2.Ex3.m1.3.3.2.2.2.2.2.cmml"><mi id="S2.Ex3.m1.3.3.2.2.2.2.2.2" xref="S2.Ex3.m1.3.3.2.2.2.2.2.2.cmml">θ</mi><msub id="S2.Ex3.m1.3.3.2.2.2.2.2.3" xref="S2.Ex3.m1.3.3.2.2.2.2.2.3.cmml"><mi id="S2.Ex3.m1.3.3.2.2.2.2.2.3.2" xref="S2.Ex3.m1.3.3.2.2.2.2.2.3.2.cmml">S</mi><mn id="S2.Ex3.m1.3.3.2.2.2.2.2.3.3" xref="S2.Ex3.m1.3.3.2.2.2.2.2.3.3.cmml">2</mn></msub></msub><mo id="S2.Ex3.m1.3.3.2.2.2.2.5" stretchy="false" xref="S2.Ex3.m1.3.3.2.2.3.cmml">}</mo></mrow></mrow><mo id="S2.Ex3.m1.4.4.3.4" xref="S2.Ex3.m1.4.4.3.4.cmml">−</mo><mrow id="S2.Ex3.m1.4.4.3.3.1" xref="S2.Ex3.m1.4.4.3.3.2.cmml"><mo id="S2.Ex3.m1.4.4.3.3.1.2" stretchy="false" xref="S2.Ex3.m1.4.4.3.3.2.1.cmml">|</mo><msubsup id="S2.Ex3.m1.4.4.3.3.1.1" xref="S2.Ex3.m1.4.4.3.3.1.1.cmml"><mi id="S2.Ex3.m1.4.4.3.3.1.1.2.2" 
xref="S2.Ex3.m1.4.4.3.3.1.1.2.2.cmml">θ</mi><mi id="S2.Ex3.m1.4.4.3.3.1.1.2.3" xref="S2.Ex3.m1.4.4.3.3.1.1.2.3.cmml">u</mi><mi id="S2.Ex3.m1.4.4.3.3.1.1.3" xref="S2.Ex3.m1.4.4.3.3.1.1.3.cmml">max</mi></msubsup><mo id="S2.Ex3.m1.4.4.3.3.1.3" stretchy="false" xref="S2.Ex3.m1.4.4.3.3.2.1.cmml">|</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.Ex3.m1.4b"><apply id="S2.Ex3.m1.4.4.cmml" xref="S2.Ex3.m1.4.4"><eq id="S2.Ex3.m1.4.4.4.cmml" xref="S2.Ex3.m1.4.4.4"></eq><apply id="S2.Ex3.m1.4.4.5.cmml" xref="S2.Ex3.m1.4.4.5"><csymbol cd="ambiguous" id="S2.Ex3.m1.4.4.5.1.cmml" xref="S2.Ex3.m1.4.4.5">superscript</csymbol><apply id="S2.Ex3.m1.4.4.5.2.cmml" xref="S2.Ex3.m1.4.4.5"><csymbol cd="ambiguous" id="S2.Ex3.m1.4.4.5.2.1.cmml" xref="S2.Ex3.m1.4.4.5">subscript</csymbol><ci id="S2.Ex3.m1.4.4.5.2.2.cmml" xref="S2.Ex3.m1.4.4.5.2.2">𝜃</ci><ci id="S2.Ex3.m1.4.4.5.2.3.cmml" xref="S2.Ex3.m1.4.4.5.2.3">ℎ</ci></apply><min id="S2.Ex3.m1.4.4.5.3.cmml" xref="S2.Ex3.m1.4.4.5.3"></min></apply><apply id="S2.Ex3.m1.4.4.3.cmml" xref="S2.Ex3.m1.4.4.3"><minus id="S2.Ex3.m1.4.4.3.4.cmml" xref="S2.Ex3.m1.4.4.3.4"></minus><apply id="S2.Ex3.m1.3.3.2.2.3.cmml" xref="S2.Ex3.m1.3.3.2.2.2"><min id="S2.Ex3.m1.1.1.cmml" xref="S2.Ex3.m1.1.1"></min><apply id="S2.Ex3.m1.2.2.1.1.1.1.1.cmml" xref="S2.Ex3.m1.2.2.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.Ex3.m1.2.2.1.1.1.1.1.1.cmml" xref="S2.Ex3.m1.2.2.1.1.1.1.1">subscript</csymbol><ci id="S2.Ex3.m1.2.2.1.1.1.1.1.2.cmml" xref="S2.Ex3.m1.2.2.1.1.1.1.1.2">𝜃</ci><apply id="S2.Ex3.m1.2.2.1.1.1.1.1.3.cmml" xref="S2.Ex3.m1.2.2.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S2.Ex3.m1.2.2.1.1.1.1.1.3.1.cmml" xref="S2.Ex3.m1.2.2.1.1.1.1.1.3">subscript</csymbol><ci id="S2.Ex3.m1.2.2.1.1.1.1.1.3.2.cmml" xref="S2.Ex3.m1.2.2.1.1.1.1.1.3.2">𝑆</ci><cn id="S2.Ex3.m1.2.2.1.1.1.1.1.3.3.cmml" type="integer" xref="S2.Ex3.m1.2.2.1.1.1.1.1.3.3">1</cn></apply></apply><apply id="S2.Ex3.m1.3.3.2.2.2.2.2.cmml" xref="S2.Ex3.m1.3.3.2.2.2.2.2"><csymbol cd="ambiguous" 
id="S2.Ex3.m1.3.3.2.2.2.2.2.1.cmml" xref="S2.Ex3.m1.3.3.2.2.2.2.2">subscript</csymbol><ci id="S2.Ex3.m1.3.3.2.2.2.2.2.2.cmml" xref="S2.Ex3.m1.3.3.2.2.2.2.2.2">𝜃</ci><apply id="S2.Ex3.m1.3.3.2.2.2.2.2.3.cmml" xref="S2.Ex3.m1.3.3.2.2.2.2.2.3"><csymbol cd="ambiguous" id="S2.Ex3.m1.3.3.2.2.2.2.2.3.1.cmml" xref="S2.Ex3.m1.3.3.2.2.2.2.2.3">subscript</csymbol><ci id="S2.Ex3.m1.3.3.2.2.2.2.2.3.2.cmml" xref="S2.Ex3.m1.3.3.2.2.2.2.2.3.2">𝑆</ci><cn id="S2.Ex3.m1.3.3.2.2.2.2.2.3.3.cmml" type="integer" xref="S2.Ex3.m1.3.3.2.2.2.2.2.3.3">2</cn></apply></apply></apply><apply id="S2.Ex3.m1.4.4.3.3.2.cmml" xref="S2.Ex3.m1.4.4.3.3.1"><abs id="S2.Ex3.m1.4.4.3.3.2.1.cmml" xref="S2.Ex3.m1.4.4.3.3.1.2"></abs><apply id="S2.Ex3.m1.4.4.3.3.1.1.cmml" xref="S2.Ex3.m1.4.4.3.3.1.1"><csymbol cd="ambiguous" id="S2.Ex3.m1.4.4.3.3.1.1.1.cmml" xref="S2.Ex3.m1.4.4.3.3.1.1">superscript</csymbol><apply id="S2.Ex3.m1.4.4.3.3.1.1.2.cmml" xref="S2.Ex3.m1.4.4.3.3.1.1"><csymbol cd="ambiguous" id="S2.Ex3.m1.4.4.3.3.1.1.2.1.cmml" xref="S2.Ex3.m1.4.4.3.3.1.1">subscript</csymbol><ci id="S2.Ex3.m1.4.4.3.3.1.1.2.2.cmml" xref="S2.Ex3.m1.4.4.3.3.1.1.2.2">𝜃</ci><ci id="S2.Ex3.m1.4.4.3.3.1.1.2.3.cmml" xref="S2.Ex3.m1.4.4.3.3.1.1.2.3">𝑢</ci></apply><max id="S2.Ex3.m1.4.4.3.3.1.1.3.cmml" xref="S2.Ex3.m1.4.4.3.3.1.1.3"></max></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.Ex3.m1.4c">\theta_{h}^{\min}=\min\{\theta_{S_{1}},\theta_{S_{2}}\}-|\theta_{u}^{\max}|</annotation><annotation encoding="application/x-llamapun" id="S2.Ex3.m1.4d">italic_θ start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_min end_POSTSUPERSCRIPT = roman_min { italic_θ start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_θ start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT } - | italic_θ start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_max end_POSTSUPERSCRIPT 
|</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> </tr></tbody> </table> <table class="ltx_equation ltx_eqn_table" id="S2.Ex4"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\theta_{h}^{\max}=\max\{\theta_{S_{1}},\theta_{S_{2}}\}+|\theta_{u}^{\max}|" class="ltx_Math" display="block" id="S2.Ex4.m1.4"><semantics id="S2.Ex4.m1.4a"><mrow id="S2.Ex4.m1.4.4" xref="S2.Ex4.m1.4.4.cmml"><msubsup id="S2.Ex4.m1.4.4.5" xref="S2.Ex4.m1.4.4.5.cmml"><mi id="S2.Ex4.m1.4.4.5.2.2" xref="S2.Ex4.m1.4.4.5.2.2.cmml">θ</mi><mi id="S2.Ex4.m1.4.4.5.2.3" xref="S2.Ex4.m1.4.4.5.2.3.cmml">h</mi><mi id="S2.Ex4.m1.4.4.5.3" xref="S2.Ex4.m1.4.4.5.3.cmml">max</mi></msubsup><mo id="S2.Ex4.m1.4.4.4" xref="S2.Ex4.m1.4.4.4.cmml">=</mo><mrow id="S2.Ex4.m1.4.4.3" xref="S2.Ex4.m1.4.4.3.cmml"><mrow id="S2.Ex4.m1.3.3.2.2.2" xref="S2.Ex4.m1.3.3.2.2.3.cmml"><mi id="S2.Ex4.m1.1.1" xref="S2.Ex4.m1.1.1.cmml">max</mi><mo id="S2.Ex4.m1.3.3.2.2.2a" xref="S2.Ex4.m1.3.3.2.2.3.cmml">⁡</mo><mrow id="S2.Ex4.m1.3.3.2.2.2.2" xref="S2.Ex4.m1.3.3.2.2.3.cmml"><mo id="S2.Ex4.m1.3.3.2.2.2.2.3" stretchy="false" xref="S2.Ex4.m1.3.3.2.2.3.cmml">{</mo><msub id="S2.Ex4.m1.2.2.1.1.1.1.1" xref="S2.Ex4.m1.2.2.1.1.1.1.1.cmml"><mi id="S2.Ex4.m1.2.2.1.1.1.1.1.2" xref="S2.Ex4.m1.2.2.1.1.1.1.1.2.cmml">θ</mi><msub id="S2.Ex4.m1.2.2.1.1.1.1.1.3" xref="S2.Ex4.m1.2.2.1.1.1.1.1.3.cmml"><mi id="S2.Ex4.m1.2.2.1.1.1.1.1.3.2" xref="S2.Ex4.m1.2.2.1.1.1.1.1.3.2.cmml">S</mi><mn id="S2.Ex4.m1.2.2.1.1.1.1.1.3.3" xref="S2.Ex4.m1.2.2.1.1.1.1.1.3.3.cmml">1</mn></msub></msub><mo id="S2.Ex4.m1.3.3.2.2.2.2.4" xref="S2.Ex4.m1.3.3.2.2.3.cmml">,</mo><msub id="S2.Ex4.m1.3.3.2.2.2.2.2" xref="S2.Ex4.m1.3.3.2.2.2.2.2.cmml"><mi id="S2.Ex4.m1.3.3.2.2.2.2.2.2" xref="S2.Ex4.m1.3.3.2.2.2.2.2.2.cmml">θ</mi><msub id="S2.Ex4.m1.3.3.2.2.2.2.2.3" xref="S2.Ex4.m1.3.3.2.2.2.2.2.3.cmml"><mi 
id="S2.Ex4.m1.3.3.2.2.2.2.2.3.2" xref="S2.Ex4.m1.3.3.2.2.2.2.2.3.2.cmml">S</mi><mn id="S2.Ex4.m1.3.3.2.2.2.2.2.3.3" xref="S2.Ex4.m1.3.3.2.2.2.2.2.3.3.cmml">2</mn></msub></msub><mo id="S2.Ex4.m1.3.3.2.2.2.2.5" stretchy="false" xref="S2.Ex4.m1.3.3.2.2.3.cmml">}</mo></mrow></mrow><mo id="S2.Ex4.m1.4.4.3.4" xref="S2.Ex4.m1.4.4.3.4.cmml">+</mo><mrow id="S2.Ex4.m1.4.4.3.3.1" xref="S2.Ex4.m1.4.4.3.3.2.cmml"><mo id="S2.Ex4.m1.4.4.3.3.1.2" stretchy="false" xref="S2.Ex4.m1.4.4.3.3.2.1.cmml">|</mo><msubsup id="S2.Ex4.m1.4.4.3.3.1.1" xref="S2.Ex4.m1.4.4.3.3.1.1.cmml"><mi id="S2.Ex4.m1.4.4.3.3.1.1.2.2" xref="S2.Ex4.m1.4.4.3.3.1.1.2.2.cmml">θ</mi><mi id="S2.Ex4.m1.4.4.3.3.1.1.2.3" xref="S2.Ex4.m1.4.4.3.3.1.1.2.3.cmml">u</mi><mi id="S2.Ex4.m1.4.4.3.3.1.1.3" xref="S2.Ex4.m1.4.4.3.3.1.1.3.cmml">max</mi></msubsup><mo id="S2.Ex4.m1.4.4.3.3.1.3" stretchy="false" xref="S2.Ex4.m1.4.4.3.3.2.1.cmml">|</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.Ex4.m1.4b"><apply id="S2.Ex4.m1.4.4.cmml" xref="S2.Ex4.m1.4.4"><eq id="S2.Ex4.m1.4.4.4.cmml" xref="S2.Ex4.m1.4.4.4"></eq><apply id="S2.Ex4.m1.4.4.5.cmml" xref="S2.Ex4.m1.4.4.5"><csymbol cd="ambiguous" id="S2.Ex4.m1.4.4.5.1.cmml" xref="S2.Ex4.m1.4.4.5">superscript</csymbol><apply id="S2.Ex4.m1.4.4.5.2.cmml" xref="S2.Ex4.m1.4.4.5"><csymbol cd="ambiguous" id="S2.Ex4.m1.4.4.5.2.1.cmml" xref="S2.Ex4.m1.4.4.5">subscript</csymbol><ci id="S2.Ex4.m1.4.4.5.2.2.cmml" xref="S2.Ex4.m1.4.4.5.2.2">𝜃</ci><ci id="S2.Ex4.m1.4.4.5.2.3.cmml" xref="S2.Ex4.m1.4.4.5.2.3">ℎ</ci></apply><max id="S2.Ex4.m1.4.4.5.3.cmml" xref="S2.Ex4.m1.4.4.5.3"></max></apply><apply id="S2.Ex4.m1.4.4.3.cmml" xref="S2.Ex4.m1.4.4.3"><plus id="S2.Ex4.m1.4.4.3.4.cmml" xref="S2.Ex4.m1.4.4.3.4"></plus><apply id="S2.Ex4.m1.3.3.2.2.3.cmml" xref="S2.Ex4.m1.3.3.2.2.2"><max id="S2.Ex4.m1.1.1.cmml" xref="S2.Ex4.m1.1.1"></max><apply id="S2.Ex4.m1.2.2.1.1.1.1.1.cmml" xref="S2.Ex4.m1.2.2.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.Ex4.m1.2.2.1.1.1.1.1.1.cmml" 
xref="S2.Ex4.m1.2.2.1.1.1.1.1">subscript</csymbol><ci id="S2.Ex4.m1.2.2.1.1.1.1.1.2.cmml" xref="S2.Ex4.m1.2.2.1.1.1.1.1.2">𝜃</ci><apply id="S2.Ex4.m1.2.2.1.1.1.1.1.3.cmml" xref="S2.Ex4.m1.2.2.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S2.Ex4.m1.2.2.1.1.1.1.1.3.1.cmml" xref="S2.Ex4.m1.2.2.1.1.1.1.1.3">subscript</csymbol><ci id="S2.Ex4.m1.2.2.1.1.1.1.1.3.2.cmml" xref="S2.Ex4.m1.2.2.1.1.1.1.1.3.2">𝑆</ci><cn id="S2.Ex4.m1.2.2.1.1.1.1.1.3.3.cmml" type="integer" xref="S2.Ex4.m1.2.2.1.1.1.1.1.3.3">1</cn></apply></apply><apply id="S2.Ex4.m1.3.3.2.2.2.2.2.cmml" xref="S2.Ex4.m1.3.3.2.2.2.2.2"><csymbol cd="ambiguous" id="S2.Ex4.m1.3.3.2.2.2.2.2.1.cmml" xref="S2.Ex4.m1.3.3.2.2.2.2.2">subscript</csymbol><ci id="S2.Ex4.m1.3.3.2.2.2.2.2.2.cmml" xref="S2.Ex4.m1.3.3.2.2.2.2.2.2">𝜃</ci><apply id="S2.Ex4.m1.3.3.2.2.2.2.2.3.cmml" xref="S2.Ex4.m1.3.3.2.2.2.2.2.3"><csymbol cd="ambiguous" id="S2.Ex4.m1.3.3.2.2.2.2.2.3.1.cmml" xref="S2.Ex4.m1.3.3.2.2.2.2.2.3">subscript</csymbol><ci id="S2.Ex4.m1.3.3.2.2.2.2.2.3.2.cmml" xref="S2.Ex4.m1.3.3.2.2.2.2.2.3.2">𝑆</ci><cn id="S2.Ex4.m1.3.3.2.2.2.2.2.3.3.cmml" type="integer" xref="S2.Ex4.m1.3.3.2.2.2.2.2.3.3">2</cn></apply></apply></apply><apply id="S2.Ex4.m1.4.4.3.3.2.cmml" xref="S2.Ex4.m1.4.4.3.3.1"><abs id="S2.Ex4.m1.4.4.3.3.2.1.cmml" xref="S2.Ex4.m1.4.4.3.3.1.2"></abs><apply id="S2.Ex4.m1.4.4.3.3.1.1.cmml" xref="S2.Ex4.m1.4.4.3.3.1.1"><csymbol cd="ambiguous" id="S2.Ex4.m1.4.4.3.3.1.1.1.cmml" xref="S2.Ex4.m1.4.4.3.3.1.1">superscript</csymbol><apply id="S2.Ex4.m1.4.4.3.3.1.1.2.cmml" xref="S2.Ex4.m1.4.4.3.3.1.1"><csymbol cd="ambiguous" id="S2.Ex4.m1.4.4.3.3.1.1.2.1.cmml" xref="S2.Ex4.m1.4.4.3.3.1.1">subscript</csymbol><ci id="S2.Ex4.m1.4.4.3.3.1.1.2.2.cmml" xref="S2.Ex4.m1.4.4.3.3.1.1.2.2">𝜃</ci><ci id="S2.Ex4.m1.4.4.3.3.1.1.2.3.cmml" xref="S2.Ex4.m1.4.4.3.3.1.1.2.3">𝑢</ci></apply><max id="S2.Ex4.m1.4.4.3.3.1.1.3.cmml" xref="S2.Ex4.m1.4.4.3.3.1.1.3"></max></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" 
id="S2.Ex4.m1.4c">\theta_{h}^{\max}=\max\{\theta_{S_{1}},\theta_{S_{2}}\}+|\theta_{u}^{\max}|</annotation><annotation encoding="application/x-llamapun" id="S2.Ex4.m1.4d">italic_θ start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_max end_POSTSUPERSCRIPT = roman_max { italic_θ start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_θ start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT } + | italic_θ start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_max end_POSTSUPERSCRIPT |</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> </tr></tbody> </table> </div> <div class="ltx_listingline" id="alg1.l8"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l8.1.1.1" style="font-size:80%;">8:</span></span> <span class="ltx_text ltx_font_bold" id="alg1.l8.2">Sample the listener’s head angle:</span> <table class="ltx_equation ltx_eqn_table" id="S2.Ex5"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\theta_{h}\sim\operatorname{Uniform}(\theta_{h}^{\min},\;\theta_{h}^{\max})" class="ltx_Math" display="block" id="S2.Ex5.m1.3"><semantics id="S2.Ex5.m1.3a"><mrow id="S2.Ex5.m1.3.3" xref="S2.Ex5.m1.3.3.cmml"><msub id="S2.Ex5.m1.3.3.4" xref="S2.Ex5.m1.3.3.4.cmml"><mi id="S2.Ex5.m1.3.3.4.2" xref="S2.Ex5.m1.3.3.4.2.cmml">θ</mi><mi id="S2.Ex5.m1.3.3.4.3" xref="S2.Ex5.m1.3.3.4.3.cmml">h</mi></msub><mo id="S2.Ex5.m1.3.3.3" xref="S2.Ex5.m1.3.3.3.cmml">∼</mo><mrow id="S2.Ex5.m1.3.3.2.2" xref="S2.Ex5.m1.3.3.2.3.cmml"><mi id="S2.Ex5.m1.1.1" xref="S2.Ex5.m1.1.1.cmml">Uniform</mi><mo id="S2.Ex5.m1.3.3.2.2a" xref="S2.Ex5.m1.3.3.2.3.cmml">⁡</mo><mrow id="S2.Ex5.m1.3.3.2.2.2" xref="S2.Ex5.m1.3.3.2.3.cmml"><mo id="S2.Ex5.m1.3.3.2.2.2.3" stretchy="false" 
xref="S2.Ex5.m1.3.3.2.3.cmml">(</mo><msubsup id="S2.Ex5.m1.2.2.1.1.1.1" xref="S2.Ex5.m1.2.2.1.1.1.1.cmml"><mi id="S2.Ex5.m1.2.2.1.1.1.1.2.2" xref="S2.Ex5.m1.2.2.1.1.1.1.2.2.cmml">θ</mi><mi id="S2.Ex5.m1.2.2.1.1.1.1.2.3" xref="S2.Ex5.m1.2.2.1.1.1.1.2.3.cmml">h</mi><mi id="S2.Ex5.m1.2.2.1.1.1.1.3" xref="S2.Ex5.m1.2.2.1.1.1.1.3.cmml">min</mi></msubsup><mo id="S2.Ex5.m1.3.3.2.2.2.4" rspace="0.447em" xref="S2.Ex5.m1.3.3.2.3.cmml">,</mo><msubsup id="S2.Ex5.m1.3.3.2.2.2.2" xref="S2.Ex5.m1.3.3.2.2.2.2.cmml"><mi id="S2.Ex5.m1.3.3.2.2.2.2.2.2" xref="S2.Ex5.m1.3.3.2.2.2.2.2.2.cmml">θ</mi><mi id="S2.Ex5.m1.3.3.2.2.2.2.2.3" xref="S2.Ex5.m1.3.3.2.2.2.2.2.3.cmml">h</mi><mi id="S2.Ex5.m1.3.3.2.2.2.2.3" xref="S2.Ex5.m1.3.3.2.2.2.2.3.cmml">max</mi></msubsup><mo id="S2.Ex5.m1.3.3.2.2.2.5" stretchy="false" xref="S2.Ex5.m1.3.3.2.3.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.Ex5.m1.3b"><apply id="S2.Ex5.m1.3.3.cmml" xref="S2.Ex5.m1.3.3"><csymbol cd="latexml" id="S2.Ex5.m1.3.3.3.cmml" xref="S2.Ex5.m1.3.3.3">similar-to</csymbol><apply id="S2.Ex5.m1.3.3.4.cmml" xref="S2.Ex5.m1.3.3.4"><csymbol cd="ambiguous" id="S2.Ex5.m1.3.3.4.1.cmml" xref="S2.Ex5.m1.3.3.4">subscript</csymbol><ci id="S2.Ex5.m1.3.3.4.2.cmml" xref="S2.Ex5.m1.3.3.4.2">𝜃</ci><ci id="S2.Ex5.m1.3.3.4.3.cmml" xref="S2.Ex5.m1.3.3.4.3">ℎ</ci></apply><apply id="S2.Ex5.m1.3.3.2.3.cmml" xref="S2.Ex5.m1.3.3.2.2"><ci id="S2.Ex5.m1.1.1.cmml" xref="S2.Ex5.m1.1.1">Uniform</ci><apply id="S2.Ex5.m1.2.2.1.1.1.1.cmml" xref="S2.Ex5.m1.2.2.1.1.1.1"><csymbol cd="ambiguous" id="S2.Ex5.m1.2.2.1.1.1.1.1.cmml" xref="S2.Ex5.m1.2.2.1.1.1.1">superscript</csymbol><apply id="S2.Ex5.m1.2.2.1.1.1.1.2.cmml" xref="S2.Ex5.m1.2.2.1.1.1.1"><csymbol cd="ambiguous" id="S2.Ex5.m1.2.2.1.1.1.1.2.1.cmml" xref="S2.Ex5.m1.2.2.1.1.1.1">subscript</csymbol><ci id="S2.Ex5.m1.2.2.1.1.1.1.2.2.cmml" xref="S2.Ex5.m1.2.2.1.1.1.1.2.2">𝜃</ci><ci id="S2.Ex5.m1.2.2.1.1.1.1.2.3.cmml" xref="S2.Ex5.m1.2.2.1.1.1.1.2.3">ℎ</ci></apply><min 
id="S2.Ex5.m1.2.2.1.1.1.1.3.cmml" xref="S2.Ex5.m1.2.2.1.1.1.1.3"></min></apply><apply id="S2.Ex5.m1.3.3.2.2.2.2.cmml" xref="S2.Ex5.m1.3.3.2.2.2.2"><csymbol cd="ambiguous" id="S2.Ex5.m1.3.3.2.2.2.2.1.cmml" xref="S2.Ex5.m1.3.3.2.2.2.2">superscript</csymbol><apply id="S2.Ex5.m1.3.3.2.2.2.2.2.cmml" xref="S2.Ex5.m1.3.3.2.2.2.2"><csymbol cd="ambiguous" id="S2.Ex5.m1.3.3.2.2.2.2.2.1.cmml" xref="S2.Ex5.m1.3.3.2.2.2.2">subscript</csymbol><ci id="S2.Ex5.m1.3.3.2.2.2.2.2.2.cmml" xref="S2.Ex5.m1.3.3.2.2.2.2.2.2">𝜃</ci><ci id="S2.Ex5.m1.3.3.2.2.2.2.2.3.cmml" xref="S2.Ex5.m1.3.3.2.2.2.2.2.3">ℎ</ci></apply><max id="S2.Ex5.m1.3.3.2.2.2.2.3.cmml" xref="S2.Ex5.m1.3.3.2.2.2.2.3"></max></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.Ex5.m1.3c">\theta_{h}\sim\operatorname{Uniform}(\theta_{h}^{\min},\;\theta_{h}^{\max})</annotation><annotation encoding="application/x-llamapun" id="S2.Ex5.m1.3d">italic_θ start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ∼ roman_Uniform ( italic_θ start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_min end_POSTSUPERSCRIPT , italic_θ start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_max end_POSTSUPERSCRIPT )</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> </tr></tbody> </table> </div> <div class="ltx_listingline" id="alg1.l9"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l9.1.1.1" style="font-size:80%;">9:</span></span> <span class="ltx_text ltx_font_bold" id="alg1.l9.2">Calculate the undershot angles for each speaker:</span> <table class="ltx_equation ltx_eqn_table" id="S2.Ex6"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="|\theta_{u1}|=|\theta_{h}-\theta_{S_{1}}|" class="ltx_Math" display="block" id="S2.Ex6.m1.2"><semantics id="S2.Ex6.m1.2a"><mrow id="S2.Ex6.m1.2.2" 
xref="S2.Ex6.m1.2.2.cmml"><mrow id="S2.Ex6.m1.1.1.1.1" xref="S2.Ex6.m1.1.1.1.2.cmml"><mo id="S2.Ex6.m1.1.1.1.1.2" stretchy="false" xref="S2.Ex6.m1.1.1.1.2.1.cmml">|</mo><msub id="S2.Ex6.m1.1.1.1.1.1" xref="S2.Ex6.m1.1.1.1.1.1.cmml"><mi id="S2.Ex6.m1.1.1.1.1.1.2" xref="S2.Ex6.m1.1.1.1.1.1.2.cmml">θ</mi><mrow id="S2.Ex6.m1.1.1.1.1.1.3" xref="S2.Ex6.m1.1.1.1.1.1.3.cmml"><mi id="S2.Ex6.m1.1.1.1.1.1.3.2" xref="S2.Ex6.m1.1.1.1.1.1.3.2.cmml">u</mi><mo id="S2.Ex6.m1.1.1.1.1.1.3.1" xref="S2.Ex6.m1.1.1.1.1.1.3.1.cmml">⁢</mo><mn id="S2.Ex6.m1.1.1.1.1.1.3.3" xref="S2.Ex6.m1.1.1.1.1.1.3.3.cmml">1</mn></mrow></msub><mo id="S2.Ex6.m1.1.1.1.1.3" stretchy="false" xref="S2.Ex6.m1.1.1.1.2.1.cmml">|</mo></mrow><mo id="S2.Ex6.m1.2.2.3" xref="S2.Ex6.m1.2.2.3.cmml">=</mo><mrow id="S2.Ex6.m1.2.2.2.1" xref="S2.Ex6.m1.2.2.2.2.cmml"><mo id="S2.Ex6.m1.2.2.2.1.2" stretchy="false" xref="S2.Ex6.m1.2.2.2.2.1.cmml">|</mo><mrow id="S2.Ex6.m1.2.2.2.1.1" xref="S2.Ex6.m1.2.2.2.1.1.cmml"><msub id="S2.Ex6.m1.2.2.2.1.1.2" xref="S2.Ex6.m1.2.2.2.1.1.2.cmml"><mi id="S2.Ex6.m1.2.2.2.1.1.2.2" xref="S2.Ex6.m1.2.2.2.1.1.2.2.cmml">θ</mi><mi id="S2.Ex6.m1.2.2.2.1.1.2.3" xref="S2.Ex6.m1.2.2.2.1.1.2.3.cmml">h</mi></msub><mo id="S2.Ex6.m1.2.2.2.1.1.1" xref="S2.Ex6.m1.2.2.2.1.1.1.cmml">−</mo><msub id="S2.Ex6.m1.2.2.2.1.1.3" xref="S2.Ex6.m1.2.2.2.1.1.3.cmml"><mi id="S2.Ex6.m1.2.2.2.1.1.3.2" xref="S2.Ex6.m1.2.2.2.1.1.3.2.cmml">θ</mi><msub id="S2.Ex6.m1.2.2.2.1.1.3.3" xref="S2.Ex6.m1.2.2.2.1.1.3.3.cmml"><mi id="S2.Ex6.m1.2.2.2.1.1.3.3.2" xref="S2.Ex6.m1.2.2.2.1.1.3.3.2.cmml">S</mi><mn id="S2.Ex6.m1.2.2.2.1.1.3.3.3" xref="S2.Ex6.m1.2.2.2.1.1.3.3.3.cmml">1</mn></msub></msub></mrow><mo id="S2.Ex6.m1.2.2.2.1.3" stretchy="false" xref="S2.Ex6.m1.2.2.2.2.1.cmml">|</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.Ex6.m1.2b"><apply id="S2.Ex6.m1.2.2.cmml" xref="S2.Ex6.m1.2.2"><eq id="S2.Ex6.m1.2.2.3.cmml" xref="S2.Ex6.m1.2.2.3"></eq><apply id="S2.Ex6.m1.1.1.1.2.cmml" xref="S2.Ex6.m1.1.1.1.1"><abs 
id="S2.Ex6.m1.1.1.1.2.1.cmml" xref="S2.Ex6.m1.1.1.1.1.2"></abs><apply id="S2.Ex6.m1.1.1.1.1.1.cmml" xref="S2.Ex6.m1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.Ex6.m1.1.1.1.1.1.1.cmml" xref="S2.Ex6.m1.1.1.1.1.1">subscript</csymbol><ci id="S2.Ex6.m1.1.1.1.1.1.2.cmml" xref="S2.Ex6.m1.1.1.1.1.1.2">𝜃</ci><apply id="S2.Ex6.m1.1.1.1.1.1.3.cmml" xref="S2.Ex6.m1.1.1.1.1.1.3"><times id="S2.Ex6.m1.1.1.1.1.1.3.1.cmml" xref="S2.Ex6.m1.1.1.1.1.1.3.1"></times><ci id="S2.Ex6.m1.1.1.1.1.1.3.2.cmml" xref="S2.Ex6.m1.1.1.1.1.1.3.2">𝑢</ci><cn id="S2.Ex6.m1.1.1.1.1.1.3.3.cmml" type="integer" xref="S2.Ex6.m1.1.1.1.1.1.3.3">1</cn></apply></apply></apply><apply id="S2.Ex6.m1.2.2.2.2.cmml" xref="S2.Ex6.m1.2.2.2.1"><abs id="S2.Ex6.m1.2.2.2.2.1.cmml" xref="S2.Ex6.m1.2.2.2.1.2"></abs><apply id="S2.Ex6.m1.2.2.2.1.1.cmml" xref="S2.Ex6.m1.2.2.2.1.1"><minus id="S2.Ex6.m1.2.2.2.1.1.1.cmml" xref="S2.Ex6.m1.2.2.2.1.1.1"></minus><apply id="S2.Ex6.m1.2.2.2.1.1.2.cmml" xref="S2.Ex6.m1.2.2.2.1.1.2"><csymbol cd="ambiguous" id="S2.Ex6.m1.2.2.2.1.1.2.1.cmml" xref="S2.Ex6.m1.2.2.2.1.1.2">subscript</csymbol><ci id="S2.Ex6.m1.2.2.2.1.1.2.2.cmml" xref="S2.Ex6.m1.2.2.2.1.1.2.2">𝜃</ci><ci id="S2.Ex6.m1.2.2.2.1.1.2.3.cmml" xref="S2.Ex6.m1.2.2.2.1.1.2.3">ℎ</ci></apply><apply id="S2.Ex6.m1.2.2.2.1.1.3.cmml" xref="S2.Ex6.m1.2.2.2.1.1.3"><csymbol cd="ambiguous" id="S2.Ex6.m1.2.2.2.1.1.3.1.cmml" xref="S2.Ex6.m1.2.2.2.1.1.3">subscript</csymbol><ci id="S2.Ex6.m1.2.2.2.1.1.3.2.cmml" xref="S2.Ex6.m1.2.2.2.1.1.3.2">𝜃</ci><apply id="S2.Ex6.m1.2.2.2.1.1.3.3.cmml" xref="S2.Ex6.m1.2.2.2.1.1.3.3"><csymbol cd="ambiguous" id="S2.Ex6.m1.2.2.2.1.1.3.3.1.cmml" xref="S2.Ex6.m1.2.2.2.1.1.3.3">subscript</csymbol><ci id="S2.Ex6.m1.2.2.2.1.1.3.3.2.cmml" xref="S2.Ex6.m1.2.2.2.1.1.3.3.2">𝑆</ci><cn id="S2.Ex6.m1.2.2.2.1.1.3.3.3.cmml" type="integer" xref="S2.Ex6.m1.2.2.2.1.1.3.3.3">1</cn></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" 
id="S2.Ex6.m1.2c">|\theta_{u1}|=|\theta_{h}-\theta_{S_{1}}|</annotation><annotation encoding="application/x-llamapun" id="S2.Ex6.m1.2d">| italic_θ start_POSTSUBSCRIPT italic_u 1 end_POSTSUBSCRIPT | = | italic_θ start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT - italic_θ start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT |</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> </tr></tbody> </table> <table class="ltx_equation ltx_eqn_table" id="S2.Ex7"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="|\theta_{u2}|=|\theta_{h}-\theta_{S_{2}}|" class="ltx_Math" display="block" id="S2.Ex7.m1.2"><semantics id="S2.Ex7.m1.2a"><mrow id="S2.Ex7.m1.2.2" xref="S2.Ex7.m1.2.2.cmml"><mrow id="S2.Ex7.m1.1.1.1.1" xref="S2.Ex7.m1.1.1.1.2.cmml"><mo id="S2.Ex7.m1.1.1.1.1.2" stretchy="false" xref="S2.Ex7.m1.1.1.1.2.1.cmml">|</mo><msub id="S2.Ex7.m1.1.1.1.1.1" xref="S2.Ex7.m1.1.1.1.1.1.cmml"><mi id="S2.Ex7.m1.1.1.1.1.1.2" xref="S2.Ex7.m1.1.1.1.1.1.2.cmml">θ</mi><mrow id="S2.Ex7.m1.1.1.1.1.1.3" xref="S2.Ex7.m1.1.1.1.1.1.3.cmml"><mi id="S2.Ex7.m1.1.1.1.1.1.3.2" xref="S2.Ex7.m1.1.1.1.1.1.3.2.cmml">u</mi><mo id="S2.Ex7.m1.1.1.1.1.1.3.1" xref="S2.Ex7.m1.1.1.1.1.1.3.1.cmml">⁢</mo><mn id="S2.Ex7.m1.1.1.1.1.1.3.3" xref="S2.Ex7.m1.1.1.1.1.1.3.3.cmml">2</mn></mrow></msub><mo id="S2.Ex7.m1.1.1.1.1.3" stretchy="false" xref="S2.Ex7.m1.1.1.1.2.1.cmml">|</mo></mrow><mo id="S2.Ex7.m1.2.2.3" xref="S2.Ex7.m1.2.2.3.cmml">=</mo><mrow id="S2.Ex7.m1.2.2.2.1" xref="S2.Ex7.m1.2.2.2.2.cmml"><mo id="S2.Ex7.m1.2.2.2.1.2" stretchy="false" xref="S2.Ex7.m1.2.2.2.2.1.cmml">|</mo><mrow id="S2.Ex7.m1.2.2.2.1.1" xref="S2.Ex7.m1.2.2.2.1.1.cmml"><msub id="S2.Ex7.m1.2.2.2.1.1.2" xref="S2.Ex7.m1.2.2.2.1.1.2.cmml"><mi id="S2.Ex7.m1.2.2.2.1.1.2.2" xref="S2.Ex7.m1.2.2.2.1.1.2.2.cmml">θ</mi><mi id="S2.Ex7.m1.2.2.2.1.1.2.3" 
xref="S2.Ex7.m1.2.2.2.1.1.2.3.cmml">h</mi></msub><mo id="S2.Ex7.m1.2.2.2.1.1.1" xref="S2.Ex7.m1.2.2.2.1.1.1.cmml">−</mo><msub id="S2.Ex7.m1.2.2.2.1.1.3" xref="S2.Ex7.m1.2.2.2.1.1.3.cmml"><mi id="S2.Ex7.m1.2.2.2.1.1.3.2" xref="S2.Ex7.m1.2.2.2.1.1.3.2.cmml">θ</mi><msub id="S2.Ex7.m1.2.2.2.1.1.3.3" xref="S2.Ex7.m1.2.2.2.1.1.3.3.cmml"><mi id="S2.Ex7.m1.2.2.2.1.1.3.3.2" xref="S2.Ex7.m1.2.2.2.1.1.3.3.2.cmml">S</mi><mn id="S2.Ex7.m1.2.2.2.1.1.3.3.3" xref="S2.Ex7.m1.2.2.2.1.1.3.3.3.cmml">2</mn></msub></msub></mrow><mo id="S2.Ex7.m1.2.2.2.1.3" stretchy="false" xref="S2.Ex7.m1.2.2.2.2.1.cmml">|</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.Ex7.m1.2b"><apply id="S2.Ex7.m1.2.2.cmml" xref="S2.Ex7.m1.2.2"><eq id="S2.Ex7.m1.2.2.3.cmml" xref="S2.Ex7.m1.2.2.3"></eq><apply id="S2.Ex7.m1.1.1.1.2.cmml" xref="S2.Ex7.m1.1.1.1.1"><abs id="S2.Ex7.m1.1.1.1.2.1.cmml" xref="S2.Ex7.m1.1.1.1.1.2"></abs><apply id="S2.Ex7.m1.1.1.1.1.1.cmml" xref="S2.Ex7.m1.1.1.1.1.1"><csymbol cd="ambiguous" id="S2.Ex7.m1.1.1.1.1.1.1.cmml" xref="S2.Ex7.m1.1.1.1.1.1">subscript</csymbol><ci id="S2.Ex7.m1.1.1.1.1.1.2.cmml" xref="S2.Ex7.m1.1.1.1.1.1.2">𝜃</ci><apply id="S2.Ex7.m1.1.1.1.1.1.3.cmml" xref="S2.Ex7.m1.1.1.1.1.1.3"><times id="S2.Ex7.m1.1.1.1.1.1.3.1.cmml" xref="S2.Ex7.m1.1.1.1.1.1.3.1"></times><ci id="S2.Ex7.m1.1.1.1.1.1.3.2.cmml" xref="S2.Ex7.m1.1.1.1.1.1.3.2">𝑢</ci><cn id="S2.Ex7.m1.1.1.1.1.1.3.3.cmml" type="integer" xref="S2.Ex7.m1.1.1.1.1.1.3.3">2</cn></apply></apply></apply><apply id="S2.Ex7.m1.2.2.2.2.cmml" xref="S2.Ex7.m1.2.2.2.1"><abs id="S2.Ex7.m1.2.2.2.2.1.cmml" xref="S2.Ex7.m1.2.2.2.1.2"></abs><apply id="S2.Ex7.m1.2.2.2.1.1.cmml" xref="S2.Ex7.m1.2.2.2.1.1"><minus id="S2.Ex7.m1.2.2.2.1.1.1.cmml" xref="S2.Ex7.m1.2.2.2.1.1.1"></minus><apply id="S2.Ex7.m1.2.2.2.1.1.2.cmml" xref="S2.Ex7.m1.2.2.2.1.1.2"><csymbol cd="ambiguous" id="S2.Ex7.m1.2.2.2.1.1.2.1.cmml" xref="S2.Ex7.m1.2.2.2.1.1.2">subscript</csymbol><ci id="S2.Ex7.m1.2.2.2.1.1.2.2.cmml" 
xref="S2.Ex7.m1.2.2.2.1.1.2.2">𝜃</ci><ci id="S2.Ex7.m1.2.2.2.1.1.2.3.cmml" xref="S2.Ex7.m1.2.2.2.1.1.2.3">ℎ</ci></apply><apply id="S2.Ex7.m1.2.2.2.1.1.3.cmml" xref="S2.Ex7.m1.2.2.2.1.1.3"><csymbol cd="ambiguous" id="S2.Ex7.m1.2.2.2.1.1.3.1.cmml" xref="S2.Ex7.m1.2.2.2.1.1.3">subscript</csymbol><ci id="S2.Ex7.m1.2.2.2.1.1.3.2.cmml" xref="S2.Ex7.m1.2.2.2.1.1.3.2">𝜃</ci><apply id="S2.Ex7.m1.2.2.2.1.1.3.3.cmml" xref="S2.Ex7.m1.2.2.2.1.1.3.3"><csymbol cd="ambiguous" id="S2.Ex7.m1.2.2.2.1.1.3.3.1.cmml" xref="S2.Ex7.m1.2.2.2.1.1.3.3">subscript</csymbol><ci id="S2.Ex7.m1.2.2.2.1.1.3.3.2.cmml" xref="S2.Ex7.m1.2.2.2.1.1.3.3.2">𝑆</ci><cn id="S2.Ex7.m1.2.2.2.1.1.3.3.3.cmml" type="integer" xref="S2.Ex7.m1.2.2.2.1.1.3.3.3">2</cn></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.Ex7.m1.2c">|\theta_{u2}|=|\theta_{h}-\theta_{S_{2}}|</annotation><annotation encoding="application/x-llamapun" id="S2.Ex7.m1.2d">| italic_θ start_POSTSUBSCRIPT italic_u 2 end_POSTSUBSCRIPT | = | italic_θ start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT - italic_θ start_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT |</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> </tr></tbody> </table> </div> <div class="ltx_listingline" id="alg1.l10"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l10.1.1.1" style="font-size:80%;">10:</span></span> <span class="ltx_text ltx_font_bold" id="alg1.l10.2">Select the speaker:</span> </div> <div class="ltx_listingline" id="alg1.l11"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l11.1.1.1" style="font-size:80%;">11:</span></span> <span class="ltx_text ltx_font_bold" id="alg1.l11.2">if</span> <math alttext="|\theta_{u1}|<|\theta_{u2}|" class="ltx_Math" display="inline" id="alg1.l11.m1.2"><semantics id="alg1.l11.m1.2a"><mrow id="alg1.l11.m1.2.2" xref="alg1.l11.m1.2.2.cmml"><mrow 
id="alg1.l11.m1.1.1.1.1" xref="alg1.l11.m1.1.1.1.2.cmml"><mo id="alg1.l11.m1.1.1.1.1.2" stretchy="false" xref="alg1.l11.m1.1.1.1.2.1.cmml">|</mo><msub id="alg1.l11.m1.1.1.1.1.1" xref="alg1.l11.m1.1.1.1.1.1.cmml"><mi id="alg1.l11.m1.1.1.1.1.1.2" xref="alg1.l11.m1.1.1.1.1.1.2.cmml">θ</mi><mrow id="alg1.l11.m1.1.1.1.1.1.3" xref="alg1.l11.m1.1.1.1.1.1.3.cmml"><mi id="alg1.l11.m1.1.1.1.1.1.3.2" xref="alg1.l11.m1.1.1.1.1.1.3.2.cmml">u</mi><mo id="alg1.l11.m1.1.1.1.1.1.3.1" xref="alg1.l11.m1.1.1.1.1.1.3.1.cmml">⁢</mo><mn id="alg1.l11.m1.1.1.1.1.1.3.3" xref="alg1.l11.m1.1.1.1.1.1.3.3.cmml">1</mn></mrow></msub><mo id="alg1.l11.m1.1.1.1.1.3" stretchy="false" xref="alg1.l11.m1.1.1.1.2.1.cmml">|</mo></mrow><mo id="alg1.l11.m1.2.2.3" xref="alg1.l11.m1.2.2.3.cmml"><</mo><mrow id="alg1.l11.m1.2.2.2.1" xref="alg1.l11.m1.2.2.2.2.cmml"><mo id="alg1.l11.m1.2.2.2.1.2" stretchy="false" xref="alg1.l11.m1.2.2.2.2.1.cmml">|</mo><msub id="alg1.l11.m1.2.2.2.1.1" xref="alg1.l11.m1.2.2.2.1.1.cmml"><mi id="alg1.l11.m1.2.2.2.1.1.2" xref="alg1.l11.m1.2.2.2.1.1.2.cmml">θ</mi><mrow id="alg1.l11.m1.2.2.2.1.1.3" xref="alg1.l11.m1.2.2.2.1.1.3.cmml"><mi id="alg1.l11.m1.2.2.2.1.1.3.2" xref="alg1.l11.m1.2.2.2.1.1.3.2.cmml">u</mi><mo id="alg1.l11.m1.2.2.2.1.1.3.1" xref="alg1.l11.m1.2.2.2.1.1.3.1.cmml">⁢</mo><mn id="alg1.l11.m1.2.2.2.1.1.3.3" xref="alg1.l11.m1.2.2.2.1.1.3.3.cmml">2</mn></mrow></msub><mo id="alg1.l11.m1.2.2.2.1.3" stretchy="false" xref="alg1.l11.m1.2.2.2.2.1.cmml">|</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="alg1.l11.m1.2b"><apply id="alg1.l11.m1.2.2.cmml" xref="alg1.l11.m1.2.2"><lt id="alg1.l11.m1.2.2.3.cmml" xref="alg1.l11.m1.2.2.3"></lt><apply id="alg1.l11.m1.1.1.1.2.cmml" xref="alg1.l11.m1.1.1.1.1"><abs id="alg1.l11.m1.1.1.1.2.1.cmml" xref="alg1.l11.m1.1.1.1.1.2"></abs><apply id="alg1.l11.m1.1.1.1.1.1.cmml" xref="alg1.l11.m1.1.1.1.1.1"><csymbol cd="ambiguous" id="alg1.l11.m1.1.1.1.1.1.1.cmml" xref="alg1.l11.m1.1.1.1.1.1">subscript</csymbol><ci 
id="alg1.l11.m1.1.1.1.1.1.2.cmml" xref="alg1.l11.m1.1.1.1.1.1.2">𝜃</ci><apply id="alg1.l11.m1.1.1.1.1.1.3.cmml" xref="alg1.l11.m1.1.1.1.1.1.3"><times id="alg1.l11.m1.1.1.1.1.1.3.1.cmml" xref="alg1.l11.m1.1.1.1.1.1.3.1"></times><ci id="alg1.l11.m1.1.1.1.1.1.3.2.cmml" xref="alg1.l11.m1.1.1.1.1.1.3.2">𝑢</ci><cn id="alg1.l11.m1.1.1.1.1.1.3.3.cmml" type="integer" xref="alg1.l11.m1.1.1.1.1.1.3.3">1</cn></apply></apply></apply><apply id="alg1.l11.m1.2.2.2.2.cmml" xref="alg1.l11.m1.2.2.2.1"><abs id="alg1.l11.m1.2.2.2.2.1.cmml" xref="alg1.l11.m1.2.2.2.1.2"></abs><apply id="alg1.l11.m1.2.2.2.1.1.cmml" xref="alg1.l11.m1.2.2.2.1.1"><csymbol cd="ambiguous" id="alg1.l11.m1.2.2.2.1.1.1.cmml" xref="alg1.l11.m1.2.2.2.1.1">subscript</csymbol><ci id="alg1.l11.m1.2.2.2.1.1.2.cmml" xref="alg1.l11.m1.2.2.2.1.1.2">𝜃</ci><apply id="alg1.l11.m1.2.2.2.1.1.3.cmml" xref="alg1.l11.m1.2.2.2.1.1.3"><times id="alg1.l11.m1.2.2.2.1.1.3.1.cmml" xref="alg1.l11.m1.2.2.2.1.1.3.1"></times><ci id="alg1.l11.m1.2.2.2.1.1.3.2.cmml" xref="alg1.l11.m1.2.2.2.1.1.3.2">𝑢</ci><cn id="alg1.l11.m1.2.2.2.1.1.3.3.cmml" type="integer" xref="alg1.l11.m1.2.2.2.1.1.3.3">2</cn></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="alg1.l11.m1.2c">|\theta_{u1}|<|\theta_{u2}|</annotation><annotation encoding="application/x-llamapun" id="alg1.l11.m1.2d">| italic_θ start_POSTSUBSCRIPT italic_u 1 end_POSTSUBSCRIPT | < | italic_θ start_POSTSUBSCRIPT italic_u 2 end_POSTSUBSCRIPT |</annotation></semantics></math> <span class="ltx_text ltx_font_bold" id="alg1.l11.3">then</span> </div> <div class="ltx_listingline" id="alg1.l12"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l12.2.1.1" style="font-size:80%;">12:</span></span> <span class="ltx_text ltx_font_bold" id="alg1.l12.3">Return</span> 1 <span class="ltx_text" id="alg1.l12.1" style="float:right;"><math alttext="\triangleright" class="ltx_Math" display="inline" id="alg1.l12.1.m1.1"><semantics 
id="alg1.l12.1.m1.1a"><mo id="alg1.l12.1.m1.1.1" xref="alg1.l12.1.m1.1.1.cmml">▷</mo><annotation-xml encoding="MathML-Content" id="alg1.l12.1.m1.1b"><ci id="alg1.l12.1.m1.1.1.cmml" xref="alg1.l12.1.m1.1.1">▷</ci></annotation-xml><annotation encoding="application/x-tex" id="alg1.l12.1.m1.1c">\triangleright</annotation><annotation encoding="application/x-llamapun" id="alg1.l12.1.m1.1d">▷</annotation></semantics></math> Speaker 1 is desired </span> </div> <div class="ltx_listingline" id="alg1.l13"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l13.1.1.1" style="font-size:80%;">13:</span></span> <span class="ltx_text ltx_font_bold" id="alg1.l13.2">else</span> </div> <div class="ltx_listingline" id="alg1.l14"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l14.2.1.1" style="font-size:80%;">14:</span></span> <span class="ltx_text ltx_font_bold" id="alg1.l14.3">Return</span> 2 <span class="ltx_text" id="alg1.l14.1" style="float:right;"><math alttext="\triangleright" class="ltx_Math" display="inline" id="alg1.l14.1.m1.1"><semantics id="alg1.l14.1.m1.1a"><mo id="alg1.l14.1.m1.1.1" xref="alg1.l14.1.m1.1.1.cmml">▷</mo><annotation-xml encoding="MathML-Content" id="alg1.l14.1.m1.1b"><ci id="alg1.l14.1.m1.1.1.cmml" xref="alg1.l14.1.m1.1.1">▷</ci></annotation-xml><annotation encoding="application/x-tex" id="alg1.l14.1.m1.1c">\triangleright</annotation><annotation encoding="application/x-llamapun" id="alg1.l14.1.m1.1d">▷</annotation></semantics></math> Speaker 2 is desired </span> </div> <div class="ltx_listingline" id="alg1.l15"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l15.1.1.1" style="font-size:80%;">15:</span></span> <span class="ltx_text ltx_font_bold" id="alg1.l15.2">end</span> <span class="ltx_text ltx_font_bold" id="alg1.l15.3">if</span> </div> <div class="ltx_listingline" id="alg1.l16"> <span class="ltx_tag ltx_tag_listingline"><span class="ltx_text" id="alg1.l16.1.1.1" 
style="font-size:80%;">16:</span></span><span class="ltx_text ltx_font_bold" id="alg1.l16.2">end</span> <span class="ltx_text ltx_font_bold" id="alg1.l16.3">procedure</span> </div> </div> </figure> </section> <section class="ltx_section" id="S3"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">III </span><span class="ltx_text ltx_font_smallcaps" id="S3.1.1">Speaker selection mechanism</span> </h2> <div class="ltx_para" id="S3.p1"> <p class="ltx_p" id="S3.p1.1">We propose a speaker selection mechanism (SSM) that allows a neural network to learn which speaker is desired and to beamform toward that speaker. This approach can be applied to an NN that estimates the steering vector of a classical beamforming algorithm, such as the MVDR filter <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib7" title="">7</a>]</cite>; estimates the position of the target speaker <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib9" title="">9</a>]</cite>; or estimates a time-frequency filter that is applied to the microphones’ outputs via a filter-and-sum operation <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib6" title="">6</a>]</cite>, among other uses. In this work, we validate the proposed mechanism with an end-to-end neural network that estimates a multi-channel time-frequency beamforming filter.</p> </div> <div class="ltx_para" id="S3.p2"> <p class="ltx_p" id="S3.p2.1">Our method is inspired by the findings of <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib3" title="">3</a>]</cite>, where it was noted that even though the eyes of the listener may precisely follow the speaker of interest, the head usually undershoots the desired speaker. 
This is of crucial importance for applications where no visual or auxiliary cues are available, as is the case for most hearing aid devices.</p> </div> <div class="ltx_para" id="S3.p3"> <p class="ltx_p" id="S3.p3.1">Nevertheless, we assume that we have access to all microphones’ outputs in the array, and that the positions of the listener and the speakers are known during training. The SSM works as follows. For each training utterance, we calculate the absolute value of the undershot angle for every speaker. We then identify the speaker with the smallest absolute undershot angle and set it as the desired speaker. The desired speaker is then used as the target when calculating the training loss for that specific utterance, so the target speaker in the loss function changes dynamically according to the smallest undershot angle. We use the smallest undershot angle as the criterion for switching the desired speaker; the movement of the head could be explored further, but this is out of scope for this paper. Alg. <a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#alg1" title="Algorithm 1 ‣ II Preliminaries ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_tag">1</span></a> details the speaker selection mechanism for an example situation with two speakers. Note that, in inference mode, there is no need to provide any position information. 
The NN trained with the proposed mechanism is able to beamform toward the desired speaker using audio information alone.</p> </div> <div class="ltx_para" id="S3.p4"> <p class="ltx_p" id="S3.p4.1">Unlike <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib10" title="">10</a>]</cite>, our approach does not require the listener to face the target at any moment, and only one neural network is used. Visual cues, such as those considered in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib7" title="">7</a>]</cite>, are not taken into account in our method since we assume that the neural network can obtain spatial information based only on audio features. Next, we describe the model and simulation framework for evaluating the proposed SSM.</p> </div> </section> <section class="ltx_section" id="S4"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">IV </span><span class="ltx_text ltx_font_smallcaps" id="S4.1.1">Model and simulation framework</span> </h2> <div class="ltx_para" id="S4.p1"> <p class="ltx_p" id="S4.p1.1">We evaluate the SSM with an end-to-end neural network beamforming system, as shown in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S2.F2" title="Figure 2 ‣ II Preliminaries ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_tag">2</span></a>. A simulation environment outputs multi-microphone recordings, which are preprocessed and fed into the NN model in the time-frequency domain. 
The output of the neural network consists of a complex multi-channel filter <math alttext="H_{m}(t,f),\ \forall m\in[1,\ ...,\ M]" class="ltx_Math" display="inline" id="S4.p1.1.m1.7"><semantics id="S4.p1.1.m1.7a"><mrow id="S4.p1.1.m1.7.7" xref="S4.p1.1.m1.7.7.cmml"><mrow id="S4.p1.1.m1.7.7.2.2" xref="S4.p1.1.m1.7.7.2.3.cmml"><mrow id="S4.p1.1.m1.6.6.1.1.1" xref="S4.p1.1.m1.6.6.1.1.1.cmml"><msub id="S4.p1.1.m1.6.6.1.1.1.2" xref="S4.p1.1.m1.6.6.1.1.1.2.cmml"><mi id="S4.p1.1.m1.6.6.1.1.1.2.2" xref="S4.p1.1.m1.6.6.1.1.1.2.2.cmml">H</mi><mi id="S4.p1.1.m1.6.6.1.1.1.2.3" xref="S4.p1.1.m1.6.6.1.1.1.2.3.cmml">m</mi></msub><mo id="S4.p1.1.m1.6.6.1.1.1.1" xref="S4.p1.1.m1.6.6.1.1.1.1.cmml">⁢</mo><mrow id="S4.p1.1.m1.6.6.1.1.1.3.2" xref="S4.p1.1.m1.6.6.1.1.1.3.1.cmml"><mo id="S4.p1.1.m1.6.6.1.1.1.3.2.1" stretchy="false" xref="S4.p1.1.m1.6.6.1.1.1.3.1.cmml">(</mo><mi id="S4.p1.1.m1.1.1" xref="S4.p1.1.m1.1.1.cmml">t</mi><mo id="S4.p1.1.m1.6.6.1.1.1.3.2.2" xref="S4.p1.1.m1.6.6.1.1.1.3.1.cmml">,</mo><mi id="S4.p1.1.m1.2.2" xref="S4.p1.1.m1.2.2.cmml">f</mi><mo id="S4.p1.1.m1.6.6.1.1.1.3.2.3" stretchy="false" xref="S4.p1.1.m1.6.6.1.1.1.3.1.cmml">)</mo></mrow></mrow><mo id="S4.p1.1.m1.7.7.2.2.3" rspace="0.667em" xref="S4.p1.1.m1.7.7.2.3.cmml">,</mo><mrow id="S4.p1.1.m1.7.7.2.2.2" xref="S4.p1.1.m1.7.7.2.2.2.cmml"><mo id="S4.p1.1.m1.7.7.2.2.2.1" rspace="0.167em" xref="S4.p1.1.m1.7.7.2.2.2.1.cmml">∀</mo><mi id="S4.p1.1.m1.7.7.2.2.2.2" xref="S4.p1.1.m1.7.7.2.2.2.2.cmml">m</mi></mrow></mrow><mo id="S4.p1.1.m1.7.7.3" xref="S4.p1.1.m1.7.7.3.cmml">∈</mo><mrow id="S4.p1.1.m1.7.7.4.2" xref="S4.p1.1.m1.7.7.4.1.cmml"><mo id="S4.p1.1.m1.7.7.4.2.1" stretchy="false" xref="S4.p1.1.m1.7.7.4.1.cmml">[</mo><mn id="S4.p1.1.m1.3.3" xref="S4.p1.1.m1.3.3.cmml">1</mn><mo id="S4.p1.1.m1.7.7.4.2.2" rspace="0.667em" xref="S4.p1.1.m1.7.7.4.1.cmml">,</mo><mi id="S4.p1.1.m1.4.4" mathvariant="normal" xref="S4.p1.1.m1.4.4.cmml">…</mi><mo id="S4.p1.1.m1.7.7.4.2.3" rspace="0.667em" 
xref="S4.p1.1.m1.7.7.4.1.cmml">,</mo><mi id="S4.p1.1.m1.5.5" xref="S4.p1.1.m1.5.5.cmml">M</mi><mo id="S4.p1.1.m1.7.7.4.2.4" stretchy="false" xref="S4.p1.1.m1.7.7.4.1.cmml">]</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S4.p1.1.m1.7b"><apply id="S4.p1.1.m1.7.7.cmml" xref="S4.p1.1.m1.7.7"><in id="S4.p1.1.m1.7.7.3.cmml" xref="S4.p1.1.m1.7.7.3"></in><list id="S4.p1.1.m1.7.7.2.3.cmml" xref="S4.p1.1.m1.7.7.2.2"><apply id="S4.p1.1.m1.6.6.1.1.1.cmml" xref="S4.p1.1.m1.6.6.1.1.1"><times id="S4.p1.1.m1.6.6.1.1.1.1.cmml" xref="S4.p1.1.m1.6.6.1.1.1.1"></times><apply id="S4.p1.1.m1.6.6.1.1.1.2.cmml" xref="S4.p1.1.m1.6.6.1.1.1.2"><csymbol cd="ambiguous" id="S4.p1.1.m1.6.6.1.1.1.2.1.cmml" xref="S4.p1.1.m1.6.6.1.1.1.2">subscript</csymbol><ci id="S4.p1.1.m1.6.6.1.1.1.2.2.cmml" xref="S4.p1.1.m1.6.6.1.1.1.2.2">𝐻</ci><ci id="S4.p1.1.m1.6.6.1.1.1.2.3.cmml" xref="S4.p1.1.m1.6.6.1.1.1.2.3">𝑚</ci></apply><interval closure="open" id="S4.p1.1.m1.6.6.1.1.1.3.1.cmml" xref="S4.p1.1.m1.6.6.1.1.1.3.2"><ci id="S4.p1.1.m1.1.1.cmml" xref="S4.p1.1.m1.1.1">𝑡</ci><ci id="S4.p1.1.m1.2.2.cmml" xref="S4.p1.1.m1.2.2">𝑓</ci></interval></apply><apply id="S4.p1.1.m1.7.7.2.2.2.cmml" xref="S4.p1.1.m1.7.7.2.2.2"><csymbol cd="latexml" id="S4.p1.1.m1.7.7.2.2.2.1.cmml" xref="S4.p1.1.m1.7.7.2.2.2.1">for-all</csymbol><ci id="S4.p1.1.m1.7.7.2.2.2.2.cmml" xref="S4.p1.1.m1.7.7.2.2.2.2">𝑚</ci></apply></list><list id="S4.p1.1.m1.7.7.4.1.cmml" xref="S4.p1.1.m1.7.7.4.2"><cn id="S4.p1.1.m1.3.3.cmml" type="integer" xref="S4.p1.1.m1.3.3">1</cn><ci id="S4.p1.1.m1.4.4.cmml" xref="S4.p1.1.m1.4.4">…</ci><ci id="S4.p1.1.m1.5.5.cmml" xref="S4.p1.1.m1.5.5">𝑀</ci></list></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.p1.1.m1.7c">H_{m}(t,f),\ \forall m\in[1,\ ...,\ M]</annotation><annotation encoding="application/x-llamapun" id="S4.p1.1.m1.7d">italic_H start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( italic_t , italic_f ) , ∀ italic_m ∈ [ 1 , … , italic_M ]</annotation></semantics></math>, 
applied to the microphone recordings with a filter-and-sum operation, as</p> <table class="ltx_equation ltx_eqn_table" id="S4.E2"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="\hat{S}_{d}(t,f)=\sum_{m}Y_{m}(t,f)\cdot H_{m}(t,f)." class="ltx_Math" display="block" id="S4.E2.m1.7"><semantics id="S4.E2.m1.7a"><mrow id="S4.E2.m1.7.7.1" xref="S4.E2.m1.7.7.1.1.cmml"><mrow id="S4.E2.m1.7.7.1.1" xref="S4.E2.m1.7.7.1.1.cmml"><mrow id="S4.E2.m1.7.7.1.1.2" xref="S4.E2.m1.7.7.1.1.2.cmml"><msub id="S4.E2.m1.7.7.1.1.2.2" xref="S4.E2.m1.7.7.1.1.2.2.cmml"><mover accent="true" id="S4.E2.m1.7.7.1.1.2.2.2" xref="S4.E2.m1.7.7.1.1.2.2.2.cmml"><mi id="S4.E2.m1.7.7.1.1.2.2.2.2" xref="S4.E2.m1.7.7.1.1.2.2.2.2.cmml">S</mi><mo id="S4.E2.m1.7.7.1.1.2.2.2.1" xref="S4.E2.m1.7.7.1.1.2.2.2.1.cmml">^</mo></mover><mi id="S4.E2.m1.7.7.1.1.2.2.3" xref="S4.E2.m1.7.7.1.1.2.2.3.cmml">d</mi></msub><mo id="S4.E2.m1.7.7.1.1.2.1" xref="S4.E2.m1.7.7.1.1.2.1.cmml">⁢</mo><mrow id="S4.E2.m1.7.7.1.1.2.3.2" xref="S4.E2.m1.7.7.1.1.2.3.1.cmml"><mo id="S4.E2.m1.7.7.1.1.2.3.2.1" stretchy="false" xref="S4.E2.m1.7.7.1.1.2.3.1.cmml">(</mo><mi id="S4.E2.m1.1.1" xref="S4.E2.m1.1.1.cmml">t</mi><mo id="S4.E2.m1.7.7.1.1.2.3.2.2" xref="S4.E2.m1.7.7.1.1.2.3.1.cmml">,</mo><mi id="S4.E2.m1.2.2" xref="S4.E2.m1.2.2.cmml">f</mi><mo id="S4.E2.m1.7.7.1.1.2.3.2.3" stretchy="false" xref="S4.E2.m1.7.7.1.1.2.3.1.cmml">)</mo></mrow></mrow><mo id="S4.E2.m1.7.7.1.1.1" rspace="0.111em" xref="S4.E2.m1.7.7.1.1.1.cmml">=</mo><mrow id="S4.E2.m1.7.7.1.1.3" xref="S4.E2.m1.7.7.1.1.3.cmml"><munder id="S4.E2.m1.7.7.1.1.3.1" xref="S4.E2.m1.7.7.1.1.3.1.cmml"><mo id="S4.E2.m1.7.7.1.1.3.1.2" movablelimits="false" xref="S4.E2.m1.7.7.1.1.3.1.2.cmml">∑</mo><mi id="S4.E2.m1.7.7.1.1.3.1.3" xref="S4.E2.m1.7.7.1.1.3.1.3.cmml">m</mi></munder><mrow id="S4.E2.m1.7.7.1.1.3.2" xref="S4.E2.m1.7.7.1.1.3.2.cmml"><mrow 
id="S4.E2.m1.7.7.1.1.3.2.2" xref="S4.E2.m1.7.7.1.1.3.2.2.cmml"><mrow id="S4.E2.m1.7.7.1.1.3.2.2.2" xref="S4.E2.m1.7.7.1.1.3.2.2.2.cmml"><msub id="S4.E2.m1.7.7.1.1.3.2.2.2.2" xref="S4.E2.m1.7.7.1.1.3.2.2.2.2.cmml"><mi id="S4.E2.m1.7.7.1.1.3.2.2.2.2.2" xref="S4.E2.m1.7.7.1.1.3.2.2.2.2.2.cmml">Y</mi><mi id="S4.E2.m1.7.7.1.1.3.2.2.2.2.3" xref="S4.E2.m1.7.7.1.1.3.2.2.2.2.3.cmml">m</mi></msub><mo id="S4.E2.m1.7.7.1.1.3.2.2.2.1" xref="S4.E2.m1.7.7.1.1.3.2.2.2.1.cmml">⁢</mo><mrow id="S4.E2.m1.7.7.1.1.3.2.2.2.3.2" xref="S4.E2.m1.7.7.1.1.3.2.2.2.3.1.cmml"><mo id="S4.E2.m1.7.7.1.1.3.2.2.2.3.2.1" stretchy="false" xref="S4.E2.m1.7.7.1.1.3.2.2.2.3.1.cmml">(</mo><mi id="S4.E2.m1.3.3" xref="S4.E2.m1.3.3.cmml">t</mi><mo id="S4.E2.m1.7.7.1.1.3.2.2.2.3.2.2" xref="S4.E2.m1.7.7.1.1.3.2.2.2.3.1.cmml">,</mo><mi id="S4.E2.m1.4.4" xref="S4.E2.m1.4.4.cmml">f</mi><mo id="S4.E2.m1.7.7.1.1.3.2.2.2.3.2.3" rspace="0.055em" stretchy="false" xref="S4.E2.m1.7.7.1.1.3.2.2.2.3.1.cmml">)</mo></mrow></mrow><mo id="S4.E2.m1.7.7.1.1.3.2.2.1" rspace="0.222em" xref="S4.E2.m1.7.7.1.1.3.2.2.1.cmml">⋅</mo><msub id="S4.E2.m1.7.7.1.1.3.2.2.3" xref="S4.E2.m1.7.7.1.1.3.2.2.3.cmml"><mi id="S4.E2.m1.7.7.1.1.3.2.2.3.2" xref="S4.E2.m1.7.7.1.1.3.2.2.3.2.cmml">H</mi><mi id="S4.E2.m1.7.7.1.1.3.2.2.3.3" xref="S4.E2.m1.7.7.1.1.3.2.2.3.3.cmml">m</mi></msub></mrow><mo id="S4.E2.m1.7.7.1.1.3.2.1" xref="S4.E2.m1.7.7.1.1.3.2.1.cmml">⁢</mo><mrow id="S4.E2.m1.7.7.1.1.3.2.3.2" xref="S4.E2.m1.7.7.1.1.3.2.3.1.cmml"><mo id="S4.E2.m1.7.7.1.1.3.2.3.2.1" stretchy="false" xref="S4.E2.m1.7.7.1.1.3.2.3.1.cmml">(</mo><mi id="S4.E2.m1.5.5" xref="S4.E2.m1.5.5.cmml">t</mi><mo id="S4.E2.m1.7.7.1.1.3.2.3.2.2" xref="S4.E2.m1.7.7.1.1.3.2.3.1.cmml">,</mo><mi id="S4.E2.m1.6.6" xref="S4.E2.m1.6.6.cmml">f</mi><mo id="S4.E2.m1.7.7.1.1.3.2.3.2.3" stretchy="false" xref="S4.E2.m1.7.7.1.1.3.2.3.1.cmml">)</mo></mrow></mrow></mrow></mrow><mo id="S4.E2.m1.7.7.1.2" lspace="0em" xref="S4.E2.m1.7.7.1.1.cmml">.</mo></mrow><annotation-xml 
encoding="MathML-Content" id="S4.E2.m1.7b"><apply id="S4.E2.m1.7.7.1.1.cmml" xref="S4.E2.m1.7.7.1"><eq id="S4.E2.m1.7.7.1.1.1.cmml" xref="S4.E2.m1.7.7.1.1.1"></eq><apply id="S4.E2.m1.7.7.1.1.2.cmml" xref="S4.E2.m1.7.7.1.1.2"><times id="S4.E2.m1.7.7.1.1.2.1.cmml" xref="S4.E2.m1.7.7.1.1.2.1"></times><apply id="S4.E2.m1.7.7.1.1.2.2.cmml" xref="S4.E2.m1.7.7.1.1.2.2"><csymbol cd="ambiguous" id="S4.E2.m1.7.7.1.1.2.2.1.cmml" xref="S4.E2.m1.7.7.1.1.2.2">subscript</csymbol><apply id="S4.E2.m1.7.7.1.1.2.2.2.cmml" xref="S4.E2.m1.7.7.1.1.2.2.2"><ci id="S4.E2.m1.7.7.1.1.2.2.2.1.cmml" xref="S4.E2.m1.7.7.1.1.2.2.2.1">^</ci><ci id="S4.E2.m1.7.7.1.1.2.2.2.2.cmml" xref="S4.E2.m1.7.7.1.1.2.2.2.2">𝑆</ci></apply><ci id="S4.E2.m1.7.7.1.1.2.2.3.cmml" xref="S4.E2.m1.7.7.1.1.2.2.3">𝑑</ci></apply><interval closure="open" id="S4.E2.m1.7.7.1.1.2.3.1.cmml" xref="S4.E2.m1.7.7.1.1.2.3.2"><ci id="S4.E2.m1.1.1.cmml" xref="S4.E2.m1.1.1">𝑡</ci><ci id="S4.E2.m1.2.2.cmml" xref="S4.E2.m1.2.2">𝑓</ci></interval></apply><apply id="S4.E2.m1.7.7.1.1.3.cmml" xref="S4.E2.m1.7.7.1.1.3"><apply id="S4.E2.m1.7.7.1.1.3.1.cmml" xref="S4.E2.m1.7.7.1.1.3.1"><csymbol cd="ambiguous" id="S4.E2.m1.7.7.1.1.3.1.1.cmml" xref="S4.E2.m1.7.7.1.1.3.1">subscript</csymbol><sum id="S4.E2.m1.7.7.1.1.3.1.2.cmml" xref="S4.E2.m1.7.7.1.1.3.1.2"></sum><ci id="S4.E2.m1.7.7.1.1.3.1.3.cmml" xref="S4.E2.m1.7.7.1.1.3.1.3">𝑚</ci></apply><apply id="S4.E2.m1.7.7.1.1.3.2.cmml" xref="S4.E2.m1.7.7.1.1.3.2"><times id="S4.E2.m1.7.7.1.1.3.2.1.cmml" xref="S4.E2.m1.7.7.1.1.3.2.1"></times><apply id="S4.E2.m1.7.7.1.1.3.2.2.cmml" xref="S4.E2.m1.7.7.1.1.3.2.2"><ci id="S4.E2.m1.7.7.1.1.3.2.2.1.cmml" xref="S4.E2.m1.7.7.1.1.3.2.2.1">⋅</ci><apply id="S4.E2.m1.7.7.1.1.3.2.2.2.cmml" xref="S4.E2.m1.7.7.1.1.3.2.2.2"><times id="S4.E2.m1.7.7.1.1.3.2.2.2.1.cmml" xref="S4.E2.m1.7.7.1.1.3.2.2.2.1"></times><apply id="S4.E2.m1.7.7.1.1.3.2.2.2.2.cmml" xref="S4.E2.m1.7.7.1.1.3.2.2.2.2"><csymbol cd="ambiguous" id="S4.E2.m1.7.7.1.1.3.2.2.2.2.1.cmml" 
xref="S4.E2.m1.7.7.1.1.3.2.2.2.2">subscript</csymbol><ci id="S4.E2.m1.7.7.1.1.3.2.2.2.2.2.cmml" xref="S4.E2.m1.7.7.1.1.3.2.2.2.2.2">𝑌</ci><ci id="S4.E2.m1.7.7.1.1.3.2.2.2.2.3.cmml" xref="S4.E2.m1.7.7.1.1.3.2.2.2.2.3">𝑚</ci></apply><interval closure="open" id="S4.E2.m1.7.7.1.1.3.2.2.2.3.1.cmml" xref="S4.E2.m1.7.7.1.1.3.2.2.2.3.2"><ci id="S4.E2.m1.3.3.cmml" xref="S4.E2.m1.3.3">𝑡</ci><ci id="S4.E2.m1.4.4.cmml" xref="S4.E2.m1.4.4">𝑓</ci></interval></apply><apply id="S4.E2.m1.7.7.1.1.3.2.2.3.cmml" xref="S4.E2.m1.7.7.1.1.3.2.2.3"><csymbol cd="ambiguous" id="S4.E2.m1.7.7.1.1.3.2.2.3.1.cmml" xref="S4.E2.m1.7.7.1.1.3.2.2.3">subscript</csymbol><ci id="S4.E2.m1.7.7.1.1.3.2.2.3.2.cmml" xref="S4.E2.m1.7.7.1.1.3.2.2.3.2">𝐻</ci><ci id="S4.E2.m1.7.7.1.1.3.2.2.3.3.cmml" xref="S4.E2.m1.7.7.1.1.3.2.2.3.3">𝑚</ci></apply></apply><interval closure="open" id="S4.E2.m1.7.7.1.1.3.2.3.1.cmml" xref="S4.E2.m1.7.7.1.1.3.2.3.2"><ci id="S4.E2.m1.5.5.cmml" xref="S4.E2.m1.5.5">𝑡</ci><ci id="S4.E2.m1.6.6.cmml" xref="S4.E2.m1.6.6">𝑓</ci></interval></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.E2.m1.7c">\hat{S}_{d}(t,f)=\sum_{m}Y_{m}(t,f)\cdot H_{m}(t,f).</annotation><annotation encoding="application/x-llamapun" id="S4.E2.m1.7d">over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( italic_t , italic_f ) = ∑ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( italic_t , italic_f ) ⋅ italic_H start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( italic_t , italic_f ) .</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(2)</span></td> </tr></tbody> </table> <p class="ltx_p" id="S4.p1.2">The model description is given in the following.</p> </div> <section class="ltx_subsection" id="S4.SS1"> <h3 class="ltx_title 
ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S4.SS1.5.1.1">IV-A</span> </span><span class="ltx_text ltx_font_italic" id="S4.SS1.6.2">Audio beamforming model</span> </h3> <div class="ltx_para" id="S4.SS1.p1"> <p class="ltx_p" id="S4.SS1.p1.1">We consider an NN-based beamforming approach with filter-and-sum, similar to <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib6" title="">6</a>]</cite>, but we simplify the model by using only real-valued operations: the input is split into real and imaginary parts, which are concatenated along the frequency axis. Consequently, the output is recombined into a complex filter. To further reduce the model’s complexity, the convolutions are defined only along the frequency axis, as we did not observe a significant performance difference compared with kernels spanning both the frequency and time axes. The NN model is depicted in Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S4.F3" title="Figure 3 ‣ IV-A Audio beamforming model ‣ IV Model and simulation framework ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_tag">3</span></a>.</p> </div> <div class="ltx_para" id="S4.SS1.p2"> <p class="ltx_p" id="S4.SS1.p2.1">The model is trained to maximize the scale-invariant signal-to-distortion ratio (SI-SDR) of the filtered microphone outputs with respect to the desired speaker’s reverberant speech at the reference microphone. 
Unlike the scale-invariant signal-to-noise ratio used in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib6" title="">6</a>]</cite>, we consider the SI-SDR since it is a lower bound on both SDR and SNR <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib11" title="">11</a>]</cite>.</p> </div> <div class="ltx_para" id="S4.SS1.p3"> <p class="ltx_p" id="S4.SS1.p3.1">For comparison, we train the same model twice. First, it is trained with the SSM for speaker-aware beamforming. Second, it is trained without the proposed mechanism, by always setting a random speaker as the desired target, which creates an NN baseline for the task that we aim to solve: beamforming in multi-speaker scenarios with undershot angles. To the best of our knowledge, this is the first study to propose a solution for this task that neither requires the listener to face the speaker at any time nor relies on additional sensors. We also compare it to an ideal MVDR filter, which always beamforms in the target direction. This ideal MVDR corresponds to the best case for approaches in which the MVDR parameters are calculated with NNs, e.g., <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib7" title="">7</a>]</cite>.</p> </div> <figure class="ltx_figure" id="S4.F3"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="197" id="S4.F3.g1" src="x3.png" width="622"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S4.F3.10.5.1" style="font-size:90%;">Figure 3</span>: </span><span class="ltx_text" id="S4.F3.8.4" style="font-size:90%;">Schematic of the considered neural network model. The number in each layer indicates the number of output channels. 
The <math alttext="\mathrm{C}" class="ltx_Math" display="inline" id="S4.F3.5.1.m1.1"><semantics id="S4.F3.5.1.m1.1b"><mi id="S4.F3.5.1.m1.1.1" mathvariant="normal" xref="S4.F3.5.1.m1.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S4.F3.5.1.m1.1c"><ci id="S4.F3.5.1.m1.1.1.cmml" xref="S4.F3.5.1.m1.1.1">C</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.F3.5.1.m1.1d">\mathrm{C}</annotation><annotation encoding="application/x-llamapun" id="S4.F3.5.1.m1.1e">roman_C</annotation></semantics></math> encoder layers consist of Conv2D with BatchNorm2D and ReLU functions in all layers. An additional Tanh is applied to the encoder output to bound values, ensuring stable inputs for the recurrent layers. The <math alttext="\mathrm{G}" class="ltx_Math" display="inline" id="S4.F3.6.2.m2.1"><semantics id="S4.F3.6.2.m2.1b"><mi id="S4.F3.6.2.m2.1.1" mathvariant="normal" xref="S4.F3.6.2.m2.1.1.cmml">G</mi><annotation-xml encoding="MathML-Content" id="S4.F3.6.2.m2.1c"><ci id="S4.F3.6.2.m2.1.1.cmml" xref="S4.F3.6.2.m2.1.1">G</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.F3.6.2.m2.1d">\mathrm{G}</annotation><annotation encoding="application/x-llamapun" id="S4.F3.6.2.m2.1e">roman_G</annotation></semantics></math> layers are gated recurrent units (GRUs), and <math alttext="\mathrm{L}" class="ltx_Math" display="inline" id="S4.F3.7.3.m3.1"><semantics id="S4.F3.7.3.m3.1b"><mi id="S4.F3.7.3.m3.1.1" mathvariant="normal" xref="S4.F3.7.3.m3.1.1.cmml">L</mi><annotation-xml encoding="MathML-Content" id="S4.F3.7.3.m3.1c"><ci id="S4.F3.7.3.m3.1.1.cmml" xref="S4.F3.7.3.m3.1.1">L</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.F3.7.3.m3.1d">\mathrm{L}</annotation><annotation encoding="application/x-llamapun" id="S4.F3.7.3.m3.1e">roman_L</annotation></semantics></math> is a linear layer. The decoder layers are Conv2D.T, with BatchNorm2D and ReLU activation in all layers but the last, which has no normalization or activation. 
All <math alttext="\mathrm{C}" class="ltx_Math" display="inline" id="S4.F3.8.4.m4.1"><semantics id="S4.F3.8.4.m4.1b"><mi id="S4.F3.8.4.m4.1.1" mathvariant="normal" xref="S4.F3.8.4.m4.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S4.F3.8.4.m4.1c"><ci id="S4.F3.8.4.m4.1.1.cmml" xref="S4.F3.8.4.m4.1.1">C</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.F3.8.4.m4.1d">\mathrm{C}</annotation><annotation encoding="application/x-llamapun" id="S4.F3.8.4.m4.1e">roman_C</annotation></semantics></math> kernels are (8,1) with stride (2,1) and padding (3,0).</span></figcaption> </figure> </section> <section class="ltx_subsection" id="S4.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S4.SS2.5.1.1">IV-B</span> </span><span class="ltx_text ltx_font_italic" id="S4.SS2.6.2">Acoustic simulation setup</span> </h3> <div class="ltx_para" id="S4.SS2.p1"> <p class="ltx_p" id="S4.SS2.p1.1">For this evaluation, we simulate a reverberant room with four microphones and two speakers. First, the walls, floor and ceiling are defined, forming a rectangular room of size 5.15 × 3.75 × 2.65 m. Although the room size is fixed, the reverberation time (T60), i.e., the time it takes for the sound pressure level to decay by 60 dB, is drawn from a variable range for generality. The room impulse response (RIR) for each speaker in relation to each microphone is generated using gpuRIR <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib12" title="">12</a>]</cite>. We set up the simulation as described in the following.</p> </div> <div class="ltx_para" id="S4.SS2.p2"> <p class="ltx_p" id="S4.SS2.p2.1">The microphones are omnidirectional and are positioned as in a hearing aid device worn by a person. First, the position of the listener is (randomly) defined, and we assume a radius equal to 0.15 m, which is similar to the head breadth of an adult person. 
Two groups of two microphones are positioned, one at the east-most point and the other at the west-most point. The microphones within each group are separated by 0.50 cm and positioned at the same height, with the front left microphone taken as reference.</p> </div> <div class="ltx_para" id="S4.SS2.p3"> <p class="ltx_p" id="S4.SS2.p3.1">Moreover, the speakers are randomly positioned subject to a few constraints. The first constraint is that the speakers cannot be closer than 1.00 m to the listener, and they cannot be closer than 1.00 m to each other. At the moment of positioning, the absolute angle difference between the two speakers in relation to the listener must be at least 45 degrees, which prevents one speaker from being too close to or behind the other. Both listener and speaker positions must be at least twice the head breadth away from any wall. The listener and speaker points are positioned at a height ranging from 1.50 to 1.95 m, similar to most adult humans’ height. When all speakers and the listener are positioned, the head angle of the listener is defined by randomly rotating the center of the two groups of microphones in the azimuth direction, without exceeding a maximum undershot of 30 degrees. The maximum undershot constraint makes the simulation more realistic, as a listener would not look too far from the desired speaker.</p> </div> <div class="ltx_para" id="S4.SS2.p4"> <p class="ltx_p" id="S4.SS2.p4.1">Additionally, the signal-to-noise ratio (SNR), calculated with the mixed audio utterance (representing noise) against the desired-speaker-only utterance (representing signal), is randomly varied from -10 to 20 dB. The SNR is calculated considering the entire audio utterance, without filtering out silence periods. Note that speech traces are combined so that silence periods are minimal while the speech still sounds natural. 
Table <a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S4.T1" title="TABLE I ‣ IV-B Acoustic simulation setup ‣ IV Model and simulation framework ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_tag">I</span></a> summarizes the variable parameters in the simulation.</p> </div> <figure class="ltx_table" id="S4.T1"> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S4.T1.2.1.1" style="font-size:90%;">TABLE I</span>: </span><span class="ltx_text" id="S4.T1.3.2" style="font-size:90%;">Variable parameters and ranges for the acoustic simulation.</span></figcaption> <table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle" id="S4.T1.4"> <thead class="ltx_thead"> <tr class="ltx_tr" id="S4.T1.4.1.1"> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T1.4.1.1.1"><span class="ltx_text ltx_font_bold" id="S4.T1.4.1.1.1.1">Parameter</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T1.4.1.1.2"><span class="ltx_text ltx_font_bold" id="S4.T1.4.1.1.2.1">Min. value</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T1.4.1.1.3"><span class="ltx_text ltx_font_bold" id="S4.T1.4.1.1.3.1">Max. 
value</span></th> </tr> </thead> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S4.T1.4.2.1"> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T1.4.2.1.1">T60 (s)</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T1.4.2.1.2">0.20</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T1.4.2.1.3">1.00</td> </tr> <tr class="ltx_tr" id="S4.T1.4.3.2"> <td class="ltx_td ltx_align_center" id="S4.T1.4.3.2.1">SNR (dB)</td> <td class="ltx_td ltx_align_center" id="S4.T1.4.3.2.2">-10.00</td> <td class="ltx_td ltx_align_center" id="S4.T1.4.3.2.3">20.00</td> </tr> <tr class="ltx_tr" id="S4.T1.4.4.3"> <td class="ltx_td ltx_align_center" id="S4.T1.4.4.3.1">Listener/speaker height (m)</td> <td class="ltx_td ltx_align_center" id="S4.T1.4.4.3.2">1.50</td> <td class="ltx_td ltx_align_center" id="S4.T1.4.4.3.3">1.95</td> </tr> <tr class="ltx_tr" id="S4.T1.4.5.4"> <td class="ltx_td ltx_align_center ltx_border_b" id="S4.T1.4.5.4.1">Undershot angle (°)</td> <td class="ltx_td ltx_align_center ltx_border_b" id="S4.T1.4.5.4.2">-30</td> <td class="ltx_td ltx_align_center ltx_border_b" id="S4.T1.4.5.4.3">+30</td> </tr> </tbody> </table> </figure> </section> <section class="ltx_subsection" id="S4.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S4.SS3.5.1.1">IV-C</span> </span><span class="ltx_text ltx_font_italic" id="S4.SS3.6.2">Data</span> </h3> <div class="ltx_para" id="S4.SS3.p1"> <p class="ltx_p" id="S4.SS3.p1.1">We use the LibriTTS dataset <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib13" title="">13</a>]</cite> for the acoustic simulation. For each speaker, traces of speech are randomly selected and resampled to 16 kHz, combined until a duration of 10 seconds is reached, with a fade-in and fade-out of 0.05 to 0.20 seconds. Each speech trace is multiplied by a gain, randomly defined from -3 to 3 dB. 
Both speech utterances are adjusted to avoid clipping when combined. Each utterance is then convolved with the RIRs corresponding to that speaker and each microphone, which are obtained as described in Section <a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S4.SS2" title="IV-B Acoustic simulation setup ‣ IV Model and simulation framework ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_tag"><span class="ltx_text">IV-B</span></span></a>, according to (<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S2.E1" title="In II Preliminaries ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_tag">1</span></a>), resulting in the microphone outputs.</p> </div> <div class="ltx_para" id="S4.SS3.p2"> <p class="ltx_p" id="S4.SS3.p2.1">Moreover, a 256-point STFT is applied to the microphone outputs, with a Hann window of size 256 and a hop of 128 samples. The STFTs used as input to the neural network are normalized by their mean and standard deviation. Real and imaginary parts are then concatenated along the frequency axis, forming the input to the neural network. For training, the ‘train-clean-360’ subset of LibriTTS is used, with 360 hours of raw audio. 
The evaluation is performed on the ‘test-clean’ set, with approximately 8.6 hours of data.</p> </div> </section> </section> <section class="ltx_section" id="S5"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">V </span><span class="ltx_text ltx_font_smallcaps" id="S5.1.1">Results and discussion</span> </h2> <div class="ltx_para" id="S5.p1"> <p class="ltx_p" id="S5.p1.1">We train the neural network model described in Section <a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S4.SS1" title="IV-A Audio beamforming model ‣ IV Model and simulation framework ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_tag"><span class="ltx_text">IV-A</span></span></a> with and without the SSM proposed in Section <a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S3" title="III Speaker selection mechanism ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_tag">III</span></a>, for <math alttext="N=2" class="ltx_Math" display="inline" id="S5.p1.1.m1.1"><semantics id="S5.p1.1.m1.1a"><mrow id="S5.p1.1.m1.1.1" xref="S5.p1.1.m1.1.1.cmml"><mi id="S5.p1.1.m1.1.1.2" xref="S5.p1.1.m1.1.1.2.cmml">N</mi><mo id="S5.p1.1.m1.1.1.1" xref="S5.p1.1.m1.1.1.1.cmml">=</mo><mn id="S5.p1.1.m1.1.1.3" 
xref="S5.p1.1.m1.1.1.3.cmml">2</mn></mrow><annotation-xml encoding="MathML-Content" id="S5.p1.1.m1.1b"><apply id="S5.p1.1.m1.1.1.cmml" xref="S5.p1.1.m1.1.1"><eq id="S5.p1.1.m1.1.1.1.cmml" xref="S5.p1.1.m1.1.1.1"></eq><ci id="S5.p1.1.m1.1.1.2.cmml" xref="S5.p1.1.m1.1.1.2">𝑁</ci><cn id="S5.p1.1.m1.1.1.3.cmml" type="integer" xref="S5.p1.1.m1.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S5.p1.1.m1.1c">N=2</annotation><annotation encoding="application/x-llamapun" id="S5.p1.1.m1.1d">italic_N = 2</annotation></semantics></math> speakers, according to the acoustic parameters defined in Section <a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S4.SS2" title="IV-B Acoustic simulation setup ‣ IV Model and simulation framework ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_tag"><span class="ltx_text">IV-B</span></span></a>, with the data mentioned in Section <a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S4.SS3" title="IV-C Data ‣ IV Model and simulation framework ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_tag"><span class="ltx_text">IV-C</span></span></a>. 
We also consider the (ideal) MVDR filter as a baseline, implemented as proposed in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#bib.bib8" title="">8</a>]</cite>. At every utterance, the MVDR filter is calculated based on the reference microphone with access to all separate (reverberant) signals, i.e., always beamforming in the target speaker direction. In Table <a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S5.T2" title="TABLE II ‣ V Results and discussion ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_tag">II</span></a>, we show the average values over the ‘test-clean’ set of LibriTTS of short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and SI-SDR, for the mixed audio at the reference microphone and the filtered signals.</p> </div> <figure class="ltx_table" id="S5.T2"> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S5.T2.6.3.1" style="font-size:90%;">TABLE II</span>: </span><span class="ltx_text" id="S5.T2.4.2" style="font-size:90%;">Average STOI, PESQ, and SI-SDR over the ‘test-clean’ LibriTTS set for the mixed audio and the NN-filtered speech, trained with and without SSM for two speakers and evaluated for <math alttext="N=2" class="ltx_Math" display="inline" id="S5.T2.3.1.m1.1"><semantics id="S5.T2.3.1.m1.1b"><mrow id="S5.T2.3.1.m1.1.1" xref="S5.T2.3.1.m1.1.1.cmml"><mi id="S5.T2.3.1.m1.1.1.2" xref="S5.T2.3.1.m1.1.1.2.cmml">N</mi><mo id="S5.T2.3.1.m1.1.1.1" xref="S5.T2.3.1.m1.1.1.1.cmml">=</mo><mn 
id="S5.T2.3.1.m1.1.1.3" xref="S5.T2.3.1.m1.1.1.3.cmml">2</mn></mrow><annotation-xml encoding="MathML-Content" id="S5.T2.3.1.m1.1c"><apply id="S5.T2.3.1.m1.1.1.cmml" xref="S5.T2.3.1.m1.1.1"><eq id="S5.T2.3.1.m1.1.1.1.cmml" xref="S5.T2.3.1.m1.1.1.1"></eq><ci id="S5.T2.3.1.m1.1.1.2.cmml" xref="S5.T2.3.1.m1.1.1.2">𝑁</ci><cn id="S5.T2.3.1.m1.1.1.3.cmml" type="integer" xref="S5.T2.3.1.m1.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S5.T2.3.1.m1.1d">N=2</annotation><annotation encoding="application/x-llamapun" id="S5.T2.3.1.m1.1e">italic_N = 2</annotation></semantics></math> and <math alttext="N=3" class="ltx_Math" display="inline" id="S5.T2.4.2.m2.1"><semantics id="S5.T2.4.2.m2.1b"><mrow id="S5.T2.4.2.m2.1.1" xref="S5.T2.4.2.m2.1.1.cmml"><mi id="S5.T2.4.2.m2.1.1.2" xref="S5.T2.4.2.m2.1.1.2.cmml">N</mi><mo id="S5.T2.4.2.m2.1.1.1" xref="S5.T2.4.2.m2.1.1.1.cmml">=</mo><mn id="S5.T2.4.2.m2.1.1.3" xref="S5.T2.4.2.m2.1.1.3.cmml">3</mn></mrow><annotation-xml encoding="MathML-Content" id="S5.T2.4.2.m2.1c"><apply id="S5.T2.4.2.m2.1.1.cmml" xref="S5.T2.4.2.m2.1.1"><eq id="S5.T2.4.2.m2.1.1.1.cmml" xref="S5.T2.4.2.m2.1.1.1"></eq><ci id="S5.T2.4.2.m2.1.1.2.cmml" xref="S5.T2.4.2.m2.1.1.2">𝑁</ci><cn id="S5.T2.4.2.m2.1.1.3.cmml" type="integer" xref="S5.T2.4.2.m2.1.1.3">3</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S5.T2.4.2.m2.1d">N=3</annotation><annotation encoding="application/x-llamapun" id="S5.T2.4.2.m2.1e">italic_N = 3</annotation></semantics></math> speakers.</span></figcaption> <table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle" id="S5.T2.7"> <thead class="ltx_thead"> <tr class="ltx_tr" id="S5.T2.7.1.1"> <th class="ltx_td ltx_th ltx_th_row ltx_border_t" id="S5.T2.7.1.1.1"></th> <th class="ltx_td ltx_th ltx_th_column ltx_border_t" id="S5.T2.7.1.1.2"></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" colspan="3" id="S5.T2.7.1.1.3">-10 dB SNR</th> <th class="ltx_td 
ltx_align_center ltx_th ltx_th_column ltx_border_t" colspan="3" id="S5.T2.7.1.1.4">0 dB SNR</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" colspan="3" id="S5.T2.7.1.1.5">10 dB SNR</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" colspan="3" id="S5.T2.7.1.1.6">20 dB SNR</th> </tr> <tr class="ltx_tr" id="S5.T2.7.2.2"> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_th_row" id="S5.T2.7.2.2.1"><span class="ltx_text ltx_font_bold" id="S5.T2.7.2.2.1.1">N</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S5.T2.7.2.2.2"><span class="ltx_text ltx_font_bold" id="S5.T2.7.2.2.2.1">Method</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S5.T2.7.2.2.3"><span class="ltx_text ltx_font_bold" id="S5.T2.7.2.2.3.1">STOI</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S5.T2.7.2.2.4"><span class="ltx_text ltx_font_bold" id="S5.T2.7.2.2.4.1">PESQ</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S5.T2.7.2.2.5"><span class="ltx_text ltx_font_bold" id="S5.T2.7.2.2.5.1">SI-SDR</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S5.T2.7.2.2.6"><span class="ltx_text ltx_font_bold" id="S5.T2.7.2.2.6.1">STOI</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S5.T2.7.2.2.7"><span class="ltx_text ltx_font_bold" id="S5.T2.7.2.2.7.1">PESQ</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S5.T2.7.2.2.8"><span class="ltx_text ltx_font_bold" id="S5.T2.7.2.2.8.1">SI-SDR</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S5.T2.7.2.2.9"><span class="ltx_text ltx_font_bold" id="S5.T2.7.2.2.9.1">STOI</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S5.T2.7.2.2.10"><span class="ltx_text ltx_font_bold" id="S5.T2.7.2.2.10.1">PESQ</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S5.T2.7.2.2.11"><span class="ltx_text 
ltx_font_bold" id="S5.T2.7.2.2.11.1">SI-SDR</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S5.T2.7.2.2.12"><span class="ltx_text ltx_font_bold" id="S5.T2.7.2.2.12.1">STOI</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S5.T2.7.2.2.13"><span class="ltx_text ltx_font_bold" id="S5.T2.7.2.2.13.1">PESQ</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S5.T2.7.2.2.14"><span class="ltx_text ltx_font_bold" id="S5.T2.7.2.2.14.1">SI-SDR</span></th> </tr> </thead> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S5.T2.7.3.1"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t" id="S5.T2.7.3.1.1" rowspan="4"><span class="ltx_text" id="S5.T2.7.3.1.1.1">2</span></th> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.3.1.2"><span class="ltx_text" id="S5.T2.7.3.1.2.1" style="color:#404040;">None (mixed)</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.3.1.3"><span class="ltx_text" id="S5.T2.7.3.1.3.1" style="color:#404040;">0.384</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.3.1.4"><span class="ltx_text" id="S5.T2.7.3.1.4.1" style="color:#404040;">1.237</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.3.1.5"><span class="ltx_text" id="S5.T2.7.3.1.5.1" style="color:#404040;">-9.970</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.3.1.6"><span class="ltx_text" id="S5.T2.7.3.1.6.1" style="color:#404040;">0.634</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.3.1.7"><span class="ltx_text" id="S5.T2.7.3.1.7.1" style="color:#404040;">1.535</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.3.1.8"><span class="ltx_text" id="S5.T2.7.3.1.8.1" style="color:#404040;">0.032</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.3.1.9"><span class="ltx_text" id="S5.T2.7.3.1.9.1" style="color:#404040;">0.849</span></td> <td 
class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.3.1.10"><span class="ltx_text" id="S5.T2.7.3.1.10.1" style="color:#404040;">2.334</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.3.1.11"><span class="ltx_text" id="S5.T2.7.3.1.11.1" style="color:#404040;">10.031</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.3.1.12"><span class="ltx_text" id="S5.T2.7.3.1.12.1" style="color:#404040;">0.958</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.3.1.13"><span class="ltx_text" id="S5.T2.7.3.1.13.1" style="color:#404040;">3.431</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.3.1.14"><span class="ltx_text" id="S5.T2.7.3.1.14.1" style="color:#404040;">20.030</span></td> </tr> <tr class="ltx_tr" id="S5.T2.7.4.2"> <td class="ltx_td ltx_align_center" id="S5.T2.7.4.2.1">MVDR filter</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.4.2.2">0.447</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.4.2.3">1.322</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.4.2.4">-6.445</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.4.2.5">0.682</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.4.2.6">1.740</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.4.2.7">2.043</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.4.2.8">0.838</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.4.2.9">2.430</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.4.2.10">5.109</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.4.2.11">0.868</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.4.2.12">2.707</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.4.2.13">1.866</td> </tr> <tr class="ltx_tr" id="S5.T2.7.5.3"> <td class="ltx_td ltx_align_center" id="S5.T2.7.5.3.1">NN</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.5.3.2">0.346</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.5.3.3">1.219</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.5.3.4">-11.172</td> <td 
class="ltx_td ltx_align_center" id="S5.T2.7.5.3.5">0.629</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.5.3.6">1.532</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.5.3.7">-0.007</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.5.3.8">0.861</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.5.3.9">2.430</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.5.3.10">10.913</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.5.3.11"><span class="ltx_text" id="S5.T2.7.5.3.11.1" style="color:#C81919;">0.963</span></td> <td class="ltx_td ltx_align_center" id="S5.T2.7.5.3.12">3.617</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.5.3.13"><span class="ltx_text" id="S5.T2.7.5.3.13.1" style="color:#C81919;">20.790</span></td> </tr> <tr class="ltx_tr" id="S5.T2.7.6.4"> <td class="ltx_td ltx_align_center" id="S5.T2.7.6.4.1">NN + SSM training</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.6.4.2"><span class="ltx_text" id="S5.T2.7.6.4.2.1" style="color:#C81919;">0.526</span></td> <td class="ltx_td ltx_align_center" id="S5.T2.7.6.4.3"><span class="ltx_text" id="S5.T2.7.6.4.3.1" style="color:#C81919;">1.366</span></td> <td class="ltx_td ltx_align_center" id="S5.T2.7.6.4.4"><span class="ltx_text" id="S5.T2.7.6.4.4.1" style="color:#C81919;">-1.608</span></td> <td class="ltx_td ltx_align_center" id="S5.T2.7.6.4.5"><span class="ltx_text" id="S5.T2.7.6.4.5.1" style="color:#C81919;">0.736</span></td> <td class="ltx_td ltx_align_center" id="S5.T2.7.6.4.6"><span class="ltx_text" id="S5.T2.7.6.4.6.1" style="color:#C81919;">1.851</span></td> <td class="ltx_td ltx_align_center" id="S5.T2.7.6.4.7"><span class="ltx_text" id="S5.T2.7.6.4.7.1" style="color:#C81919;">5.012</span></td> <td class="ltx_td ltx_align_center" id="S5.T2.7.6.4.8"><span class="ltx_text" id="S5.T2.7.6.4.8.1" style="color:#C81919;">0.891</span></td> <td class="ltx_td ltx_align_center" id="S5.T2.7.6.4.9"><span class="ltx_text" id="S5.T2.7.6.4.9.1" style="color:#C81919;">2.790</span></td> <td 
class="ltx_td ltx_align_center" id="S5.T2.7.6.4.10"><span class="ltx_text" id="S5.T2.7.6.4.10.1" style="color:#C81919;">12.541</span></td> <td class="ltx_td ltx_align_center" id="S5.T2.7.6.4.11"><span class="ltx_text" id="S5.T2.7.6.4.11.1" style="color:#C81919;">0.963</span></td> <td class="ltx_td ltx_align_center" id="S5.T2.7.6.4.12"><span class="ltx_text" id="S5.T2.7.6.4.12.1" style="color:#C81919;">3.746</span></td> <td class="ltx_td ltx_align_center" id="S5.T2.7.6.4.13">20.227</td> </tr> <tr class="ltx_tr" id="S5.T2.7.7.5"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_bb ltx_border_t" id="S5.T2.7.7.5.1" rowspan="4"><span class="ltx_text" id="S5.T2.7.7.5.1.1">3</span></th> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.7.5.2"><span class="ltx_text" id="S5.T2.7.7.5.2.1" style="color:#404040;">None (mixed)</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.7.5.3"><span class="ltx_text" id="S5.T2.7.7.5.3.1" style="color:#404040;">0.313</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.7.5.4"><span class="ltx_text" id="S5.T2.7.7.5.4.1" style="color:#404040;">1.237</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.7.5.5"><span class="ltx_text" id="S5.T2.7.7.5.5.1" style="color:#404040;">-9.990</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.7.5.6"><span class="ltx_text" id="S5.T2.7.7.5.6.1" style="color:#404040;">0.580</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.7.5.7"><span class="ltx_text" id="S5.T2.7.7.5.7.1" style="color:#404040;">1.465</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.7.5.8"><span class="ltx_text" id="S5.T2.7.7.5.8.1" style="color:#404040;">0.016</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.7.5.9"><span class="ltx_text" id="S5.T2.7.7.5.9.1" style="color:#404040;">0.828</span></td> <td class="ltx_td ltx_align_center ltx_border_t" 
id="S5.T2.7.7.5.10"><span class="ltx_text" id="S5.T2.7.7.5.10.1" style="color:#404040;">2.211</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.7.5.11"><span class="ltx_text" id="S5.T2.7.7.5.11.1" style="color:#404040;">10.017</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.7.5.12"><span class="ltx_text" id="S5.T2.7.7.5.12.1" style="color:#404040;">0.954</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.7.5.13"><span class="ltx_text" id="S5.T2.7.7.5.13.1" style="color:#404040;">3.368</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T2.7.7.5.14"><span class="ltx_text" id="S5.T2.7.7.5.14.1" style="color:#404040;">20.018</span></td> </tr> <tr class="ltx_tr" id="S5.T2.7.8.6"> <td class="ltx_td ltx_align_center" id="S5.T2.7.8.6.1">MVDR filter</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.8.6.2">0.371</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.8.6.3"><span class="ltx_text" id="S5.T2.7.8.6.3.1" style="color:#C81919;">1.285</span></td> <td class="ltx_td ltx_align_center" id="S5.T2.7.8.6.4">-7.314</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.8.6.5">0.634</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.8.6.6">1.626</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.8.6.7">1.786</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.8.6.8">0.827</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.8.6.9">2.327</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.8.6.10">5.690</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.8.6.11">0.872</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.8.6.12">2.714</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.8.6.13">2.238</td> </tr> <tr class="ltx_tr" id="S5.T2.7.9.7"> <td class="ltx_td ltx_align_center" id="S5.T2.7.9.7.1">NN</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.9.7.2">0.299</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.9.7.3">1.225</td> <td class="ltx_td ltx_align_center" 
id="S5.T2.7.9.7.4">-10.582</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.9.7.5">0.583</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.9.7.6">1.468</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.9.7.7">0.156</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.9.7.8">0.845</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.9.7.9">2.323</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.9.7.10">11.025</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.9.7.11">0.960</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.9.7.12">3.569</td> <td class="ltx_td ltx_align_center" id="S5.T2.7.9.7.13"><span class="ltx_text" id="S5.T2.7.9.7.13.1" style="color:#C81919;">20.740</span></td> </tr> <tr class="ltx_tr" id="S5.T2.7.10.8"> <td class="ltx_td ltx_align_center ltx_border_bb" id="S5.T2.7.10.8.1">NN + SSM training</td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S5.T2.7.10.8.2"><span class="ltx_text" id="S5.T2.7.10.8.2.1" style="color:#C81919;">0.400</span></td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S5.T2.7.10.8.3">1.253</td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S5.T2.7.10.8.4"><span class="ltx_text" id="S5.T2.7.10.8.4.1" style="color:#C81919;">-6.311</span></td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S5.T2.7.10.8.5"><span class="ltx_text" id="S5.T2.7.10.8.5.1" style="color:#C81919;">0.669</span></td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S5.T2.7.10.8.6"><span class="ltx_text" id="S5.T2.7.10.8.6.1" style="color:#C81919;">1.652</span></td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S5.T2.7.10.8.7"><span class="ltx_text" id="S5.T2.7.10.8.7.1" style="color:#C81919;">3.361</span></td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S5.T2.7.10.8.8"><span class="ltx_text" id="S5.T2.7.10.8.8.1" style="color:#C81919;">0.874</span></td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S5.T2.7.10.8.9"><span class="ltx_text" id="S5.T2.7.10.8.9.1" 
style="color:#C81919;">2.621</span></td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S5.T2.7.10.8.10"><span class="ltx_text" id="S5.T2.7.10.8.10.1" style="color:#C81919;">12.222</span></td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S5.T2.7.10.8.11"><span class="ltx_text" id="S5.T2.7.10.8.11.1" style="color:#C81919;">0.961</span></td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S5.T2.7.10.8.12"><span class="ltx_text" id="S5.T2.7.10.8.12.1" style="color:#C81919;">3.703</span></td> <td class="ltx_td ltx_align_center ltx_border_bb" id="S5.T2.7.10.8.13">20.273</td> </tr> </tbody> </table> </figure> <div class="ltx_para" id="S5.p2"> <p class="ltx_p" id="S5.p2.1">We can see from Table <a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S5.T2" title="TABLE II ‣ V Results and discussion ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_tag">II</span></a>, for <math alttext="N=2" class="ltx_Math" display="inline" id="S5.p2.1.m1.1"><semantics id="S5.p2.1.m1.1a"><mrow id="S5.p2.1.m1.1.1" xref="S5.p2.1.m1.1.1.cmml"><mi id="S5.p2.1.m1.1.1.2" xref="S5.p2.1.m1.1.1.2.cmml">N</mi><mo id="S5.p2.1.m1.1.1.1" xref="S5.p2.1.m1.1.1.1.cmml">=</mo><mn id="S5.p2.1.m1.1.1.3" xref="S5.p2.1.m1.1.1.3.cmml">2</mn></mrow><annotation-xml encoding="MathML-Content" id="S5.p2.1.m1.1b"><apply id="S5.p2.1.m1.1.1.cmml" xref="S5.p2.1.m1.1.1"><eq id="S5.p2.1.m1.1.1.1.cmml" xref="S5.p2.1.m1.1.1.1"></eq><ci id="S5.p2.1.m1.1.1.2.cmml" xref="S5.p2.1.m1.1.1.2">𝑁</ci><cn id="S5.p2.1.m1.1.1.3.cmml" type="integer" xref="S5.p2.1.m1.1.1.3">2</cn></apply></annotation-xml><annotation 
encoding="application/x-tex" id="S5.p2.1.m1.1c">N=2</annotation><annotation encoding="application/x-llamapun" id="S5.p2.1.m1.1d">italic_N = 2</annotation></semantics></math> speakers, that the use of the SSM in training improves the performance of the neural network-based beamforming model on all three metrics at -10, 0, and 10 dB SNR, while at 20 dB the two NN models are nearly on par (equal STOI, higher PESQ, and about 0.6 dB lower SI-SDR with SSM). As expected, the proposed mechanism is able to teach the network which speaker is of interest at each utterance. When the model is trained without such information, its performance degrades drastically at low SNR, since the NN model “confuses” the choice of speaker. As the SNR of the speech combination increases, the NN without SSM becomes able to separate the desired from the undesired speaker, indicating that it focuses on the higher-amplitude signal, a major feature in the audio combination. However, even at higher SNR levels, the baseline NN falls short, as the model trained with SSM almost always forms an upper bound for the NN’s performance. For the lower considered SNR levels (-10 and 0 dB), the NN without SSM cannot even surpass the metrics obtained with the mixed signal received at the reference microphone. The NN trained with SSM, on the other hand, is able to extract the desired reverberant speech trace.</p> </div> <div class="ltx_para" id="S5.p3"> <p class="ltx_p" id="S5.p3.1">Moreover, the MVDR filter is outperformed by the NN with SSM training for almost all cases. The baseline NN provides performance similar to or better than that of the MVDR filter at higher SNRs. This is due to the MVDR formulation, which assumes an acoustic scene with anechoic conditions, while the NNs can learn to suppress the effects of reverberation. 
At higher SNR, the reverberation of the desired speaker has more energy, contaminating the direct path and deteriorating the MVDR performance, which is most visible in SI-SDR (at 20 dB SNR, 1.866 dB for the MVDR output against 20.030 dB for the mixed signal).</p> </div> <div class="ltx_para" id="S5.p4"> <p class="ltx_p" id="S5.p4.2">We also check the robustness of the proposed mechanism against changes in the environment by re-evaluating all methods for a different acoustic scenario. Now, we consider a more challenging case of <math alttext="N=3" class="ltx_Math" display="inline" id="S5.p4.1.m1.1"><semantics id="S5.p4.1.m1.1a"><mrow id="S5.p4.1.m1.1.1" xref="S5.p4.1.m1.1.1.cmml"><mi id="S5.p4.1.m1.1.1.2" xref="S5.p4.1.m1.1.1.2.cmml">N</mi><mo id="S5.p4.1.m1.1.1.1" xref="S5.p4.1.m1.1.1.1.cmml">=</mo><mn id="S5.p4.1.m1.1.1.3" xref="S5.p4.1.m1.1.1.3.cmml">3</mn></mrow><annotation-xml encoding="MathML-Content" id="S5.p4.1.m1.1b"><apply id="S5.p4.1.m1.1.1.cmml" xref="S5.p4.1.m1.1.1"><eq id="S5.p4.1.m1.1.1.1.cmml" xref="S5.p4.1.m1.1.1.1"></eq><ci id="S5.p4.1.m1.1.1.2.cmml" xref="S5.p4.1.m1.1.1.2">𝑁</ci><cn id="S5.p4.1.m1.1.1.3.cmml" type="integer" xref="S5.p4.1.m1.1.1.3">3</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S5.p4.1.m1.1c">N=3</annotation><annotation encoding="application/x-llamapun" id="S5.p4.1.m1.1d">italic_N = 3</annotation></semantics></math> speakers, with a minimum distance of 0.5 m between the listener and each speaker and between any two speakers, and an absolute difference of at least 20 degrees between the speakers’ angles relative to the listener’s center axis. All other simulation parameters are kept as before. 
The neural networks are not retrained: their parameters are kept exactly as obtained for <math alttext="N=2" class="ltx_Math" display="inline" id="S5.p4.2.m2.1"><semantics id="S5.p4.2.m2.1a"><mrow id="S5.p4.2.m2.1.1" xref="S5.p4.2.m2.1.1.cmml"><mi id="S5.p4.2.m2.1.1.2" xref="S5.p4.2.m2.1.1.2.cmml">N</mi><mo id="S5.p4.2.m2.1.1.1" xref="S5.p4.2.m2.1.1.1.cmml">=</mo><mn id="S5.p4.2.m2.1.1.3" xref="S5.p4.2.m2.1.1.3.cmml">2</mn></mrow><annotation-xml encoding="MathML-Content" id="S5.p4.2.m2.1b"><apply id="S5.p4.2.m2.1.1.cmml" xref="S5.p4.2.m2.1.1"><eq id="S5.p4.2.m2.1.1.1.cmml" xref="S5.p4.2.m2.1.1.1"></eq><ci id="S5.p4.2.m2.1.1.2.cmml" xref="S5.p4.2.m2.1.1.2">𝑁</ci><cn id="S5.p4.2.m2.1.1.3.cmml" type="integer" xref="S5.p4.2.m2.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S5.p4.2.m2.1c">N=2</annotation><annotation encoding="application/x-llamapun" id="S5.p4.2.m2.1d">italic_N = 2</annotation></semantics></math> speakers.</p> </div> <div class="ltx_para" id="S5.p5"> <p class="ltx_p" id="S5.p5.1">As shown in Table <a class="ltx_ref" href="https://arxiv.org/html/2503.18590v1#S5.T2" title="TABLE II ‣ V Results and discussion ‣ Target Speaker Selection for Neural Network Beamforming in Multi-Speaker Scenarios This work was supported by the Robust AI for SafE (radar) signal processing (RAISE) collaboration framework between Eindhoven University of Technology and NXP Semiconductors, including a Privaat-Publieke Samenwerkingen-toeslag (PPS) supplement from the Dutch Ministry of Economic Affairs and Climate Policy."><span class="ltx_text ltx_ref_tag">II</span></a>, with <math alttext="N=3" class="ltx_Math" display="inline" id="S5.p5.1.m1.1"><semantics id="S5.p5.1.m1.1a"><mrow id="S5.p5.1.m1.1.1" xref="S5.p5.1.m1.1.1.cmml"><mi id="S5.p5.1.m1.1.1.2" xref="S5.p5.1.m1.1.1.2.cmml">N</mi><mo id="S5.p5.1.m1.1.1.1" xref="S5.p5.1.m1.1.1.1.cmml">=</mo><mn id="S5.p5.1.m1.1.1.3" 
xref="S5.p5.1.m1.1.1.3.cmml">3</mn></mrow><annotation-xml encoding="MathML-Content" id="S5.p5.1.m1.1b"><apply id="S5.p5.1.m1.1.1.cmml" xref="S5.p5.1.m1.1.1"><eq id="S5.p5.1.m1.1.1.1.cmml" xref="S5.p5.1.m1.1.1.1"></eq><ci id="S5.p5.1.m1.1.1.2.cmml" xref="S5.p5.1.m1.1.1.2">𝑁</ci><cn id="S5.p5.1.m1.1.1.3.cmml" type="integer" xref="S5.p5.1.m1.1.1.3">3</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S5.p5.1.m1.1c">N=3</annotation><annotation encoding="application/x-llamapun" id="S5.p5.1.m1.1d">italic_N = 3</annotation></semantics></math> speakers, the proposed SSM is robust to changes in the number of speakers and their positioning. The NN trained with SSM still outperforms the baselines in almost all cases; the exceptions are PESQ at -10 dB SNR, where the MVDR filter is slightly better, and SI-SDR at 20 dB SNR, where the baseline NN is slightly better. At -10 dB SNR, the NN without SSM again cannot achieve better metrics than the mixed audio obtained at the reference microphone, and at 0 dB it is only marginally better. At high SNR, the baseline NN is able to extract the desired speech, given the easier setting. The MVDR filter is clearly affected by the presence of reverberation, as previously observed.</p> </div> <div class="ltx_para" id="S5.p6"> <p class="ltx_p" id="S5.p6.1">Overall, the results indicate that training with the proposed speaker selection mechanism substantially improves NN-based beamforming when the target speaker changes according to the listener’s head movement, with gains even at a very low SNR level (-10 dB). The SSM generalizes to the simulated changes in the acoustic scene, providing a significant advance toward the solution of the cocktail party problem.</p> </div> </section> <section class="ltx_section" id="S6"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">VI </span><span class="ltx_text ltx_font_smallcaps" id="S6.1.1">Conclusion</span> </h2> <div class="ltx_para" id="S6.p1"> <p class="ltx_p" id="S6.p1.1">We proposed a speaker selection mechanism for the training of a neural network model on the task of audio beamforming. 
The SSM dynamically changes the target speaker in the loss function at every utterance, focusing on the speaker closest to the listener’s head center axis. In acoustic simulations, the neural network model trained with SSM outperformed the baseline NN model (trained without it) and the (ideal) MVDR filter in almost all conditions, with the largest gains at low SNR. Additionally, we showed that the SSM is robust to changes in the acoustic scene, namely the number of speakers and their positioning. The proposed speaker selection mechanism represents a step toward the solution of the cocktail party problem. In future work, we suggest deploying the SSM in a real-world setup and analyzing its robustness to changes in the acoustic scene in more depth.</p> </div> </section> <section class="ltx_bibliography" id="bib"> <h2 class="ltx_title ltx_title_bibliography">References</h2> <ul class="ltx_biblist"> <li class="ltx_bibitem" id="bib.bib1"> <span class="ltx_tag ltx_tag_bibitem">[1]</span> <span class="ltx_bibblock"> A. W. Bronkhorst, “The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions,” <em class="ltx_emph ltx_font_italic" id="bib.bib1.1.1">Acustica</em>, vol. 86, pp. 117–128, 2000. </span> </li> <li class="ltx_bibitem" id="bib.bib2"> <span class="ltx_tag ltx_tag_bibitem">[2]</span> <span class="ltx_bibblock"> M. A. Bee and C. Micheyl, “The cocktail party problem: What is it? How can it be solved? And why should animal behaviorists study it?” <em class="ltx_emph ltx_font_italic" id="bib.bib2.1.1">Journal of Comparative Psychology</em>, vol. 122, no. 3, pp. 235–251, Aug 2008. </span> </li> <li class="ltx_bibitem" id="bib.bib3"> <span class="ltx_tag ltx_tag_bibitem">[3]</span> <span class="ltx_bibblock"> H. Lu, M. F. McKinney, T. Zhang, and A. J. 
Oxenham, “Investigating age, hearing loss, and background noise effects on speaker-targeted head and eye movements in three-way conversations,” <em class="ltx_emph ltx_font_italic" id="bib.bib3.1.1">The Journal of the Acoustical Society of America</em>, vol. 149, no. 3, p. 1889, Mar 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib4"> <span class="ltx_tag ltx_tag_bibitem">[4]</span> <span class="ltx_bibblock"> G. Kidd Jr., C. R. Mason, V. Best, and J. Swaminathan, “Benefits of Acoustic Beamforming for Solving the Cocktail Party Problem,” <em class="ltx_emph ltx_font_italic" id="bib.bib4.1.1">Trends in Hearing</em>, vol. 19, p. 2331216515593385, 2015, PMID: 26126896. </span> </li> <li class="ltx_bibitem" id="bib.bib5"> <span class="ltx_tag ltx_tag_bibitem">[5]</span> <span class="ltx_bibblock"> B. Cauchi, I. Kodrasi, R. Rehr <em class="ltx_emph ltx_font_italic" id="bib.bib5.1.1">et al.</em>, “Combination of MVDR beamforming and single-channel spectral processing for enhancing noisy and reverberant speech,” <em class="ltx_emph ltx_font_italic" id="bib.bib5.2.2">EURASIP Journal on Advances in Signal Processing</em>, vol. 2015, no. 61, 2015. </span> </li> <li class="ltx_bibitem" id="bib.bib6"> <span class="ltx_tag ltx_tag_bibitem">[6]</span> <span class="ltx_bibblock"> Y. Chen, Y. Hsu, and M. R. Bai, “Multi-channel end-to-end neural network for speech enhancement, source localization, and voice activity detection,” 2022. [Online]. Available: <a class="ltx_ref ltx_url ltx_font_typewriter" href="https://arxiv.org/abs/2206.09728" title="">https://arxiv.org/abs/2206.09728</a> </span> </li> <li class="ltx_bibitem" id="bib.bib7"> <span class="ltx_tag ltx_tag_bibitem">[7]</span> <span class="ltx_bibblock"> Z. Zhang, Y. Xu, M. Yu, S.-X. Zhang, L. Chen, and D. Yu, “ADL-MVDR: All Deep Learning MVDR Beamformer for Target Speech Separation,” in <em class="ltx_emph ltx_font_italic" id="bib.bib7.1.1">2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</em>, 2021, pp. 
6089–6093. </span> </li> <li class="ltx_bibitem" id="bib.bib8"> <span class="ltx_tag ltx_tag_bibitem">[8]</span> <span class="ltx_bibblock"> M. Souden, J. Benesty, and S. Affes, “On Optimal Frequency-Domain Multichannel Linear Filtering for Noise Reduction,” <em class="ltx_emph ltx_font_italic" id="bib.bib8.1.1">IEEE Transactions on Audio, Speech, and Language Processing</em>, vol. 18, no. 2, pp. 260–276, 2010. </span> </li> <li class="ltx_bibitem" id="bib.bib9"> <span class="ltx_tag ltx_tag_bibitem">[9]</span> <span class="ltx_bibblock"> E. Grinstein, C. M. Hicks, T. van Waterschoot, M. Brookes, and P. A. Naylor, “The Neural-SRP Method for Universal Robust Multi-Source Tracking,” <em class="ltx_emph ltx_font_italic" id="bib.bib9.1.1">IEEE Open Journal of Signal Processing</em>, vol. 5, pp. 19–28, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib10"> <span class="ltx_tag ltx_tag_bibitem">[10]</span> <span class="ltx_bibblock"> B. Veluri, M. Itani, T. Chen, T. Yoshioka, and S. Gollakota, “Look Once to Hear: Target Speech Hearing with Noisy Examples,” in <em class="ltx_emph ltx_font_italic" id="bib.bib10.1.1">2024 CHI Conference on Human Factors in Computing Systems</em>, ser. CHI ’24. New York, NY, USA: Association for Computing Machinery, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib11"> <span class="ltx_tag ltx_tag_bibitem">[11]</span> <span class="ltx_bibblock"> J. Le Roux, S. Wisdom, H. Erdogan, and J. R. Hershey, “SDR - half-baked or well done?” <em class="ltx_emph ltx_font_italic" id="bib.bib11.1.1">arXiv preprint 1811.02508</em>, 2018. </span> </li> <li class="ltx_bibitem" id="bib.bib12"> <span class="ltx_tag ltx_tag_bibitem">[12]</span> <span class="ltx_bibblock"> D. Diaz-Guerra, A. Miguel, and J. R. Beltran, “gpuRIR: A Python library for room impulse response simulation with GPU acceleration,” <em class="ltx_emph ltx_font_italic" id="bib.bib12.1.1">Multimedia Tools and Applications</em>, vol. 80, no. 7, pp. 5653–5671, 2021. 
</span> </li> <li class="ltx_bibitem" id="bib.bib13"> <span class="ltx_tag ltx_tag_bibitem">[13]</span> <span class="ltx_bibblock"> H. Zen, V. Dang, R. Clark, Y. Zhang, R. J. Weiss, Y. Jia, Z. Chen, and Y. Wu, “LibriTTS: A corpus derived from LibriSpeech for text-to-speech,” <em class="ltx_emph ltx_font_italic" id="bib.bib13.1.1">arXiv preprint 1904.02882</em>, 2019. </span> </li> </ul> </section> </article> </div> </div> </body> </html>