IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities

<!DOCTYPE html>     <html class="no-js" lang="en">  <head> <title>IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities - NASA/ADS</title>  <link rel="apple-touch-icon" sizes="180x180" href="//styles/favicon/apple-touch-icon.png" /> <link rel="icon" type="image/png" sizes="32x32" href="//styles/favicon/favicon-32x32.png" /> <link rel="icon" type="image/png" sizes="16x16" href="//styles/favicon/favicon-16x16.png" /> <link rel="manifest" href="//styles/favicon/site.webmanifest" /> <link rel="mask-icon" href="//styles/favicon/safari-pinned-tab.svg" color="#5bbad5" /> <meta name="apple-mobile-web-app-title" content="NASA ADS" /> <meta name="application-name" content="NASA ADS" /> <meta name="msapplication-TileColor" content="#ffc40d" /> <meta name="theme-color" content="#ffffff" />  <link rel="stylesheet" href="/styles/css/styles.css"> <meta name="robots" content="noarchive"> <link rel="canonical" href="http://ui.adsabs.harvard.edu/abs/2024arXiv241008035Z/abstract"/> <meta name="description" content="Current methods of building LLMs with voice interaction capabilities rely heavily on explicit text autoregressive generation before or during speech response generation to maintain content quality, which unfortunately brings computational overhead and increases latency in multi-turn interactions. To address this, we introduce IntrinsicVoic,e an LLM designed with intrinsic real-time voice interaction capabilities. IntrinsicVoice aims to facilitate the transfer of textual capabilities of pre-trained LLMs to the speech modality by mitigating the modality gap between text and speech. Our novelty architecture, GroupFormer, can reduce speech sequences to lengths comparable to text sequences while generating high-quality audio, significantly reducing the length difference between speech and text, speeding up inference, and alleviating long-text modeling issues. 
Additionally, we construct a multi-turn speech-to-speech dialogue dataset named \method-500k which includes nearly 500k turns of speech-to-speech dialogues, and a cross-modality training strategy to enhance the semantic alignment between speech and text. Experimental results demonstrate that IntrinsicVoice can generate high-quality speech response with latency lower than 100ms in multi-turn dialogue scenarios. Demos are available at https://instrinsicvoice.github.io/.">  <meta property="og:type" content="eprint"> <meta property="og:title" content="IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities"> <meta property="og:site_name" content="NASA/ADS"> <meta property="og:description" content="Current methods of building LLMs with voice interaction capabilities rely heavily on explicit text autoregressive generation before or during speech response generation to maintain content quality, which unfortunately brings computational overhead and increases latency in multi-turn interactions. To address this, we introduce IntrinsicVoic,e an LLM designed with intrinsic real-time voice interaction capabilities. IntrinsicVoice aims to facilitate the transfer of textual capabilities of pre-trained LLMs to the speech modality by mitigating the modality gap between text and speech. Our novelty architecture, GroupFormer, can reduce speech sequences to lengths comparable to text sequences while generating high-quality audio, significantly reducing the length difference between speech and text, speeding up inference, and alleviating long-text modeling issues. Additionally, we construct a multi-turn speech-to-speech dialogue dataset named \method-500k which includes nearly 500k turns of speech-to-speech dialogues, and a cross-modality training strategy to enhance the semantic alignment between speech and text. 
Experimental results demonstrate that IntrinsicVoice can generate high-quality speech response with latency lower than 100ms in multi-turn dialogue scenarios. Demos are available at https://instrinsicvoice.github.io/."> <meta property="og:url" content="https://ui.adsabs.harvard.edu/abs/2024arXiv241008035Z/abstract"> <meta property="og:image" content="https://ui.adsabs.harvard.edu/styles/img/transparent_logo.svg"> <meta property="article:published_time" content="10/2024"> <meta property="article:author" content="Zhang, Xin"> <meta property="article:author" content="Lyu, Xiang"> <meta property="article:author" content="Du, Zhihao"> <meta property="article:author" content="Chen, Qian"> <meta property="article:author" content="Zhang, Dong"> <meta property="article:author" content="Hu, Hangrui"> <meta property="article:author" content="Tan, Chaohong"> <meta property="article:author" content="Zhao, Tianyu"> <meta property="article:author" content="Wang, Yuxuan"> <meta property="article:author" content="Zhang, Bin"> <meta property="article:author" content="Lu, Heng"> <meta property="article:author" content="Zhou, Yaqian"> <meta property="article:author" content="Qiu, Xipeng">  <meta name="citation_journal_title" content="arXiv e-prints"> <meta name="citation_authors" content="Zhang, Xin;Lyu, Xiang;Du, Zhihao;Chen, Qian;Zhang, Dong;Hu, Hangrui;Tan, Chaohong;Zhao, Tianyu;Wang, Yuxuan;Zhang, Bin;Lu, Heng;Zhou, Yaqian;Qiu, Xipeng"> <meta name="citation_title" content="IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities"> <meta name="citation_date" content="10/2024"> <meta name="citation_firstpage" content="arXiv:2410.08035"> <meta name="citation_doi" content="10.48550/arXiv.2410.08035"> <meta name="citation_language" content="en"> <meta name="citation_keywords" content="Computer Science - Sound"> <meta name="citation_keywords" content="Computer Science - Artificial Intelligence"> <meta name="citation_abstract_html_url" 
content="https://ui.adsabs.harvard.edu/abs/2024arXiv241008035Z/abstract"> <meta name="citation_publication_date" content="10/2024"> <meta name="citation_arxiv_id" content="arXiv:2410.08035" /> <link title="schema(PRISM)" rel="schema.prism" href="http://prismstandard.org/namespaces/1.2/basic/" /> <meta name="prism.publicationDate" content="10/2024" /> <meta name="prism.publicationName" content="arXiv" /> <meta name="prism.startingPage" content="arXiv:2410.08035" /> <link title="schema(DC)" rel="schema.dc" href="http://purl.org/dc/elements/1.1/" /> <meta name="dc.identifier" content="doi:10.48550/arXiv.2410.08035" /> <meta name="dc.date" content="10/2024" /> <meta name="dc.source" content="arXiv" /> <meta name="dc.title" content="IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities" /> <meta name="dc.creator" content="Zhang, Xin"> <meta name="dc.creator" content="Lyu, Xiang"> <meta name="dc.creator" content="Du, Zhihao"> <meta name="dc.creator" content="Chen, Qian"> <meta name="dc.creator" content="Zhang, Dong"> <meta name="dc.creator" content="Hu, Hangrui"> <meta name="dc.creator" content="Tan, Chaohong"> <meta name="dc.creator" content="Zhao, Tianyu"> <meta name="dc.creator" content="Wang, Yuxuan"> <meta name="dc.creator" content="Zhang, Bin"> <meta name="dc.creator" content="Lu, Heng"> <meta name="dc.creator" content="Zhou, Yaqian"> <meta name="dc.creator" content="Qiu, Xipeng">  <meta name="twitter:card" content="summary_large_image"/> <meta name="twitter:description" content="Current methods of building LLMs with voice interaction capabilities rely heavily on explicit text autoregressive generation before or during speech response generation to maintain content quality, which unfortunately brings computational overhead and increases latency in multi-turn interactions. To address this, we introduce IntrinsicVoic,e an LLM designed with intrinsic real-time voice interaction capabilities. 
IntrinsicVoice aims to facilitate the transfer of textual capabilities of pre-trained LLMs to the speech modality by mitigating the modality gap between text and speech. Our novelty architecture, GroupFormer, can reduce speech sequences to lengths comparable to text sequences while generating high-quality audio, significantly reducing the length difference between speech and text, speeding up inference, and alleviating long-text modeling issues. Additionally, we construct a multi-turn speech-to-speech dialogue dataset named \method-500k which includes nearly 500k turns of speech-to-speech dialogues, and a cross-modality training strategy to enhance the semantic alignment between speech and text. Experimental results demonstrate that IntrinsicVoice can generate high-quality speech response with latency lower than 100ms in multi-turn dialogue scenarios. Demos are available at https://instrinsicvoice.github.io/."/> <meta name="twitter:title" content="IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities"/> <meta name="twitter:site" content="@adsabs"/> <meta name="twitter:domain" content="NASA/ADS"/> <meta name="twitter:image:src" content="https://ui.adsabs.harvard.edu/styles/img/transparent_logo.svg"/> <meta name="twitter:creator" content="@adsabs"/> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <base href="/"> <style> .btn-full-ads { color: #fff !important; background-color: #1a1a1a !important; border-color: #1a1a1a !important; margin-top: 9px !important; padding-bottom: 10px !important; padding-top: 10px !important; } .btn-full-ads:hover, .btn-full-ads:focus, .btn-full-ads:active, .btn-full-ads.active, .open>.dropdown-toggle.btn-full-ads { color: #000 !important; background-color: #ddd !important; border-color: #1a1a1a !important; } .dropdown-toggle:hover .dropdown-menu { display: block; } .navbar-nav.navbar-right:last-child { margin-right: -15px !important; } 
.navbar-right { @media screen (min-width: $screen-sm) { float: right!important; } } /*the container must be positioned relative:*/ .autocomplete { position: relative; display: inline-block; } .autocomplete-items { position: absolute; border: 1px solid #d4d4d4; border-bottom: none; border-top: none; z-index: 99; /*position the autocomplete items to be the same width as the container:*/ top: 100%; left: 0; right: 0; } .autocomplete-items div { padding: 10px; cursor: pointer; background-color: #fff; border-bottom: 1px solid #d4d4d4; } /*when hovering an item:*/ .autocomplete-items div:hover { background-color: #e9e9e9; } /*when navigating through the items using the arrow keys:*/ .autocomplete-active { background-color: #d7dfec !important; color: #000000; } </style> </head> <body> <div id="aria-announcement-container">Now on home page</div> <div id="app-container"> <div id="body-template-container"> <div class="s-master-page-manager"> <div id="navbar-container"> <div data-widget="NavbarWidget"> <nav class="navbar navbar-inverse"> <div class="container-fluid">  <div class=""> <ul class="nav navbar-nav navbar-left"> <li> <a class="navbar-brand" href="/"> <img class="s-ads-icon" src="/styles/img/transparent_logo.svg" alt="ads icon"/> <h1> <b>ads</b></h1> </a> </li> </ul> </div>  <div class="">  <ul class="nav navbar-nav navbar-right"> <li data-match-route="/"> <a href="/core/never/abs/2024arXiv241008035Z/abstract" style="transition: none; " class="btn btn-full-ads"> <i class="fa fa-refresh"></i> Enable full ADS </a> </li> </ul> </div> </div> </nav> </div> </div> <div id="content-container"> <div class="dynamic-container s-dynamic-container"> <div id="abstract-page-layout" class="s-abstract-page-layout"> <div class="row s-stable-search-bar-height s-results-control-row-container hidden-xs"> </div> <div class="row s-dynamic-page-body" id="dynamic-page-body"> <div class="s-abstract-content"> <div class="col-xs-12 col-sm-3 col-md-2" style="" id="left-column"> <div 
class="nav-container s-nav-container" style="transform: none; width: 100%; position: relative" id="left-column"> <nav> <div class="s-nav-header s-view-nav"> <i class="icon-list"></i> <h3>view </h3> </div> <a href="/abs/2024arXiv241008035Z/abstract" data-widget-id="ShowAbstract"> <div class="abstract-nav s-nav s-nav-selected"> <span class="s-content"> Abstract </span> </div> </a> </a> <a href="/abs/2024arXiv241008035Z/citations" aria-disabled="true" data-widget-id="ShowCitations"> <div class="abstract-nav s-nav "> <span class="s-content"> Citations <span class="num-items">(1)</span> </span> </div> </a> <a href="/abs/2024arXiv241008035Z/references" aria-disabled="true" data-widget-id="ShowReferences"> <div class="abstract-nav s-nav "> <span class="s-content"> References <span class="num-items">(23)</span> </span> </div> </a> <div class="abstract-nav s-nav s-nav-inactive "> <span class="s-content"> Co-Reads </span> </div> <a href="/abs/2024arXiv241008035Z/similar" aria-disabled="true" data-widget-id="ShowSimilar"> <div class="abstract-nav s-nav "> <span class="s-content"> Similar Papers </span> </div> </a> <div aria-disabled="true" data-widget-id="ShowToc"> <div class="abstract-nav s-nav s-nav-inactive"> <span class="s-content"> Volume Content </span> </div> </div> <div href="#" data-widget-id="ShowGraphics"> <div class="abstract-nav s-nav s-nav-inactive"> <span class="s-content"> Graphics </span> </div> </div> <a href="/abs/2024arXiv241008035Z/metrics" data-widget-id="ShowMetrics"> <div class="abstract-nav s-nav"> <span class="s-content"> Metrics </span> </div> <a href="/abs/2024arXiv241008035Z/exportcitation" data-widget-id="ShowExportcitation__default"> <div class="abstract-nav s-nav "> <span class="content"> Export Citation </span> </div> </a> </nav> </div> </div> <div class="col-xs-12 col-sm-8 col-md-7 col-lg-7 s-middle-column" id="middle-column" style="padding-bottom: 0%">  <div class="main-content-container s-main-content-container" id="main-content" 
tabindex="-1" style="margin-bottom: 5px"> <div class="print-visible"> <h2 style="margin-left:6.1%;">NASA/ADS</h2> </div> <div id="abstract-title-container" class="s-abstract-title-container"> <div data-widget="ShowAbstract"> <article class="s-abstract-metadata">  <h2 class="s-abstract-title"> IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities <a href=""></a> </h2> <div id="authors-and-aff" class="s-authors-and-aff"> <ul class="list-inline"> <li class="author"><a href="/search/?q=author%3A%22Zhang%2C+Xin%22">Zhang, Xin</a> </li>; <li class="author"><a href="/search/?q=author%3A%22Lyu%2C+Xiang%22">Lyu, Xiang</a> </li>; <li class="author"><a href="/search/?q=author%3A%22Du%2C+Zhihao%22">Du, Zhihao</a> </li>; <li class="author"><a href="/search/?q=author%3A%22Chen%2C+Qian%22">Chen, Qian</a> </li>; <li class="author"><a href="/search/?q=author%3A%22Zhang%2C+Dong%22">Zhang, Dong</a> </li>; <li class="author"><a href="/search/?q=author%3A%22Hu%2C+Hangrui%22">Hu, Hangrui</a> </li>; <li class="author"><a href="/search/?q=author%3A%22Tan%2C+Chaohong%22">Tan, Chaohong</a> </li>; <li class="author"><a href="/search/?q=author%3A%22Zhao%2C+Tianyu%22">Zhao, Tianyu</a> </li>; <li class="author"><a href="/search/?q=author%3A%22Wang%2C+Yuxuan%22">Wang, Yuxuan</a> </li>; <li class="author"><a href="/search/?q=author%3A%22Zhang%2C+Bin%22">Zhang, Bin</a> </li>; <li class="author"><a href="/search/?q=author%3A%22Lu%2C+Heng%22">Lu, Heng</a> </li>; <li class="author"><a href="/search/?q=author%3A%22Zhou%2C+Yaqian%22">Zhou, Yaqian</a> </li>; <li class="author"><a href="/search/?q=author%3A%22Qiu%2C+Xipeng%22">Qiu, Xipeng</a> </li> </ul> </div> <div class="s-abstract-text"> <h4 class="sr-only">Abstract</h4> <p> Current methods of building LLMs with voice interaction capabilities rely heavily on explicit text autoregressive generation before or during speech response generation to maintain content quality, which unfortunately brings computational overhead 
and increases latency in multi-turn interactions. To address this, we introduce IntrinsicVoice, an LLM designed with intrinsic real-time voice interaction capabilities. IntrinsicVoice aims to facilitate the transfer of textual capabilities of pre-trained LLMs to the speech modality by mitigating the modality gap between text and speech. Our novel architecture, GroupFormer, can reduce speech sequences to lengths comparable to text sequences while generating high-quality audio, significantly reducing the length difference between speech and text, speeding up inference, and alleviating long-text modeling issues. Additionally, we construct a multi-turn speech-to-speech dialogue dataset named IntrinsicVoice-500k, which includes nearly 500k turns of speech-to-speech dialogues, and design a cross-modality training strategy to enhance the semantic alignment between speech and text. Experimental results demonstrate that IntrinsicVoice can generate high-quality speech responses with latency lower than 100ms in multi-turn dialogue scenarios. Demos are available at https://instrinsicvoice.github.io/. 
</p> </div> <br> <dl class="s-abstract-dl-horizontal"> <dt>Publication:</dt> <dd> <div id="article-publication">arXiv e-prints</div> </dd> <dt>Pub Date:</dt> <dd>October 2024</dd> <dt>DOI:</dt> <dd> <span> <a href="/link_gateway/2024arXiv241008035Z/doi:10.48550/arXiv.2410.08035" target="_blank" rel="noopener">10.48550/arXiv.2410.08035</a> <i class="fa fa-external-link"></i> </span> </dd> <dt>arXiv:</dt> <dd> <span> <a href="/link_gateway/2024arXiv241008035Z/arXiv:2410.08035" target="_blank" rel="noopener">arXiv:2410.08035</a> <i class="fa fa-external-link"></i> </span> </dd> <dt>Bibcode:</dt> <dd> <a href="/abs/2024arXiv241008035Z/abstract"> 2024arXiv241008035Z </a> <i class="icon-help" title="The bibcode is assigned by the ADS as a unique identifier for the paper."></i> </dd> <dt>Keywords:</dt> <dd> <ul class="list-inline"> <li>Computer Science - Sound;</li> <li>Computer Science - Artificial Intelligence</li> </ul> </dd> </dl> </article> </div> <div data-widget="ShowCitations"></div> <div data-widget="ShowReferences"></div> <div data-widget="ShowCoreads"></div> <div data-widget="ShowSimilar"></div> <div data-widget="ShowTableofcontents"></div> <div data-widget="ShowGraphics"></div> <div data-widget="ShowExportcitation" data-origin="abstract"></div> <div data-widget="ShowMetrics" data-allow-redirect="false"></div> <div data-widget="MetaTagsWidget"></div> </div> </div> </div> <div class="s-right-col-container col-xs-12 col-sm-12 col-md-3 col-lg-2 s-right-column" id="right-col-container" > <div data-widget="ShowResources"> <div data-reactroot="" class="s-right-col-widget-container" style="padding: 10px" > <div> <div class="resources__container"> <div class="resources__full__list"> <div class="resources__header__row"> <i class="fa fa-file-text-o" aria-hidden="true"> </i> <div class="resources__header__title">full text sources</div> </div> <div class="resources__content"> <div class="resources__content__title">arXiv</div> <div class="resources__content__links"> <span> 
<a href="/link_gateway/2024arXiv241008035Z/EPRINT_PDF" rel="noopener" class="resources__content__link unlock" > <i class="fa fa-file-pdf-o" aria-hidden="true"> </i> </a> <div class="resources__content__link__separator">|</div> </span> <span> <a href="/link_gateway/2024arXiv241008035Z/EPRINT_HTML" rel="noopener" class="resources__content__link unlock" > <i class="fa fa-file-text" aria-hidden="true"> </i> </a> </span> </div> </div> </div> </div> <div data-widget="ShowAssociated"> </div> </div> </div> </div> <div data-widget="ShowGraphicsSidebar"> </div> </div> </div> </div> </div> </div> </div> <div id="footer-container"> <div data-widget="FooterWidget"> <div class="footer s-footer"> <footer> <div class="__footer_wrapper"> <div class="__footer_brand"> © The SAO/NASA Astrophysics Data System <div class="__footer_brand_extra"> <p> <i class="fa fa-envelope"></i> adshelp[at]cfa.harvard.edu </p> <p> The ADS is operated by the Smithsonian Astrophysical Observatory under NASA Cooperative Agreement <em>NNX16AC86A</em> </p> </div> <div class="__footer_brand_logos"> <a href="http://www.nasa.gov" target="_blank" rel="noopener"> <img src="/styles/img/nasa.svg" alt="NASA logo" id="nasa-logo"> </a> <a href="http://www.si.edu" target="_blank" rel="noopener"> <img id="smithsonian-logo" src="/styles/img/smithsonian.svg" alt="Smithsonian logo"> </a> <a href="https://www.cfa.harvard.edu/" target="_blank" rel="noopener"> <img src="/styles/img/cfa.png" title="Harvard Center for Astrophysics logo" id="cfa-logo"> </a> </div> </div> <div class="__footer_list"> <div class="__footer_list_title"> Resources </div> <ul class="__footer_links"> <li> <a href="/about/" target="_blank" rel="noopener"> <i class="fa fa-question-circle"></i> About ADS </a> </li> <li> <a href="//ui.adsabs.harvard.edu/help/" target="_blank" rel="noopener"> <i class="fa fa-info-circle"></i> ADS Help </a> </li> <li> <a href="//ui.adsabs.harvard.edu/help/whats_new/" target="_blank" rel="noopener"> <i class="fa 
fa-bullhorn"></i> What's New </a> </li> <li> <a href="/about/careers/" target="_blank" rel="noopener"> <i class="fa fa-group"></i> Careers@ADS </a> </li> </ul> </div> <div class="__footer_list"> <div class="__footer_list_title"> Social </div> <ul class="__footer_links"> <li> <a href="//twitter.com/adsabs" target="_blank" rel="noopener"> <i class="fa fa-twitter"></i> @adsabs </a> </li> <li> <a href="//ui.adsabs.harvard.edu/blog/" target="_blank" rel="noopener"> <i class="fa fa-newspaper-o"></i> ADS Blog </a> </li> </ul> </div> <div class="__footer_list"> <div class="__footer_list_title"> Project </div> <ul class="__footer_links"> <li> <a href="/core/never">Switch to full ADS</a> </li> <li> <a href="https://adsisdownorjustme.herokuapp.com/" target="_blank" rel="noopener">Is ADS down? (or is it just me...)</a> </li> <li> <a href="http://www.si.edu" target="_blank" rel="noopener">Smithsonian Institution</a> </li> <li> <a href="http://www.si.edu/Privacy" target="_blank" rel="noopener">Smithsonian Privacy Notice</a> </li> <li> <a href="http://www.si.edu/Termsofuse" target="_blank" rel="noopener">Smithsonian Terms of Use</a> </li> <li> <a href="http://www.cfa.harvard.edu/sao" target="_blank" rel="noopener">Smithsonian Astrophysical Observatory</a> </li> <li> <a href="http://www.nasa.gov" target="_blank" rel="noopener">NASA</a> </li> </ul> </div> </div> </footer> </div> </div> </div> </div> </div> </div> <div id="darkSwitch" class="darkmode-toggle hidden" title="Turn on dark mode">🌓</div> <script> function autocomplete(searchBox, autoValues) { // Arguments: the text field element and an array of possible autocompleted values var currentFocus; // selected autocomplete option // Function to be run when the user types searchBox.addEventListener("input", function(e) { var a, b, i, val = this.value; // close any list of autocomplete values closeAllLists(); if (!val) { return false;} val = val.split(/\s+/); val = val[val.length - 1]; if (!val) { return false;} currentFocus = 
-1; // Create a DIV element that will contain the items (values): a = document.createElement("DIV"); a.setAttribute("id", this.id + "autocomplete-list"); a.setAttribute("class", "autocomplete-items"); // Append the DIV element as a child of the autocomplete container: this.parentNode.appendChild(a); for (i = 0; i < autoValues.length; i++) { // Check if the item starts with the same letters as the text field value: if (autoValues[i].match.substr(0, val.length).toUpperCase() == val.toUpperCase()) { // Create a DIV element for each matching element: b = document.createElement("DIV"); b.innerHTML = autoValues[i].label; if ("desc" in autoValues[i]) { b.innerHTML += " <i>" + autoValues[i].desc + "</i>"; } if (autoValues[i].value.startsWith(autoValues[i].match) ) { b.innerHTML += " | <strong>" + autoValues[i].match.substr(0, val.length) + "</strong>"; b.innerHTML += autoValues[i].match.substr(val.length); } // Insert a input field that will hold the current array item's value: b.innerHTML += "<input type='hidden' value='" + autoValues[i].value + "'>"; // Listen to clicks on the item value (DIV element): b.addEventListener("click", function(e) { var terms = searchBox.value.split(/\s+/); // Remove the current part of the input used for matching terms.pop(); // Insert the value for the autocomplete text field: terms.push(this.getElementsByTagName("input")[0].value); searchBox.value = terms.join(" "); // Move cursor position inside quotes/parenthesis if needed searchBox.focus(); if (searchBox.value[searchBox.value.length-1] === '"' || searchBox.value[searchBox.value.length-1] === ')') { searchBox.setSelectionRange(searchBox.value.length-1, searchBox.value.length-1); } // Close the list of autocompleted values closeAllLists(); }); a.appendChild(b); } } if (a.children.length > 0) { // By default, enter will select the first entry currentFocus = 0; addActive(a.children); } }); /*execute a function presses a key on the keyboard:*/ searchBox.addEventListener("keydown", function(e) 
{ var x = document.getElementById(this.id + "autocomplete-list"); if (x) x = x.getElementsByTagName("div"); if (e.keyCode == 40) { // If the arrow DOWN key is pressed, increase the currentFocus variable: currentFocus++; addActive(x); } else if (e.keyCode == 38) { //up // If the arrow UP key is pressed, decrease the currentFocus variable: currentFocus--; /*and and make the current item more visible:*/ addActive(x); } else if (e.keyCode == 13) { // If the ENTER key is pressed: if (currentFocus > -1) { // Prevent the form from being submitted: e.preventDefault(); // Simulate a click on the "active" item: if (x) x[currentFocus].click(); currentFocus = -1; } } }); function addActive(x) { // Classify an item as "active": if (!x) return false; // Remove the "active" class on all items: removeActive(x); if (currentFocus >= x.length) currentFocus = 0; if (currentFocus < 0) currentFocus = (x.length - 1); // Add class "autocomplete-active": x[currentFocus].classList.add("autocomplete-active"); } function removeActive(x) { // Remove the "active" class from all autocomplete items: for (var i = 0; i < x.length; i++) { x[i].classList.remove("autocomplete-active"); } } function closeAllLists(elmnt) { // Close all autocomplete lists in the document, except the one passed as an argument: var x = document.getElementsByClassName("autocomplete-items"); for (var i = 0; i < x.length; i++) { if (elmnt != x[i] && elmnt != searchBox) { x[i].parentNode.removeChild(x[i]); } } } // Any other clicks in the document: document.addEventListener("click", function (e) { closeAllLists(e.target); }); } var autoList = [ { value: 'author:""', label: 'Author', match: 'author:"' }, { value: 'author:"^"', label: 'First Author', match: 'first author' }, { value: 'author:"^"', label: 'First Author', match: 'author:"^' }, { value: 'bibcode:""', label: 'Bibcode', desc: 'e.g. bibcode:1989ApJ...342L..71R', match: 'bibcode:"' }, { value: 'bibstem:""', label: 'Publication', desc: 'e.g. 
bibstem:ApJ', match: 'bibstem:"' }, { value: 'bibstem:""', label: 'Publication', desc: 'e.g. bibstem:ApJ', match: 'publication (bibstem)' }, { value: 'arXiv:', label: 'arXiv ID', match: 'arxiv:' }, { value: 'doi:', label: 'DOI', match: 'doi:' }, { value: 'full:""', label: 'Full text search', desc: 'title, abstract, and body', match: 'full:' }, { value: 'full:""', label: 'Full text search', desc: 'title, abstract, and body', match: 'fulltext' }, { value: 'full:""', label: 'Full text search', desc: 'title, abstract, and body', match: 'text' }, { value: 'year:', label: 'Year', match: 'year' }, { value: 'year:1999-2005', label: 'Year Range', desc: 'e.g. 1999-2005', match: 'year range' }, { value: 'aff:""', label: 'Affiliation', match: 'aff:' }, { value: 'abs:""', label: 'Search abstract + title + keywords', match: 'abs:' }, { value: 'database:astronomy', label: 'Limit to papers in the astronomy database', match: 'database:astronomy' }, { value: 'database:physics', label: 'Limit to papers in the physics database', match: 'database:physics' }, { value: 'title:""', label: 'Title', match: 'title:"' }, { value: 'orcid:', label: 'ORCiD identifier', match: 'orcid:' }, { value: 'object:', label: 'SIMBAD object (e.g. 
object:LMC)', match: 'object:' }, { value: 'property:refereed', label: 'Limit to refereed', desc: '(property:refereed)', match: 'refereed' }, { value: 'property:refereed', label: 'Limit to refereed', desc: '(property:refereed)', match: 'property:refereed' }, { value: 'property:notrefereed', label: 'Limit to non-refereed', desc: '(property:notrefereed)', match: 'property:notrefereed' }, { value: 'property:notrefereed', label: 'Limit to non-refereed', desc: '(property:notrefereed)', match: 'notrefereed' }, { value: 'property:eprint', label: 'Limit to eprints', desc: '(property:eprint)', match: 'eprint' }, { value: 'property:eprint', label: 'Limit to eprints', desc: '(property:eprint)', match: 'property:eprint' }, { value: 'property:openaccess', label: 'Limit to open access', desc: '(property:openaccess)', match: 'property:openaccess' }, { value: 'property:openaccess', label: 'Limit to open access', desc: '(property:openaccess)', match: 'openaccess' }, { value: 'doctype:software', label: 'Limit to software', desc: '(doctype:software)', match: 'software' }, { value: 'doctype:software', label: 'Limit to software', desc: '(doctype:software)', match: 'doctype:software' }, { value: 'property:inproceedings', label: 'Limit to papers in conference proceedings', desc: '(property:inproceedings)', match: 'proceedings' }, { value: 'property:inproceedings', label: 'Limit to papers in conference proceedings', desc: '(property:inproceedings)', match: 'property:inproceedings' }, { value: 'citations()', label: 'Citations', desc: 'Get papers citing your search result set', match: 'citations(' }, { value: 'references()', label: 'References', desc: 'Get papers referenced by your search result set', match: 'references(' }, { value: 'trending()', label: 'Trending', desc: 'Get papers most read by users who recently read your search result set', match: 'trending(' }, { value: 'reviews()', label: 'Review Articles', desc: 'Get most relevant papers that cite your search result set', match: 
'reviews(' }, { value: 'useful()', label: 'Useful', desc: 'Get papers most frequently cited by your search result set', match: 'useful(' }, { value: 'similar()', label: 'Similar', desc: 'Get papers that have similar full text to your search result set', match: 'similar(' }, ]; // initiate the autocomplete function on the "q" element, and pass along the operators array as possible autocomplete values: var inputBox = document.getElementById("q"); if (inputBox) { inputBox.focus(); // autofocus inputBox.setSelectionRange(inputBox.value.length, inputBox.value.length); // bring cursor to the end autocomplete(inputBox, autoList); } </script> <script> (function() { // turn off no-js if we have javascript document.documentElement.className = document.documentElement.className.replace("no-js", "js"); function getCookie(cname) { var name = cname + "="; var decodedCookie = decodeURIComponent(document.cookie); var ca = decodedCookie.split(';'); for (var i = 0; i < ca.length; i++) { var c = ca[i]; while (c.charAt(0) == ' ') { c = c.substring(1); } if (c.indexOf(name) == 0) { return c.substring(name.length, c.length); } } return ""; } (function() { // looks for the cookie, and sets true if it's 'always' const coreCookie = getCookie('core') === 'always'; // only load bumblebee if we detect the core cookie and we are on abstract page if (coreCookie || (!(/^\/abs\//.test(document.location.pathname)) && !coreCookie)) { return; } window.__PRERENDERED = true; const addScript = function(args, cb) { const script = document.createElement('script'); Object.keys(args).forEach((key) => { script.setAttribute(key, args[key]); }); script.onload = function() { cb && cb(script); }; document.body.appendChild(script); } window.require = { waitSeconds: 0, baseUrl: '/' }; addScript({ src: '/libs/require.js' }, () => { addScript({ src: '/config/shim.js' }); }); })(); })(); </script> </body> </html>
