<ol class="breathe-horizontal" start="1">
        <li class="arxiv-result">
          <div class="is-marginless">
            <p class="list-title is-inline-block"><a href="">arXiv:2409.15207</a>
              <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span>
            </p>
            <div class="tags is-inline-block">
              
                <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Solar and Stellar Astrophysics">astro-ph.SR</span>
              
                <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Astrophysics of Galaxies">astro-ph.GA</span>
              
            </div>
          </div>
          <p class="title is-5 mathjax">
            Chemical Abundances for a Sample of FGK dwarfs in the Pleiades Open Cluster from APOGEE
          </p>
          <p class="authors">
            <span class="search-hit">Authors:</span>
            <a href="/search/?searchtype=author&amp;query=Grilo%2C+V">Vinicius Grilo</a>,
            <a href="/search/?searchtype=author&amp;query=Souto%2C+D">Diogo Souto</a>,
            <a href="/search/?searchtype=author&amp;query=Cunha%2C+K">Katia Cunha</a>,
            <a href="/search/?searchtype=author&amp;query=Guer%C3%A7o%2C+R">Rafael Guerço</a>,
            <a href="/search/?searchtype=author&amp;query=Vieira%2C+R">Rodrigo Vieira</a>,
            <a href="/search/?searchtype=author&amp;query=Smith%2C+V">Verne Smith</a>,
            <a href="/search/?searchtype=author&amp;query=Vilar%2C+D">Deusalete Vilar</a>, <a href="/search/?searchtype=author&amp;query=Andrade%2C+A">Anderson Andrade</a>, <a href="/search/?searchtype=author&amp;query=Wanderley%2C+F">Fabio Wanderley</a>, <a href="/search/?searchtype=author&amp;query=Daflon%2C+S">Simone Daflon</a>, <a href="/search/?searchtype=author&amp;query=Silva%2C+J+V+S">João Victor Sales Silva</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.15207v1-abstract-short" style="display: inline;"> This paper presents chemical abundances of twelve elements (C, Na, Mg, Al, Si, K, Ca, Ti, V, Cr, Mn, and Fe) for 80 FGK dwarfs in the Pleiades open cluster, which span a temperature range of $\sim$2000 K in T$_{\rm eff}$, using the high-resolution (R$\sim$22,500) near-infrared SDSS-IV/APOGEE spectra ($λ$1.51--1.69 \micron). 
Using a 1D LTE abundance analysis, we determine an overall metallicity of&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.15207v1-abstract-full').style.display = 'inline'; document.getElementById('2409.15207v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.15207v1-abstract-full" style="display: none;"> This paper presents chemical abundances of twelve elements (C, Na, Mg, Al, Si, K, Ca, Ti, V, Cr, Mn, and Fe) for 80 FGK dwarfs in the Pleiades open cluster, which span a temperature range of $\sim$2000 K in T$_{\rm eff}$, using the high-resolution (R$\sim$22,500) near-infrared SDSS-IV/APOGEE spectra ($λ$1.51--1.69 \micron). Using a 1D LTE abundance analysis, we determine an overall metallicity of [Fe/H]=+0.03$\pm$0.04 dex, with the elemental ratios [$α$/Fe]=+0.01$\pm$0.05, [odd-z/Fe]=-0.04$\pm$0.08, and [iron peak/Fe]=-0.02$\pm$0.08. These abundances for the Pleiades are in line with the abundances of other open clusters at similar galactocentric distances as presented in the literature. Examination of the abundances derived from each individual spectral line revealed that several of the stronger lines displayed trends of decreasing abundance with decreasing $T_{\rm eff}$. The list of spectral lines that yield abundances that are independent of $T_{\rm eff}$ are presented and used for deriving the final abundances. An investigation into possible causes of the temperature-dependent abundances derived from the stronger lines suggests that the radiative codes and the APOGEE line list we employ may inadequately model van der Waals broadening, in particular in the cooler K dwarfs. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.15207v1-abstract-full').style.display = 'none'; document.getElementById('2409.15207v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 23 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">16 pages, 7 figures, and 4 tables. Accepted for publication in MNRAS</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.06537</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Introducing the NewsPaLM MBR and QE Dataset: LLM-Generated High-Quality Parallel Data Outperforms Traditional Web-Crawled Data </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/?searchtype=author&amp;query=Finkelstein%2C+M">Mara Finkelstein</a>, <a href="/search/?searchtype=author&amp;query=Vilar%2C+D">David Vilar</a>, <a href="/search/?searchtype=author&amp;query=Freitag%2C+M">Markus Freitag</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.06537v5-abstract-short" style="display: inline;"> Recent research in neural machine translation (NMT) has shown that training on high-quality machine-generated data can outperform training on human-generated data. 
This work accompanies the first-ever release of a LLM-generated, MBR-decoded and QE-reranked dataset with both sentence-level and multi-sentence examples. We perform extensive experiments to demonstrate the quality of our dataset in ter&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.06537v5-abstract-full').style.display = 'inline'; document.getElementById('2408.06537v5-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.06537v5-abstract-full" style="display: none;"> Recent research in neural machine translation (NMT) has shown that training on high-quality machine-generated data can outperform training on human-generated data. This work accompanies the first-ever release of a LLM-generated, MBR-decoded and QE-reranked dataset with both sentence-level and multi-sentence examples. We perform extensive experiments to demonstrate the quality of our dataset in terms of its downstream impact on NMT model performance. We find that training from scratch on our (machine-generated) dataset outperforms training on the (web-crawled) WMT&#39;23 training dataset (which is 300 times larger), and also outperforms training on the top-quality subset of the WMT&#39;23 training dataset. We also find that performing self-distillation by finetuning the LLM which generated this dataset outperforms the LLM&#39;s strong few-shot baseline. These findings corroborate the quality of our dataset, and demonstrate the value of high-quality machine-generated data in improving performance of NMT models. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.06537v5-abstract-full').style.display = 'none'; document.getElementById('2408.06537v5-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 22 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 12 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.02832</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/?searchtype=author&amp;query=Trabelsi%2C+F">Firas Trabelsi</a>, <a href="/search/?searchtype=author&amp;query=Vilar%2C+D">David Vilar</a>, <a href="/search/?searchtype=author&amp;query=Finkelstein%2C+M">Mara Finkelstein</a>, <a href="/search/?searchtype=author&amp;query=Freitag%2C+M">Markus Freitag</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.02832v1-abstract-short" style="display: inline;"> Minimum Bayes Risk (MBR) decoding is a powerful decoding strategy widely used for text generation tasks, but its quadratic computational complexity limits its practical application. 
This paper presents a novel approach for approximating MBR decoding using matrix completion techniques, focusing on the task of machine translation. We formulate MBR decoding as a matrix completion problem, where the u&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.02832v1-abstract-full').style.display = 'inline'; document.getElementById('2406.02832v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.02832v1-abstract-full" style="display: none;"> Minimum Bayes Risk (MBR) decoding is a powerful decoding strategy widely used for text generation tasks, but its quadratic computational complexity limits its practical application. This paper presents a novel approach for approximating MBR decoding using matrix completion techniques, focusing on the task of machine translation. We formulate MBR decoding as a matrix completion problem, where the utility metric scores between candidate hypotheses and pseudo-reference translations form a low-rank matrix. First, we empirically show that the scores matrices indeed have a low-rank structure. Then, we exploit this by only computing a random subset of the scores and efficiently recover the missing entries in the matrix by applying the Alternating Least Squares (ALS) algorithm, thereby enabling a fast approximation of the MBR decoding process. Our experimental results on machine translation tasks demonstrate that the proposed method requires 1/16 utility metric computations compared to vanilla MBR decoding while achieving equal translation quality measured by COMET22 on the WMT22 dataset (en&lt;&gt;de and en&lt;&gt;ru). We also benchmark our method against other approximation methods and we show gains in quality when comparing to them. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.02832v1-abstract-full').style.display = 'none'; document.getElementById('2406.02832v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2311.05350</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> There&#39;s no Data Like Better Data: Using QE Metrics for MT Data Filtering </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/?searchtype=author&amp;query=Peter%2C+J">Jan-Thorsten Peter</a>, <a href="/search/?searchtype=author&amp;query=Vilar%2C+D">David Vilar</a>, <a href="/search/?searchtype=author&amp;query=Deutsch%2C+D">Daniel Deutsch</a>, <a href="/search/?searchtype=author&amp;query=Finkelstein%2C+M">Mara Finkelstein</a>, <a href="/search/?searchtype=author&amp;query=Juraska%2C+J">Juraj Juraska</a>, <a href="/search/?searchtype=author&amp;query=Freitag%2C+M">Markus Freitag</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2311.05350v1-abstract-short" style="display: inline;"> Quality Estimation (QE), the evaluation of machine translation output without the need of explicit references, has seen big improvements in the last years with the use of neural metrics. 
In this paper we analyze the viability of using QE metrics for filtering out bad quality sentence pairs in the training data of neural machine translation systems (NMT). While most corpus filtering methods are foc&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2311.05350v1-abstract-full').style.display = 'inline'; document.getElementById('2311.05350v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2311.05350v1-abstract-full" style="display: none;"> Quality Estimation (QE), the evaluation of machine translation output without the need of explicit references, has seen big improvements in the last years with the use of neural metrics. In this paper we analyze the viability of using QE metrics for filtering out bad quality sentence pairs in the training data of neural machine translation systems (NMT). While most corpus filtering methods are focused on detecting noisy examples in collections of texts, usually huge amounts of web crawled data, QE models are trained to discriminate more fine-grained quality differences. We show that by selecting the highest quality sentence pairs in the training data, we can improve translation quality while reducing the training size by half. We also provide a detailed analysis of the filtering results, which highlights the differences between both approaches. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2311.05350v1-abstract-full').style.display = 'none'; document.getElementById('2311.05350v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 9 November, 2023; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2023. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">to be published at WMT23</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2310.06707</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/?searchtype=author&amp;query=Tomani%2C+C">Christian Tomani</a>, <a href="/search/?searchtype=author&amp;query=Vilar%2C+D">David Vilar</a>, <a href="/search/?searchtype=author&amp;query=Freitag%2C+M">Markus Freitag</a>, <a href="/search/?searchtype=author&amp;query=Cherry%2C+C">Colin Cherry</a>, <a href="/search/?searchtype=author&amp;query=Naskar%2C+S">Subhajit Naskar</a>, <a href="/search/?searchtype=author&amp;query=Finkelstein%2C+M">Mara Finkelstein</a>, <a href="/search/?searchtype=author&amp;query=Garcia%2C+X">Xavier Garcia</a>, <a href="/search/?searchtype=author&amp;query=Cremers%2C+D">Daniel Cremers</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2310.06707v4-abstract-short" style="display: inline;"> Maximum-a-posteriori (MAP) decoding is the most widely used decoding strategy for neural machine translation (NMT) models. The underlying assumption is that model probability correlates well with human judgment, with better translations getting assigned a higher score by the model. 
However, research has shown that this assumption does not always hold, and generation quality can be improved by deco&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2310.06707v4-abstract-full').style.display = 'inline'; document.getElementById('2310.06707v4-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2310.06707v4-abstract-full" style="display: none;"> Maximum-a-posteriori (MAP) decoding is the most widely used decoding strategy for neural machine translation (NMT) models. The underlying assumption is that model probability correlates well with human judgment, with better translations getting assigned a higher score by the model. However, research has shown that this assumption does not always hold, and generation quality can be improved by decoding to optimize a utility function backed by a metric or quality-estimation signal, as is done by Minimum Bayes Risk (MBR) or quality-aware decoding. The main disadvantage of these approaches is that they require an additional model to calculate the utility function during decoding, significantly increasing the computational cost. In this paper, we propose to make the NMT models themselves quality-aware by training them to estimate the quality of their own output. Using this approach for MBR decoding we can drastically reduce the size of the candidate list, resulting in a speed-up of two-orders of magnitude. When applying our method to MAP decoding we obtain quality gains similar or even superior to quality reranking approaches, but with the efficiency of single pass decoding. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2310.06707v4-abstract-full').style.display = 'none'; document.getElementById('2310.06707v4-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 11 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 10 October, 2023; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2023. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2211.09102</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Prompting PaLM for Translation: Assessing Strategies and Performance </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/?searchtype=author&amp;query=Vilar%2C+D">David Vilar</a>, <a href="/search/?searchtype=author&amp;query=Freitag%2C+M">Markus Freitag</a>, <a href="/search/?searchtype=author&amp;query=Cherry%2C+C">Colin Cherry</a>, <a href="/search/?searchtype=author&amp;query=Luo%2C+J">Jiaming Luo</a>, <a href="/search/?searchtype=author&amp;query=Ratnakar%2C+V">Viresh Ratnakar</a>, <a href="/search/?searchtype=author&amp;query=Foster%2C+G">George Foster</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" 
id="2211.09102v3-abstract-short" style="display: inline;"> Large language models (LLMs) that have been trained on multilingual but not parallel text exhibit a remarkable ability to translate between languages. We probe this ability in an in-depth study of the pathways language model (PaLM), which has demonstrated the strongest machine translation (MT) performance among similarly-trained LLMs to date. We investigate various strategies for choosing translat&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2211.09102v3-abstract-full').style.display = 'inline'; document.getElementById('2211.09102v3-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2211.09102v3-abstract-full" style="display: none;"> Large language models (LLMs) that have been trained on multilingual but not parallel text exhibit a remarkable ability to translate between languages. We probe this ability in an in-depth study of the pathways language model (PaLM), which has demonstrated the strongest machine translation (MT) performance among similarly-trained LLMs to date. We investigate various strategies for choosing translation examples for few-shot prompting, concluding that example quality is the most important factor. Using optimized prompts, we revisit previous assessments of PaLM&#39;s MT capabilities with more recent test sets, modern MT metrics, and human evaluation, and find that its performance, while impressive, still lags that of state-of-the-art supervised systems. We conclude by providing an analysis of PaLM&#39;s MT output which reveals some interesting properties and prospects for future work. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2211.09102v3-abstract-full').style.display = 'none'; document.getElementById('2211.09102v3-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 25 June, 2023; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 16 November, 2022; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2022. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">ACL 2023</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2112.03052</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Scaling Up Influence Functions </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/?searchtype=author&amp;query=Schioppa%2C+A">Andrea Schioppa</a>, <a href="/search/?searchtype=author&amp;query=Zablotskaia%2C+P">Polina Zablotskaia</a>, <a href="/search/?searchtype=author&amp;query=Vilar%2C+D">David Vilar</a>, <a href="/search/?searchtype=author&amp;query=Sokolov%2C+A">Artem Sokolov</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2112.03052v1-abstract-short" style="display: inline;"> 
We address efficient calculation of influence functions for tracking predictions back to the training data. We propose and analyze a new approach to speeding up the inverse Hessian calculation based on Arnoldi iteration. With this improvement, we achieve, to the best of our knowledge, the first successful implementation of influence functions that scales to full-size (language and vision) Transfor&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2112.03052v1-abstract-full').style.display = 'inline'; document.getElementById('2112.03052v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2112.03052v1-abstract-full" style="display: none;"> We address efficient calculation of influence functions for tracking predictions back to the training data. We propose and analyze a new approach to speeding up the inverse Hessian calculation based on Arnoldi iteration. With this improvement, we achieve, to the best of our knowledge, the first successful implementation of influence functions that scales to full-size (language and vision) Transformer models with several hundreds of millions of parameters. We evaluate our approach on image classification and sequence-to-sequence tasks with tens to a hundred of millions of training examples. Our code will be available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2112.03052v1-abstract-full').style.display = 'none'; document.getElementById('2112.03052v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 6 December, 2021; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2021. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Published at AAAI-22</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2110.06997</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Bandits Don&#39;t Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/?searchtype=author&amp;query=Kreutzer%2C+J">Julia Kreutzer</a>, <a href="/search/?searchtype=author&amp;query=Vilar%2C+D">David Vilar</a>, <a href="/search/?searchtype=author&amp;query=Sokolov%2C+A">Artem Sokolov</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2110.06997v1-abstract-short" style="display: inline;"> Training data for machine translation (MT) is often sourced from a multitude of large corpora that are multi-faceted in nature, e.g. containing contents from multiple domains or different levels of quality or complexity. Naturally, these facets do not occur with equal frequency, nor are they equally important for the test scenario at hand. 
In this work, we propose to optimize this balance jointly&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2110.06997v1-abstract-full').style.display = 'inline'; document.getElementById('2110.06997v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2110.06997v1-abstract-full" style="display: none;"> Training data for machine translation (MT) is often sourced from a multitude of large corpora that are multi-faceted in nature, e.g. containing contents from multiple domains or different levels of quality or complexity. Naturally, these facets do not occur with equal frequency, nor are they equally important for the test scenario at hand. In this work, we propose to optimize this balance jointly with MT model parameters to relieve system developers from manual schedule design. A multi-armed bandit is trained to dynamically choose between facets in a way that is most beneficial for the MT system. We evaluate it on three different multi-facet applications: balancing translationese and natural training data, or data from multiple domains or multiple language pairs. We find that bandit learning leads to competitive MT systems across tasks, and our analysis provides insights into its learned strategies and the underlying data sets. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2110.06997v1-abstract-full').style.display = 'none'; document.getElementById('2110.06997v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 October, 2021; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2021. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">EMNLP Findings 2021</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2008.04885</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">ps</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> The Sockeye 2 Neural Machine Translation Toolkit at AMTA 2020 </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/?searchtype=author&amp;query=Domhan%2C+T">Tobias Domhan</a>, <a href="/search/?searchtype=author&amp;query=Denkowski%2C+M">Michael Denkowski</a>, <a href="/search/?searchtype=author&amp;query=Vilar%2C+D">David Vilar</a>, <a href="/search/?searchtype=author&amp;query=Niu%2C+X">Xing Niu</a>, <a href="/search/?searchtype=author&amp;query=Hieber%2C+F">Felix Hieber</a>, <a href="/search/?searchtype=author&amp;query=Heafield%2C+K">Kenneth Heafield</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2008.04885v1-abstract-short" style="display: inline;"> We present Sockeye 2, a modernized and streamlined version of the Sockeye neural machine translation (NMT) toolkit. New features include a simplified code base through the use of MXNet&#39;s Gluon API, a focus on state of the art model architectures, distributed mixed precision training, and efficient CPU decoding with 8-bit quantization. 
These improvements result in faster training and inference, hig&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2008.04885v1-abstract-full').style.display = 'inline'; document.getElementById('2008.04885v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2008.04885v1-abstract-full" style="display: none;"> We present Sockeye 2, a modernized and streamlined version of the Sockeye neural machine translation (NMT) toolkit. New features include a simplified code base through the use of MXNet&#39;s Gluon API, a focus on state of the art model architectures, distributed mixed precision training, and efficient CPU decoding with 8-bit quantization. These improvements result in faster training and inference, higher automatic metric scores, and a shorter path from research to production. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2008.04885v1-abstract-full').style.display = 'none'; document.getElementById('2008.04885v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 11 August, 2020; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2020. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:1804.06609</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/?searchtype=author&amp;query=Post%2C+M">Matt Post</a>, <a href="/search/?searchtype=author&amp;query=Vilar%2C+D">David Vilar</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="1804.06609v2-abstract-short" style="display: inline;"> The end-to-end nature of neural machine translation (NMT) removes many ways of manually guiding the translation process that were available in older paradigms. Recent work, however, has introduced a new capability: lexically constrained or guided decoding, a modification to beam search that forces the inclusion of pre-specified words and phrases in the output. However, while theoretically sound, e&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('1804.06609v2-abstract-full').style.display = 'inline'; document.getElementById('1804.06609v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="1804.06609v2-abstract-full" style="display: none;"> The end-to-end nature of neural machine translation (NMT) removes many ways of manually guiding the translation process that were available in older paradigms. 
Recent work, however, has introduced a new capability: lexically constrained or guided decoding, a modification to beam search that forces the inclusion of pre-specified words and phrases in the output. However, while theoretically sound, existing approaches have computational complexities that are either linear (Hokamp and Liu, 2017) or exponential (Anderson et al., 2017) in the number of constraints. We present an algorithm for lexically constrained decoding with a complexity of O(1) in the number of constraints. We demonstrate the algorithm&#39;s remarkable ability to properly place these constraints, and use it to explore the shaky relationship between model and BLEU scores. Our implementation is available as part of Sockeye. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('1804.06609v2-abstract-full').style.display = 'none'; document.getElementById('1804.06609v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 9 November, 2018; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 18 April, 2018; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> April 2018. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">11 pages, 9 figures, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:1712.05690</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">stat.ML</span> </div> </div> <p class="title is-5 mathjax"> Sockeye: A Toolkit for Neural Machine Translation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/?searchtype=author&amp;query=Hieber%2C+F">Felix Hieber</a>, <a href="/search/?searchtype=author&amp;query=Domhan%2C+T">Tobias Domhan</a>, <a href="/search/?searchtype=author&amp;query=Denkowski%2C+M">Michael Denkowski</a>, <a href="/search/?searchtype=author&amp;query=Vilar%2C+D">David Vilar</a>, <a href="/search/?searchtype=author&amp;query=Sokolov%2C+A">Artem Sokolov</a>, <a href="/search/?searchtype=author&amp;query=Clifton%2C+A">Ann Clifton</a>, <a href="/search/?searchtype=author&amp;query=Post%2C+M">Matt Post</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="1712.05690v2-abstract-short" style="display: inline;"> We describe Sockeye (version 1.12), an open-source sequence-to-sequence toolkit for Neural Machine Translation (NMT). 
Sockeye is a production-ready framework for training and applying models as well as an experimental platform for researchers. Written in Python and built on MXNet, the toolkit offers scalable training and inference for the three most prominent encoder-decoder architectures: attenti&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('1712.05690v2-abstract-full').style.display = 'inline'; document.getElementById('1712.05690v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="1712.05690v2-abstract-full" style="display: none;"> We describe Sockeye (version 1.12), an open-source sequence-to-sequence toolkit for Neural Machine Translation (NMT). Sockeye is a production-ready framework for training and applying models as well as an experimental platform for researchers. Written in Python and built on MXNet, the toolkit offers scalable training and inference for the three most prominent encoder-decoder architectures: attentional recurrent neural networks, self-attentional transformers, and fully convolutional networks. Sockeye also supports a wide range of optimizers, normalization and regularization techniques, and inference improvements from current NMT literature. Users can easily run standard training recipes, explore different model settings, and incorporate new ideas. In this paper, we highlight Sockeye&#39;s features and benchmark it against other NMT toolkits on two language arcs from the 2017 Conference on Machine Translation (WMT): English-German and Latvian-English. We report competitive BLEU scores across all three architectures, including an overall best score for Sockeye&#39;s transformer implementation. To facilitate further comparison, we release all system outputs and training scripts used in our experiments. </span> </p> </li>
        </ol>

