# Suchin Gururangan

![Suchin Gururangan](/assets/images/bio_photo.png)

I am a research scientist at [Meta GenAI](https://ai.meta.com/), on the [Llama](https://llama.meta.com/) team. I received my PhD in Computer Science in 2024 from the [University of Washington](https://nlp.washington.edu/).
I was supported by the [2022 Bloomberg PhD Fellowship](https://www.bloomberg.com/company/stories/introducing-the-fifth-cohort-of-bloomberg-data-science-ph-d-fellows-2022-2023/), and was previously a visiting researcher at [Meta AI](https://ai.meta.com/) and a predoctoral resident at [AI2](https://allenai.org/).

- 📥 [Email](mailto:suching@meta.com)
- 🧑🏾‍💻 [GitHub](https://github.com/kernelmachine)
- 🎓 [Google Scholar](https://scholar.google.com/citations?user=CJIKhNIAAAAJ&hl=en&oi=ao)
- 📚 [Semantic Scholar](https://www.semanticscholar.org/author/Suchin-Gururangan/40895369)
- 𝕏 [Twitter](https://twitter.com/ssgrn)
- ✍🏾 [Blog](https://suchin.io/blog)

### Publications

#### 2024

---

- [The Llama 3 Herd of Models](https://ai.meta.com/research/publications/the-llama-3-herd-of-models/)
  Llama Team
  [code](https://github.com/meta-llama/llama-models)

- [DataComp-LM: In search of the next generation of training sets for language models](https://arxiv.org/abs/2406.11794)
  Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, **Suchin Gururangan**, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, Maciej Kilian, Hanlin Zhang, Rulin Shao, Sarah Pratt, Sunny Sanyal, Gabriel Ilharco, Giannis Daras, Kalyani Marathe, Aaron Gokaslan, Jieyu Zhang, Khyathi Chandu, Thao Nguyen, Igor Vasiljevic, Sham Kakade, Shuran Song, Sujay Sanghavi, Fartash Faghri, Sewoong Oh, Luke Zettlemoyer, Kyle Lo, Alaaeldin El-Nouby, Hadi Pouransari, Alexander Toshev, Stephanie Wang, Dirk Groeneveld, Luca Soldaini, Pang Wei Koh, Jenia Jitsev, Thomas Kollar, Alexandros G. Dimakis, Yair Carmon, Achal Dave, Ludwig Schmidt, Vaishaal Shankar
  [code](https://www.datacomp.ai/dclm/)
- [Language models scale reliably with over-training and on downstream tasks](https://arxiv.org/abs/2403.08540)
  Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar, **Suchin Gururangan**, Mitchell Wortsman, Rulin Shao, Jean Mercat, Alex Fang, Jeffrey Li, Sedrick Keh, Rui Xin, Marianna Nezhurina, Igor Vasiljevic, Jenia Jitsev, Alexandros G. Dimakis, Gabriel Ilharco, Shuran Song, Thomas Kollar, Yair Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt
  [code](https://github.com/mlfoundations/scaling)

- [LESS: Selecting Influential Data for Targeted Instruction Tuning](https://arxiv.org/abs/2402.04333)
  Mengzhou Xia, Sadhika Malladi, **Suchin Gururangan**, Sanjeev Arora, Danqi Chen
  [code](https://github.com/princeton-nlp/LESS)

- [Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models](https://arxiv.org/abs/2401.10440)
  Terra Blevins, Tomasz Limisiewicz, **Suchin Gururangan**, Margaret Li, Hila Gonen, Noah A. Smith, Luke Zettlemoyer

- [AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters](https://arxiv.org/abs/2401.06408)
  Li Lucy, **Suchin Gururangan**, Luca Soldaini, Emma Strubell, David Bamman, Lauren Klein, Jesse Dodge
  [code](https://github.com/lucy3/whos_filtered)

#### 2023

---

- [OpenLM](https://laion.ai/blog/open-lm/)
  **Suchin Gururangan\***, Mitchell Wortsman\*, Samir Yitzhak Gadre, Achal Dave, Maciej Kilian, Weijia Shi, Jean Mercat, Georgios Smyrnis, Gabriel Ilharco, Matt Jordan, Reinhard Heckel, Alex Dimakis, Ali Farhadi, Vaishaal Shankar, Ludwig Schmidt
  \*Equal contribution
  [code](https://github.com/mlfoundations/open_lm)

- [Time is Encoded in the Weights of Finetuned Language Models](https://arxiv.org/abs/2312.13401)
  Kai Nylund, **Suchin Gururangan**, Noah A. Smith
  [code](https://github.com/KaiNylund/lm-weights-encode-time)
- [SILO Language Models: Isolating Legal Risk in a Nonparametric Datastore](https://arxiv.org/abs/2308.04430)
  Sewon Min\*, **Suchin Gururangan\***, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer
  \*Equal contribution
  *ICLR 2024, RegML 2024*
  ✨ **Outstanding Paper Award at RegML 2024 Workshop** ✨
  [code](https://github.com/kernelmachine/silo-lm)

- [Scaling Expert Language Models with Unsupervised Domain Discovery](https://arxiv.org/abs/2303.14177)
  **Suchin Gururangan\***, Margaret Li\*, Mike Lewis, Weijia Shi, Tim Althoff, Noah A. Smith, Luke Zettlemoyer
  \*Equal contribution
  *JMLR 2024*
  [code](https://github.com/kernelmachine/cbtm)

- [Editing Models with Task Arithmetic](https://arxiv.org/abs/2212.04089)
  Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, **Suchin Gururangan**, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi
  *ICLR 2023*
  [code](https://github.com/mlfoundations/task_vectors)

#### 2022

---

- [lo-fi: distributed fine-tuning without communication](https://arxiv.org/abs/2210.11948)
  Mitchell Wortsman, **Suchin Gururangan**, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos
  *TMLR*
  [code](https://github.com/kernelmachine/lofi)

- [M2D2: A Massively Multi-Domain Language Modeling Dataset](https://arxiv.org/abs/2210.07370)
  Machel Reid, Victor Zhong, **Suchin Gururangan**, Luke Zettlemoyer
  *EMNLP 2022*
  [code](https://github.com/machelreid/m2d2)
- [Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection](https://arxiv.org/abs/2201.10474)
  **Suchin Gururangan**, Dallas Card, Sarah K. Dreier, Emily K. Gade, Leroy Wang, Blarry Wang, Luke Zettlemoyer, and Noah A. Smith
  *EMNLP 2022*
  [code](https://github.com/kernelmachine/quality-filter)

- [kNN-Prompt: Nearest Neighbor Zero-Shot Inference](https://arxiv.org/abs/2205.13792)
  Weijia Shi, Julian Michael, **Suchin Gururangan**, and Luke Zettlemoyer
  *EMNLP 2022*
  [code](https://github.com/swj0419/kNN_prompt)

- [Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models](https://arxiv.org/abs/2208.03306)
  Margaret Li\*, **Suchin Gururangan\***, Tim Dettmers, Mike Lewis, Noah A. Smith, and Luke Zettlemoyer
  \*Equal contribution
  [code](https://github.com/hadasah/btm)

- [Time Waits for No One! Analysis and Challenges of Temporal Misalignment](https://arxiv.org/abs/2111.07408)
  Kelvin Luu, Daniel Khashabi, **Suchin Gururangan**, Karishma Mandyam, and Noah A. Smith
  *NAACL 2022*
  [code](https://github.com/Kel-Lu/time-waits-for-no-one)

- [DEMix Layers: Disentangling Domains for Modular Language Modeling](https://arxiv.org/abs/2108.05036)
  **Suchin Gururangan**, Mike Lewis, Ari Holtzman, Noah A. Smith, and Luke Zettlemoyer
  *NAACL 2022*
  [code](https://github.com/kernelmachine/demix)

#### 2021

---

- [All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text](https://arxiv.org/abs/2107.00061)
  Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, **Suchin Gururangan**, and Noah A. Smith
  *ACL 2021*
  ✨ **Outstanding Paper Award** ✨
- [Expected Validation Performance and Estimation of a Random Variable’s Maximum](https://arxiv.org/abs/2110.00613)
  Jesse Dodge, **Suchin Gururangan**, Roy Schwartz, Dallas Card, and Noah A. Smith

- [Detoxifying Language Models Risks Marginalizing Minority Voices](https://arxiv.org/abs/2104.06390)
  Albert Xu, Eshaan Pathak, Eric Wallace, **Suchin Gururangan**, Maarten Sap, and Dan Klein
  *NAACL 2021*

#### 2020

---

- [RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models](https://arxiv.org/abs/2009.11462)
  Sam Gehman, **Suchin Gururangan**, Maarten Sap, Yejin Choi, and Noah A. Smith
  *EMNLP Findings 2020*
  [code](https://github.com/allenai/real-toxicity-prompts)

- [Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks](https://arxiv.org/abs/2004.10964)
  **Suchin Gururangan**, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith
  *ACL 2020*
  ✨ **Honorable Mention for Best Overall Paper** ✨
  [code](https://github.com/allenai/dont-stop-pretraining)

#### 2019

---

- [Variational Pretraining for Semi-supervised Text Classification](https://arxiv.org/abs/1906.02242)
  **Suchin Gururangan**, Tam Dang, Dallas Card, and Noah A. Smith
  *ACL 2019*
  [code](https://github.com/allenai/vampire)

- [Show Your Work: Improved Reporting of Experimental Results](https://arxiv.org/abs/1909.03004)
  Jesse Dodge, **Suchin Gururangan**, Roy Schwartz, Dallas Card, and Noah A. Smith
  *EMNLP 2019*
  [code](https://github.com/allenai/allentune)

- [Emergent coordination underlying learning to reach to grasp with a brain-machine interface](https://pubmed.ncbi.nlm.nih.gov/29357477)
  with many authors 🙂
  *Journal of Neurophysiology*

#### 2018

---
- [Annotation Artifacts in Natural Language Inference Data](https://arxiv.org/abs/1803.02324)
  **Suchin Gururangan\***, Swabha Swayamdipta\*, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith
  \*Equal contribution
  *NAACL 2018*

#### 2014

---

- [Analysis of Graph Invariants in Functional Neocortical Circuitry Reveals Generalized Features Common to Three Areas of Sensory Cortex](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003710)
  **Suchin Gururangan**, Alex Sadovsky, and Jason MacLean
  *PLoS Computational Biology 2014*