# Suchin Gururangan

![Suchin Gururangan](/assets/images/bio_photo.png)

I am a research scientist at [Meta GenAI](https://ai.meta.com/), on the [Llama](https://llama.meta.com/) team. I received my PhD in Computer Science in 2024 from the [University of Washington](https://nlp.washington.edu/).
I was supported by the [2022 Bloomberg PhD Fellowship](https://www.bloomberg.com/company/stories/introducing-the-fifth-cohort-of-bloomberg-data-science-ph-d-fellows-2022-2023/), and was previously a visiting researcher at [Meta AI](https://ai.meta.com/) and a predoctoral resident at [AI2](https://allenai.org/).

- 📥 [Email](mailto:suching@meta.com)
- 🧑🏾‍💻 [GitHub](https://github.com/kernelmachine)
- 🎓 [Google Scholar](https://scholar.google.com/citations?user=CJIKhNIAAAAJ&hl=en&oi=ao)
- 📚 [Semantic Scholar](https://www.semanticscholar.org/author/Suchin-Gururangan/40895369)
- 𝕏 [Twitter](https://twitter.com/ssgrn)
- ✍🏾 [Blog](https://suchin.io/blog)

### Publications

#### 2024

---

- [The Llama 3 Herd of Models](https://ai.meta.com/research/publications/the-llama-3-herd-of-models/)
  Llama Team
  [code](https://github.com/meta-llama/llama-models)

- [DataComp-LM: In search of the next generation of training sets for language models](https://arxiv.org/abs/2406.11794)
  Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, **Suchin Gururangan**, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, Maciej Kilian, Hanlin Zhang, Rulin Shao, Sarah Pratt, Sunny Sanyal, Gabriel Ilharco, Giannis Daras, Kalyani Marathe, Aaron Gokaslan, Jieyu Zhang, Khyathi Chandu, Thao Nguyen, Igor Vasiljevic, Sham Kakade, Shuran Song, Sujay Sanghavi, Fartash Faghri, Sewoong Oh, Luke Zettlemoyer, Kyle Lo, Alaaeldin El-Nouby, Hadi Pouransari, Alexander Toshev, Stephanie Wang, Dirk Groeneveld, Luca Soldaini, Pang Wei Koh, Jenia Jitsev, Thomas Kollar, Alexandros G. Dimakis, Yair Carmon, Achal Dave, Ludwig Schmidt, Vaishaal Shankar
  [code](https://www.datacomp.ai/dclm/)
- [Language models scale reliably with over-training and on downstream tasks](https://arxiv.org/abs/2403.08540)
  Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar, **Suchin Gururangan**, Mitchell Wortsman, Rulin Shao, Jean Mercat, Alex Fang, Jeffrey Li, Sedrick Keh, Rui Xin, Marianna Nezhurina, Igor Vasiljevic, Jenia Jitsev, Alexandros G. Dimakis, Gabriel Ilharco, Shuran Song, Thomas Kollar, Yair Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt
  [code](https://github.com/mlfoundations/scaling)

- [LESS: Selecting Influential Data for Targeted Instruction Tuning](https://arxiv.org/abs/2402.04333)
  Mengzhou Xia, Sadhika Malladi, **Suchin Gururangan**, Sanjeev Arora, Danqi Chen
  [code](https://github.com/princeton-nlp/LESS)

- [Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models](https://arxiv.org/abs/2401.10440)
  Terra Blevins, Tomasz Limisiewicz, **Suchin Gururangan**, Margaret Li, Hila Gonen, Noah A. Smith, Luke Zettlemoyer

- [AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters](https://arxiv.org/abs/2401.06408)
  Li Lucy, **Suchin Gururangan**, Luca Soldaini, Emma Strubell, David Bamman, Lauren Klein, Jesse Dodge
  [code](https://github.com/lucy3/whos_filtered)

#### 2023

---

- [OpenLM](https://laion.ai/blog/open-lm/)
  **Suchin Gururangan\***, Mitchell Wortsman\*, Samir Yitzhak Gadre, Achal Dave, Maciej Kilian, Weijia Shi, Jean Mercat, Georgios Smyrnis, Gabriel Ilharco, Matt Jordan, Reinhard Heckel, Alex Dimakis, Ali Farhadi, Vaishaal Shankar, Ludwig Schmidt
  \*Equal contribution
  [code](https://github.com/mlfoundations/open_lm)

- [Time is Encoded in the Weights of Finetuned Language Models](https://arxiv.org/abs/2312.13401)
  Kai Nylund, **Suchin Gururangan**, Noah A. Smith
  [code](https://github.com/KaiNylund/lm-weights-encode-time)
- [SILO Language Models: Isolating Legal Risk in a Nonparametric Datastore](https://arxiv.org/abs/2308.04430)
  Sewon Min\*, **Suchin Gururangan\***, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer
  \*Equal contribution
  *ICLR 2024, RegML 2024*
  ✨ **Outstanding Paper Award at RegML 2024 Workshop** ✨
  [code](https://github.com/kernelmachine/silo-lm)

- [Scaling Expert Language Models with Unsupervised Domain Discovery](https://arxiv.org/abs/2303.14177)
  **Suchin Gururangan\***, Margaret Li\*, Mike Lewis, Weijia Shi, Tim Althoff, Noah A. Smith, Luke Zettlemoyer
  \*Equal contribution
  *JMLR 2024*
  [code](https://github.com/kernelmachine/cbtm)

- [Editing Models with Task Arithmetic](https://arxiv.org/abs/2212.04089)
  Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, **Suchin Gururangan**, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi
  *ICLR 2023*
  [code](https://github.com/mlfoundations/task_vectors)

#### 2022

---

- [lo-fi: distributed fine-tuning without communication](https://arxiv.org/abs/2210.11948)
  Mitchell Wortsman, **Suchin Gururangan**, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos
  *TMLR*
  [code](https://github.com/kernelmachine/lofi)

- [M2D2: A Massively Multi-Domain Language Modeling Dataset](https://arxiv.org/abs/2210.07370)
  Machel Reid, Victor Zhong, **Suchin Gururangan**, Luke Zettlemoyer
  *EMNLP 2022*
  [code](https://github.com/machelreid/m2d2)
- [Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection](https://arxiv.org/abs/2201.10474)
  **Suchin Gururangan**, Dallas Card, Sarah K. Dreier, Emily K. Gade, Leroy Wang, Blarry Wang, Luke Zettlemoyer, and Noah A. Smith
  *EMNLP 2022*
  [code](https://github.com/kernelmachine/quality-filter)

- [kNN-Prompt: Nearest Neighbor Zero-Shot Inference](https://arxiv.org/abs/2205.13792)
  Weijia Shi, Julian Michael, **Suchin Gururangan**, and Luke Zettlemoyer
  *EMNLP 2022*
  [code](https://github.com/swj0419/kNN_prompt)

- [Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models](https://arxiv.org/abs/2208.03306)
  Margaret Li\*, **Suchin Gururangan\***, Tim Dettmers, Mike Lewis, Noah A. Smith, and Luke Zettlemoyer
  \*Equal contribution
  [code](https://github.com/hadasah/btm)

- [Time Waits for No One! Analysis and Challenges of Temporal Misalignment](https://arxiv.org/abs/2111.07408)
  Kelvin Luu, Daniel Khashabi, **Suchin Gururangan**, Karishma Mandyam, and Noah A. Smith
  *NAACL 2022*
  [code](https://github.com/Kel-Lu/time-waits-for-no-one)

- [DEMix Layers: Disentangling Domains for Modular Language Modeling](https://arxiv.org/abs/2108.05036)
  **Suchin Gururangan**, Mike Lewis, Ari Holtzman, Noah A. Smith, and Luke Zettlemoyer
  *NAACL 2022*
  [code](https://github.com/kernelmachine/demix)

#### 2021

---

- [All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text](https://arxiv.org/abs/2107.00061)
  Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, **Suchin Gururangan**, and Noah A. Smith
  *ACL 2021*
  ✨ **Outstanding Paper Award** ✨
- [Expected Validation Performance and Estimation of a Random Variable’s Maximum](https://arxiv.org/abs/2110.00613)
  Jesse Dodge, **Suchin Gururangan**, Roy Schwartz, Dallas Card, and Noah A. Smith

- [Detoxifying Language Models Risks Marginalizing Minority Voices](https://arxiv.org/abs/2104.06390)
  Albert Xu, Eshaan Pathak, Eric Wallace, **Suchin Gururangan**, Maarten Sap, and Dan Klein
  *NAACL 2021*

#### 2020

---

- [RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models](https://arxiv.org/abs/2009.11462)
  Sam Gehman, **Suchin Gururangan**, Maarten Sap, Yejin Choi, and Noah A. Smith
  *EMNLP Findings 2020*
  [code](https://github.com/allenai/real-toxicity-prompts)

- [Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks](https://arxiv.org/abs/2004.10964)
  **Suchin Gururangan**, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith
  *ACL 2020*
  ✨ **Honorable Mention for Best Overall Paper** ✨
  [code](https://github.com/allenai/dont-stop-pretraining)

#### 2019

---

- [Variational Pretraining for Semi-supervised Text Classification](https://arxiv.org/abs/1906.02242)
  **Suchin Gururangan**, Tam Dang, Dallas Card, and Noah A. Smith
  *ACL 2019*
  [code](https://github.com/allenai/vampire)

- [Show Your Work: Improved Reporting of Experimental Results](https://arxiv.org/abs/1909.03004)
  Jesse Dodge, **Suchin Gururangan**, Roy Schwartz, Dallas Card, and Noah A. Smith
  *EMNLP 2019*
  [code](https://github.com/allenai/allentune)

- [Emergent coordination underlying learning to reach to grasp with a brain-machine interface](https://pubmed.ncbi.nlm.nih.gov/29357477)
  with many authors 🙂
  *Journal of Neurophysiology*

#### 2018

---
- [Annotation Artifacts in Natural Language Inference Data](https://arxiv.org/abs/1803.02324)
  **Suchin Gururangan\***, Swabha Swayamdipta\*, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith
  \*Equal contribution
  *NAACL 2018*

#### 2014

---

- [Analysis of Graph Invariants in Functional Neocortical Circuitry Reveals Generalized Features Common to Three Areas of Sensory Cortex](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003710)
  **Suchin Gururangan**, Alex Sadovsky, and Jason MacLean
  *PLoS Computational Biology 2014*