Mitchell Wortsman

I am a member of the technical staff on the pretraining team at Anthropic. Previously, I was a PhD student at the University of Washington, advised by Ali Farhadi and Ludwig Schmidt.

Email: mitchnw@cs.washington.edu / Google Scholar: https://scholar.google.com/citations?user=fzRnjFgAAAAJ / Twitter: https://twitter.com/mitchnw

Select publications, preprints & projects
(* indicates equal contribution)

Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith
ICLR, 2024 (oral)
arXiv: https://arxiv.org/abs/2309.14322

Replacing softmax with ReLU in Vision Transformers
Mitchell Wortsman, Jaehoon Lee, Justin Gilmer, Simon Kornblith
arXiv, 2023
arXiv: https://arxiv.org/abs/2309.08586

DataComp: In search of the next generation of multimodal datasets
Samir Yitzhak Gadre*, Gabriel Ilharco*, Alex Fang*, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, et al., Yair Carmon, Vaishaal Shankar, Ludwig Schmidt
NeurIPS, 2023
arXiv: https://arxiv.org/abs/2304.14108

Stable and low-precision training for large-scale vision-language models
Mitchell Wortsman*, Tim Dettmers*, Luke Zettlemoyer, Ari S. Morcos, Ali Farhadi, Ludwig Schmidt
NeurIPS, 2023
arXiv: https://arxiv.org/abs/2304.13013

OpenFlamingo: an open-source framework for training large multimodal models
Anas Awadalla, Irena Gao, Joshua Gardner, Jack Hessel, et al., Mitchell Wortsman, Ludwig Schmidt
GitHub, 2023
Code: https://github.com/mlfoundations/open_flamingo

lo-fi: distributed fine-tuning without communication
Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos
TMLR, 2022
arXiv: https://arxiv.org/abs/2210.11948

Patching open-vocabulary models by interpolating weights
Gabriel Ilharco*, Mitchell Wortsman*, Samir Yitzhak Gadre*, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt
NeurIPS, 2022
arXiv: https://arxiv.org/abs/2208.05592

CLIP on Wheels: Zero-Shot Object Navigation as Object Localization and Exploration
Samir Yitzhak Gadre, Mitchell Wortsman, Gabriel Ilharco, Ludwig Schmidt, Shuran Song
arXiv, 2022
arXiv: https://arxiv.org/abs/2203.10421

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon**, Simon Kornblith**, Ludwig Schmidt**
ICML, 2022
arXiv: https://arxiv.org/abs/2203.05482

Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)
Alex Fang, Gabriel Ilharco, Mitchell Wortsman, Yuhao Wan, Vaishaal Shankar, Achal Dave, Ludwig Schmidt
ICML, 2022
arXiv: https://arxiv.org/abs/2205.01397

Robust fine-tuning of zero-shot models
Mitchell Wortsman*, Gabriel Ilharco*, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo-Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt
CVPR, 2022 (oral, best paper finalist)
arXiv: https://arxiv.org/abs/2109.01903 / code: https://github.com/mlfoundations/wise-ft

OpenCLIP: An open source implementation of CLIP
Gabriel Ilharco*, Mitchell Wortsman*, Ross Wightman*, Cade Gordon*, Nicholas Carlini, Rohan Taori, Achal Dave, Vaishaal Shankar, Hongseok Namkoong, John Miller, Hannaneh Hajishirzi, Ali Farhadi, Ludwig Schmidt
GitHub, 2021
Code: https://github.com/mlfoundations/open_clip

What's Hidden in a Randomly Weighted Neural Network?
Vivek Ramanujan*, Mitchell Wortsman*, Aniruddha Kembhavi, Ali Farhadi, Mohammad Rastegari
CVPR, 2020
arXiv: https://arxiv.org/abs/1911.13299 / code: https://github.com/allenai/hidden-networks

Template modified from https://github.com/jonbarron/jonbarron_website.