Mitchell Wortsman

I am a member of the technical staff on the pretraining team at Anthropic. Previously, I was a PhD student at the University of Washington, advised by Ali Farhadi and Ludwig Schmidt.

Email: mitchnw@cs.washington.edu / Google Scholar: https://scholar.google.com/citations?user=fzRnjFgAAAAJ / Twitter: https://twitter.com/mitchnw

Select publications, preprints & projects
(* indicates equal contribution)

Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith
ICLR, 2024 (oral)
arXiv: https://arxiv.org/abs/2309.14322

Replacing softmax with ReLU in Vision Transformers
Mitchell Wortsman, Jaehoon Lee, Justin Gilmer, Simon Kornblith
arXiv, 2023
arXiv: https://arxiv.org/abs/2309.08586

DataComp: In search of the next generation of multimodal datasets
Samir Yitzhak Gadre*, Gabriel Ilharco*, Alex Fang*, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, et al., Yair Carmon, Vaishaal Shankar, Ludwig Schmidt
NeurIPS, 2023
arXiv: https://arxiv.org/abs/2304.14108

Stable and low-precision training for large-scale vision-language models
Mitchell Wortsman*, Tim Dettmers*, Luke Zettlemoyer, Ari S. Morcos, Ali Farhadi, Ludwig Schmidt
NeurIPS, 2023
arXiv: https://arxiv.org/abs/2304.13013

OpenFlamingo: an open-source framework for training large multimodal models
Anas Awadalla, Irena Gao, Joshua Gardner, Jack Hessel, et al., Mitchell Wortsman, Ludwig Schmidt
GitHub, 2023
Code: https://github.com/mlfoundations/open_flamingo

lo-fi: distributed fine-tuning without communication
Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos
TMLR, 2022
arXiv: https://arxiv.org/abs/2210.11948

Patching open-vocabulary models by interpolating weights
Gabriel Ilharco*, Mitchell Wortsman*, Samir Yitzhak Gadre*, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt
NeurIPS, 2022
arXiv: https://arxiv.org/abs/2208.05592

CLIP on Wheels: Zero-Shot Object Navigation as Object Localization and Exploration
Samir Yitzhak Gadre, Mitchell Wortsman, Gabriel Ilharco, Ludwig Schmidt, Shuran Song
arXiv, 2022
arXiv: https://arxiv.org/abs/2203.10421

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon**, Simon Kornblith**, Ludwig Schmidt**
ICML, 2022
arXiv: https://arxiv.org/abs/2203.05482

Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)
Alex Fang, Gabriel Ilharco, Mitchell Wortsman, Yuhao Wan, Vaishaal Shankar, Achal Dave, Ludwig Schmidt
ICML, 2022
arXiv: https://arxiv.org/abs/2205.01397

Robust fine-tuning of zero-shot models
Mitchell Wortsman*, Gabriel Ilharco*, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo-Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt
CVPR, 2022 (oral, best paper finalist)
arXiv: https://arxiv.org/abs/2109.01903 / code: https://github.com/mlfoundations/wise-ft

OpenCLIP: An open source implementation of CLIP
Gabriel Ilharco*, Mitchell Wortsman*, Ross Wightman*, Cade Gordon*, Nicholas Carlini, Rohan Taori, Achal Dave, Vaishaal Shankar, Hongseok Namkoong, John Miller, Hannaneh Hajishirzi, Ali Farhadi, Ludwig Schmidt
GitHub, 2021
Code: https://github.com/mlfoundations/open_clip

What's Hidden in a Randomly Weighted Neural Network?
Vivek Ramanujan*, Mitchell Wortsman*, Aniruddha Kembhavi, Ali Farhadi, Mohammad Rastegari
CVPR, 2020
arXiv: https://arxiv.org/abs/1911.13299 / code: https://github.com/allenai/hidden-networks

Template modified from https://github.com/jonbarron/jonbarron_website.