<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1"> <!-- Search Engine Info --><title>Generative Pretraining From Pixels</title><meta name="description" content="Generative Pretraining From PixelsMark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya SutskeverInspi..."><!-- Solution from http://stackoverflow.com/questions/31593297/using-execcommand-javascript-to-copy-hidden-text-to-clipboard --> <script src="https://proceedings.mlr.press/v119/assets/js/copy_input.js"></script> <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script> <script src="https://proceedings.mlr.press/v119/assets/js/download.js"></script> <meta name="citation_publisher" content="PMLR"/><meta name="citation_title" content="Generative Pretraining From Pixels"/><meta name="citation_language" content="en"/> <meta name="citation_abstract_html_url" content="https://proceedings.mlr.press/v119/chen20s.html"/><meta name="citation_pdf_url" content="http://proceedings.mlr.press/v119/chen20s/chen20s.pdf"><meta name="citation_firstpage" content="1691"><meta name="citation_lastpage" content="1703"><meta name="citation_author" content="Mark Chen"><meta name="citation_author" content="Alec Radford"><meta name="citation_author" content="Rewon Child"><meta name="citation_author" content="Jeffrey Wu"><meta name="citation_author" content="Heewoo Jun"><meta name="citation_author" content="David Luan"><meta name="citation_author" content="Ilya Sutskever"><meta name="citation_publication_date" content="2020/11/21"><meta name="citation_inbook_title" content="International Conference on Machine Learning"/><meta name="citation_conference_title" content="International Conference on Machine Learning"/><meta name="citation_issn" content="2640-3498"><meta name="twitter:card" content="summary"/> <meta name="twitter:title" 
content="Generative Pretraining From Pixels"/><meta name="twitter:site" content="@MLResearchPress" /><meta name="twitter:description" content="Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for ima..."/><meta name="twitter:image" content="https://proceedings.mlr.press/v119/assets/images/logo-pmlr.png"/> <meta name="twitter:image:alt" content="PMLR Logo"/><meta property="og:title" content="Generative Pretraining From Pixels"/> <meta property="og:description" content="Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to ..."/> <meta property="og:type" content="article"/><meta property="article:published_time" content="2020-11-21T00:00:00+00:00"><meta property="og:url" content="https://proceedings.mlr.press/v119/chen20s.html"/><meta property="og:image" content="https://proceedings.mlr.press/v119/assets/images/logo-pmlr.png"/><meta property="og:site_name" content="PMLR"/><!-- Style Info --> <link rel="stylesheet" type="text/css" href="https://proceedings.mlr.press/v119/assets/css/pmlr.css" /> <style>.hero-text { color: #303030; }</style> <!-- Icon Info --> <link rel="shortcut icon" href="https://proceedings.mlr.press/v119/assets/images/favicon-pmlr.ico" type="image/x-icon"> <link rel="icon" href="https://proceedings.mlr.press/v119/assets/images/favicon-pmlr.ico" type="image/x-icon"> <!-- Feed Info --><link type="application/atom+xml" rel="alternate" href="https://proceedings.mlr.press/v119/feed.xml" title="Proceedings of Machine Learning Research" /><!-- Scripting info -->
<script> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-92432422-1', 'auto'); ga('send', 'pageview'); </script> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" } }, tex2jax: { inlineMath: [ ['$','$'], ['\\(', '\\)'] ], displayMath: [ ['$$','$$'], ['\\[', '\\]'] ], processEscapes: true, } }); </script> <script async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/latest.js?config=TeX-AMS-MML_HTMLorMML"> </script> <!-- User custom header --> </head> <body> <header class="site-header"> <div class="wrapper"> <div class="hero-image"> <div class="hero-text"> <a href="/" target="_top"><img src="/v119/assets/images/logo-pmlr.svg" alt="[International Conference on Machine Learning Logo]"></a> Proceedings of Machine Learning Research </div> </div> <nav class="site-nav"> <input type="checkbox" id="nav-trigger" class="nav-trigger" /> <label for="nav-trigger"> <span class="menu-icon"><svg viewBox="0 0 18 15" width="18px" height="15px"> <path d="M18,1.484c0,0.82-0.665,1.484-1.484,1.484H1.484C0.665,2.969,0,2.304,0,1.484l0,0C0,0.665,0.665,0,1.484,0 h15.032C17.335,0,18,0.665,18,1.484L18,1.484z M18,7.516C18,8.335,17.335,9,16.516,9H1.484C0.665,9,0,8.335,0,7.516l0,0 c0-0.82,0.665-1.484,1.484-1.484h15.032C17.335,6.031,18,6.696,18,7.516L18,7.516z M18,13.516C18,14.335,17.335,15,16.516,15H1.484 C0.665,15,0,14.335,0,13.516l0,0c0-0.82,0.665-1.483,1.484-1.483h15.032C17.335,12.031,18,12.695,18,13.516L18,13.516z"/> </svg> </span> </label> <div class="trigger"><a class="page-link" 
href="https://proceedings.mlr.press/v119/">Volume 119</a><a class="page-link" href="http://www.jmlr.org/">JMLR</a> <a class="page-link" href="http://www.jmlr.org/mloss">MLOSS</a> <a class="page-link" href="/faq.html">FAQ</a> <a class="page-link" href="/spec.html">Submission Format</a> <a class="page-link" href="https://proceedings.mlr.press//v119/assets/rss/feed.xml"> <img src="https://proceedings.mlr.press/v119/assets/images/RSS.gif" class="rss" alt="RSS Feed"> </a> </div> </nav> </div> </header> <main class="page-content" aria-label="Content"><div class="wrapper"> <article class="post-content"> <h1>Generative Pretraining From Pixels</h1><span class="authors">Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever</span> <div id="info"><i>Proceedings of the 37th International Conference on Machine Learning</i>, PMLR 119:1691-1703, 2020. </div> <!-- info --> <h4>Abstract</h4> <div id="abstract" class="abstract"> Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full fine-tuning, matching the top supervised pre-trained models. 
We are also competitive with self-supervised benchmarks on ImageNet when substituting pixels for a VQVAE encoding, achieving 69.0% top-1 accuracy on a linear probe of our features. </div> <h4>Cite this Paper</h4> <hr class="bibhr"> <div class="bibbuttongroup row row--justified"> <div class="bibbuttontext column"> BibTeX </div> <div class="column"> <div class="codebox"> <code class="citecode" id="bibtex"> @InProceedings{pmlr-v119-chen20s, title = {Generative Pretraining From Pixels}, author = {Chen, Mark and Radford, Alec and Child, Rewon and Wu, Jeffrey and Jun, Heewoo and Luan, David and Sutskever, Ilya}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {1691--1703}, year = {2020}, editor = {Daumé III, Hal and Singh, Aarti}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/chen20s/chen20s.pdf}, url = {https://proceedings.mlr.press/v119/chen20s.html}, abstract = {Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full fine-tuning, matching the top supervised pre-trained models. 
We are also competitive with self-supervised benchmarks on ImageNet when substituting pixels for a VQVAE encoding, achieving 69.0% top-1 accuracy on a linear probe of our features.} } </code> </div> <button class="button" id="button-bibtex1" onclick="CopyToClipboard('bibtex'); ga('send', 'event', 'Ref Copy', 'CopyToClipboard', '/v119/pmlr-v119-chen20s.bib', 15);">Copy to Clipboard</button> <button class="button" id="button-bibtex2" onclick="DownloadTexToFile('pmlr-v119-chen20s.bib', 'bibtex'); ga('send', 'event', 'Ref Downloads', 'Download', '/v119/pmlr-v119-chen20s.bib', 16);">Download</button> </div> </div> <div class="bibbuttongroup row row--justified"> <div class="bibbuttontext column"> Endnote </div> <div class="column"> <div class="codebox"> <code class="citecode" id="endnote">%0 Conference Paper %T Generative Pretraining From Pixels %A Mark Chen %A Alec Radford %A Rewon Child %A Jeffrey Wu %A Heewoo Jun %A David Luan %A Ilya Sutskever %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-chen20s %I PMLR %P 1691--1703 %U https://proceedings.mlr.press/v119/chen20s.html %V 119 %X Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full fine-tuning, matching the top supervised pre-trained models. 
We are also competitive with self-supervised benchmarks on ImageNet when substituting pixels for a VQVAE encoding, achieving 69.0% top-1 accuracy on a linear probe of our features. </code> </div> <button class="button" id="button-endnote1" onclick="CopyToClipboard('endnote'); ga('send', 'event', 'Ref Copy', 'CopyToClipboard', '/v119/pmlr-v119-chen20s.enw', 15);">Copy to Clipboard</button> <button class="button" id="button-endnote2" onclick="DownloadTexToFile('pmlr-v119-chen20s.enw', 'endnote'); ga('send', 'event', 'Ref Downloads', 'Download', '/v119/pmlr-v119-chen20s.enw', 16);">Download</button> </div> </div> <div class="bibbuttongroup row row--justified"> <div class="bibbuttontext column"> APA </div> <div class="column"> <div class="codebox"> <code class="citecode" id="apa"> Chen, M., Radford, A., Child, R., Wu, J., Jun, H., Luan, D. & Sutskever, I. (2020). Generative Pretraining From Pixels. <i>Proceedings of the 37th International Conference on Machine Learning</i>, in <i>Proceedings of Machine Learning Research</i> 119:1691-1703. Available from https://proceedings.mlr.press/v119/chen20s.html. 
</code> </div> <button class="button" id="button-apa1" onclick="CopyToClipboard('apa'); ga('send', 'event', 'Ref Copy', 'CopyToClipboard', '/v119/pmlr-v119-chen20s.txt', 15);">Copy to Clipboard</button> <button class="button" id="button-apa2" onclick="DownloadTexToFile('pmlr-v119-chen20s.txt', 'apa'); ga('send', 'event', 'Ref Downloads', 'Download', '/v119/pmlr-v119-chen20s.txt', 16);">Download</button> </div> </div> <hr class="bibhr"> <h4>Related Material</h4> <div id="extras"> <ul> <li><a href="http://proceedings.mlr.press/v119/chen20s/chen20s.pdf" target="_blank" onclick="ga('send', 'event', 'PDF Downloads', 'Download', 'http://proceedings.mlr.press/v119/chen20s/chen20s.pdf', 10);">Download PDF</a></li> <li><a href="https://github.com/openai/image-gpt" target="_blank" onclick="ga('send', 'event', 'Software Link', 'Software', 'https://github.com/openai/image-gpt', 0);">Software</a></li> </ul> </div> </article> </div> </main> <footer class="site-footer"> <div class="wrapper"> <p>This site last compiled Wed, 08 Feb 2023 10:37:20 +0000</p> <div class="footer-left"><i><a href="https://github.com/mlresearch/v119">Github Account</a></i> </div> <div class="footer-right">Copyright © <a href="https://proceedings.mlr.press">The authors and PMLR</a> 2023.<a href="https://twitter.com/MLResearchPress"><i class="icon-twitter"></i> MLResearchPress</a></div> </div> </footer> </body> </html>