
Baidu Research

href="/Blog">Blog</a></li> <li><a href="/Career">Careers</a></li> <li><a href="/Downloads">Downloads</a></li> <li><a href="/AI_Colloquium">AI Colloquium</a></li> </ul> </div> <div class="nav_btn"><span></span></div> </div> </div> <div class="baidu-page-banner blog-side" style="background: url(/Public/uploads/5ae96c0a7676c.png);"> <div class="container"> <div class="baidu-page-title wow fadeIn">Blog</div> </div> </div> <div class="content-info"> <div class="container-details-er"> <div class="blog-details-title">Deep Learning Scaling is Predictable, Empirically</div> <div class="blog-details-date"><p>2017-12-07</p><a href="/Blog">Back to list</a></div> <p>Our digital world and data are growing faster today than any time in the past—even faster than our computing power. Deep learning helps us quickly&nbsp;make sense of immense data, and offers users the best AI-powered products and experiences.</p><p><br/></p><p>To continually improve user experience, our challenge, then, is to quickly improve our deep learning models for existing and emerging application domains.&nbsp;Model architecture search creates important new improvements, but this search often depends on epiphany; Breakthroughs often require complex reframing of the modeling problem, and can take weeks or months of testing.</p><p><br/></p><p>It be great if we could supplement model architecture search with more reliable ways to improve model accuracy.</p><p><br/></p><p>Today, we are releasing&nbsp;<a href="https://arxiv.org/abs/1712.00409" target="_blank" style="color: rgb(192, 0, 0); text-decoration: underline;"><span style="color: rgb(192, 0, 0);">a large-scale study showing that deep learning accuracy improves predictably</span></a>&nbsp;as we grow training data set size. Through empirical testing, we find&nbsp;predictable accuracy scaling as long as we have enough data and compute power to train large models. These results hold for a broad spectrum of state-of-the-art models over four application domains: machine translation, language modeling, image classification, and speech recognition.</p><p><br/></p><p>More specifically, our results show that generalization error—the measure of how well a model can predict new samples—decreases as a power-law of the training data set size. Prior theoretical work also shows that error scaling should be a power-law. However, these works typically predict a “steep” learning curve—the power-law exponent is expected to be -0.5—suggesting models should learn very quickly. 
As an example, consider our results for word language modeling below (note the log-log scale!):

[Figure: Word language models show predictable power-law validation error scaling as training set size grows.]

For word language modeling, we test [LSTM](https://arxiv.org/abs/1602.02410) and [RHN](https://arxiv.org/abs/1607.03474) models on subsets of the [Billion Word dataset](https://research.google.com/pubs/pub41880.html). The figure above shows each model architecture's validation error (an approximation of generalization error) for the best-fit model size at each training set size. These learning curves are each predictable power laws that, surprisingly, share the same power-law exponent. On larger training sets, models tend to drift away from the curve, but we find that optimization hyperparameter search often closes the gap.

Model error improves starting from "best guessing" and follows the power-law curve down to "irreducible error."

More generally, our empirical results suggest that learning curves take the following form (again, log-log scale!):

[Figure: Sketch of power-law learning curves for real applications.]

The figure above is a cartoon power-law plot that breaks down the learning-curve phases for real applications. The curve begins in the small data region, where models struggle to learn from a small number of training samples; here, models only perform as well as "best" or "random" guessing. The middle portion is the power-law region, where each new training sample provides information that helps models improve predictions on previously unseen samples. The power-law exponent defines the steepness of this region (its slope when viewed in log-log scale) and is an indicator of the difficulty of understanding the data. Finally, for most real-world applications, there is likely to be a non-zero lower-bound error past which models will be unable to improve (we have yet to reach irreducible error on our real-world tests, but we have verified that it exists for toy problems). This irreducible error is caused by a combination of factors inherent in real-world data.
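As a sketch of this three-phase shape, one can fit the curve as a power law plus a constant floor. The parameterization error(m) = alpha * m^beta + gamma and the data points below are our illustrative assumptions, not necessarily the paper's exact fit:

```python
# Hedged sketch: fit the learning curve as a power law plus a constant
# irreducible-error floor, error(m) = alpha * m**beta + gamma. This
# parameterization is our illustration, not necessarily the paper's fit.
import numpy as np
from scipy.optimize import curve_fit

def learning_curve(m, alpha, beta, gamma):
    """Power-law region plus an irreducible-error floor gamma."""
    return alpha * m**beta + gamma

# Hypothetical measurements, including very large training sets where the
# curve flattens toward the irreducible error.
sizes = np.array([1e4, 1e5, 1e6, 1e7, 1e8, 1e9])
errors = np.array([0.90, 0.62, 0.43, 0.31, 0.24, 0.21])

(alpha, beta, gamma), _ = curve_fit(
    learning_curve, sizes, errors, p0=(5.0, -0.2, 0.1), maxfev=10000
)
print(f"exponent beta = {beta:.3f}, irreducible error gamma = {gamma:.3f}")
```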
Across the applications we test, we find:

- Power-law learning curves exist across all applications, model architectures, optimizers, and loss functions.
- Most surprisingly, for a single application, different model architectures and optimizers show the same power-law exponent. The different models learn at the same relative rate as training set size increases.
- The required model size (in number of parameters) to best fit each training set grows sublinearly in the training set size. This relationship is also empirically predictable (see the sketch after this list).
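As promised above, a brief sketch of the third finding. Assuming the best-fit parameter count grows as params(m) ≈ c * m^p with p < 1 (this functional form and the numbers below are our illustrative assumptions), the exponent p can be estimated the same way as the error exponent:

```python
# Sketch of the model-size finding (assumed form, illustrative numbers):
# best-fit parameter count grows sublinearly, params(m) ~ c * m**p, p < 1.
import numpy as np

sizes = np.array([1e6, 1e7, 1e8, 1e9])          # training set sizes
params = np.array([4e6, 1.9e7, 9.5e7, 4.8e8])   # hypothetical best-fit model sizes

# Same log-log line fit as for the error exponent; the slope here is p.
p, log_c = np.polyfit(np.log(sizes), np.log(params), deg=1)
print(f"model-size exponent p = {p:.2f} (sublinear, since p < 1)")

# Consequence: 10x more data needs only about 10**p (< 10x) more parameters.
print(f"10x data -> ~{10 ** p:.1f}x parameters")
```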
We hope that these findings can spark a broader conversation in the deep learning community about ways to speed up deep learning progress. For deep learning researchers, learning curves can assist model debugging and predict the accuracy targets for improved model architectures. There is an opportunity for redoubled effort to theoretically predict or interpret learning-curve exponents. Further, predictable learning curves can guide decision-making about whether and how to grow data sets, inform system design and expansion, and underscore the importance of continued computational scaling.

More details and data can be found in our paper: [Deep Learning Scaling is Predictable, Empirically](https://arxiv.org/abs/1712.00409)

Authors: Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou, Yi Li

This work would not have been possible without the significant efforts of the Systems team at Baidu Silicon Valley AI Lab (SVAIL). In addition to the co-authors, we specifically thank members of the machine learning research team, Rewon Child, Jiaji Huang, Sercan Arik, and Anuroop Sriram, who provided valuable feedback. We also thank those who contributed to the discussion: Awni Hannun, Andrew Ng, Ilya Sutskever, Ian Goodfellow, and Pieter Abbeel.
