<!doctype html> <html> <head> <meta charset="utf-8"> <meta name="renderer" content="webkit" /> <meta name="format-detection" content="telephone=no"/> <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" /> <meta content="width=device-width, initial-scale=1, maximum-scale=1, minimum-scale=1" name="viewport" /> <title>Baidu Research</title> <link rel="shortcut icon" href="../web/images/favicon.png" type="image/x-icon"> <link rel="stylesheet" href="../web/css/reset.css"> <link rel="stylesheet" href="../web/css/animate.css"> <link rel="stylesheet" href="../web/css/Bootstrap.css"> <link rel="stylesheet" href="../web/css/style.css"> <link rel="stylesheet" href="../web/css/media.css?3333"> <script src="../web/js/jquery.min.js"></script> </head> <body class="n_body" ondragstart="return false"> <!--[if lt IE 9]> <p class="browserupgrade">您在使用一个 <strong>旧版本的</strong> 浏览器。请 <a href="http://browsehappy.com/">更新你的浏览器</a> 来更好的体验本网站.</p> <![endif]--> <div class="n_header"> <div class="container"> <div class="header01"> <div class="logo"> <a class="h_logo" href="/Index" style="background-image: url(../web/images/logo.png);"><img src="../web/images/logo.png" alt="Baidu Research"></a> </div> <div class="nav"> <ul> <li><a href="/Index">Home</a></li> <li><a href="/Publications">Publications</a></li> <li><a href="/Research_Areas?id=55">Research Areas</a> <div class="nav_er"> <ul class="div_dl "> <li><a href="/Research_Areas/index-view?id=55">Data Science and Data Mining</a></li> <li><a href="/Research_Areas/index-view?id=56">Natural Language and Speech</a></li> <li><a href="/Research_Areas/index-view?id=57">Business Intelligence</a></li> <li><a href="/Research_Areas/index-view?id=58">Robotics and Autonomous Driving</a></li> <li><a href="/Research_Areas/index-view?id=59">Computer Vision</a></li> <li><a href="/Research_Areas/index-view?id=60">Machine Learning and Deep Learning</a></li> <li><a href="/Research_Areas/index-view?id=61">Computational Biology and 
Bioinformatics</a></li> <li><a href="/Research_Areas/index-view?id=62">High Performance Computing</a></li> <li><a href="/Research_Areas/index-view?id=75">Quantum Computing</a></li> </ul> </div> </li> <li><a class="active" href="/Blog">Blog</a></li> <li><a href="/Career">Careers</a></li> <li><a href="/Downloads">Downloads</a></li> <li><a href="/AI_Colloquium">AI Colloquium</a></li> </ul> <div class="nav_btn"><span></span></div> </div> </div> </div> <div class="header03 "> <div class="logo"><a href="/Index"><img src="../web/images/logo.png" alt="Baidu Research"></a></div> <div class="nav"> <ul> <li><a href="/Index">Home</a></li> <li><a href="/Publications">Publications</a></li> <li><a href="/Research_Areas?id=55">Research Areas</a> <div class="nav_er"> <ul class="div_dl "> <li><a href="/Research_Areas/index-view?id=55">Data Science and Data Mining</a></li> <li><a href="/Research_Areas/index-view?id=56">Natural Language and Speech</a></li> <li><a href="/Research_Areas/index-view?id=57">Business Intelligence</a></li> <li><a href="/Research_Areas/index-view?id=58">Robotics and Autonomous Driving</a></li> <li><a href="/Research_Areas/index-view?id=59">Computer Vision</a></li> <li><a href="/Research_Areas/index-view?id=60">Machine Learning and Deep Learning</a></li> <li><a href="/Research_Areas/index-view?id=61">Computational Biology and Bioinformatics</a></li> <li><a href="/Research_Areas/index-view?id=62">High Performance Computing</a></li> <li><a href="/Research_Areas/index-view?id=75">Quantum Computing</a></li> </ul> </div> </li> <li><a href="/Blog">Blog</a></li> <li><a href="/Career">Careers</a></li> <li><a href="/Downloads">Downloads</a></li> <li><a href="/AI_Colloquium">AI Colloquium</a></li> </ul> </div> <div class="nav_btn"><span></span></div> </div> </div> <div class="baidu-page-banner blog-side" style="background: url(/Public/uploads/5ae96c0a7676c.png);"> <div class="container"> <div class="baidu-page-title wow fadeIn">Blog</div> </div> </div> <div 
class="content-info"> <div class="container-details-er"> <div class="blog-details-title">PLATO-2: The State-of-the-art Open-Domain Chatbot in Chinese and English</div> <div class="blog-details-date"><p>2020-07-15</p><a href="/Blog">Back to list</a></div> <p style="margin-bottom: 0px; text-align: left;"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">With the steady progress in computer dialogue systems, people are becoming more comfortable talking with conversational agents like Google Assistant and Baidu's DuerOS. However, natural human-bot conversation still has a long way to go. Most chatbots are task-oriented dialogue systems that specialize only in narrow target domains. What people want is a chatbot without topic restrictions, known as an open-domain chatbot.&nbsp;</span></p><p style="margin-bottom: 0px; text-align: left;"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">While open-domain chatbots remain a challenging research area, recent advances in large-scale pre-training approaches, fueled by enormous text corpora, have spawned cutting-edge English chatbot models like Microsoft's DialoGPT, Google's Meena, and Facebook's Blender.&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">We are excited to present PLATO-2, our newest open-domain chatbot model that can talk about anything in Chinese and English and engage in deep conversations.
Building on its predecessor PLATO, PLATO-2 uses latent variables for diverse response generation and introduces an effective training method via curriculum learning. Our experiments show that PLATO-2 outperforms other state-of-the-art models by a substantial margin in both Chinese and English evaluations.&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">You can read the paper&nbsp;<em>Towards Building an Open-Domain Chatbot via Curriculum Learning</em>&nbsp;on&nbsp;</span><a href="https://arxiv.org/abs/2006.16779" target="_blank" style="text-decoration: underline; color: rgb(74, 110, 224); font-size: 16px; font-family: arial, helvetica, sans-serif;"><span style="color: rgb(74, 110, 224); font-size: 16px; font-family: arial, helvetica, sans-serif;">arXiv</span></a><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">.&nbsp;The open-source code is available on <a href="https://github.com/PaddlePaddle/Knover">GitHub</a>.
</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style="text-align:center"><video class="edui-upload-video vjs-default-skin video-js" controls="" preload="none" width="801" height="663" src="/ueditor/upload/video/20200715/1594787326193393.mp4" data-setup="{}"></video></p><p style="text-align:center"><br/></p><p style=";margin-bottom:0"><span style="font-size: 16px; font-family: arial, helvetica, sans-serif;"><strong><span style="font-size: 16px; color: rgb(14, 16, 26);">One-to-many mapping</span></strong></span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">One of the challenges facing dialogue generation systems is "one-to-many" mapping, which refers to how one dialogue context might correspond to multiple appropriate responses. For example, if told, "It is snowing outside," people would say, "How about making a snowman?" or "It's so cold. I miss summer."&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">The creation of different answers can be attributed to context and background knowledge, including personal attributes (gender, age, portrait, etc.), commonsense knowledge, personality, emotion, etc. 
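To make the one-to-many idea concrete, here is a toy sketch. Everything in it (the response table, the overlap-based scorer) is made up for illustration and is not PLATO-2's actual code: a generator conditioned on a discrete latent variable z maps the same context to different responses, and a separate scorer ranks the candidates.

```python
# Toy illustration of one-to-many dialogue mapping via a discrete latent
# variable. All names and responses are hypothetical, not PLATO-2's code.

RESPONSES = {  # pretend decoder outputs, one per latent value z
    0: "How about making a snowman?",
    1: "It's so cold. I miss summer.",
    2: "Let's stay in and drink cocoa.",
}

def generate(context, z):
    """Stand-in for a generator p(response | context, z)."""
    return RESPONSES[z]

def coherence(context, response):
    """Stand-in for a coherence scorer; here, crude word overlap."""
    ctx = set(context.lower().split())
    return sum(w.strip(".?!',") in ctx for w in response.lower().split())

context = "It is snowing outside"
# One context, many valid responses: one candidate per latent value.
candidates = [generate(context, z) for z in sorted(RESPONSES)]
# A selection model then picks the candidate judged most coherent.
best = max(candidates, key=lambda r: coherence(context, r))
```

The key point is the shape of the pipeline: sampling different latent values yields distinct, equally plausible responses, and a second model resolves which one to return.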
However, computer systems find it challenging to model one-to-many relationships, which introduces noise into dialogue-system training.&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">To address this challenge, models like Baidu's PLATO and Microsoft's OPTIMUS represent the one-to-many relationship via latent space. PLATO explicitly uses discrete latent variables and designs two reciprocal tasks of response generation and response selection to boost the quality of dialogue generation. The model achieved state-of-the-art results on three publicly available datasets (Persona-Chat, DailyDialog, DSTC7-AVSD).&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="font-size: 16px; font-family: arial, helvetica, sans-serif;"><strong><span style="font-size: 16px; color: rgb(14, 16, 26);">PLATO-2</span></strong></span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">Unlike the unidirectional network of DialoGPT and the encoder-decoder architecture of Meena and Blender, PLATO-2 keeps a unified network for bidirectional context encoding and unidirectional response generation through a flexible attention mechanism design.
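This kind of flexible attention can be expressed as a single mask over the concatenated [context, response] token sequence: context positions attend bidirectionally to each other, while response positions attend to the full context plus only their own left side. A minimal stdlib-Python sketch of that mask (the real model's mask also covers latent tokens and padding):

```python
def unified_attention_mask(n_ctx, n_resp):
    """Attention mask for a [context, response] token sequence.

    mask[i][j] == 1 means position i may attend to position j:
    context tokens are encoded bidirectionally, while response
    tokens are generated left-to-right (causal) on top of the context.
    """
    n = n_ctx + n_resp
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j < n_ctx:
                mask[i][j] = 1          # every token sees the full context
            elif i >= n_ctx and j <= i:
                mask[i][j] = 1          # response tokens: causal attention
    return mask
```

With three context tokens and two response tokens, context token 0 can attend to context token 2 (bidirectional), but no context token sees the response, and the second response token sees the first but not vice versa.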
The model also adopts the pre-normalization technique used in GPT-2, where layer normalization is placed within residual connections.&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">Researchers trained PLATO-2 via curriculum learning. As shown in the illustration below, the learning process involves two stages. In stage one, a coarse-grained baseline model is trained for general response generation under the simplified one-to-one mapping relationship. In stage two, two models of fine-grained generation and evaluation are further trained for diverse response generation and response coherence estimation, respectively.&nbsp;</span></p><p style="text-align:center"><img src="/ueditor/upload/20200715/1594785494752839.png" title="1594785494752839.png" alt="72e9cae01e5f7bce0b8e597ad77be8c0.PNG" width="1020" height="613"/></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">Our researchers scaled up PLATO to PLATO-2 this time. While PLATO contains 12 transformer blocks with 110 million parameters, the standard PLATO-2 model has 32 transformer blocks and 32 attention heads with 1.6 billion parameters.
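The pre-normalization mentioned above moves layer normalization inside the residual branch, in contrast to the original Transformer's post-normalization. A minimal sketch in plain Python (learned gain/bias omitted; `sublayer` stands for an attention or feed-forward sublayer):

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def pre_ln_block(x, sublayer):
    """GPT-2-style pre-normalization: LN applied inside the residual branch."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def post_ln_block(x, sublayer):
    """Original Transformer post-normalization, shown for contrast."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])
```

With pre-normalization the residual path remains an unnormalized identity, which tends to stabilize training as the stack gets very deep.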
Researchers also introduced curriculum learning as an effective training method for PLATO-2, given the rising compute cost of training large-scale models.&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">Thanks to the powerful parallel compute capability of PaddlePaddle, training the 1.6B-parameter model took approximately three weeks on 64 Nvidia Tesla V100 graphics cards. Researchers also employed gradient checkpointing to trade computation for memory.&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">Since PLATO-2 has both Chinese and English models, researchers trained them separately: on an English dataset of 684M samples extracted from Reddit, and on a Chinese dataset of 1.2B samples collected from social media sites.&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="font-size: 16px; font-family: arial, helvetica, sans-serif;"><strong><span style="font-size: 16px; color: rgb(14, 16, 26);">Evaluation results</span></strong></span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">In the experiment, researchers carried out static evaluations, where the model produces responses to a given multi-turn context, as well as interactive
evaluations, which use bot self-chat for English and human-bot chat for Chinese.&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">In comparison with Microsoft's DialoGPT, Google's Meena, and Facebook's Blender, PLATO-2 outperformed the others in coherence, informativeness, and engagement in English dialogues. PLATO-2 also demonstrates a significant advantage in Chinese multi-turn chat over Microsoft's Chinese digital assistant XiaoIce.&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">As shown in the image below, PLATO-2 adds significant richness to its dialogue in self-chat evaluations and broadens the topic to related issues. In contrast, the Blender model often changes the subject.&nbsp;</span></p><p style="text-align:center"><img src="/ueditor/upload/20200715/1594785517701154.png" title="1594785517701154.png" alt="3ec1670e1576938322b2f58a7eeaaaf0.PNG" width="936" height="600"/></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p style=";margin-bottom:0"><span style="color: rgb(14, 16, 26); font-size: 16px; font-family: arial, helvetica, sans-serif;">We believe that with PLATO-2 we are one step closer to natural human-machine interaction. The code and model of the English-based PLATO-2 will soon be available on GitHub.
We also plan to release APIs for the Chinese model.&nbsp;</span></p><p><span style="font-size: 16px; font-family: arial, helvetica, sans-serif;">&nbsp;</span></p><p><br/></p> <div class="pager"> <a href="/Blog/index-view?id=143"> <i class="glyphicon glyphicon-menu-up"></i>Previous One:Baidu at ECCV 2020 </a> <a href="/Blog/index-view?id=141"> <i class="glyphicon glyphicon-menu-down"></i>Next One:Baidu&rsquo;s Multimodal Model ERNIE-ViL Achieves SOTA on 5 Tasks and Tops Visual Commonsense Reasoning Challenge </a> </div> </div> </div> <footer> <div class="baidu-bottom"> <div class="container"> <div class="col-md-6 col-xs-12"> <h2>Baidu Research</h2> <p>1195 Bordeaux Drive Sunnyvale, CA 94089<br>Baidu Technology Park, No. 10 Xibeiwang East Road, Haidian District, Beijing, China<br>Media Inquiries: <a href="mailto:intlcomm@baidu.com">intlcomm@baidu.com</a><br>General Inquiries: <a href="mailto:air-info@baidu.com">air-info@baidu.com</a></p> </div> <div class="col-md-6 col-xs-12"> <ul class="social-icons"> <li><a href="https://twitter.com/baiduresearch" target="_blank"><img src="../web/images/ico-2.png"></a> </li> <li><a href="https://www.linkedin.com/company/baidu-usa" target="_blank"><img src="../web/images/ico-3.png"></a> </li> </ul> <div class="baidu-weibu"> <div class="baidu-img"><img src="../web/images/f-logo.png"></div> <div class="baidu-links"> <a class="baidu-links-title" href="javascript:;">Links</a> <ul class="baidu-links-friends"> <li><a href="http://ai.baidu.com/" target="_blank">Baidu AI Open Platform</a> </li> <li><a href="http://www.dlnel.org/" target="_blank">DLNEL</a> </li> </ul> </div> </div> </div> </div> </div> <div class="baidu-foot">© 2018 Baidu Research</div> </footer> <script src="../web/js/bootstrap.min.js"></script> <script src="../web/js/wow.js"></script> <script src="../web/js/base.js"></script> </body> </html>
