
<!DOCTYPE html> <html > <head> <meta charset="utf-8"> <meta rel="search" type="application/opensearchdescription+xml" href="/open_search.xml" title="Academia.edu"> <meta content="width=device-width, initial-scale=1" name="viewport"> <meta name="google-site-verification" content="bKJMBZA7E43xhDOopFZkssMMkBRjvYERV-NaN4R6mrs"> <meta name="csrf-param" content="authenticity_token" /> <meta name="csrf-token" content="HKbvafdco91kmD9qQF7VjgMYxnUredHUHizBo441I0h9f6nfJRmrak7lZ2RIabYGzaNQNuyTYs5FBbD5QGArxg==" /> <meta name="citation_title" content="Curiosity-Driven Reinforcement Learning for Dialogue Management" /> <meta name="citation_publication_date" content="2018/01/01" /> <meta name="citation_journal_title" content="Master Thesis" /> <meta name="citation_author" content="Michael Chinkers" /> <meta name="twitter:card" content="summary" /> <meta name="twitter:url" content="https://www.academia.edu/40969161/Curiosity_Driven_Reinforcement_Learning_for_Dialogue_Management" /> <meta name="twitter:title" content="Curiosity-Driven Reinforcement Learning for Dialogue Management" /> <meta name="twitter:description" content="Obtaining an effective reward signal for dialogue management is a non trivial problem. Real user feedback is inconsistent and often even absent. 
This thesis investigates the use of intrinsic rewards for a reinforcement learning based dialogue manager" /> <meta name="twitter:image" content="http://a.academia-assets.com/images/twitter-card.jpeg" /> <meta property="fb:app_id" content="2369844204" /> <meta property="og:type" content="book" /> <meta property="og:url" content="https://www.academia.edu/40969161/Curiosity_Driven_Reinforcement_Learning_for_Dialogue_Management" /> <meta property="og:title" content="Curiosity-Driven Reinforcement Learning for Dialogue Management" /> <meta property="og:image" content="http://a.academia-assets.com/images/open-graph-icons/fb-book.gif" /> <meta property="og:description" content="Obtaining an effective reward signal for dialogue management is a non trivial problem. Real user feedback is inconsistent and often even absent. This thesis investigates the use of intrinsic rewards for a reinforcement learning based dialogue manager" /> <meta property="article:author" content="https://independent.academia.edu/NicolasParisi1" /> <meta name="description" content="Obtaining an effective reward signal for dialogue management is a non trivial problem. Real user feedback is inconsistent and often even absent. 
This thesis investigates the use of intrinsic rewards for a reinforcement learning based dialogue manager" /> <title>(PDF) Curiosity-Driven Reinforcement Learning for Dialogue Management | Michael Chinkers - Academia.edu</title> <link rel="canonical" href="https://www.academia.edu/40969161/Curiosity_Driven_Reinforcement_Learning_for_Dialogue_Management" /> <script async src="https://www.googletagmanager.com/gtag/js?id=G-5VKX33P2DS"></script> <script> window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-5VKX33P2DS', { cookie_domain: 'academia.edu', send_page_view: false, }); gtag('event', 'page_view', { 'controller': "single_work", 'action': "show", 'controller_action': 'single_work#show', 'logged_in': 'false', 'edge': 'unknown', // Send nil if there is no A/B test bucket, in case some records get logged // with missing data - that way we can distinguish between the two cases. // ab_test_bucket should be of the form <ab_test_name>:<bucket> 'ab_test_bucket': null, }) </script> <script> var $controller_name = 'single_work'; var $action_name = "show"; var $rails_env = 'production'; var $app_rev = '49879c2402910372f4abc62630a427bbe033d190'; var $domain = 'academia.edu'; var $app_host = "academia.edu"; var $asset_host = "academia-assets.com"; var $start_time = new Date().getTime(); var $recaptcha_key = "6LdxlRMTAAAAADnu_zyLhLg0YF9uACwz78shpjJB"; var $recaptcha_invisible_key = "6Lf3KHUUAAAAACggoMpmGJdQDtiyrjVlvGJ6BbAj"; var $disableClientRecordHit = false; </script> <script> window.require = { config: function() { return function() {} } } </script> <script> window.Aedu = window.Aedu || {}; window.Aedu.hit_data = null; window.Aedu.serverRenderTime = new Date(1732446739000); window.Aedu.timeDifference = new Date().getTime() - 1732446739000; </script> <script type="application/ld+json">{"@context":"https://schema.org","@type":"ScholarlyArticle","abstract":"Obtaining an effective reward signal for 
dialogue management is a non trivial problem. Real user feedback is inconsistent and often even absent. This thesis investigates the use of intrinsic rewards for a reinforcement learning based dialogue manager in order to improve policy learning in an environment with sparse rewards and to move away from inefficient random ε-greedy exploration. In addition to rewards given by a user simulator for successful dialogues, intrinsic curiosity rewards are given in the form of belief-state prediction errors generated by an intrinsic curiosity module within the dialogue manager. We investigate two main settings for this method: (1) predicting the raw next belief-state, and (2) predicting belief-states in a learned feature space. In order to meet the right difficulty level for the system to be able to learn a feature space, the model is pre-trained on a small pool of dialogue transitions. For both settings, results comparable to and better than simple ε-greedy exploration are achieved. (1) is able to learn faster, but (2) achieves higher final results and has more potential for improvements and to be successful in larger state-action spaces, where feature encodings and generalization are beneficial.","author":[{"@context":"https://schema.org","@type":"Person","name":"Michael Chinkers"}],"contributor":[],"dateCreated":"2019-11-18","dateModified":null,"datePublished":"2018-01-01","headline":"Curiosity-Driven Reinforcement Learning for Dialogue Management","inLanguage":"en","keywords":["Robotics","Artificial Intelligence","Reinforcement Learning","Dialogue","Deep Learning"],"locationCreated":null,"publication":"Master Thesis","publisher":{"@context":"https://schema.org","@type":"Organization","name":null},"image":null,"thumbnailUrl":null,"url":"https://www.academia.edu/40969161/Curiosity_Driven_Reinforcement_Learning_for_Dialogue_Management","sourceOrganization":[{"@context":"https://schema.org","@type":"EducationalOrganization","name":null}]}</script><link rel="stylesheet" 
media="all" href="//a.academia-assets.com/assets/single_work_page/loswp-352e32ba4e89304dc0b4fa5b3952eef2198174c54cdb79066bc62e91c68a1a91.css" /><link rel="stylesheet" media="all" href="//a.academia-assets.com/assets/design_system/body-8d679e925718b5e8e4b18e9a4fab37f7eaa99e43386459376559080ac8f2856a.css" /><link rel="stylesheet" media="all" href="//a.academia-assets.com/assets/design_system/button-3cea6e0ad4715ed965c49bfb15dedfc632787b32ff6d8c3a474182b231146ab7.css" /><link rel="stylesheet" media="all" href="//a.academia-assets.com/assets/design_system/text_button-73590134e40cdb49f9abdc8e796cc00dc362693f3f0f6137d6cf9bb78c318ce7.css" /><link crossorigin="" href="https://fonts.gstatic.com/" rel="preconnect" /><link href="https://fonts.googleapis.com/css2?family=DM+Sans:ital,opsz,wght@0,9..40,100..1000;1,9..40,100..1000&family=Gupter:wght@400;500;700&family=IBM+Plex+Mono:wght@300;400&family=Material+Symbols+Outlined:opsz,wght,FILL,GRAD@20,400,0,0&display=swap" rel="stylesheet" /><link rel="stylesheet" media="all" href="//a.academia-assets.com/assets/design_system/common-10fa40af19d25203774df2d4a03b9b5771b45109c2304968038e88a81d1215c5.css" /> </head> <body> <div id='react-modal'></div> <div class="js-upgrade-ie-banner" style="display: none; text-align: center; padding: 8px 0; background-color: #ebe480;"><p style="color: #000; font-size: 12px; margin: 0 0 4px;">Academia.edu no longer supports Internet Explorer.</p><p style="color: #000; font-size: 12px; margin: 0;">To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to <a href="https://www.academia.edu/upgrade-browser">upgrade your browser</a>.</p></div><script>// Show this banner for all versions of IE if (!!window.MSInputMethodContext || /(MSIE)/.test(navigator.userAgent)) { document.querySelector('.js-upgrade-ie-banner').style.display = 'block'; }</script> <div class="bootstrap login"><div class="modal fade login-modal" id="login-modal"><div class="login-modal-dialog 
modal-dialog"><div class="modal-content"><div class="modal-header"><button class="close close" data-dismiss="modal" type="button"><span aria-hidden="true">×</span><span class="sr-only">Close</span></button><h4 class="modal-title text-center"><strong>Log In</strong></h4></div><div class="modal-body"><div class="row"><div class="col-xs-10 col-xs-offset-1"><button class="btn btn-fb btn-lg btn-block btn-v-center-content" id="login-facebook-oauth-button"><svg style="float: left; width: 19px; line-height: 1em; margin-right: .3em;" aria-hidden="true" focusable="false" data-prefix="fab" data-icon="facebook-square" class="svg-inline--fa fa-facebook-square fa-w-14" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512"><path fill="currentColor" d="M400 32H48A48 48 0 0 0 0 80v352a48 48 0 0 0 48 48h137.25V327.69h-63V256h63v-54.64c0-62.15 37-96.48 93.67-96.48 27.14 0 55.52 4.84 55.52 4.84v61h-31.27c-30.81 0-40.42 19.12-40.42 38.73V256h68.78l-11 71.69h-57.78V480H400a48 48 0 0 0 48-48V80a48 48 0 0 0-48-48z"></path></svg><small><strong>Log in</strong> with <strong>Facebook</strong></small></button><br /><button class="btn btn-google btn-lg btn-block btn-v-center-content" id="login-google-oauth-button"><svg style="float: left; width: 22px; line-height: 1em; margin-right: .3em;" aria-hidden="true" focusable="false" data-prefix="fab" data-icon="google-plus" class="svg-inline--fa fa-google-plus fa-w-16" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path fill="currentColor" d="M256,8C119.1,8,8,119.1,8,256S119.1,504,256,504,504,392.9,504,256,392.9,8,256,8ZM185.3,380a124,124,0,0,1,0-248c31.3,0,60.1,11,83,32.3l-33.6,32.6c-13.2-12.9-31.3-19.1-49.4-19.1-42.9,0-77.2,35.5-77.2,78.1S142.3,334,185.3,334c32.6,0,64.9-19.1,70.1-53.3H185.3V238.1H302.2a109.2,109.2,0,0,1,1.9,20.7c0,70.8-47.5,121.2-118.8,121.2ZM415.5,273.8v35.5H380V273.8H344.5V238.3H380V202.8h35.5v35.5h35.2v35.5Z"></path></svg><small><strong>Log in</strong> with 
<strong>Google</strong></small></button><br /><style type="text/css">.sign-in-with-apple-button { width: 100%; height: 52px; border-radius: 3px; border: 1px solid black; cursor: pointer; }</style><script src="https://appleid.cdn-apple.com/appleauth/static/jsapi/appleid/1/en_US/appleid.auth.js" type="text/javascript"></script><div class="sign-in-with-apple-button" data-border="false" data-color="white" id="appleid-signin"><span   ="Sign Up with Apple" class="u-fs11"></span></div><script>AppleID.auth.init({ clientId: 'edu.academia.applesignon', scope: 'name email', redirectURI: 'https://www.academia.edu/sessions', state: "7849f5b2ea15878ea73ac70d884893b5a02c63e204cfd5a9870ab5c8bb38e547", });</script><script>// Hacky way of checking if on fast loswp if (window.loswp == null) { (function() { const Google = window?.Aedu?.Auth?.OauthButton?.Login?.Google; const Facebook = window?.Aedu?.Auth?.OauthButton?.Login?.Facebook; if (Google) { new Google({ el: '#login-google-oauth-button', rememberMeCheckboxId: 'remember_me', track: null }); } if (Facebook) { new Facebook({ el: '#login-facebook-oauth-button', rememberMeCheckboxId: 'remember_me', track: null }); } })(); }</script></div></div></div><div class="modal-body"><div class="row"><div class="col-xs-10 col-xs-offset-1"><div class="hr-heading login-hr-heading"><span class="hr-heading-text">or</span></div></div></div></div><div class="modal-body"><div class="row"><div class="col-xs-10 col-xs-offset-1"><form class="js-login-form" action="https://www.academia.edu/sessions" accept-charset="UTF-8" method="post"><input name="utf8" type="hidden" value="✓" autocomplete="off" /><input type="hidden" name="authenticity_token" value="FySVokMUKV2lpfbM6AgYiIM5eHpRmUI20tmFbnCPkSh2/dMUkVEh6o/YrsLgP3sATYLuOZZz8SyJ8PQ0vtqZpg==" autocomplete="off" /><div class="form-group"><label class="control-label" for="login-modal-email-input" style="font-size: 14px;">Email</label><input class="form-control" id="login-modal-email-input" name="login" 
type="email" /></div><div class="form-group"><label class="control-label" for="login-modal-password-input" style="font-size: 14px;">Password</label><input class="form-control" id="login-modal-password-input" name="password" type="password" /></div><input type="hidden" name="post_login_redirect_url" id="post_login_redirect_url" value="https://www.academia.edu/40969161/Curiosity_Driven_Reinforcement_Learning_for_Dialogue_Management" autocomplete="off" /><div class="checkbox"><label><input type="checkbox" name="remember_me" id="remember_me" value="1" checked="checked" /><small style="font-size: 12px; margin-top: 2px; display: inline-block;">Remember me on this computer</small></label></div><br><input type="submit" name="commit" value="Log In" class="btn btn-primary btn-block btn-lg js-login-submit" data-disable-with="Log In" /></br></form><script>typeof window?.Aedu?.recaptchaManagedForm === 'function' && window.Aedu.recaptchaManagedForm( document.querySelector('.js-login-form'), document.querySelector('.js-login-submit') );</script><small style="font-size: 12px;"><br />or <a data-target="#login-modal-reset-password-container" data-toggle="collapse" href="javascript:void(0)">reset password</a></small><div class="collapse" id="login-modal-reset-password-container"><br /><div class="well margin-0x"><form class="js-password-reset-form" action="https://www.academia.edu/reset_password" accept-charset="UTF-8" method="post"><input name="utf8" type="hidden" value="✓" autocomplete="off" /><input type="hidden" name="authenticity_token" value="I7WIYDISBIVijJ1MBIo68Ttkc3YwvVWYZdfR0fZFU8dCbM7W4FcMMkjxxUIMvVl59d/lNfdX5oI+/qCLOBBbSQ==" autocomplete="off" /><p>Enter the email address you signed up with and we'll email you a reset link.</p><div class="form-group"><input class="form-control" name="email" type="email" /></div><input class="btn btn-primary btn-block g-recaptcha js-password-reset-submit" data-sitekey="6Lf3KHUUAAAAACggoMpmGJdQDtiyrjVlvGJ6BbAj" type="submit" value="Email me 
a link" /></form></div></div><script> require.config({ waitSeconds: 90 })(["https://a.academia-assets.com/assets/collapse-45805421cf446ca5adf7aaa1935b08a3a8d1d9a6cc5d91a62a2a3a00b20b3e6a.js"], function() { // from javascript_helper.rb $("#login-modal-reset-password-container").on("shown.bs.collapse", function() { $(this).find("input[type=email]").focus(); }); }); </script> </div></div></div><div class="modal-footer"><div class="text-center"><small style="font-size: 12px;">Need an account? <a rel="nofollow" href="https://www.academia.edu/signup">Click here to sign up</a></small></div></div></div></div></div></div><script>// If we are on subdomain or non-bootstrapped page, redirect to login page instead of showing modal (function(){ if (typeof $ === 'undefined') return; var host = window.location.hostname; if ((host === $domain || host === "www."+$domain) && (typeof $().modal === 'function')) { $("#nav_log_in").click(function(e) { // Don't follow the link and open the modal e.preventDefault(); $("#login-modal").on('shown.bs.modal', function() { $(this).find("#login-modal-email-input").focus() }).modal('show'); }); } })()</script> <div id="fb-root"></div><script>window.fbAsyncInit = function() { FB.init({ appId: "2369844204", version: "v8.0", status: true, cookie: true, xfbml: true }); // Additional initialization code. if (window.InitFacebook) { // facebook.ts already loaded, set it up. window.InitFacebook(); } else { // Set a flag for facebook.ts to find when it loads. window.academiaAuthReadyFacebook = true; } };</script> <div id="google-root"></div><script>window.loadGoogle = function() { if (window.InitGoogle) { // google.ts already loaded, set it up. window.InitGoogle("331998490334-rsn3chp12mbkiqhl6e7lu2q0mlbu0f1b"); } else { // Set a flag for google.ts to use when it loads. 
window.GoogleClientID = "331998490334-rsn3chp12mbkiqhl6e7lu2q0mlbu0f1b"; } };</script> <div class="header--container" id="main-header-container"><div class="header--inner-container header--inner-container-ds2"><div class="header-ds2--left-wrapper"><div class="header-ds2--left-wrapper-inner"><a data-main-header-link-target="logo_home" href="https://www.academia.edu/"><img class="hide-on-desktop-redesign" style="height: 24px; width: 24px;" alt="Academia.edu" src="//a.academia-assets.com/images/academia-logo-redesign-2015-A.svg" width="24" height="24" /><img width="145.2" height="18" class="hide-on-mobile-redesign" style="height: 24px;" alt="Academia.edu" src="//a.academia-assets.com/images/academia-logo-redesign-2015.svg" /></a><div class="header--search-container header--search-container-ds2"><form class="js-SiteSearch-form select2-no-default-pills" action="https://www.academia.edu/search" accept-charset="UTF-8" method="get"><input name="utf8" type="hidden" value="✓" autocomplete="off" /><svg style="width: 14px; height: 14px;" aria-hidden="true" focusable="false" data-prefix="fas" data-icon="search" class="header--search-icon svg-inline--fa fa-search fa-w-16" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path fill="currentColor" d="M505 442.7L405.3 343c-4.5-4.5-10.6-7-17-7H372c27.6-35.3 44-79.7 44-128C416 93.1 322.9 0 208 0S0 93.1 0 208s93.1 208 208 208c48.3 0 92.7-16.4 128-44v16.3c0 6.4 2.5 12.5 7 17l99.7 99.7c9.4 9.4 24.6 9.4 33.9 0l28.3-28.3c9.4-9.4 9.4-24.6.1-34zM208 336c-70.7 0-128-57.2-128-128 0-70.7 57.2-128 128-128 70.7 0 128 57.2 128 128 0 70.7-57.2 128-128 128z"></path></svg><input class="header--search-input header--search-input-ds2 js-SiteSearch-form-input" data-main-header-click-target="search_input" name="q" placeholder="Search" type="text" /></form></div></div></div><nav class="header--nav-buttons header--nav-buttons-ds2 js-main-nav"><a class="ds2-5-button ds2-5-button--secondary js-header-login-url header-button-ds2 
header-login-ds2 hide-on-mobile-redesign" href="https://www.academia.edu/login" rel="nofollow">Log In</a><a class="ds2-5-button ds2-5-button--secondary header-button-ds2 hide-on-mobile-redesign" href="https://www.academia.edu/signup" rel="nofollow">Sign Up</a><button class="header--hamburger-button header--hamburger-button-ds2 hide-on-desktop-redesign js-header-hamburger-button"><div class="icon-bar"></div><div class="icon-bar" style="margin-top: 4px;"></div><div class="icon-bar" style="margin-top: 4px;"></div></button></nav></div><ul class="header--dropdown-container js-header-dropdown"><li class="header--dropdown-row"><a class="header--dropdown-link" href="https://www.academia.edu/login" rel="nofollow">Log In</a></li><li class="header--dropdown-row"><a class="header--dropdown-link" href="https://www.academia.edu/signup" rel="nofollow">Sign Up</a></li><li class="header--dropdown-row js-header-dropdown-expand-button"><button class="header--dropdown-button">more<svg aria-hidden="true" focusable="false" data-prefix="fas" data-icon="caret-down" class="header--dropdown-button-icon svg-inline--fa fa-caret-down fa-w-10" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 320 512"><path fill="currentColor" d="M31.3 192h257.3c17.8 0 26.7 21.5 14.1 34.1L174.1 354.8c-7.8 7.8-20.5 7.8-28.3 0L17.2 226.1C4.6 213.5 13.5 192 31.3 192z"></path></svg></button></li><li><ul class="header--expanded-dropdown-container"><li class="header--dropdown-row"><a class="header--dropdown-link" href="https://www.academia.edu/about">About</a></li><li class="header--dropdown-row"><a class="header--dropdown-link" href="https://www.academia.edu/press">Press</a></li><li class="header--dropdown-row"><a class="header--dropdown-link" href="https://medium.com/@academia">Blog</a></li><li class="header--dropdown-row"><a class="header--dropdown-link" href="https://www.academia.edu/documents">Papers</a></li><li class="header--dropdown-row"><a class="header--dropdown-link" 
href="https://www.academia.edu/terms">Terms</a></li><li class="header--dropdown-row"><a class="header--dropdown-link" href="https://www.academia.edu/privacy">Privacy</a></li><li class="header--dropdown-row"><a class="header--dropdown-link" href="https://www.academia.edu/copyright">Copyright</a></li><li class="header--dropdown-row"><a class="header--dropdown-link" href="https://www.academia.edu/hiring"><svg aria-hidden="true" focusable="false" data-prefix="fas" data-icon="briefcase" class="header--dropdown-row-icon svg-inline--fa fa-briefcase fa-w-16" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path fill="currentColor" d="M320 336c0 8.84-7.16 16-16 16h-96c-8.84 0-16-7.16-16-16v-48H0v144c0 25.6 22.4 48 48 48h416c25.6 0 48-22.4 48-48V288H320v48zm144-208h-80V80c0-25.6-22.4-48-48-48H176c-25.6 0-48 22.4-48 48v48H48c-25.6 0-48 22.4-48 48v80h512v-80c0-25.6-22.4-48-48-48zm-144 0H192V96h128v32z"></path></svg>We're Hiring!</a></li><li class="header--dropdown-row"><a class="header--dropdown-link" href="https://support.academia.edu/"><svg aria-hidden="true" focusable="false" data-prefix="fas" data-icon="question-circle" class="header--dropdown-row-icon svg-inline--fa fa-question-circle fa-w-16" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path fill="currentColor" d="M504 256c0 136.997-111.043 248-248 248S8 392.997 8 256C8 119.083 119.043 8 256 8s248 111.083 248 248zM262.655 90c-54.497 0-89.255 22.957-116.549 63.758-3.536 5.286-2.353 12.415 2.715 16.258l34.699 26.31c5.205 3.947 12.621 3.008 16.665-2.122 17.864-22.658 30.113-35.797 57.303-35.797 20.429 0 45.698 13.148 45.698 32.958 0 14.976-12.363 22.667-32.534 33.976C247.128 238.528 216 254.941 216 296v4c0 6.627 5.373 12 12 12h56c6.627 0 12-5.373 12-12v-1.333c0-28.462 83.186-29.647 83.186-106.667 0-58.002-60.165-102-116.531-102zM256 338c-25.365 0-46 20.635-46 46 0 25.364 20.635 46 46 46s46-20.636 46-46c0-25.365-20.635-46-46-46z"></path></svg>Help Center</a></li><li 
class="header--dropdown-row js-header-dropdown-collapse-button"><button class="header--dropdown-button">less<svg aria-hidden="true" focusable="false" data-prefix="fas" data-icon="caret-up" class="header--dropdown-button-icon svg-inline--fa fa-caret-up fa-w-10" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 320 512"><path fill="currentColor" d="M288.662 352H31.338c-17.818 0-26.741-21.543-14.142-34.142l128.662-128.662c7.81-7.81 20.474-7.81 28.284 0l128.662 128.662c12.6 12.599 3.676 34.142-14.142 34.142z"></path></svg></button></li></ul></li></ul></div> <script src="//a.academia-assets.com/assets/webpack_bundles/fast_loswp-bundle-bf3d831cde46cd0e142f29f81a3fc4ce5ab45a404c10c12a480e83de68aff851.js" defer="defer"></script><script>window.loswp = {}; window.loswp.author = 128533122; window.loswp.bulkDownloadFilterCounts = {}; window.loswp.hasDownloadableAttachment = true; window.loswp.hasViewableAttachments = true; // TODO: just use routes for this window.loswp.loginUrl = "https://www.academia.edu/login?post_login_redirect_url=https%3A%2F%2Fwww.academia.edu%2F40969161%2FCuriosity_Driven_Reinforcement_Learning_for_Dialogue_Management%3Fauto%3Ddownload"; window.loswp.translateUrl = "https://www.academia.edu/login?post_login_redirect_url=https%3A%2F%2Fwww.academia.edu%2F40969161%2FCuriosity_Driven_Reinforcement_Learning_for_Dialogue_Management%3Fshow_translation%3Dtrue"; window.loswp.previewableAttachments = [{"id":61251965,"identifier":"Attachment_61251965","shouldShowBulkDownload":false}]; window.loswp.shouldDetectTimezone = true; window.loswp.shouldShowBulkDownload = true; window.loswp.showSignupCaptcha = false window.loswp.willEdgeCache = false; window.loswp.work = {"work":{"id":40969161,"created_at":"2019-11-18T03:28:15.449-08:00","from_world_paper_id":null,"updated_at":"2021-01-16T17:58:44.933-08:00","_data":{"abstract":"Obtaining an effective reward signal for dialogue management is a non trivial problem. 
Real user feedback is inconsistent and often even absent. This thesis investigates the use of intrinsic rewards for a reinforcement learning based dialogue manager in order to improve policy learning in an environment with sparse rewards and to move away from inefficient random ε-greedy exploration. In addition to rewards given by a user simulator for successful dialogues, intrinsic curiosity rewards are given in the form of belief-state prediction errors generated by an intrinsic curiosity module within the dialogue manager. We investigate two main settings for this method: (1) predicting the raw next belief-state, and (2) predicting belief-states in a learned feature space. In order to meet the right difficulty level for the system to be able to learn a feature space, the model is pre-trained on a small pool of dialogue transitions. For both settings, results comparable to and better than simple ε-greedy exploration are achieved. (1) is able to learn faster, but (2) achieves higher final results and has more potential for improvements and to be successful in larger state-action spaces, where feature encodings and generalization are beneficial.","publication_date":"2018,,","publication_name":"Master Thesis"},"document_type":"book","pre_hit_view_count_baseline":null,"quality":"low","language":"en","title":"Curiosity-Driven Reinforcement Learning for Dialogue Management","broadcastable":true,"draft":false,"has_indexable_attachment":true,"indexable":true}}["work"]; window.loswp.workCoauthors = [128533122]; window.loswp.locale = "en"; window.loswp.countryCode = "SG"; window.loswp.cwvAbTestBucket = ""; window.loswp.designVariant = "ds_vanilla"; window.loswp.fullPageMobileSutdModalVariant = "full_page_mobile_sutd_modal"; window.loswp.useOptimizedScribd4genScript = false; window.loswp.appleClientId = 'edu.academia.applesignon';</script><script defer="" src="https://accounts.google.com/gsi/client"></script><div class="ds-loswp-container"><div 
class="ds-work-card--grid-container"><div class="ds-work-card--container js-loswp-work-card"><div class="ds-work-card--cover"><div class="ds-work-cover--wrapper"><div class="ds-work-cover--container"><button class="ds-work-cover--clickable js-swp-download-button" data-signup-modal="{"location":"swp-splash-paper-cover","attachmentId":61251965,"attachmentType":"pdf"}"><img alt="First page of “Curiosity-Driven Reinforcement Learning for Dialogue Management”" class="ds-work-cover--cover-thumbnail" src="https://0.academia-photos.com/attachment_thumbnails/61251965/mini_magick20191118-3058-1cs2pk5.png?1574077921" /><img alt="PDF Icon" class="ds-work-cover--file-icon" src="//a.academia-assets.com/assets/single_work_splash/adobe.icon-574afd46eb6b03a77a153a647fb47e30546f9215c0ee6a25df597a779717f9ef.svg" /><div class="ds-work-cover--hover-container"><span class="material-symbols-outlined" style="font-size: 20px" translate="no">download</span><p>Download Free PDF</p></div><div class="ds-work-cover--ribbon-container">Download Free PDF</div><div class="ds-work-cover--ribbon-triangle"></div></button></div></div></div><div class="ds-work-card--work-information"><h1 class="ds-work-card--work-title">Curiosity-Driven Reinforcement Learning for Dialogue Management</h1><div class="ds-work-card--work-authors ds-work-card--detail"><a class="ds-work-card--author js-wsj-grid-card-author ds2-5-body-md ds2-5-body-link" data-author-id="128533122" href="https://independent.academia.edu/NicolasParisi1"><img alt="Profile image of Michael Chinkers" class="ds-work-card--author-avatar" src="//a.academia-assets.com/images/s65_no_pic.png" />Michael Chinkers</a></div><p class="ds-work-card--detail ds2-5-body-sm">2018, Master Thesis</p><p class="ds-work-card--work-abstract ds-work-card--detail ds2-5-body-md">Obtaining an effective reward signal for dialogue management is a non-trivial problem. Real user feedback is inconsistent and often absent altogether. 
This thesis investigates the use of intrinsic rewards for a reinforcement-learning-based dialogue manager, with the aim of improving policy learning in a sparse-reward environment and moving away from inefficient random ε-greedy exploration. In addition to the rewards a user simulator gives for successful dialogues, intrinsic curiosity rewards are given in the form of belief-state prediction errors generated by an intrinsic curiosity module within the dialogue manager. We investigate two main settings for this method: (1) predicting the raw next belief state, and (2) predicting belief states in a learned feature space. So that learning a feature space is at the right difficulty level for the system, the model is pre-trained on a small pool of dialogue transitions. In both settings, results comparable to or better than simple ε-greedy exploration are achieved. Setting (1) learns faster, but setting (2) achieves higher final results and has more potential for improvement and for success in larger state-action spaces, where feature encodings and generalization are beneficial.</p><div class="ds-work-card--button-container"><button class="ds2-5-button js-swp-download-button" data-signup-modal="{"location":"continue-reading-button--work-card","attachmentId":61251965,"attachmentType":"pdf","workUrl":"https://www.academia.edu/40969161/Curiosity_Driven_Reinforcement_Learning_for_Dialogue_Management"}">See full PDF</button><button class="ds2-5-button ds2-5-button--secondary js-swp-download-button" data-signup-modal="{"location":"download-pdf-button--work-card","attachmentId":61251965,"attachmentType":"pdf","workUrl":"https://www.academia.edu/40969161/Curiosity_Driven_Reinforcement_Learning_for_Dialogue_Management"}"><span class="material-symbols-outlined" style="font-size: 20px" translate="no">download</span>Download PDF</button></div></div></div></div><div data-auto_select="false" data-client_id="331998490334-rsn3chp12mbkiqhl6e7lu2q0mlbu0f1b" 
data-doc_id="61251965" data-landing_url="https://www.academia.edu/40969161/Curiosity_Driven_Reinforcement_Learning_for_Dialogue_Management" data-login_uri="https://www.academia.edu/registrations/google_one_tap" data-moment_callback="onGoogleOneTapEvent" id="g_id_onload"></div>
<div class="ds-below-fold--grid-container"><div class="ds-work--container js-loswp-embedded-document"><div class="attachment_preview" data-attachment="Attachment_61251965" style="display: none"><div class="js-scribd-document-container"><div class="scribd--document-loading js-scribd-document-loader" style="display: block;"><img alt="Loading..." src="//a.academia-assets.com/images/loaders/paper-load.gif" /><p>Loading Preview</p></div></div><div style="text-align: center;"><div class="scribd--no-preview-alert js-preview-unavailable"><p>Sorry, preview is currently unavailable. 
You can download the paper by clicking the button above.</p></div></div></div></div><div class="ds-sidebar--container js-work-sidebar"><div class="ds-related-content--container"><h2 class="ds-related-content--heading">Related papers</h2><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="0" data-entity-id="1175320" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/1175320/Learning_and_evaluation_of_dialogue_strategies_for_new_applications_Empirical_methods_for_optimization_from_small_data_sets">Learning and evaluation of dialogue strategies for new applications: Empirical methods for optimization from small data sets</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="1067708" href="https://hw.academia.edu/VerenaRieser">Verena Rieser</a></div><p class="ds-related-work--metadata ds2-5-body-xs">Computational Linguistics, 2011</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"Learning and evaluation of dialogue strategies for new applications: Empirical methods for optimization from small data sets","attachmentId":7255080,"attachmentType":"pdf","work_url":"https://www.academia.edu/1175320/Learning_and_evaluation_of_dialogue_strategies_for_new_applications_Empirical_methods_for_optimization_from_small_data_sets","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" 
href="https://www.academia.edu/1175320/Learning_and_evaluation_of_dialogue_strategies_for_new_applications_Empirical_methods_for_optimization_from_small_data_sets"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="1" data-entity-id="30524415" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/30524415/Reinforcement_learning_for_spoken_dialogue_systems">Reinforcement learning for spoken dialogue systems</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="34843701" href="https://independent.academia.edu/MarilynWalker5">Marilyn Walker</a></div><p class="ds-related-work--metadata ds2-5-body-xs">Proc. NIPS99</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"Reinforcement learning for spoken dialogue systems","attachmentId":50968496,"attachmentType":"pdf","work_url":"https://www.academia.edu/30524415/Reinforcement_learning_for_spoken_dialogue_systems","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/30524415/Reinforcement_learning_for_spoken_dialogue_systems"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container 
js-related-work-sidebar-card" data-collection-position="2" data-entity-id="30524655" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/30524655/An_Application_of_Reinforcement_Learning_to_Dialogue_Strategy_Selection_in_a_Spoken_Dialogue_System">An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="34843701" href="https://independent.academia.edu/MarilynWalker5">Marilyn Walker</a></div><p class="ds-related-work--metadata ds2-5-body-xs">2002</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System","attachmentId":50968645,"attachmentType":"pdf","work_url":"https://www.academia.edu/30524655/An_Application_of_Reinforcement_Learning_to_Dialogue_Strategy_Selection_in_a_Spoken_Dialogue_System","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/30524655/An_Application_of_Reinforcement_Learning_to_Dialogue_Strategy_Selection_in_a_Spoken_Dialogue_System"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="3" data-entity-id="14976622" data-sort-order="default"><a class="ds-related-work--title 
js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/14976622/Hybrid_reinforcement_supervised_learning_for_dialogue_policies_from_communicator_data">Hybrid reinforcement/supervised learning for dialogue policies from communicator data</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="33971246" href="https://independent.academia.edu/JamesHenderson28">James Henderson</a></div><p class="ds-related-work--metadata ds2-5-body-xs">IJCAI workshop on Knowledge and Reasoning in Practical Dialogue Systems, 2005</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"Hybrid reinforcement/supervised learning for dialogue policies from communicator data","attachmentId":38494029,"attachmentType":"pdf","work_url":"https://www.academia.edu/14976622/Hybrid_reinforcement_supervised_learning_for_dialogue_policies_from_communicator_data","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/14976622/Hybrid_reinforcement_supervised_learning_for_dialogue_policies_from_communicator_data"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="4" data-entity-id="30524515" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" 
href="https://www.academia.edu/30524515/Automatic_optimization_of_dialogue_management">Automatic optimization of dialogue management</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="34843701" href="https://independent.academia.edu/MarilynWalker5">Marilyn Walker</a></div><p class="ds-related-work--metadata ds2-5-body-xs">Proceedings of the 18th conference on Computational linguistics -, 2000</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"Automatic optimization of dialogue management","attachmentId":50968646,"attachmentType":"pdf","work_url":"https://www.academia.edu/30524515/Automatic_optimization_of_dialogue_management","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/30524515/Automatic_optimization_of_dialogue_management"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="5" data-entity-id="75041930" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/75041930/Dialogue_management_using_reinforcement_learning">Dialogue management using reinforcement learning</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="163561779" href="https://uad.academia.edu/TELKOMNIKAJOURNAL">TELKOMNIKA 
JOURNAL</a></div><p class="ds-related-work--metadata ds2-5-body-xs">TELKOMNIKA, 2021</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"Dialogue management using reinforcement learning","attachmentId":82971714,"attachmentType":"pdf","work_url":"https://www.academia.edu/75041930/Dialogue_management_using_reinforcement_learning","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/75041930/Dialogue_management_using_reinforcement_learning"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="6" data-entity-id="26513494" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/26513494/Comparing_the_utility_of_state_features_in_spoken_dialogue_using_reinforcement_learning">Comparing the utility of state features in spoken dialogue using reinforcement learning</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="50446196" href="https://independent.academia.edu/JoelTetreault">Joel Tetreault</a></div><p class="ds-related-work--metadata ds2-5-body-xs">Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics -, 2006</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link 
ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"Comparing the utility of state features in spoken dialogue using reinforcement learning","attachmentId":46809854,"attachmentType":"pdf","work_url":"https://www.academia.edu/26513494/Comparing_the_utility_of_state_features_in_spoken_dialogue_using_reinforcement_learning","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/26513494/Comparing_the_utility_of_state_features_in_spoken_dialogue_using_reinforcement_learning"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="7" data-entity-id="92048809" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/92048809/Predictable_and_Adaptive_Goal_oriented_Dialog_Policy_Generation">Predictable and Adaptive Goal-oriented Dialog Policy Generation</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="248226883" href="https://independent.academia.edu/NhatLe161">Nhat Le</a></div><p class="ds-related-work--metadata ds2-5-body-xs">2021 IEEE 15th International Conference on Semantic Computing (ICSC)</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"Predictable and Adaptive Goal-oriented Dialog Policy 
Generation","attachmentId":95163659,"attachmentType":"pdf","work_url":"https://www.academia.edu/92048809/Predictable_and_Adaptive_Goal_oriented_Dialog_Policy_Generation","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/92048809/Predictable_and_Adaptive_Goal_oriented_Dialog_Policy_Generation"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="8" data-entity-id="73560101" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/73560101/Learning_through_Dialogue_Interactions_by_Asking_Questions">Learning through Dialogue Interactions by Asking Questions</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="30387179" href="https://independent.academia.edu/SumitChopra5">Sumit Chopra</a></div><p class="ds-related-work--metadata ds2-5-body-xs">2017</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"Learning through Dialogue Interactions by Asking Questions","attachmentId":82034635,"attachmentType":"pdf","work_url":"https://www.academia.edu/73560101/Learning_through_Dialogue_Interactions_by_Asking_Questions","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free 
PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/73560101/Learning_through_Dialogue_Interactions_by_Asking_Questions"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="9" data-entity-id="64391635" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/64391635/A_Survey_on_Dialog_Management_Recent_Advances_and_Challenges">A Survey on Dialog Management: Recent Advances and Challenges</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="164963742" href="https://independent.academia.edu/chengguangtang">chengguang tang</a></div><p class="ds-related-work--metadata ds2-5-body-xs">ArXiv, 2020</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"A Survey on Dialog Management: Recent Advances and Challenges","attachmentId":76449794,"attachmentType":"pdf","work_url":"https://www.academia.edu/64391635/A_Survey_on_Dialog_Management_Recent_Advances_and_Challenges","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/64391635/A_Survey_on_Dialog_Management_Recent_Advances_and_Challenges"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" 
translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="10" data-entity-id="3271398" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/3271398/Learning_the_reward_model_of_dialogue_POMDPs_from_data">Learning the reward model of dialogue POMDPs from data</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="3305596" href="https://ulaval.academia.edu/BrahimChaibdraa">Brahim Chaib-draa</a></div><p class="ds-related-work--metadata ds2-5-body-xs">2010</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"Learning the reward model of dialogue POMDPs from data","attachmentId":31116303,"attachmentType":"pdf","work_url":"https://www.academia.edu/3271398/Learning_the_reward_model_of_dialogue_POMDPs_from_data","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/3271398/Learning_the_reward_model_of_dialogue_POMDPs_from_data"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="11" data-entity-id="64024556" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" 
href="https://www.academia.edu/64024556/Dialog_policy_optimization_for_low_resource_setting_using_Self_play_and_Reward_based_Sampling">Dialog policy optimization for low resource setting using Self-play and Reward based Sampling</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="61028408" href="https://independent.academia.edu/durashilangappuli">durashi langappuli</a></div><p class="ds-related-work--metadata ds2-5-body-xs">2020</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"Dialog policy optimization for low resource setting using Self-play and Reward based Sampling","attachmentId":76253808,"attachmentType":"pdf","work_url":"https://www.academia.edu/64024556/Dialog_policy_optimization_for_low_resource_setting_using_Self_play_and_Reward_based_Sampling","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/64024556/Dialog_policy_optimization_for_low_resource_setting_using_Self_play_and_Reward_based_Sampling"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="12" data-entity-id="11601159" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/11601159/Empirical_evaluation_of_a_reinforcement_learning_spoken_dialogue_system">Empirical evaluation of a reinforcement learning spoken 
dialogue system</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="28331475" href="https://independent.academia.edu/satindersingh19">satinder singh</a></div><p class="ds-related-work--metadata ds2-5-body-xs">PROCEEDINGS OF THE …, 2000</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"Empirical evaluation of a reinforcement learning spoken dialogue system","attachmentId":46611572,"attachmentType":"pdf","work_url":"https://www.academia.edu/11601159/Empirical_evaluation_of_a_reinforcement_learning_spoken_dialogue_system","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/11601159/Empirical_evaluation_of_a_reinforcement_learning_spoken_dialogue_system"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="13" data-entity-id="88142876" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/88142876/A_dynamic_goal_adapted_task_oriented_dialogue_agent">A dynamic goal adapted task oriented dialogue agent</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="4646101" href="https://iimcal.academia.edu/ShubhashisSengupta">Shubhashis Sengupta</a></div><p class="ds-related-work--metadata ds2-5-body-xs">PLOS ONE, 
2021</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"A dynamic goal adapted task oriented dialogue agent","attachmentId":92175969,"attachmentType":"pdf","work_url":"https://www.academia.edu/88142876/A_dynamic_goal_adapted_task_oriented_dialogue_agent","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/88142876/A_dynamic_goal_adapted_task_oriented_dialogue_agent"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="14" data-entity-id="79127805" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/79127805/Dialogue_Systems_Domain_Interaction_Using_Reinforcement_Learning">Dialogue Systems Domain Interaction Using Reinforcement Learning</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="175342787" href="https://independent.academia.edu/PauloAra%C3%BAjo74">Paulo Araújo</a></div><p class="ds-related-work--metadata ds2-5-body-xs">2008</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"Dialogue Systems Domain Interaction Using Reinforcement 
Learning","attachmentId":85953549,"attachmentType":"pdf","work_url":"https://www.academia.edu/79127805/Dialogue_Systems_Domain_Interaction_Using_Reinforcement_Learning","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/79127805/Dialogue_Systems_Domain_Interaction_Using_Reinforcement_Learning"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="15" data-entity-id="14976658" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/14976658/EVALUATING_EFFECTIVENESS_AND_PORTABILITY_OF_REINFORCEMENT_LEARNED_DIALOGUE_STRATEGIES_WITH_REAL_USERS_THE_TALK_TOWNINFO_EVALUATION">EVALUATING EFFECTIVENESS AND PORTABILITY OF REINFORCEMENT LEARNED DIALOGUE STRATEGIES WITH REAL USERS: THE TALK TOWNINFO EVALUATION</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="33971246" href="https://independent.academia.edu/JamesHenderson28">James Henderson</a></div><p class="ds-related-work--metadata ds2-5-body-xs">2006 IEEE Spoken Language Technology Workshop, 2006</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"EVALUATING EFFECTIVENESS AND PORTABILITY OF REINFORCEMENT LEARNED DIALOGUE STRATEGIES WITH REAL USERS: THE TALK TOWNINFO 
EVALUATION","attachmentId":43681486,"attachmentType":"pdf","work_url":"https://www.academia.edu/14976658/EVALUATING_EFFECTIVENESS_AND_PORTABILITY_OF_REINFORCEMENT_LEARNED_DIALOGUE_STRATEGIES_WITH_REAL_USERS_THE_TALK_TOWNINFO_EVALUATION","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/14976658/EVALUATING_EFFECTIVENESS_AND_PORTABILITY_OF_REINFORCEMENT_LEARNED_DIALOGUE_STRATEGIES_WITH_REAL_USERS_THE_TALK_TOWNINFO_EVALUATION"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="16" data-entity-id="82646762" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/82646762/Sample_Efficient_Deep_Reinforcement_Learning_for_Dialogue_Systems_With_Large_Action_Spaces">Sample Efficient Deep Reinforcement Learning for Dialogue Systems With Large Action Spaces</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="12420031" href="https://independent.academia.edu/ThabetMohammad">Mohammad Thabet</a></div><p class="ds-related-work--metadata ds2-5-body-xs">IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"Sample Efficient Deep Reinforcement Learning for Dialogue Systems With Large Action 
Spaces","attachmentId":88285257,"attachmentType":"pdf","work_url":"https://www.academia.edu/82646762/Sample_Efficient_Deep_Reinforcement_Learning_for_Dialogue_Systems_With_Large_Action_Spaces","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/82646762/Sample_Efficient_Deep_Reinforcement_Learning_for_Dialogue_Systems_With_Large_Action_Spaces"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="17" data-entity-id="76839136" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/76839136/Reinforcement_Learning_With_Simulated_User_For_Automatic_Dialog_Strategy_Optimization">Reinforcement Learning With Simulated User For Automatic Dialog Strategy Optimization</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="129160128" href="https://ugh.academia.edu/MinhQuangNguyen">Minh Quang Nguyen</a></div><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"Reinforcement Learning With Simulated User For Automatic Dialog Strategy Optimization","attachmentId":84413805,"attachmentType":"pdf","work_url":"https://www.academia.edu/76839136/Reinforcement_Learning_With_Simulated_User_For_Automatic_Dialog_Strategy_Optimization","alternativeTracking":true}"><span class="material-symbols-outlined" 
style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/76839136/Reinforcement_Learning_With_Simulated_User_For_Automatic_Dialog_Strategy_Optimization"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="18" data-entity-id="18195404" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/18195404/Leveraging_POMDPs_trained_with_user_simulations_and_rule_based_dialogue_management_in_a_spoken_dialogue_system">Leveraging POMDPs trained with user simulations and rule-based dialogue management in a spoken dialogue system</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="19627" href="https://independent.academia.edu/SilviaQuarteroni">Silvia Quarteroni</a></div><p class="ds-related-work--metadata ds2-5-body-xs">Proceedings of the SIGDIAL 2009 Conference on The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue - SIGDIAL '09, 2009</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"Leveraging POMDPs trained with user simulations and rule-based dialogue management in a spoken dialogue system","attachmentId":39927091,"attachmentType":"pdf","work_url":"https://www.academia.edu/18195404/Leveraging_POMDPs_trained_with_user_simulations_and_rule_based_dialogue_management_in_a_spoken_dialogue_system","alternativeTracking":true}"><span 
class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/18195404/Leveraging_POMDPs_trained_with_user_simulations_and_rule_based_dialogue_management_in_a_spoken_dialogue_system"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div><div class="ds-related-work--container js-related-work-sidebar-card" data-collection-position="19" data-entity-id="30524440" data-sort-order="default"><a class="ds-related-work--title js-related-work-grid-card-title ds2-5-body-md ds2-5-body-link" href="https://www.academia.edu/30524440/An_application_of_reinforcement_learning_to_dialogue_strategy_selection_in_a_spoken_dialogue_system_for_email">An application of reinforcement learning to dialogue strategy selection in a spoken dialogue system for email</a><div class="ds-related-work--metadata"><a class="js-related-work-grid-card-author ds2-5-body-sm ds2-5-body-link" data-author-id="34843701" href="https://independent.academia.edu/MarilynWalker5">Marilyn Walker</a></div><p class="ds-related-work--metadata ds2-5-body-xs">2011</p><div class="ds-related-work--ctas"><button class="ds2-5-text-link ds2-5-text-link--inline js-swp-download-button" data-signup-modal="{"location":"wsj-grid-card-download-pdf-modal","work_title":"An application of reinforcement learning to dialogue strategy selection in a spoken dialogue system for email","attachmentId":50968513,"attachmentType":"pdf","work_url":"https://www.academia.edu/30524440/An_application_of_reinforcement_learning_to_dialogue_strategy_selection_in_a_spoken_dialogue_system_for_email","alternativeTracking":true}"><span class="material-symbols-outlined" style="font-size: 18px" translate="no">download</span><span 
class="ds2-5-text-link__content">Download free PDF</span></button><a class="ds2-5-text-link ds2-5-text-link--inline js-related-work-grid-card-view-pdf" href="https://www.academia.edu/30524440/An_application_of_reinforcement_learning_to_dialogue_strategy_selection_in_a_spoken_dialogue_system_for_email"><span class="ds2-5-text-link__content">View PDF</span><span class="material-symbols-outlined" style="font-size: 18px" translate="no">chevron_right</span></a></div></div></div><div class="ds-related-content--container"><h2 class="ds-related-content--heading">Related topics</h2><div class="ds-research-interests--pills-container"><a class="js-related-research-interest ds-research-interests--pill" data-entity-id="77" href="https://www.academia.edu/Documents/in/Robotics">Robotics</a><a class="js-related-research-interest ds-research-interests--pill" data-entity-id="465" href="https://www.academia.edu/Documents/in/Artificial_Intelligence">Artificial Intelligence</a><a class="js-related-research-interest ds-research-interests--pill" data-entity-id="1688" href="https://www.academia.edu/Documents/in/Reinforcement_Learning">Reinforcement Learning</a><a class="js-related-research-interest ds-research-interests--pill" data-entity-id="5175" href="https://www.academia.edu/Documents/in/Dialogue">Dialogue</a><a class="js-related-research-interest ds-research-interests--pill" data-entity-id="81182" href="https://www.academia.edu/Documents/in/Deep_Learning">Deep Learning</a></div></div></div></div></div> </body> </html>