CINXE.COM
Dataset formats and types
<!doctype html> <html class=""> <head> <meta charset="utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no" /> <meta name="description" content="We’re on a journey to advance and democratize artificial intelligence through open source and open science." /> <meta property="fb:app_id" content="1321688464574422" /> <meta name="twitter:card" content="summary_large_image" /> <meta name="twitter:site" content="@huggingface" /> <meta name="twitter:image" content="https://huggingface.co/front/thumbnails/docs/trl.png" /> <meta property="og:title" content="Dataset formats and types" /> <meta property="og:type" content="website" /> <meta property="og:url" content="https://huggingface.co/docs/trl/v0.15.2/en/dataset_formats" /> <meta property="og:image" content="https://huggingface.co/front/thumbnails/docs/trl.png" /> <link rel="stylesheet" href="/front/build/kube-081ff2d/style.css" /> <link rel="preconnect" href="https://fonts.gstatic.com" /> <link href="https://fonts.googleapis.com/css2?family=Source+Sans+Pro:ital,wght@0,200;0,300;0,400;0,600;0,700;0,900;1,200;1,300;1,400;1,600;1,700;1,900&display=swap" rel="stylesheet" /> <link href="https://fonts.googleapis.com/css2?family=IBM+Plex+Mono:wght@400;600;700&display=swap" rel="stylesheet" /> <link rel="preload" href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.12.0/katex.min.css" as="style" onload="this.onload=null;this.rel='stylesheet'" /> <noscript> <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.12.0/katex.min.css" /> </noscript> <script>const guestTheme = document.cookie.match(/theme=(\w+)/)?.[1]; document.documentElement.classList.toggle('dark', guestTheme === 'dark' || ( (!guestTheme || guestTheme === 'system') && window.matchMedia('(prefers-color-scheme: dark)').matches));</script> <link rel="canonical" href="https://huggingface.co/docs/trl/v0.15.2/en/dataset_formats"> <link rel="alternate" hreflang="en" href="https://huggingface.co/docs/trl/v0.15.2/en/dataset_formats"> <link rel="alternate" hreflang="x-default" href="https://huggingface.co/docs/trl/v0.15.2/dataset_formats"> <title>Dataset formats and types</title> <script defer data-domain="huggingface.co" event-loggedIn="false" src="/js/script.pageview-props.js" ></script> <script> window.plausible = window.plausible || function () { (window.plausible.q = window.plausible.q || []).push(arguments); }; </script> <script> window.hubConfig = {"features":{"signupDisabled":false},"sshGitUrl":"git@hf.co","moonHttpUrl":"https:\/\/huggingface.co","captchaApiKey":"bd5f2066-93dc-4bdd-a64b-a24646ca3859","captchaDisabledOnSignup":true,"datasetViewerPublicUrl":"https:\/\/datasets-server.huggingface.co","stripePublicKey":"pk_live_x2tdjFXBCvXo2FFmMybezpeM00J6gPCAAc","environment":"production","userAgent":"HuggingFace (production)","spacesIframeDomain":"hf.space","spacesApiUrl":"https:\/\/api.hf.space","docSearchKey":"ece5e02e57300e17d152c08056145326e90c4bff3dd07d7d1ae40cf1c8d39cb6","logoDev":{"apiUrl":"https:\/\/img.logo.dev\/","apiKey":"pk_UHS2HZOeRnaSOdDp7jbd5w"}}; </script> <script type="text/javascript" src="https://de5282c3ca0c.edge.sdk.awswaf.com/de5282c3ca0c/526cf06acb0d/challenge.js" defer></script> </head> <body class="flex flex-col min-h-dvh bg-white dark:bg-gray-950 text-black DocBuilderPage"> <div class="flex min-h-dvh flex-col"><div class="SVELTE_HYDRATER contents" data-target="SystemThemeMonitor" data-props="{"isLoggedIn":false}"></div> <div class="SVELTE_HYDRATER contents" data-target="MainHeader" data-props="{"classNames":"","isWide":true,"isZh":false,"isPro":false}"><header class="border-b border-gray-100 "><div class="w-full px-4 flex h-16 items-center"><div class="flex flex-1 items-center"><a class="mr-5 flex flex-none items-center lg:mr-6" href="/"><img alt="Hugging Face's logo" class="w-7 md:mr-2" src="/front/assets/huggingface_logo-noborder.svg"> <span class="hidden whitespace-nowrap text-lg font-bold md:block">Hugging Face</span></a> <div class="relative flex-1 lg:max-w-sm mr-2 sm:mr-4 md:mr-3 xl:mr-6"><input autocomplete="off" class="w-full dark:bg-gray-950 pl-8 form-input-alt h-9 pr-3 focus:shadow-xl " name="" placeholder="Search models, datasets, users..." spellcheck="false" type="text" value=""> <svg class="absolute left-2.5 text-gray-400 top-1/2 transform -translate-y-1/2" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M30 28.59L22.45 21A11 11 0 1 0 21 22.45L28.59 30zM5 14a9 9 0 1 1 9 9a9 9 0 0 1-9-9z" fill="currentColor"></path></svg> </div> <div class="flex flex-none items-center justify-center p-0.5 place-self-stretch lg:hidden"><button class="relative z-40 flex h-6 w-8 items-center justify-center" type="button"><svg width="1em" height="1em" viewBox="0 0 10 10" class="text-xl" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" preserveAspectRatio="xMidYMid meet" fill="currentColor"><path fill-rule="evenodd" clip-rule="evenodd" d="M1.65039 2.9999C1.65039 2.8066 1.80709 2.6499 2.00039 2.6499H8.00039C8.19369 2.6499 8.35039 2.8066 8.35039 2.9999C8.35039 3.1932 8.19369 3.3499 8.00039 3.3499H2.00039C1.80709 3.3499 1.65039 3.1932 1.65039 2.9999ZM1.65039 4.9999C1.65039 4.8066 1.80709 4.6499 2.00039 4.6499H8.00039C8.19369 4.6499 8.35039 4.8066 8.35039 4.9999C8.35039 5.1932 8.19369 5.3499 8.00039 5.3499H2.00039C1.80709 5.3499 1.65039 5.1932 1.65039 4.9999ZM2.00039 6.6499C1.80709 6.6499 1.65039 6.8066 1.65039 6.9999C1.65039 7.1932 1.80709 7.3499 2.00039 7.3499H8.00039C8.19369 7.3499 8.35039 7.1932 8.35039 6.9999C8.35039 6.8066 8.19369 6.6499 8.00039 6.6499H2.00039Z"></path></svg> </button> </div></div> <nav aria-label="Main" class="ml-auto hidden lg:block"><ul class="flex items-center space-x-1.5 2xl:space-x-2"><li class="hover:text-indigo-700"><a class="group flex items-center px-2 py-0.5 dark:text-gray-300 dark:hover:text-gray-100" href="/models"><svg class="mr-1.5 text-gray-400 group-hover:text-indigo-500" style="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 24 24"><path class="uim-quaternary" d="M20.23 7.24L12 12L3.77 7.24a1.98 1.98 0 0 1 .7-.71L11 2.76c.62-.35 1.38-.35 2 0l6.53 3.77c.29.173.531.418.7.71z" opacity=".25" fill="currentColor"></path><path class="uim-tertiary" d="M12 12v9.5a2.09 2.09 0 0 1-.91-.21L4.5 17.48a2.003 2.003 0 0 1-1-1.73v-7.5a2.06 2.06 0 0 1 .27-1.01L12 12z" opacity=".5" fill="currentColor"></path><path class="uim-primary" d="M20.5 8.25v7.5a2.003 2.003 0 0 1-1 1.73l-6.62 3.82c-.275.13-.576.198-.88.2V12l8.23-4.76c.175.308.268.656.27 1.01z" fill="currentColor"></path></svg> Models</a> </li><li class="hover:text-red-700"><a class="group flex items-center px-2 py-0.5 dark:text-gray-300 dark:hover:text-gray-100" href="/datasets"><svg class="mr-1.5 text-gray-400 group-hover:text-red-500" style="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 25 25"><ellipse cx="12.5" cy="5" fill="currentColor" fill-opacity="0.25" rx="7.5" ry="2"></ellipse><path d="M12.5 15C16.6421 15 20 14.1046 20 13V20C20 21.1046 16.6421 22 12.5 22C8.35786 22 5 21.1046 5 20V13C5 14.1046 8.35786 15 12.5 15Z" fill="currentColor" opacity="0.5"></path><path d="M12.5 7C16.6421 7 20 6.10457 20 5V11.5C20 12.6046 16.6421 13.5 12.5 13.5C8.35786 13.5 5 12.6046 5 11.5V5C5 6.10457 8.35786 7 12.5 7Z" fill="currentColor" opacity="0.5"></path><path d="M5.23628 12C5.08204 12.1598 5 12.8273 5 13C5 14.1046 8.35786 15 12.5 15C16.6421 15 20 14.1046 20 13C20 12.8273 19.918 12.1598 19.7637 12C18.9311 12.8626 15.9947 13.5 12.5 13.5C9.0053 13.5 6.06886 12.8626 5.23628 12Z" fill="currentColor"></path></svg> Datasets</a> </li><li class="hover:text-blue-700"><a class="group flex items-center px-2 py-0.5 dark:text-gray-300 dark:hover:text-gray-100" href="/spaces"><svg class="mr-1.5 text-gray-400 group-hover:text-blue-500" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" viewBox="0 0 25 25"><path opacity=".5" d="M6.016 14.674v4.31h4.31v-4.31h-4.31ZM14.674 14.674v4.31h4.31v-4.31h-4.31ZM6.016 6.016v4.31h4.31v-4.31h-4.31Z" fill="currentColor"></path><path opacity=".75" fill-rule="evenodd" clip-rule="evenodd" d="M3 4.914C3 3.857 3.857 3 4.914 3h6.514c.884 0 1.628.6 1.848 1.414a5.171 5.171 0 0 1 7.31 7.31c.815.22 1.414.964 1.414 1.848v6.514A1.914 1.914 0 0 1 20.086 22H4.914A1.914 1.914 0 0 1 3 20.086V4.914Zm3.016 1.102v4.31h4.31v-4.31h-4.31Zm0 12.968v-4.31h4.31v4.31h-4.31Zm8.658 0v-4.31h4.31v4.31h-4.31Zm0-10.813a2.155 2.155 0 1 1 4.31 0 2.155 2.155 0 0 1-4.31 0Z" fill="currentColor"></path><path opacity=".25" d="M16.829 6.016a2.155 2.155 0 1 0 0 4.31 2.155 2.155 0 0 0 0-4.31Z" fill="currentColor"></path></svg> Spaces</a> </li><li class="hover:text-yellow-700 max-xl:hidden"><a class="group flex items-center px-2 py-0.5 dark:text-gray-300 dark:hover:text-gray-100" href="/posts"><svg class="mr-1.5 text-gray-400 group-hover:text-yellow-500 text-yellow-500!" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" viewBox="0 0 12 12" preserveAspectRatio="xMidYMid meet"><path fill="currentColor" fill-rule="evenodd" d="M3.73 2.4A4.25 4.25 0 1 1 6 10.26H2.17l-.13-.02a.43.43 0 0 1-.3-.43l.01-.06a.43.43 0 0 1 .12-.22l.84-.84A4.26 4.26 0 0 1 3.73 2.4Z" clip-rule="evenodd"></path></svg> Posts</a> </li><li class="hover:text-yellow-700"><a class="group flex items-center px-2 py-0.5 dark:text-gray-300 dark:hover:text-gray-100" href="/docs"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" class="mr-1.5 text-gray-400 group-hover:text-yellow-500" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 16 16"><path d="m2.28 3.7-.3.16a.67.67 0 0 0-.34.58v8.73l.01.04.02.07.01.04.03.06.02.04.02.03.04.06.05.05.04.04.06.04.06.04.08.04.08.02h.05l.07.02h.11l.04-.01.07-.02.03-.01.07-.03.22-.12a5.33 5.33 0 0 1 5.15.1.67.67 0 0 0 .66 0 5.33 5.33 0 0 1 5.33 0 .67.67 0 0 0 1-.58V4.36a.67.67 0 0 0-.34-.5l-.3-.17v7.78a.63.63 0 0 1-.87.59 4.9 4.9 0 0 0-4.35.35l-.65.39a.29.29 0 0 1-.15.04.29.29 0 0 1-.16-.04l-.65-.4a4.9 4.9 0 0 0-4.34-.34.63.63 0 0 1-.87-.59V3.7Z" fill="currentColor" class="dark:opacity-40"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M8 3.1a5.99 5.99 0 0 0-5.3-.43.66.66 0 0 0-.42.62v8.18c0 .45.46.76.87.59a4.9 4.9 0 0 1 4.34.35l.65.39c.05.03.1.04.16.04.05 0 .1-.01.15-.04l.65-.4a4.9 4.9 0 0 1 4.35-.34.63.63 0 0 0 .86-.59V3.3a.67.67 0 0 0-.41-.62 5.99 5.99 0 0 0-5.3.43l-.3.17L8 3.1Zm.73 1.87a.43.43 0 1 0-.86 0v5.48a.43.43 0 0 0 .86 0V4.97Z" fill="currentColor" class="opacity-40 dark:opacity-100"></path><path d="M8.73 4.97a.43.43 0 1 0-.86 0v5.48a.43.43 0 1 0 .86 0V4.96Z" fill="currentColor" class="dark:opacity-40"></path></svg> Docs</a> </li><li class="hover:text-black dark:hover:text-white"><a class="group flex items-center px-2 py-0.5 dark:text-gray-300 dark:hover:text-gray-100" href="/enterprise"><svg class="mr-1.5 text-gray-400 group-hover:text-black dark:group-hover:text-white" xmlns="http://www.w3.org/2000/svg" fill="none" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 33 27"><path fill="currentColor" fill-rule="evenodd" d="M13.5.7a8.7 8.7 0 0 0-7.7 5.7L1 20.6c-1 3.1.9 5.7 4.1 5.7h15c3.3 0 6.8-2.6 7.8-5.7l4.6-14.2c1-3.1-.8-5.7-4-5.7h-15Zm1.1 5.7L9.8 20.3h9.8l1-3.1h-5.8l.8-2.5h4.8l1.1-3h-4.8l.8-2.3H23l1-3h-9.5Z" clip-rule="evenodd"></path></svg> Enterprise</a> </li> <li><a class="group flex items-center px-2 py-0.5 dark:text-gray-300 dark:hover:text-gray-100" href="/pricing">Pricing </a></li> <li><div class="relative group"> <button class="px-2 py-0.5 hover:text-gray-500 dark:hover:text-gray-600 flex items-center " type="button"> <svg class=" text-gray-500 w-5 group-hover:text-gray-400 dark:text-gray-300 dark:group-hover:text-gray-100" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" viewBox="0 0 32 18" preserveAspectRatio="xMidYMid meet"><path fill-rule="evenodd" clip-rule="evenodd" d="M14.4504 3.30221C14.4504 2.836 14.8284 2.45807 15.2946 2.45807H28.4933C28.9595 2.45807 29.3374 2.836 29.3374 3.30221C29.3374 3.76842 28.9595 4.14635 28.4933 4.14635H15.2946C14.8284 4.14635 14.4504 3.76842 14.4504 3.30221Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M14.4504 9.00002C14.4504 8.53382 14.8284 8.15588 15.2946 8.15588H28.4933C28.9595 8.15588 29.3374 8.53382 29.3374 9.00002C29.3374 9.46623 28.9595 9.84417 28.4933 9.84417H15.2946C14.8284 9.84417 14.4504 9.46623 14.4504 9.00002Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M14.4504 14.6978C14.4504 14.2316 14.8284 13.8537 15.2946 13.8537H28.4933C28.9595 13.8537 29.3374 14.2316 29.3374 14.6978C29.3374 15.164 28.9595 15.542 28.4933 15.542H15.2946C14.8284 15.542 14.4504 15.164 14.4504 14.6978Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M1.94549 6.87377C2.27514 6.54411 2.80962 6.54411 3.13928 6.87377L6.23458 9.96907L9.32988 6.87377C9.65954 6.54411 10.194 6.54411 10.5237 6.87377C10.8533 7.20343 10.8533 7.73791 10.5237 8.06756L6.23458 12.3567L1.94549 8.06756C1.61583 7.73791 1.61583 7.20343 1.94549 6.87377Z" fill="currentColor"></path></svg> </button> </div></li> <li><hr class="h-5 w-0.5 border-none bg-gray-100 dark:bg-gray-800"></li> <li><a class="block cursor-pointer whitespace-nowrap px-2 py-0.5 hover:text-gray-500 dark:text-gray-300 dark:hover:text-gray-100" href="/login">Log In </a></li> <li><a class="whitespace-nowrap rounded-full border border-transparent bg-gray-900 px-3 py-1 leading-none text-white hover:border-black hover:bg-white hover:text-black" href="/join">Sign Up </a></li></ul></nav></div></header></div> <div class="SVELTE_HYDRATER contents" data-target="SSOBanner" data-props="{}"></div> <main class="flex flex-1 flex-col"><div class="relative lg:flex" id="hf-doc-container"><div class="sticky top-0 z-20 self-start"><div class="SVELTE_HYDRATER contents" data-target="SideMenu" data-props="{"chapters":[{"title":"Getting started","isExpanded":true,"sections":[{"title":"TRL","isExpanded":true,"id":"index","url":"/docs/trl/v0.15.2/en/index"},{"title":"Installation","isExpanded":true,"id":"installation","url":"/docs/trl/v0.15.2/en/installation"},{"title":"Quickstart","isExpanded":true,"id":"quickstart","url":"/docs/trl/v0.15.2/en/quickstart"}]},{"title":"Conceptual Guides","isExpanded":true,"sections":[{"title":"Dataset Formats","isExpanded":true,"id":"dataset_formats","url":"/docs/trl/v0.15.2/en/dataset_formats"},{"title":"Training FAQ","isExpanded":true,"id":"how_to_train","url":"/docs/trl/v0.15.2/en/how_to_train"},{"title":"Understanding Logs","isExpanded":true,"id":"logging","url":"/docs/trl/v0.15.2/en/logging"}]},{"title":"How-to guides","isExpanded":true,"sections":[{"title":"Command Line Interface (CLI)","isExpanded":true,"id":"clis","url":"/docs/trl/v0.15.2/en/clis"},{"title":"Customizing the Training","isExpanded":true,"id":"customization","url":"/docs/trl/v0.15.2/en/customization"},{"title":"Reducing Memory Usage","isExpanded":true,"id":"reducing_memory_usage","url":"/docs/trl/v0.15.2/en/reducing_memory_usage"},{"title":"Speeding Up Training","isExpanded":true,"id":"speeding_up_training","url":"/docs/trl/v0.15.2/en/speeding_up_training"},{"title":"Using Trained Models","isExpanded":true,"id":"use_model","url":"/docs/trl/v0.15.2/en/use_model"}]},{"title":"Integrations","isExpanded":true,"sections":[{"title":"DeepSpeed","isExpanded":true,"id":"deepspeed_integration","url":"/docs/trl/v0.15.2/en/deepspeed_integration"},{"title":"Liger Kernel","isExpanded":true,"id":"liger_kernel_integration","url":"/docs/trl/v0.15.2/en/liger_kernel_integration"},{"title":"PEFT","isExpanded":true,"id":"peft_integration","url":"/docs/trl/v0.15.2/en/peft_integration"},{"title":"Unsloth","isExpanded":true,"id":"unsloth_integration","url":"/docs/trl/v0.15.2/en/unsloth_integration"}]},{"title":"Examples","isExpanded":true,"sections":[{"title":"Example Overview","isExpanded":true,"id":"example_overview","url":"/docs/trl/v0.15.2/en/example_overview"},{"title":"Community Tutorials","isExpanded":true,"id":"community_tutorials","url":"/docs/trl/v0.15.2/en/community_tutorials"},{"title":"Sentiment Tuning","isExpanded":true,"id":"sentiment_tuning","url":"/docs/trl/v0.15.2/en/sentiment_tuning"},{"title":"Training StackLlama","isExpanded":true,"id":"using_llama_models","url":"/docs/trl/v0.15.2/en/using_llama_models"},{"title":"Detoxifying a Language Model","isExpanded":true,"id":"detoxifying_a_lm","url":"/docs/trl/v0.15.2/en/detoxifying_a_lm"},{"title":"Learning to Use Tools","isExpanded":true,"id":"learning_tools","url":"/docs/trl/v0.15.2/en/learning_tools"},{"title":"Multi Adapter RLHF","isExpanded":true,"id":"multi_adapter_rl","url":"/docs/trl/v0.15.2/en/multi_adapter_rl"}]},{"title":"API","isExpanded":true,"sections":[{"title":"Trainers","isExpanded":true,"sections":[{"title":"AlignProp","isExpanded":true,"id":"alignprop_trainer","url":"/docs/trl/v0.15.2/en/alignprop_trainer"},{"title":"BCO","isExpanded":true,"id":"bco_trainer","url":"/docs/trl/v0.15.2/en/bco_trainer"},{"title":"CPO","isExpanded":true,"id":"cpo_trainer","url":"/docs/trl/v0.15.2/en/cpo_trainer"},{"title":"DDPO","isExpanded":true,"id":"ddpo_trainer","url":"/docs/trl/v0.15.2/en/ddpo_trainer"},{"title":"DPO","isExpanded":true,"id":"dpo_trainer","url":"/docs/trl/v0.15.2/en/dpo_trainer"},{"title":"Online DPO","isExpanded":true,"id":"online_dpo_trainer","url":"/docs/trl/v0.15.2/en/online_dpo_trainer"},{"title":"GKD","isExpanded":true,"id":"gkd_trainer","url":"/docs/trl/v0.15.2/en/gkd_trainer"},{"title":"GRPO","isExpanded":true,"id":"grpo_trainer","url":"/docs/trl/v0.15.2/en/grpo_trainer"},{"title":"KTO","isExpanded":true,"id":"kto_trainer","url":"/docs/trl/v0.15.2/en/kto_trainer"},{"title":"Nash-MD","isExpanded":true,"id":"nash_md_trainer","url":"/docs/trl/v0.15.2/en/nash_md_trainer"},{"title":"ORPO","isExpanded":true,"id":"orpo_trainer","url":"/docs/trl/v0.15.2/en/orpo_trainer"},{"title":"PPO","isExpanded":true,"id":"ppo_trainer","url":"/docs/trl/v0.15.2/en/ppo_trainer"},{"title":"PRM","isExpanded":true,"id":"prm_trainer","url":"/docs/trl/v0.15.2/en/prm_trainer"},{"title":"Reward","isExpanded":true,"id":"reward_trainer","url":"/docs/trl/v0.15.2/en/reward_trainer"},{"title":"RLOO","isExpanded":true,"id":"rloo_trainer","url":"/docs/trl/v0.15.2/en/rloo_trainer"},{"title":"SFT","isExpanded":true,"id":"sft_trainer","url":"/docs/trl/v0.15.2/en/sft_trainer"},{"title":"Iterative SFT","isExpanded":true,"id":"iterative_sft_trainer","url":"/docs/trl/v0.15.2/en/iterative_sft_trainer"},{"title":"XPO","isExpanded":true,"id":"xpo_trainer","url":"/docs/trl/v0.15.2/en/xpo_trainer"}]},{"title":"Model Classes","isExpanded":true,"id":"models","url":"/docs/trl/v0.15.2/en/models"},{"title":"Best of N Sampling","isExpanded":true,"id":"best_of_n","url":"/docs/trl/v0.15.2/en/best_of_n"},{"title":"Judges","isExpanded":true,"id":"judges","url":"/docs/trl/v0.15.2/en/judges"},{"title":"Callbacks","isExpanded":true,"id":"callbacks","url":"/docs/trl/v0.15.2/en/callbacks"},{"title":"Data Utilities","isExpanded":true,"id":"data_utils","url":"/docs/trl/v0.15.2/en/data_utils"},{"title":"Text Environments","isExpanded":true,"id":"text_environments","url":"/docs/trl/v0.15.2/en/text_environments"},{"title":"Script Utilities","isExpanded":true,"id":"script_utils","url":"/docs/trl/v0.15.2/en/script_utils"}]}],"chapterId":"dataset_formats","docType":"docs","isLoggedIn":false,"lang":"en","langs":["en"],"library":"trl","theme":"light","version":"v0.15.2","versions":[{"version":"main"},{"version":"v0.16.0"},{"version":"v0.15.2"},{"version":"v0.15.1"},{"version":"v0.15.0"},{"version":"v0.14.0"},{"version":"v0.13.0"},{"version":"v0.12.2"},{"version":"v0.12.1"},{"version":"v0.12.0"},{"version":"v0.11.4"},{"version":"v0.11.3"},{"version":"v0.11.2"},{"version":"v0.11.1"},{"version":"v0.11.0"},{"version":"v0.10.1"},{"version":"v0.9.6"},{"version":"v0.9.4"},{"version":"v0.9.3"},{"version":"v0.9.2"},{"version":"v0.8.6"},{"version":"v0.8.5"},{"version":"v0.8.4"},{"version":"v0.8.3"},{"version":"v0.8.2"},{"version":"v0.8.1"},{"version":"v0.8.0"},{"version":"v0.7.11"},{"version":"v0.7.10"},{"version":"v0.7.9"},{"version":"v0.7.8"},{"version":"v0.7.4"},{"version":"v0.7.2"},{"version":"v0.7.1"},{"version":"v0.7.0"},{"version":"v0.6.0"},{"version":"v0.5.0"},{"version":"v0.4.7"},{"version":"v0.4.6"},{"version":"v0.4.5"},{"version":"v0.4.2"},{"version":"v0.4.1"},{"version":"v0.4.0"},{"version":"v0.3.1"},{"version":"v0.3.0"},{"version":"v0.2.1"},{"version":"v0.2.0"},{"version":"v0.1.1"}],"title":"Dataset formats and types"}"> <div class="z-2 w-full flex-none lg:flex lg:h-dvh lg:w-[270px] lg:flex-col 2xl:w-[300px] false"><div class="shadow-alternate flex h-auto w-full items-center rounded-b-xl border-b bg-white py-2 text-lg leading-tight lg:hidden"> <div class="flex flex-1 cursor-pointer flex-col justify-center self-stretch pl-6"><p class="text-sm text-gray-400 first-letter:capitalize">TRL documentation </p> <div class="mr-2 flex items-center"><p class="font-semibold">Dataset formats and types</p> <svg class="text-xl false" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 24 24"><path d="M16.293 9.293L12 13.586L7.707 9.293l-1.414 1.414L12 16.414l5.707-5.707z" fill="currentColor"></path></svg></div></div> <button class="hover:shadow-alternate group ml-auto mr-6 inline-flex flex-none cursor-pointer rounded-xl border p-2"><svg class="text-gray-500 group-hover:text-gray-700" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M30 28.59L22.45 21A11 11 0 1 0 21 22.45L28.59 30zM5 14a9 9 0 1 1 9 9a9 9 0 0 1-9-9z" fill="currentColor"></path></svg></button></div> <div class="bg-linear-to-r hidden flex-col justify-between border-b border-r bg-white p-4 lg:flex from-blue-50 to-white dark:from-gray-900 dark:to-gray-950"><div class="group relative mb-2 flex min-w-[50%] items-center self-start text-lg font-bold leading-tight first-letter:capitalize"><div class="mr-1.5 h-1.5 w-1.5 rounded-full bg-blue-500 flex-none"></div> <h1>TRL</h1> <svg class="opacity-50 ml-0.5 flex-none group-hover:opacity-100" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 24 24"><path d="M16.293 9.293L12 13.586L7.707 9.293l-1.414 1.414L12 16.414l5.707-5.707z" fill="currentColor"></path></svg> <select class="outline-hidden absolute inset-0 border-none bg-white text-base opacity-0"><option value="/docs">🏡 View all docs</option><option value="/docs/optimum-neuron" >AWS Trainium & Inferentia</option><option value="/docs/accelerate" >Accelerate</option><option value="/docs/sagemaker" >Amazon SageMaker</option><option value="https://argilla-io.github.io/argilla/" >Argilla</option><option value="/docs/autotrain" >AutoTrain</option><option value="/docs/bitsandbytes" >Bitsandbytes</option><option value="/docs/chat-ui" >Chat UI</option><option value="/docs/competitions" >Competitions</option><option value="/docs/dataset-viewer" >Dataset viewer</option><option value="/docs/datasets" >Datasets</option><option value="/docs/diffusers" >Diffusers</option><option value="https://distilabel.argilla.io/" >Distilabel</option><option value="/docs/evaluate" >Evaluate</option><option value="https://www.gradio.app/docs/" >Gradio</option><option value="/docs/hub" >Hub</option><option value="/docs/huggingface_hub" >Hub Python Library</option><option value="/docs/hugs" >Hugging Face Generative AI Services (HUGS)</option><option value="/docs/huggingface.js" >Huggingface.js</option><option value="/docs/api-inference" >Inference API (serverless)</option><option value="/docs/inference-endpoints" >Inference Endpoints (dedicated)</option><option value="/docs/leaderboards" >Leaderboards</option><option value="/docs/lighteval" >Lighteval</option><option value="/docs/optimum" >Optimum</option><option value="/docs/peft" >PEFT</option><option value="/docs/safetensors" >Safetensors</option><option value="https://sbert.net/" >Sentence Transformers</option><option value="/docs/trl" selected>TRL</option><option value="/tasks" >Tasks</option><option value="/docs/text-embeddings-inference" >Text Embeddings Inference</option><option value="/docs/text-generation-inference" >Text Generation Inference</option><option value="/docs/tokenizers" >Tokenizers</option><option value="/docs/transformers" >Transformers</option><option value="/docs/transformers.js" >Transformers.js</option><option value="/docs/smolagents" >smolagents</option><option value="/docs/timm" >timm</option></select></div> <button class="shadow-alternate mb-2 flex w-full items-center rounded-full border bg-white px-2 py-1 text-left text-sm text-gray-400 ring-indigo-200 hover:bg-indigo-50 hover:ring-2 dark:border-gray-700 dark:ring-yellow-600 dark:hover:bg-gray-900 dark:hover:text-yellow-500"><svg class="flex-none mr-1.5" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M30 28.59L22.45 21A11 11 0 1 0 21 22.45L28.59 30zM5 14a9 9 0 1 1 9 9a9 9 0 0 1-9-9z" fill="currentColor"></path></svg> <div>Search documentation</div> </button> <div class="flex items-center"> <select class="form-input mt-0! w-20! border! dark:text-gray-400! mr-1 rounded-sm border-gray-200 p-1 text-xs uppercase"><option value="0" >main</option><option value="1" >v0.16.0</option><option value="2" selected>v0.15.2</option><option value="3" >v0.14.0</option><option value="4" >v0.13.0</option><option value="5" >v0.12.2</option><option value="6" >v0.11.4</option><option value="7" >v0.10.1</option><option value="8" >v0.9.6</option><option value="9" >v0.8.6</option><option value="10" >v0.7.11</option><option value="11" >v0.6.0</option><option value="12" >v0.5.0</option><option value="13" >v0.4.7</option><option value="14" >v0.3.1</option><option value="15" >v0.2.1</option><option value="16" >v0.1.1</option></select> <select class="form-input dark:text-gray-400! mr-1 rounded-sm border-gray-200 p-1 text-xs w-12! mt-0! border!"><option value="en" selected>EN</option></select> <div class="relative inline-block "> <button class="rounded-full border border-gray-100 pl-2 py-1 pr-2.5 flex items-center text-sm text-gray-500 bg-white hover:bg-yellow-50 hover:border-yellow-200 dark:hover:bg-gray-800 dark:hover:border-gray-950 dark:border-gray-800 " type="button"> <svg class=" text-yellow-500" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 24 24" fill="currentColor"><path d="M6.05 4.14l-.39-.39a.993.993 0 0 0-1.4 0l-.01.01a.984.984 0 0 0 0 1.4l.39.39c.39.39 1.01.39 1.4 0l.01-.01a.984.984 0 0 0 0-1.4zM3.01 10.5H1.99c-.55 0-.99.44-.99.99v.01c0 .55.44.99.99.99H3c.56.01 1-.43 1-.98v-.01c0-.56-.44-1-.99-1zm9-9.95H12c-.56 0-1 .44-1 .99v.96c0 .55.44.99.99.99H12c.56.01 1-.43 1-.98v-.97c0-.55-.44-.99-.99-.99zm7.74 3.21c-.39-.39-1.02-.39-1.41-.01l-.39.39a.984.984 0 0 0 0 1.4l.01.01c.39.39 1.02.39 1.4 0l.39-.39a.984.984 0 0 0 0-1.4zm-1.81 15.1l.39.39a.996.996 0 1 0 1.41-1.41l-.39-.39a.993.993 0 0 0-1.4 0c-.4.4-.4 1.02-.01 1.41zM20 11.49v.01c0 .55.44.99.99.99H22c.55 0 .99-.44.99-.99v-.01c0-.55-.44-.99-.99-.99h-1.01c-.55 0-.99.44-.99.99zM12 5.5c-3.31 0-6 2.69-6 6s2.69 6 6 6s6-2.69 6-6s-2.69-6-6-6zm-.01 16.95H12c.55 0 .99-.44.99-.99v-.96c0-.55-.44-.99-.99-.99h-.01c-.55 0-.99.44-.99.99v.96c0 .55.44.99.99.99zm-7.74-3.21c.39.39 1.02.39 1.41 0l.39-.39a.993.993 0 0 0 0-1.4l-.01-.01a.996.996 0 0 0-1.41 0l-.39.39c-.38.4-.38 1.02.01 1.41z"></path></svg> </button> </div> <a href="https://github.com/huggingface/trl" class="group ml-auto text-xs text-gray-500 hover:text-gray-700 hover:underline dark:hover:text-gray-300"><svg class="inline-block text-gray-500 group-hover:text-gray-700 dark:group-hover:text-gray-300 mr-1.5 -mt-1 w-4 h-4" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1.03em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 250"><path d="M128.001 0C57.317 0 0 57.307 0 128.001c0 56.554 36.676 104.535 87.535 121.46c6.397 1.185 8.746-2.777 8.746-6.158c0-3.052-.12-13.135-.174-23.83c-35.61 7.742-43.124-15.103-43.124-15.103c-5.823-14.795-14.213-18.73-14.213-18.73c-11.613-7.944.876-7.78.876-7.78c12.853.902 19.621 13.19 19.621 13.19c11.417 19.568 29.945 13.911 37.249 10.64c1.149-8.272 4.466-13.92 8.127-17.116c-28.431-3.236-58.318-14.212-58.318-63.258c0-13.975 5-25.394 13.188-34.358c-1.329-3.224-5.71-16.242 1.24-33.874c0 0 10.749-3.44 35.21 13.121c10.21-2.836 21.16-4.258 32.038-4.307c10.878.049 21.837 1.47 32.066 4.307c24.431-16.56 35.165-13.12 35.165-13.12c6.967 17.63 2.584 30.65 1.255 33.873c8.207 8.964 13.173 20.383 13.173 34.358c0 49.163-29.944 59.988-58.447 63.157c4.591 3.972 8.682 11.762 8.682 23.704c0 17.126-.148 30.91-.148 35.126c0 3.407 2.304 7.398 8.792 6.14C219.37 232.5 256 184.537 256 128.002C256 57.307 198.691 0 128.001 0zm-80.06 182.34c-.282.636-1.283.827-2.194.39c-.929-.417-1.45-1.284-1.15-1.922c.276-.655 1.279-.838 2.205-.399c.93.418 1.46 1.293 1.139 1.931zm6.296 5.618c-.61.566-1.804.303-2.614-.591c-.837-.892-.994-2.086-.375-2.66c.63-.566 1.787-.301 2.626.591c.838.903 1 2.088.363 2.66zm4.32 7.188c-.785.545-2.067.034-2.86-1.104c-.784-1.138-.784-2.503.017-3.05c.795-.547 2.058-.055 2.861 1.075c.782 1.157.782 2.522-.019 3.08zm7.304 8.325c-.701.774-2.196.566-3.29-.49c-1.119-1.032-1.43-2.496-.726-3.27c.71-.776 2.213-.558 3.315.49c1.11 1.03 1.45 2.505.701 3.27zm9.442 2.81c-.31 1.003-1.75 1.459-3.199 1.033c-1.448-.439-2.395-1.613-2.103-2.626c.301-1.01 1.747-1.484 3.207-1.028c1.446.436 2.396 1.602 2.095 2.622zm10.744 1.193c.036 1.055-1.193 1.93-2.715 1.95c-1.53.034-2.769-.82-2.786-1.86c0-1.065 1.202-1.932 2.733-1.958c1.522-.03 2.768.818 2.768 1.868zm10.555-.405c.182 1.03-.875 2.088-2.387 2.37c-1.485.271-2.861-.365-3.05-1.386c-.184-1.056.893-2.114 2.376-2.387c1.514-.263 2.868.356 3.061 1.403z" fill="currentColor"></path></svg> </a></div></div> <nav class="hidden flex-auto lg:flex bottom-0 left-0 w-full flex-col overflow-y-auto border-r px-4 pb-16 pt-3 text-[0.95rem] lg:w-[270px] 2xl:w-[300px]"> <div class="group flex cursor-pointer items-center pl-2 text-[0.8rem] font-semibold uppercase leading-9 hover:text-gray-700 dark:hover:text-gray-300 ml-0"><div class="flex after:absolute after:right-4 after:text-gray-500 group-hover:after:content-['▶'] after:rotate-90 after:transform"><span><span class="inline-block space-x-1 leading-5"><span><!-- HTML_TAG_START -->Getting started<!-- HTML_TAG_END --></span> </span></span> </div></div> <div class="flex flex-col"><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/index" id="index"><!-- HTML_TAG_START -->TRL<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/installation" id="installation"><!-- HTML_TAG_START -->Installation<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/quickstart" id="quickstart"><!-- HTML_TAG_START -->Quickstart<!-- HTML_TAG_END --> </a> </div> <div class="group flex cursor-pointer items-center pl-2 text-[0.8rem] font-semibold uppercase leading-9 hover:text-gray-700 dark:hover:text-gray-300 ml-0"><div class="flex after:absolute after:right-4 after:text-gray-500 group-hover:after:content-['▶'] after:rotate-90 after:transform"><span><span class="inline-block space-x-1 leading-5"><span><!-- HTML_TAG_START -->Conceptual Guides<!-- HTML_TAG_END --></span> </span></span> </div></div> <div class="flex flex-col"><a class="bg-linear-to-br rounded-xl from-black to-gray-900 py-1 pl-2 pr-2 text-white first:mt-1 last:mb-4 dark:from-gray-800 dark:to-gray-900 ml-2" href="/docs/trl/v0.15.2/en/dataset_formats" id="dataset_formats"><!-- HTML_TAG_START -->Dataset Formats<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/how_to_train" id="how_to_train"><!-- HTML_TAG_START -->Training FAQ<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/logging" id="logging"><!-- HTML_TAG_START -->Understanding Logs<!-- HTML_TAG_END --> </a> </div> <div class="group flex cursor-pointer items-center pl-2 text-[0.8rem] font-semibold uppercase leading-9 hover:text-gray-700 dark:hover:text-gray-300 ml-0"><div class="flex after:absolute after:right-4 after:text-gray-500 group-hover:after:content-['▶'] after:rotate-90 after:transform"><span><span class="inline-block space-x-1 leading-5"><span><!-- HTML_TAG_START -->How-to guides<!-- HTML_TAG_END --></span> </span></span> </div></div> <div class="flex flex-col"><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/clis" id="clis"><!-- HTML_TAG_START -->Command Line Interface (CLI)<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/customization" id="customization"><!-- HTML_TAG_START -->Customizing the Training<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/reducing_memory_usage" id="reducing_memory_usage"><!-- HTML_TAG_START -->Reducing Memory Usage<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/speeding_up_training" id="speeding_up_training"><!-- HTML_TAG_START -->Speeding Up Training<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/use_model" id="use_model"><!-- HTML_TAG_START -->Using Trained Models<!-- HTML_TAG_END --> </a> </div> <div class="group flex cursor-pointer items-center pl-2 text-[0.8rem] font-semibold uppercase leading-9 hover:text-gray-700 dark:hover:text-gray-300 ml-0"><div class="flex after:absolute after:right-4 after:text-gray-500 group-hover:after:content-['▶'] after:rotate-90 after:transform"><span><span class="inline-block space-x-1 leading-5"><span><!-- HTML_TAG_START -->Integrations<!-- HTML_TAG_END --></span> </span></span> </div></div> <div class="flex flex-col"><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/deepspeed_integration" id="deepspeed_integration"><!-- HTML_TAG_START -->DeepSpeed<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/liger_kernel_integration" id="liger_kernel_integration"><!-- HTML_TAG_START -->Liger Kernel<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/peft_integration" id="peft_integration"><!-- HTML_TAG_START -->PEFT<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/unsloth_integration" id="unsloth_integration"><!-- HTML_TAG_START -->Unsloth<!-- HTML_TAG_END --> </a> </div> <div class="group flex cursor-pointer items-center pl-2 text-[0.8rem] font-semibold uppercase leading-9 hover:text-gray-700 dark:hover:text-gray-300 ml-0"><div class="flex after:absolute after:right-4 after:text-gray-500 group-hover:after:content-['▶'] after:rotate-90 after:transform"><span><span class="inline-block space-x-1 leading-5"><span><!-- HTML_TAG_START -->Examples<!-- HTML_TAG_END --></span> </span></span> </div></div> <div class="flex flex-col"><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/example_overview" id="example_overview"><!-- HTML_TAG_START -->Example Overview<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/community_tutorials" id="community_tutorials"><!-- HTML_TAG_START -->Community Tutorials<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/sentiment_tuning" id="sentiment_tuning"><!-- HTML_TAG_START -->Sentiment Tuning<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/using_llama_models" id="using_llama_models"><!-- HTML_TAG_START -->Training StackLlama<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/detoxifying_a_lm" id="detoxifying_a_lm"><!-- HTML_TAG_START -->Detoxifying a Language Model<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/learning_tools" id="learning_tools"><!-- HTML_TAG_START -->Learning to Use Tools<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/multi_adapter_rl" id="multi_adapter_rl"><!-- HTML_TAG_START -->Multi Adapter RLHF<!-- HTML_TAG_END --> </a> </div> <div class="group flex cursor-pointer items-center pl-2 text-[0.8rem] font-semibold uppercase leading-9 hover:text-gray-700 dark:hover:text-gray-300 ml-0"><div class="flex after:absolute after:right-4 after:text-gray-500 group-hover:after:content-['▶'] after:rotate-90 after:transform"><span><span class="inline-block space-x-1 leading-5"><span><!-- HTML_TAG_START -->API<!-- HTML_TAG_END --></span> </span></span> </div></div> <div class="flex flex-col"> <div class="group flex cursor-pointer items-center pl-2 text-[0.8rem] font-semibold uppercase leading-9 hover:text-gray-700 dark:hover:text-gray-300 ml-2"><div class="flex after:absolute after:right-4 after:text-gray-500 group-hover:after:content-['▶'] after:rotate-90 after:transform"><span><span class="inline-block space-x-1 leading-5"><span><!-- HTML_TAG_START -->Trainers<!-- HTML_TAG_END --></span> </span></span> </div></div> <div class="flex flex-col"><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/alignprop_trainer" id="alignprop_trainer"><!-- HTML_TAG_START -->AlignProp<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/bco_trainer" id="bco_trainer"><!-- HTML_TAG_START -->BCO<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/cpo_trainer" id="cpo_trainer"><!-- HTML_TAG_START -->CPO<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/ddpo_trainer" id="ddpo_trainer"><!-- HTML_TAG_START -->DDPO<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/dpo_trainer" id="dpo_trainer"><!-- HTML_TAG_START -->DPO<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/online_dpo_trainer" id="online_dpo_trainer"><!-- HTML_TAG_START -->Online DPO<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/gkd_trainer" id="gkd_trainer"><!-- HTML_TAG_START -->GKD<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/grpo_trainer" id="grpo_trainer"><!-- HTML_TAG_START -->GRPO<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/kto_trainer" id="kto_trainer"><!-- HTML_TAG_START -->KTO<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/nash_md_trainer" id="nash_md_trainer"><!-- HTML_TAG_START -->Nash-MD<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/orpo_trainer" id="orpo_trainer"><!-- HTML_TAG_START -->ORPO<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/ppo_trainer" id="ppo_trainer"><!-- HTML_TAG_START -->PPO<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/prm_trainer" id="prm_trainer"><!-- HTML_TAG_START -->PRM<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/reward_trainer" id="reward_trainer"><!-- HTML_TAG_START -->Reward<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/rloo_trainer" id="rloo_trainer"><!-- HTML_TAG_START -->RLOO<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/sft_trainer" id="sft_trainer"><!-- HTML_TAG_START -->SFT<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/iterative_sft_trainer" id="iterative_sft_trainer"><!-- HTML_TAG_START -->Iterative SFT<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-4" href="/docs/trl/v0.15.2/en/xpo_trainer" id="xpo_trainer"><!-- HTML_TAG_START -->XPO<!-- HTML_TAG_END --> </a> </div><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/models" id="models"><!-- HTML_TAG_START -->Model Classes<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/best_of_n" id="best_of_n"><!-- HTML_TAG_START -->Best of N Sampling<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/judges" id="judges"><!-- HTML_TAG_START -->Judges<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/callbacks" id="callbacks"><!-- HTML_TAG_START -->Callbacks<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/data_utils" id="data_utils"><!-- HTML_TAG_START -->Data Utilities<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/text_environments" id="text_environments"><!-- HTML_TAG_START -->Text Environments<!-- HTML_TAG_END --> </a><a class="transform py-1 pl-2 pr-2 text-gray-500 first:mt-1 last:mb-4 hover:translate-x-px hover:text-black dark:hover:text-gray-300 ml-2" href="/docs/trl/v0.15.2/en/script_utils" id="script_utils"><!-- HTML_TAG_START -->Script Utilities<!-- HTML_TAG_END --> </a> </div></nav></div></div></div> <div class="z-1 min-w-0 flex-1"><div class="flex justify-center bg-blue-50 px-6 py-1.5 text-xs text-blue-600 dark:bg-blue-900 dark:text-blue-200 sm:text-sm"><div>You are viewing v0.15.2 version. <span>A newer version <a class="underline" href="/docs/trl/v0.16.0/dataset_formats">v0.16.0</a> is available.</span></div></div> <div class="px-6 pt-6 md:px-12 md:pb-16 md:pt-16"><div class="max-w-4xl mx-auto mb-10"><div class="bg-linear-to-br relative overflow-hidden rounded-xl from-orange-300/10 px-4 py-5 ring-1 ring-orange-100/70 md:px-6 md:py-8"><img alt="Hugging Face's logo" class="absolute -bottom-6 -right-6 w-28 -rotate-45 md:hidden" src="/front/assets/huggingface_logo-noborder.svg"> <div class="mb-2 text-2xl font-bold dark:text-gray-200 md:mb-0">Join the Hugging Face community</div> <p class="mb-4 text-lg text-gray-400 dark:text-gray-300 md:mb-8">and get access to the augmented documentation experience </p> <div class="mb-8 hidden space-y-4 md:block xl:flex xl:space-x-6 xl:space-y-0"><div class="flex items-center"><div class="bg-linear-to-br mr-3 flex h-9 w-9 flex-none items-center justify-center rounded-lg from-indigo-100 to-indigo-100/20 dark:to-indigo-100"><svg class="text-indigo-400 group-hover:text-indigo-500" style="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 24 24"><path class="uim-quaternary" d="M20.23 7.24L12 12L3.77 7.24a1.98 1.98 0 0 1 .7-.71L11 2.76c.62-.35 1.38-.35 2 0l6.53 3.77c.29.173.531.418.7.71z" opacity=".25" fill="currentColor"></path><path class="uim-tertiary" d="M12 12v9.5a2.09 2.09 0 0 1-.91-.21L4.5 17.48a2.003 2.003 0 0 1-1-1.73v-7.5a2.06 2.06 0 0 1 .27-1.01L12 12z" opacity=".5" fill="currentColor"></path><path class="uim-primary" d="M20.5 8.25v7.5a2.003 2.003 0 0 1-1 1.73l-6.62 3.82c-.275.13-.576.198-.88.2V12l8.23-4.76c.175.308.268.656.27 1.01z" fill="currentColor"></path></svg></div> <div class="text-smd leading-tight text-gray-500 dark:text-gray-300 xl:max-w-[200px] 2xl:text-base">Collaborate on models, datasets and Spaces </div></div> <div class="flex items-center"><div class="bg-linear-to-br mr-3 flex h-9 w-9 flex-none items-center justify-center rounded-lg from-orange-100 to-orange-100/20 dark:to-orange-50"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" class="text-xl text-yellow-400" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 24 24"><path d="M11 15H6l7-14v8h5l-7 14v-8z" fill="currentColor"></path></svg></div> <div class="text-smd leading-tight text-gray-500 dark:text-gray-300 xl:max-w-[200px] 2xl:text-base">Faster examples with accelerated inference </div></div> <div class="flex items-center"><div class="bg-linear-to-br mr-3 flex h-9 w-9 flex-none items-center justify-center rounded-lg from-gray-500/10 to-gray-500/5"><svg class="text-gray-400" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M14.9804 3C14.9217 3.0002 14.8631 3.00555 14.8054 3.016C11.622 3.58252 8.76073 5.30669 6.77248 7.85653C4.78422 10.4064 3.80955 13.6016 4.03612 16.8271C4.26268 20.0525 5.67447 23.0801 7.99967 25.327C10.3249 27.5738 13.3991 28.8811 16.6304 28.997C16.7944 29.003 16.9584 28.997 17.1204 28.997C19.2193 28.9984 21.2877 28.4943 23.1507 27.5274C25.0137 26.5605 26.6164 25.1592 27.8234 23.442C27.9212 23.294 27.9783 23.1229 27.9889 22.9458C27.9995 22.7687 27.9633 22.592 27.884 22.4333C27.8046 22.2747 27.6848 22.1397 27.5367 22.0421C27.3887 21.9444 27.2175 21.8875 27.0404 21.877C25.0426 21.7017 23.112 21.0693 21.3976 20.0288C19.6832 18.9884 18.231 17.5676 17.1533 15.8764C16.0756 14.1852 15.4011 12.2688 15.1822 10.2754C14.9632 8.28193 15.2055 6.26484 15.8904 4.38C15.9486 4.22913 15.97 4.06652 15.9527 3.90572C15.9354 3.74492 15.8799 3.59059 15.7909 3.45557C15.7019 3.32055 15.5819 3.20877 15.4409 3.12952C15.2999 3.05028 15.142 3.00587 14.9804 3Z" fill="currentColor"></path></svg></div> <div class="text-smd leading-tight text-gray-500 dark:text-gray-300 xl:max-w-[200px] 2xl:text-base">Switch between documentation themes </div></div></div> <div class="flex items-center space-x-2.5"><a href="/join"><button class="bg-linear-to-br shadow-xs rounded-lg bg-white from-gray-100/20 to-gray-200/60 px-5 py-1.5 font-semibold text-gray-700 ring-1 ring-gray-300/60 hover:to-gray-100/70 hover:ring-gray-300/30 active:shadow-inner">Sign Up</button></a> <p class="text-gray-500 dark:text-gray-300">to get started</p></div></div></div> <div class="prose-doc prose relative mx-auto max-w-4xl break-words"><!-- HTML_TAG_START --> <link href="/docs/trl/v0.15.2/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload"> <link rel="modulepreload" href="/docs/trl/v0.15.2/en/_app/immutable/entry/start.a67350a3.js"> <link rel="modulepreload" href="/docs/trl/v0.15.2/en/_app/immutable/chunks/scheduler.d627b047.js"> <link rel="modulepreload" href="/docs/trl/v0.15.2/en/_app/immutable/chunks/singletons.873c3aa8.js"> <link rel="modulepreload" href="/docs/trl/v0.15.2/en/_app/immutable/chunks/index.a57a1c33.js"> <link rel="modulepreload" href="/docs/trl/v0.15.2/en/_app/immutable/chunks/paths.b049ce47.js"> <link rel="modulepreload" href="/docs/trl/v0.15.2/en/_app/immutable/entry/app.0e07cdbf.js"> <link rel="modulepreload" href="/docs/trl/v0.15.2/en/_app/immutable/chunks/index.73c51727.js"> <link rel="modulepreload" href="/docs/trl/v0.15.2/en/_app/immutable/nodes/0.fc3919dd.js"> <link rel="modulepreload" href="/docs/trl/v0.15.2/en/_app/immutable/chunks/each.e59479a4.js"> <link rel="modulepreload" href="/docs/trl/v0.15.2/en/_app/immutable/nodes/11.df88426a.js"> <link rel="modulepreload" href="/docs/trl/v0.15.2/en/_app/immutable/chunks/Tip.a82942ec.js"> <link rel="modulepreload" href="/docs/trl/v0.15.2/en/_app/immutable/chunks/CodeBlock.b1cdc5f6.js"> <link rel="modulepreload" href="/docs/trl/v0.15.2/en/_app/immutable/chunks/EditOnGithub.859b9ebc.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{"title":"Dataset formats and types","local":"dataset-formats-and-types","sections":[{"title":"Overview of the dataset formats and types","local":"overview-of-the-dataset-formats-and-types","sections":[{"title":"Formats","local":"formats","sections":[{"title":"Standard","local":"standard","sections":[],"depth":4},{"title":"Conversational","local":"conversational","sections":[],"depth":4}],"depth":3},{"title":"Types","local":"types","sections":[{"title":"Language modeling","local":"language-modeling","sections":[],"depth":4},{"title":"Prompt-only","local":"prompt-only","sections":[],"depth":4},{"title":"Prompt-completion","local":"prompt-completion","sections":[],"depth":4},{"title":"Preference","local":"preference","sections":[],"depth":4},{"title":"Unpaired preference","local":"unpaired-preference","sections":[],"depth":4},{"title":"Stepwise supervision","local":"stepwise-supervision","sections":[],"depth":4}],"depth":3}],"depth":2},{"title":"Which dataset type to use?","local":"which-dataset-type-to-use","sections":[],"depth":2},{"title":"Working with conversational datasets in TRL","local":"working-with-conversational-datasets-in-trl","sections":[{"title":"Converting a conversational dataset into a standard dataset","local":"converting-a-conversational-dataset-into-a-standard-dataset","sections":[],"depth":3}],"depth":2},{"title":"Using any dataset with TRL: preprocessing and conversion","local":"using-any-dataset-with-trl-preprocessing-and-conversion","sections":[{"title":"Example: UltraFeedback dataset","local":"example-ultrafeedback-dataset","sections":[],"depth":3}],"depth":2},{"title":"Utilities for converting dataset types","local":"utilities-for-converting-dataset-types","sections":[{"title":"From prompt-completion to language modeling dataset","local":"from-prompt-completion-to-language-modeling-dataset","sections":[],"depth":3},{"title":"From prompt-completion to prompt-only dataset","local":"from-prompt-completion-to-prompt-only-dataset","sections":[],"depth":3},{"title":"From preference with implicit prompt to language modeling dataset","local":"from-preference-with-implicit-prompt-to-language-modeling-dataset","sections":[],"depth":3},{"title":"From preference with implicit prompt to prompt-completion dataset","local":"from-preference-with-implicit-prompt-to-prompt-completion-dataset","sections":[],"depth":3},{"title":"From preference with implicit prompt to prompt-only dataset","local":"from-preference-with-implicit-prompt-to-prompt-only-dataset","sections":[],"depth":3},{"title":"From implicit to explicit prompt preference dataset","local":"from-implicit-to-explicit-prompt-preference-dataset","sections":[],"depth":3},{"title":"From preference with implicit prompt to unpaired preference dataset","local":"from-preference-with-implicit-prompt-to-unpaired-preference-dataset","sections":[],"depth":3},{"title":"From preference to language modeling dataset","local":"from-preference-to-language-modeling-dataset","sections":[],"depth":3},{"title":"From preference to prompt-completion dataset","local":"from-preference-to-prompt-completion-dataset","sections":[],"depth":3},{"title":"From preference to prompt-only dataset","local":"from-preference-to-prompt-only-dataset","sections":[],"depth":3},{"title":"From explicit to implicit prompt preference dataset","local":"from-explicit-to-implicit-prompt-preference-dataset","sections":[],"depth":3},{"title":"From preference to unpaired preference dataset","local":"from-preference-to-unpaired-preference-dataset","sections":[],"depth":3},{"title":"From unpaired preference to language modeling dataset","local":"from-unpaired-preference-to-language-modeling-dataset","sections":[],"depth":3},{"title":"From unpaired preference to prompt-completion dataset","local":"from-unpaired-preference-to-prompt-completion-dataset","sections":[],"depth":3},{"title":"From unpaired preference to prompt-only dataset","local":"from-unpaired-preference-to-prompt-only-dataset","sections":[],"depth":3},{"title":"From stepwise supervision to language modeling dataset","local":"from-stepwise-supervision-to-language-modeling-dataset","sections":[],"depth":3},{"title":"From stepwise supervision to prompt completion dataset","local":"from-stepwise-supervision-to-prompt-completion-dataset","sections":[],"depth":3},{"title":"From stepwise supervision to prompt only dataset","local":"from-stepwise-supervision-to-prompt-only-dataset","sections":[],"depth":3},{"title":"From stepwise supervision to unpaired preference dataset","local":"from-stepwise-supervision-to-unpaired-preference-dataset","sections":[],"depth":3}],"depth":2},{"title":"Vision datasets","local":"vision-datasets","sections":[],"depth":2}],"depth":1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="dataset-formats-and-types" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#dataset-formats-and-types"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Dataset formats and types</span></h1> <p data-svelte-h="svelte-1976wsu">This guide provides an overview of the dataset formats and types supported by each trainer in TRL.</p> <h2 class="relative group"><a id="overview-of-the-dataset-formats-and-types" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#overview-of-the-dataset-formats-and-types"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Overview of the dataset formats and types</span></h2> <ul data-svelte-h="svelte-fvfcsu"><li>The <em>format</em> of a dataset refers to how the data is structured, typically categorized as either <em>standard</em> or <em>conversational</em>.</li> <li>The <em>type</em> is associated with the specific task the dataset is designed for, such as <em>prompt-only</em> or <em>preference</em>. Each type is characterized by its columns, which vary according to the task, as shown in the table.</li></ul> <table data-svelte-h="svelte-1j4ocz4"><tbody><tr><th>Type \ Format</th> <th>Standard</th> <th>Conversational</th></tr> <tr><td>Language modeling</td> <td><pre><code>{"text": "The sky is blue."}</code></pre></td> <td><pre><code>{"messages": [{"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "It is blue."}]}</code></pre></td></tr> <tr><td>Prompt-only</td> <td><pre><code>{"prompt": "The sky is"}</code></pre></td> <td><pre><code>{"prompt": [{"role": "user", "content": "What color is the sky?"}]}</code></pre></td></tr> <tr><td>Prompt-completion</td> <td><pre><code>{"prompt": "The sky is", "completion": " blue."}</code></pre></td> <td><pre><code>{"prompt": [{"role": "user", "content": "What color is the sky?"}], "completion": [{"role": "assistant", "content": "It is blue."}]}</code></pre></td></tr> <tr><td>Preference</td> <td><pre><code>{"prompt": "The sky is", "chosen": " blue.", "rejected": " green."}</code></pre> or, with implicit prompt: <pre><code>{"chosen": "The sky is blue.", "rejected": "The sky is green."}</code></pre></td> <td><pre><code>{"prompt": [{"role": "user", "content": "What color is the sky?"}], "chosen": [{"role": "assistant", "content": "It is blue."}], "rejected": [{"role": "assistant", "content": "It is green."}]}</code></pre> or, with implicit prompt: <pre><code>{"chosen": [{"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "It is blue."}], "rejected": [{"role": "user", "content": "What color is the sky?"}, {"role": "assistant", "content": "It is green."}]}</code></pre></td></tr> <tr><td>Unpaired preference</td> <td><pre><code>{"prompt": "The sky is", "completion": " blue.", "label": True}</code></pre></td> <td><pre><code>{"prompt": [{"role": "user", "content": "What color is the sky?"}], "completion": [{"role": "assistant", "content": "It is green."}], "label": False}</code></pre></td></tr> <tr><td>Stepwise supervision</td> <td><pre><code>{"prompt": "Which number is larger, 9.8 or 9.11?", "completions": ["The fractional part of 9.8 is 0.8.", "The fractional part of 9.11 is 0.11.", "0.11 is greater than 0.8.", "Hence, 9.11 > 9.8."], "labels": [True, True, False, False]}</code></pre></td> <td></td></tr></tbody></table> <h3 class="relative group"><a id="formats" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#formats"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Formats</span></h3> <h4 class="relative group"><a id="standard" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#standard"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Standard</span></h4> <p data-svelte-h="svelte-n22pbc">The standard dataset format typically consists of plain text strings. The columns in the dataset vary depending on the task. This is the format expected by TRL trainers. Below are examples of standard dataset formats for different tasks:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-comment"># Language modeling</span> language_modeling_example = {<span class="hljs-string">"text"</span>: <span class="hljs-string">"The sky is blue."</span>} <span class="hljs-comment"># Preference</span> preference_example = {<span class="hljs-string">"prompt"</span>: <span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"chosen"</span>: <span class="hljs-string">" blue."</span>, <span class="hljs-string">"rejected"</span>: <span class="hljs-string">" green."</span>} <span class="hljs-comment"># Unpaired preference</span> unpaired_preference_example = {<span class="hljs-string">"prompt"</span>: <span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"completion"</span>: <span class="hljs-string">" blue."</span>, <span class="hljs-string">"label"</span>: <span class="hljs-literal">True</span>}<!-- HTML_TAG_END --></pre></div> <h4 class="relative group"><a id="conversational" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#conversational"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Conversational</span></h4> <p data-svelte-h="svelte-ftn6so">Conversational datasets are used for tasks involving dialogues or chat interactions between users and assistants. Unlike standard dataset formats, these contain sequences of messages where each message has a <code>role</code> (e.g., <code>"user"</code> or <code>"assistant"</code>) and <code>content</code> (the message text).</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->messages = [ {<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Hello, how are you?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"I'm doing great. How can I help you today?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"I'd like to show off how chat templating works!"</span>}, ]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1n2btf9">Just like standard datasets, the columns in conversational datasets vary depending on the task. Below are examples of conversational dataset formats for different tasks:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-comment"># Prompt-completion</span> prompt_completion_example = {<span class="hljs-string">"prompt"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}], <span class="hljs-string">"completion"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is blue."</span>}]} <span class="hljs-comment"># Preference</span> preference_example = { <span class="hljs-string">"prompt"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}], <span class="hljs-string">"chosen"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is blue."</span>}], <span class="hljs-string">"rejected"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is green."</span>}], }<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1de5sg2">Conversational datasets are useful for training chat models, but must be converted into a standard format before being used with TRL trainers. This is typically done using chat templates specific to the model being used. For more information, refer to the <a href="#working-with-conversational-datasets-in-trl">Working with conversational datasets in TRL</a> section.</p> <h3 class="relative group"><a id="types" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#types"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Types</span></h3> <h4 class="relative group"><a id="language-modeling" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#language-modeling"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Language modeling</span></h4> <p data-svelte-h="svelte-zfzona">A language modeling dataset consists of a column <code>"text"</code> (or <code>"messages"</code> for conversational datasets) containing a full sequence of text.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-comment"># Standard format</span> language_modeling_example = {<span class="hljs-string">"text"</span>: <span class="hljs-string">"The sky is blue."</span>} <span class="hljs-comment"># Conversational format</span> language_modeling_example = {<span class="hljs-string">"messages"</span>: [ {<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is blue."</span>} ]}<!-- HTML_TAG_END --></pre></div> <h4 class="relative group"><a id="prompt-only" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#prompt-only"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Prompt-only</span></h4> <p data-svelte-h="svelte-1fi7big">In a prompt-only dataset, only the initial prompt (the question or partial sentence) is provided under the key <code>"prompt"</code>. The training typically involves generating the completion based on this prompt, where the model learns to continue or complete the given input.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-comment"># Standard format</span> prompt_only_example = {<span class="hljs-string">"prompt"</span>: <span class="hljs-string">"The sky is"</span>} <span class="hljs-comment"># Conversational format</span> prompt_only_example = {<span class="hljs-string">"prompt"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}]}<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-19lc5cw">For examples of prompt-only datasets, refer to the <a href="https://huggingface.co/collections/trl-lib/prompt-only-datasets-677ea25245d20252cea00368" rel="nofollow">Prompt-only datasets collection</a>.</p> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-1k9p4sz">While both the prompt-only and language modeling types are similar, they differ in how the input is handled. In the prompt-only type, the prompt represents a partial input that expects the model to complete or continue, while in the language modeling type, the input is treated as a complete sentence or sequence. These two types are processed differently by TRL. Below is an example showing the difference in the output of the <code>apply_chat_template</code> function for each type:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoTokenizer <span class="hljs-keyword">from</span> trl <span class="hljs-keyword">import</span> apply_chat_template tokenizer = AutoTokenizer.from_pretrained(<span class="hljs-string">"microsoft/Phi-3-mini-128k-instruct"</span>) <span class="hljs-comment"># Example for prompt-only type</span> prompt_only_example = {<span class="hljs-string">"prompt"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}]} apply_chat_template(prompt_only_example, tokenizer) <span class="hljs-comment"># Output: {'prompt': '<|user|>\nWhat color is the sky?<|end|>\n<|assistant|>\n'}</span> <span class="hljs-comment"># Example for language modeling type</span> lm_example = {<span class="hljs-string">"messages"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}]} apply_chat_template(lm_example, tokenizer) <span class="hljs-comment"># Output: {'text': '<|user|>\nWhat color is the sky?<|end|>\n<|endoftext|>'}</span><!-- HTML_TAG_END --></pre></div> <ul data-svelte-h="svelte-17mqw6t"><li>The prompt-only output includes a <code>'<|assistant|>\n'</code>, indicating the beginning of the assistant’s turn and expecting the model to generate a completion.</li> <li>In contrast, the language modeling output treats the input as a complete sequence and terminates it with <code>'<|endoftext|>'</code>, signaling the end of the text and not expecting any additional content.</li></ul></div> <h4 class="relative group"><a id="prompt-completion" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#prompt-completion"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Prompt-completion</span></h4> <p data-svelte-h="svelte-1qh8h3u">A prompt-completion dataset includes a <code>"prompt"</code> and a <code>"completion"</code>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-comment"># Standard format</span> prompt_completion_example = {<span class="hljs-string">"prompt"</span>: <span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"completion"</span>: <span class="hljs-string">" blue."</span>} <span class="hljs-comment"># Conversational format</span> prompt_completion_example = {<span class="hljs-string">"prompt"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}], <span class="hljs-string">"completion"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is blue."</span>}]}<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-td5nru">For examples of prompt-completion datasets, refer to the <a href="https://huggingface.co/collections/trl-lib/prompt-completion-datasets-677ea2bb20bbb6bdccada216" rel="nofollow">Prompt-completion datasets collection</a>.</p> <h4 class="relative group"><a id="preference" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#preference"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Preference</span></h4> <p data-svelte-h="svelte-fpgtkb">A preference dataset is used for tasks where the model is trained to choose between two or more possible completions to the same prompt. This dataset includes a <code>"prompt"</code>, a <code>"chosen"</code> completion, and a <code>"rejected"</code> completion. The model is trained to select the <code>"chosen"</code> response over the <code>"rejected"</code> response. Some dataset may not include the <code>"prompt"</code> column, in which case the prompt is implicit and directly included in the <code>"chosen"</code> and <code>"rejected"</code> completions. We recommend using explicit prompts whenever possible.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-comment"># Standard format</span> <span class="hljs-comment">## Explicit prompt (recommended)</span> preference_example = {<span class="hljs-string">"prompt"</span>: <span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"chosen"</span>: <span class="hljs-string">" blue."</span>, <span class="hljs-string">"rejected"</span>: <span class="hljs-string">" green."</span>} <span class="hljs-comment"># Implicit prompt</span> preference_example = {<span class="hljs-string">"chosen"</span>: <span class="hljs-string">"The sky is blue."</span>, <span class="hljs-string">"rejected"</span>: <span class="hljs-string">"The sky is green."</span>} <span class="hljs-comment"># Conversational format</span> <span class="hljs-comment">## Explicit prompt (recommended)</span> preference_example = {<span class="hljs-string">"prompt"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}], <span class="hljs-string">"chosen"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is blue."</span>}], <span class="hljs-string">"rejected"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is green."</span>}]} <span class="hljs-comment">## Implicit prompt</span> preference_example = {<span class="hljs-string">"chosen"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is blue."</span>}], <span class="hljs-string">"rejected"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is green."</span>}]}<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-183i8gm">For examples of preference datasets, refer to the <a href="https://huggingface.co/collections/trl-lib/preference-datasets-677e99b581018fcad9abd82c" rel="nofollow">Preference datasets collection</a>.</p> <p data-svelte-h="svelte-144ttmo">Some preference datasets can be found with <a href="https://huggingface.co/datasets?other=dpo" rel="nofollow">the tag <code>dpo</code> on Hugging Face Hub</a>. You can also explore the <a href="https://huggingface.co/collections/librarian-bots/direct-preference-optimization-datasets-66964b12835f46289b6ef2fc" rel="nofollow">librarian-bots’ DPO Collections</a> to identify preference datasets.</p> <h4 class="relative group"><a id="unpaired-preference" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#unpaired-preference"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Unpaired preference</span></h4> <p data-svelte-h="svelte-ds3631">An unpaired preference dataset is similar to a preference dataset but instead of having <code>"chosen"</code> and <code>"rejected"</code> completions for the same prompt, it includes a single <code>"completion"</code> and a <code>"label"</code> indicating whether the completion is preferred or not.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-comment"># Standard format</span> unpaired_preference_example = {<span class="hljs-string">"prompt"</span>: <span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"completion"</span>: <span class="hljs-string">" blue."</span>, <span class="hljs-string">"label"</span>: <span class="hljs-literal">True</span>} <span class="hljs-comment"># Conversational format</span> unpaired_preference_example = {<span class="hljs-string">"prompt"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}], <span class="hljs-string">"completion"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is blue."</span>}], <span class="hljs-string">"label"</span>: <span class="hljs-literal">True</span>}<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-2vlprm">For examples of unpaired preference datasets, refer to the <a href="https://huggingface.co/collections/trl-lib/unpaired-preference-datasets-677ea22bf5f528c125b0bcdf" rel="nofollow">Unpaired preference datasets collection</a>.</p> <h4 class="relative group"><a id="stepwise-supervision" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#stepwise-supervision"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Stepwise supervision</span></h4> <p data-svelte-h="svelte-wwnrua">A stepwise (or process) supervision dataset is similar to an <a href="#unpaired-preference">unpaired preference</a> dataset but includes multiple steps of completions, each with its own label. This structure is useful for tasks that need detailed, step-by-step labeling, such as reasoning tasks. By evaluating each step separately and providing targeted labels, this approach helps identify precisely where the reasoning is correct and where errors occur, allowing for targeted feedback on each part of the reasoning process.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->stepwise_example = { <span class="hljs-string">"prompt"</span>: <span class="hljs-string">"Which number is larger, 9.8 or 9.11?"</span>, <span class="hljs-string">"completions"</span>: [<span class="hljs-string">"The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11."</span>, <span class="hljs-string">"Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8."</span>], <span class="hljs-string">"labels"</span>: [<span class="hljs-literal">True</span>, <span class="hljs-literal">False</span>] }<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1r34it7">For examples of stepwise supervision datasets, refer to the <a href="https://huggingface.co/collections/trl-lib/stepwise-supervision-datasets-677ea27fd4c5941beed7a96e" rel="nofollow">Stepwise supervision datasets collection</a>.</p> <h2 class="relative group"><a id="which-dataset-type-to-use" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#which-dataset-type-to-use"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Which dataset type to use?</span></h2> <p data-svelte-h="svelte-vrn817">Choosing the right dataset type depends on the task you are working on and the specific requirements of the TRL trainer you are using. Below is a brief overview of the dataset types supported by each TRL trainer.</p> <table data-svelte-h="svelte-xery8p"><thead><tr><th>Trainer</th> <th>Expected dataset type</th></tr></thead> <tbody><tr><td><a href="/docs/trl/v0.15.2/en/bco_trainer#trl.BCOTrainer">BCOTrainer</a></td> <td><a href="#unpaired-preference">Unpaired preference</a></td></tr> <tr><td><a href="/docs/trl/v0.15.2/en/cpo_trainer#trl.CPOTrainer">CPOTrainer</a></td> <td><a href="#preference">Preference (explicit prompt recommended)</a></td></tr> <tr><td><a href="/docs/trl/v0.15.2/en/dpo_trainer#trl.DPOTrainer">DPOTrainer</a></td> <td><a href="#preference">Preference (explicit prompt recommended)</a></td></tr> <tr><td><a href="/docs/trl/v0.15.2/en/gkd_trainer#trl.GKDTrainer">GKDTrainer</a></td> <td><a href="#prompt-completion">Prompt-completion</a></td></tr> <tr><td><a href="/docs/trl/v0.15.2/en/grpo_trainer#trl.GRPOTrainer">GRPOTrainer</a></td> <td><a href="#prompt-only">Prompt-only</a></td></tr> <tr><td><a href="/docs/trl/v0.15.2/en/iterative_sft_trainer#trl.IterativeSFTTrainer">IterativeSFTTrainer</a></td> <td><a href="#unpaired-preference">Unpaired preference</a></td></tr> <tr><td><a href="/docs/trl/v0.15.2/en/kto_trainer#trl.KTOTrainer">KTOTrainer</a></td> <td><a href="#unpaired-preference">Unpaired preference</a> or <a href="#preference">Preference (explicit prompt recommended)</a></td></tr> <tr><td><a href="/docs/trl/v0.15.2/en/nash_md_trainer#trl.NashMDTrainer">NashMDTrainer</a></td> <td><a href="#prompt-only">Prompt-only</a></td></tr> <tr><td><a href="/docs/trl/v0.15.2/en/online_dpo_trainer#trl.OnlineDPOTrainer">OnlineDPOTrainer</a></td> <td><a href="#prompt-only">Prompt-only</a></td></tr> <tr><td><a href="/docs/trl/v0.15.2/en/orpo_trainer#trl.ORPOTrainer">ORPOTrainer</a></td> <td><a href="#preference">Preference (explicit prompt recommended)</a></td></tr> <tr><td><a href="/docs/trl/v0.15.2/en/ppo_trainer#trl.PPOTrainer">PPOTrainer</a></td> <td>Tokenized language modeling</td></tr> <tr><td><a href="/docs/trl/v0.15.2/en/prm_trainer#trl.PRMTrainer">PRMTrainer</a></td> <td><a href="#stepwise-supervision">Stepwise supervision</a></td></tr> <tr><td><a href="/docs/trl/v0.15.2/en/reward_trainer#trl.RewardTrainer">RewardTrainer</a></td> <td><a href="#preference">Preference (implicit prompt recommended)</a></td></tr> <tr><td><a href="/docs/trl/v0.15.2/en/sft_trainer#trl.SFTTrainer">SFTTrainer</a></td> <td><a href="#language-modeling">Language modeling</a></td></tr> <tr><td><a href="/docs/trl/v0.15.2/en/xpo_trainer#trl.XPOTrainer">XPOTrainer</a></td> <td><a href="#prompt-only">Prompt-only</a></td></tr></tbody></table> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-38co2n">TRL trainers only support standard dataset formats, <a href="https://github.com/huggingface/trl/issues/2071" rel="nofollow">for now</a>. If you have a conversational dataset, you must first convert it into a standard format. For more information on how to work with conversational datasets, refer to the <a href="#working-with-conversational-datasets-in-trl">Working with conversational datasets in TRL</a> section.</p></div> <h2 class="relative group"><a id="working-with-conversational-datasets-in-trl" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#working-with-conversational-datasets-in-trl"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Working with conversational datasets in TRL</span></h2> <p data-svelte-h="svelte-10g1ofc">Conversational datasets are increasingly common, especially for training chat models. However, some TRL trainers don’t support conversational datasets in their raw format. (For more information, see <a href="https://github.com/huggingface/trl/issues/2071" rel="nofollow">issue #2071</a>.) These datasets must first be converted into a standard format. Fortunately, TRL offers tools to easily handle this conversion, which are detailed below.</p> <h3 class="relative group"><a id="converting-a-conversational-dataset-into-a-standard-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#converting-a-conversational-dataset-into-a-standard-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Converting a conversational dataset into a standard dataset</span></h3> <p data-svelte-h="svelte-18kvh4r">To convert a conversational dataset into a standard dataset, you need to <em>apply a chat template</em> to the dataset. A chat template is a predefined structure that typically includes placeholders for user and assistant messages. This template is provided by the tokenizer of the model you use.</p> <p data-svelte-h="svelte-19t1v1h">For detailed instructions on using chat templating, refer to the <a href="https://huggingface.co/docs/transformers/en/chat_templating" rel="nofollow">Chat templating section in the <code>transformers</code> documentation</a>.</p> <p data-svelte-h="svelte-k85zuz">In TRL, the method you apply to convert the dataset will vary depending on the task. Fortunately, TRL provides a helper function called <a href="/docs/trl/v0.15.2/en/data_utils#trl.apply_chat_template">apply_chat_template()</a> to simplify this process. Here’s an example of how to use it:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoTokenizer <span class="hljs-keyword">from</span> trl <span class="hljs-keyword">import</span> apply_chat_template tokenizer = AutoTokenizer.from_pretrained(<span class="hljs-string">"microsoft/Phi-3-mini-128k-instruct"</span>) example = { <span class="hljs-string">"prompt"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}], <span class="hljs-string">"completion"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is blue."</span>}] } apply_chat_template(example, tokenizer) <span class="hljs-comment"># Output:</span> <span class="hljs-comment"># {'prompt': '<|user|>\nWhat color is the sky?<|end|>\n<|assistant|>\n', 'completion': 'It is blue.<|end|>\n<|endoftext|>'}</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1q93c60">Alternatively, you can use the <a href="https://huggingface.co/docs/datasets/v3.3.2/en/package_reference/main_classes#datasets.Dataset.map" rel="nofollow">map</a> method to apply the template across an entire dataset:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset <span class="hljs-keyword">from</span> trl <span class="hljs-keyword">import</span> apply_chat_template dataset_dict = { <span class="hljs-string">"prompt"</span>: [[{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Where is the sun?"</span>}]], <span class="hljs-string">"completion"</span>: [[{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is blue."</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"In the sky."</span>}]] } dataset = Dataset.from_dict(dataset_dict) dataset = dataset.<span class="hljs-built_in">map</span>(apply_chat_template, fn_kwargs={<span class="hljs-string">"tokenizer"</span>: tokenizer}) <span class="hljs-comment"># Output:</span> <span class="hljs-comment"># {'prompt': ['<|user|>\nWhat color is the sky?<|end|>\n<|assistant|>\n',</span> <span class="hljs-comment"># '<|user|>\nWhere is the sun?<|end|>\n<|assistant|>\n'],</span> <span class="hljs-comment"># 'completion': ['It is blue.<|end|>\n<|endoftext|>', 'In the sky.<|end|>\n<|endoftext|>']}</span><!-- HTML_TAG_END --></pre></div> <div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400"><p data-svelte-h="svelte-1erzori">We recommend using the <a href="/docs/trl/v0.15.2/en/data_utils#trl.apply_chat_template">apply_chat_template()</a> function instead of calling <code>tokenizer.apply_chat_template</code> directly. Handling chat templates for non-language modeling datasets can be tricky and may result in errors, such as mistakenly placing a system prompt in the middle of a conversation. For additional examples, see <a href="https://github.com/huggingface/trl/pull/1930#issuecomment-2292908614" rel="nofollow">#1930 (comment)</a>. The <a href="/docs/trl/v0.15.2/en/data_utils#trl.apply_chat_template">apply_chat_template()</a> is designed to handle these intricacies and ensure the correct application of chat templates for various tasks.</p></div> <div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400"><p data-svelte-h="svelte-6bgf58">It’s important to note that chat templates are model-specific. For example, if you use the chat template from <a href="https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct" rel="nofollow">meta-llama/Meta-Llama-3.1-8B-Instruct</a> with the above example, you get a different output:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->apply_chat_template(example, AutoTokenizer.from_pretrained(<span class="hljs-string">"meta-llama/Meta-Llama-3.1-8B-Instruct"</span>)) <span class="hljs-comment"># Output:</span> <span class="hljs-comment"># {'prompt': '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat color is the sky?<|im_end|>\n<|im_start|>assistant\n',</span> <span class="hljs-comment"># 'completion': 'It is blue.<|im_end|>\n'}</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-ien9e7">Always use the chat template associated with the model you’re working with. Using the wrong template can lead to inaccurate or unexpected results.</p></div> <h2 class="relative group"><a id="using-any-dataset-with-trl-preprocessing-and-conversion" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#using-any-dataset-with-trl-preprocessing-and-conversion"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Using any dataset with TRL: preprocessing and conversion</span></h2> <p data-svelte-h="svelte-n0cp2u">Many datasets come in formats tailored to specific tasks, which might not be directly compatible with TRL. To use such datasets with TRL, you may need to preprocess and convert them into the required format.</p> <p data-svelte-h="svelte-wbjjp8">To make this easier, we provide a set of <a href="https://github.com/huggingface/trl/tree/main/examples/datasets" rel="nofollow">example scripts</a> that cover common dataset conversions.</p> <h3 class="relative group"><a id="example-ultrafeedback-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#example-ultrafeedback-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Example: UltraFeedback dataset</span></h3> <p data-svelte-h="svelte-dr0jz6">Let’s take the <a href="https://huggingface.co/datasets/openbmb/UltraFeedback" rel="nofollow">UltraFeedback dataset</a> as an example. Here’s a preview of the dataset:</p> <iframe src="https://huggingface.co/datasets/openbmb/UltraFeedback/embed/viewer/default/train" frameborder="0" width="100%" height="560px"></iframe> <p data-svelte-h="svelte-1f6n8q4">As shown above, the dataset format does not match the expected structure. It’s not in a conversational format, the column names differ, and the results pertain to different models (e.g., Bard, GPT-4) and aspects (e.g., “helpfulness”, “honesty”).</p> <p data-svelte-h="svelte-17l9hcf">By using the provided conversion script <a href="https://github.com/huggingface/trl/tree/main/examples/datasets/ultrafeedback.py" rel="nofollow"><code>examples/datasets/ultrafeedback.py</code></a>, you can transform this dataset into an unpaired preference type, and push it to the Hub:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->python examples/datasets/ultrafeedback.py --push_to_hub --repo_id trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-hplhuu">Once converted, the dataset will look like this:</p> <iframe src="https://huggingface.co/datasets/trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness/embed/viewer/default/train?row=0" frameborder="0" width="100%" height="560px"></iframe> <p data-svelte-h="svelte-jt9cd3">Now, you can use this dataset with TRL!</p> <p data-svelte-h="svelte-1rvlfj0">By adapting the provided scripts or creating your own, you can convert any dataset into a format compatible with TRL.</p> <h2 class="relative group"><a id="utilities-for-converting-dataset-types" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#utilities-for-converting-dataset-types"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Utilities for converting dataset types</span></h2> <p data-svelte-h="svelte-1ttdd1s">This section provides example code to help you convert between different dataset types. While some conversions can be performed after applying the chat template (i.e., in the standard format), we recommend performing the conversion before applying the chat template to ensure it works consistently.</p> <p data-svelte-h="svelte-17l70ga">For simplicity, some of the examples below do not follow this recommendation and use the standard format. However, the conversions can be applied directly to the conversational format without modification.</p> <table data-svelte-h="svelte-byyzej"><thead><tr><th>From \ To</th> <th>Language modeling</th> <th>Prompt-completion</th> <th>Prompt-only</th> <th>Preference with implicit prompt</th> <th>Preference</th> <th>Unpaired preference</th> <th>Stepwise supervision</th></tr></thead> <tbody><tr><td>Language modeling</td> <td>N/A</td> <td>N/A</td> <td>N/A</td> <td>N/A</td> <td>N/A</td> <td>N/A</td> <td>N/A</td></tr> <tr><td>Prompt-completion</td> <td><a href="#from-prompt-completion-to-language-modeling-dataset">🔗</a></td> <td>N/A</td> <td><a href="#from-prompt-completion-to-prompt-only-dataset">🔗</a></td> <td>N/A</td> <td>N/A</td> <td>N/A</td> <td>N/A</td></tr> <tr><td>Prompt-only</td> <td>N/A</td> <td>N/A</td> <td>N/A</td> <td>N/A</td> <td>N/A</td> <td>N/A</td> <td>N/A</td></tr> <tr><td>Preference with implicit prompt</td> <td><a href="#from-preference-with-implicit-prompt-to-language-modeling-dataset">🔗</a></td> <td><a href="#from-preference-with-implicit-prompt-to-prompt-completion-dataset">🔗</a></td> <td><a href="#from-preference-with-implicit-prompt-to-prompt-only-dataset">🔗</a></td> <td>N/A</td> <td><a href="#from-implicit-to-explicit-prompt-preference-dataset">🔗</a></td> <td><a href="#from-preference-with-implicit-prompt-to-unpaired-preference-dataset">🔗</a></td> <td>N/A</td></tr> <tr><td>Preference</td> <td><a href="#from-preference-to-language-modeling-dataset">🔗</a></td> <td><a href="#from-preference-to-prompt-completion-dataset">🔗</a></td> <td><a href="#from-preference-to-prompt-only-dataset">🔗</a></td> <td><a href="#from-explicit-to-implicit-prompt-preference-dataset">🔗</a></td> <td>N/A</td> <td><a href="#from-preference-to-unpaired-preference-dataset">🔗</a></td> <td>N/A</td></tr> <tr><td>Unpaired preference</td> <td><a href="#from-unpaired-preference-to-language-modeling-dataset">🔗</a></td> <td><a href="#from-unpaired-preference-to-prompt-completion-dataset">🔗</a></td> <td><a href="#from-unpaired-preference-to-prompt-only-dataset">🔗</a></td> <td>N/A</td> <td>N/A</td> <td>N/A</td> <td>N/A</td></tr> <tr><td>Stepwise supervision</td> <td><a href="#from-stepwise-supervision-to-language-modeling-dataset">🔗</a></td> <td><a href="#from-stepwise-supervision-to-prompt-completion-dataset">🔗</a></td> <td><a href="#from-stepwise-supervision-to-prompt-only-dataset">🔗</a></td> <td>N/A</td> <td>N/A</td> <td><a href="#from-stepwise-supervision-to-unpaired-preference-dataset">🔗</a></td> <td>N/A</td></tr></tbody></table> <h3 class="relative group"><a id="from-prompt-completion-to-language-modeling-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-prompt-completion-to-language-modeling-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From prompt-completion to language modeling dataset</span></h3> <p data-svelte-h="svelte-q88m1s">To convert a prompt-completion dataset into a language modeling dataset, concatenate the prompt and the completion.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset dataset = Dataset.from_dict({ <span class="hljs-string">"prompt"</span>: [<span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"The sun is"</span>], <span class="hljs-string">"completion"</span>: [<span class="hljs-string">" blue."</span>, <span class="hljs-string">" in the sky."</span>], }) <span class="hljs-keyword">def</span> <span class="hljs-title function_">concat_prompt_completion</span>(<span class="hljs-params">example</span>): <span class="hljs-keyword">return</span> {<span class="hljs-string">"text"</span>: example[<span class="hljs-string">"prompt"</span>] + example[<span class="hljs-string">"completion"</span>]} dataset = dataset.<span class="hljs-built_in">map</span>(concat_prompt_completion, remove_columns=[<span class="hljs-string">"prompt"</span>, <span class="hljs-string">"completion"</span>])<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'text'</span>: <span class="hljs-string">'The sky is blue.'</span>}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-prompt-completion-to-prompt-only-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-prompt-completion-to-prompt-only-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From prompt-completion to prompt-only dataset</span></h3> <p data-svelte-h="svelte-dplhkb">To convert a prompt-completion dataset into a prompt-only dataset, remove the completion.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset dataset = Dataset.from_dict({ <span class="hljs-string">"prompt"</span>: [<span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"The sun is"</span>], <span class="hljs-string">"completion"</span>: [<span class="hljs-string">" blue."</span>, <span class="hljs-string">" in the sky."</span>], }) dataset = dataset.remove_columns(<span class="hljs-string">"completion"</span>)<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'prompt'</span>: <span class="hljs-string">'The sky is'</span>}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-preference-with-implicit-prompt-to-language-modeling-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-preference-with-implicit-prompt-to-language-modeling-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From preference with implicit prompt to language modeling dataset</span></h3> <p data-svelte-h="svelte-u8pmva">To convert a preference with implicit prompt dataset into a language modeling dataset, remove the rejected, and rename the column <code>"chosen"</code> to <code>"text"</code>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset dataset = Dataset.from_dict({ <span class="hljs-string">"chosen"</span>: [<span class="hljs-string">"The sky is blue."</span>, <span class="hljs-string">"The sun is in the sky."</span>], <span class="hljs-string">"rejected"</span>: [<span class="hljs-string">"The sky is green."</span>, <span class="hljs-string">"The sun is in the sea."</span>], }) dataset = dataset.rename_column(<span class="hljs-string">"chosen"</span>, <span class="hljs-string">"text"</span>).remove_columns(<span class="hljs-string">"rejected"</span>)<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'text'</span>: <span class="hljs-string">'The sky is blue.'</span>}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-preference-with-implicit-prompt-to-prompt-completion-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-preference-with-implicit-prompt-to-prompt-completion-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From preference with implicit prompt to prompt-completion dataset</span></h3> <p data-svelte-h="svelte-1m7p243">To convert a preference dataset with implicit prompt into a prompt-completion dataset, extract the prompt with <a href="/docs/trl/v0.15.2/en/data_utils#trl.extract_prompt">extract_prompt()</a>, remove the rejected, and rename the column <code>"chosen"</code> to <code>"completion"</code>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset <span class="hljs-keyword">from</span> trl <span class="hljs-keyword">import</span> extract_prompt dataset = Dataset.from_dict({ <span class="hljs-string">"chosen"</span>: [ [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is blue."</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Where is the sun?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"In the sky."</span>}], ], <span class="hljs-string">"rejected"</span>: [ [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is green."</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Where is the sun?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"In the sea."</span>}], ], }) dataset = dataset.<span class="hljs-built_in">map</span>(extract_prompt).remove_columns(<span class="hljs-string">"rejected"</span>).rename_column(<span class="hljs-string">"chosen"</span>, <span class="hljs-string">"completion"</span>)<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'prompt'</span>: [{<span class="hljs-string">'role'</span>: <span class="hljs-string">'user'</span>, <span class="hljs-string">'content'</span>: <span class="hljs-string">'What color is the sky?'</span>}], <span class="hljs-string">'completion'</span>: [{<span class="hljs-string">'role'</span>: <span class="hljs-string">'assistant'</span>, <span class="hljs-string">'content'</span>: <span class="hljs-string">'It is blue.'</span>}]}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-preference-with-implicit-prompt-to-prompt-only-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-preference-with-implicit-prompt-to-prompt-only-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From preference with implicit prompt to prompt-only dataset</span></h3> <p data-svelte-h="svelte-9uqkvt">To convert a preference dataset with implicit prompt into a prompt-only dataset, extract the prompt with <a href="/docs/trl/v0.15.2/en/data_utils#trl.extract_prompt">extract_prompt()</a>, and remove the rejected and the chosen.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset <span class="hljs-keyword">from</span> trl <span class="hljs-keyword">import</span> extract_prompt dataset = Dataset.from_dict({ <span class="hljs-string">"chosen"</span>: [ [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is blue."</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Where is the sun?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"In the sky."</span>}], ], <span class="hljs-string">"rejected"</span>: [ [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is green."</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Where is the sun?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"In the sea."</span>}], ], }) dataset = dataset.<span class="hljs-built_in">map</span>(extract_prompt).remove_columns([<span class="hljs-string">"chosen"</span>, <span class="hljs-string">"rejected"</span>])<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'prompt'</span>: [{<span class="hljs-string">'role'</span>: <span class="hljs-string">'user'</span>, <span class="hljs-string">'content'</span>: <span class="hljs-string">'What color is the sky?'</span>}]}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-implicit-to-explicit-prompt-preference-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-implicit-to-explicit-prompt-preference-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From implicit to explicit prompt preference dataset</span></h3> <p data-svelte-h="svelte-1m33e4x">To convert a preference dataset with implicit prompt into a preference dataset with explicit prompt, extract the prompt with <a href="/docs/trl/v0.15.2/en/data_utils#trl.extract_prompt">extract_prompt()</a>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset <span class="hljs-keyword">from</span> trl <span class="hljs-keyword">import</span> extract_prompt dataset = Dataset.from_dict({ <span class="hljs-string">"chosen"</span>: [ [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is blue."</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Where is the sun?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"In the sky."</span>}], ], <span class="hljs-string">"rejected"</span>: [ [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is green."</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Where is the sun?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"In the sea."</span>}], ], }) dataset = dataset.<span class="hljs-built_in">map</span>(extract_prompt)<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'prompt'</span>: [{<span class="hljs-string">'role'</span>: <span class="hljs-string">'user'</span>, <span class="hljs-string">'content'</span>: <span class="hljs-string">'What color is the sky?'</span>}], <span class="hljs-string">'chosen'</span>: [{<span class="hljs-string">'role'</span>: <span class="hljs-string">'assistant'</span>, <span class="hljs-string">'content'</span>: <span class="hljs-string">'It is blue.'</span>}], <span class="hljs-string">'rejected'</span>: [{<span class="hljs-string">'role'</span>: <span class="hljs-string">'assistant'</span>, <span class="hljs-string">'content'</span>: <span class="hljs-string">'It is green.'</span>}]}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-preference-with-implicit-prompt-to-unpaired-preference-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-preference-with-implicit-prompt-to-unpaired-preference-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From preference with implicit prompt to unpaired preference dataset</span></h3> <p data-svelte-h="svelte-1lqzyzq">To convert a preference dataset with implicit prompt into an unpaired preference dataset, extract the prompt with <a href="/docs/trl/v0.15.2/en/data_utils#trl.extract_prompt">extract_prompt()</a>, and unpair the dataset with <a href="/docs/trl/v0.15.2/en/data_utils#trl.unpair_preference_dataset">unpair_preference_dataset()</a>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset <span class="hljs-keyword">from</span> trl <span class="hljs-keyword">import</span> extract_prompt, unpair_preference_dataset dataset = Dataset.from_dict({ <span class="hljs-string">"chosen"</span>: [ [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is blue."</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Where is the sun?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"In the sky."</span>}], ], <span class="hljs-string">"rejected"</span>: [ [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is green."</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Where is the sun?"</span>}, {<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"In the sea."</span>}], ], }) dataset = dataset.<span class="hljs-built_in">map</span>(extract_prompt) dataset = unpair_preference_dataset(dataset)<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'prompt'</span>: [{<span class="hljs-string">'role'</span>: <span class="hljs-string">'user'</span>, <span class="hljs-string">'content'</span>: <span class="hljs-string">'What color is the sky?'</span>}], <span class="hljs-string">'completion'</span>: [{<span class="hljs-string">'role'</span>: <span class="hljs-string">'assistant'</span>, <span class="hljs-string">'content'</span>: <span class="hljs-string">'It is blue.'</span>}], <span class="hljs-string">'label'</span>: <span class="hljs-literal">True</span>}<!-- HTML_TAG_END --></pre></div> <div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400"><p data-svelte-h="svelte-1my3v2u">Keep in mind that the <code>"chosen"</code> and <code>"rejected"</code> completions in a preference dataset can be both good or bad. Before applying <a href="/docs/trl/v0.15.2/en/data_utils#trl.unpair_preference_dataset">unpair_preference_dataset()</a>, please ensure that all <code>"chosen"</code> completions can be labeled as good and all <code>"rejected"</code> completions as bad. This can be ensured by checking absolute rating of each completion, e.g. from a reward model.</p></div> <h3 class="relative group"><a id="from-preference-to-language-modeling-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-preference-to-language-modeling-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From preference to language modeling dataset</span></h3> <p data-svelte-h="svelte-1azgh8u">To convert a preference dataset into a language modeling dataset, remove the rejected, concatenate the prompt and the chosen into the <code>"text"</code> column.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset dataset = Dataset.from_dict({ <span class="hljs-string">"prompt"</span>: [<span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"The sun is"</span>], <span class="hljs-string">"chosen"</span>: [<span class="hljs-string">" blue."</span>, <span class="hljs-string">" in the sky."</span>], <span class="hljs-string">"rejected"</span>: [<span class="hljs-string">" green."</span>, <span class="hljs-string">" in the sea."</span>], }) <span class="hljs-keyword">def</span> <span class="hljs-title function_">concat_prompt_chosen</span>(<span class="hljs-params">example</span>): <span class="hljs-keyword">return</span> {<span class="hljs-string">"text"</span>: example[<span class="hljs-string">"prompt"</span>] + example[<span class="hljs-string">"chosen"</span>]} dataset = dataset.<span class="hljs-built_in">map</span>(concat_prompt_chosen, remove_columns=[<span class="hljs-string">"prompt"</span>, <span class="hljs-string">"chosen"</span>, <span class="hljs-string">"rejected"</span>])<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'text'</span>: <span class="hljs-string">'The sky is blue.'</span>}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-preference-to-prompt-completion-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-preference-to-prompt-completion-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From preference to prompt-completion dataset</span></h3> <p data-svelte-h="svelte-thhn9e">To convert a preference dataset into a prompt-completion dataset, remove the rejected, and rename the column <code>"chosen"</code> to <code>"completion"</code>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset dataset = Dataset.from_dict({ <span class="hljs-string">"prompt"</span>: [<span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"The sun is"</span>], <span class="hljs-string">"chosen"</span>: [<span class="hljs-string">" blue."</span>, <span class="hljs-string">" in the sky."</span>], <span class="hljs-string">"rejected"</span>: [<span class="hljs-string">" green."</span>, <span class="hljs-string">" in the sea."</span>], }) dataset = dataset.remove_columns(<span class="hljs-string">"rejected"</span>).rename_column(<span class="hljs-string">"chosen"</span>, <span class="hljs-string">"completion"</span>)<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'prompt'</span>: <span class="hljs-string">'The sky is'</span>, <span class="hljs-string">'completion'</span>: <span class="hljs-string">' blue.'</span>}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-preference-to-prompt-only-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-preference-to-prompt-only-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From preference to prompt-only dataset</span></h3> <p data-svelte-h="svelte-yu2uu5">To convert a preference dataset into a prompt-only dataset, remove the rejected and the chosen.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset dataset = Dataset.from_dict({ <span class="hljs-string">"prompt"</span>: [<span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"The sun is"</span>], <span class="hljs-string">"chosen"</span>: [<span class="hljs-string">" blue."</span>, <span class="hljs-string">" in the sky."</span>], <span class="hljs-string">"rejected"</span>: [<span class="hljs-string">" green."</span>, <span class="hljs-string">" in the sea."</span>], }) dataset = dataset.remove_columns([<span class="hljs-string">"chosen"</span>, <span class="hljs-string">"rejected"</span>])<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'prompt'</span>: <span class="hljs-string">'The sky is'</span>}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-explicit-to-implicit-prompt-preference-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-explicit-to-implicit-prompt-preference-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From explicit to implicit prompt preference dataset</span></h3> <p data-svelte-h="svelte-slr0da">To convert a preference dataset with explicit prompt into a preference dataset with implicit prompt, concatenate the prompt to both chosen and rejected, and remove the prompt.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset dataset = Dataset.from_dict({ <span class="hljs-string">"prompt"</span>: [ [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Where is the sun?"</span>}], ], <span class="hljs-string">"chosen"</span>: [ [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is blue."</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"In the sky."</span>}], ], <span class="hljs-string">"rejected"</span>: [ [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is green."</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"In the sea."</span>}], ], }) <span class="hljs-keyword">def</span> <span class="hljs-title function_">concat_prompt_to_completions</span>(<span class="hljs-params">example</span>): <span class="hljs-keyword">return</span> {<span class="hljs-string">"chosen"</span>: example[<span class="hljs-string">"prompt"</span>] + example[<span class="hljs-string">"chosen"</span>], <span class="hljs-string">"rejected"</span>: example[<span class="hljs-string">"prompt"</span>] + example[<span class="hljs-string">"rejected"</span>]} dataset = dataset.<span class="hljs-built_in">map</span>(concat_prompt_to_completions, remove_columns=<span class="hljs-string">"prompt"</span>)<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'chosen'</span>: [{<span class="hljs-string">'role'</span>: <span class="hljs-string">'user'</span>, <span class="hljs-string">'content'</span>: <span class="hljs-string">'What color is the sky?'</span>}, {<span class="hljs-string">'role'</span>: <span class="hljs-string">'assistant'</span>, <span class="hljs-string">'content'</span>: <span class="hljs-string">'It is blue.'</span>}], <span class="hljs-string">'rejected'</span>: [{<span class="hljs-string">'role'</span>: <span class="hljs-string">'user'</span>, <span class="hljs-string">'content'</span>: <span class="hljs-string">'What color is the sky?'</span>}, {<span class="hljs-string">'role'</span>: <span class="hljs-string">'assistant'</span>, <span class="hljs-string">'content'</span>: <span class="hljs-string">'It is green.'</span>}]}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-preference-to-unpaired-preference-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-preference-to-unpaired-preference-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From preference to unpaired preference dataset</span></h3> <p data-svelte-h="svelte-1u4x9ea">To convert dataset into an unpaired preference dataset, unpair the dataset with <a href="/docs/trl/v0.15.2/en/data_utils#trl.unpair_preference_dataset">unpair_preference_dataset()</a>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset <span class="hljs-keyword">from</span> trl <span class="hljs-keyword">import</span> unpair_preference_dataset dataset = Dataset.from_dict({ <span class="hljs-string">"prompt"</span>: [ [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Where is the sun?"</span>}], ], <span class="hljs-string">"chosen"</span>: [ [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is blue."</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"In the sky."</span>}], ], <span class="hljs-string">"rejected"</span>: [ [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"It is green."</span>}], [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"In the sea."</span>}], ], }) dataset = unpair_preference_dataset(dataset)<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'prompt'</span>: [{<span class="hljs-string">'role'</span>: <span class="hljs-string">'user'</span>, <span class="hljs-string">'content'</span>: <span class="hljs-string">'What color is the sky?'</span>}], <span class="hljs-string">'completion'</span>: [{<span class="hljs-string">'role'</span>: <span class="hljs-string">'assistant'</span>, <span class="hljs-string">'content'</span>: <span class="hljs-string">'It is blue.'</span>}], <span class="hljs-string">'label'</span>: <span class="hljs-literal">True</span>}<!-- HTML_TAG_END --></pre></div> <div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400"><p data-svelte-h="svelte-1my3v2u">Keep in mind that the <code>"chosen"</code> and <code>"rejected"</code> completions in a preference dataset can be both good or bad. Before applying <a href="/docs/trl/v0.15.2/en/data_utils#trl.unpair_preference_dataset">unpair_preference_dataset()</a>, please ensure that all <code>"chosen"</code> completions can be labeled as good and all <code>"rejected"</code> completions as bad. This can be ensured by checking absolute rating of each completion, e.g. from a reward model.</p></div> <h3 class="relative group"><a id="from-unpaired-preference-to-language-modeling-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-unpaired-preference-to-language-modeling-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From unpaired preference to language modeling dataset</span></h3> <p data-svelte-h="svelte-1rtwiuz">To convert an unpaired preference dataset into a language modeling dataset, concatenate prompts with good completions into the <code>"text"</code> column, and remove the prompt, completion and label columns.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset dataset = Dataset.from_dict({ <span class="hljs-string">"prompt"</span>: [<span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"The sun is"</span>, <span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"The sun is"</span>], <span class="hljs-string">"completion"</span>: [<span class="hljs-string">" blue."</span>, <span class="hljs-string">" in the sky."</span>, <span class="hljs-string">" green."</span>, <span class="hljs-string">" in the sea."</span>], <span class="hljs-string">"label"</span>: [<span class="hljs-literal">True</span>, <span class="hljs-literal">True</span>, <span class="hljs-literal">False</span>, <span class="hljs-literal">False</span>], }) <span class="hljs-keyword">def</span> <span class="hljs-title function_">concatenate_prompt_completion</span>(<span class="hljs-params">example</span>): <span class="hljs-keyword">return</span> {<span class="hljs-string">"text"</span>: example[<span class="hljs-string">"prompt"</span>] + example[<span class="hljs-string">"completion"</span>]} dataset = dataset.<span class="hljs-built_in">filter</span>(<span class="hljs-keyword">lambda</span> x: x[<span class="hljs-string">"label"</span>]).<span class="hljs-built_in">map</span>(concatenate_prompt_completion).remove_columns([<span class="hljs-string">"prompt"</span>, <span class="hljs-string">"completion"</span>, <span class="hljs-string">"label"</span>])<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'text'</span>: <span class="hljs-string">'The sky is blue.'</span>}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-unpaired-preference-to-prompt-completion-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-unpaired-preference-to-prompt-completion-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From unpaired preference to prompt-completion dataset</span></h3> <p data-svelte-h="svelte-1mntuxe">To convert an unpaired preference dataset into a prompt-completion dataset, filter for good labels, then remove the label columns.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset dataset = Dataset.from_dict({ <span class="hljs-string">"prompt"</span>: [<span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"The sun is"</span>, <span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"The sun is"</span>], <span class="hljs-string">"completion"</span>: [<span class="hljs-string">" blue."</span>, <span class="hljs-string">" in the sky."</span>, <span class="hljs-string">" green."</span>, <span class="hljs-string">" in the sea."</span>], <span class="hljs-string">"label"</span>: [<span class="hljs-literal">True</span>, <span class="hljs-literal">True</span>, <span class="hljs-literal">False</span>, <span class="hljs-literal">False</span>], }) dataset = dataset.<span class="hljs-built_in">filter</span>(<span class="hljs-keyword">lambda</span> x: x[<span class="hljs-string">"label"</span>]).remove_columns([<span class="hljs-string">"label"</span>])<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'prompt'</span>: <span class="hljs-string">'The sky is'</span>, <span class="hljs-string">'completion'</span>: <span class="hljs-string">' blue.'</span>}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-unpaired-preference-to-prompt-only-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-unpaired-preference-to-prompt-only-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From unpaired preference to prompt-only dataset</span></h3> <p data-svelte-h="svelte-1x58tju">To convert an unpaired preference dataset into a prompt-only dataset, remove the completion and the label columns.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset dataset = Dataset.from_dict({ <span class="hljs-string">"prompt"</span>: [<span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"The sun is"</span>, <span class="hljs-string">"The sky is"</span>, <span class="hljs-string">"The sun is"</span>], <span class="hljs-string">"completion"</span>: [<span class="hljs-string">" blue."</span>, <span class="hljs-string">" in the sky."</span>, <span class="hljs-string">" green."</span>, <span class="hljs-string">" in the sea."</span>], <span class="hljs-string">"label"</span>: [<span class="hljs-literal">True</span>, <span class="hljs-literal">True</span>, <span class="hljs-literal">False</span>, <span class="hljs-literal">False</span>], }) dataset = dataset.remove_columns([<span class="hljs-string">"completion"</span>, <span class="hljs-string">"label"</span>])<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'prompt'</span>: <span class="hljs-string">'The sky is'</span>}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-stepwise-supervision-to-language-modeling-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-stepwise-supervision-to-language-modeling-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From stepwise supervision to language modeling dataset</span></h3> <p data-svelte-h="svelte-usvjfj">To convert a stepwise supervision dataset into a language modeling dataset, concatenate prompts with good completions into the <code>"text"</code> column.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset dataset = Dataset.from_dict({ <span class="hljs-string">"prompt"</span>: [<span class="hljs-string">"Blue light"</span>, <span class="hljs-string">"Water"</span>], <span class="hljs-string">"completions"</span>: [[<span class="hljs-string">" scatters more in the atmosphere,"</span>, <span class="hljs-string">" so the sky is green."</span>], [<span class="hljs-string">" forms a less dense structure in ice,"</span>, <span class="hljs-string">" which causes it to expand when it freezes."</span>]], <span class="hljs-string">"labels"</span>: [[<span class="hljs-literal">True</span>, <span class="hljs-literal">False</span>], [<span class="hljs-literal">True</span>, <span class="hljs-literal">True</span>]], }) <span class="hljs-keyword">def</span> <span class="hljs-title function_">concatenate_prompt_completions</span>(<span class="hljs-params">example</span>): completion = <span class="hljs-string">""</span>.join(example[<span class="hljs-string">"completions"</span>]) <span class="hljs-keyword">return</span> {<span class="hljs-string">"text"</span>: example[<span class="hljs-string">"prompt"</span>] + completion} dataset = dataset.<span class="hljs-built_in">filter</span>(<span class="hljs-keyword">lambda</span> x: <span class="hljs-built_in">all</span>(x[<span class="hljs-string">"labels"</span>])).<span class="hljs-built_in">map</span>(concatenate_prompt_completions, remove_columns=[<span class="hljs-string">"prompt"</span>, <span class="hljs-string">"completions"</span>, <span class="hljs-string">"labels"</span>])<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'text'</span>: <span class="hljs-string">'Blue light scatters more in the atmosphere, so the sky is green.'</span>}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-stepwise-supervision-to-prompt-completion-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-stepwise-supervision-to-prompt-completion-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From stepwise supervision to prompt completion dataset</span></h3> <p data-svelte-h="svelte-1wvawda">To convert a stepwise supervision dataset into a prompt-completion dataset, join the good completions and remove the labels.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset dataset = Dataset.from_dict({ <span class="hljs-string">"prompt"</span>: [<span class="hljs-string">"Blue light"</span>, <span class="hljs-string">"Water"</span>], <span class="hljs-string">"completions"</span>: [[<span class="hljs-string">" scatters more in the atmosphere,"</span>, <span class="hljs-string">" so the sky is green."</span>], [<span class="hljs-string">" forms a less dense structure in ice,"</span>, <span class="hljs-string">" which causes it to expand when it freezes."</span>]], <span class="hljs-string">"labels"</span>: [[<span class="hljs-literal">True</span>, <span class="hljs-literal">False</span>], [<span class="hljs-literal">True</span>, <span class="hljs-literal">True</span>]], }) <span class="hljs-keyword">def</span> <span class="hljs-title function_">join_completions</span>(<span class="hljs-params">example</span>): completion = <span class="hljs-string">""</span>.join(example[<span class="hljs-string">"completions"</span>]) <span class="hljs-keyword">return</span> {<span class="hljs-string">"completion"</span>: completion} dataset = dataset.<span class="hljs-built_in">filter</span>(<span class="hljs-keyword">lambda</span> x: <span class="hljs-built_in">all</span>(x[<span class="hljs-string">"labels"</span>])).<span class="hljs-built_in">map</span>(join_completions, remove_columns=[<span class="hljs-string">"completions"</span>, <span class="hljs-string">"labels"</span>])<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'prompt'</span>: <span class="hljs-string">'Blue light'</span>, <span class="hljs-string">'completion'</span>: <span class="hljs-string">' scatters more in the atmosphere, so the sky is green.'</span>}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-stepwise-supervision-to-prompt-only-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-stepwise-supervision-to-prompt-only-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From stepwise supervision to prompt only dataset</span></h3> <p data-svelte-h="svelte-1fb8htv">To convert a stepwise supervision dataset into a prompt-only dataset, remove the completions and the labels.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset dataset = Dataset.from_dict({ <span class="hljs-string">"prompt"</span>: [<span class="hljs-string">"Blue light"</span>, <span class="hljs-string">"Water"</span>], <span class="hljs-string">"completions"</span>: [[<span class="hljs-string">" scatters more in the atmosphere,"</span>, <span class="hljs-string">" so the sky is green."</span>], [<span class="hljs-string">" forms a less dense structure in ice,"</span>, <span class="hljs-string">" which causes it to expand when it freezes."</span>]], <span class="hljs-string">"labels"</span>: [[<span class="hljs-literal">True</span>, <span class="hljs-literal">False</span>], [<span class="hljs-literal">True</span>, <span class="hljs-literal">True</span>]], }) dataset = dataset.remove_columns([<span class="hljs-string">"completions"</span>, <span class="hljs-string">"labels"</span>])<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'prompt'</span>: <span class="hljs-string">'Blue light'</span>}<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="from-stepwise-supervision-to-unpaired-preference-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#from-stepwise-supervision-to-unpaired-preference-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>From stepwise supervision to unpaired preference dataset</span></h3> <p data-svelte-h="svelte-1p3ncex">To convert a stepwise supervision dataset into an unpaired preference dataset, join the completions and merge the labels.</p> <p data-svelte-h="svelte-wppc03">The method for merging the labels depends on the specific task. In this example, we use the logical AND operation. This means that if the step labels indicate the correctness of individual steps, the resulting label will reflect the correctness of the entire sequence.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> Dataset dataset = Dataset.from_dict({ <span class="hljs-string">"prompt"</span>: [<span class="hljs-string">"Blue light"</span>, <span class="hljs-string">"Water"</span>], <span class="hljs-string">"completions"</span>: [[<span class="hljs-string">" scatters more in the atmosphere,"</span>, <span class="hljs-string">" so the sky is green."</span>], [<span class="hljs-string">" forms a less dense structure in ice,"</span>, <span class="hljs-string">" which causes it to expand when it freezes."</span>]], <span class="hljs-string">"labels"</span>: [[<span class="hljs-literal">True</span>, <span class="hljs-literal">False</span>], [<span class="hljs-literal">True</span>, <span class="hljs-literal">True</span>]], }) <span class="hljs-keyword">def</span> <span class="hljs-title function_">merge_completions_and_labels</span>(<span class="hljs-params">example</span>): <span class="hljs-keyword">return</span> {<span class="hljs-string">"prompt"</span>: example[<span class="hljs-string">"prompt"</span>], <span class="hljs-string">"completion"</span>: <span class="hljs-string">""</span>.join(example[<span class="hljs-string">"completions"</span>]), <span class="hljs-string">"label"</span>: <span class="hljs-built_in">all</span>(example[<span class="hljs-string">"labels"</span>])} dataset = dataset.<span class="hljs-built_in">map</span>(merge_completions_and_labels, remove_columns=[<span class="hljs-string">"completions"</span>, <span class="hljs-string">"labels"</span>])<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-number">0</span>] {<span class="hljs-string">'prompt'</span>: <span class="hljs-string">'Blue light'</span>, <span class="hljs-string">'completion'</span>: <span class="hljs-string">' scatters more in the atmosphere, so the sky is green.'</span>, <span class="hljs-string">'label'</span>: <span class="hljs-literal">False</span>}<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="vision-datasets" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#vision-datasets"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Vision datasets</span></h2> <p data-svelte-h="svelte-1f2j4bf">Some trainers also support fine-tuning vision-language models (VLMs) using image-text pairs. In this scenario, it’s recommended to use a conversational format, as each model handles image placeholders in text differently.</p> <p data-svelte-h="svelte-1dkity6">A conversational vision dataset differs from a standard conversational dataset in two key ways:</p> <ol data-svelte-h="svelte-imd92t"><li>The dataset must contain the key <code>images</code> with the image data.</li> <li>The <code>"content"</code> field in messages must be a list of dictionaries, where each dictionary specifies the type of data: <code>"image"</code> or <code>"text"</code>.</li></ol> <p data-svelte-h="svelte-11lpom8">Example:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-comment"># Textual dataset:</span> <span class="hljs-string">"content"</span>: <span class="hljs-string">"What color is the sky?"</span> <span class="hljs-comment"># Vision dataset:</span> <span class="hljs-string">"content"</span>: [ {<span class="hljs-string">"type"</span>: <span class="hljs-string">"image"</span>}, {<span class="hljs-string">"type"</span>: <span class="hljs-string">"text"</span>, <span class="hljs-string">"text"</span>: <span class="hljs-string">"What color is the sky in the image?"</span>} ]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-11ot8l1">An example of a conversational vision dataset is the <a href="https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset" rel="nofollow">openbmb/RLAIF-V-Dataset</a>. Below is an embedded view of the dataset’s training data, allowing you to explore it directly:</p> <iframe src="https://huggingface.co/datasets/trl-lib/rlaif-v/embed/viewer/default/train" frameborder="0" width="100%" height="560px"></iframe> <a class="!text-gray-400 !no-underline text-sm flex items-center not-prose mt-4" href="https://github.com/huggingface/trl/blob/main/docs/source/dataset_formats.md" target="_blank"><span data-svelte-h="svelte-1kd6by1"><</span> <span data-svelte-h="svelte-x0xyl0">></span> <span data-svelte-h="svelte-1dajgef"><span class="underline ml-1.5">Update</span> on GitHub</span></a> <p></p> <script> { __sveltekit_wrknss = { assets: "/docs/trl/v0.15.2/en", base: "/docs/trl/v0.15.2/en", env: {} }; const element = document.currentScript.parentElement; const data = [null,null]; Promise.all([ import("/docs/trl/v0.15.2/en/_app/immutable/entry/start.a67350a3.js"), import("/docs/trl/v0.15.2/en/_app/immutable/entry/app.0e07cdbf.js") ]).then(([kit, app]) => { kit.start(app, element, { node_ids: [0, 11], data, form: null, error: null }); }); } </script> <!-- HTML_TAG_END --></div> <div class="SVELTE_HYDRATER contents" data-target="DocFooterNav" data-props="{"classNames":"mx-auto mt-16 flex max-w-4xl items-center pb-8 font-sans font-medium leading-6 xl:mt-32","chapterPrev":{"title":"Quickstart","isExpanded":true,"id":"quickstart","url":"/docs/trl/v0.15.2/en/quickstart"},"chapterNext":{"title":"Training FAQ","isExpanded":true,"id":"how_to_train","url":"/docs/trl/v0.15.2/en/how_to_train"},"isCourse":false,"isLoggedIn":false}"><div class="mx-auto mt-16 flex max-w-4xl items-center pb-8 font-sans font-medium leading-6 xl:mt-32"><a href="/docs/trl/v0.15.2/en/quickstart" class="mr-8 flex transform items-center text-gray-600 transition-all hover:-translate-x-px hover:text-gray-900 dark:hover:text-gray-300"><span class="mr-2 translate-y-px">←</span>Quickstart</a> <a href="/docs/trl/v0.15.2/en/how_to_train" class="ml-auto flex transform items-center text-right text-gray-600 transition-all hover:translate-x-px hover:text-gray-900 dark:hover:text-gray-300">Training FAQ<span class="ml-2 translate-y-px">→</span></a></div></div></div></div> <div class="sticky top-0 self-start"><div class="SVELTE_HYDRATER contents" data-target="SubSideMenu" data-props="{"chapter":{"title":"Dataset formats and types","isExpanded":true,"id":"dataset-formats-and-types","url":"#dataset-formats-and-types","sections":[{"title":"Overview of the dataset formats and types","isExpanded":true,"id":"overview-of-the-dataset-formats-and-types","url":"#overview-of-the-dataset-formats-and-types","sections":[{"title":"Formats","isExpanded":true,"id":"formats","url":"#formats","sections":[{"title":"Standard","isExpanded":true,"id":"standard","url":"#standard","sections":[]},{"title":"Conversational","isExpanded":true,"id":"conversational","url":"#conversational","sections":[]}]},{"title":"Types","isExpanded":true,"id":"types","url":"#types","sections":[{"title":"Language modeling","isExpanded":true,"id":"language-modeling","url":"#language-modeling","sections":[]},{"title":"Prompt-only","isExpanded":true,"id":"prompt-only","url":"#prompt-only","sections":[]},{"title":"Prompt-completion","isExpanded":true,"id":"prompt-completion","url":"#prompt-completion","sections":[]},{"title":"Preference","isExpanded":true,"id":"preference","url":"#preference","sections":[]},{"title":"Unpaired preference","isExpanded":true,"id":"unpaired-preference","url":"#unpaired-preference","sections":[]},{"title":"Stepwise supervision","isExpanded":true,"id":"stepwise-supervision","url":"#stepwise-supervision","sections":[]}]}]},{"title":"Which dataset type to use?","isExpanded":true,"id":"which-dataset-type-to-use","url":"#which-dataset-type-to-use","sections":[]},{"title":"Working with conversational datasets in TRL","isExpanded":true,"id":"working-with-conversational-datasets-in-trl","url":"#working-with-conversational-datasets-in-trl","sections":[{"title":"Converting a conversational dataset into a standard dataset","isExpanded":true,"id":"converting-a-conversational-dataset-into-a-standard-dataset","url":"#converting-a-conversational-dataset-into-a-standard-dataset","sections":[]}]},{"title":"Using any dataset with TRL: preprocessing and conversion","isExpanded":true,"id":"using-any-dataset-with-trl-preprocessing-and-conversion","url":"#using-any-dataset-with-trl-preprocessing-and-conversion","sections":[{"title":"Example: UltraFeedback dataset","isExpanded":true,"id":"example-ultrafeedback-dataset","url":"#example-ultrafeedback-dataset","sections":[]}]},{"title":"Utilities for converting dataset types","isExpanded":true,"id":"utilities-for-converting-dataset-types","url":"#utilities-for-converting-dataset-types","sections":[{"title":"From prompt-completion to language modeling dataset","isExpanded":true,"id":"from-prompt-completion-to-language-modeling-dataset","url":"#from-prompt-completion-to-language-modeling-dataset","sections":[]},{"title":"From prompt-completion to prompt-only dataset","isExpanded":true,"id":"from-prompt-completion-to-prompt-only-dataset","url":"#from-prompt-completion-to-prompt-only-dataset","sections":[]},{"title":"From preference with implicit prompt to language modeling dataset","isExpanded":true,"id":"from-preference-with-implicit-prompt-to-language-modeling-dataset","url":"#from-preference-with-implicit-prompt-to-language-modeling-dataset","sections":[]},{"title":"From preference with implicit prompt to prompt-completion dataset","isExpanded":true,"id":"from-preference-with-implicit-prompt-to-prompt-completion-dataset","url":"#from-preference-with-implicit-prompt-to-prompt-completion-dataset","sections":[]},{"title":"From preference with implicit prompt to prompt-only dataset","isExpanded":true,"id":"from-preference-with-implicit-prompt-to-prompt-only-dataset","url":"#from-preference-with-implicit-prompt-to-prompt-only-dataset","sections":[]},{"title":"From implicit to explicit prompt preference dataset","isExpanded":true,"id":"from-implicit-to-explicit-prompt-preference-dataset","url":"#from-implicit-to-explicit-prompt-preference-dataset","sections":[]},{"title":"From preference with implicit prompt to unpaired preference dataset","isExpanded":true,"id":"from-preference-with-implicit-prompt-to-unpaired-preference-dataset","url":"#from-preference-with-implicit-prompt-to-unpaired-preference-dataset","sections":[]},{"title":"From preference to language modeling dataset","isExpanded":true,"id":"from-preference-to-language-modeling-dataset","url":"#from-preference-to-language-modeling-dataset","sections":[]},{"title":"From preference to prompt-completion dataset","isExpanded":true,"id":"from-preference-to-prompt-completion-dataset","url":"#from-preference-to-prompt-completion-dataset","sections":[]},{"title":"From preference to prompt-only dataset","isExpanded":true,"id":"from-preference-to-prompt-only-dataset","url":"#from-preference-to-prompt-only-dataset","sections":[]},{"title":"From explicit to implicit prompt preference dataset","isExpanded":true,"id":"from-explicit-to-implicit-prompt-preference-dataset","url":"#from-explicit-to-implicit-prompt-preference-dataset","sections":[]},{"title":"From preference to unpaired preference dataset","isExpanded":true,"id":"from-preference-to-unpaired-preference-dataset","url":"#from-preference-to-unpaired-preference-dataset","sections":[]},{"title":"From unpaired preference to language modeling dataset","isExpanded":true,"id":"from-unpaired-preference-to-language-modeling-dataset","url":"#from-unpaired-preference-to-language-modeling-dataset","sections":[]},{"title":"From unpaired preference to prompt-completion dataset","isExpanded":true,"id":"from-unpaired-preference-to-prompt-completion-dataset","url":"#from-unpaired-preference-to-prompt-completion-dataset","sections":[]},{"title":"From unpaired preference to prompt-only dataset","isExpanded":true,"id":"from-unpaired-preference-to-prompt-only-dataset","url":"#from-unpaired-preference-to-prompt-only-dataset","sections":[]},{"title":"From stepwise supervision to language modeling dataset","isExpanded":true,"id":"from-stepwise-supervision-to-language-modeling-dataset","url":"#from-stepwise-supervision-to-language-modeling-dataset","sections":[]},{"title":"From stepwise supervision to prompt completion dataset","isExpanded":true,"id":"from-stepwise-supervision-to-prompt-completion-dataset","url":"#from-stepwise-supervision-to-prompt-completion-dataset","sections":[]},{"title":"From stepwise supervision to prompt only dataset","isExpanded":true,"id":"from-stepwise-supervision-to-prompt-only-dataset","url":"#from-stepwise-supervision-to-prompt-only-dataset","sections":[]},{"title":"From stepwise supervision to unpaired preference dataset","isExpanded":true,"id":"from-stepwise-supervision-to-unpaired-preference-dataset","url":"#from-stepwise-supervision-to-unpaired-preference-dataset","sections":[]}]},{"title":"Vision datasets","isExpanded":true,"id":"vision-datasets","url":"#vision-datasets","sections":[]}]}}"> <nav class="hidden h-dvh w-[270px] flex-none flex-col space-y-3 overflow-y-auto break-words border-l pb-16 pl-6 pr-10 pt-24 text-sm lg:flex 2xl:w-[305px]"> <a href="#dataset-formats-and-types" class=" text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-dataset-formats-and-types"><!-- HTML_TAG_START --><wbr>Dataset formats and types<!-- HTML_TAG_END --></a> <a href="#overview-of-the-dataset-formats-and-types" class="pl-4 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-overview-of-the-dataset-formats-and-types"><!-- HTML_TAG_START --><wbr>Overview of the dataset formats and types<!-- HTML_TAG_END --></a> <a href="#formats" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-formats"><!-- HTML_TAG_START --><wbr>Formats<!-- HTML_TAG_END --></a> <a href="#standard" class="pl-12 text-xs text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-standard"><!-- HTML_TAG_START --><wbr>Standard<!-- HTML_TAG_END --></a> <a href="#conversational" class="pl-12 text-xs text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-conversational"><!-- HTML_TAG_START --><wbr>Conversational<!-- HTML_TAG_END --></a> <a href="#types" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-types"><!-- HTML_TAG_START --><wbr>Types<!-- HTML_TAG_END --></a> <a href="#language-modeling" class="pl-12 text-xs text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-language-modeling"><!-- HTML_TAG_START --><wbr>Language modeling<!-- HTML_TAG_END --></a> <a href="#prompt-only" class="pl-12 text-xs text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-prompt-only"><!-- HTML_TAG_START --><wbr>Prompt-only<!-- HTML_TAG_END --></a> <a href="#prompt-completion" class="pl-12 text-xs text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-prompt-completion"><!-- HTML_TAG_START --><wbr>Prompt-completion<!-- HTML_TAG_END --></a> <a href="#preference" class="pl-12 text-xs text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-preference"><!-- HTML_TAG_START --><wbr>Preference<!-- HTML_TAG_END --></a> <a href="#unpaired-preference" class="pl-12 text-xs text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-unpaired-preference"><!-- HTML_TAG_START --><wbr>Unpaired preference<!-- HTML_TAG_END --></a> <a href="#stepwise-supervision" class="pl-12 text-xs text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-stepwise-supervision"><!-- HTML_TAG_START --><wbr>Stepwise supervision<!-- HTML_TAG_END --></a> <a href="#which-dataset-type-to-use" class="pl-4 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-which-dataset-type-to-use"><!-- HTML_TAG_START --><wbr>Which dataset type to use?<!-- HTML_TAG_END --></a> <a href="#working-with-conversational-datasets-in-trl" class="pl-4 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-working-with-conversational-datasets-in-trl"><!-- HTML_TAG_START --><wbr>Working with conversational datasets in TRL<!-- HTML_TAG_END --></a> <a href="#converting-a-conversational-dataset-into-a-standard-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-converting-a-conversational-dataset-into-a-standard-dataset"><!-- HTML_TAG_START --><wbr>Converting a conversational dataset into a standard dataset<!-- HTML_TAG_END --></a> <a href="#using-any-dataset-with-trl-preprocessing-and-conversion" class="pl-4 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-using-any-dataset-with-trl-preprocessing-and-conversion"><!-- HTML_TAG_START --><wbr>Using any dataset with TR<wbr>L: preprocessing and conversion<!-- HTML_TAG_END --></a> <a href="#example-ultrafeedback-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-example-ultrafeedback-dataset"><!-- HTML_TAG_START --><wbr>Example: <wbr>Ultra<wbr>Feedback dataset<!-- HTML_TAG_END --></a> <a href="#utilities-for-converting-dataset-types" class="pl-4 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-utilities-for-converting-dataset-types"><!-- HTML_TAG_START --><wbr>Utilities for converting dataset types<!-- HTML_TAG_END --></a> <a href="#from-prompt-completion-to-language-modeling-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-prompt-completion-to-language-modeling-dataset"><!-- HTML_TAG_START --><wbr>From prompt-completion to language modeling dataset<!-- HTML_TAG_END --></a> <a href="#from-prompt-completion-to-prompt-only-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-prompt-completion-to-prompt-only-dataset"><!-- HTML_TAG_START --><wbr>From prompt-completion to prompt-only dataset<!-- HTML_TAG_END --></a> <a href="#from-preference-with-implicit-prompt-to-language-modeling-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-preference-with-implicit-prompt-to-language-modeling-dataset"><!-- HTML_TAG_START --><wbr>From preference with implicit prompt to language modeling dataset<!-- HTML_TAG_END --></a> <a href="#from-preference-with-implicit-prompt-to-prompt-completion-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-preference-with-implicit-prompt-to-prompt-completion-dataset"><!-- HTML_TAG_START --><wbr>From preference with implicit prompt to prompt-completion dataset<!-- HTML_TAG_END --></a> <a href="#from-preference-with-implicit-prompt-to-prompt-only-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-preference-with-implicit-prompt-to-prompt-only-dataset"><!-- HTML_TAG_START --><wbr>From preference with implicit prompt to prompt-only dataset<!-- HTML_TAG_END --></a> <a href="#from-implicit-to-explicit-prompt-preference-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-implicit-to-explicit-prompt-preference-dataset"><!-- HTML_TAG_START --><wbr>From implicit to explicit prompt preference dataset<!-- HTML_TAG_END --></a> <a href="#from-preference-with-implicit-prompt-to-unpaired-preference-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-preference-with-implicit-prompt-to-unpaired-preference-dataset"><!-- HTML_TAG_START --><wbr>From preference with implicit prompt to unpaired preference dataset<!-- HTML_TAG_END --></a> <a href="#from-preference-to-language-modeling-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-preference-to-language-modeling-dataset"><!-- HTML_TAG_START --><wbr>From preference to language modeling dataset<!-- HTML_TAG_END --></a> <a href="#from-preference-to-prompt-completion-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-preference-to-prompt-completion-dataset"><!-- HTML_TAG_START --><wbr>From preference to prompt-completion dataset<!-- HTML_TAG_END --></a> <a href="#from-preference-to-prompt-only-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-preference-to-prompt-only-dataset"><!-- HTML_TAG_START --><wbr>From preference to prompt-only dataset<!-- HTML_TAG_END --></a> <a href="#from-explicit-to-implicit-prompt-preference-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-explicit-to-implicit-prompt-preference-dataset"><!-- HTML_TAG_START --><wbr>From explicit to implicit prompt preference dataset<!-- HTML_TAG_END --></a> <a href="#from-preference-to-unpaired-preference-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-preference-to-unpaired-preference-dataset"><!-- HTML_TAG_START --><wbr>From preference to unpaired preference dataset<!-- HTML_TAG_END --></a> <a href="#from-unpaired-preference-to-language-modeling-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-unpaired-preference-to-language-modeling-dataset"><!-- HTML_TAG_START --><wbr>From unpaired preference to language modeling dataset<!-- HTML_TAG_END --></a> <a href="#from-unpaired-preference-to-prompt-completion-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-unpaired-preference-to-prompt-completion-dataset"><!-- HTML_TAG_START --><wbr>From unpaired preference to prompt-completion dataset<!-- HTML_TAG_END --></a> <a href="#from-unpaired-preference-to-prompt-only-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-unpaired-preference-to-prompt-only-dataset"><!-- HTML_TAG_START --><wbr>From unpaired preference to prompt-only dataset<!-- HTML_TAG_END --></a> <a href="#from-stepwise-supervision-to-language-modeling-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-stepwise-supervision-to-language-modeling-dataset"><!-- HTML_TAG_START --><wbr>From stepwise supervision to language modeling dataset<!-- HTML_TAG_END --></a> <a href="#from-stepwise-supervision-to-prompt-completion-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-stepwise-supervision-to-prompt-completion-dataset"><!-- HTML_TAG_START --><wbr>From stepwise supervision to prompt completion dataset<!-- HTML_TAG_END --></a> <a href="#from-stepwise-supervision-to-prompt-only-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-stepwise-supervision-to-prompt-only-dataset"><!-- HTML_TAG_START --><wbr>From stepwise supervision to prompt only dataset<!-- HTML_TAG_END --></a> <a href="#from-stepwise-supervision-to-unpaired-preference-dataset" class="pl-8 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-from-stepwise-supervision-to-unpaired-preference-dataset"><!-- HTML_TAG_START --><wbr>From stepwise supervision to unpaired preference dataset<!-- HTML_TAG_END --></a> <a href="#vision-datasets" class="pl-4 text-gray-400 transform hover:translate-x-px hover:text-gray-700 dark:hover:text-gray-300" id="nav-vision-datasets"><!-- HTML_TAG_START --><wbr>Vision datasets<!-- HTML_TAG_END --></a> </nav></div></div></div> <div id="doc-footer"></div></main> </div> <script> import("\/front\/build\/kube-081ff2d\/index.js"); window.moonSha = "kube-081ff2d\/"; window.__hf_deferred = {}; </script> <!-- Stripe --> <script> if (["hf.co", "huggingface.co"].includes(window.location.hostname)) { const script = document.createElement("script"); script.src = "https://js.stripe.com/v3/"; script.async = true; document.head.appendChild(script); } </script> </body> </html>