<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="theme-color" content="#333"> <link rel="dns-prefetch" href="//style.sndcdn.com"> <link rel="dns-prefetch" href="//a-v2.sndcdn.com"> <link rel="dns-prefetch" href="//api-v2.soundcloud.com"> <link rel="dns-prefetch" href="//sb.scorecardresearch.com"> <link rel="dns-prefetch" href="//secure.quantserve.com"> <link rel="dns-prefetch" href="//eventlogger.soundcloud.com"> <link rel="dns-prefetch" href="//api.soundcloud.com"> <link rel="dns-prefetch" href="//ssl.google-analytics.com"> <link rel="dns-prefetch" href="//i1.sndcdn.com"> <link rel="dns-prefetch" href="//i2.sndcdn.com"> <link rel="dns-prefetch" href="//i3.sndcdn.com"> <link rel="dns-prefetch" href="//i4.sndcdn.com"> <link rel="dns-prefetch" href="//wis.sndcdn.com"> <link rel="dns-prefetch" href="//va.sndcdn.com"> <link rel="dns-prefetch" href="//pixel.quantserve.com"> <title>Stream episode From LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki by ODSC's Ai X Podcast podcast | Listen online for free on SoundCloud</title> <meta content="record, sounds, share, sound, audio, tracks, music, soundcloud" name="keywords"> <meta name="referrer" content="origin"> <meta name="google-site-verification" content="dY0CigqM8Inubs_hgrYMwk-zGchKwrvJLcvI_G8631Q"> <link crossorigin="use-credentials" rel="manifest" href="/webmanifest.json"> <meta name="viewport" content="width=device-width,minimum-scale=1,maximum-scale=1,user-scalable=no"> <meta content="19507961798" property="fb:app_id"> <meta content="SoundCloud" property="og:site_name"> <meta content="SoundCloud" property="twitter:site"> <meta content="SoundCloud" property="twitter:app:name:iphone"> <meta content="336353151" property="twitter:app:id:iphone"> <meta content="SoundCloud" property="twitter:app:name:ipad"> <meta content="336353151" property="twitter:app:id:ipad"> <meta content="SoundCloud" property="twitter:app:name:googleplay"> <meta content="com.soundcloud.android" 
property="twitter:app:id:googleplay"> <link href="/sc-opensearch.xml" rel="search" title="SoundCloud" type="application/opensearchdescription+xml"> <meta name="description" content="Play From LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki by ODSC's Ai X Podcast on desktop and mobile. Play over 320 million tracks for free on SoundCloud."><meta property="twitter:app:name:iphone" content="SoundCloud"><meta property="twitter:app:id:iphone" content="336353151"><meta property="twitter:app:name:ipad" content="SoundCloud"><meta property="twitter:app:id:ipad" content="336353151"><meta property="twitter:app:name:googleplay" content="SoundCloud"><meta property="twitter:app:id:googleplay" content="com.soundcloud.android"><meta property="twitter:title" content="From LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki"><meta property="twitter:description" content="In this episode of ODSC’s Ai X Podcast, our guest today, Jason Lopatecki, co-founder and CEO of Arize AI, joins us to discuss GenAI evaluations. 
Arize AI is a startup that is one of the leaders in A"><meta property="twitter:card" content="player"><meta property="twitter:player" content="https://w.soundcloud.com/player/?url=https%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F1931816531&auto_play=false&show_artwork=true&visual=true&origin=twitter"><meta property="twitter:url" content="https://soundcloud.com/aixpodcast/from-llms-to-ai-agents-and-rag-mastering-genai-evaluations-with-jason-lopatecki"><meta property="twitter:player:height" content="400"><meta property="twitter:player:width" content="435"><meta property="twitter:image" content="https://i1.sndcdn.com/artworks-hYzmtVD0oh1zjvfD-xrVRwA-t500x500.jpg"><meta property="twitter:app:url:googleplay" content="soundcloud://sounds:1931816531"><meta property="twitter:app:url:iphone" content="soundcloud://sounds:1931816531"><meta property="twitter:app:url:ipad" content="soundcloud://sounds:1931816531"><meta property="al:ios:app_name" content="SoundCloud"><meta property="al:ios:app_store_id" content="336353151"><meta property="al:android:app_name" content="SoundCloud"><meta property="al:android:package" content="com.soundcloud.android"><meta property="og:type" content="music.song"><meta property="og:url" content="https://soundcloud.com/aixpodcast/from-llms-to-ai-agents-and-rag-mastering-genai-evaluations-with-jason-lopatecki"><meta property="og:title" content="From LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki"><meta property="og:image" content="https://i1.sndcdn.com/artworks-hYzmtVD0oh1zjvfD-xrVRwA-t500x500.jpg"><meta property="og:image:width" content="500"><meta property="og:image:height" content="500"><meta property="og:description" content="In this episode of ODSC’s Ai X Podcast, our guest today, Jason Lopatecki, co-founder and CEO of Arize AI, joins us to discuss GenAI evaluations. 
Arize AI is a startup that is one of the leaders in A"><meta property="al:web:should_fallback" content="false"><meta property="al:ios:url" content="soundcloud://sounds:1931816531"><meta property="al:android:url" content="soundcloud://sounds:1931816531"><meta property="soundcloud:user" content="https://soundcloud.com/aixpodcast"><meta property="soundcloud:play_count" content="6"><meta property="soundcloud:download_count" content="0"><meta property="soundcloud:comments_count" content="0"><meta property="soundcloud:like_count" content="0"> <link rel="canonical" href="https://soundcloud.com/aixpodcast/from-llms-to-ai-agents-and-rag-mastering-genai-evaluations-with-jason-lopatecki"><link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.soundcloud.com/aixpodcast/from-llms-to-ai-agents-and-rag-mastering-genai-evaluations-with-jason-lopatecki"><link rel="alternate" type="text/xml+oembed" href="https://soundcloud.com/oembed?url=https%3A%2F%2Fsoundcloud.com%2Faixpodcast%2Ffrom-llms-to-ai-agents-and-rag-mastering-genai-evaluations-with-jason-lopatecki&format=xml"><link rel="alternate" type="text/json+oembed" href="https://soundcloud.com/oembed?url=https%3A%2F%2Fsoundcloud.com%2Faixpodcast%2Ffrom-llms-to-ai-agents-and-rag-mastering-genai-evaluations-with-jason-lopatecki&format=json"><link rel="author" href="/aixpodcast"><link rel="alternate" href="android-app://com.soundcloud.android/soundcloud/sounds:1931816531"><link rel="alternate" href="ios-app://336353151/soundcloud/sounds:1931816531"> <meta name="application-name" content="SoundCloud"> <meta name="msapplication-tooltip" content="Launch SoundCloud"> <meta name="msapplication-TileImage" content="https://a-v2.sndcdn.com/assets/images/sc-icons/win8-2dc974a18a.png"> <meta name="msapplication-TileColor" content="#ff5500"> <meta name="msapplication-starturl" content="https://soundcloud.com"> <link href="https://a-v2.sndcdn.com/assets/images/sc-icons/favicon-2cadd14bdb.ico" rel="icon"> <link 
href="https://a-v2.sndcdn.com/assets/images/sc-icons/ios-a62dfc8fe7.png" rel="apple-touch-icon"> <link href="https://a-v2.sndcdn.com/assets/images/sc-icons/fluid-b4e7a64b8b.png" rel="fluid-icon"> <script> (function () { window.ddjskey = '7FC6D561817844F25B65CDD97F28A1'; // https://docs.datadome.co/docs/how-to-configure-the-javascript-tag window.ddoptions = { ajaxListenerPath: [{"host":"api-v2.soundcloud.com","path":"/tracks","strict":true},{"host":"api-v2.soundcloud.com","path":"/tracks/*/comments","strict":true},{"host":"api-v2.soundcloud.com","path":"/users/*/conversations/*","strict":true},{"host":"api-v2.soundcloud.com","path":"/me/followings/*","strict":true},{"host":"api-v2.soundcloud.com","path":"/users/*/track_likes/*","strict":true},{"host":"api-v2.soundcloud.com","path":"/users/*/playlist_likes/*","strict":true},{"host":"api-v2.soundcloud.com","path":"/users/*/system_playlist_likes/*","strict":true},{"host":"api-v2.soundcloud.com","path":"/users/*/emails","strict":true},{"host":"api-v2.soundcloud.com","path":"/playlists","strict":true},{"host":"api-v2.soundcloud.com","path":"/playlists/*","strict":true},{"host":"api-v2.soundcloud.com","path":"/me","strict":true},{"host":"api-v2.soundcloud.com","path":"/me/track_reposts/*","strict":true},{"host":"api-v2.soundcloud.com","path":"/me/track_reposts/*/caption","strict":true},{"host":"api-v2.soundcloud.com","path":"/me/playlist_reposts/*","strict":true},{"host":"api-v2.soundcloud.com","path":"/uploads/*/track-transcoding","strict":true},{"host":"api-v2.soundcloud.com","path":"/uploads/track-upload-policy","strict":true},{"host":"graph.soundcloud.com","path":"/graphql","strict":true}], overrideAbortFetch: true, sessionByHeader: true, cookieName: 'datadome', endpoint: 'https://dwt.soundcloud.com/js/', disableAutoRefreshOnCaptchaPassed: true, enableTagEvents: true, abortAsyncOnCaptchaDisplay: false, }; })(); </script> <script src="https://dwt.soundcloud.com/tags.js" async></script> <script>!function(){var 
o,a,r;function e(a){return a.test(o)}o=window.navigator.userAgent.toLowerCase();var i,t,n,s=void 0!==window.opera&&"[object Opera]"===window.opera.toString(),p=o.match(/\sopr\/([0-9]+)\./),d=e(/chrome/),c=e(/webkit/),m=!d&&e(/safari/),w=!s&&e(/msie|trident/),f=!c&&e(/gecko/);i=p?parseInt(p[1],10):(n=o.match(/(opera|chrome|safari|firefox|msie|rv:)\/?\s*(\.?\d+(\.\d+)*)/i))&&(t=o.match(/version\/([.\d]+)/i))?parseInt(t[1],10):n?parseInt(n[2],10):null;var h=e(/mobile|android|iphone|ipod|symbianos|nokia|s60|playbook|playstation/);f&&(r=(a=o.match(/(firefox)\/?\s*(\.?\d+(\.\d+)*)/i))&&a.length>1&&parseInt(a[2],10)>=47),i&&!h&&(d&&!p&&i<49||f&&!p&&!1===r||m&&i<9||w||s&&i<13||p&&i<27)&&(window.__sc_abortApp=!0)}()</script> <link rel="stylesheet" href="https://style.sndcdn.com/css/inter-43e88497e6ff16c818c5.css"> <link rel="stylesheet" href="https://a-v2.sndcdn.com/assets/css/app-f8f0f158f5726e929b9d.css"> </head> <body class="sc-classic"> <div id="app"> <style>.header{width:100%;background:var(--background-surface-color);height:46px}.sc-classic .header{background:#333}.header__logo{background:var(--background-surface-color)}.sc-classic .header__logo{background:#333}body:not(.sc-classic) .header__logoLink{display:flex;flex-direction:column;justify-content:center;align-content:center}.header__logoLink{height:46px;width:48px}.header__logoLink svg{color:var(--primary-color)}.sc-classic .header__logoLink{background:transparent url(https://a-v2.sndcdn.com/assets/images/header/brand-1b72dd8210.svg) no-repeat 0 11px;background-size:49px 22px;display:block;height:46px;width:49px;margin-right:23px}.sc-classic .header__logoLink{background-image:url(https://a-v2.sndcdn.com/assets/images/header/cloud-e365a472bf.png);background-position-x:12px;background-size:48px 22px;width:69px;margin-right:unset}.sc-classic .header__logoLink svg{display:none}.header__logoLink:focus{background-color:rgba(255,72,0,.8);outline:0}#header__loading{margin:13px auto 
0;width:16px;background:url(https://a-v2.sndcdn.com/assets/images/loader-dark-45940ae3d4.gif) center no-repeat;background-size:16px 16px}@media (-webkit-min-device-pixel-ratio:2),(min-resolution:192dpi),(min-resolution:2dppx){.sc-classic .header__logoLink{background-image:url(https://a-v2.sndcdn.com/assets/images/header/cloud@2x-e5fba4606d.png)}}</style> <div role="banner" class="header sc-selection-disabled show fixed g-dark g-z-index-header"> <div class="header__inner l-container l-fullwidth"> <div class="header__left left"> <div class="header__logo left"> <a href="/" title="Home" class="header__logoLink sc-border-box sc-ir"> <svg viewBox="0 0 143 64" xmlns="http://www.w3.org/2000/svg" aria-hidden="true"> <path fill="currentColor" transform="translate(-166.000000, -1125.000000)" d="M308.984235,1169.99251 C308.382505,1180.70295 299.444837,1189.03525 288.718543,1188.88554 L240.008437,1188.88554 C237.777524,1188.86472 235.977065,1187.05577 235.966737,1184.82478 L235.966737,1132.37801 C235.894282,1130.53582 236.962478,1128.83883 238.654849,1128.10753 C238.654849,1128.10753 243.135035,1124.99996 252.572022,1124.99996 C258.337036,1124.99309 263.996267,1126.54789 268.948531,1129.49925 C276.76341,1134.09703 282.29495,1141.75821 284.200228,1150.62285 C285.880958,1150.14737 287.620063,1149.90993 289.36674,1149.91746 C294.659738,1149.88414 299.738952,1152.0036 303.438351,1155.78928 C307.13775,1159.57496 309.139562,1164.70168 308.984235,1169.99251 Z M229.885123,1135.69525 C231.353099,1153.48254 232.420718,1169.70654 229.885123,1187.43663 C229.796699,1188.23857 229.119091,1188.84557 228.312292,1188.84557 C227.505494,1188.84557 226.827885,1188.23857 226.739461,1187.43663 C224.375448,1169.85905 225.404938,1153.33003 226.739461,1135.69525 C226.672943,1135.09199 226.957336,1134.50383 227.471487,1134.18133 C227.985639,1133.85884 228.638946,1133.85884 229.153097,1134.18133 C229.667248,1134.50383 229.951641,1135.09199 229.885123,1135.69525 Z M220.028715,1187.4557 
C219.904865,1188.26549 219.208361,1188.86356 218.389157,1188.86356 C217.569953,1188.86356 216.87345,1188.26549 216.7496,1187.4557 C214.986145,1172.28686 214.986145,1156.96477 216.7496,1141.79593 C216.840309,1140.9535 217.551388,1140.31488 218.398689,1140.31488 C219.245991,1140.31488 219.95707,1140.9535 220.047779,1141.79593 C222.005153,1156.95333 221.998746,1172.29994 220.028715,1187.4557 Z M210.153241,1140.2517 C211.754669,1156.55195 212.479125,1171.15545 210.134176,1187.41757 C210.134176,1188.29148 209.425728,1188.99993 208.551813,1188.99993 C207.677898,1188.99993 206.969449,1188.29148 206.969449,1187.41757 C204.70076,1171.36516 205.463344,1156.34224 206.969449,1140.2517 C207.05845,1139.43964 207.744425,1138.82474 208.561345,1138.82474 C209.378266,1138.82474 210.06424,1139.43964 210.153241,1140.2517 Z M200.258703,1187.47476 C200.169129,1188.29694 199.474788,1188.91975 198.647742,1188.91975 C197.820697,1188.91975 197.126356,1188.29694 197.036782,1187.47476 C195.216051,1173.32359 195.216051,1158.99744 197.036782,1144.84627 C197.036782,1143.94077 197.770837,1143.20671 198.676339,1143.20671 C199.581842,1143.20671 200.315897,1143.94077 200.315897,1144.84627 C202.251054,1158.99121 202.231809,1173.33507 200.258703,1187.47476 Z M190.383229,1155.50339 C192.880695,1166.56087 191.755882,1176.32196 190.287906,1187.58915 C190.168936,1188.33924 189.522207,1188.89148 188.762737,1188.89148 C188.003266,1188.89148 187.356537,1188.33924 187.237567,1187.58915 C185.903044,1176.47448 184.797296,1166.48462 187.142244,1155.50339 C187.142244,1154.60842 187.867763,1153.8829 188.762737,1153.8829 C189.65771,1153.8829 190.383229,1154.60842 190.383229,1155.50339 Z M180.526821,1153.82571 C182.814575,1165.15009 182.071055,1174.7396 180.469627,1186.10211 C180.27898,1187.7798 177.400223,1187.79886 177.247706,1186.10211 C175.798795,1174.91118 175.112468,1165.0357 177.190512,1153.82571 C177.281785,1152.97315 178.001234,1152.32661 178.858666,1152.32661 C179.716099,1152.32661 180.435548,1152.97315 
180.526821,1153.82571 Z M170.575089,1159.31632 C172.977231,1166.82778 172.157452,1172.92846 170.479765,1180.63056 C170.391921,1181.42239 169.722678,1182.02149 168.925999,1182.02149 C168.12932,1182.02149 167.460077,1181.42239 167.372232,1180.63056 C165.923321,1173.08097 165.332318,1166.84684 167.23878,1159.31632 C167.330053,1158.46376 168.049502,1157.81722 168.906934,1157.81722 C169.764367,1157.81722 170.483816,1158.46376 170.575089,1159.31632 Z"></path> </svg> SoundCloud </a> </div> </div> <div id="header__loading" class="sc-hidden"></div> </div> </div> <script>window.setTimeout((function(){if(!window.__sc_abortApp){var e=window.document.getElementById("header__loading");e&&(e.className="")}}),6e3)</script> <style>.errorPage__inner{width:580px;margin:0 auto;position:relative;padding-top:460px;background:url(https://a-v2.sndcdn.com/assets/images/errors/500-e5a180b7a8.png) no-repeat 50% 80px;text-align:center;transition:all 1s linear}.errorTitle{margin-bottom:10px;font-size:30px}.errorText{line-height:28px;color:#666;font-size:20px}.errorButtons{margin-top:30px}@media (max-width:1280px){.errorPage__inner{background-size:80%}}</style> <noscript class="errorPage__inner"> <div class="errorPage__inner"> <p class="errorTitle">JavaScript is disabled</p> <p class="errorText sc-font-light">You need to enable JavaScript to use SoundCloud</p> <div class="errorButtons"> <a href="http://www.enable-javascript.com/" target="_blank" class="sc-button sc-button-medium">Show me how to enable it</a> </div> </div> </noscript> <noscript><article itemscope itemtype="http://schema.org/MusicRecording"> <header> <h1 itemprop="name"><a itemprop="url" href="/aixpodcast/from-llms-to-ai-agents-and-rag-mastering-genai-evaluations-with-jason-lopatecki">From LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki</a> by <a href="/aixpodcast">ODSC's Ai X Podcast</a></h1> published on <time pubdate>2024-10-09T17:54:31Z</time> <meta itemprop="duration" content="PT00H44M25S" /> 
<meta itemprop="genre" content="Technology" /> <meta itemprop="interactionCount" content="UserLikes:0" /> <meta itemprop="interactionCount" content="UserDownloads:0" /> <meta itemprop="interactionCount" content="UserComments:0" /> <div itemscope itemprop="audio" itemtype="http://schema.org/AudioObject"><meta itemprop="embedUrl" content="https://w.soundcloud.com/player/?url=https%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F1931816531&auto_play=false&show_artwork=true&visual=true&origin=schema.org" /><meta itemprop="height" content="400px" /></div> <div itemscope itemprop="byArtist" itemtype="http://schema.org/MusicGroup"><meta itemprop="name" content="ODSC's Ai X Podcast" /><meta itemprop="url" content="/aixpodcast" /></div> <div itemscope itemprop="provider" itemtype="http://schema.org/Organization"><meta itemprop="name" content="SoundCloud" /><meta itemprop="image" content="http://developers.soundcloud.com/assets/logo_white-8bf7615eb575eeb114fc65323068e1e4.png" /></div> </header> <p> <img src="https://i1.sndcdn.com/artworks-hYzmtVD0oh1zjvfD-xrVRwA-t500x500.jpg" width="" height="" alt="From LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki" itemprop="image"> In this episode of ODSC’s Ai X Podcast, our guest today, Jason Lopatecki, co-founder and CEO of Arize AI, joins us to discuss GenAI evaluations. Arize AI is a startup that is one of the leaders in AI observability and LLM evaluation. It's the same company behind the very popular open-source evaluation project, Phoenix. Prior to Arize, Jason was the co-founder and chief innovation officer at TubeMogul where he scaled the business into a public company that was eventually acquired by Adobe. SHOW TOPICS: Jason’s background and key moments in his career Arize AI's founding journey and focus on observability and evaluation Primary challenges of evaluating GenAI and foundational models Using LLM / AI as-a-judge Common mistakes to avoid when evaluating LLMs Evaluation-driven development. 
AI agents, agentic AI, and challenge for evaluation Breaking down AI agents into manageable components. Agent Control Flow and assessing how agents make correct decisions at each step Evaluating individual actions performed by AI agents Retrieval Augmented Generation (RAG) evaluation Ensuring RAG retrieved information is accurate and relevant Risks and benefits of using open-source models vs. proprietary models, Large Language Model evaluation metrics The drawbacks of public benchmarks Practical considerations for creating an effective evaluation pipeline, and how it differs between experimentation and production The advantages of SLMs (Small Language Models) Building an LLM task evaluation from scratch, the steps involved SHOW NOTES - Jason Lopatecki, CEO and Co-Founder of Arize AI: https://www.linkedin.com/in/jason-lopatecki-9509941 https://twitter.com/jason_lopatecki Arize AI: https://twitter.com/arizeai - Arize AI blogs https://arize.com/blog/ - Jason’s Talk at ODSC West - Demystifying LLM Evaluation - https://odsc.com/speakers/demystifying-llm-evaluation/ - Foundational Models https://en.wikipedia.org/wiki/Foundation_model - AI Agents https://en.wikipedia.org/wiki/Intelligent_agent - Agentic AI https://venturebeat.com/ai/agentic-ai-a-deep-dive-into-the-future-of-automation/ - Prometheus: Inducing Fine-grained Evaluation Capability in Language Models https://arxiv.org/abs/2310.08491 - Open LLM Leaderboard https://huggingface.co/open-llm-leaderboard - OpenAI o1 https://openai.com/o1/ - Mistral LLMs https://docs.mistral.ai/getting-started/models/models_overview/ - Llama 3.2 https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/ - Evaluation Prompts: https://arize.com/blog-course/evaluating-prompt-playground/ Phoenix - Open Source AI Observability &amp; Evaluation - https://github.com/Arize-ai/phoenix This episode was sponsored by: Ai+ Training https://aiplus.training/ Home to 600+ hours of on-demand, self-paced AI training, live virtual training, 
and certifications in in-demand skills like LLMs and prompt engineering. And created in partnership with ODSC https://odsc.com/ The Leading AI Training Conference, featuring expert-led, hands-on workshops, training sessions, and talks on cutting-edge AI topics and tools, from data science and machine learning to generative AI to LLMOps Join us at our upcoming and highly anticipated conference ODSC West in South San Francisco October 29-31. <meta itemprop="description" content="In this episode of ODSC’s Ai X Podcast, our guest today, Jason Lopatecki, co-founder and CEO of Arize AI, joins us to discuss GenAI evaluations. Arize AI is a startup that is one of the leaders in AI observability and LLM evaluation. It's the same company behind the very popular open-source evaluation project, Phoenix. Prior to Arize, Jason was the co-founder and chief innovation officer at TubeMogul where he scaled the business into a public company that was eventually acquired by Adobe. SHOW TOPICS: Jason’s background and key moments in his career Arize AI's founding journey and focus on observability and evaluation Primary challenges of evaluating GenAI and foundational models Using LLM / AI as-a-judge Common mistakes to avoid when evaluating LLMs Evaluation-driven development. AI agents, agentic AI, and challenge for evaluation Breaking down AI agents into manageable components. Agent Control Flow and assessing how agents make correct decisions at each step Evaluating individual actions performed by AI agents Retrieval Augmented Generation (RAG) evaluation Ensuring RAG retrieved information is accurate and relevant Risks and benefits of using open-source models vs. 
proprietary models, Large Language Model evaluation metrics The drawbacks of public benchmarks Practical considerations for creating an effective evaluation pipeline, and how it differs between experimentation and production The advantages of SLMs (Small Language Models) Building an LLM task evaluation from scratch, the steps involved SHOW NOTES - Jason Lopatecki, CEO and Co-Founder of Arize AI: https://www.linkedin.com/in/jason-lopatecki-9509941 https://twitter.com/jason_lopatecki Arize AI: https://twitter.com/arizeai - Arize AI blogs https://arize.com/blog/ - Jason’s Talk at ODSC West - Demystifying LLM Evaluation - https://odsc.com/speakers/demystifying-llm-evaluation/ - Foundational Models https://en.wikipedia.org/wiki/Foundation_model - AI Agents https://en.wikipedia.org/wiki/Intelligent_agent - Agentic AI https://venturebeat.com/ai/agentic-ai-a-deep-dive-into-the-future-of-automation/ - Prometheus: Inducing Fine-grained Evaluation Capability in Language Models https://arxiv.org/abs/2310.08491 - Open LLM Leaderboard https://huggingface.co/open-llm-leaderboard - OpenAI o1 https://openai.com/o1/ - Mistral LLMs https://docs.mistral.ai/getting-started/models/models_overview/ - Llama 3.2 https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/ - Evaluation Prompts: https://arize.com/blog-course/evaluating-prompt-playground/ Phoenix - Open Source AI Observability &amp; Evaluation - https://github.com/Arize-ai/phoenix This episode was sponsored by: Ai+ Training https://aiplus.training/ Home to 600+ hours of on-demand, self-paced AI training, live virtual training, and certifications in in-demand skills like LLMs and prompt engineering. 
And created in partnership with ODSC https://odsc.com/ The Leading AI Training Conference, featuring expert-led, hands-on workshops, training sessions, and talks on cutting-edge AI topics and tools, from data science and machine learning to generative AI to LLMOps Join us at our upcoming and highly anticipated conference ODSC West in South San Francisco October 29-31." /> </p> <dl> <dt>Genre</dt> <dd><a href="/tags/Technology">Technology</a></dd> </dl> <footer> <ul> <li><a href="/aixpodcast/from-llms-to-ai-agents-and-rag-mastering-genai-evaluations-with-jason-lopatecki/likes">Users who like From LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki</a></li> <li><a href="/aixpodcast/from-llms-to-ai-agents-and-rag-mastering-genai-evaluations-with-jason-lopatecki/reposts">Users who reposted From LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki</a></li> <li><a href="/aixpodcast/from-llms-to-ai-agents-and-rag-mastering-genai-evaluations-with-jason-lopatecki/sets">Playlists containing From LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki</a></li> <li><a href="/aixpodcast/from-llms-to-ai-agents-and-rag-mastering-genai-evaluations-with-jason-lopatecki/recommended">More tracks like From LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason Lopatecki</a></li> </ul> License: all-rights-reserved </footer> </article> </noscript> <style>#updateBrowserMessage{width:600px;margin:0 auto;position:relative;padding-top:410px;background:url(https://a-v2.sndcdn.com/assets/images/errors/browser-9cdd4e6df7.png) no-repeat 50% 130px;text-align:center;display:none}#updateBrowserMessage .messageText{line-height:26px;font-size:20px;margin-bottom:5px}#updateBrowserMessage .downloadLinks{margin-top:0}</style> <div id="updateBrowserMessage"> <p class="messageText sc-text-light sc-text-secondary"> Your current browser isn't compatible with SoundCloud. <br> Please download one of our supported browsers. 
<a href="https://help.soundcloud.com/hc/articles/115003564308-Technical-requirements">Need help?</a> </p> <div class="downloadLinks sc-type-h3 sc-text-h3 sc-text-light sc-text-secondary"> <a href="http://google.com/chrome" target="_blank" title="Chrome">Chrome</a> | <a href="http://firefox.com" target="_blank" title="Firefox">Firefox</a> | <a href="http://apple.com/safari" target="_blank" title="Safari">Safari</a> | <a href="https://www.microsoft.com/edge" target="_blank" title="Edge">Edge</a> </div> </div> <script>window.__sc_abortApp&&(window.document.getElementById("updateBrowserMessage").style.display="block")</script> <div id="error__timeout" class="errorPage__inner sc-hidden"> <p class="errorTitle sc-type-h1 sc-text-h1">Sorry! Something went wrong</p> <div class="errorText sc-font-light"> <p>Is your network connection unstable or browser outdated?</p> </div> <div class="errorButtons"> <a class="sc-button" href="https://help.soundcloud.com" target="_blank" id="try-again">I need help</a> </div> </div> <script>function displayError(){if(!window.__sc_abortApp){var r=window.document,e=r.getElementById("error__timeout"),o=r.getElementById("header__loading");e&&o&&(e.className="errorPage__inner",o.className="sc-hidden")}}window.setTimeout(displayError,15e3),window.onerror=displayError</script> <p> <a href="/popular/searches" title="Popular searches">Popular searches</a> </p> </div> <script crossorigin src="https://a-v2.sndcdn.com/assets/54-ed770211.js"></script> <script crossorigin src="https://a-v2.sndcdn.com/assets/52-3a8aeb51.js"></script> <script type="text/javascript"> window.dataLayer = window.dataLayer || []; function gtag() { dataLayer.push(arguments); } gtag('consent', 'default', { 'ad_storage': 'denied', 'analytics_storage': 'denied', 'functionality_storage': 'denied', 'personalization_storage': 'denied', 'security_storage': 'granted', 'ad_user_data': 'denied', 'ad_personalization': 'denied', 'region': [ 'BE', 'EL', 'LT', 'PT', 'BG', 'ES', 'LU', 'RO', 
'CZ', 'FR', 'HU', 'SI', 'DK', 'HR', 'MT', 'SK', 'DE', 'IT', 'NL', 'FI', 'EE', 'CY', 'AT', 'SE', 'IE', 'LV', 'PL', 'US-CA' ] }); gtag('consent', 'default', { 'ad_storage': 'granted', 'analytics_storage': 'granted', 'functionality_storage': 'granted', 'personalization_storage': 'granted', 'security_storage': 'granted', 'ad_user_data': 'granted', 'ad_personalization': 'granted' }); </script> <script async src="https://cdn.cookielaw.org/consent/7e62c772-c97a-4d95-8d0a-f99bbeadcf61/otSDKStub.js" type="text/javascript" charset="UTF-8" data-domain-script="7e62c772-c97a-4d95-8d0a-f99bbeadcf61" ></script> <script type="text/javascript"> (function (global) { function OptanonWrapper() { var activeGroups = (global.OptanonActiveGroups || '').split(','); if (Array.isArray(OptanonWrapper.callbacks)) { for (var i = 0, max = OptanonWrapper.callbacks.length; i < max; i++) { try { OptanonWrapper.callbacks[i](activeGroups); } catch (e) {} } } OptanonWrapper.isLoaded = true; }; OptanonWrapper.callbacks = []; OptanonWrapper.isLoaded = false; global.OptanonWrapper = OptanonWrapper; }(window)); </script> <script>window.__sc_version="1732286517"</script> <script>window.__sc_hydration = 
[{"hydratable":"anonymousId","data":"992105-338664-966272-165110"},{"hydratable":"features","data":{"features":["v2_dsa_report_content_links","mobi_webauth_oauth_mode","mobi_use_auth_internal_analytics","v2_use_onetrust_tcfv2_us_ca","mobi_enable_onetrust_tcfv2","mobi_tracking_send_session_id","mobi_use_onetrust_eu1","mobi_use_onetrust_gb","mobi_use_onetrust_tcfv2_us_ca","mobi_dsa_report_content_form","v2_use_onetrust_user_id_eu2","v2_enable_sourcepoint_tcfv2","mobi_use_onetrust_tcfv2_eu2","v2_test_feature_toggle","checkout_send_segment_events_to_event_gateway","mobi_use_onetrust_user_id_eu1","trolley","v2_nigeria_creator_banner","mobi_use_onetrust_user_id_ex_us","v2_use_drm_transcodings","mobi_use_onetrust_tcfv2_eu1","v2_post_with_caption","mobi_use_dwt","v2_use_onetrust_tcfv2_eu1","mobi_use_onetrust_eu4","featured_artists_banner","v2_repost_redirect_page","v2_use_onetrust_gb","v2_dsa_ad_compliance","use_onetrust_async","mobi_dsa_report_content_links","v2_signals_collection","v2_track_level_distro_to_plan_picker","v2_direct_support_link","checkout_web_products","v2_api_auth_sign_out","v2_ie11_support_end","checkout_use_new_connect","mobi_dsa_ad_compliance","cd_repost_to_artists","v2_enable_crossfade","creator_mid_tier_canada","v2_tracking_moengage_integration","v2_hq_file_storage_release","gql_tracks","creator_plan_names_repositioning","v2_use_onetrust_eu4","v2_stories_onboarding","mobi_use_onetrust_user_id_eu2","mobi_tracking_moengage_integration","v2_use_dwt","v2_use_updated_alert_banner_quota_upsell","v2_enable_onetrust","v2_signed_out_cancellation_flow","v2_import_playlist_experiment","v2_disable_sidebar_comments_count","v2_upload_redirection","v2_subhub_churn_intercept","checkout_use_new_plan_picker","v2_signage_on_home","v2_use_onetrust_eu2","next_pro_first_fans","v2_comscore_udm_2","checkout_creator_coupon_codes_enabled","fpi_messaging_drawer","v2_use_onetrust_us","v2_comment_sorting","checkout_use_recurly_with_paypal","v2_use_onetrust_tcfv2_ex_us","v2_show_f
or_artists_link","mobi_use_onetrust_eu3","mobi_use_onetrust_elsewhere","v2_use_onetrust_eu3","mobi_use_onetrust_us","v2_oscp_german_tax_fields_support","v2_fallback_queue_for_search","v2_use_onetrust_user_id_ex_us","creator_mid_tier_upgrade_downgrade","v2_use_new_connect","v2_use_onetrust_tcfv2_eu2","mobi_interstitial_ad","v2_get_heard","v2_next_pro_brazil_banner","v2_interstitial_ad","v2_send_segment_events_to_event_gateway","v2_use_onetrust_eu1","v2_enable_sourcepoint","v2_repost_with_caption_graphql","mobi_use_onetrust_tcfv2_ex_us","creator_mid_tier_anz","v2_tags_recent_tracks","sc4a_onboarding_checklist","mobi_use_onetrust_eu2","v2_velvetcake_profile_widget","v2_enable_new_web_errors","v2_use_onetrust_elsewhere","checkout_use_dwt","v2_webauth_use_local_tracking","mobi_sign_in_experiment","mobi_enable_onetrust","v2_can_see_insights","fpi_20_fans_rollout","mobi_trinity","v2_enable_crossfade_upload","request_takedown","v2_webauth_oauth_mode","v2_google_one_tap","v2_enable_pwa","mobi_use_hls_hack","creator_mid_tier_uk","v2_stories","v2_use_onetrust_user_id_eu1","v2_use_onetrust_user_id_global","use_recurly_checkout","v2_show_side_by_side_upsell_experience","v2_enable_onetrust_tcfv2","v2_enable_crossfade_track_manager","v2_enable_tcfv2_consent_string_cache","v2_track_manager_redirection","use_on_soundcloud_short_links","artist_fan_connection_widget","mobi_send_segment_events_to_event_gateway"]}},{"hydratable":"experiments","data":{}},{"hydratable":"geoip","data":{"country_code":"SG","country_name":"Singapore","latitude":1.3673,"longitude":103.8014}},{"hydratable":"privacySettings","data":{"allows_messages_from_unfollowed_users":false,"analytics_opt_in":true,"communications_opt_in":true,"targeted_advertising_opt_in":false,"legislation":[]}},{"hydratable":"trackingBrowserTabId","data":"2e20b5"},{"hydratable":"user","data":{"avatar_url":"https://i1.sndcdn.com/avatars-F1KWIZuK1UzB4LB9-Ik7sHQ-large.jpg","city":null,"comments_count":0,"country_code":null,"created_at":"2017
-04-19T20:07:32Z","creator_subscriptions":[{"product":{"id":"creator-pro-unlimited"}}],"creator_subscription":{"product":{"id":"creator-pro-unlimited"}},"description":"With Ai X Podcast, Open Data Science Conference (ODSC) brings its vast experience in building community and its knowledge of the data science and AI fields to the podcast platform. The interests and challenges of the data science community are wide ranging. To reflect this Ai X Podcast will offer a similarly wide range of content, from one-on-one interviews with leading experts, to career talks, to educational interviews, to profiles of AI Startup Founders. Join us every two weeks to discover what’s going on in the data science community.\n\nFind more ODSC lightning interviews, webinars, live trainings, certifications, bootcamps here - https://aiplus.training/ \nDon't miss out on this exciting opportunity to expand your knowledge and stay ahead of the curve.","followers_count":60,"followings_count":0,"first_name":"","full_name":"","groups_count":0,"id":302530435,"kind":"user","last_modified":"2024-01-10T11:48:30Z","last_name":"","likes_count":0,"playlist_likes_count":0,"permalink":"aixpodcast","permalink_url":"https://soundcloud.com/aixpodcast","playlist_count":0,"reposts_count":null,"track_count":46,"uri":"https://api.soundcloud.com/users/302530435","urn":"soundcloud:users:302530435","username":"ODSC's Ai X 
Podcast","verified":false,"visuals":{"urn":"soundcloud:users:302530435","enabled":true,"visuals":[{"urn":"soundcloud:visuals:186226371","entry_time":0,"visual_url":"https://i1.sndcdn.com/visuals-000302530435-54RFWG-original.jpg"}],"tracking":null},"badges":{"pro":false,"creator_mid_tier":false,"pro_unlimited":true,"verified":false},"station_urn":"soundcloud:system-playlists:artist-stations:302530435","station_permalink":"artist-stations:302530435","url":"/aixpodcast"}},{"hydratable":"sound","data":{"artwork_url":"https://i1.sndcdn.com/artworks-hYzmtVD0oh1zjvfD-xrVRwA-large.jpg","caption":null,"commentable":true,"comment_count":0,"created_at":"2024-10-09T17:54:31Z","description":"In this episode of ODSC’s Ai X Podcast, our guest today, Jason Lopatecki, co-founder and CEO of Arize AI, joins us to discuss GenAI evaluations. \n\nArize AI is a startup that is one of the leaders in AI observability and LLM evaluation. \nIt's the same company behind the very popular open-source evaluation project, Phoenix.\n\nPrior to Arize, Jason was the co-founder and chief innovation officer at TubeMogul \nwhere he scaled the business into a public company that was eventually acquired by Adobe.\n\n\nSHOW TOPICS:\nJason’s background and key moments in his career\nArize AI's founding journey and focus on observability and evaluation \nPrimary challenges of evaluating GenAI and foundational models\nUsing LLM / AI as-a-judge\nCommon mistakes to avoid when evaluating LLMs\nEvaluation-driven development. \nAI agents, agentic AI, and challenge for evaluation\nBreaking down AI agents into manageable components. \nAgent Control Flow and assessing how agents make correct decisions at each step \nEvaluating individual actions performed by AI agents\nRetrieval Augmented Generation (RAG) evaluation \nEnsuring RAG retrieved information is accurate and relevant \nRisks and benefits of using open-source models vs. 
proprietary models\nLarge Language Model evaluation metrics\nThe drawbacks of public benchmarks\nPractical considerations for creating an effective evaluation pipeline, and how it differs between experimentation and production\nThe advantages of SLMs (Small Language Models)\nBuilding an LLM task evaluation from scratch, the steps involved\n\nSHOW NOTES\n\n- Jason Lopatecki, CEO and Co-Founder of Arize AI: https://www.linkedin.com/in/jason-lopatecki-9509941\nhttps://twitter.com/jason_lopatecki\nArize AI: https://twitter.com/arizeai\n- Arize AI blogs https://arize.com/blog/\n- Jason’s Talk at ODSC West - Demystifying LLM Evaluation - https://odsc.com/speakers/demystifying-llm-evaluation/\n- Foundational Models https://en.wikipedia.org/wiki/Foundation_model\n- AI Agents https://en.wikipedia.org/wiki/Intelligent_agent\n- Agentic AI https://venturebeat.com/ai/agentic-ai-a-deep-dive-into-the-future-of-automation/\n- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models https://arxiv.org/abs/2310.08491\n- Open LLM Leaderboard https://huggingface.co/open-llm-leaderboard\n- OpenAI o1 https://openai.com/o1/\n- Mistral LLMs https://docs.mistral.ai/getting-started/models/models_overview/\n- Llama 3.2 https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/\n- Evaluation Prompts: https://arize.com/blog-course/evaluating-prompt-playground/\n- Phoenix - Open Source AI Observability \u0026 Evaluation - https://github.com/Arize-ai/phoenix\nThis episode was sponsored by:\nAi+ Training https://aiplus.training/\nHome to 600+ hours of on-demand, self-paced AI training, live virtual training, and certifications in in-demand skills like LLMs and prompt engineering.\nAnd created in partnership with ODSC https://odsc.com/\nThe Leading AI Training Conference, featuring expert-led, hands-on workshops, training sessions, and talks on cutting-edge AI topics and tools, from data science and machine learning to generative AI to LLMOps.\nJoin us at our 
upcoming and highly anticipated conference ODSC West in South San Francisco October 29-31.","downloadable":false,"download_count":0,"duration":2665482,"full_duration":2665482,"embeddable_by":"all","genre":"Technology","has_downloads_left":false,"id":1931816531,"kind":"track","label_name":null,"last_modified":"2024-10-16T15:57:02Z","license":"all-rights-reserved","likes_count":0,"permalink":"from-llms-to-ai-agents-and-rag-mastering-genai-evaluations-with-jason-lopatecki","permalink_url":"https://soundcloud.com/aixpodcast/from-llms-to-ai-agents-and-rag-mastering-genai-evaluations-with-jason-lopatecki","playback_count":6,"public":true,"publisher_metadata":{"id":1931816531,"urn":"soundcloud:tracks:1931816531","contains_music":true},"purchase_title":null,"purchase_url":null,"release_date":null,"reposts_count":0,"secret_token":null,"sharing":"public","state":"finished","streamable":true,"tag_list":"DataScience technology podcast RAG GenAI","title":"From LLMs to AI Agents and RAG: Mastering GenAI Evaluations with Jason 
Lopatecki","uri":"https://api.soundcloud.com/tracks/1931816531","urn":"soundcloud:tracks:1931816531","user_id":302530435,"visuals":null,"waveform_url":"https://wave.sndcdn.com/LczArPFFtBYA_m.json","display_date":"2024-10-09T17:54:31Z","media":{"transcodings":[{"url":"https://api-v2.soundcloud.com/media/soundcloud:tracks:1931816531/247aee02-5e5a-4d29-aada-d19611b8975d/stream/hls","preset":"mp3_1_0","duration":2665482,"snipped":false,"format":{"protocol":"hls","mime_type":"audio/mpeg"},"quality":"sq"},{"url":"https://api-v2.soundcloud.com/media/soundcloud:tracks:1931816531/247aee02-5e5a-4d29-aada-d19611b8975d/stream/progressive","preset":"mp3_1_0","duration":2665482,"snipped":false,"format":{"protocol":"progressive","mime_type":"audio/mpeg"},"quality":"sq"},{"url":"https://api-v2.soundcloud.com/media/soundcloud:tracks:1931816531/790415bf-47e8-4e66-9593-fc874003a7fc/stream/hls","preset":"opus_0_0","duration":2665447,"snipped":false,"format":{"protocol":"hls","mime_type":"audio/ogg; codecs=\"opus\""},"quality":"sq"}]},"station_urn":"soundcloud:system-playlists:track-stations:1931816531","station_permalink":"track-stations:1931816531","track_authorization":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJnZW8iOiJTRyIsInN1YiI6IiIsInJpZCI6IiIsImlhdCI6MTczMjM0NTE3MX0.CGa7mxqOVSZWyjI4AuqGGlXXU-H_WECCEVnvxFZfpnM","monetization_model":"NOT_APPLICABLE","policy":"ALLOW","user":{"avatar_url":"https://i1.sndcdn.com/avatars-F1KWIZuK1UzB4LB9-Ik7sHQ-large.jpg","city":null,"comments_count":0,"country_code":null,"created_at":"2017-04-19T20:07:32Z","creator_subscriptions":[{"product":{"id":"creator-pro-unlimited"}}],"creator_subscription":{"product":{"id":"creator-pro-unlimited"}},"description":"With Ai X Podcast, Open Data Science Conference (ODSC) brings its vast experience in building community and its knowledge of the data science and AI fields to the podcast platform. The interests and challenges of the data science community are wide ranging. 
To reflect this Ai X Podcast will offer a similarly wide range of content, from one-on-one interviews with leading experts, to career talks, to educational interviews, to profiles of AI Startup Founders. Join us every two weeks to discover what’s going on in the data science community.\n\nFind more ODSC lightning interviews, webinars, live trainings, certifications, bootcamps here - https://aiplus.training/ \nDon't miss out on this exciting opportunity to expand your knowledge and stay ahead of the curve.","followers_count":60,"followings_count":0,"first_name":"","full_name":"","groups_count":0,"id":302530435,"kind":"user","last_modified":"2024-01-10T11:48:30Z","last_name":"","likes_count":0,"playlist_likes_count":0,"permalink":"aixpodcast","permalink_url":"https://soundcloud.com/aixpodcast","playlist_count":0,"reposts_count":null,"track_count":46,"uri":"https://api.soundcloud.com/users/302530435","urn":"soundcloud:users:302530435","username":"ODSC's Ai X Podcast","verified":false,"visuals":{"urn":"soundcloud:users:302530435","enabled":true,"visuals":[{"urn":"soundcloud:visuals:186226371","entry_time":0,"visual_url":"https://i1.sndcdn.com/visuals-000302530435-54RFWG-original.jpg"}],"tracking":null},"badges":{"pro":false,"creator_mid_tier":false,"pro_unlimited":true,"verified":false},"station_urn":"soundcloud:system-playlists:artist-stations:302530435","station_permalink":"artist-stations:302530435"}}}];</script> <script src="https://a-v2.sndcdn.com/assets/19-af85e3dc.js" crossorigin></script> <script crossorigin src="https://a-v2.sndcdn.com/assets/53-4d8c8679.js"></script> <script crossorigin src="https://a-v2.sndcdn.com/assets/1-65e60bef.js"></script> <script crossorigin src="https://a-v2.sndcdn.com/assets/0-2b2a72c9.js"></script> <script crossorigin src="https://a-v2.sndcdn.com/assets/2-82d925e8.js"></script> <script crossorigin src="https://a-v2.sndcdn.com/assets/51-8fe2a844.js"></script> <script crossorigin 
src="https://a-v2.sndcdn.com/assets/50-35371dbe.js"></script> </body> </html>