CINXE.COM

Starlet #22 Giskard: Open-Source Evaluation & Testing for LLMs and ML models

<!DOCTYPE html><html lang="en"> <head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width"/><link rel="icon" href="/assets/favicon.ico"/><script defer="" data-domain="star-history.com" src="https://plausible.io/js/script.js"></script><meta name="next-head-count" content="4"/><link data-next-font="size-adjust" rel="preconnect" href="/" crossorigin="anonymous"/><link rel="preload" href="/_next/static/css/f94657194d4c857a.css" as="style" crossorigin=""/><link rel="stylesheet" href="/_next/static/css/f94657194d4c857a.css" crossorigin="" data-n-g=""/><noscript data-n-css=""></noscript><script defer="" crossorigin="" nomodule="" src="/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js"></script><script src="/_next/static/chunks/webpack-38cee4c0e358b1a3.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/framework-fda0a023b274c574.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/main-001c9e19b1894c7d.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/pages/_app-915effad870aa62e.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/6c86d9ce-d8b7531786dd65a5.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/472-8057db644de3d496.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/590-d0a3c67c09cc0662.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/pages/blog/%5Bslug%5D-7b378153203b51eb.js" defer="" crossorigin=""></script><script src="/_next/static/xKX4ZiOi_N7h3OBOEsSZu/_buildManifest.js" defer="" crossorigin=""></script><script src="/_next/static/xKX4ZiOi_N7h3OBOEsSZu/_ssgManifest.js" defer="" crossorigin=""></script></head><body><div id="__next"><div class="relative w-full h-auto min-h-screen overflow-auto flex flex-col"><title>Starlet #22 Giskard: Open-Source Evaluation &amp; Testing for LLMs and ML models</title><meta name="description" content="Giskard develops an open-source AI testing framework that helps to identify risks of AI models - with a specific focus on LLM agents."/><meta property="og:type" content="website"/><meta property="og:url" content="https://star-history.com/blog/giskard"/><meta property="og:title" content="Starlet #22 Giskard: Open-Source Evaluation &amp; Testing for LLMs and ML models"/><meta property="og:description" content="Giskard develops an open-source AI testing framework that helps to identify risks of AI models - with a specific focus on LLM agents."/><meta property="og:image" content="https://star-history.com/assets/blog/giskard/banner.webp"/><meta name="twitter:card" content="summary_large_image"/><meta name="twitter:url" content="https://star-history.com/blog/giskard"/><meta name="twitter:title" content="Starlet #22 Giskard: Open-Source Evaluation &amp; Testing for LLMs and ML models"/><meta name="twitter:description" content="Giskard develops an open-source AI testing framework that helps to identify risks of AI models - with a specific focus on LLM agents."/><meta name="twitter:image" content="https://star-history.com/assets/blog/giskard/banner.webp"/><nav><div class="flex justify-center items-center gap-x-6 bg-green-600 px-6 py-1 sm:px-3.5 "><p class="text-sm leading-6 text-white"><a href="/blog/list-your-open-source-project">Want to promote your open source project? Be on our ⭐️Starlet List⭐️ for FREE →</a></p></div></nav><header class="w-full h-14 shrink-0 flex flex-row justify-center items-center bg-[#363636] text-light"><div class="w-full md:max-w-5xl lg:max-w-7xl h-full flex flex-row justify-between items-center px-0 sm:px-4"><div class="h-full bg-dark flex flex-row justify-start items-center"><a class="h-full flex flex-row justify-center items-center px-3 hover:bg-zinc-800" href="/"><img class="w-7 h-auto" src="/assets/icon.png" alt="Logo"/></a><a class="h-full flex flex-row justify-center items-center text-base px-3 hover:bg-zinc-800" href="/blog"><span class="text-white font-semibold -2">Blog</span></a><span class="h-full flex flex-row justify-center items-center cursor-pointer text-white text-base px-3 font-semibold mr-2 hover:bg-zinc-800">Add Access Token</span></div><div class="hidden h-full md:flex flex-row justify-start items-center"><a target="_blank" rel="noopener noreferrer" class="h-full flex text-white text-base flex-row justify-center items-center px-4 hover:bg-zinc-800" href="https://www.bytebase.com/?source=star-history"><img class="h-6 mt-1 mr-2" src="/assets/craft-by-bytebase.webp" alt=""/></a></div><div class="h-full hidden md:flex flex-row justify-end items-center space-x-2"><a class="h-full flex flex-row justify-center items-center px-2 hover:bg-zinc-800" href="https://twitter.com/StarHistoryHQ" target="_blank" rel="noopener noreferrer"><i class="fab fa-twitter text-2xl text-blue-300"></i></a></div><div class="h-full flex md:hidden flex-row justify-end items-center"><span class="relative h-full w-10 px-3 flex flex-row justify-center items-center cursor-pointer font-semibold text-light hover:bg-zinc-800"><span class="w-4 transition-all h-px bg-light absolute top-1/2 -mt-1"></span><span class="w-4 transition-all h-px bg-light absolute top-1/2 "></span><span class="w-4 transition-all h-px bg-light absolute top-1/2 mt-1"></span></span></div></div></header><div class="w-full h-auto py-2 flex md:hidden flex-col justify-start items-start shadow-lg border-b hidden"><a class="h-12 text-base px-3 w-full flex flex-row justify-start items-center cursor-pointer font-semibold text-dark mr-2 hover:bg-gray-100 hover:text-blue-500" href="/blog/how-to-use-github-star-history">📕 How to use this site</a><span class="h-12 px-3 text-base w-full flex flex-row justify-start items-center cursor-pointer font-semibold text-dark mr-2 hover:bg-gray-100 hover:text-blue-500">Add Access Token</span><span class="h-12 text-base px-3 w-full flex flex-row justify-start items-center"><a class="github-button -mt-1" href="https://github.com/star-history/star-history" data-show-count="true" aria-label="Star star-history/star-history on GitHub" target="_blank" rel="noopener noreferrer">Star</a></span></div><div class="w-full h-auto grow lg:grid lg:grid-cols-[256px_1fr_256px]"><div class="w-full hidden lg:block"><div class="flex flex-col justify-start items-start w-full mt-2 p-4 pl-8"><a class="hover:opacity-75" href="/blog/list-your-open-source-project"><img class="w-auto max-w-full" src="/assets/starlet-icon.webp"/></a><div><div class="w-full flex flex-row justify-between items-center my-2"><h3 class="text-sm font-medium text-gray-400 leading-6">Playbook</h3></div><ul class="list-disc list-inside"><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/how-to-use-github-star-history"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">📕 How to Use this Site</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/playbook-for-more-github-stars"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">⭐️ How to Get More Stars</span></a></li></ul></div><div><div class="w-full flex flex-row justify-between items-center my-2"><h3 class="text-sm font-medium text-gray-400 leading-6">Monthly Pick</h3></div><ul class="list-disc list-inside"><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/ai-devtools"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Nov (AI DevTools)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/homelab"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Oct (Homelab)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/ai-agents"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Sep (AI Agents)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/rag-frameworks"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Aug (RAG frameworks)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/ai-generators"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Jul (AI Generators)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/ai-search"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Jun (AI Searches)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/ai-web-scraper"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 May (AI Web Scraper)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/prompt-engineering"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Apr (AI Prompt)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/non-ai"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Mar (Non-AI)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/most-underrated"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Feb (Most Underrated)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/text2sql"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Jan (Text2SQL)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/gpt-wrappers"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Dec (GPT Wrappers)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/tts"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Nov (TTS)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/ai-for-postgres"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Oct (AI for Postgres)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/coding-ai"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Sept (Coding AI)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/cli-tool-for-llm"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Aug (CLI tool for LLMs)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/llama2"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 July (Llama 2 Edition)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-monthly-pick-202306"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 June</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-monthly-pick-202305"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 May</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-monthly-pick-202304"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Apr</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-monthly-pick-202303"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Mar (ChatGPT Edition)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-monthly-pick-202302"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Feb</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-monthly-pick-202301"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Jan</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-monthly-pick-202212"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2022 Dec</span></a></li></ul></div><div><div class="w-full flex flex-row justify-between items-center my-2"><h3 class="text-sm font-medium text-gray-400 leading-6">Yearly Pick</h3></div><ul class="list-disc list-inside"><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/best-of-2023"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-yearly-pick-2022-data-infra-devtools"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2022 Data, Infra &amp; DevTools</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-open-source-2022-platform-engineering"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2022 Platform Engineering</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-open-source-2022-open-source-alternatives"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2022 OSS Alternatives</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-yearly-pick-2022-frontend"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2022 Front-end</span></a></li></ul></div><div><div class="w-full flex flex-row justify-between items-center my-2"><h3 class="text-sm font-medium text-gray-400 leading-6">Starlet List</h3></div><ul class="list-disc list-inside"><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/list-your-open-source-project"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">🎁 Prompt yours for FREE</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/trench"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #28 - Trench</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/langfuse"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #27 - langfuse</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/thepipe"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #26 - thepi.pe</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/taipy"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #25 - Taipy</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/superlinked"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #24 - Superlinked</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/tea-tasting"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #23 - tea-tasting</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/giskard"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #22 - Giskard</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/khoj"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #21 - Khoj</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/paradedb"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #20 - ParadeDB</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/skyvern"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #19 - Skyvern</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/prisma"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #18 - Prisma</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/spicedb"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #17 - SpiceDB</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/answer"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #16 - Apache Answer</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/infinity"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #15 - Infinity</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/proton"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #14 - Proton</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/earthly"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #13 - Earthly</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/wasp"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #12 - Wasp</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/libsql"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #11 - libSQL</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/postgresml"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #10 - PostgresML</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/electricsql"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #9 - ElectricSQL</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/prompt-flow"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #8 - Prompt flow</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/clipboard"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #7 - Clipboard</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/hoppscotch"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #6 - Hoppscotch</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/metisfl"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #5 - MetisFL</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/chatgpt-js"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #4 - chatgpt.js</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/mockoon"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #3 - Mockoon</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/dlta-ai"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #2 - DLTA-AI</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/sniffnet"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #1 - Sniffnet</span></a></li></ul></div></div></div><div class="w-full flex flex-col justify-start items-center"><div class="w-full p-4 md:p-0 mt-6 md:w-5/6 lg:max-w-6xl h-full flex flex-col justify-start items-center self-center"><img class="hidden md:block w-auto max-w-full object-scale-down" src="/assets/blog/giskard/banner.webp" alt=""/><div class="w-auto max-w-6xl mt-4 md:mt-12 prose prose-indigo prose-xl md:prose-2xl flex flex-col justify-center items-center"><h1 class="leading-16">Starlet #22 Giskard: Open-Source Evaluation &amp; Testing for LLMs and ML models</h1></div><div class="w-full mt-8 mb-2 max-w-6xl px-2 flex flex-row items-center justify-center text-sm text-gray-900 font-semibold trackingwide uppercase"><div class="flex space-x-1 text-gray-500"><span class="text-gray-900">Blanca Rivera Campos</span><span aria-hidden="true"> · </span><time dateTime="2024-06-21:00:00.000Z">Jun 21, 2024</time><span aria-hidden="true"> · </span><span> <!-- -->3<!-- --> min read </span></div></div><div class="mt-8 w-full max-w-5xl prose prose-indigo prose-xl md:prose-2xl"><p><em>This is the twenty-second issue of The Starlet List. If you want to prompt your open source project on star-history.com for free, please check out our <a href="/blog/list-your-open-source-project">announcement</a>.</em></p> <hr> <p>Giskard develops an open-source <strong>AI testing framework</strong> that helps to identify risks of AI models - with a specific focus on LLM agents. This helps AI developers save time by <em><strong>automating the evaluation</strong></em> process, and save money by avoiding costly AI incidents.</p> <h2>Problem: Why is AI testing important?</h2> <p>LLMs are susceptible to issues that traditional models don&#39;t face, such as hallucinations, prompt injection, and vulnerabilities to adversarial attacks. Developers often struggle with where to start, what issues to focus on, and how to implement effective tests for these risks.</p> <p>Meanwhile, the pressure to deploy LLMs quickly is constant, often pushing models into production with hidden vulnerabilities. But failures in LLMs can lead to legal liability, reputational damage, and costly service disruptions.</p> <p>This is why we created Giskard, an open-source framework to test AI models automatically, with a holistic coverage of AI risks, specifically designed to address the challenges of testing LLMs.</p> <h2>Solution: Giskard’s Open-Source Evaluation &amp; Testing for LLMs and ML models</h2> <p>Our open-source solution offers a Python library that you can use in your notebook. It helps you <strong>test AI models</strong>, whether it&#39;s an <em><strong>LLM application</strong></em> such as a <em><strong>RAG agent</strong></em> or a tabular model. </p> <p>With the <code>giskard.scan</code> method, our library can <strong>detect the main vulnerabilities</strong> of AI models using advanced adversarial testing techniques. For LLMs, it helps you detect hallucinations, sensitive information disclosure, prompt injections, and more. </p> <p><img src="/assets/blog/giskard/llm-scan.webp" alt="Giskard LLM scan for vulneabilities"></p> <p>You can export this scan with a customizable test suite, which you can integrate into your CI/CD pipeline.</p> <p>If you&#39;re testing a RAG application, you can automate its evaluation using RAGET, Giskard&#39;s RAG Evaluation Toolkit. It generates realistic test cases automatically to detect weaknesses and evaluate answer correctness across your RAG agent components.</p> <p><img src="/assets/blog/giskard/raget.webp" alt="Giskard RAG Evaluation toolkit"></p> <p><img src="/assets/blog/giskard/raget-plot.webp" alt="Giskard RAG Evaluation toolkit"></p> <h2>Open-source and fully integrated with the ML and LLM ecosystem</h2> <ul> <li>Fully open-source Python library</li> <li>Compatible with all proprietary and open-source LLM providers</li> <li>Integrated with:<ul> <li>Databricks MLFlow</li> <li>Pytest</li> <li>GitHub</li> <li>Nvidia NeMo Guardrails</li> <li>And more!</li> </ul> </li> </ul> <p><img src="/assets/blog/giskard/integrations.webp" alt="Giskard&#39;s integrations"></p> <h2>Getting started</h2> <p>Install Giskard</p> <pre><code class="language-sh">pip install &quot;giskard[llm]&quot; </code></pre> <p>In this <a href="https://colab.research.google.com/github/giskard-ai/giskard/blob/main/docs/getting_started/quickstart/quickstart_llm.ipynb">tutorial</a> we will use Giskard’s LLM Scan to automatically detect issues on a Retrieval Augmented Generation (RAG) task.</p> <p><img src="/assets/blog/giskard/notebook.webp" alt="LLM Quickstart"></p> <p>For more details, check out the <a href="https://docs.giskard.ai/en/stable/getting_started/quickstart/quickstart_llm.html">docs</a>.</p> <h2>Future development</h2> <p>Giskard is simplifying AI testing for developers. We want to become the go-to testing library for Data Scientists and AI Engineers and empower you to build AI applications without the usual complexities.</p> <p>To stay up-to-date with our progress or contribute to the project, we welcome your feedback, questions, and comments. You can check out our <a href="https://github.com/Giskard-AI/giskard">GitHub</a> or join our community on <a href="https://discord.com/invite/ABvfpbu69R">Discord</a> to chat with us directly!</p> </div></div><div class="mt-12"><iframe src="https://embeds.beehiiv.com/2803dbaa-d8dd-4486-8880-4b843f3a7da6?slim=true" data-test-id="beehiiv-embed" height="52" frameBorder="0" scrolling="no" style="margin:0;border-radius:0px !important;background-color:transparent"></iframe></div></div><div class="w-full hidden lg:block"></div></div><footer class="relative w-full shrink-0 h-auto mt-6 flex flex-col justify-end items-center"><div class="w-full py-2 px-3 md:w-5/6 lg:max-w-7xl flex flex-row flex-wrap justify-between items-center text-neutral-700 border-t"><div class="text-sm leading-8 flex flex-row flex-wrap justify-start items-center"><div class="h-full text-gray-600">The missing GitHub star history graph</div><a class="h-full flex flex-row justify-center items-center ml-3 text-lg hover:opacity-80" href="https://twitter.com/StarHistoryHQ" target="_blank" rel="noopener noreferrer"><svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 512 512" height="1em" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg></a><a class="h-full flex flex-row justify-center items-center mx-3 text-lg hover:opacity-80" href="mailto:star@bytebase.com" target="_blank" rel="noopener noreferrer"><svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 512 512" height="1em" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M502.3 190.8c3.9-3.1 9.7-.2 9.7 4.7V400c0 26.5-21.5 48-48 48H48c-26.5 0-48-21.5-48-48V195.6c0-5 5.7-7.8 9.7-4.7 22.4 17.4 52.1 39.5 154.1 113.6 21.1 15.4 56.7 47.8 92.2 47.6 35.7.3 72-32.8 92.3-47.6 102-74.1 131.6-96.3 154-113.7zM256 320c23.2.4 56.6-29.2 73.4-41.4 132.7-96.3 142.8-104.7 173.4-128.7 5.8-4.5 9.2-11.5 9.2-18.9v-19c0-26.5-21.5-48-48-48H48C21.5 64 0 85.5 0 112v19c0 7.4 3.4 14.3 9.2 18.9 30.6 23.9 40.7 32.4 173.4 128.7 16.8 12.2 50.2 41.8 73.4 41.4z"></path></svg></a><a class="h-full flex flex-row justify-center items-center mr-3 text-lg hover:opacity-80" href="https://github.com/star-history/star-history" target="_blank" rel="noopener noreferrer"><svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 496 512" height="1em" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg></a></div><div class="flex flex-row flex-wrap items-center space-x-4"><div class="flex flex-row text-sm leading-8 underline text-blue-700 hover:opacity-80"><img class="h-6 mt-1 mr-2" src="/assets/sqlchat.webp" alt="SQL Chat"/><a href="https://sqlchat.ai" target="_blank" rel="noopener noreferrer"> <!-- -->SQL Chat<!-- --> </a></div><div class="flex flex-row text-sm leading-8 underline text-blue-700 hover:opacity-80"><img class="h-6 mt-1 mr-2" src="/assets/dbcost.webp" alt="DB Cost"/><a href="https://dbcost.com" target="_blank" rel="noopener noreferrer">DB Cost</a></div></div><div class="text-xs leading-8 flex flex-row flex-nowrap justify-end items-center"><span class="text-gray-600">Maintained by<!-- --> <a class="text-blue-500 font-bold hover:opacity-80" href="https://bytebase.com" target="_blank" rel="noopener noreferrer">Bytebase</a>, originally built by<!-- --> <a class="bg-blue-400 text-white p-1 pl-2 pr-2 rounded-l-2xl rounded-r-2xl hover:opacity-80" href="https://twitter.com/tim_qian" target="_blank" rel="noopener noreferrer">@tim_qian</a></span></div></div></footer><div class="fixed right-0 top-32 hidden lg:flex flex-col justify-start items-start transition-all bg-white w-48 xl:w-56 p-2 z-10 "><div class="w-full flex justify-between items-center mb-2"><p class="text-xs text-gray-400">Sponsors (random order)</p><svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 352 512" class="fas fa-times text-xs text-gray-400 cursor-pointer hover:text-gray-500" height="1em" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M242.72 256l100.07-100.07c12.28-12.28 12.28-32.19 0-44.48l-22.24-22.24c-12.28-12.28-32.19-12.28-44.48 0L176 189.28 75.93 89.21c-12.28-12.28-32.19-12.28-44.48 0L9.21 111.45c-12.28 12.28-12.28 32.19 0 44.48L109.28 256 9.21 356.07c-12.28 12.28-12.28 32.19 0 44.48l22.24 22.24c12.28 12.28 32.2 12.28 44.48 0L176 322.72l100.07 100.07c12.28 12.28 32.2 12.28 44.48 0l22.24-22.24c12.28-12.28 12.28-32.19 0-44.48L242.72 256z"></path></svg></div><a href="https://dify.ai/?utm_source=star-history" class="bg-gray-50 p-2 rounded w-full flex flex-col justify-center items-center mb-2 text-zinc-600 hover:opacity-80 hover:text-blue-600 hover:underline" target="_blank"><img class="w-auto max-w-full" src="/assets/sponsors/dify/logo.webp" alt="Dify"/><span class="text-xs mt-2">Dify: Open-source platform for building LLM apps, from agents to AI workflows.</span></a><a href="https://bytebase.com?utm_source=star-history" class="bg-gray-50 p-2 rounded w-full flex flex-col justify-center items-center mb-2 text-zinc-600 hover:opacity-80 hover:text-blue-600 hover:underline" target="_blank"><img class="w-auto max-w-full" src="/assets/sponsors/bytebase/logo.webp" alt="Bytebase"/><span class="text-xs mt-2">Bytebase: Database DevOps and CI/CD for MySQL, PG, Oracle, SQL Server, Snowflake, ClickHouse, Mongo, Redis</span></a><a href="mailto:star@bytebase.com?subject=I&#x27;m interested in sponsoring star-history.com" target="_blank" class="w-full p-2 text-center bg-gray-50 text-xs leading-6 text-gray-400 rounded hover:underline hover:text-blue-600">Your logo</a></div></div></div><script id="__NEXT_DATA__" type="application/json" crossorigin="">{"props":{"pageProps":{"blog":{"title":"Starlet #22 Giskard: Open-Source Evaluation \u0026 Testing for LLMs and ML models","slug":"giskard","author":"Blanca Rivera Campos","featured":true,"featureImage":"/assets/blog/giskard/banner.webp","publishedDate":"2024-06-21:00:00.000Z","excerpt":"Giskard develops an open-source AI testing framework that helps to identify risks of AI models - with a specific focus on LLM agents.","readingTime":3},"parsedBlogHTML":"\u003cp\u003e\u003cem\u003eThis is the twenty-second issue of The Starlet List. If you want to prompt your open source project on star-history.com for free, please check out our \u003ca href=\"/blog/list-your-open-source-project\"\u003eannouncement\u003c/a\u003e.\u003c/em\u003e\u003c/p\u003e\n\u003chr\u003e\n\u003cp\u003eGiskard develops an open-source \u003cstrong\u003eAI testing framework\u003c/strong\u003e that helps to identify risks of AI models - with a specific focus on LLM agents. This helps AI developers save time by \u003cem\u003e\u003cstrong\u003eautomating the evaluation\u003c/strong\u003e\u003c/em\u003e process, and save money by avoiding costly AI incidents.\u003c/p\u003e\n\u003ch2\u003eProblem: Why is AI testing important?\u003c/h2\u003e\n\u003cp\u003eLLMs are susceptible to issues that traditional models don\u0026#39;t face, such as hallucinations, prompt injection, and vulnerabilities to adversarial attacks. Developers often struggle with where to start, what issues to focus on, and how to implement effective tests for these risks.\u003c/p\u003e\n\u003cp\u003eMeanwhile, the pressure to deploy LLMs quickly is constant, often pushing models into production with hidden vulnerabilities. But failures in LLMs can lead to legal liability, reputational damage, and costly service disruptions.\u003c/p\u003e\n\u003cp\u003eThis is why we created Giskard, an open-source framework to test AI models automatically, with a holistic coverage of AI risks, specifically designed to address the challenges of testing LLMs.\u003c/p\u003e\n\u003ch2\u003eSolution: Giskard’s Open-Source Evaluation \u0026amp; Testing for LLMs and ML models\u003c/h2\u003e\n\u003cp\u003eOur open-source solution offers a Python library that you can use in your notebook. It helps you \u003cstrong\u003etest AI models\u003c/strong\u003e, whether it\u0026#39;s an \u003cem\u003e\u003cstrong\u003eLLM application\u003c/strong\u003e\u003c/em\u003e such as a \u003cem\u003e\u003cstrong\u003eRAG agent\u003c/strong\u003e\u003c/em\u003e or a tabular model. \u003c/p\u003e\n\u003cp\u003eWith the \u003ccode\u003egiskard.scan\u003c/code\u003e method, our library can \u003cstrong\u003edetect the main vulnerabilities\u003c/strong\u003e of AI models using advanced adversarial testing techniques. For LLMs, it helps you detect hallucinations, sensitive information disclosure, prompt injections, and more. \u003c/p\u003e\n\u003cp\u003e\u003cimg src=\"/assets/blog/giskard/llm-scan.webp\" alt=\"Giskard LLM scan for vulneabilities\"\u003e\u003c/p\u003e\n\u003cp\u003eYou can export this scan with a customizable test suite, which you can integrate into your CI/CD pipeline.\u003c/p\u003e\n\u003cp\u003eIf you\u0026#39;re testing a RAG application, you can automate its evaluation using RAGET, Giskard\u0026#39;s RAG Evaluation Toolkit. It generates realistic test cases automatically to detect weaknesses and evaluate answer correctness across your RAG agent components.\u003c/p\u003e\n\u003cp\u003e\u003cimg src=\"/assets/blog/giskard/raget.webp\" alt=\"Giskard RAG Evaluation toolkit\"\u003e\u003c/p\u003e\n\u003cp\u003e\u003cimg src=\"/assets/blog/giskard/raget-plot.webp\" alt=\"Giskard RAG Evaluation toolkit\"\u003e\u003c/p\u003e\n\u003ch2\u003eOpen-source and fully integrated with the ML and LLM ecosystem\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003eFully open-source Python library\u003c/li\u003e\n\u003cli\u003eCompatible with all proprietary and open-source LLM providers\u003c/li\u003e\n\u003cli\u003eIntegrated with:\u003cul\u003e\n\u003cli\u003eDatabricks MLFlow\u003c/li\u003e\n\u003cli\u003ePytest\u003c/li\u003e\n\u003cli\u003eGitHub\u003c/li\u003e\n\u003cli\u003eNvidia NeMo Guardrails\u003c/li\u003e\n\u003cli\u003eAnd more!\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cimg src=\"/assets/blog/giskard/integrations.webp\" alt=\"Giskard\u0026#39;s integrations\"\u003e\u003c/p\u003e\n\u003ch2\u003eGetting started\u003c/h2\u003e\n\u003cp\u003eInstall Giskard\u003c/p\u003e\n\u003cpre\u003e\u003ccode class=\"language-sh\"\u003epip install \u0026quot;giskard[llm]\u0026quot;\n\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eIn this \u003ca href=\"https://colab.research.google.com/github/giskard-ai/giskard/blob/main/docs/getting_started/quickstart/quickstart_llm.ipynb\"\u003etutorial\u003c/a\u003e we will use Giskard’s LLM Scan to automatically detect issues on a Retrieval Augmented Generation (RAG) task.\u003c/p\u003e\n\u003cp\u003e\u003cimg src=\"/assets/blog/giskard/notebook.webp\" alt=\"LLM Quickstart\"\u003e\u003c/p\u003e\n\u003cp\u003eFor more details, check out the \u003ca href=\"https://docs.giskard.ai/en/stable/getting_started/quickstart/quickstart_llm.html\"\u003edocs\u003c/a\u003e.\u003c/p\u003e\n\u003ch2\u003eFuture development\u003c/h2\u003e\n\u003cp\u003eGiskard is simplifying AI testing for developers. We want to become the go-to testing library for Data Scientists and AI Engineers and empower you to build AI applications without the usual complexities.\u003c/p\u003e\n\u003cp\u003eTo stay up-to-date with our progress or contribute to the project, we welcome your feedback, questions, and comments. You can check out our \u003ca href=\"https://github.com/Giskard-AI/giskard\"\u003eGitHub\u003c/a\u003e or join our community on \u003ca href=\"https://discord.com/invite/ABvfpbu69R\"\u003eDiscord\u003c/a\u003e to chat with us directly!\u003c/p\u003e\n"},"__N_SSG":true},"page":"/blog/[slug]","query":{"slug":"giskard"},"buildId":"xKX4ZiOi_N7h3OBOEsSZu","isFallback":false,"gsp":true,"scriptLoader":[]}</script></body></html>

Pages: 1 2 3 4 5 6 7 8 9 10