Starlet #15 Infinity: the AI-Native Database Powering the Next-Gen RAG for LLM

<!DOCTYPE html><html lang="en"> <head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width"/><link rel="icon" href="/assets/favicon.ico"/><script defer="" data-domain="star-history.com" src="https://plausible.io/js/script.js"></script><meta name="next-head-count" content="4"/><link data-next-font="size-adjust" rel="preconnect" href="/" crossorigin="anonymous"/><link rel="preload" href="/_next/static/css/f94657194d4c857a.css" as="style" crossorigin=""/><link rel="stylesheet" href="/_next/static/css/f94657194d4c857a.css" crossorigin="" data-n-g=""/><noscript data-n-css=""></noscript><script defer="" crossorigin="" nomodule="" src="/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js"></script><script src="/_next/static/chunks/webpack-38cee4c0e358b1a3.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/framework-fda0a023b274c574.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/main-001c9e19b1894c7d.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/pages/_app-915effad870aa62e.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/6c86d9ce-d8b7531786dd65a5.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/472-8057db644de3d496.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/590-d0a3c67c09cc0662.js" defer="" crossorigin=""></script><script src="/_next/static/chunks/pages/blog/%5Bslug%5D-7b378153203b51eb.js" defer="" crossorigin=""></script><script src="/_next/static/xKX4ZiOi_N7h3OBOEsSZu/_buildManifest.js" defer="" crossorigin=""></script><script src="/_next/static/xKX4ZiOi_N7h3OBOEsSZu/_ssgManifest.js" defer="" crossorigin=""></script></head><body><div id="__next"><div class="relative w-full h-auto min-h-screen overflow-auto flex flex-col"><title>Starlet #15 Infinity: the AI-Native Database Powering the Next-Gen RAG for LLM</title><meta name="description" content="Infinity is an AI-native database specifically designed to cater to 
large models and is primarily used for Retrieval Augmented Generation (RAG)."/><meta property="og:type" content="website"/><meta property="og:url" content="https://star-history.com/blog/infinity"/><meta property="og:title" content="Starlet #15 Infinity: the AI-Native Database Powering the Next-Gen RAG for LLM"/><meta property="og:description" content="Infinity is an AI-native database specifically designed to cater to large models and is primarily used for Retrieval Augmented Generation (RAG)."/><meta property="og:image" content="https://star-history.com/assets/blog/infinity/banner.webp"/><meta name="twitter:card" content="summary_large_image"/><meta name="twitter:url" content="https://star-history.com/blog/infinity"/><meta name="twitter:title" content="Starlet #15 Infinity: the AI-Native Database Powering the Next-Gen RAG for LLM"/><meta name="twitter:description" content="Infinity is an AI-native database specifically designed to cater to large models and is primarily used for Retrieval Augmented Generation (RAG)."/><meta name="twitter:image" content="https://star-history.com/assets/blog/infinity/banner.webp"/><nav><div class="flex justify-center items-center gap-x-6 bg-green-600 px-6 py-1 sm:px-3.5 "><p class="text-sm leading-6 text-white"><a href="/blog/list-your-open-source-project">Want to promote your open source project? 
Be on our ⭐️Starlet List⭐️ for FREE →</a></p></div></nav><header class="w-full h-14 shrink-0 flex flex-row justify-center items-center bg-[#363636] text-light"><div class="w-full md:max-w-5xl lg:max-w-7xl h-full flex flex-row justify-between items-center px-0 sm:px-4"><div class="h-full bg-dark flex flex-row justify-start items-center"><a class="h-full flex flex-row justify-center items-center px-3 hover:bg-zinc-800" href="/"><img class="w-7 h-auto" src="/assets/icon.png" alt="Logo"/></a><a class="h-full flex flex-row justify-center items-center text-base px-3 hover:bg-zinc-800" href="/blog"><span class="text-white font-semibold -2">Blog</span></a><span class="h-full flex flex-row justify-center items-center cursor-pointer text-white text-base px-3 font-semibold mr-2 hover:bg-zinc-800">Add Access Token</span></div><div class="hidden h-full md:flex flex-row justify-start items-center"><a target="_blank" rel="noopener noreferrer" class="h-full flex text-white text-base flex-row justify-center items-center px-4 hover:bg-zinc-800" href="https://www.bytebase.com/?source=star-history"><img class="h-6 mt-1 mr-2" src="/assets/craft-by-bytebase.webp" alt=""/></a></div><div class="h-full hidden md:flex flex-row justify-end items-center space-x-2"><a class="h-full flex flex-row justify-center items-center px-2 hover:bg-zinc-800" href="https://twitter.com/StarHistoryHQ" target="_blank" rel="noopener noreferrer"><i class="fab fa-twitter text-2xl text-blue-300"></i></a></div><div class="h-full flex md:hidden flex-row justify-end items-center"><span class="relative h-full w-10 px-3 flex flex-row justify-center items-center cursor-pointer font-semibold text-light hover:bg-zinc-800"><span class="w-4 transition-all h-px bg-light absolute top-1/2 -mt-1"></span><span class="w-4 transition-all h-px bg-light absolute top-1/2 "></span><span class="w-4 transition-all h-px bg-light absolute top-1/2 mt-1"></span></span></div></div></header><div class="w-full h-auto py-2 flex md:hidden 
flex-col justify-start items-start shadow-lg border-b hidden"><a class="h-12 text-base px-3 w-full flex flex-row justify-start items-center cursor-pointer font-semibold text-dark mr-2 hover:bg-gray-100 hover:text-blue-500" href="/blog/how-to-use-github-star-history">📕 How to use this site</a><span class="h-12 px-3 text-base w-full flex flex-row justify-start items-center cursor-pointer font-semibold text-dark mr-2 hover:bg-gray-100 hover:text-blue-500">Add Access Token</span><span class="h-12 text-base px-3 w-full flex flex-row justify-start items-center"><a class="github-button -mt-1" href="https://github.com/star-history/star-history" data-show-count="true" aria-label="Star star-history/star-history on GitHub" target="_blank" rel="noopener noreferrer">Star</a></span></div><div class="w-full h-auto grow lg:grid lg:grid-cols-[256px_1fr_256px]"><div class="w-full hidden lg:block"><div class="flex flex-col justify-start items-start w-full mt-2 p-4 pl-8"><a class="hover:opacity-75" href="/blog/list-your-open-source-project"><img class="w-auto max-w-full" src="/assets/starlet-icon.webp"/></a><div><div class="w-full flex flex-row justify-between items-center my-2"><h3 class="text-sm font-medium text-gray-400 leading-6">Playbook</h3></div><ul class="list-disc list-inside"><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/how-to-use-github-star-history"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">📕 How to Use this Site</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/playbook-for-more-github-stars"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">⭐️ How to Get More Stars</span></a></li></ul></div><div><div class="w-full flex flex-row justify-between items-center my-2"><h3 class="text-sm font-medium text-gray-400 leading-6">Monthly Pick</h3></div><ul class="list-disc list-inside"><li class="mb-2 leading-3"><a class="cursor-pointer" 
rel="noopener noreferrer" href="/blog/ai-devtools"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Nov (AI DevTools)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/homelab"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Oct (Homelab)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/ai-agents"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Sep (AI Agents)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/rag-frameworks"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Aug (RAG frameworks)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/ai-generators"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Jul (AI Generators)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/ai-search"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Jun (AI Searches)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/ai-web-scraper"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 May (AI Web Scraper)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/prompt-engineering"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Apr (AI Prompt)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/non-ai"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Mar (Non-AI)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/most-underrated"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Feb (Most 
Underrated)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/text2sql"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2024 Jan (Text2SQL)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/gpt-wrappers"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Dec (GPT Wrappers)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/tts"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Nov (TTS)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/ai-for-postgres"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Oct (AI for Postgres)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/coding-ai"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Sept (Coding AI)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/cli-tool-for-llm"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Aug (CLI tool for LLMs)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/llama2"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 July (Llama 2 Edition)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-monthly-pick-202306"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 June</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-monthly-pick-202305"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 May</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" 
href="/blog/star-history-monthly-pick-202304"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Apr</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-monthly-pick-202303"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Mar (ChatGPT Edition)</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-monthly-pick-202302"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Feb</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-monthly-pick-202301"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023 Jan</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-monthly-pick-202212"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2022 Dec</span></a></li></ul></div><div><div class="w-full flex flex-row justify-between items-center my-2"><h3 class="text-sm font-medium text-gray-400 leading-6">Yearly Pick</h3></div><ul class="list-disc list-inside"><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/best-of-2023"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2023</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-yearly-pick-2022-data-infra-devtools"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2022 Data, Infra &amp; DevTools</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-open-source-2022-platform-engineering"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2022 Platform Engineering</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" 
href="/blog/star-history-open-source-2022-open-source-alternatives"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2022 OSS Alternatives</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/star-history-yearly-pick-2022-frontend"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">2022 Front-end</span></a></li></ul></div><div><div class="w-full flex flex-row justify-between items-center my-2"><h3 class="text-sm font-medium text-gray-400 leading-6">Starlet List</h3></div><ul class="list-disc list-inside"><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/list-your-open-source-project"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">🎁 Promote yours for FREE</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/trench"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #28 - Trench</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/langfuse"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #27 - langfuse</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/thepipe"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #26 - thepi.pe</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/taipy"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #25 - Taipy</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/superlinked"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #24 - Superlinked</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/tea-tasting"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue 
#23 - tea-tasting</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/giskard"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #22 - Giskard</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/khoj"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #21 - Khoj</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/paradedb"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #20 - ParadeDB</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/skyvern"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #19 - Skyvern</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/prisma"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #18 - Prisma</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/spicedb"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #17 - SpiceDB</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/answer"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #16 - Apache Answer</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/infinity"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #15 - Infinity</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/proton"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #14 - Proton</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/earthly"><span class="inline -ml-2 text-sm text-blue-700 
hover:underline">Issue #13 - Earthly</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/wasp"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #12 - Wasp</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/libsql"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #11 - libSQL</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/postgresml"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #10 - PostgresML</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/electricsql"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #9 - ElectricSQL</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/prompt-flow"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #8 - Prompt flow</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/clipboard"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #7 - Clipboard</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/hoppscotch"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #6 - Hoppscotch</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/metisfl"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #5 - MetisFL</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/chatgpt-js"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #4 - chatgpt.js</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/mockoon"><span class="inline 
-ml-2 text-sm text-blue-700 hover:underline">Issue #3 - Mockoon</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/dlta-ai"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #2 - DLTA-AI</span></a></li><li class="mb-2 leading-3"><a class="cursor-pointer" rel="noopener noreferrer" href="/blog/sniffnet"><span class="inline -ml-2 text-sm text-blue-700 hover:underline">Issue #1 - Sniffnet</span></a></li></ul></div></div></div><div class="w-full flex flex-col justify-start items-center"><div class="w-full p-4 md:p-0 mt-6 md:w-5/6 lg:max-w-6xl h-full flex flex-col justify-start items-center self-center"><img class="hidden md:block w-auto max-w-full object-scale-down" src="/assets/blog/infinity/banner.webp" alt=""/><div class="w-auto max-w-6xl mt-4 md:mt-12 prose prose-indigo prose-xl md:prose-2xl flex flex-col justify-center items-center"><h1 class="leading-16">Starlet #15 Infinity: the AI-Native Database Powering the Next-Gen RAG for LLM</h1></div><div class="w-full mt-8 mb-2 max-w-6xl px-2 flex flex-row items-center justify-center text-sm text-gray-900 font-semibold trackingwide uppercase"><div class="flex space-x-1 text-gray-500"><span class="text-gray-900">Yingfeng Zhang</span><span aria-hidden="true"> · </span><time dateTime="2024-01-05:00:00.000Z">Jan 5, 2024</time><span aria-hidden="true"> · </span><span> <!-- -->9<!-- --> min read </span></div></div><div class="mt-8 w-full max-w-5xl prose prose-indigo prose-xl md:prose-2xl"><p><em>This is the fifteenth issue of The Starlet List. If you want to promote your open source project on star-history.com for free, please check out our <a href="/blog/list-your-open-source-project">announcement</a>.</em></p> <hr> <p>After extensive development, the AI-native database Infinity was officially open-sourced on December 21, 2023. GitHub repo: <a href="https://github.com/infiniflow/infinity">https://github.com/infiniflow/infinity</a>. 
Infinity is designed specifically for large models and is primarily used for Retrieval-Augmented Generation (RAG). In the future, the infrastructure layer of enterprise AI applications will require only an AI-native database combined with a large model (an LLM today, multi-modal models in the future) to address the core needs of enterprise AI applications, including Copilot, search, recommendations, and conversational AI. All types of enterprise data, including documents, regular databases (OLTP and OLAP), APIs, logs, and unstructured data, can be integrated into a single AI-native database. The database feeds the data retrieved for each business query to the large model, which generates the final results for specific applications.</p> <p><img src="/assets/blog/infinity/applications.webp" alt="applications"></p> <h2>Vector databases alone are insufficient for enterprise AI applications</h2> <p>You might be wondering: What is an AI-native database? Is it just an old vector database under a new brand? Absolutely not! An AI-native database goes beyond a vector database. Vector databases are “necessary but not sufficient” infrastructure for large language models. Why? Vectors are limited to semantic retrieval, and they are not suited to the precise queries required by enterprise AI applications.</p> <p>For instance, consider the task of filtering content based on access permissions from an authorization table. This seemingly simple but common operation cannot be accomplished using vectors alone. Some vector databases do support basic filtering and appear to work in such scenarios, but this requires Extract-Transform-Load (ETL) to write the permission fields into scalar fields in the vector database before they can be filtered on. This means three things:</p> <ol> <li>Introduces high-cost ETL for simple requirements.</li> <li>Updates to the raw data cannot be reflected in the business application in time.</li> <li>Introduces unnecessary data inflation. 
Permission filtering is just one example. The inflation becomes apparent when ETL alone is used to handle diverse queries from multiple data sources: it is akin to storing a wide table that includes every filtering field within the vector database. This approach complicates system maintenance and data updates, as mentioned above, and bloats the stored data. In a typical enterprise architecture, wide tables are needed only in offline data lakes.</li> </ol> <p>As another example, most RAG (Retrieval-Augmented Generation) applications require precise retrieval. When a user asks a question about the contents of a table within a PDF document, vectors alone cannot provide accurate answers, which leads to hallucinations in the answers returned by the LLM. Precise retrieval of this kind can only be achieved with the full-text search that search engines provide.</p> <p>The infrastructure for AI has thus evolved through three generations:</p> <p><img src="/assets/blog/infinity/iterations.webp" alt="iterations"></p> <p>The initial phase of AI relied on statistics and data mining, and was characterized by the search engine. Elasticsearch and databases like MySQL were commonly used as the infrastructure supporting enterprise AI applications.</p> <p>The next phase of AI brought deep learning, which led to vector search and the rise of vector databases. But because vector databases are capable only of vector search, they must be combined with various other databases to form what is known as AI PaaS.</p> <p>In the latest phase, the arrival of large language models has opened up numerous new opportunities. However, a vector database alone cannot meet the demands of these scenarios. We now need an infrastructure that can perform vector search, full-text search, and structured data retrieval simultaneously. 
Additionally, it must support complex data retrieval for large language models and work in collaboration with them to fulfill the requirements of enterprise businesses.</p> <h2>AI-native database is not a traditional database or data lake plus vector plugins</h2> <p>You might point out that PostgreSQL, a traditional relational database, has the pgvector plugin for vector search. However, you cannot directly create an AI-native database by adding vector search to a traditional database. Let’s see whether PostgreSQL can address the main problems in AI’s RAG applications. Once you have the answers, the conclusion should be clear.</p> <h3>How to implement full-text search required by precise recall?</h3> <p>PostgreSQL is an OLTP (Online Transaction Processing) database that focuses on ensuring ACID compliance for data writes. Its design has no inherent connection to vector search or full-text search. Although PostgreSQL has offered full-text search for over a decade, enterprises tend to use Elasticsearch for full-text search rather than PostgreSQL. The reason is that PostgreSQL’s full-text search capability is best suited for small-scale and straightforward searches. An AI-native database paired with RAG, on the other hand, needs to handle various data scales, perform customizable relevance ranking, and especially integrate multiple recalls (including vectors) for fusion ranking. These are tasks that PostgreSQL is not equipped to handle.</p> <h3>How to balance scalability, cost, and other factors?</h3> <p>PostgreSQL is a standalone database. In distributed scenarios, the prevailing approach is to use a shared-nothing architecture for scalability. However, vector search and full-text search have distinct constraints and optimization goals, making the use of such complex techniques for scalability unnecessary. 
As a result, the era of large models calls for a specialized database, rather than a traditional database with added plugins.</p> <h3>How about a data lake with vector search capabilities?</h3> <p>Some may consider incorporating vector search capabilities into data lakes as an alternative approach. However, this strategy faces conflicting design objectives. Data lakes traditionally serve offline scenarios where running complex SQL queries for internal business reports can take anywhere from minutes to hours. Although real-time data lakes exist from a technical standpoint, they are not specifically designed to handle high concurrency in online scenarios. Vector search, on the other hand, is commonly used in high-concurrency online business scenarios like search, recommenders, and conversational AI. As a result, there is a disconnect between vector search, which suits high-concurrency online workloads, and data lakes, which are built for high throughput rather than low latency, at least in terms of specific use cases.</p> <p>Therefore, as depicted in the blue box of the diagram below, Infinity is a product that integrates AI and data infrastructure. It is specifically designed to cater to online scenarios and fulfill the future requirements of enterprises regarding data infrastructure for large language models.</p> <p><img src="/assets/blog/infinity/ecosystem.webp" alt="ecosystem"></p> <h2>Infinity’s system architecture</h2> <p><img src="/assets/blog/infinity/architecture.webp" alt="architecture"></p> <p>As shown in the diagram above, Infinity consists of two major modules: the storage layer and the compute layer.</p> <h3>Storage Layer</h3> <ul> <li><strong>Columnar Storage:</strong> Infinity’s default storage is a columnar storage engine, which focuses on structured data retrieval and guarantees ACID compliance.</li> <li><strong>ANN Index (Approximate Nearest Neighbor Index):</strong> Infinity uses ANN indexes for vector search. 
Two types of indexes, IVF and memory-optimized HNSW, are currently available in Infinity. IVF is suitable for memory-constrained scenarios, while HNSW serves high-performance scenarios. In terms of efficiency, Infinity’s HNSW index, which employs local quantization techniques, outperforms other vector indexes by providing superior search performance with less memory consumption. Additionally, each ANN index is built on a single vector-type column of an Infinity table. This allows a table to have multiple vector columns, unlike traditional vector databases that support only one vector column. As a result, Infinity readily supports multi-vector queries.</li> <li><strong>Inverted Index:</strong> This index caters to both full-text search and structured data retrieval. It comprises two components: a full-text index for text-based searches, which incorporates relevance ranking using BM25 and supports precise recall functionalities like phrase queries, and a secondary index for structured data, delivering efficient filtering capabilities.</li> </ul> <h3>Compute Layer</h3> <ul> <li><strong>Parser:</strong> To assist AI developers, Infinity offers a Pythonic API and a PostgreSQL-compatible SQL dialect, supported by a parser written from scratch.</li> <li><strong>Executor:</strong> Infinity’s executor draws on a variety of indexes designed for different types of data and data distributions. It dynamically allocates resources and selects storage options and query strategies based on the requirements of each query. For structured queries, it selects either columnar storage or inverted indexes. When low latency is the priority, it can use multiple compute units to run a query in parallel. When high concurrency is the priority, it can use an inverted index on a single compute unit. The executor aims to choose the best fit for each situation. 
It follows a push-based pipeline execution plan, which is optimized for handling high volumes of data and concurrent tasks. Infinity’s Fusion operator is responsible for managing multiple recalls, sorting and selecting results from vector searches, full-text searches, and structured filtering. This approach avoids the inefficiencies and potential errors that may occur when conducting federated search across multiple databases.</li> </ul> <h2>Best-in-class vector search performance</h2> <p>Infinity is built with C++20, guaranteeing the best execution paths. It outperforms all existing vector databases in terms of vector search performance, thanks to its innovative algorithms. 
On an 8-core machine with a dataset of a million SIFT vectors, Infinity effortlessly achieves 10,000 QPS for high-concurrency situations, while in a single-client scenario, the query response latency is as low as 0.1 milliseconds, with minimal memory usage.</p> <p>In addition, Infinity has integrated C++ Modules to improve development efficiency. This makes it one of the first open-source projects to utilize C++ Modules. As a result, the time required to compile Infinity’s 200,000 lines of code, along with its dependencies amounting to a million lines of C++ code, is significantly reduced. The compilation time is reduced to minutes on a regular personal laptop. This is a great relief for traditional C++ programmers who no longer have to recompile numerous files when modifying a single header file, a process that used to take more than ten minutes. Lastly, we warmly invite you to join the <a href="https://discord.gg/jEfRUwEYEV">Infinity open-source community</a> and <a href="https://github.com/infiniflow/infinity">contribute code</a>, which will help accelerate the progress of Infinity towards its GA (general availability).</p> </div></div></div><div class="w-full hidden lg:block"></div></div></div></div></body></html>
