CINXE.COM
sqlite-vec now supports metadata columns and filtering | Alex Garcia's Blog
<html lang="en"><head><meta charSet="UTF-8" /><link rel="icon" type="image/svg+xml" href="/vite.svg" /><meta name="viewport" content="width=device-width, initial-scale=1.0" /><title>sqlite-vec now supports metadata columns and filtering | Alex Garcia's Blog</title><meta name="description" content="Metadata, partition key, and auxiliary column support in sqlite-vec" /><meta name="twitter:card" content="summary" /><meta name="twitter:image" content="https://blog-static.alxg.xyz/Screenshot%202024-11-20%20at%208.26.35%E2%80%AFAM.png" /><meta name="twitter:creator" content="@agarcia_me" /><meta property="og:url" content="https://alexgarcia.xyz/blog/2024/sqlite-vec-metadata-release/index.html" /><meta property="og:title" content="sqlite-vec now supports metadata columns and filtering" /><meta property="og:image" content="https://blog-static.alxg.xyz/Screenshot%202024-11-20%20at%208.26.35%E2%80%AFAM.png" /><meta property="og:type" content="article" /><meta property="og:description" content="Metadata, partition key, and auxiliary column support in sqlite-vec" /><meta property="og:updated_time" content="2024-11-20T00:00:00.000Z" /><style>/* https://unpkg.com/@catppuccin/palette@1.1.1/css/catppuccin.css */ :root { --ctp-latte-rosewater: #dc8a78; --ctp-latte-flamingo: #dd7878; --ctp-latte-pink: #ea76cb; --ctp-latte-mauve: #8839ef; --ctp-latte-red: #d20f39; --ctp-latte-maroon: #e64553; --ctp-latte-peach: #fe640b; --ctp-latte-yellow: #df8e1d; --ctp-latte-green: #40a02b; --ctp-latte-teal: #179299; --ctp-latte-sky: #04a5e5; --ctp-latte-sapphire: #209fb5; --ctp-latte-blue: #1e66f5; --ctp-latte-lavender: #7287fd; --ctp-latte-text: #4c4f69; --ctp-latte-subtext1: #5c5f77; --ctp-latte-subtext0: #6c6f85; --ctp-latte-overlay2: #7c7f93; --ctp-latte-overlay1: #8c8fa1; --ctp-latte-overlay0: #9ca0b0; --ctp-latte-surface2: #acb0be; --ctp-latte-surface1: #bcc0cc; --ctp-latte-surface0: #ccd0da; --ctp-latte-base: #eff1f5; --ctp-latte-mantle: #e6e9ef; --ctp-latte-crust: #dce0e8; } :root { --ctp-frappe-rosewater: #f2d5cf; --ctp-frappe-flamingo: #eebebe; --ctp-frappe-pink: #f4b8e4; --ctp-frappe-mauve: #ca9ee6; --ctp-frappe-red: #e78284; --ctp-frappe-maroon: #ea999c; --ctp-frappe-peach: #ef9f76; --ctp-frappe-yellow: #e5c890; --ctp-frappe-green: #a6d189; --ctp-frappe-teal: #81c8be; --ctp-frappe-sky: #99d1db; --ctp-frappe-sapphire: #85c1dc; --ctp-frappe-blue: #8caaee; --ctp-frappe-lavender: #babbf1; --ctp-frappe-text: #c6d0f5; --ctp-frappe-subtext1: #b5bfe2; --ctp-frappe-subtext0: #a5adce; --ctp-frappe-overlay2: #949cbb; --ctp-frappe-overlay1: #838ba7; --ctp-frappe-overlay0: #737994; --ctp-frappe-surface2: #626880; --ctp-frappe-surface1: #51576d; --ctp-frappe-surface0: #414559; --ctp-frappe-base: #303446; --ctp-frappe-mantle: #292c3c; --ctp-frappe-crust: #232634; } :root { --ctp-macchiato-rosewater: #f4dbd6; --ctp-macchiato-flamingo: #f0c6c6; --ctp-macchiato-pink: #f5bde6; --ctp-macchiato-mauve: #c6a0f6; --ctp-macchiato-red: #ed8796; --ctp-macchiato-maroon: #ee99a0; --ctp-macchiato-peach: #f5a97f; --ctp-macchiato-yellow: #eed49f; --ctp-macchiato-green: #a6da95; --ctp-macchiato-teal: #8bd5ca; --ctp-macchiato-sky: #91d7e3; --ctp-macchiato-sapphire: #7dc4e4; --ctp-macchiato-blue: #8aadf4; --ctp-macchiato-lavender: #b7bdf8; --ctp-macchiato-text: #cad3f5; --ctp-macchiato-subtext1: #b8c0e0; --ctp-macchiato-subtext0: #a5adcb; --ctp-macchiato-overlay2: #939ab7; --ctp-macchiato-overlay1: #8087a2; --ctp-macchiato-overlay0: #6e738d; --ctp-macchiato-surface2: #5b6078; --ctp-macchiato-surface1: #494d64; --ctp-macchiato-surface0: #363a4f; --ctp-macchiato-base: #24273a; --ctp-macchiato-mantle: #1e2030; --ctp-macchiato-crust: #181926; } :root { --ctp-mocha-rosewater: #f5e0dc; --ctp-mocha-flamingo: #f2cdcd; --ctp-mocha-pink: #f5c2e7; --ctp-mocha-mauve: #cba6f7; --ctp-mocha-red: #f38ba8; --ctp-mocha-maroon: #eba0ac; --ctp-mocha-peach: #fab387; --ctp-mocha-yellow: #f9e2af; --ctp-mocha-green: #a6e3a1; --ctp-mocha-teal: #94e2d5; --ctp-mocha-sky: #89dceb; --ctp-mocha-sapphire: #74c7ec; --ctp-mocha-blue: #89b4fa; --ctp-mocha-lavender: #b4befe; --ctp-mocha-text: #cdd6f4; --ctp-mocha-subtext1: #bac2de; --ctp-mocha-subtext0: #a6adc8; --ctp-mocha-overlay2: #9399b2; --ctp-mocha-overlay1: #7f849c; --ctp-mocha-overlay0: #6c7086; --ctp-mocha-surface2: #585b70; --ctp-mocha-surface1: #45475a; --ctp-mocha-surface0: #313244; --ctp-mocha-base: #1e1e2e; --ctp-mocha-mantle: #181825; --ctp-mocha-crust: #11111b; } :root { --ctp-rosewater: var(--ctp-latte-rosewater); --ctp-flamingo: var(--ctp-latte-flamingo); --ctp-pink: var(--ctp-latte-pink); --ctp-mauve: var(--ctp-latte-mauve); --ctp-red: var(--ctp-latte-red); --ctp-maroon: var(--ctp-latte-maroon); --ctp-peach: var(--ctp-latte-peach); --ctp-yellow: var(--ctp-latte-yellow); --ctp-green: var(--ctp-latte-green); --ctp-teal: var(--ctp-latte-teal); --ctp-sky: var(--ctp-latte-sky); --ctp-sapphire: var(--ctp-latte-sapphire); --ctp-blue: var(--ctp-latte-blue); --ctp-lavender: var(--ctp-latte-lavender); --ctp-text: var(--ctp-latte-text); --ctp-subtext1: var(--ctp-latte-subtext1); --ctp-subtext0: var(--ctp-latte-subtext0); --ctp-overlay2: var(--ctp-latte-overlay2); --ctp-overlay1: var(--ctp-latte-overlay1); --ctp-overlay0: var(--ctp-latte-overlay0); --ctp-surface2: var(--ctp-latte-surface2); --ctp-surface1: var(--ctp-latte-surface1); --ctp-surface0: var(--ctp-latte-surface0); --ctp-base: var(--ctp-latte-base); --ctp-mantle: var(--ctp-latte-mantle); --ctp-crust: var(--ctp-latte-crust); } html.dark { --ctp-rosewater: var(--ctp-mocha-rosewater); --ctp-flamingo: var(--ctp-mocha-flamingo); --ctp-pink: var(--ctp-mocha-pink); --ctp-mauve: var(--ctp-mocha-mauve); --ctp-red: var(--ctp-mocha-red); --ctp-maroon: var(--ctp-mocha-maroon); --ctp-peach: var(--ctp-mocha-peach); --ctp-yellow: var(--ctp-mocha-yellow); --ctp-green: var(--ctp-mocha-green); --ctp-teal: var(--ctp-mocha-teal); --ctp-sky: var(--ctp-mocha-sky); --ctp-sapphire: var(--ctp-mocha-sapphire); --ctp-blue: var(--ctp-mocha-blue); --ctp-lavender: var(--ctp-mocha-lavender); --ctp-text: var(--ctp-mocha-text); --ctp-subtext1: var(--ctp-mocha-subtext1); --ctp-subtext0: var(--ctp-mocha-subtext0); --ctp-overlay2: var(--ctp-mocha-overlay2); --ctp-overlay1: var(--ctp-mocha-overlay1); --ctp-overlay0: var(--ctp-mocha-overlay0); --ctp-surface2: var(--ctp-mocha-surface2); --ctp-surface1: var(--ctp-mocha-surface1); --ctp-surface0: var(--ctp-mocha-surface0); --ctp-base: var(--ctp-mocha-base); --ctp-mantle: var(--ctp-mocha-mantle); --ctp-crust: var(--ctp-mocha-crust); } html { background-color: var(--ctp-base); } /* * { color: var(--ctp-text); }*/ * { color: black; } html.dark * { color: white; } h1 { margin-bottom: 0rem; } /* h2, h3, h4 { color: var(--ctp-subtext0); }*/ .header { display: flex; justify-content: space-between; align-items: center; } hr { border: 0.5px solid var(--ctp-subtext0); } button.theme::before { content: "⏼"; } .header a { color: var(--blog-link-color); } button.theme { cursor: pointer; background: none; border: none; text-decoration: underline; color: var(--blog-link-color); } main { max-width: 48rem; margin: 0 auto; padding-bottom: 6rem; } summary { cursor: pointer; margin-left: 1rem; } p, li, h1, h2, h3, h4, h5, h6, details, summary { /*font-family: "Helvetica Neue", helvetica, sans-serif;*/ font-family: "Inter", sans-serif; font-optical-sizing: auto; /*font-weight: ;*/ font-style: normal; font-variation-settings: "slnt" 0; } p { line-height: 22px; } a { color: var(--ctp-blue); } a:visited { color: var(--ctp-lavender); } code { font-family: MonaspaceNeon; background: var(--ctp-crust); padding: 2px 4px; border-radius: 4px; } html.dark code { background: unset; padding: unset; } a code { color: var(--ctp-blue); } pre code { background: unset; padding: unset; } pre { padding: 0.75rem; border-radius: 0.25rem; overflow: auto; background: var(--ctp-crust); } @font-face { font-family: MonaspaceNeon; font-weight: bold; src: url("../../MonaspaceNeon-Regular.otf") format("opentype"); } html .shiki, html .shiki span { color: var(--shiki-light); /*background-color: var(--shiki-light-bg);*/ } html.dark .shiki, html.dark .shiki span { color: var(--shiki-dark); /*background-color: var(--shiki-dark-bg);*/ } .shiki .line.highlighted { background-color: #ccd0da; transition: background-color 0.5s; margin: 0 -0.75rem; padding: 0 0.75rem; width: calc(100% + 0.1px); display: inline-block; } .shiki .line.highlighted span { background-color: #ccd0da; transition: background-color 0.5s; } html.dark .shiki .line.highlighted { background-color: #313244; transition: background-color 0.5s; margin: 0 -0.75rem; padding: 0 0.75rem; width: calc(100% + 0.1px); display: inline-block; } html.dark .shiki .line.highlighted span { background-color: #313244; transition: background-color 0.5s; } </style><link rel="preconnect" href="https://fonts.googleapis.com" /><link rel="preconnect" href="https://fonts.gstatic.com" crossOrigin /><link href="https://fonts.googleapis.com/css2?family=Inter:wght@100..900&display=swap" rel="stylesheet" /><script src="../../theme.js"></script><script defer data-domain="alexgarcia.xyz/blog" src="https://plausible.io/js/script.js"></script></head><body><main><div class="header"><div><a href="../../">Alex Garcia's Blog</a></div><div><button class="theme"></button></div></div><h1>sqlite-vec now supports metadata columns and filtering</h1><p style="margin-top: .5rem; margin-bottom: 2.5rem; font-size: 15px;">2024-11-20 by <a href="https://alexgarcia.xyz/">Alex Garcia</a></p><section><script type="module" src="./index.min.js"></script> <link rel="stylesheet" href="./index.min.css"></link> <p><svg id="hero"></svg></p> <div class="summary"> <blockquote> <p><em>tl;dr — <a href="https://github.com/asg017/sqlite-vec"><code>sqlite-vec</code></a>, a SQLite extension for vector search, now supports <a href="https://alexgarcia.xyz/sqlite-vec/features/vec0.html#metadata">metadata columns</a>, <a href="https://alexgarcia.xyz/sqlite-vec/features/vec0.html#aux">auxiliary columns</a>, and <a href="https://alexgarcia.xyz/sqlite-vec/features/vec0.html#partition-keys">partitioning</a> in vec0 virtual tables! You can use these to store metadata like <code>user_id</code> or <code>created_at</code> fields, add additional <code>WHERE</code> clauses in KNN queries, and make certain selective queries much faster. Try it out!</em></p> </blockquote> <style> .summary blockquote { font-weight: 500; } .tbl code { font-family: monospace; } </style> </div> <hr> <p>As of the latest <a href="https://github.com/asg017/sqlite-vec/releases/tag/v0.1.6">v0.1.6</a> release of <code>sqlite-vec</code>, you can now store non-vector data in <code>vec0</code> virtual tables! For example:</p> <pre class="shiki shiki-themes catppuccin-latte catppuccin-mocha" style="--shiki-light:#4c4f69;--shiki-dark:#cdd6f4;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#1e1e2e" tabindex="0"><code class="language-sql"><span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">create</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> virtual </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">table</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> vec_articles </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">using</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> vec0(</span></span> <span class="line"></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> article_id </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">integer</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> primary key</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"></span> <span class="line"><span style="--shiki-light:#9CA0B0;--shiki-light-font-style:italic;--shiki-dark:#6C7086;--shiki-dark-font-style:italic"> -- Vector text embedding of the `headline` column, with 384 dimensions</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> headline_embedding </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">float</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">[384],</span></span> <span class="line"></span> <span class="line"><span style="--shiki-light:#9CA0B0;--shiki-light-font-style:italic;--shiki-dark:#6C7086;--shiki-dark-font-style:italic"> -- Partition key, internally shard vector index on article published year</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> year</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> integer</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> partition</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> key</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"></span> <span class="line"><span style="--shiki-light:#9CA0B0;--shiki-light-font-style:italic;--shiki-dark:#6C7086;--shiki-dark-font-style:italic"> -- Metadata columns, can appear in `WHERE` clause of KNN queries</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> news_desk </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">text</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> word_count </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">integer</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> pub_date </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">text</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"></span> <span class="line"><span style="--shiki-light:#9CA0B0;--shiki-light-font-style:italic;--shiki-dark:#6C7086;--shiki-dark-font-style:italic"> -- Auxiliary columns, unindexed but fast lookups</span></span> <span class="line"><span style="--shiki-light:#179299;--shiki-dark:#94E2D5"> +</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">headline </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">text</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"><span style="--shiki-light:#179299;--shiki-dark:#94E2D5"> +</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">url</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> text</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">);</span></span></code></pre> <p>Here we are storing a <a href="https://www.kaggle.com/datasets/johnbandy/new-york-times-headlines">New York Time article headlines dataset</a> from the past 30 years, where we embed the headlines with <a href="https://huggingface.co/mixedbread-ai/mxbai-embed-xsmall-v1"><code>mixedbread-ai/mxbai-embed-xsmall-v1</code></a>.</p> <p>If we wanted to see the closest related headlines to <code>'pandemic'</code> on article published in 2020 by the <code>'Sports'</code> or <code>'Business'</code> new desk with more than 500 but less than 1000 words, we can perform a KNN query like so:</p> <pre class="shiki shiki-themes catppuccin-latte catppuccin-mocha" style="--shiki-light:#4c4f69;--shiki-dark:#cdd6f4;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#1e1e2e" tabindex="0"><code class="language-sql"><span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">select</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> article_id,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> headline,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> news_desk,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> word_count,</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> url</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> pub_date,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> distance</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">from</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> vec_articles</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">where</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> headline_embedding </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">match</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> lembed(</span><span style="--shiki-light:#40A02B;--shiki-dark:#A6E3A1">'pandemic'</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">)</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> k </span><span style="--shiki-light:#179299;--shiki-dark:#94E2D5">=</span><span style="--shiki-light:#FE640B;--shiki-dark:#FAB387"> 8</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> year</span><span style="--shiki-light:#179299;--shiki-dark:#94E2D5"> =</span><span style="--shiki-light:#FE640B;--shiki-dark:#FAB387"> 2020</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> news_desk </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">in</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> (</span><span style="--shiki-light:#40A02B;--shiki-dark:#A6E3A1">'Sports'</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">, </span><span style="--shiki-light:#40A02B;--shiki-dark:#A6E3A1">'Business'</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">)</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> word_count </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">between</span><span style="--shiki-light:#FE640B;--shiki-dark:#FAB387"> 500</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#FE640B;--shiki-dark:#FAB387"> 1000</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">;</span></span></code></pre> <div class="tbl"> <pre class="shiki shiki-themes catppuccin-latte catppuccin-mocha" style="--shiki-light:#4c4f69;--shiki-dark:#cdd6f4;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#1e1e2e" tabindex="0"><code class="language-text"><span class="line"><span>┌────────────┬──────────────────────────────────────────────────────────────────────┬───────────┬────────────┬─────────────────────────────┬──────────────────────────┬───────────┐</span></span> <span class="line"><span>│ article_id │ headline │ news_desk │ word_count │ url │ pub_date │ distance │</span></span> <span class="line"><span>├────────────┼──────────────────────────────────────────────────────────────────────┼───────────┼────────────┼─────────────────────────────┼──────────────────────────┼───────────┤</span></span> <span class="line"><span>│ 2911716 │ The Pandemic’s Economic Damage Is Growing │ Business │ 910 │ https://www.nytimes.com/... │ 2020-07-07T18:12:40+0000 │ 0.8928120 │</span></span> <span class="line"><span>│ 2892929 │ As Coronavirus Spreads, Olympics Face Ticking Clock and a Tough Call │ Sports │ 987 │ https://www.nytimes.com/... │ 2020-03-06T01:34:36+0000 │ 0.9608180 │</span></span> <span class="line"><span>│ 2932041 │ The Pandemic Is Already Affecting Next Year’s Sports Schedule │ Sports │ 620 │ https://www.nytimes.com/... │ 2020-11-11T13:56:25+0000 │ 0.9802038 │</span></span> <span class="line"><span>│ 2915381 │ The Week in Business: Getting Rich Off the Pandemic │ Business │ 814 │ https://www.nytimes.com/... │ 2020-08-02T11:00:03+0000 │ 1.0064692 │</span></span> <span class="line"><span>│ 2896043 │ The Coronavirus and the Postponement of the Olympics, Explained │ Sports │ 798 │ https://www.nytimes.com/... │ 2020-03-25T17:45:58+0000 │ 1.0115833 │</span></span> <span class="line"><span>│ 2898566 │ Robots Welcome to Take Over, as Pandemic Accelerates Automation │ Business │ 871 │ https://www.nytimes.com/... │ 2020-04-10T09:00:27+0000 │ 1.019637 │</span></span> <span class="line"><span>│ 2898239 │ The Pandemic Feeds Tech Companies’ Power │ Business │ 784 │ https://www.nytimes.com/... │ 2020-04-08T16:43:13+0000 │ 1.0200014 │</span></span> <span class="line"><span>│ 2929224 │ In M.L.S., the Pandemic Changes the Playoff Math │ Sports │ 859 │ https://www.nytimes.com/... │ 2020-10-29T17:09:10+0000 │ 1.0238885 │</span></span> <span class="line"><span>└────────────┴──────────────────────────────────────────────────────────────────────┴───────────┴────────────┴─────────────────────────────┴──────────────────────────┴───────────┘</span></span></code></pre> </div> <p><small><i>Here we used <a href="https://github.com/asg017/sqlite-lembed"><code>sqlite-lembed</code></a> to embed our query, but any other embeddings provider could be used!</i></small></p> <p>We can reference those metadata columns and parition key columns in the <code>WHERE</code> clause of the KNN query, and get the exact results we want!</p> <p>Now, what's the difference between metadata, partition key, and auxiliary columns?</p> <h2 id="metadata-columns-for-where-clause-filtering" tabindex="-1"><a class="header-anchor" href="#metadata-columns-for-where-clause-filtering">¶</a> Metadata columns for <code>WHERE</code> clause filtering</h2> <p>Metadata columns are declared with normal column declartions in the <code>vec0</code> constructor. Metadata columns are stored and indexed <em>alongside</em> vectors, and can appear in the <code>WHERE</code> clause of KNN queries.</p> <pre class="shiki shiki-themes catppuccin-latte catppuccin-mocha" style="--shiki-light:#4c4f69;--shiki-dark:#cdd6f4;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#1e1e2e" tabindex="0"><code class="language-sql"><span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">create</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> virtual </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">table</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> vec_articles </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">using</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> vec0(</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> article_id </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">integer</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> primary key</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> headline_embedding </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">float</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">[384],</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> news_desk </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">text</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> word_count </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">integer</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> pub_date </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">text</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">);</span></span> <span class="line"></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">select</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> article_id,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> headline,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> news_desk,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> word_count,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> pub_date,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> distance</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">from</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> vec_articles</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">where</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> headline_embedding </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">match</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> lembed(</span><span style="--shiki-light:#40A02B;--shiki-dark:#A6E3A1">'new york city housing'</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">)</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> k </span><span style="--shiki-light:#179299;--shiki-dark:#94E2D5">=</span><span style="--shiki-light:#FE640B;--shiki-dark:#FAB387"> 20</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> news_desk </span><span style="--shiki-light:#179299;--shiki-dark:#94E2D5">=</span><span style="--shiki-light:#40A02B;--shiki-dark:#A6E3A1"> 'Metro'</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> word_count </span><span style="--shiki-light:#179299;--shiki-dark:#94E2D5"><</span><span style="--shiki-light:#FE640B;--shiki-dark:#FAB387"> 1000</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> pub_date </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">between</span><span style="--shiki-light:#40A02B;--shiki-dark:#A6E3A1"> '2004-01-20'</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#40A02B;--shiki-dark:#A6E3A1"> '2009-01-20'</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">;</span></span></code></pre> <div class="tbl"> <pre class="shiki shiki-themes catppuccin-latte catppuccin-mocha" style="--shiki-light:#4c4f69;--shiki-dark:#cdd6f4;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#1e1e2e" tabindex="0"><code class="language-text"><span class="line"><span>┌────────────┬──────────────────────────────────────────────────────────────────────┬───────────┬────────────┬──────────────────────────┬────────────────────┐</span></span> <span class="line"><span>│ article_id │ headline │ news_desk │ word_count │ pub_date │ distance │</span></span> <span class="line"><span>├────────────┼──────────────────────────────────────────────────────────────────────┼───────────┼────────────┼──────────────────────────┼────────────────────┤</span></span> <span class="line"><span>│ 1717598 │ Manhattan: City to Expand Housing Program │ Metro │ 83 │ 2007-02-28T05:00:00+0000 │ 0.7736235857009888 │</span></span> <span class="line"><span>│ 1607183 │ Manhattan: More Money for Housing │ Metro │ 96 │ 2006-06-16T04:00:00+0000 │ 0.7818768620491028 │</span></span> <span class="line"><span>│ ... │</span></span> <span class="line"><span>│ 1772158 │ Ask About New York Architecture, On Screen and Off │ Metro │ 241 │ 2007-09-17T18:25:57+0000 │ 0.930429220199585 │</span></span> <span class="line"><span>│ 1673007 │ Manhattan: City Balances Budget for 26th Year │ Metro │ 87 │ 2006-11-01T05:00:00+0000 │ 0.9327330589294434 │</span></span> <span class="line"><span>│ 1616702 │ Little Shift in Prices of Manhattan Apartments │ Metro │ 615 │ 2006-07-06T04:00:00+0000 │ 0.9354249238967896 │</span></span> <span class="line"><span>└────────────┴──────────────────────────────────────────────────────────────────────┴───────────┴────────────┴──────────────────────────┴────────────────────┘</span></span></code></pre> </div> <p>There we retrieved the 20 most related article headlines to <code>'new york city housing'</code>, published by the <code>'Metro'</code> news desk, with less than 1000 words, published during the George W Bush administration.</p> <p>Metadata columns can be boolean, integer, floats, or text values. More types like <a href="https://github.com/asg017/sqlite-vec/issues/138">BLOBs</a>, <a href="https://github.com/asg017/sqlite-vec/issues/139">dates</a>, and <a href="https://github.com/asg017/sqlite-vec/issues/140">UUID/ULIDs</a> are coming soon!</p> <p>Only a subset of operators are supported during metadata filtering, including:</p> <ul> <li>Equality constraints, ie <code>=</code> and <code>!=</code></li> <li>Comparison constraints, ie <code>></code>, <code>>=</code>, <code><</code>, <code><=</code></li> <li><code>column in (...)</code> constraints, only on <code>INTEGER</code> and <code>TEXT</code> columns on SQLite 3.38 or above</li> </ul> <p>Notably absent: <code>REGEXP</code>, <code>LIKE</code>, <code>GLOB</code>, and other custom scalar functions. Also <a href="https://github.com/asg017/sqlite-vec/issues/141"><code>NULL</code> values are not supported yet</a>,</p> <h2 id="partition-keys-for-faster-where-clause-filtering" tabindex="-1"><a class="header-anchor" href="#partition-keys-for-faster-where-clause-filtering">¶</a> Partition keys for faster <code>WHERE</code> clause filtering</h2> <p>Now the above query was actually a bit slow! There are 3 million rows in the table, and metadata filters need to visit every single row to do a comparison. Metadata comparison are quite fast and built for fast filtering, but they have their limits.</p> <p>But notice how we only wanted a small subset of values – <code>between '2004-01-20' and '2009-01-20'</code> is only 5 years out of 30 years of data. We can tell the <code>vec0</code> virtual table to internally shard the vector index on a given key, using partition keys!</p> <pre class="shiki shiki-themes catppuccin-latte catppuccin-mocha" style="--shiki-light:#4c4f69;--shiki-dark:#cdd6f4;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#1e1e2e" tabindex="0"><code class="language-sql"><span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">create</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> virtual </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">table</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> vec_articles </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">using</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> vec0(</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> article_id </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">integer</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> primary key</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> headline_embedding </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">float</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">[384],</span></span> <span class="line"></span> <span class="line"><span style="--shiki-light:#9CA0B0;--shiki-light-font-style:italic;--shiki-dark:#6C7086;--shiki-dark-font-style:italic"> -- shard the vector index based on published year</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> year</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> integer</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> partition</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> key</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> news_desk </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">text</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> word_count </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">integer</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> pub_date </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">text</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">);</span></span> <span class="line"></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">select</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> article_id,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> headline,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> news_desk,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> word_count,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> pub_date,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> distance</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">from</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> vec_articles</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">where</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> headline_embedding </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">match</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> lembed(</span><span style="--shiki-light:#40A02B;--shiki-dark:#A6E3A1">'new york city housing'</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">)</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> k </span><span style="--shiki-light:#179299;--shiki-dark:#94E2D5">=</span><span style="--shiki-light:#FE640B;--shiki-dark:#FAB387"> 20</span></span> <span class="line"><span style="--shiki-light:#9CA0B0;--shiki-light-font-style:italic;--shiki-dark:#6C7086;--shiki-dark-font-style:italic"> -- narrow search to these years only</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> year</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> between</span><span style="--shiki-light:#FE640B;--shiki-dark:#FAB387"> 2004</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#FE640B;--shiki-dark:#FAB387"> 2009</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> news_desk </span><span style="--shiki-light:#179299;--shiki-dark:#94E2D5">=</span><span style="--shiki-light:#40A02B;--shiki-dark:#A6E3A1"> 'Metro'</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> word_count </span><span style="--shiki-light:#179299;--shiki-dark:#94E2D5"><</span><span style="--shiki-light:#FE640B;--shiki-dark:#FAB387"> 1000</span></span> <span class="line"><span style="--shiki-light:#9CA0B0;--shiki-light-font-style:italic;--shiki-dark:#6C7086;--shiki-dark-font-style:italic"> -- finer filtering for exact dates we care about</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> pub_date </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">between</span><span style="--shiki-light:#40A02B;--shiki-dark:#A6E3A1"> '2004-01-20'</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#40A02B;--shiki-dark:#A6E3A1"> '2009-01-20'</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">;</span></span></code></pre> <p>This KNN query returns the same exact results as the one above - but is 3x faster! This is because internally, vectors are stored based on the <code>year</code> value of its row. In that KNN query, <code>sqlite-vec</code> will recognize constraints on partition keys, and quickly pre-filter rows before any vectors are compared.</p> <p>But beware! It's easy to accidentally over-shard a vector index on the wrong values and cause performance issues. Partition keys are great for date-based items like <code>year</code> or <code>month</code>, particulary when each unique partition key value has 100's or 1000's of vectors. They are also great for user IDs or document IDs, for "per-user" or "per-document" vector indexes.</p> <p>Partition key columns can only be <code>TEXT</code> or <code>INTEGER</code> values, file an issue if you want to see some other type support. Currently <code>column in (...)</code> constraints are not supported for partition key columns, <a href="https://github.com/asg017/sqlite-vec/issues/142">but will be soon</a>!</p> <h2 id="auxiliary-columns" tabindex="-1"><a class="header-anchor" href="#auxiliary-columns">¶</a> Auxiliary columns</h2> <p>Some columns never need to be indexed! You can always store addtionally <code>SELECT</code>-only metadata in separate tables and do a <code>JOIN</code> yourself, or you can use auxiliary columns:</p> <pre class="shiki shiki-themes catppuccin-latte catppuccin-mocha" style="--shiki-light:#4c4f69;--shiki-dark:#cdd6f4;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#1e1e2e" tabindex="0"><code class="language-sql"><span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">create</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> virtual </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">table</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> vec_articles </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">using</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> vec0(</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> article_id </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">integer</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> primary key</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> headline_embedding </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">float</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">[384],</span></span> <span class="line"><span style="--shiki-light:#179299;--shiki-dark:#94E2D5"> +</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">headline </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">text</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"><span style="--shiki-light:#179299;--shiki-dark:#94E2D5"> +</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">url</span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> text</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">);</span></span> <span class="line"></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">select</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> article_id,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> headline,</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> url</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">,</span></span> <span class="line"><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> distance</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">from</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> vec_articles</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">where</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> headline_embedding </span><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7">match</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> lembed(</span><span style="--shiki-light:#40A02B;--shiki-dark:#A6E3A1">'dodgers game'</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">)</span></span> <span class="line"><span style="--shiki-light:#8839EF;--shiki-dark:#CBA6F7"> and</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4"> k </span><span style="--shiki-light:#179299;--shiki-dark:#94E2D5">=</span><span style="--shiki-light:#FE640B;--shiki-dark:#FAB387"> 20</span><span style="--shiki-light:#4C4F69;--shiki-dark:#CDD6F4">;</span></span></code></pre> <div class="tbl"> <pre class="shiki shiki-themes catppuccin-latte catppuccin-mocha" style="--shiki-light:#4c4f69;--shiki-dark:#cdd6f4;--shiki-light-bg:#eff1f5;--shiki-dark-bg:#1e1e2e" tabindex="0"><code class="language-text"><span class="line"><span>┌────────────┬─────────────────────────────────────────────────────────────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬────────────────────┐</span></span> <span class="line"><span>│ article_id │ headline │ url │ distance │</span></span> <span class="line"><span>├────────────┼─────────────────────────────────────────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────────────┤</span></span> <span class="line"><span>│ 1896278 │ Attention Dodgers Fans: There’s a Game Tonight │ https://bats.blogs.nytimes.com/2008/10/15/attention-dodgers-fans-theres-a-game-tonight/ │ 0.6733786463737488 │</span></span> <span class="line"><span>│ 2556896 │ Dodgers, in Flurry of Activity, Move to Revamp Their Infield │ https://www.nytimes.com/2014/12/11/sports/baseball/mlb-jimmy-rollins.html │ 0.7796685099601746 │</span></span> <span class="line"><span>│ 2382487 │ Keeping Up With the Dodgers │ https://www.nytimes.com/2012/12/15/sports/angels-keeping-up-with-the-dodgers-leading-off.html │ 0.7849781513214111 │</span></span> <span class="line"><span>│ 2585169 │ New Life for the Dodgers’ Old Digs │ https://www.nytimes.com/slideshow/2015/04/19/sports/baseball/20150419DODGERTOWN.html │ 0.7894293665885925 │</span></span> <span class="line"><span>│ 1032111 │ Not Dodgers II, but It's Baseball; The Game Is Back in Brooklyn, on a Smaller Scale │ https://www.nytimes.com/2001/06/23/nyregion/not-dodgers-ii-but-it-s-baseball-the-game-is-back-in-brooklyn-on-a-smaller-scale.html │ 0.7978747487068176 │</span></span> <span class="line"><span>└────────────┴─────────────────────────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────────────┘</span></span></code></pre> </div> <p>Auxiliary columns are denoted by a <code>+</code> prefix in the column definition, modeled after <a href="https://www.sqlite.org/rtree.html#auxiliary_columns">the same feature in the SQLite R*Tree extension</a>. These columns are unindex, stored in a separate internal table and <code>JOIN</code>'ed at <code>SELECT</code> time. They <em>cannot</em> appear in a KNN <code>WHERE</code> query, as performance would worsen dramatically.</p> <p>But it saves you from dealing with additional <code>JOIN</code>s yourself! They are especially great for longer <code>TEXT</code> or <code>BLOB</code> values.</p> <h2 id="roadmap-and-the-future-of-sqlite-vec" tabindex="-1"><a class="header-anchor" href="#roadmap-and-the-future-of-sqlite-vec">¶</a> Roadmap and the future of <code>sqlite-vec</code></h2> <p>Metadata column support is the biggest update to <code>sqlite-vec</code> since the initial <a href="https://alexgarcia.xyz/blog/2024/sqlite-vec-stable-release/index.html"><code>v0.1.0</code> launch 3 months ago</a>, but I have a lot planned for the project!</p> <p>First off: <strong>ANN indexes.</strong> The <code>vec0</code> virtual table is brute-force only, which really slows down KNN queries on larger datasets. There are strategies like <a href="https://alexgarcia.xyz/sqlite-vec/guides/binary-quant.html">binary quantization</a> or <a href="https://alexgarcia.xyz/sqlite-vec/guides/matryoshka.html">Matryoshka embeddings</a> that can help, but <code>sqlite-vec</code> won't be fast until ANN indexes are supported.</p> <p>I delayed working on ANN indexes until metadata columns were supported, because its much easier to build an ANN index with metaddata filtering on day 1 than it is to retroactively try to support them. I think this was the right call — metadata columns are hard! Follow <a href="https://github.com/asg017/sqlite-vec/issues/25">issue #25</a> for future update on this!</p> <p>Next: <strong>Quantizers.</strong> Currently <code>sqlite-vec</code> only supported simple binary quantization and scalar quantization with <code>int8</code> vectors. But I want to support <code>float16</code>, <code>float8</code>, "smarter" binary quantization (ie custom thresholds instead of just <code>> 0</code>), and other techniques that have come about the last few months. This will also help support ANN indexes, as many of them rely on vector compression for fast queries.</p> <p>There's also a ton of <strong>performance work</strong> that <code>sqlite-vec</code> needs, especially with these new metadata column features. This initial release was more of a "make it work" and not "make it fast", so expect much faster metadata filtering in upcoming releases!</p> <p>Sister projects <a href="https://github.com/asg017/sqlite-lembed"><code>sqlite-lembed</code></a> and <a href="https://github.com/asg017/sqlite-rembed"><code>sqlite-rembed</code></a> also need a ton of love, they both have some older PRs that need merging. Expect releases of both of these projects very soon!</p> <p>And finally, <strong>a ton of smaller integrations</strong>! For example, Rody Davis <a href="https://github.com/asg017/sqlite-vec/pull/119">submitted Dart and Flutter bindings</a> that I have not yet merged, Oscar Franco contributed <a href="https://github.com/asg017/sqlite-vec/pull/91">Android and iOS bindings</a> that needs love, and <a href="https://github.com/asg017/sqlite-vec/issues/135">Pyodide support is on the horizon</a>.</p> </section></main></body></html>