<!DOCTYPE html><html lang="en-us"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width, initial-scale=1"/><title>LLM Optimization and Deployment on SiFive RISC-V Intelligence Products</title><meta name="description" content="SiFive outlines how we optimized &amp; deployed Llama on the SiFive Intelligence X390 platform, the future computing platform for AI/ML workloads."/><link rel="canonical" href="https://www.sifive.com/blog/llm-optimization-and-deployment-on-sifive-intellig"/></head><body><div id="__next"><main><div class="press_pressDetail__1EgAt"><div class="container"><div class="row"><div class="col-lg-8 offset-lg-2"><div class="BlogPosts_body__NgX5s"><div><p class="BlogPosts_bodyMeta__jTzbI"><strong>SiFive</strong> - October 10, 2024</p><h1 class="BlogPosts_bodyTitle___Jbrk">LLM Optimization and Deployment on SiFive RISC-V Intelligence Products</h1><div><div class="MarkDown_markdown__xtH9f"><p><strong>Bruce Lai, Darren Hsieh and Hong-Rong Hsu</strong></p> <p>Large language models (LLMs) have become essential to numerous applications, thanks to their powerful capabilities in natural language understanding and text generation. However, their large model sizes and high computational demands present significant challenges for efficient deployment and real-time performance.</p> <p>Can you imagine running Llama on a RISC-V machine while achieving real-time performance? The ML compiler plays a crucial role in making this possible. For more details, please refer to our previous article “ML Compiler for RISC-V Vector”. 
In this article, we will share how we optimized and deployed Llama on the SiFive Intelligence X390 platform[ref0], the future computing platform for AI/ML workloads.</p> <p><strong>Want to learn more about this? Watch our webinar:</strong></p> <p><a href="https://www.sifive.com/resources/webinar/advanced-llm-optimization-and-deployment-on-risc-v">Watch Webinar Now</a></p> <p><strong>SiFive AI/ML Software Stack</strong></p> <p>To enable LLMs on RISC-V, multiple components need to interact seamlessly. In this section, we will introduce each component of the SiFive AI/ML Software Stack.</p> <p>The orange blocks in Figure 1 represent the fundamental RISC-V building blocks, primarily owned and maintained by SiFive:</p> <ul> <li> <p><strong>SiFive Intelligence and High-Performance core series</strong>: Cores well known as <a href="https://www.sifive.com/cores/intelligence-x280">X280</a>/<a href="https://www.sifive.com/cores/intelligence-x390">X390</a> (Intelligence) and <a href="https://www.sifive.com/cores/performance-p450-470">P470</a>/<a href="https://www.sifive.com/cores/performance-p650-670">P670</a>/<a href="https://www.sifive.com/cores/performance-p870-p870a">P870</a> (High-Performance).</p> </li> <li> <p><strong>SiFive accelerators</strong>: In-house hardware that accelerates domain-specific instructions, such as the SiFive XM Series[ref9].</p> </li> <li> <p><strong>SiFive LLVM Compiler</strong>: Provides RISC-V C/C++ compilation and optimization, RVV intrinsic programming, auto-vectorization, and efficient RISC-V backend code generation for MLIR. 
SiFive offers a proprietary version of LLVM with advanced microarchitecture optimizations and custom instruction compilation/IR for SiFive cores, though users can also access a generic version from the upstream compiler directly.</p> </li> <li> <p><strong>SiFive System Software</strong>: Provides FSFL (Freedom SDK for Linux) - a Yocto/OpenEmbedded-based RISC-V Linux solution - and FSFM (Freedom SDK for Metal) - a reference ASM/C/C++ bare-metal environment for SiFive cores.</p> </li> <li> <p><strong>SKL (SiFive Kernel Library)</strong>: A fully optimized C/C++ library with a set of tuned routines that maximize algorithm throughput on SiFive processors. Some hot-spot operations in IREE are offloaded to SKL to maximize performance.</p> </li> </ul> <p><img src="https://images.prismic.io/sifive/ZwlBvIF3NbkBXTts_fig1.png?auto=format,compress" alt="SiFive AI/ML Software Stack"> <strong>Figure 1: SiFive AI/ML Software Stack</strong></p> <p>The blue blocks in Figure 1 represent the components that SiFive leverages from and contributes back to open-source projects.</p> <p><strong>ML Compiler and Runtime:</strong> IREE[ref1] is an open-source, MLIR-based compiler and runtime. 
We leverage most of the generic optimizations while adding SiFive architecture-specific functions and optimizations.</p> <p><strong>VCIX MLIR Dialect:</strong> SiFive has open-sourced the VCIX MLIR dialect, allowing users to lower their models and delegate them to custom TPUs with minimal effort.</p> <p><strong>ML Interpreters:</strong> For customers requiring a more lightweight framework, we offer:</p> <ul> <li>Customized TFLite with RVV optimizations.</li> <li>Upstream XNNPACK with RVV optimizations.</li> <li>ONNX Runtime with delegation to XNNPACK.</li> <li>Customized llama.cpp with additional RVV optimizations.</li> <li>Other open-source libraries - such as libyuv, libc, OpenSSL, and zlib-ng - with RVV acceleration fine-tuned for SiFive microarchitectures to speed up non-AI/ML domains.</li> </ul> <p>The yellow blocks in Figure 1 represent the components provided by third-party vendors or communities, which SiFive leverages to create a comprehensive solution.</p> <p>Read more about how <a href="https://www.sifive.com/blog/sifive-accelerates-risc-v-vector-integration-in-xnnpack-for-optimized-ai-inference">SiFive Accelerates RISC-V Vector Integration in XNNPACK for Optimized AI Inference</a>.</p> <p><strong>PyTorch end-to-end flow</strong></p> <p>In this section, we’ll introduce the implementation of a PyTorch end-to-end flow for Large Language Models (LLMs) using SHARK-Turbine, a tool supported by AMD for facilitating various model deployment activities. The complete workflow for enabling the LLM demonstration is illustrated in Figure 2.</p> <p><img src="https://images.prismic.io/sifive/ZwlNrIF3NbkBXUcz_fig2.png?auto=format,compress" alt="PyTorch LLM Demo"> <strong>Figure 2: PyTorch end-to-end Large Language Model Demo flow in SHARK-Turbine</strong></p> <p>SHARK-Turbine[ref2] provides two key scripts to enable the LLM demonstration. The first script, stateless_llm.py, is responsible for converting a PyTorch Hugging Face LLM into a VMFB (VM FlatBuffer) and a parameter file. 
This conversion process utilizes the FX/Dynamo-based torch-mlir compiler along with the IREE compiler. Currently, we use upstream IREE with several SiFive patches, which incorporate SiFive MLIR optimizations and leverage SiFive LLVM to achieve optimal performance. Detailed optimization strategies are discussed in the subsequent section.</p> <p>After the LLM is compiled, the second script, llm_runner.py, utilizes Python bindings to invoke the IREE runtime, load VMFB models, and set up the runtime context. This allows the LLM to run within a Python environment on SiFive Intelligence cores. Users can input queries (prompts), and the SiFive Intelligence platform will execute multiple inferences to generate meaningful responses.</p> <p>Now, you might be wondering: why not simply use llama.cpp for LLM inference, since the GGUF format (a file format for storing models for inference with llama.cpp) works fine? We will explain this in the performance section.</p> <p><strong>Llama Optimization</strong></p> <p><strong>Model Architecture and Bottleneck</strong></p> <p>Understanding the model architecture helps us identify the performance bottleneck. LLaMA (Large Language Model Meta AI) is a series of transformer-based[ref3] large language models developed by Meta. Figure 3 shows an overview of the LLaMA architecture. This architecture utilizes self-attention mechanisms and feed-forward neural networks to process input sequences. 
The key components of the architecture include:</p> <ul> <li>Embedding Layer: Converts tokens into dense vectors.</li> <li>Multi-Head Self-Attention Mechanism: Computes relationships between all pairs of tokens, enabling the model to capture long-range dependencies.</li> <li>Feed-Forward Networks (FFNs): Contain a series of matrix multiplication (matmul) operations and non-linear functions that provide non-linearity and feature transformation.</li> <li>Layer Normalization: Normalizes activations for stability and faster convergence.</li> <li>Residual Connections: Help avoid vanishing gradients during backpropagation.</li> </ul> <p><img src="https://images.prismic.io/sifive/Z2RlopbqstJ98sg7_fig3.png?auto=format,compress" alt="Overview of LLaMA architecture"> <strong>Figure 3: Overview of the LLaMA architecture[ref4]</strong></p> <p>As Figure 3 shows, there are N self-attention layers and N feed-forward networks, both implemented through a series of matrix multiplication operations. This extensive reliance on matrix multiplication makes it the primary bottleneck in transformer-based models.</p> <p><strong>Performance Results</strong></p> <p>Table 1 presents the performance profiling results for TinyLlama with IREE during the decode phase. Different operations can be grouped into a single dispatch. For example, matrix multiplication (matmul) followed by bias addition can be combined into one dispatch. The &quot;Operation Type&quot; column indicates the primary operation within each dispatch. Notably, matmul operations are the primary performance bottleneck, consuming over 95% of the inference time. 
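</p> <p>A rough FLOP count for one transformer layer is enough to see why matmul dominates. The sketch below is purely illustrative (it assumes TinyLlama-like dimensions and ignores softmax and normalization constants); it is a back-of-the-envelope model, not our profiler:</p>

```python
# Approximate FLOPs of one transformer layer: matmuls vs. elementwise ops.
# Illustrative only; the dimensions and constant factors are assumptions.

def matmul_flops(seq, d_model, d_ffn):
    qkvo = 4 * 2 * seq * d_model * d_model   # Q, K, V and output projections
    attn = 2 * 2 * seq * seq * d_model       # Q@K^T and scores@V
    ffn = 2 * 2 * seq * d_model * d_ffn      # FFN up- and down-projections
    return qkvo + attn + ffn

def elementwise_flops(seq, d_model, d_ffn):
    norms = 2 * 5 * seq * d_model            # two normalization layers (rough)
    residual = 2 * seq * d_model             # two residual additions
    act = 5 * seq * d_ffn                    # activation function (rough)
    return norms + residual + act

seq, d_model, d_ffn = 128, 2048, 5632        # assumed TinyLlama-like sizes
mm = matmul_flops(seq, d_model, d_ffn)
share = mm / (mm + elementwise_flops(seq, d_model, d_ffn))
print(f"matmul share of layer FLOPs: {share:.1%}")
```

Even with generous constants on the elementwise side, matmul accounts for nearly all of the arithmetic, consistent with the profiling above.
<p>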
“D” in the shape means dynamic shape, which handles the growing dimension from the KV-cache.</p> <p><img src="https://images.prismic.io/sifive/Zw6mp4F3NbkBXeXd_table1.png?auto=format,compress" alt="Table 1"> <strong>Table 1</strong></p> <p>The next section will provide more detailed information about the &quot;prefill&quot; and &quot;generation&quot; phases. Briefly, in the prefill phase, the matrix multiplication (matmul) operations follow the General Matrix Multiply (GEMM) pattern, where the M dimension is greater than or equal to 1. In contrast, during the decode phase, the matmul operations follow the General Matrix-Vector Multiply (GEMV) pattern, where the M dimension is always equal to 1. The shape of a matmul operation is represented as [BxMxNxK], where B is the (optional) batch size, M is the number of output rows, N is the number of output columns, and K is the reduction dimension. For the case of dispatch_60, the batch size is 32 (from the number of attention heads) and D indicates dynamic input in the N dimension.</p> <p>Table 1 demonstrates that matmul operations account for over 95% of the decode-phase inference time, making them the primary performance bottleneck in LLaMA.</p> <p>Since matmul operations are the main performance hotspot in LLM inference, the next section will focus on optimizing these operations.</p> <p><strong>Optimization through IREE Compiler</strong></p> <p>IREE (Intermediate Representation Execution Environment) is an MLIR-based end-to-end AI/ML compiler and runtime. The architecture overview is shown in Figure 4. In IREE, the input model is lowered to MLIR, different levels of optimizations are applied (such as kernel fusion, tiling, and loop unrolling), and the result is finally translated to target-dependent VM Bytecode. 
The VM Bytecode is then executed by the IREE runtime.</p> <p><img src="https://images.prismic.io/sifive/ZwlMhYF3NbkBXUYJ_fig4.png?auto=format,compress" alt="IREE Architecture"> <strong>Figure 4. IREE Architecture Overview[ref1]</strong></p> <p>Although IREE is a powerful framework that supports both CPU &amp; GPU code generation and optimizations, its RISC-V backend with RVV vectorization had not yet been fully tuned at this point. Figure 5 shows the significant performance gap between the code before and after our optimizations.</p> <p><img src="https://images.prismic.io/sifive/ZwlMhoF3NbkBXUYL_fig5.png?auto=format,compress" alt="TinyLlama on SiFive-X390"> <strong>Figure 5. Performance gap before and after optimizations</strong></p> <p>Several optimizations to the matmul operation are demonstrated in the following subsections, leading to significant improvements in LLM performance.</p> <p><strong>Cache &amp; Register Tiling optimizations for matmul</strong></p> <p>In LLM inference, the process is divided into two distinct phases: Prefill (Prompt) and Decode (Generation). The LLM inference process is visualized in Figure 6.</p> <p><img src="https://images.prismic.io/sifive/ZwlMh4F3NbkBXUYN_fig6.png?auto=format,compress" alt="Prefill Prompt"> <strong>Figure 6. Prefill (Prompt) and Decode (Generation) phases in LLM inference[ref5]</strong></p> <p><strong>Prefill or Prompt Phase:</strong> During this phase, the input prompt has a length greater than or equal to 1. The model processes the initial input sequence and constructs the KV-Cache, which will be reused in subsequent decoding steps. Assuming the matmul has a problem size of [m,n,k], where the left-hand side (LHS) is [m,k], the dimension m is always greater than or equal to 1.</p> <p><strong>Decode or Generation Phase:</strong> This phase begins with the output of the prefill phase as the first input. The model uses the KV-Cache to efficiently generate tokens in the following iterations. 
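</p> <p>The shape bookkeeping behind the two phases can be sketched in a few lines. The toy model below uses hypothetical names (and an assumed head size of 64) purely to illustrate the GEMM-versus-GEMV distinction and the growing KV-Cache dimension:</p>

```python
# Toy KV-cache shape bookkeeping for the attention-score matmul.
# Hypothetical dimension names; illustrates GEMM (prefill) vs. GEMV (decode).

HEAD_DIM = 64  # assumed per-head dimension, for illustration only

def score_matmul_shape(phase, prompt_len, cached_tokens):
    """Return the [m, k] x [k, n] shapes for Q @ K^T in one attention head."""
    if phase == "prefill":
        m = prompt_len                  # all prompt tokens processed at once
        n = prompt_len
    else:                               # decode: one new token per step
        m = 1                           # GEMV pattern: a single query row
        n = cached_tokens + 1           # the dynamic "D" dim, grows each step
    return (m, HEAD_DIM), (HEAD_DIM, n)

# prefill a 16-token prompt, then decode the first generated token
assert score_matmul_shape("prefill", 16, 0) == ((16, 64), (64, 16))
assert score_matmul_shape("decode", 16, 16) == ((1, 64), (64, 17))
```

<p>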
The matmul operations in this phase have the dimension m=1. This smaller m value (1) is a key distinction compared to traditional operations in neural network models.</p> <p>When optimizing the matmul operation with the RISC-V Vector extension (RVV), two key factors affect efficiency:</p> <p><strong>Register Tiling</strong>: This focuses on maximizing the utilization of the CPU's vector registers during matrix multiplication. By carefully organizing data to fit into vector registers, register tiling helps minimize memory access and maximize computational throughput.</p> <p><strong>Cache Tiling</strong>: This strategy optimizes matrix multiplication by improving data locality, ensuring that data remains in the cache hierarchy for as long as possible. Efficient cache tiling reduces memory latency and improves the performance of the matmul operation by minimizing cache misses.</p> <p>In IREE, the matmul operation is implemented using an outer-product approach. The output register tile size is [m, n], where m and n represent the number of elements in the output rows and columns, respectively. Implementing the outer product with RVV requires m vector register groups for the output accumulators and 1 additional vector register group to load data from the RHS matrix. Each vector register group holds n elements. Hence, the vector register utilization rate can be calculated using the formula:</p> <p><img src="https://images.prismic.io/sifive/Zw6lP4F3NbkBXeWz_formula1.png?auto=format,compress" alt="FORMULA 1"></p> <p>For instance, if VLEN=1024, the data type is float, and [m,n] = [8,32]:</p> <p><img src="https://images.prismic.io/sifive/Zw6lQIF3NbkBXeW0_formula2.png?auto=format,compress" alt="FORMULA INSTANCE"></p> <p>Upon analyzing the output assembly code, we discovered that IREE generates code with a fixed output register tile size of [m,n]=[8,32] on RISC-V platforms. This configuration reduces the vector register utilization rate to 28% when VLEN = 1024. 
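</p> <p>The calculation can be checked numerically. The function below is our reconstruction of the utilization model (RVV provides 32 architectural vector registers of VLEN bits each; an [m, n] outer-product tile occupies m accumulator register groups plus one RHS register group):</p>

```python
# Reconstruction of the vector-register utilization formula above
# (illustrative sketch, not SiFive tooling).

def vreg_utilization(m, n, vlen_bits, elem_bits=32):
    """Fraction of the 32-register vector file used by an [m, n] output tile."""
    # registers per group = ceil(bits needed for n elements / VLEN),
    # i.e. the effective LMUL of each vector register group
    regs_per_group = (n * elem_bits + vlen_bits - 1) // vlen_bits
    return (m + 1) * regs_per_group / 32

# IREE's default [8, 32] float32 tile uses 9 register groups of LMUL=1:
print(vreg_utilization(8, 32, 1024))   # 0.28125 -> the 28% at VLEN=1024
print(vreg_utilization(8, 32, 512))    # 0.5625  -> 56% at VLEN=512
```

The same function reproduces the figures discussed in the rest of this section: a [7, 128] tile (n in an LMUL=4 group) reaches 100% at VLEN=1024, and a [1, 256] decode tile (LMUL=8) reaches 50%.
<p>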
The optimal case occurs when VLEN = 512, where the utilization rate improves to 56%. However, for smaller VLEN values, such as VLEN = 128, the generated code suffers from excessive vector register spilling, leading to poor performance.</p> <p>To enhance performance, we adjusted the register tile size to [m, n] = [7, m4], where m4 corresponds to LMUL = 4. In this case, if VLEN = 1024 and the data type is float32, the value of n will be 128, which is the maximum vector length when LMUL = 4. This adjustment allows us to achieve a 100% utilization rate of the vector registers, maximizing performance.</p> <p>The register tile configuration [m,n]=[7,m4] is well-suited for the prefill phase of LLM inference, where the matrix dimension M typically exceeds 7. However, this setup is suboptimal for the decode phase. During decoding, we modified the register tile size to [m,n]=[1,m8], which significantly improved the vector register utilization rate from 25% to 50%.</p> <p>Following the adjustment to the register tiling policy for enhanced performance, we extended our optimization to the cache tiling strategy within the IREE compiler. The cache tiling size is now dynamically selected based on the register tiling size n. For instance, when n=128, IREE defaults to an n-dimension cache tile size of 128. By fine-tuning this parameter to match the specific requirements of the system, we are able to further enhance cache efficiency, yielding notable performance improvements.</p> <p>Using the IREE compiler, it is fast and convenient to tune the tiling policy at each level of the memory hierarchy. Compared to library-based solutions, IREE significantly reduces the required engineering effort.</p> <p><strong>Performance Results</strong></p> <p>Originally, generating a single token on a 32.5 MHz FPGA took several hours. After applying IREE optimizations, the f16 TinyLlama model now generates a token in about 5 seconds. 
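</p> <p>The FPGA figure scales to silicon by a simple linear frequency ratio, assuming the workload is compute-bound so that throughput tracks clock rate. A quick sketch of that normalization:</p>

```python
# Linear frequency scaling between the 32.5 MHz FPGA and a 1 GHz chip
# (assumes a compute-bound workload so tokens/s scales with clock rate).

FPGA_MHZ, CHIP_MHZ = 32.5, 1000.0

def tokens_per_sec_at(target_mhz, seconds_per_token, measured_mhz):
    return (target_mhz / measured_mhz) / seconds_per_token

# Working backwards from the reported 5.37 tokens/s at 1 GHz, one token
# takes roughly 5.7 s on the FPGA, matching the "about 5 seconds" above.
fpga_seconds_per_token = (CHIP_MHZ / FPGA_MHZ) / 5.37
print(round(fpga_seconds_per_token, 2))
```

<p>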
Normalized to a 1 GHz real chip frequency, this translates to 5.37 tokens per second, achieving a real-time user experience on a single X390 core.</p> <p>We also compared IREE with llama.cpp, a pure C/C++ LLM inference project. While llama.cpp is convenient for deployment, its performance optimizations are limited. Even with hand-written RVV kernels, it is constrained by algorithm limitations and lacks graph-level optimization.</p> <p><img src="https://images.prismic.io/sifive/Zw6moYF3NbkBXeXa_table2.png?auto=format,compress" alt="Table 2"> <strong>Table 2: Performance (tokens/second) of TinyLlama-1.1b on X390: llama.cpp vs. IREE</strong></p> <p>The data also shows that the SiFive X390 is competitive, as a single core can achieve real-time user experience performance on TinyLlama.</p> <p>Moreover, because TinyLlama and Llama2-7b share the same model architecture, the optimizations we made also allow us to successfully run Llama2-7b on the SiFive Intelligence platform; the e2e flow and scripts can be reused as well.</p> <p><img src="https://images.prismic.io/sifive/Zw6mooF3NbkBXeXb_table3.png?auto=format,compress" alt="Table 3"> <strong>Table 3: Llama2-7b-Q4 performance comparison</strong></p> <p><strong>Accuracy Verification</strong></p> <p>After optimization, ensuring accuracy is also important for deployment. This section describes how we verify model accuracy.</p> <p><strong>Leveraging MLPerf's Llama Accuracy Flow for TinyLlama</strong></p> <p>To align with industry standards, we leverage MLPerf’s benchmarks to evaluate the accuracy of our models, ensuring our solutions are both reliable and effective. 
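</p> <p>MLPerf’s accuracy script compares generated text against reference answers using ROUGE. As a flavor of what the metric measures, here is a minimal ROUGE-1 F1 sketch (unigram overlap only; the real flow computes ROUGE-1, ROUGE-2, and ROUGE-L over the OpenOrca references):</p>

```python
# Minimal ROUGE-1 F1 sketch (unigram overlap), for illustration only.
from collections import Counter

def rouge1_f1(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if not cand or not ref or not overlap:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat", "the cat sat"))  # 1.0: exact match
print(rouge1_f1("the cat", "the cat sat"))      # 0.8: partial overlap
```

<p>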
Currently, our focus is solely on accuracy; we have yet to leverage the performance benchmarking aspects of MLPerf.</p> <p>While MLPerf picked the Llama2-70B model as a benchmark, we adapted the flow to use TinyLlama, which provides a more manageable, scaled-down version of the original model.</p> <p><strong>Adapting MLPerf’s Llama Accuracy Flow for TinyLlama</strong></p> <p>Our approach revolves around using <strong>MLPerf's Llama accuracy flow</strong> to ensure that the TinyLlama model maintains accuracy while adopting the IREE runtime on SiFive Intelligence cores. This flow provides a robust framework to evaluate how well our models perform on the OpenOrca dataset. With the <strong>Llama accuracy script</strong>, we continuously assess the accuracy of TinyLlama, allowing us to refine the model and make necessary adjustments to maximize precision within the constraints of our hardware.</p> <p><img src="https://images.prismic.io/sifive/ZwlMiIF3NbkBXUYP_fig7.png?auto=format,compress" alt="How LoadGen interacts with system"> <strong>Figure 7. 
How LoadGen interacts with the inference system.[ref7]</strong> <ol> <li>The benchmark knows the model, dataset, and preprocessing.</li> <li>The benchmark hands dataset sample IDs to LoadGen.</li> <li>LoadGen starts generating queries of sample IDs.</li> <li>The benchmark creates requests to the backend.</li> <li>The result is post-processed and forwarded to LoadGen.</li> <li>LoadGen outputs logs for analysis.</li> <li>The accuracy script uses the LoadGen logs and reference responses from the dataset to calculate the ROUGE scores.</li> <li>The accuracy script outputs the ROUGE scores.</li> </ol> <p>Per the MLPerf Policies[ref8], we note: “Result not verified by MLCommons Association.”</p> <p><img src="https://images.prismic.io/sifive/Zw6mo4F3NbkBXeXc_table4.png?auto=format,compress" alt="Table 4"> <strong>Table 4: MLPerf OpenOrca Accuracy Comparison of TinyLlama-1.1b Across Different Frameworks and Platforms</strong></p> <p>The MLPerf taskforce selected ROUGE scores to evaluate how closely a generated text aligns with its reference, using three specific ROUGE metrics for this benchmark: ROUGE-1, ROUGE-2, and ROUGE-L.</p> <p>Limited by FPGA emulation speed, we executed only 100 samples to validate accuracy. For the FP32 model, the results indicate that the SiFive X390 with IREE achieved scores identical to those of the NVIDIA 2080 Ti and Hugging Face backends. For the FP16 model, the X390 with IREE demonstrated slightly better accuracy than the FP32 results. It is important to note that the full MLPerf OpenOrca dataset consists of 24,576 samples, and reference results from NVIDIA and Hugging Face show a decline in ROUGE scores after processing the entire dataset.</p> <p>Unverified MLPerf® v4.0 Inference Closed Llama2 offline. Result not verified by MLCommons Association. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. 
See www.mlcommons.org for more information.</p> <p><strong>Demo Snapshot</strong></p> <p>The following snapshot was taken from a demo running on the X390 FPGA. The operating system is the Linux kernel from the SiFive FSFL (Freedom SDK for Linux). The prerequisites include a pre-compiled VMFB, the model weights (in .safetensors format), the IREE runtime, the llm_runner.py script, and the IREE Python binding package.</p> <p><img src="https://images.prismic.io/sifive/ZwlMiYF3NbkBXUYR_fig8.png?auto=format,compress" alt="Demo Snapshot"> <strong>Figure 8. Demo snapshot on X390 FPGA</strong></p> <p><strong>Conclusion and Future Work</strong></p> <p>In this article, we walked through the end-to-end process of deploying TinyLlama and Llama2-7b-Q4 from PyTorch to the SiFive Intelligence X390 platform, demonstrating the readiness of the cutting-edge RISC-V ecosystem for running LLMs. We also highlighted how using IREE for optimization achieves significantly better performance than library-based frameworks. Accuracy validation after optimization is equally important, which is why we have integrated our software stack into the MLPerf framework. We plan to upstream the generic optimizations to the IREE repository, so stay tuned if you're interested in our work.</p> <p>Exciting tasks we're currently working on include:</p> <ul> <li>For the upcoming SiFive XM series platform[ref9], enabling end-to-end lowering and optimization for SiFive Intelligence X390 cores and the AI matrix engine via IREE.</li> <li>Supporting the installation and native execution of PyTorch on RISC-V platforms.</li> <li>Enabling Llama 3.2 on SiFive platforms. 
We look forward to sharing more in our next technical blog!</li> </ul> <hr> <p><strong>Webinar</strong></p> <p>SiFive hosted a live webinar titled <strong>Advanced LLM Optimization and Deployment on RISC-V: Techniques You Need to Know</strong> on Wednesday, October 16, 2024.</p> <p>In it, you'll learn about:</p> <ul> <li>SiFive AI/ML Software Stack for RISC-V</li> <li>End-to-end deployment of PyTorch Llama models</li> <li>Challenges and solutions in optimizing LLM models</li> <li>Achieving real-time Llama performance with MLIR-based IREE</li> </ul> <p><strong>English Session</strong>: <a href="https://www.sifive.com/resources/webinar/advanced-llm-optimization-and-deployment-on-risc-v">Watch Webinar: LLM Optimization and Deployment on SiFive Intelligence</a></p> <p><strong>Chinese Session</strong>: <a href="https://www.sifive.cn/resources/webinar/advanced-llm-optimization-and-deployment-on-risc-v" target="_blank" rel="noopener">Watch Webinar: LLM Optimization and Deployment on SiFive Intelligence</a></p> <p><strong>References</strong></p> <p><a href="https://www.sifive.com/cores/intelligence-x390">SiFive Intelligence X390</a><br /> <a href="https://iree.dev/" target="_blank" rel="noopener">IREE</a><br /> <a href="https://github.com/nod-ai/SHARK-Turbine/tree/main" target="_blank" rel="noopener">SHARK-Turbine</a><br /> <a href="https://arxiv.org/abs/1706.03762" target="_blank" rel="noopener">Attention Is All You Need</a><br /> <a href="https://medium.com/@vi.ai_/exploring-and-building-the-llama-3-architecture-a-deep-dive-into-components-coding-and-43d4097cfbbb" target="_blank" rel="noopener">Llama architecture</a><br /> <a href="https://www.researchgate.net/publication/382331612_LLM_Inference_Serving_Survey_of_Recent_Advances_and_Opportunities" target="_blank" rel="noopener">LLM Inference Serving: Survey of Recent Advances and Opportunities</a><br /> <a href="https://mlcommons.org/2024/03/mlperf-llama2-70b/" target="_blank" rel="noopener">MLPerf Llama2 70B</a><br /> <a 
href="https://github.com/mlcommons/inference/tree/master/loadgen#integration-example-and-flow" target="_blank" rel="noopener">MLPerf loadgen</a><br /> <a href="https://github.com/mlcommons/policies/blob/master/MLPerf_Results_Messaging_Guidelines.adoc#2-use-of-mlperf-benchmark-for-mlcommons-reviewed-and-verified-results" target="_blank" rel="noopener">MLPerf Policies</a><br /> <a href="https://www.sifive.com/cores/intelligence-xm-series">SiFive XM series</a></p> <p><a href="https://cloud-v.co/blog/risc-v-1/accelerating-llama-cpp-with-risc-v-vector-extension-3" target="_blank" rel="noopener">https://cloud-v.co/blog/risc-v-1/accelerating-llama-cpp-with-risc-v-vector-extension-3</a><br /> PerfXLM: <a href="https://www.youtube.com/watch?v=tVXejqZCL_c" target="_blank" rel="noopener">https://www.youtube.com/watch?v=tVXejqZCL_c</a></p> </div></div></div></div></div></div></div></div></main><footer class="Footer_footer__KQx25"><div class="container"><div style="z-index:10" class="relative row"><div class="col-xl-5 col-lg-12"><a class="Footer_logo__CBryw" title="SiFive" href="https://www.sifive.com/"><svg xmlns="http://www.w3.org/2000/svg" width="100" height="34" fill="none" viewBox="0 0 100 34"><path fill="#fff" d="M4.274 17.6 8.862 3.375H26.44l1.784 5.534H12.696l-.992 3.14h17.533l2.636 8.175L17.65 30.708 4.135 20.739h9.127l4.349 3.207 8.605-6.343zM28.56 0H6.742L0 21.016 17.651 34l17.651-12.988zM53.086 9.614l-1.083 2.329c-1.68-1.03-3.358-1.456-4.529-1.456-1.524 0-2.519.582-2.519 1.635 0 3.425 8.351 1.59 8.33 7.232 0 2.8-2.43 4.523-5.833 4.523-2.43 0-4.728-1.008-6.318-2.485l1.126-2.284c1.59 1.478 3.579 2.284 5.236 2.284 1.811 0 2.894-.694 2.894-1.904 0-3.492-8.35-1.544-8.35-7.12 0-2.687 2.275-4.366 5.633-4.366 2.01 0 3.977.65 5.413 1.612M56.179 11.876h2.518v11.912h-2.518zm2.74-3.404c0 .896-.642 1.545-1.481 1.545s-1.48-.649-1.48-1.545c0-.918.64-1.567 1.48-1.567s1.48.65 1.48 1.567M62.986 
8.063l-.073.073v15.652l.073.073h.927l.073-.073V16.83h7.35l.073-.073v-.94l-.072-.074h-7.351V9.128h8.3l.073-.074v-.918l-.072-.073zM75.05 7.884c-.466 0-.845.394-.845.88 0 .497.38.901.845.901.467 0 .846-.404.846-.902 0-.485-.38-.88-.846-.88M74.586 12.049l-.072.073v11.666l.072.073h.906l.073-.073V12.122l-.073-.073zM87.267 12.049l-.067.046-4.064 10.485-4.085-10.485-.068-.046h-.994l-.068.1 4.662 11.666.067.046h.95l.067-.046 4.64-11.666-.067-.1zM89.976 17.31c.25-2.567 2.09-4.286 4.604-4.286 2.492 0 4.223 1.68 4.43 4.286zm10.023.903c.036-1.915-.533-3.577-1.6-4.679-.956-.987-2.276-1.508-3.819-1.508-3.293 0-5.683 2.498-5.683 5.94 0 3.429 2.39 5.917 5.683 5.917 1.807 0 3.411-.668 4.517-1.882v-.099l-.53-.604h-.108c-.922 1.024-2.284 1.588-3.834 1.588-2.643 0-4.513-1.845-4.674-4.6h9.976z"></path></svg></a></div><div class="col-xl-7 col-lg-12"><div class="row"><div class="col-md-3 col-12"><h3>Products</h3><ul class="Footer_menu__oj6R6"><li><a href="https://www.sifive.com/risc-v-core-ip">SiFive Core IP</a></li><li><a href="https://www.sifive.com/cores/performance">SiFive Performance</a></li><li><a href="https://www.sifive.com/cores/intelligence">SiFive Intelligence</a></li><li><a href="https://www.sifive.com/cores/automotive">SiFive Automotive</a></li><li><a href="https://www.sifive.com/cores/essential">SiFive Essential</a></li><li><a target="_blank" href="https://scs.sifive.com/accounts/login/">SiFive Core Designer</a></li><li><a href="https://www.sifive.com/software">Software</a></li><li><a href="https://www.sifive.com/boards">Boards</a></li><li><a href="https://www.sifive.com/documentation">Documentation</a></li><li><a href="http://support.sifive.com/">Customer Support</a></li></ul></div><div class="col-md-3 col-12"><h3>Technology</h3><ul class="Footer_menu__oj6R6"><li><a href="https://www.sifive.com/technology/risc-v">RISC-V</a></li><li><a href="https://www.sifive.com/technology/vectors">Vectors</a></li><li><a 
href="https://www.sifive.com/technology/sifive-insight">Trace+Debug</a></li><li><a href="https://www.sifive.com/technology/shield-soc-security">Security</a></li></ul></div><div class="col-md-3 col-12"><h3>Company</h3><ul class="Footer_menu__oj6R6"><li><a href="https://www.sifive.com/about">About</a></li><li><a href="https://www.sifive.com/press">Newsroom</a></li><li><a href="https://www.sifive.com/careers">Careers</a></li><li><a href="https://www.sifive.com/contact">Contact Us</a></li></ul></div><div class="col-md-3 col-12"><h3>Community</h3><ul class="Footer_menu__oj6R6"><li><a href="https://www.sifive.com/blog">Blog</a></li><li><a href="https://www.sifive.com/partners">Partners</a></li></ul></div></div></div></div><div style="z-index:11" class="relative row"><div class="col"><div class="Footer_footerAward__ULWjy"><img alt="siFive" loading="lazy" width="78" height="56" decoding="async" data-nimg="1" style="color:transparent" srcSet="/_next/static/media/sifive-footer-award-2022.c2d841a0.png?auto=compress,format&amp;w=128&amp;q=70 1x, /_next/static/media/sifive-footer-award-2022.c2d841a0.png?auto=compress,format&amp;w=256&amp;q=70 2x" src="/_next/static/media/sifive-footer-award-2022.c2d841a0.png?auto=compress,format&amp;w=256&amp;q=70"/><div><p>Most respected<br/> private company</p><span>2022</span></div></div><div class="Footer_socialLinks__QyOzG"><a target="_blank" title="Facebook" href="https://www.facebook.com/SiFive"><svg xmlns="http://www.w3.org/2000/svg" width="27" height="28" fill="none" viewBox="0 0 27 28"><path fill="#CBD5E1" d="M26.25 14.143C26.25 6.85 20.374.938 13.125.938S0 6.85 0 14.143c0 6.59 4.8 12.054 11.074 13.045V17.96H7.742v-3.817h3.332v-2.91c0-3.31 1.96-5.137 4.958-5.137 1.436 0 2.938.258 2.938.258v3.25h-1.655c-1.63 0-2.14 1.017-2.14 2.062v2.477h3.64l-.581 3.817h-3.058v9.228c6.274-.991 11.074-6.454 11.074-13.045"></path></svg></a><a target="_blank" title="Twitter" href="https://twitter.com/SiFive"><svg xmlns="http://www.w3.org/2000/svg" 
width="30" height="25" fill="none" viewBox="0 0 30 25"><path fill="#CBD5E1" d="M29.506 3.428c-1.093.477-2.25.794-3.433.94a5.93 5.93 0 0 0 2.62-3.292 11.9 11.9 0 0 1-3.777 1.442A5.958 5.958 0 0 0 14.61 6.592c0 .472.04.927.138 1.359A16.86 16.86 0 0 1 2.467 1.719a5.967 5.967 0 0 0 1.83 7.963 5.9 5.9 0 0 1-2.691-.734v.065a5.985 5.985 0 0 0 4.773 5.855c-.51.134-1.035.2-1.562.196a5.3 5.3 0 0 1-1.128-.102 6.01 6.01 0 0 0 5.568 4.15 11.97 11.97 0 0 1-7.388 2.542c-.488 0-.957-.022-1.426-.082a16.77 16.77 0 0 0 9.14 2.674c10.964 0 16.959-9.082 16.959-16.954 0-.264-.01-.518-.022-.77a11.9 11.9 0 0 0 2.986-3.094"></path></svg></a><a target="_blank" title="Github" href="https://github.com/sifive/"><svg xmlns="http://www.w3.org/2000/svg" width="32" height="30" fill="none" viewBox="0 0 32 30"><path fill="#CBD5E1" fill-rule="evenodd" d="M15.971 0a15.272 15.272 0 0 0-4.828 29.766c.76.14 1.04-.337 1.04-.74v-2.597c-4.246.937-5.146-2.044-5.146-2.044a4.1 4.1 0 0 0-1.697-2.24c-1.378-.938.112-.938.112-.938a3.22 3.22 0 0 1 2.335 1.575 3.263 3.263 0 0 0 4.443 1.275 3.24 3.24 0 0 1 .938-2.044c-3.394-.384-6.956-1.697-6.956-7.547a5.9 5.9 0 0 1 1.565-4.097 5.56 5.56 0 0 1 .15-4.04s1.285-.413 4.2 1.565a14.45 14.45 0 0 1 7.65 0c2.916-1.978 4.19-1.565 4.19-1.565a5.53 5.53 0 0 1 .16 4.04 5.9 5.9 0 0 1 1.566 4.097c0 5.869-3.572 7.153-6.975 7.5a3.6 3.6 0 0 1 1.04 2.813v4.19c0 .497.272.882 1.05.741A15.281 15.281 0 0 0 15.971-.056z" clip-rule="evenodd"></path></svg></a><a target="_blank" title="LinkedIn" href="https://www.linkedin.com/company/sifive"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="23" fill="none" viewBox="0 0 24 23"><path fill="#CBD5E1" d="M5.744 7.749H.923v14.502h4.821zM3.369.75C1.719.75.642 1.834.642 3.256c0 1.391 1.045 2.506 2.663 2.506h.031c1.681 0 2.728-1.115 2.728-2.506C6.032 1.834 5.017.75 3.369.75M17.592 7.412c-2.557 0-3.703 1.406-4.344 2.394V7.753h-4.82c.064 1.36 0 14.502 0 14.502h4.82v-8.099c0-.433.03-.867.159-1.177.347-.865 1.141-1.762 2.473-1.762 1.743 0 2.442 
1.33 2.442 3.279v7.759h4.82v-8.316c0-4.455-2.379-6.527-5.55-6.527"></path></svg></a><a target="_blank" title="YouTube" href="https://www.youtube.com/SiFiveInc"><svg xmlns="http://www.w3.org/2000/svg" width="32" height="23" fill="none" viewBox="0 0 32 23"><path fill="#CBD5E1" d="M13.292 22.62c-5.844-.113-7.84-.213-9.066-.475-.83-.174-1.552-.561-2.08-1.122-.41-.425-.734-1.073-.987-1.971-.216-.75-.3-1.372-.42-2.894-.184-3.435-.228-6.243 0-9.38.188-1.734.279-3.79 1.527-4.99A4.04 4.04 0 0 1 4.31.715C5.512.478 10.635.291 15.938.291c5.29 0 10.425.187 11.628.424.962.187 1.864.748 2.393 1.472 1.139 1.858 1.159 4.168 1.274 5.975.048.861.048 5.75 0 6.611-.18 2.856-.325 3.867-.733 4.915-.253.661-.469 1.01-.842 1.397a4 4 0 0 1-2.14 1.135c-5.058.394-9.352.48-14.226.4m7.744-11.477c-2.813-1.56-5.507-3.006-8.26-4.503v8.957c2.897-1.634 5.952-3.131 8.272-4.466z"></path></svg></a><a target="_blank" title="GlassDoor" href="https://www.glassdoor.com/Overview/Working-at-SiFive-EI_IE1922671.11,17.htm"><svg xmlns="http://www.w3.org/2000/svg" width="16" height="23" viewBox="0.6 0 680.6 959.3"><path fill="#CBD5E1" d="M545.2 259.5c0-2.7 2.2-5 4.9-5h126.3c2.7 0 4.8 2.4 4.8 5.1v562.8c0 75.5-60.8 136.9-136 136.9H136.8C61.5 959.3.6 898.1.6 822.4h544.6zM136.8 0C61.5 0 .6 61.4.6 137.1v562.8c0 2.7 2.3 5 5 5h126.2c2.7 0 5-2.3 5-5V137.1h544.4C681.2 61.4 620.4 0 545.2 0z"></path></svg></a><a class="SocialLinks_link__2xp3k" title="WeChat"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" fill-rule="evenodd" clip-rule="evenodd"><path d="M21.502 19.525C23.026 18.42 24 16.787 24 14.971c0-3.326-3.237-6.023-7.229-6.023s-7.229 2.697-7.229 6.023c0 3.327 3.237 6.024 7.229 6.024.825 0 1.621-.117 2.36-.33l.212-.032c.139 0 .265.043.384.111l1.583.914.139.045a.24.24 0 0 0 .241-.241l-.039-.176-.326-1.215-.025-.154a.48.48 0 0 1 .202-.392M8.675 2.297C3.884 2.297 0 5.533 0 9.526c0 2.178 1.168 4.139 2.997 5.464a.57.57 0 0 1 .243.471l-.03.184-.391 1.458-.047.211c0 .16.13.29.289.29l.168-.054 1.899-1.097a.9.9 0 
0 1 .46-.133l.255.038c.886.255 1.842.397 2.832.397l.476-.012a5.6 5.6 0 0 1-.291-1.771c0-3.641 3.542-6.593 7.911-6.593l.471.012c-.653-3.453-4.24-6.094-8.567-6.094m5.686 11.711a.964.964 0 1 1 .001-1.927.964.964 0 0 1-.001 1.927m4.82 0a.964.964 0 1 1 0-1.928.964.964 0 0 1 0 1.928M5.783 8.369a1.156 1.156 0 1 1 0-2.312 1.156 1.156 0 0 1 0 2.312m5.783 0a1.156 1.156 0 1 1 0-2.312 1.156 1.156 0 0 1 0 2.312"></path></svg></a></div></div></div><div class="row"><div class="col"><div class="Footer_secondary__fQ3Vr"><a href="https://www.sifive.com/">© SiFive, Inc.</a><a href="https://www.sifive.com/terms">Terms of Use</a><a href="https://www.sifive.com/privacy">Privacy Policy</a></div></div></div><div style="z-index:15" class="relative row"><div class="text-center col"><div class="Footer_lang__7_Fhv"><div class="LanguageToggle_languageToggle__czKEY LanguageToggle_white__p2WJq"><svg xmlns="http://www.w3.org/2000/svg" width="96" height="96" viewBox="0 0 96 96" class="LanguageToggle_icon__WTIGJ"><defs><clipPath id="a"><path d="M0 0h96v96H0z"></path></clipPath></defs><g clip-path="url(#a)"><path d="M95.971 49.8a55 55 0 0 0 0-3.628v-.167h-.01a47.84 47.84 0 0 0-10.423-27.948c.01-.01.02-.01.029-.02-.137-.147-.274-.294-.4-.441a47.85 47.85 0 0 0-35-17.577l-3.83-.02A47.99 47.99 0 0 0 .078 46.005H0v3.843h.069a47.986 47.986 0 0 0 95.9-.01Zm-3.624-3.794H73.185a83.2 83.2 0 0 0-3.223-21.762 62.5 62.5 0 0 0 12.343-4.421 43.9 43.9 0 0 1 10.041 26.182ZM50.165 3.931c6.162 1.157 11.559 7.872 15.047 17.724a90.4 90.4 0 0 1-15.047 1.529Zm-3.83 0v19.243a90 90 0 0 1-15.027-1.539C34.785 11.793 40.173 5.1 46.335 3.931m0 22.841v19.233H27.174A82.1 82.1 0 0 1 30.2 25.046a90.7 90.7 0 0 0 16.135 1.726m0 23.076v19.233a83.2 83.2 0 0 0-16.085 1.873 81.9 81.9 0 0 1-3.066-21.106Zm0 23.066v19c-6.083-1.147-11.422-7.705-14.909-17.351a83.5 83.5 0 0 1 14.909-1.649M36.8 90.6a44.1 44.1 0 0 1-19.608-11.285 56.3 56.3 0 0 1 10.6-3.9C30.1 81.776 33.2 87.011 36.8 90.6m13.362 1.323V72.914a83 83 0 0 1 14.919 1.647c-3.481 
9.656-8.823 16.214-14.916 17.361Zm18.556-16.509a55.6 55.6 0 0 1 10.6 3.892 44 44 0 0 1-19.6 11.283c3.593-3.589 6.689-8.813 9-15.175m-18.553-6.343V49.848h19.161a81.8 81.8 0 0 1-3.066 21.1 84 84 0 0 0-16.095-1.877m0-23.066V26.772a91 91 0 0 0 16.144-1.716A82.1 82.1 0 0 1 69.326 46l-19.161.01Zm29.711-28.9a58.6 58.6 0 0 1-11.011 3.755c-2.331-6.558-5.476-11.94-9.159-15.616a44.2 44.2 0 0 1 20.17 11.862ZM36.8 5.254c-3.683 3.666-6.828 9.048-9.159 15.6A58.4 58.4 0 0 1 16.65 17.08 44.05 44.05 0 0 1 36.8 5.254M14.224 19.782a62.5 62.5 0 0 0 12.323 4.441 83.8 83.8 0 0 0-3.233 21.792H4.153a43.96 43.96 0 0 1 10.071-26.233m9.1 30.066a83.5 83.5 0 0 0 3.291 22 58.4 58.4 0 0 0-12.01 4.666A44.1 44.1 0 0 1 4.153 49.848zM81.9 76.512a58.4 58.4 0 0 0-12.01-4.666 83.5 83.5 0 0 0 3.291-22h19.161A44.02 44.02 0 0 1 81.9 76.512" data-name="globe-icon"></path></g></svg><span class="LanguageToggle_label__RGGsn false">English</span><svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 16 16" class="LanguageToggle_chevron__Fg2so"><path fill="currentColor" d="M6 13.4 4.6 12l4-4-4-4L6 2.6 11.4 8z"></path></svg><ul class="LanguageToggle_dropdown__vSjIc"><li><a href="https://www.sifive.com/blog/llm-optimization-and-deployment-on-sifive-intellig">English</a></li><li><a href="https://www.sifive.cn/blog/llm-optimization-and-deployment-on-sifive-intellig">简体中文</a></li></ul></div></div></div></div></div></footer></div><script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{"_nextI18Next":{"initialI18nStore":{"en-us":{"common":{},"forms":{}},"en":{"common":{"en-us":"English","zh-cn":"简体中文","careers-open-roles":"See open roles","most-respected-private-company":"Most respected\u003cbr/\u003e private company","contact":"Contact","quote":"Quote","website":"Website","this-page-not-exist":"This page doesn`t exist!","go-back-home-page":"Go back to our homepage","or":"or","contact-us":"contact us","read-more":"Read More","request-information-from":"Request Information 
from","blog-archives":"Blog Archives","press-page":"Press Page","view-archived-news-articles":"View Archived News \u0026 Articles","open-positions":"Open Positions","filter-by":"Filter by","All Locations":"All Locations","All Departments":"All Departments","All Work Types":"All Work Types","Full time":"Full Time","Part time":"Part Time","Engineering":"Engineering","Marketing":"Marketing","Sales":"Sales","wechat-scan-text":"Scan the QR code with your mobile phone to contact us","search":"Search","watch-recording":"Watch Recording","webinar-info":"Webinar Info","view-more-details":"View More Details","register-now":"Register Now"},"forms":{"first-name":"First Name","last-name":"Last Name","email":"E-mail Address","phone":"Phone number","organization":"Organization (e.g. company, university)","job-title":"Job Title","comments":"Questions / Comments","submit":"Submit","request-sent":"Your request has been sent.","thank-you":"Thank you for your submission!","horse-creek-form-title":"Register for \u003cem\u003eHorse Creek\u003c/em\u003e Updates","click-to-download":"Click to 
Download"}}},"initialLocale":"en-us","ns":["common","forms"],"userConfig":{"i18n":{"locales":["en-us","zh-cn"],"defaultLocale":"en-us","localeDetection":false,"domains":[{"domain":"www.sifive.com","defaultLocale":"en-us"},{"domain":"www.sifive.cn","defaultLocale":"zh-cn"}]},"fallbackLng":{"default":["en"],"zh-cn":["zh"]},"localePath":"/app/public/locales","reloadOnPrerender":false,"default":{"i18n":{"locales":["en-us","zh-cn"],"defaultLocale":"en-us","localeDetection":false,"domains":[{"domain":"www.sifive.com","defaultLocale":"en-us"},{"domain":"www.sifive.cn","defaultLocale":"zh-cn"}]},"fallbackLng":{"default":["en"],"zh-cn":["zh"]},"localePath":"/app/public/locales","reloadOnPrerender":false}}},"page":{"id":"ZwaiBREAACQAyU28","uid":"llm-optimization-and-deployment-on-sifive-intellig","url":null,"type":"blog_post","href":"https://sifive.cdn.prismic.io/api/v2/documents/search?ref=Z8H4ZhAAACEAUi4M\u0026q=%5B%5B%3Ad+%3D+at%28document.id%2C+%22ZwaiBREAACQAyU28%22%29+%5D%5D","tags":[],"first_publication_date":"2024-10-11T16:58:18+0000","last_publication_date":"2024-12-19T18:28:11+0000","slugs":["llm-optimization-and-deployment-on-sifive-risc-v-intelligence-products"],"linked_documents":[],"lang":"en-us","alternate_languages":[],"data":{"page_title":"LLM Optimization and Deployment on SiFive RISC-V Intelligence Products","meta_description":"SiFive outlines how we optimized \u0026 deployed Llama on the SiFive Intelligence X390 platform, the future computing platform for AI/ML workloads.","title":[{"type":"heading1","text":"LLM Optimization and Deployment on SiFive RISC-V Intelligence 
Products","spans":[],"direction":"ltr"}],"author":{"id":"Y4UDihAAACQAjm3U","type":"blog_author","tags":[],"lang":"en-us","slug":"david-miller","first_publication_date":"2022-11-28T18:59:18+0000","last_publication_date":"2022-11-28T18:59:18+0000","uid":"head-of-corporate-communications-sifive","link_type":"Document","key":"a3e7ac30-31dd-4761-ae59-2f32dcf7a7a7","isBroken":false},"publish_date":"2024-10-10","share_image":{"link_type":"Media","key":"93a65838-3e6e-419c-a0e6-972409803f6e","kind":"image","id":"ZJX-UhEAACcA5A5F","url":"https://images.prismic.io/sifive/9f85c5eb-f5b9-4da0-8a54-a2e29e67df6b_data-center-news.png?auto=format,compress?auto=compress,format","name":"data-center-news.png","size":"367243","width":"640","height":"361"},"body":[{"type":"preformatted","text":"**Bruce Lai, Darren Hsieh and Hong-Rong Hsu** \n\nLarge language models (LLMs) have become essential to numerous applications, thanks to their powerful capabilities in natural language understanding and text generation. However, their large model sizes and high computational demands present significant challenges for efficient deployment and real-time performance.\n\nCan you imagine running Llama on a RISC-V machine while achieving real-time performance? The ML compiler plays a crucial role in making this possible. For more details, please refer to our previous article “ML Compiler for RISC-V Vector”. In this article, we will share how we optimized and deployed Llama on the SiFive Intelligence X390 platform[ref0], the future computing platform for AI/ML workloads.\n\n**Want to learn more about this? Watch our webinar:**\n\n[Watch Webinar Now](https://www.sifive.com/resources/webinar/advanced-llm-optimization-and-deployment-on-risc-v)\n\n**SiFive AI/ML Software Stack**\n\nTo enable LLM models on RISC-V, multiple components need to interact seamlessly. 
In this section, we will introduce each component of the SiFive AI/ML Software Stack.\n\nThe orange blocks in Figure 1 represent the fundamental RISC-V building blocks, primarily owned and maintained by SiFive:\n\n- **SiFive Intelligence and High-Performance core series**: Cores well known as [X280](https://www.sifive.com/cores/intelligence-x280)/[X390](https://www.sifive.com/cores/intelligence-x390) (Intelligence) and [P470](https://www.sifive.com/cores/performance-p450-470)/[P670](https://www.sifive.com/cores/performance-p650-670)/[P870](https://www.sifive.com/cores/performance-p870-p870a) (High-Performance).\n\n- **SiFive accelerators**: In-house hardware that accelerates domain-specific workloads, such as the SiFive XM Series[ref9].\n\n- **SiFive LLVM Compiler**: Provides RISC-V C/C++ compilation and optimization, RVV intrinsic programming, auto-vectorization, and efficient RISC-V backend code generation for MLIR. SiFive offers a proprietary version of LLVM with advanced u-architecture optimizations and custom instruction compilation/IR for SiFive cores, though users can also access a generic version from the upstream compiler directly.\n\n- **SiFive System Software**: Provides FSFL (Freedom SDK for Linux), the Yocto/OpenEmbedded-based RISC-V Linux solution, and FSFM (Freedom SDK for Metal), a reference ASM/C/C++ bare-metal environment for SiFive cores.\n\n- **SKL (SiFive Kernel Library)**: A fully optimized C/C++ library with a set of tuned routines that maximize algorithm throughput on SiFive processors. 
Some hot-spot operations in IREE will be offloaded to SKL in order to maximize performance on SiFive processors.\n\n![SiFive AI/ML Software Stack](https://images.prismic.io/sifive/ZwlBvIF3NbkBXTts_fig1.png?auto=format,compress)\n**Figure 1: SiFive AI/ML Software Stack**\n\nThe blue blocks in Figure 1 represent the components that SiFive leverages from and contributes back to open-source projects.\n\n**ML Compiler and Runtime:** IREE[ref1] is an open-source MLIR-based compiler and runtime. We leverage most of its generic optimizations while adding SiFive architecture-specific functions and optimizations.\n\n**VCIX MLIR Dialect:** SiFive has open-sourced the VCIX MLIR dialect, allowing users to lower their models and delegate them to custom TPUs with minimal effort.\n\n**ML Interpreters:** For customers requiring a more lightweight framework, we offer:\n- Customized TFLite with RVV optimizations.\n- Upstream XNNPACK with RVV optimizations.\n- ONNX Runtime delegated to XNNPACK.\n- Customized llama.cpp with additional RVV optimizations.\n- Other open-source libraries such as libyuv, libc, OpenSSL, and zlib-ng, optimized with RVV and fine-tuned for SiFive u-architecture to accelerate non-AI/ML domains.\n\nThe yellow blocks in Figure 1 represent the components provided by third-party vendors or communities, which SiFive leverages to create a comprehensive solution.\n\nRead more about how [SiFive Accelerates RISC-V Vector Integration in XNNPACK for Optimized AI Inference](https://www.sifive.com/blog/sifive-accelerates-risc-v-vector-integration-in-xnnpack-for-optimized-ai-inference)\n\n**PyTorch end-to-end flow**\n\nIn this section, we’ll introduce the implementation of a PyTorch end-to-end flow for Large Language Models (LLMs) using SHARK-Turbine, a tool supported by AMD for facilitating various model deployment activities. 
The complete workflow for enabling the LLM demonstration is illustrated in Figure 2.\n\n![PyTorch LLM Demo](https://images.prismic.io/sifive/ZwlNrIF3NbkBXUcz_fig2.png?auto=format,compress)\n**Figure 2: PyTorch end-to-end Large Language Model demo flow in SHARK-Turbine**\n\nSHARK-Turbine[ref2] provides two key scripts to enable the LLM demonstration. The first script, stateless_llm.py, is responsible for converting a PyTorch Hugging Face LLM into a VMFB (VM FlatBuffer) and a parameter file. This conversion process utilizes the FX/Dynamo-based torch-mlir compiler along with the IREE compiler. Currently, we use upstream IREE with several SiFive patches, which incorporate SiFive MLIR optimizations and leverage SiFive LLVM to achieve optimal performance. Detailed optimization strategies are discussed in the subsequent section.\n\nAfter the LLM is compiled, the second script, llm_runner.py, utilizes Python bindings to invoke the IREE runtime, load VMFB models, and set up the runtime context. This allows the LLM to run within a Python environment on SiFive Intelligence cores. Users can input queries (prompts), and the SiFive Intelligence platform will execute multiple inferences to generate meaningful responses.\n\nNow, you might be wondering: why not simply use llama.cpp for LLM inference, since the GGUF format (a file format for storing models for inference with llama.cpp) works fine? We will explain this in the performance section.\n\n**Llama Optimization**\n\n**Model Architecture and Bottleneck**\n\nUnderstanding the model architecture helps us identify the performance bottleneck. LLaMA (Large Language Model Meta AI) is a series of transformer-based[ref3] large language models developed by Meta. Figure 3 shows an overview of the LLaMA architecture. This architecture utilizes self-attention mechanisms and feed-forward neural networks to process input sequences. 
The key components of the architecture include:\n\n- Embedding Layer: Converts tokens into dense vectors.\n- Multi-Head Self-Attention Mechanism: Computes relationships between all pairs of tokens, enabling the model to capture long-range dependencies.\n- Feed-Forward Networks (FFNs): Contain a series of matrix multiplication (matmul) operations and non-linear functions that provide non-linearity and feature transformation.\n- Layer Normalization: Normalizes activations for stability and faster convergence.\n- Residual Connections: Help avoid vanishing gradients during backpropagation.\n\n![Overview of LLaMA architecture](https://images.prismic.io/sifive/Z2RlopbqstJ98sg7_fig3.png?auto=format,compress)\n**Figure 3: Overview of the LLaMA architecture[ref4]**\n\nFigure 3 shows that there are N self-attention layers and N feed-forward networks, both implemented through a series of matrix multiplication operations. This extensive reliance on matrix multiplication makes it the primary bottleneck in transformer-based models.\n\n**Performance Results**\n\nWe profiled TinyLlama with IREE during the decode phase. Different operations can be grouped into a single dispatch. For example, matrix multiplication (matmul) followed by bias addition can be combined into one dispatch. The \"Operation Type\" column indicates the primary operation within each dispatch. Notably, matmul operations are the primary performance bottleneck, consuming over 95% of the inference time. “D” in the shape means dynamic shape, which handles the increasing dimension from the KV-cache.\n\n![Table 1](https://images.prismic.io/sifive/Zw6mp4F3NbkBXeXd_table1.png?auto=format,compress)\n**Table 1**\n\nTable 1 shows the performance profiling results of running TinyLlama with IREE during the decode phase. 
The next section will provide more detailed information about the \"prefill\" and \"generation\" phases. Briefly, in the prefill phase, the matrix multiplication (matmul) operations follow the General Matrix Multiply (GEMM) pattern, where the M dimension is greater than or equal to 1. In contrast, during the decode phase, the matmul operations follow the General Matrix-Vector Multiply (GEMV) pattern, where the M dimension is always equal to 1. The shape of a matmul operation is represented as [BxMxNxK], where B is the (optional) batch size, M is the number of rows in the output, N is the number of columns in the output, and K is the reduction dimension. In the case of dispatch_60, the batch size is 32 (from the number of attention heads), and D indicates a dynamic input in the N dimension.\n\nTable 1 demonstrates that matmul operations account for over 95% of the decode-phase inference time, making them the primary performance bottleneck in LLaMA.\n\nSince matmul operations are the main performance hotspot in LLM inference, the next section will focus on optimizing these operations.\n\n**Optimization through IREE Compiler**\n\nIREE (Intermediate Representation Execution Environment) is an MLIR-based end-to-end AI/ML compiler and runtime. The architecture overview is shown in Figure 4. In IREE, the input model is lowered to MLIR, different levels of optimizations are applied (such as kernel fusion, tiling, and loop unrolling), and the result is finally translated to target-dependent VM bytecode, which executes on the IREE runtime.\n\n![IREE Architecture](https://images.prismic.io/sifive/ZwlMhYF3NbkBXUYJ_fig4.png?auto=format,compress)\n**Figure 4. IREE Architecture Overview[ref1]**\n\nAlthough IREE is a powerful framework that supports both CPU \u0026 GPU code generation and optimizations, its RISC-V backend with RVV vectorization had not been fully tuned at this point. 
Figure 5 shows the significant performance gap between the code before and after our optimizations.\n\n![TinyLLama on SiFive-X390](https://images.prismic.io/sifive/ZwlMhoF3NbkBXUYL_fig5.png?auto=format,compress)\n**Figure 5. Performance gap between before and after optimizations**\n\nThe following subsections demonstrate several optimizations that improve the performance of the matmul operation, leading to significant improvements in overall LLM performance.\n\n**Cache \u0026 Register Tiling Optimizations for Matmul**\n\nIn LLM inference, the process is divided into two distinct phases: Prefill (Prompt) and Decode (Generation). The LLM inference process is visualized in Figure 6.\n\n![Prefill Prompt](https://images.prismic.io/sifive/ZwlMh4F3NbkBXUYN_fig6.png?auto=format,compress)\n**Figure 6. Prefill (Prompt) and Decode (Generation) phases in LLM inference[ref5]**\n\nPrefill or Prompt Phase: During this phase, the input prompt has a length greater than or equal to 1. The model processes the initial input sequence and constructs the KV-Cache, which is reused in subsequent decoding steps. Assuming the matmul has a problem size of [m,n,k], where the left-hand side (LHS) is [m,k], the dimension m is always greater than or equal to 1.\n\nDecode or Generation Phase: This phase begins with the output of the prefill phase as its first input. The model uses the KV-Cache to efficiently generate tokens in the following iterations. The matmul operations in this phase have the dimension m = 1. This small m value is a key distinction compared to traditional operations in neural network models.\n\nWhen optimizing the matmul operation with the RISC-V Vector extension (RVV), two key factors determine efficiency:\n\n**Register Tiling**: \nThis focuses on maximizing the utilization of the CPU's vector registers during matrix multiplication. 
By carefully organizing data to fit into vector registers, register tiling helps minimize memory access and maximize computational throughput.\n\n**Cache Tiling**: \nThis strategy optimizes matrix multiplication by improving data locality, ensuring that data remains in the cache hierarchy for as long as possible. Efficient cache tiling reduces memory latency and improves matmul performance by minimizing cache misses.\n\nIn IREE, the matmul operation is implemented using an outer-product approach. The output register tile size is [m, n], where m and n represent the number of elements in the output rows and columns, respectively. Implementing the outer product with RVV requires m grouped vector registers for the output accumulators and 1 additional grouped vector register to load data from the RHS matrix. Each grouped vector register has a length of n. Hence, the vector register utilization rate can be calculated using the formula:\n\n![FORMULA 1](https://images.prismic.io/sifive/Zw6lP4F3NbkBXeWz_formula1.png?auto=format,compress)\n\nFor instance, if VLEN = 1024, the data type is float32, and [m,n] = [8,32]:\n\n![FORMULA INSTANCE](https://images.prismic.io/sifive/Zw6lQIF3NbkBXeW0_formula2.png?auto=format,compress)\n\nUpon analyzing the output assembly code, we discovered that IREE generates code with a fixed output register tile size of [m,n] = [8,32] on RISC-V platforms. This configuration reduces the vector register utilization rate to 28% when VLEN = 1024. The best case occurs when VLEN = 512, where the utilization rate improves to 56%. For smaller VLEN values, however, such as VLEN = 128, the generated code suffers from excessive vector register spilling, leading to poor performance.\n\nTo enhance performance, we adjusted the register tile size to [m, n] = [7, m4], where m4 corresponds to LMUL = 4. In this case, if VLEN = 1024 and the data type is float32, the value of n will be 128, which is the maximum vector length when LMUL = 4. 
This adjustment allows us to achieve a 100% utilization rate of the vector registers, maximizing performance.\n\nThe register tile configuration [m,n] = [7,m4] is well-suited for the prefill phase of LLM inference, where the matrix dimension M typically exceeds 7. However, this setup is suboptimal for the decode phase. For decoding, we therefore changed the register tile size to [m,n] = [1,m8], which improved the vector register utilization rate from 25% to 50%.\n\nAfter adjusting the register tiling policy, we extended our optimization to the cache tiling strategy within the IREE compiler. The cache tile size is now dynamically selected based on the register tiling size n. For instance, when n = 128, IREE defaults to an n-dimension cache tile size of 128. By fine-tuning this parameter to match the specific requirements of the system, we further improved cache efficiency, yielding notable performance gains.\n\nWith the IREE compiler, tuning the tiling policy at each level of the memory hierarchy is fast and convenient. Compared to library-based solutions, IREE significantly reduces the required engineering effort.\n\n**Performance Results**\n\nOriginally, generating a single token on a 32.5 MHz FPGA took several hours. After applying the IREE optimizations, the f16 TinyLlama model generates a token in about 5 seconds. Normalized to a 1 GHz real-chip frequency, this translates to 5.37 tokens per second, achieving a real-time user experience on a single X390 core.\n\nWe also compared IREE with llama.cpp, a pure C/C++ LLM inference project. While llama.cpp is convenient for deployment, its performance optimizations are limited. 
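The register-count arithmetic above can be captured in a small helper (a back-of-the-envelope sketch of our own, assuming RVV's 32 architectural vector registers and the outer-product layout described earlier: m accumulator groups plus one RHS group, each group spanning LMUL physical registers):

```python
import math

NUM_VREGS = 32  # architectural vector registers in RVV

def utilization(vlen_bits: int, elem_bits: int, m: int, n: int) -> float:
    """Fraction of vector registers used by an [m, n] outer-product tile."""
    lmul = math.ceil(n * elem_bits / vlen_bits)   # physical registers per group
    used = (m + 1) * lmul                         # m accumulators + 1 RHS group
    if used > NUM_VREGS:
        return float("nan")                       # tile does not fit: spilling
    return used / NUM_VREGS

# Default IREE tile [8, 32] with float32:
print(round(utilization(1024, 32, 8, 32) * 100))   # 28  (VLEN = 1024)
print(round(utilization(512, 32, 8, 32) * 100))    # 56  (VLEN = 512)
print(utilization(128, 32, 8, 32))                 # nan (VLEN = 128: spills)

# Adjusted prefill tile [7, n] with LMUL = 4 (n = 128 at VLEN = 1024, fp32):
print(round(utilization(1024, 32, 7, 128) * 100))  # 100 -> (7 + 1) * 4 = 32 of 32
# Decode tile [1, n] with LMUL = 8: (1 + 1) * 8 = 16 of 32 registers.
print(round(utilization(1024, 32, 1, 256) * 100))  # 50
```

This reproduces the 28%, 56%, 100%, and 50% figures quoted above and shows why the fixed [8, 32] tile spills at VLEN = 128.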
Even with hand-written RVV kernels, it is constrained by algorithmic limitations and lacks graph-level optimization.\n\n![Table 2](https://images.prismic.io/sifive/Zw6moYF3NbkBXeXa_table2.png?auto=format,compress)\n**Table 2: Performance (tokens/second) of TinyLlama-1.1b on X390 between llama.cpp and IREE**\n\nThe data also shows that the SiFive X390 is competitive: a single core can achieve real-time user-experience performance on TinyLlama.\n\nMoreover, because TinyLlama and Llama2-7b share the same model architecture, these optimizations allowed us to run Llama2-7b on the SiFive Intelligence platform as well, reusing the same end-to-end flow and scripts.\n\n![Table 3](https://images.prismic.io/sifive/Zw6mooF3NbkBXeXb_table3.png?auto=format,compress)\n**Table 3: Llama2-7b-Q4 performance comparison**\n\n**Accuracy Verification**\n\nPreserving accuracy after optimization is equally important for deployment. This section describes how we verify model accuracy.\n\n**Leveraging MLPerf's Llama Accuracy Flow for TinyLlama**\n\nTo align with industry standards, we leverage MLPerf's benchmarks to evaluate the accuracy of our models, ensuring our solutions are both reliable and effective. Currently, our focus is solely on accuracy; we have not yet used the performance benchmarking aspects of MLPerf.\n\nWhile MLPerf selected the Llama2 70B model as its benchmark, we adapt the flow by using TinyLlama, which provides a more manageable, scaled-down version of the original model.\n\n**Adapting MLPerf's Llama Accuracy Flow for TinyLlama**\n\nOur approach revolves around using **MLPerf's Llama accuracy flow** to ensure that the TinyLlama model maintains accuracy while running with the IREE runtime on SiFive Intelligence cores. This flow provides a robust framework for evaluating how well our models perform on the OpenOrca dataset. 
With the **Llama accuracy script**, we continuously assess the accuracy of TinyLlama, allowing us to refine the model and make the adjustments needed to maximize precision within the constraints of our hardware.\n\n![How LoadGen interacts with system](https://images.prismic.io/sifive/ZwlMiIF3NbkBXUYP_fig7.png?auto=format,compress)\n**Figure 7. How the LoadGen interacts with the inference system[ref7]**\n\n1. The benchmark knows the model, dataset, and preprocessing.\n2. The benchmark hands dataset sample IDs to LoadGen.\n3. LoadGen starts generating queries of sample IDs.\n4. The benchmark creates requests to the backend.\n5. The result is post-processed and forwarded to LoadGen.\n6. LoadGen outputs logs for analysis.\n7. The accuracy script uses the LoadGen logs and reference responses from the dataset to calculate ROUGE scores.\n8. The accuracy script outputs the ROUGE scores.\n\nFollowing the MLPerf Policies[ref8], we note: \"Result not verified by MLCommons Association.\"\n\n![Table 4](https://images.prismic.io/sifive/Zw6mo4F3NbkBXeXc_table4.png?auto=format,compress)\n**Table 4: MLPerf OPENORCA Accuracy Comparison of TinyLlama-1.1b Across Different Frameworks and Platforms**\n\nThe MLPerf taskforce selected ROUGE scores to evaluate how closely a generated text aligns with its reference, using three specific metrics for this benchmark: ROUGE-1, ROUGE-2, and ROUGE-L.\n\nLimited by FPGA emulation speed, we executed only 100 samples to validate accuracy. For the FP32 model, the results indicate that the SiFive X390 with IREE achieved scores identical to those of the NVIDIA 2080 Ti and the HuggingFace backend. For the FP16 model, the X390 with IREE demonstrated slightly better accuracy than FP32. Note that the full MLPerf OpenOrca dataset consists of 24,576 samples, and reference results from NVIDIA and Hugging Face show a decline in ROUGE scores after processing the entire dataset.\n\nUnverified MLPerf® v4.0 Inference Closed Llama2 offline. 
Result not verified by MLCommons Association. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.\n\n**Demo Snapshot**\n\nThe following snapshot was taken from a demo running on the X390 FPGA. The operating system is the Linux kernel from the SiFive FSFL (Freedom SDK for Linux). The prerequisites include the pre-compiled VMFB, the model weights (in .safetensors format), the IREE runtime, the llm_runner.py script, and the IREE Python binding package.\n\n![Demo Snapshot](https://images.prismic.io/sifive/ZwlMiYF3NbkBXUYR_fig8.png?auto=format,compress)\n**Figure 8. Demo snapshot on X390 FPGA**\n\n**Conclusion and Future Work**\n\nIn this article, we walked through the end-to-end process of deploying TinyLlama and Llama2-7b-Q4 from PyTorch to the SiFive Intelligence X390 platform, demonstrating the readiness of the RISC-V ecosystem for running LLMs. We also highlighted how optimizing with IREE achieves significantly better performance than library-based frameworks. Validating accuracy after optimization is equally important, which is why we have integrated our software stack with the MLPerf framework. 
We plan to upstream the generic optimizations to the IREE repository, so stay tuned if you're interested in our work.\n\nExciting tasks we're currently working on include:\n- For the upcoming SiFive XM series platform[ref9], enabling end-to-end lowering and optimization for SiFive Intelligence X390 cores and the AI matrix engine via IREE.\n- Supporting the installation and native execution of PyTorch on RISC-V platforms.\n- Enabling Llama 3.2 on SiFive platforms.\n\nWe look forward to sharing more in our next technical blog!\n\n---\n\n**Webinar**\n\nSiFive hosted a live webinar, **Advanced LLM Optimization and Deployment on RISC-V: Techniques You Need to Know**, on Wednesday, October 16, 2024.\n\nIn it, you'll learn about:\n- The SiFive AI/ML software stack for RISC-V\n- End-to-end deployment of PyTorch Llama models\n- Challenges and solutions in optimizing LLM models\n- Achieving real-time Llama performance with MLIR-based IREE\n\n**English Session**:\n[Watch Webinar: LLM Optimization and Deployment on SiFive Intelligence](https://www.sifive.com/resources/webinar/advanced-llm-optimization-and-deployment-on-risc-v)\n\n**Chinese Session**:\n[Watch Webinar: LLM Optimization and Deployment on SiFive Intelligence](https://www.sifive.cn/resources/webinar/advanced-llm-optimization-and-deployment-on-risc-v)\n\n**References**\n\n[SiFive Intelligence X390](https://www.sifive.com/cores/intelligence-x390)\u003cbr /\u003e\n[IREE](https://iree.dev/)\u003cbr /\u003e\n[SHARK-Turbine](https://github.com/nod-ai/SHARK-Turbine/tree/main)\u003cbr /\u003e\n[Attention Is All You Need](https://arxiv.org/abs/1706.03762)\u003cbr /\u003e\n[LLaMA architecture](https://medium.com/@vi.ai_/exploring-and-building-the-llama-3-architecture-a-deep-dive-into-components-coding-and-43d4097cfbbb)\u003cbr /\u003e\n[LLM Inference Serving: Survey of Recent Advances and 
Opportunities](https://www.researchgate.net/publication/382331612_LLM_Inference_Serving_Survey_of_Recent_Advances_and_Opportunities)\u003cbr /\u003e\n[MLPerf Llama2 70B](https://mlcommons.org/2024/03/mlperf-llama2-70b/)\u003cbr /\u003e\n[MLPerf loadgen](https://github.com/mlcommons/inference/tree/master/loadgen#integration-example-and-flow)\u003cbr /\u003e\n[MLPerf Policies](https://github.com/mlcommons/policies/blob/master/MLPerf_Results_Messaging_Guidelines.adoc#2-use-of-mlperf-benchmark-for-mlcommons-reviewed-and-verified-results)\u003cbr /\u003e\n[SiFive XM series](https://www.sifive.com/cores/intelligence-xm-series)\u003cbr /\u003e\n[Accelerating llama.cpp with the RISC-V Vector Extension](https://cloud-v.co/blog/risc-v-1/accelerating-llama-cpp-with-risc-v-vector-extension-3)\u003cbr /\u003e\n[PerfXLM](https://www.youtube.com/watch?v=tVXejqZCL_c)
Resources","link":{"id":"XRVQaBIAACEAq4Nz","type":"resources_page","tags":[],"lang":"en-us","slug":"resources-page","first_publication_date":"2019-06-27T23:26:02+0000","last_publication_date":"2023-10-17T18:41:54+0000","uid":"resources","link_type":"Document","key":"fdfdfac2-d6bb-436c-b10d-145321c2114a","isBroken":false},"sub_menu":false},{"text":"Videos","link":{"link_type":"Web","key":"6e14d90e-876c-4225-ba8e-12f759bdc231","url":"https://www.sifive.com/resources/videos"},"sub_menu":false},{"text":"Webinars","link":{"link_type":"Web","key":"a80e0105-dd5d-4b63-a2c6-3838bf4c2d54","url":"https://www.sifive.com/resources/webinars"},"sub_menu":false},{"text":"Tech Papers \u0026 Case Studies","link":{"link_type":"Web","key":"f2790302-1681-4309-8658-30b43e64e8d0","url":"https://www.sifive.com/resources/case-studies"},"sub_menu":false}]}},{"id":"XzD_tBAAACAAthYk","uid":null,"url":null,"type":"dropdown","href":"https://sifive.cdn.prismic.io/api/v2/documents/search?ref=Z8H4ZhAAACEAUi4M\u0026q=%5B%5B%3Ad+%3D+at%28document.id%2C+%22XzD_tBAAACAAthYk%22%29+%5D%5D","tags":[],"first_publication_date":"2020-08-17T14:28:06+0000","last_publication_date":"2020-08-17T14:28:06+0000","slugs":["technology-dropdown","technology"],"linked_documents":[],"lang":"en-us","alternate_languages":[],"data":{"title":[{"type":"heading1","text":"Technology Dropdown","spans":[]}],"links":[{"text":"RISC-V","link":{"id":"W476ayYAACcAfi7X","type":"why_page","tags":[],"lang":"en-us","slug":"why-sifive","first_publication_date":"2018-09-04T21:34:38+0000","last_publication_date":"2020-08-17T14:28:06+0000","link_type":"Document","key":"edd093bc-3cf1-40a1-9484-bca725d690e2","isBroken":false},"sub_menu":null},{"text":"Scalable 
Microarchitectures","link":{"id":"XzUQBhAAAB8ANvNd","type":"configurability_via_core_designer_page","tags":[],"lang":"en-us","slug":"configurability-via-core-designer-page","first_publication_date":"2020-08-17T14:28:06+0000","last_publication_date":"2022-10-26T22:50:56+0000","link_type":"Document","key":"804cd551-b694-476e-8b16-3f8eb5bcf5e2","isBroken":false},"sub_menu":null},{"text":"SiFive Release Cadence","link":{"id":"XzUWwRAAAB8ANxF_","type":"federated_releases_page","tags":[],"lang":"en-us","slug":"sifive-release-cadence","first_publication_date":"2020-08-17T14:28:06+0000","last_publication_date":"2020-08-17T14:28:06+0000","link_type":"Document","key":"477bc0fa-6921-45c0-a725-9961a77879d0","isBroken":false},"sub_menu":null},{"text":"SiFive Mix+Match","link":{"id":"XzUfohAAAB8ANzjM","type":"mix_match_page","tags":[],"lang":"en-us","slug":"sifive-mix--match","first_publication_date":"2020-08-17T14:28:06+0000","last_publication_date":"2020-08-17T14:28:06+0000","link_type":"Document","key":"8f8e7cd5-014b-4e82-82ed-75b1061689e5","isBroken":false},"sub_menu":null},{"text":"SiFive Insight Trace and Debug","link":{"id":"XmvKTxIAACIAt6mb","type":"soc_ip","tags":[],"lang":"en-us","slug":"sifive-insight","first_publication_date":"2020-03-17T13:00:05+0000","last_publication_date":"2020-07-22T15:00:03+0000","uid":"sifive-insight","link_type":"Document","key":"6925ca62-d5ff-4f4f-84e3-548a204d5ac8","isBroken":false},"sub_menu":null},{"text":"SiFive Shield SoC 
Security","link":{"id":"Xa-XNhAAAB4An46Y","type":"blog_post","tags":[],"lang":"en-us","slug":"sifive-shield-an-open-scalable-platform-architecture-for-security","first_publication_date":"2019-10-23T20:30:00+0000","last_publication_date":"2019-10-23T20:30:00+0000","uid":"sifive-shield-an-open-scalable-platform-architecture","link_type":"Document","key":"6309f715-979e-4edc-be8b-457b4e127d8c","isBroken":false},"sub_menu":null}]}},{"id":"XzEAGBAAAB8Athfj","uid":null,"url":null,"type":"dropdown","href":"https://sifive.cdn.prismic.io/api/v2/documents/search?ref=Z8H4ZhAAACEAUi4M\u0026q=%5B%5B%3Ad+%3D+at%28document.id%2C+%22XzEAGBAAAB8Athfj%22%29+%5D%5D","tags":[],"first_publication_date":"2020-08-17T14:28:06+0000","last_publication_date":"2020-08-17T14:28:06+0000","slugs":["solutionsmarkets-dropdown","solutions--markets"],"linked_documents":[],"lang":"en-us","alternate_languages":[{"id":"ZJC_ZxEAACcAzVJy","type":"dropdown","lang":"zh-cn"}],"data":{"title":[{"type":"heading1","text":"Solutions/Markets Dropdown","spans":[]}],"links":[{"text":"Embedded","link":{"id":"XzUpGxAAACAAN2KR","type":"embedded_page","tags":[],"lang":"en-us","slug":"embedded","first_publication_date":"2020-08-17T14:28:06+0000","last_publication_date":"2020-08-17T14:28:06+0000","link_type":"Document","key":"84b22f1c-feb7-41ec-8e8e-7dbbbf1d03c4","isBroken":false},"sub_menu":null},{"text":"AR/VR/Sensor Fusion/Consumer 
Device","link":{"id":"XzUpaRAAAB4AN2Ps","type":"ar_vr_page","tags":[],"lang":"en-us","slug":"arvrsensor-fusionconsumer-device","first_publication_date":"2020-08-17T14:28:06+0000","last_publication_date":"2020-08-17T14:28:06+0000","link_type":"Document","key":"a5b9795d-f680-4377-87eb-3af3a83dc426","isBroken":false},"sub_menu":null},{"text":"Networking/Edge","link":{"id":"XzUpjRAAACEAN2SP","type":"network_edge_page","tags":[],"lang":"en-us","slug":"networkingedge","first_publication_date":"2020-08-17T14:28:06+0000","last_publication_date":"2020-08-17T14:28:06+0000","link_type":"Document","key":"0cbc902a-2e06-47c1-a147-3f0e8ffbd367","isBroken":false},"sub_menu":null},{"text":"Storage","link":{"id":"XzO_thAAACEAwjuK","type":"storage_page","tags":[],"lang":"en-us","slug":"storage","first_publication_date":"2020-08-17T14:28:06+0000","last_publication_date":"2020-08-17T14:28:06+0000","link_type":"Document","key":"84f26fc5-5389-474d-922f-88109ffce077","isBroken":false},"sub_menu":null}]}},{"id":"XL-ztRUAACcAOL-T","uid":null,"url":null,"type":"dropdown","href":"https://sifive.cdn.prismic.io/api/v2/documents/search?ref=Z8H4ZhAAACEAUi4M\u0026q=%5B%5B%3Ad+%3D+at%28document.id%2C+%22XL-ztRUAACcAOL-T%22%29+%5D%5D","tags":[],"first_publication_date":"2019-04-25T20:30:54+0000","last_publication_date":"2019-04-25T20:30:54+0000","slugs":["cloud-dropdown"],"linked_documents":[],"lang":"en-us","alternate_languages":[{"id":"ZJC_NBEAACcAzVGB","type":"dropdown","lang":"zh-cn"}],"data":{"title":[{"type":"heading1","text":"Cloud Dropdown","spans":[]}],"links":[{"text":"Core Designer","link":{"id":"W47SSiYAACUAfX3G","type":"core_designer_page","tags":[],"lang":"en-us","slug":"core-designer-page","first_publication_date":"2018-09-04T18:43:26+0000","last_publication_date":"2021-12-27T19:24:55+0000","link_type":"Document","key":"3ca71826-71d0-4439-8543-f85a03bec7d4","isBroken":false},"sub_menu":null},{"text":"Chip 
Designer","link":{"id":"W47r9yYAACQAfe9i","type":"broken_type","tags":[],"lang":null,"slug":"-","first_publication_date":null,"last_publication_date":null,"link_type":"Document","key":"1a0483a4-60f2-4c80-a74a-48b62da1f59b","isBroken":true},"sub_menu":null}]}}],"version":"d4ff394","license":"All Rights Reserved"}},"__N_SSG":true},"page":"/blog/[uid]","query":{"uid":"llm-optimization-and-deployment-on-sifive-intellig"},"buildId":"qSvTi_5dqOuR6wwdFd7qa","isFallback":false,"isExperimentalCompile":false,"gsp":true,"locale":"en-us","locales":["en-us","zh-cn"],"defaultLocale":"en-us","domainLocales":[{"domain":"www.sifive.com","defaultLocale":"en-us"},{"domain":"www.sifive.cn","defaultLocale":"zh-cn"}],"scriptLoader":[{"id":"google-tag-manager","strategy":"afterInteractive","children":"\n (function (w, d, s, l, i) {\n w[l] = w[l] || [];\n w[l].push({ \"gtm.start\": new Date().getTime(), event: \"gtm.js\" });\n var f = d.getElementsByTagName(s)[0],\n j = d.createElement(s),\n dl = l != \"dataLayer\" ? 
\"\u0026l=\" + l : \"\";\n j.async = true;\n j.src = \"https://www.googletagmanager.com/gtm.js?id=\" + i + dl;\n f.parentNode.insertBefore(j, f);\n })(window, document, \"script\", \"dataLayer\", \"GTM-TTH6HTM\");\n "},{"src":"https://www.googletagmanager.com/gtag/js?id=G-41SWBNHLEK","strategy":"afterInteractive"},{"id":"google-analytics","strategy":"afterInteractive","children":"\n window.dataLayer = window.dataLayer || []; \n function gtag(){dataLayer.push(arguments);} \n gtag('js', new Date()); \n gtag('config', 'G-41SWBNHLEK');\n "},{"id":"segment","strategy":"afterInteractive","children":"\n !function(){\n var analytics=window.analytics=window.analytics||[];\n if(!analytics.initialize)if(analytics.invoked)window.console\u0026\u0026console.error\u0026\u0026console.error(\"Segment snippet included twice.\");else{analytics.invoked=!0;analytics.methods=[\"trackSubmit\",\"trackClick\",\"trackLink\",\"trackForm\",\"pageview\",\"identify\",\"reset\",\"group\",\"track\",\"ready\",\"alias\",\"debug\",\"page\",\"once\",\"off\",\"on\"];analytics.factory=function(t){return function(){var e=Array.prototype.slice.call(arguments);e.unshift(t);analytics.push(e);return analytics}};for(var t=0;t\u003canalytics.methods.length;t++){var e=analytics.methods[t];analytics[e]=analytics.factory(e)}analytics.load=function(t,e){var n=document.createElement(\"script\");n.type=\"text/javascript\";n.async=!0;n.src=\"https://cdn.segment.com/analytics.js/v1/\"+t+\"/analytics.min.js\";var a=document.getElementsByTagName(\"script\")[0];a.parentNode.insertBefore(n,a);analytics._loadOptions=e};\n analytics.SNIPPET_VERSION=\"4.1.0\";\n analytics.load(\"sOS7puT996WV5wh0V6QR1AYps6trxHJQ\");\n }}();\n "},{"id":"hs-script-loader","src":"//js.hs-scripts.com/3020607.js","strategy":"afterInteractive"}]}</script></body></html>
