A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks

<!DOCTYPE html> <html lang="en"> <head> <meta content="text/html; charset=utf-8" http-equiv="content-type"/> <title>A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks</title> <!--Generated on Mon Mar 3 14:04:51 2025 by LaTeXML (version 0.8.8) http://dlmf.nist.gov/LaTeXML/.--> <meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/> <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv-fonts.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/latexml_styles.css" rel="stylesheet" type="text/css"/> <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.3.3/html2canvas.min.js"></script> <script src="/static/browse/0.3.4/js/addons_new.js"></script> <script src="/static/browse/0.3.4/js/feedbackOverlay.js"></script> <meta content="BCPNN Neuromorphic FPGA HLS." 
lang="en" name="keywords"/> <base href="/html/2503.01561v1/"/></head> <body> <nav class="ltx_page_navbar"> <nav class="ltx_TOC"> <ol class="ltx_toclist"> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S1" title="In A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">1 </span>Introduction</span></a></li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S2" title="In A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">2 </span>Related Work</span></a></li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S3" title="In A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3 </span>Bayesian Confidence Propagation Neural Network</span></a></li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S4" title="In A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4 </span>High-Performance Stream-Based BCPNN Accelerator</span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"> <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S4.SS1" title="In 4 High-Performance Stream-Based BCPNN Accelerator ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.1 </span>Accelerator Design using HLS</span></a> <ol 
class="ltx_toclist ltx_toclist_subsection"> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S4.SS1.SSS1" title="In 4.1 Accelerator Design using HLS ‣ 4 High-Performance Stream-Based BCPNN Accelerator ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.1.1 </span>Initial Unoptimized Sequential Implementation.</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S4.SS1.SSS2" title="In 4.1 Accelerator Design using HLS ‣ 4 High-Performance Stream-Based BCPNN Accelerator ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.1.2 </span>Optimization #1: Stream-based FIFO data.</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S4.SS1.SSS3" title="In 4.1 Accelerator Design using HLS ‣ 4 High-Performance Stream-Based BCPNN Accelerator ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.1.3 </span>Optimization #2: Dataflow process.</span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_subsection"> <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S4.SS2" title="In 4 High-Performance Stream-Based BCPNN Accelerator ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.2 </span>Performance Analysis</span></a> <ol class="ltx_toclist ltx_toclist_subsection"> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" 
href="https://arxiv.org/html/2503.01561v1#S4.SS2.SSS1" title="In 4.2 Performance Analysis ‣ 4 High-Performance Stream-Based BCPNN Accelerator ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.2.1 </span>Theoretical Performance and Bandwidth.</span></a></li> </ol> </li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S5" title="In A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">5 </span>Experimental Setup</span></a></li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S6" title="In A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">6 </span>Result</span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S6.SS1" title="In 6 Result ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">6.1 </span>Correctness</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S6.SS2" title="In 6 Result ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">6.2 </span>Performance</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S6.SS3" title="In 6 Result ‣ A Reconfigurable Stream-Based FPGA Accelerator for 
Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">6.3 </span>Analysis</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S6.SS4" title="In 6 Result ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">6.4 </span>Resource Consumption</span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S7" title="In A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">7 </span>Conclusion</span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S7.SS0.SSS1" title="In 7 Conclusion ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">7.0.1 </span>Acknowledgements</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsubsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S7.SS0.SSS2" title="In 7 Conclusion ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">7.0.2 </span>Disclosure of Interests</span></a></li> </ol> </li> </ol></nav> </nav> <div class="ltx_page_main"> <div class="ltx_page_content"> <article class="ltx_document ltx_authors_1line"><span class="ltx_note ltx_role_institutetext" id="id1"><sup class="ltx_note_mark">1</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">1</sup><span 
class="ltx_note_type">institutetext: </span>KTH Royal Institute of Technology, Stockholm, Sweden <br class="ltx_break"/></span></span></span><span class="ltx_note ltx_role_institutetext" id="id2"><sup class="ltx_note_mark">2</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">2</sup><span class="ltx_note_type">institutetext: </span>Stockholm University, Stockholm, Sweden <br class="ltx_break"/></span></span></span><span class="ltx_note ltx_role_institutetext" id="id3"><sup class="ltx_note_mark">3</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">3</sup><span class="ltx_note_type">institutetext: </span>Swedish e-Science Research Centre (SeRC), Sweden <br class="ltx_break"/></span></span></span><span class="ltx_note ltx_role_institutetext" id="id4"><sup class="ltx_note_mark">4</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">4</sup><span class="ltx_note_type">institutetext: </span>Digital Futures, Stockholm, Sweden <br class="ltx_break"/></span></span></span><span class="ltx_note ltx_role_institutetext" id="id5"><sup class="ltx_note_mark">5</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">5</sup><span class="ltx_note_type">institutetext: </span>International Research Centre for Neurointelligence, University of Tokyo, Japan <br class="ltx_break"/><span class="ltx_note ltx_role_email" id="id5.1"><sup class="ltx_note_mark">5</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">5</sup><span class="ltx_note_type">email: </span>{miahafiz, nbrav, ala, paherman, podobas}@kth.se</span></span></span></span></span></span> <h1 class="ltx_title ltx_title_document">A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks</h1> <div class="ltx_subtitle">Design, Implementation, and Performance Analysis </div> <div class="ltx_authors"> 
<span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Muhammad Ihsan Al Hafiz </span><span class="ltx_author_notes">1</span></span> <span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Naresh Ravichandran </span><span class="ltx_author_notes">1</span></span> <span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Anders Lansner </span><span class="ltx_author_notes">1,2,3</span></span> <span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Pawel Herman </span><span class="ltx_author_notes">1,3,4,5</span></span> <span class="ltx_author_before">  </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Artur Podobas </span><span class="ltx_author_notes">1,3</span></span> </div> <div class="ltx_abstract"> <h6 class="ltx_title ltx_title_abstract">Abstract</h6> <p class="ltx_p" id="id1.id1">Brain-inspired algorithms are attractive and emerging alternatives to classical deep learning methods for use in various machine learning applications. Brain-inspired systems can feature local learning rules, unsupervised and semi-supervised learning, and different types of plasticity (structural and synaptic), which potentially allows them to be faster and more energy-efficient than traditional machine learning alternatives. Among the more salient brain-inspired algorithms are Bayesian Confidence Propagation Neural Networks (BCPNNs). BCPNN is an important tool for both machine learning and computational neuroscience research, and recent work shows that BCPNN can reach state-of-the-art performance, compared to other models, in tasks such as learning and memory recall. Unfortunately, BCPNN is primarily executed on slow general-purpose processors (CPUs) or power-hungry graphics processing units (GPUs), which limits its applicability in (among others) edge systems. 
In this work, we design a custom stream-based accelerator for BCPNN on Field-Programmable Gate Arrays (FPGAs) using the Xilinx Vitis High-Level Synthesis (HLS) flow. Furthermore, we model our accelerator’s performance from first principles, and we empirically show that our proposed accelerator is 1.3x-5.3x faster than an Nvidia A100 GPU while consuming 2.62x-3.19x less power and 5.8x-16.5x less energy, without any degradation in performance.</p> </div> <div class="ltx_keywords"> <h6 class="ltx_title ltx_title_keywords">Keywords: </h6>BCPNN Neuromorphic FPGA HLS. </div> <section class="ltx_section" id="S1"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">1 </span>Introduction</h2> <div class="ltx_para" id="S1.p1"> <p class="ltx_p" id="S1.p1.1"><span class="ltx_glossaryref" title="">Deep Learning (DL)</span> <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib15" title="">15</a>]</cite> has emerged as one of the most essential machine learning tools of the past decades. <span class="ltx_glossaryref" title="">DL</span> models are used in everything from image recognition <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib2" title="">2</a>]</cite> and time-series prediction <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib16" title="">16</a>]</cite> to natural language processing <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib11" title="">11</a>]</cite>. 
Since their inception around 2012, the size of <span class="ltx_glossaryref" title="">DL</span> systems has been growing at an exponential rate, demanding ever more computational resources and power <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib28" title="">28</a>]</cite>. The latter, energy consumption, in particular has been identified as a challenge to overcome, since training a modern <span class="ltx_glossaryref" title="">DL</span> system can take several months and consume large amounts of energy (ChatGPT-4 consumed approximately 50 million kWh <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib8" title="">8</a>]</cite>). In short, there is a growing need to research alternative machine learning methods that satisfy performance demands without needlessly taxing the environment. One such direction is to draw inspiration from biology and investigate <span class="ltx_text ltx_font_typewriter" id="S1.p1.1.1">brain-inspired</span> systems.</p> </div> <div class="ltx_para" id="S1.p2"> <p class="ltx_p" id="S1.p2.1">A <span class="ltx_text ltx_font_typewriter" id="S1.p2.1.1">brain-inspired</span> system solves machine learning problems using abstractions derived from theories of the brain in computational neuroscience. A brain-inspired system can be either spiking <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib18" title="">18</a>]</cite> (often called a neuromorphic system <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib27" title="">27</a>]</cite>) or rate-based (non-spiking). 
Brain-inspired systems typically have several traits that make them attractive to use: <span class="ltx_text ltx_font_bold" id="S1.p2.1.2">(i)</span> they can be very sparse and energy-efficient, <span class="ltx_text ltx_font_bold" id="S1.p2.1.3">(ii)</span> they have local (non-propagating) learning rules, <span class="ltx_text ltx_font_bold" id="S1.p2.1.4">(iii)</span> they support one- and few-shot learning, and <span class="ltx_text ltx_font_bold" id="S1.p2.1.5">(iv)</span> they can provide insight into how the brain computes. There are multiple brain-inspired machine learning models, but few are as salient, or rest on as mature a theory, as the <span class="ltx_text ltx_font_italic" id="S1.p2.1.6">Bayesian Confidence Propagation Neural Network</span> (BCPNN) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib3" title="">3</a>]</cite>.</p> </div> <div class="ltx_para" id="S1.p3"> <p class="ltx_p" id="S1.p3.1">BCPNN is a biologically plausible model that is derived from the organization of the human cortex <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib19" title="">19</a>]</cite>, where the basic building blocks are hypercolumns and minicolumns. BCPNN supports several forms of learning, including learning of synaptic strengths <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib5" title="">5</a>]</cite> (based on Bayes’ theorem) as well as structural plasticity <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib12" title="">12</a>]</cite> that rewires the connections between building blocks. 
More importantly, BCPNN supports supervised, semi-supervised, and unsupervised learning <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib25" title="">25</a>]</cite>, making it a strong choice for systems with a limited amount of labelled data. While BCPNN has shown state-of-the-art training and inference performance <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib23" title="">23</a>]</cite> on multiple datasets using general-purpose <span class="ltx_glossaryref" title="">Central Processing Unit (CPU)</span> and <span class="ltx_glossaryref" title="">Graphics Processing Unit (GPU)</span> implementations, these processors are typically too expensive (e.g., in terms of power consumption) to deploy on edge computing devices that could leverage the properties of BCPNN.</p> </div> <div class="ltx_para" id="S1.p4"> <p class="ltx_p" id="S1.p4.1">In this work, we propose the first high-performance hardware accelerator for BCPNN. We describe our dataflow accelerator using the Xilinx Vitis <span class="ltx_glossaryref" title="">High Level Synthesis (HLS)</span> <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib20" title="">20</a>]</cite> toolchain and execute it on state-of-the-art Alveo U55C Field-Programmable Gate Arrays (FPGAs). 
We claim the following contributions:</p> </div> <div class="ltx_para" id="S1.p5"> <ol class="ltx_enumerate" id="S1.I1"> <li class="ltx_item" id="S1.I1.i1" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">1.</span> <div class="ltx_para" id="S1.I1.i1.p1"> <p class="ltx_p" id="S1.I1.i1.p1.1">We describe and implement the first high-performance BCPNN FPGA accelerator for use in data centers and edge systems, which supports both inference and online (unsupervised) learning faster than real time.</p> </div> </li> <li class="ltx_item" id="S1.I1.i2" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">2.</span> <div class="ltx_para" id="S1.I1.i2.p1"> <p class="ltx_p" id="S1.I1.i2.p1.1">We apply the BCPNN theory to two new datasets: detecting pneumonia and breast cancer.</p> </div> </li> <li class="ltx_item" id="S1.I1.i3" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">3.</span> <div class="ltx_para" id="S1.I1.i3.p1"> <p class="ltx_p" id="S1.I1.i3.p1.1">We develop an analytical performance model (based on first principles) to provide insight into the performance of our hardware accelerator.</p> </div> </li> <li class="ltx_item" id="S1.I1.i4" style="list-style-type:none;"> <span class="ltx_tag ltx_tag_item">4.</span> <div class="ltx_para" id="S1.I1.i4.p1"> <p class="ltx_p" id="S1.I1.i4.p1.1">We empirically quantify the performance of our accelerator, positioning it against a Tesla-class Nvidia A100 GPU and an Intel Xeon server-class CPU, and show that our FPGA accelerator has an advantage in both performance and power consumption.</p> </div> </li> </ol> </div> </section> <section class="ltx_section" id="S2"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">2 </span>Related Work</h2> <div class="ltx_para" id="S2.p1"> <p class="ltx_p" id="S2.p1.1">BCPNN has a long lineage of research work dating back to the 1980s <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" 
href="https://arxiv.org/html/2503.01561v1#bib.bib13" title="">13</a>]</cite>. Since then, research has extended BCPNN to (among others) drug reaction signal generation <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib3" title="">3</a>]</cite>, pattern recognition <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib21" title="">21</a>]</cite>, a spike-based formulation <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib31" title="">31</a>]</cite>, fixed-point arithmetic <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib9" title="">9</a>]</cite>, and several machine learning applications <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib25" title="">25</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib30" title="">30</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib26" title="">26</a>]</cite>. Motivated by the success and versatility of BCPNN, several groups have proposed hardware accelerators to improve its performance and reduce its energy consumption. In 2020, Yang et al. <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib39" title="">39</a>]</cite> optimized the BCPNN learning rule with respect to memory accesses, showing how non-coalesced column-wise memory access patterns in lazy-based methods can be eliminated, which can result in significant speedups. In 2020, Liu et al. 
<cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib17" title="">17</a>]</cite> implemented a <span class="ltx_glossaryref" title="">Field Programmable Gate Array (FPGA)</span>-based hardware accelerator for a spiking <span class="ltx_glossaryref" title="">Bayesian Confidence Propagation Neural Network (BCPNN)</span> model. This architecture employs a ‘lazy update mode’, efficiently updating eight local synaptic state variables by optimizing parallelism and decomposing calculations based on inherent data dependencies. These optimizations reduce computation and bandwidth by more than two orders of magnitude, which enables an efficient implementation of <span class="ltx_glossaryref" title="">BCPNN</span> as a real-time brain simulation engine <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib17" title="">17</a>]</cite>. This approach led to a substantial acceleration in processing time, with an update time of 110 ns on an <span class="ltx_glossaryref" title="">FPGA</span>, compared to 25,800 ns on a <span class="ltx_glossaryref" title="">CPU</span> <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib17" title="">17</a>]</cite>. In 2021, Podobas et al. introduced StreamBrain <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib22" title="">22</a>]</cite>, a framework that enables the deployment of the rate-based <span class="ltx_glossaryref" title="">BCPNN</span> in High-Performance Computing (HPC) systems. StreamBrain is a domain-specific language (DSL) that supports various backends, including <span class="ltx_glossaryref" title="">CPUs</span>, <span class="ltx_glossaryref" title="">GPUs</span>, and <span class="ltx_glossaryref" title="">FPGAs</span>. 
The authors demonstrate the practical capabilities of StreamBrain by training on the MNIST dataset within seconds and by applying <span class="ltx_glossaryref" title="">BCPNN</span> to a higher-dimensional problem, STL-10. Additionally, the paper explores custom floating-point formats and their impact when using <span class="ltx_glossaryref" title="">FPGAs</span>. However, unlike the present paper, StreamBrain only accelerated a small subset of the BCPNN algorithm on the FPGA platform. Wang et al. <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib33" title="">33</a>]</cite> showed that the BCPNN local learning rule can be mapped onto and executed by an analog memristor model, that the device could reach a correlation coefficient as high as <math alttext="0.98" class="ltx_Math" display="inline" id="S2.p1.1.m1.1"><semantics id="S2.p1.1.m1.1a"><mn id="S2.p1.1.m1.1.1" xref="S2.p1.1.m1.1.1.cmml">0.98</mn><annotation-xml encoding="MathML-Content" id="S2.p1.1.m1.1b"><cn id="S2.p1.1.m1.1.1.cmml" type="float" xref="S2.p1.1.m1.1.1">0.98</cn></annotation-xml><annotation encoding="application/x-tex" id="S2.p1.1.m1.1c">0.98</annotation><annotation encoding="application/x-llamapun" id="S2.p1.1.m1.1d">0.98</annotation></semantics></math>, and that it could learn the MNIST benchmark. Wang et al. <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib32" title="">32</a>]</cite> presented an <span class="ltx_glossaryref" title="">FPGA</span>-based HPC design specifically optimized for a <span class="ltx_glossaryref" title="">BCPNN</span>-based associative memory system. Their approach incorporates several optimizations, including shared parallel computing units, hybrid-precision computing for a hybrid update mechanism, and the globally asynchronous, locally synchronous (GALS) strategy. 
Using the Xilinx Alveo U200 <span class="ltx_glossaryref" title="">FPGA</span> accelerator card, the design achieved a maximum network size of 150x10 and a peak clock frequency of 100 MHz. The <span class="ltx_glossaryref" title="">FPGA</span>-based solution outperformed its Nvidia RTX 4090 counterpart, demonstrating a maximum latency reduction of 33.23x and a power consumption reduction of over 6.9x. The study underscores the potential of <span class="ltx_glossaryref" title="">FPGA</span>-based accelerators to significantly enhance both speed and energy efficiency in neuromorphic computing implementations. However, the scope of their work is limited to a small network size and omits evaluation on real-world datasets. Unlike prior work, which accelerates BCPNN only in part <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib22" title="">22</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib39" title="">39</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib33" title="">33</a>]</cite> or remains at a low technology readiness level (TRL) without real use cases <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib32" title="">32</a>]</cite>, our work is the first to provide a high-performance FPGA accelerator for BCPNN (outperforming an Nvidia A100) that can handle real-life use cases with low latency, encouraging its deployment in data centers and at the edge. 
We are also the first to apply BCPNN to such (more complicated) use cases as detecting pneumonia or breast cancer.</p> </div> </section> <section class="ltx_section" id="S3"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">3 </span>Bayesian Confidence Propagation Neural Network</h2> <div class="ltx_para" id="S3.p1"> <p class="ltx_p" id="S3.p1.1"><span class="ltx_glossaryref" title="">BCPNN</span> is a brain-inspired machine learning model that uses the principles of Bayesian statistics to derive its synaptic and neuronal update operations. It has two formulations: spike-based and rate-based. In this paper, we design a hardware accelerator for the rate-based <span class="ltx_glossaryref" title="">BCPNN</span>. Our design builds on the most recent formulation in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib23" title="">23</a>]</cite>, a feedforward <span class="ltx_glossaryref" title="">BCPNN</span> that integrates cortical columns, divisive normalization, Hebbian synaptic plasticity, structural plasticity, sparse activity, and sparse, patchy connectivity.</p> </div> <div class="ltx_para" id="S3.p2"> <p class="ltx_p" id="S3.p2.1">BCPNN organizes its computational units into <span class="ltx_text ltx_font_italic" id="S3.p2.1.1">minicolumns</span>, which are grouped into larger modules known as <span class="ltx_text ltx_font_italic" id="S3.p2.1.2">hypercolumns</span> <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib23" title="">23</a>]</cite>. Each hypercolumn encodes a particular input attribute, while its constituent minicolumns represent discrete, mutually exclusive values of that attribute. 
This arrangement echoes the columnar structure of the primate neocortex, where functionally similar neurons are grouped vertically, creating a sparse and energy-efficient coding scheme <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib7" title="">7</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib6" title="">6</a>]</cite>.</p> </div> <div class="ltx_para" id="S3.p3"> <p class="ltx_p" id="S3.p3.1">A basic feedforward BCPNN consists of at least two layers: an input layer and a hidden layer. The input layer’s minicolumns capture discrete feature values provided by the data, and the hidden layer’s minicolumns encode internal representations derived from these inputs <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib23" title="">23</a>]</cite>. Connecting these layers are weighted projections that undergo synaptic plasticity, an unsupervised learning mechanism analogous to Hebbian-Bayesian updates. 
This rule adapts the network parameters online, using local statistics of neuronal activities.</p> </div> <div class="ltx_para" id="S3.p4"> <p class="ltx_p" id="S3.p4.5">At the core of BCPNN lie three key probability traces, incrementally updated at each training step: the probability of an input minicolumn being active (<math alttext="p_{i}" class="ltx_Math" display="inline" id="S3.p4.1.m1.1"><semantics id="S3.p4.1.m1.1a"><msub id="S3.p4.1.m1.1.1" xref="S3.p4.1.m1.1.1.cmml"><mi id="S3.p4.1.m1.1.1.2" xref="S3.p4.1.m1.1.1.2.cmml">p</mi><mi id="S3.p4.1.m1.1.1.3" xref="S3.p4.1.m1.1.1.3.cmml">i</mi></msub><annotation-xml encoding="MathML-Content" id="S3.p4.1.m1.1b"><apply id="S3.p4.1.m1.1.1.cmml" xref="S3.p4.1.m1.1.1"><csymbol cd="ambiguous" id="S3.p4.1.m1.1.1.1.cmml" xref="S3.p4.1.m1.1.1">subscript</csymbol><ci id="S3.p4.1.m1.1.1.2.cmml" xref="S3.p4.1.m1.1.1.2">𝑝</ci><ci id="S3.p4.1.m1.1.1.3.cmml" xref="S3.p4.1.m1.1.1.3">𝑖</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.p4.1.m1.1c">p_{i}</annotation><annotation encoding="application/x-llamapun" id="S3.p4.1.m1.1d">italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT</annotation></semantics></math>), the probability of a hidden minicolumn being active (<math alttext="p_{j}" class="ltx_Math" display="inline" id="S3.p4.2.m2.1"><semantics id="S3.p4.2.m2.1a"><msub id="S3.p4.2.m2.1.1" xref="S3.p4.2.m2.1.1.cmml"><mi id="S3.p4.2.m2.1.1.2" xref="S3.p4.2.m2.1.1.2.cmml">p</mi><mi id="S3.p4.2.m2.1.1.3" xref="S3.p4.2.m2.1.1.3.cmml">j</mi></msub><annotation-xml encoding="MathML-Content" id="S3.p4.2.m2.1b"><apply id="S3.p4.2.m2.1.1.cmml" xref="S3.p4.2.m2.1.1"><csymbol cd="ambiguous" id="S3.p4.2.m2.1.1.1.cmml" xref="S3.p4.2.m2.1.1">subscript</csymbol><ci id="S3.p4.2.m2.1.1.2.cmml" xref="S3.p4.2.m2.1.1.2">𝑝</ci><ci id="S3.p4.2.m2.1.1.3.cmml" xref="S3.p4.2.m2.1.1.3">𝑗</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.p4.2.m2.1c">p_{j}</annotation><annotation 
encoding="application/x-llamapun" id="S3.p4.2.m2.1d">italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT</annotation></semantics></math>), and their joint probability (<math alttext="p_{ij}" class="ltx_Math" display="inline" id="S3.p4.3.m3.1"><semantics id="S3.p4.3.m3.1a"><msub id="S3.p4.3.m3.1.1" xref="S3.p4.3.m3.1.1.cmml"><mi id="S3.p4.3.m3.1.1.2" xref="S3.p4.3.m3.1.1.2.cmml">p</mi><mrow id="S3.p4.3.m3.1.1.3" xref="S3.p4.3.m3.1.1.3.cmml"><mi id="S3.p4.3.m3.1.1.3.2" xref="S3.p4.3.m3.1.1.3.2.cmml">i</mi><mo id="S3.p4.3.m3.1.1.3.1" xref="S3.p4.3.m3.1.1.3.1.cmml">⁢</mo><mi id="S3.p4.3.m3.1.1.3.3" xref="S3.p4.3.m3.1.1.3.3.cmml">j</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.p4.3.m3.1b"><apply id="S3.p4.3.m3.1.1.cmml" xref="S3.p4.3.m3.1.1"><csymbol cd="ambiguous" id="S3.p4.3.m3.1.1.1.cmml" xref="S3.p4.3.m3.1.1">subscript</csymbol><ci id="S3.p4.3.m3.1.1.2.cmml" xref="S3.p4.3.m3.1.1.2">𝑝</ci><apply id="S3.p4.3.m3.1.1.3.cmml" xref="S3.p4.3.m3.1.1.3"><times id="S3.p4.3.m3.1.1.3.1.cmml" xref="S3.p4.3.m3.1.1.3.1"></times><ci id="S3.p4.3.m3.1.1.3.2.cmml" xref="S3.p4.3.m3.1.1.3.2">𝑖</ci><ci id="S3.p4.3.m3.1.1.3.3.cmml" xref="S3.p4.3.m3.1.1.3.3">𝑗</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.p4.3.m3.1c">p_{ij}</annotation><annotation encoding="application/x-llamapun" id="S3.p4.3.m3.1d">italic_p start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT</annotation></semantics></math>). 
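</p>
<p class="ltx_p">The text does not spell out how the traces are incrementally updated. The C++ sketch below assumes the exponential-moving-average form used in several BCPNN formulations, with a hypothetical learning rate <code>alpha</code>; <code>Traces</code> and <code>update_traces</code> are illustrative names, not the accelerator’s interface.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Probability traces of one input-hidden projection (hypothetical layout).
struct Traces {
    std::size_t n_in, n_hid;
    std::vector<float> pi;   // p_i: input minicolumn activity
    std::vector<float> pj;   // p_j: hidden minicolumn activity
    std::vector<float> pij;  // p_ij: joint activity, row-major [i * n_hid + j]

    Traces(std::size_t ni, std::size_t nh, float init_i, float init_j)
        : n_in(ni), n_hid(nh), pi(ni, init_i), pj(nh, init_j),
          pij(ni * nh, init_i * init_j) {}
};

// One training step. ASSUMPTION: each trace is an exponential moving average,
// p <- p + alpha * (observation - p), with a hypothetical learning rate alpha.
// x and y are the current input and hidden activities, each in [0, 1].
void update_traces(Traces& t, const std::vector<float>& x,
                   const std::vector<float>& y, float alpha) {
    assert(x.size() == t.n_in && y.size() == t.n_hid);
    assert(alpha > 0.0f && alpha <= 1.0f);
    for (std::size_t i = 0; i < t.n_in; ++i) t.pi[i] += alpha * (x[i] - t.pi[i]);
    for (std::size_t j = 0; j < t.n_hid; ++j) t.pj[j] += alpha * (y[j] - t.pj[j]);
    for (std::size_t i = 0; i < t.n_in; ++i) {
        for (std::size_t j = 0; j < t.n_hid; ++j) {
            float& p = t.pij[i * t.n_hid + j];
            p += alpha * (x[i] * y[j] - p);  // co-activation drives the joint trace
        }
    }
}
```

<p class="ltx_p">For example, starting from single traces of 0.5 (joint traces of 0.25), one step with <code>alpha = 0.5</code>, input activities <code>{1, 0}</code>, and hidden activities <code>{0, 1}</code> raises <code>pi[0]</code> to 0.75 and the joint trace of the co-active pair (i = 0, j = 1) to 0.625.</p>
<p class="ltx_p">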
These traces support a learning rule where biases (<math alttext="b_{j}" class="ltx_Math" display="inline" id="S3.p4.4.m4.1"><semantics id="S3.p4.4.m4.1a"><msub id="S3.p4.4.m4.1.1" xref="S3.p4.4.m4.1.1.cmml"><mi id="S3.p4.4.m4.1.1.2" xref="S3.p4.4.m4.1.1.2.cmml">b</mi><mi id="S3.p4.4.m4.1.1.3" xref="S3.p4.4.m4.1.1.3.cmml">j</mi></msub><annotation-xml encoding="MathML-Content" id="S3.p4.4.m4.1b"><apply id="S3.p4.4.m4.1.1.cmml" xref="S3.p4.4.m4.1.1"><csymbol cd="ambiguous" id="S3.p4.4.m4.1.1.1.cmml" xref="S3.p4.4.m4.1.1">subscript</csymbol><ci id="S3.p4.4.m4.1.1.2.cmml" xref="S3.p4.4.m4.1.1.2">𝑏</ci><ci id="S3.p4.4.m4.1.1.3.cmml" xref="S3.p4.4.m4.1.1.3">𝑗</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.p4.4.m4.1c">b_{j}</annotation><annotation encoding="application/x-llamapun" id="S3.p4.4.m4.1d">italic_b start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT</annotation></semantics></math>) and connection weights (<math alttext="w_{ij}" class="ltx_Math" display="inline" id="S3.p4.5.m5.1"><semantics id="S3.p4.5.m5.1a"><msub id="S3.p4.5.m5.1.1" xref="S3.p4.5.m5.1.1.cmml"><mi id="S3.p4.5.m5.1.1.2" xref="S3.p4.5.m5.1.1.2.cmml">w</mi><mrow id="S3.p4.5.m5.1.1.3" xref="S3.p4.5.m5.1.1.3.cmml"><mi id="S3.p4.5.m5.1.1.3.2" xref="S3.p4.5.m5.1.1.3.2.cmml">i</mi><mo id="S3.p4.5.m5.1.1.3.1" xref="S3.p4.5.m5.1.1.3.1.cmml">⁢</mo><mi id="S3.p4.5.m5.1.1.3.3" xref="S3.p4.5.m5.1.1.3.3.cmml">j</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.p4.5.m5.1b"><apply id="S3.p4.5.m5.1.1.cmml" xref="S3.p4.5.m5.1.1"><csymbol cd="ambiguous" id="S3.p4.5.m5.1.1.1.cmml" xref="S3.p4.5.m5.1.1">subscript</csymbol><ci id="S3.p4.5.m5.1.1.2.cmml" xref="S3.p4.5.m5.1.1.2">𝑤</ci><apply id="S3.p4.5.m5.1.1.3.cmml" xref="S3.p4.5.m5.1.1.3"><times id="S3.p4.5.m5.1.1.3.1.cmml" xref="S3.p4.5.m5.1.1.3.1"></times><ci id="S3.p4.5.m5.1.1.3.2.cmml" xref="S3.p4.5.m5.1.1.3.2">𝑖</ci><ci id="S3.p4.5.m5.1.1.3.3.cmml" 
xref="S3.p4.5.m5.1.1.3.3">𝑗</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.p4.5.m5.1c">w_{ij}</annotation><annotation encoding="application/x-llamapun" id="S3.p4.5.m5.1d">italic_w start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT</annotation></semantics></math>) are computed as logarithms of probabilities and probability ratios:</p> </div> <div class="ltx_para" id="S3.p5"> <table class="ltx_equation ltx_eqn_table" id="S3.E1"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="b_{j}=\log p_{j},\quad w_{ij}=\log\frac{p_{ij}}{p_{i}p_{j}}." class="ltx_Math" display="block" id="S3.E1.m1.1"><semantics id="S3.E1.m1.1a"><mrow id="S3.E1.m1.1.1.1"><mrow id="S3.E1.m1.1.1.1.1.2" xref="S3.E1.m1.1.1.1.1.3.cmml"><mrow id="S3.E1.m1.1.1.1.1.1.1" xref="S3.E1.m1.1.1.1.1.1.1.cmml"><msub id="S3.E1.m1.1.1.1.1.1.1.2" xref="S3.E1.m1.1.1.1.1.1.1.2.cmml"><mi id="S3.E1.m1.1.1.1.1.1.1.2.2" xref="S3.E1.m1.1.1.1.1.1.1.2.2.cmml">b</mi><mi id="S3.E1.m1.1.1.1.1.1.1.2.3" xref="S3.E1.m1.1.1.1.1.1.1.2.3.cmml">j</mi></msub><mo id="S3.E1.m1.1.1.1.1.1.1.1" xref="S3.E1.m1.1.1.1.1.1.1.1.cmml">=</mo><mrow id="S3.E1.m1.1.1.1.1.1.1.3" xref="S3.E1.m1.1.1.1.1.1.1.3.cmml"><mi id="S3.E1.m1.1.1.1.1.1.1.3.1" xref="S3.E1.m1.1.1.1.1.1.1.3.1.cmml">log</mi><mo id="S3.E1.m1.1.1.1.1.1.1.3a" lspace="0.167em" xref="S3.E1.m1.1.1.1.1.1.1.3.cmml">⁡</mo><msub id="S3.E1.m1.1.1.1.1.1.1.3.2" xref="S3.E1.m1.1.1.1.1.1.1.3.2.cmml"><mi id="S3.E1.m1.1.1.1.1.1.1.3.2.2" xref="S3.E1.m1.1.1.1.1.1.1.3.2.2.cmml">p</mi><mi id="S3.E1.m1.1.1.1.1.1.1.3.2.3" xref="S3.E1.m1.1.1.1.1.1.1.3.2.3.cmml">j</mi></msub></mrow></mrow><mo id="S3.E1.m1.1.1.1.1.2.3" rspace="1.167em" xref="S3.E1.m1.1.1.1.1.3a.cmml">,</mo><mrow id="S3.E1.m1.1.1.1.1.2.2" xref="S3.E1.m1.1.1.1.1.2.2.cmml"><msub id="S3.E1.m1.1.1.1.1.2.2.2" xref="S3.E1.m1.1.1.1.1.2.2.2.cmml"><mi id="S3.E1.m1.1.1.1.1.2.2.2.2" 
xref="S3.E1.m1.1.1.1.1.2.2.2.2.cmml">w</mi><mrow id="S3.E1.m1.1.1.1.1.2.2.2.3" xref="S3.E1.m1.1.1.1.1.2.2.2.3.cmml"><mi id="S3.E1.m1.1.1.1.1.2.2.2.3.2" xref="S3.E1.m1.1.1.1.1.2.2.2.3.2.cmml">i</mi><mo id="S3.E1.m1.1.1.1.1.2.2.2.3.1" xref="S3.E1.m1.1.1.1.1.2.2.2.3.1.cmml">⁢</mo><mi id="S3.E1.m1.1.1.1.1.2.2.2.3.3" xref="S3.E1.m1.1.1.1.1.2.2.2.3.3.cmml">j</mi></mrow></msub><mo id="S3.E1.m1.1.1.1.1.2.2.1" xref="S3.E1.m1.1.1.1.1.2.2.1.cmml">=</mo><mrow id="S3.E1.m1.1.1.1.1.2.2.3" xref="S3.E1.m1.1.1.1.1.2.2.3.cmml"><mi id="S3.E1.m1.1.1.1.1.2.2.3.1" xref="S3.E1.m1.1.1.1.1.2.2.3.1.cmml">log</mi><mo id="S3.E1.m1.1.1.1.1.2.2.3a" lspace="0.167em" xref="S3.E1.m1.1.1.1.1.2.2.3.cmml">⁡</mo><mfrac id="S3.E1.m1.1.1.1.1.2.2.3.2" xref="S3.E1.m1.1.1.1.1.2.2.3.2.cmml"><msub id="S3.E1.m1.1.1.1.1.2.2.3.2.2" xref="S3.E1.m1.1.1.1.1.2.2.3.2.2.cmml"><mi id="S3.E1.m1.1.1.1.1.2.2.3.2.2.2" xref="S3.E1.m1.1.1.1.1.2.2.3.2.2.2.cmml">p</mi><mrow id="S3.E1.m1.1.1.1.1.2.2.3.2.2.3" xref="S3.E1.m1.1.1.1.1.2.2.3.2.2.3.cmml"><mi id="S3.E1.m1.1.1.1.1.2.2.3.2.2.3.2" xref="S3.E1.m1.1.1.1.1.2.2.3.2.2.3.2.cmml">i</mi><mo id="S3.E1.m1.1.1.1.1.2.2.3.2.2.3.1" xref="S3.E1.m1.1.1.1.1.2.2.3.2.2.3.1.cmml">⁢</mo><mi id="S3.E1.m1.1.1.1.1.2.2.3.2.2.3.3" xref="S3.E1.m1.1.1.1.1.2.2.3.2.2.3.3.cmml">j</mi></mrow></msub><mrow id="S3.E1.m1.1.1.1.1.2.2.3.2.3" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.cmml"><msub id="S3.E1.m1.1.1.1.1.2.2.3.2.3.2" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.2.cmml"><mi id="S3.E1.m1.1.1.1.1.2.2.3.2.3.2.2" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.2.2.cmml">p</mi><mi id="S3.E1.m1.1.1.1.1.2.2.3.2.3.2.3" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.2.3.cmml">i</mi></msub><mo id="S3.E1.m1.1.1.1.1.2.2.3.2.3.1" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.1.cmml">⁢</mo><msub id="S3.E1.m1.1.1.1.1.2.2.3.2.3.3" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.3.cmml"><mi id="S3.E1.m1.1.1.1.1.2.2.3.2.3.3.2" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.3.2.cmml">p</mi><mi id="S3.E1.m1.1.1.1.1.2.2.3.2.3.3.3" 
xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.3.3.cmml">j</mi></msub></mrow></mfrac></mrow></mrow></mrow><mo id="S3.E1.m1.1.1.1.2" lspace="0em">.</mo></mrow><annotation-xml encoding="MathML-Content" id="S3.E1.m1.1b"><apply id="S3.E1.m1.1.1.1.1.3.cmml" xref="S3.E1.m1.1.1.1.1.2"><csymbol cd="ambiguous" id="S3.E1.m1.1.1.1.1.3a.cmml" xref="S3.E1.m1.1.1.1.1.2.3">formulae-sequence</csymbol><apply id="S3.E1.m1.1.1.1.1.1.1.cmml" xref="S3.E1.m1.1.1.1.1.1.1"><eq id="S3.E1.m1.1.1.1.1.1.1.1.cmml" xref="S3.E1.m1.1.1.1.1.1.1.1"></eq><apply id="S3.E1.m1.1.1.1.1.1.1.2.cmml" xref="S3.E1.m1.1.1.1.1.1.1.2"><csymbol cd="ambiguous" id="S3.E1.m1.1.1.1.1.1.1.2.1.cmml" xref="S3.E1.m1.1.1.1.1.1.1.2">subscript</csymbol><ci id="S3.E1.m1.1.1.1.1.1.1.2.2.cmml" xref="S3.E1.m1.1.1.1.1.1.1.2.2">𝑏</ci><ci id="S3.E1.m1.1.1.1.1.1.1.2.3.cmml" xref="S3.E1.m1.1.1.1.1.1.1.2.3">𝑗</ci></apply><apply id="S3.E1.m1.1.1.1.1.1.1.3.cmml" xref="S3.E1.m1.1.1.1.1.1.1.3"><log id="S3.E1.m1.1.1.1.1.1.1.3.1.cmml" xref="S3.E1.m1.1.1.1.1.1.1.3.1"></log><apply id="S3.E1.m1.1.1.1.1.1.1.3.2.cmml" xref="S3.E1.m1.1.1.1.1.1.1.3.2"><csymbol cd="ambiguous" id="S3.E1.m1.1.1.1.1.1.1.3.2.1.cmml" xref="S3.E1.m1.1.1.1.1.1.1.3.2">subscript</csymbol><ci id="S3.E1.m1.1.1.1.1.1.1.3.2.2.cmml" xref="S3.E1.m1.1.1.1.1.1.1.3.2.2">𝑝</ci><ci id="S3.E1.m1.1.1.1.1.1.1.3.2.3.cmml" xref="S3.E1.m1.1.1.1.1.1.1.3.2.3">𝑗</ci></apply></apply></apply><apply id="S3.E1.m1.1.1.1.1.2.2.cmml" xref="S3.E1.m1.1.1.1.1.2.2"><eq id="S3.E1.m1.1.1.1.1.2.2.1.cmml" xref="S3.E1.m1.1.1.1.1.2.2.1"></eq><apply id="S3.E1.m1.1.1.1.1.2.2.2.cmml" xref="S3.E1.m1.1.1.1.1.2.2.2"><csymbol cd="ambiguous" id="S3.E1.m1.1.1.1.1.2.2.2.1.cmml" xref="S3.E1.m1.1.1.1.1.2.2.2">subscript</csymbol><ci id="S3.E1.m1.1.1.1.1.2.2.2.2.cmml" xref="S3.E1.m1.1.1.1.1.2.2.2.2">𝑤</ci><apply id="S3.E1.m1.1.1.1.1.2.2.2.3.cmml" xref="S3.E1.m1.1.1.1.1.2.2.2.3"><times id="S3.E1.m1.1.1.1.1.2.2.2.3.1.cmml" xref="S3.E1.m1.1.1.1.1.2.2.2.3.1"></times><ci id="S3.E1.m1.1.1.1.1.2.2.2.3.2.cmml" 
xref="S3.E1.m1.1.1.1.1.2.2.2.3.2">𝑖</ci><ci id="S3.E1.m1.1.1.1.1.2.2.2.3.3.cmml" xref="S3.E1.m1.1.1.1.1.2.2.2.3.3">𝑗</ci></apply></apply><apply id="S3.E1.m1.1.1.1.1.2.2.3.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3"><log id="S3.E1.m1.1.1.1.1.2.2.3.1.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.1"></log><apply id="S3.E1.m1.1.1.1.1.2.2.3.2.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2"><divide id="S3.E1.m1.1.1.1.1.2.2.3.2.1.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2"></divide><apply id="S3.E1.m1.1.1.1.1.2.2.3.2.2.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.2"><csymbol cd="ambiguous" id="S3.E1.m1.1.1.1.1.2.2.3.2.2.1.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.2">subscript</csymbol><ci id="S3.E1.m1.1.1.1.1.2.2.3.2.2.2.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.2.2">𝑝</ci><apply id="S3.E1.m1.1.1.1.1.2.2.3.2.2.3.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.2.3"><times id="S3.E1.m1.1.1.1.1.2.2.3.2.2.3.1.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.2.3.1"></times><ci id="S3.E1.m1.1.1.1.1.2.2.3.2.2.3.2.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.2.3.2">𝑖</ci><ci id="S3.E1.m1.1.1.1.1.2.2.3.2.2.3.3.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.2.3.3">𝑗</ci></apply></apply><apply id="S3.E1.m1.1.1.1.1.2.2.3.2.3.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3"><times id="S3.E1.m1.1.1.1.1.2.2.3.2.3.1.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.1"></times><apply id="S3.E1.m1.1.1.1.1.2.2.3.2.3.2.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.2"><csymbol cd="ambiguous" id="S3.E1.m1.1.1.1.1.2.2.3.2.3.2.1.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.2">subscript</csymbol><ci id="S3.E1.m1.1.1.1.1.2.2.3.2.3.2.2.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.2.2">𝑝</ci><ci id="S3.E1.m1.1.1.1.1.2.2.3.2.3.2.3.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.2.3">𝑖</ci></apply><apply id="S3.E1.m1.1.1.1.1.2.2.3.2.3.3.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.3"><csymbol cd="ambiguous" id="S3.E1.m1.1.1.1.1.2.2.3.2.3.3.1.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.3">subscript</csymbol><ci id="S3.E1.m1.1.1.1.1.2.2.3.2.3.3.2.cmml" xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.3.2">𝑝</ci><ci id="S3.E1.m1.1.1.1.1.2.2.3.2.3.3.3.cmml" 
xref="S3.E1.m1.1.1.1.1.2.2.3.2.3.3.3">𝑗</ci></apply></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.E1.m1.1c">b_{j}=\log p_{j},\quad w_{ij}=\log\frac{p_{ij}}{p_{i}p_{j}}.</annotation><annotation encoding="application/x-llamapun" id="S3.E1.m1.1d">italic_b start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = roman_log italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_w start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = roman_log divide start_ARG italic_p start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT end_ARG start_ARG italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG .</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(1)</span></td> </tr></tbody> </table> </div> <div class="ltx_para" id="S3.p6"> <p class="ltx_p" id="S3.p6.1">This formulation expresses the hidden unit’s bias as its log-prior, i.e., the negative of its self-information, and the synaptic weight as the pointwise mutual information between pre- and post-synaptic activities: the weight is positive when the two units are co-active more often than expected under independence, zero when they are independent, and negative when they co-occur less often than expected. Conceptually, it transforms observed co-occurrences of events into updated parameters that influence network activity. The activation of each hidden minicolumn is determined by a softmax function applied to support values computed from the bias and the weighted input signals. Because the softmax is normalized within each hypercolumn, minicolumns in the same hypercolumn compete, and their activations form a probability distribution over that hypercolumn’s minicolumns. Consequently, a BCPNN hypercolumn provides a discrete probability estimate that closely resembles cortical microcircuit behaviour, where excitatory and inhibitory interactions lead to sparse, distributed coding patterns. 
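</p>
<p class="ltx_p">Equation (1) and the hypercolumn softmax can be sketched in C++ as follows, assuming single-precision traces, a row-major weight layout, and hidden minicolumns stored hypercolumn by hypercolumn. The small constant <code>kEps</code>, which keeps the logarithms finite, and all function names are illustrative assumptions rather than the accelerator’s code.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Small constant that keeps log() finite when a trace is zero.
// ASSUMPTION: the value and the eps / eps^2 form are illustrative only.
constexpr float kEps = 1e-6f;

// Eq. (1): b_j = log p_j
float bias_from_trace(float pj) { return std::log(pj + kEps); }

// Eq. (1): w_ij = log(p_ij / (p_i p_j))
float weight_from_traces(float pij, float pi, float pj) {
    return std::log((pij + kEps * kEps) / ((pi + kEps) * (pj + kEps)));
}

// Support s_j = b_j + sum_i w_ij x_i, followed by a softmax restricted to each
// hypercolumn of `mc_per_hc` consecutive minicolumns, so the minicolumns of a
// hypercolumn compete and their activations sum to one.
std::vector<float> hidden_activation(const std::vector<float>& x,
                                     const std::vector<float>& w,  // [i * n_hid + j]
                                     const std::vector<float>& b,
                                     std::size_t mc_per_hc) {
    const std::size_t n_in = x.size(), n_hid = b.size();
    assert(mc_per_hc > 0 && n_hid % mc_per_hc == 0 && w.size() == n_in * n_hid);
    std::vector<float> s(b);  // start from the biases
    for (std::size_t i = 0; i < n_in; ++i)
        for (std::size_t j = 0; j < n_hid; ++j) s[j] += w[i * n_hid + j] * x[i];

    std::vector<float> y(n_hid);
    for (std::size_t h = 0; h < n_hid; h += mc_per_hc) {
        float m = s[h];  // hypercolumn maximum, subtracted for numerical stability
        for (std::size_t k = 1; k < mc_per_hc; ++k) m = std::fmax(m, s[h + k]);
        float sum = 0.0f;
        for (std::size_t k = 0; k < mc_per_hc; ++k) {
            y[h + k] = std::exp(s[h + k] - m);
            sum += y[h + k];
        }
        for (std::size_t k = 0; k < mc_per_hc; ++k) y[h + k] /= sum;
    }
    return y;
}
```

<p class="ltx_p">For independent units (p<sub>ij</sub> = p<sub>i</sub>p<sub>j</sub>) the computed weight is close to zero, and the per-hypercolumn normalization guarantees that the activations within each hypercolumn sum to one.</p>
<p class="ltx_p">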
Finally, and importantly, BCPNN also supports structural plasticity, in which the network’s connectivity itself changes over time, complementing the synaptic learning rule described above.</p> </div> <div class="ltx_para" id="S3.p7"> <p class="ltx_p" id="S3.p7.1">In short, BCPNN integrates neuroscientific principles (cortical microcircuitry, local learning, and probabilistic coding) into a neural computation framework. It encodes probability distributions directly within its weights and biases, learns online from streaming data, and yields a compact, high-level representation of complex inputs.</p> </div> </section> <section class="ltx_section" id="S4"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">4 </span>High-Performance Stream-Based BCPNN Accelerator</h2> <div class="ltx_para" id="S4.p1"> <p class="ltx_p" id="S4.p1.1">Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S4.F1" title="Figure 1 ‣ 4 High-Performance Stream-Based BCPNN Accelerator ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_tag">1</span></a> illustrates our complete development workflow. We start with C simulation to verify functional correctness, then run C synthesis to obtain a preliminary estimate of hardware resources. Next, we perform C/<span class="ltx_glossaryref" title="">Register Transfer Level (RTL)</span> cosimulation to finalize <span class="ltx_glossaryref" title="">First In First Out (FIFO)</span> depths and confirm that no deadlocks occur. If the design exceeds the available resources, we adjust model sizes or parameters before moving on to <span class="ltx_glossaryref" title="">RTL</span> synthesis for a more accurate assessment of hardware utilization. Once the design meets our resource and timing requirements, we transition to the Vitis development flow. 
This process packages the <span class="ltx_glossaryref" title="">RTL</span> as a kernel, links it with an extensible platform, performs synthesis and implementation, and generates the <span class="ltx_glossaryref" title="">FPGA</span> bitstream. By leveraging the Vitis flow, we can concentrate on optimizing the <span class="ltx_glossaryref" title="">BCPNN</span> kernel, since low-level tasks such as PCIe and DMA configuration are handled automatically.</p> </div> <figure class="ltx_figure" id="S4.F1"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="132" id="S4.F1.g1" src="extracted/6222606/workflow.drawio.png" width="354"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 1: </span>Design workflow</figcaption> </figure> <section class="ltx_subsection" id="S4.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.1 </span>Accelerator Design using HLS</h3> <div class="ltx_para" id="S4.SS1.p1"> <p class="ltx_p" id="S4.SS1.p1.1">The <span class="ltx_glossaryref" title="">BCPNN</span> kernel comprises three interconnected population layers: input, hidden, and output. Each population layer is a group of neurons that encodes and processes probabilistic relationships. The layers communicate through projections, where a projection is the set of connections through which information is transmitted from one population of neurons to another: the input-hidden projection connects the input population to the hidden population, and the hidden-output projection connects the hidden population to the output population. To simplify <span class="ltx_glossaryref" title="">FPGA</span> optimization, we set key dimensions (e.g., hidden layer sizes) to powers of two or multiples of four. 
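</p>
<p class="ltx_p">One concrete benefit of power-of-two sizes: under cyclic array partitioning with factor F, element j sits in bank j mod F at depth j / F, and when F is a power of two these reduce to a bit mask and a shift, so the address logic needs no divider. The C++ sketch below checks this at compile time and at run time; the sizes <code>kHidden = 128</code> and <code>kFactor = 4</code> are hypothetical values, not the paper’s configuration.</p>

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kHidden = 128;  // hypothetical hidden-layer length
constexpr std::size_t kFactor = 4;    // hypothetical cyclic partition / unroll factor

constexpr bool is_pow2(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }
static_assert(is_pow2(kHidden) && is_pow2(kFactor), "sizes chosen as powers of two");
static_assert(kHidden % kFactor == 0, "every bank holds the same number of elements");

constexpr unsigned log2_exact(std::size_t n) { return n <= 1 ? 0 : 1 + log2_exact(n >> 1); }
constexpr std::size_t kShift = log2_exact(kFactor);

// Cyclic partitioning: element j is stored in bank (j mod F) at depth (j / F).
struct BankAddr {
    std::size_t bank, offset;
};

// For power-of-two F, mod and divide become a bit mask and a shift.
constexpr BankAddr cyclic_addr(std::size_t j) {
    return BankAddr{j & (kFactor - 1), j >> kShift};
}

// An unrolled group of kFactor consecutive elements touches every bank exactly
// once, so all kFactor accesses can be served in the same cycle.
bool group_is_conflict_free(std::size_t base) {
    assert(base % kFactor == 0 && base + kFactor <= kHidden);
    bool used[kFactor] = {};
    for (std::size_t u = 0; u < kFactor; ++u) {
        const BankAddr a = cyclic_addr(base + u);
        if (used[a.bank]) return false;
        used[a.bank] = true;
    }
    return true;
}
```

<p class="ltx_p">With these values, element 13 maps to bank 1 at depth 3, exactly as 13 mod 4 and 13 / 4 would give.</p>
<p class="ltx_p">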
This choice simplifies loop unrolling and array partitioning during <span class="ltx_glossaryref" title="">HLS</span>, because the unroll and partition factors then divide the array lengths evenly.</p> </div> <figure class="ltx_figure" id="S4.F2"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="223" id="S4.F2.g1" src="extracted/6222606/GeneralBlockDiagram.png" width="393"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 2: </span>Block diagram of the host-to-FPGA connection</figcaption> </figure> <div class="ltx_para" id="S4.SS1.p2"> <p class="ltx_p" id="S4.SS1.p2.1">Building on these structural decisions, we designed the <span class="ltx_glossaryref" title="">BCPNN</span> kernel as a stream-based, data-driven architecture, as shown in Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S4.F2" title="Figure 2 ‣ 4.1 Accelerator Design using HLS ‣ 4 High-Performance Stream-Based BCPNN Accelerator ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_tag">2</span></a>. Starting from a C/C++ specification, the <span class="ltx_glossaryref" title="">HLS</span> flow generates <span class="ltx_glossaryref" title="">RTL</span> for both unit activations and synaptic plasticity, the most computationally demanding parts of the model. Although some routines depend on each other’s outputs and must therefore execute sequentially, operations on separate populations and projections are independent. This independence enables parallelization through multiple streaming pipelines.</p> </div> <div class="ltx_para" id="S4.SS1.p3"> <p class="ltx_p" id="S4.SS1.p3.1">The complete kernel supports unsupervised, supervised, and inference modes, each with or without structural plasticity. Because every mode reuses the same streaming pipeline, one might expect similar execution times across modes; the inference-only design is the key exception. 
Inference does not require synaptic plasticity updates (weights, biases, and activity probabilities remain fixed), which reduces <span class="ltx_glossaryref" title="">Block Random Access Memory (BRAM)</span> usage and allows higher clock frequencies. This inference-specific configuration is particularly beneficial for energy-sensitive edge deployments. The final kernel design exploits parallel streaming and a specialized inference-only configuration, but this level of efficiency and resource utilization was not achieved in a single step. Development began with a straightforward sequential implementation; this baseline allowed us to identify the computation and memory-access bottlenecks that motivated the optimization strategies described below.</p> </div> <section class="ltx_subsubsection" id="S4.SS1.SSS1"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection">4.1.1 </span>Initial Unoptimized Sequential Implementation. </h4> <div class="ltx_para" id="S4.SS1.SSS1.p1"> <p class="ltx_p" id="S4.SS1.SSS1.p1.1">As illustrated in Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S4.F3" title="Figure 3 ‣ 4.1.1 Initial Unoptimized Sequential Implementation. ‣ 4.1 Accelerator Design using HLS ‣ 4 High-Performance Stream-Based BCPNN Accelerator ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_tag">3</span></a>, our initial implementation processed each subtask sequentially. This approach wasted resources because the hardware allocated to the other steps sat idle while the current step executed. It also made data handling difficult: storing all data on-chip consumed an excessive amount of <span class="ltx_glossaryref" title="">BRAM</span> and led to routing congestion, whereas relying on off-chip memory caused significant latency overhead. 
Recognizing these inefficiencies, we pursued several optimization techniques to enable parallelism, reduce memory overhead, and improve overall throughput.</p> </div> <figure class="ltx_figure" id="S4.F3"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="182" id="S4.F3.g1" src="extracted/6222606/dataflowillustration.png" width="354"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 3: </span>Optimization from a sequential process to a stream-based dataflow design</figcaption> </figure> </section> <section class="ltx_subsubsection" id="S4.SS1.SSS2"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection">4.1.2 </span>Optimization #1: Stream-based FIFO data. </h4> <div class="ltx_para" id="S4.SS1.SSS2.p1"> <p class="ltx_p" id="S4.SS1.SSS2.p1.1">The first step was to adopt a stream-based data transfer model, in which data elements are packaged into fixed-size segments and forwarded continuously through <span class="ltx_glossaryref" title="">FIFOs</span>. Rather than storing static arrays in <span class="ltx_glossaryref" title="">BRAM</span>, we defined <span class="ltx_glossaryref" title="">FIFO</span> channels with a fixed depth to control data flow dynamically. We found that this approach reduces on-chip memory usage, mitigates routing complexity, and provides a foundation for task-level parallelism. However, streams alone are insufficient: the stages still execute one after another, so the sequential execution pattern must also be broken.</p> </div> </section> <section class="ltx_subsubsection" id="S4.SS1.SSS3"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection">4.1.3 </span>Optimization #2: Dataflow process. 
</h4> <div class="ltx_para" id="S4.SS1.SSS3.p1"> <p class="ltx_p" id="S4.SS1.SSS3.p1.1">Dataflow directives in <span class="ltx_glossaryref" title="">HLS</span> enable task-level pipelining: sub-tasks connected only by forward, producer-to-consumer channels can overlap in time instead of running strictly one after another. As shown in Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S4.F3" title="Figure 3 ‣ 4.1.1 Initial Unoptimized Sequential Implementation. ‣ 4.1 Accelerator Design using HLS ‣ 4 High-Performance Stream-Based BCPNN Accelerator ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_tag">3</span></a>, each stage of the computation can begin processing as soon as partial data is ready, passing intermediate results through <span class="ltx_glossaryref" title="">FIFO</span> streams. This setup lets independent operations, such as those on different populations and projections, proceed in parallel, significantly increasing throughput. Combined with stream-based <span class="ltx_glossaryref" title="">FIFOs</span>, dataflow provides backpressure for synchronization: a process stalls when it writes to a full <span class="ltx_glossaryref" title="">FIFO</span> or reads from an empty one. Certain operations, such as the softmax computation that updates activity levels, must wait until all relevant data has arrived. To avoid deadlocks and ensure that every stage receives the data it needs, we size the <span class="ltx_glossaryref" title="">FIFO</span> depths carefully. 
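</p>
<p class="ltx_p">The softmax case can be made concrete with a minimal three-process sketch written in plain C++, so that it compiles outside Vitis HLS. <code>Stream</code> is a hypothetical stand-in for <code>hls::stream&lt;float&gt;</code> (which an actual HLS design would use), the process split is illustrative rather than the paper’s kernel, and the pragmas only take effect under HLS, since a standard C++ compiler ignores unknown pragmas. It shows why one FIFO on a reconvergent path must be deep enough to cover a whole reduction.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <queue>

// Hypothetical stand-in for hls::stream<T>. Like hls::stream in C simulation it
// is unbounded; in hardware the FIFO has a fixed depth and a full FIFO stalls
// its writer (backpressure).
template <typename T>
class Stream {
public:
    void write(const T& v) { q_.push(v); }
    T read() {
        assert(!q_.empty() && "read from an empty stream");
        T v = q_.front();
        q_.pop();
        return v;
    }
    bool empty() const { return q_.empty(); }

private:
    std::queue<T> q_;
};

constexpr std::size_t kN = 8;  // hypothetical minicolumns per hypercolumn

// Producer: exponentiates each support value and sends it down two streams.
void exp_broadcast(const float* support, Stream<float>& to_sum, Stream<float>& to_scale) {
    for (std::size_t j = 0; j < kN; ++j) {
#pragma HLS PIPELINE II=1
        const float e = std::exp(support[j]);
        to_sum.write(e);
        to_scale.write(e);
    }
}

// Reduction: must consume all kN values before it can emit the normalizer.
void reduce_sum(Stream<float>& in, Stream<float>& total) {
    float s = 0.0f;
    for (std::size_t j = 0; j < kN; ++j) s += in.read();
    total.write(s);
}

// Consumer: waits for the normalizer, then drains the buffered exponentials.
void scale(Stream<float>& in, Stream<float>& total, float* prob) {
    const float s = total.read();
    for (std::size_t j = 0; j < kN; ++j) prob[j] = in.read() / s;
}

// Under DATAFLOW the three functions run as concurrent processes. `scale` reads
// nothing from `to_scale` until `reduce_sum` has seen all kN values. With a
// `to_scale` depth below kN - 1, the producer blocks on it before `reduce_sum`
// has all its inputs, and the pipeline deadlocks; depth kN leaves a margin.
void softmax_dataflow(const float* support, float* prob) {
#pragma HLS DATAFLOW
    Stream<float> to_sum, to_scale, total;
#pragma HLS STREAM variable=to_scale depth=8
    exp_broadcast(support, to_sum, to_scale);
    reduce_sum(to_sum, total);
    scale(to_scale, total, prob);
}
```

<p class="ltx_p">Executed sequentially, as in C simulation, the sketch produces the correct softmax regardless of FIFO depth, which is why depth problems of this kind only become visible in C/RTL cosimulation or on hardware.</p>
<p class="ltx_p">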
Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S4.F1" title="Figure 1 ‣ 4 High-Performance Stream-Based BCPNN Accelerator ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_tag">1</span></a> illustrates our systematic approach, based on C/<span class="ltx_glossaryref" title="">RTL</span> cosimulation, to determining suitable <span class="ltx_glossaryref" title="">FIFO</span> depths without resorting to trial and error. By applying dataflow directives alongside stream-based data transfers, our BCPNN kernel achieved a performance improvement of roughly 70% over the initial sequential implementation.</p> </div> </section> <section class="ltx_subsubsection" id="S4.SS1.SSSx1"> <h4 class="ltx_title ltx_title_subsubsection">Optimization #3: Spread memory mapping in HBM with data partitioning and data merging.</h4> <div class="ltx_para" id="S4.SS1.SSSx1.p1"> <p class="ltx_p" id="S4.SS1.SSSx1.p1.1">As shown in Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S4.F4" title="Figure 4 ‣ Optimization #3: Spread memory mapping in HBM with data partitioning and data merging. ‣ 4.1 Accelerator Design using HLS ‣ 4 High-Performance Stream-Based BCPNN Accelerator ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_tag">4</span></a>, we further improve performance by spreading data across multiple HBM channels through data partitioning and merging. Large arrays of the input-hidden projection (e.g., joint probabilities and weights) are divided into four segments, each mapped to a separate HBM channel. On the FPGA, each channel is read with 512-bit burst transfers, i.e., 16 single-precision (32-bit) floating-point values per transfer. 
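</p>
<p class="ltx_p">The partition-and-merge scheme can be modelled in plain C++ as below. The interleaved layout (each 64-value block split into four 16-value beats, one per channel) is an assumption chosen so that merging restores the original element order; the paper does not state its exact layout, and the <code>std::vector</code> channels stand in for HBM-mapped AXI master ports. All names are hypothetical.</p>

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kChannels = 4;                // HBM channels for the input-hidden arrays
constexpr std::size_t kBitsPerFloat = 32;           // single precision
constexpr std::size_t kBeat = 512 / kBitsPerFloat;  // 16 floats per 512-bit beat
constexpr std::size_t kPacket = kChannels * kBeat;  // 64 floats per merged packet
static_assert(kBeat == 16 && kPacket == 64, "matches the packet sizes in the text");

using Packet = std::array<float, kPacket>;
using Channels = std::array<std::vector<float>, kChannels>;

// Host side. ASSUMED layout: every 64-value block is split into four 16-value
// beats, and beat c goes to channel c.
Channels partition(const std::vector<float>& data) {
    assert(data.size() % kPacket == 0);
    Channels ch;
    for (std::size_t i = 0; i < data.size(); ++i) {
        const std::size_t c = (i % kPacket) / kBeat;  // destination channel
        ch[c].push_back(data[i]);
    }
    return ch;
}

// Kernel side: one beat from each channel for block k, merged into one packet.
// In hardware the four reads are issued on separate HBM ports in parallel.
Packet read_merged(const Channels& ch, std::size_t k) {
    Packet p{};
    for (std::size_t c = 0; c < kChannels; ++c) {
        assert((k + 1) * kBeat <= ch[c].size());
        for (std::size_t e = 0; e < kBeat; ++e) p[c * kBeat + e] = ch[c][k * kBeat + e];
    }
    return p;
}
```

<p class="ltx_p">With 128 input values, each channel receives 32, and <code>read_merged(ch, 1)</code> returns elements 64 to 127 in their original order, so downstream unrolled loops can index the packet exactly as they would index the unpartitioned array.</p>
<p class="ltx_p">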
Although HBM natively supports 256-bit accesses, its higher clock frequency (450 MHz) relative to our kernel clock (&lt;300 MHz) effectively allows 512-bit accesses on the kernel side <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib35" title="">35</a>]</cite>. The data from all four channels is then merged into a single stream packet of 64 floating-point values (four channels × 16 values). Aligning the indexing of pre- and post-synaptic activities allows these large packets to be processed in parallel using <span class="ltx_glossaryref" title="">HLS</span> unroll directives. For the hidden-output projection, we apply a similar burst-read strategy without partitioning, producing 16-value packets to maintain efficient dataflow. Since the input-hidden and hidden-output projections operate in parallel, this optimization reduces latency by a factor of about 64. A similar approach is used for write operations.</p> </div> <figure class="ltx_figure" id="S4.F4"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="147" id="S4.F4.1.g1" src="extracted/6222606/datapartition.drawio.png" width="354"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">Figure 4: </span>Parallel HBM Access with Data Partitioning and Merging</figcaption> </figure> </section> </section> <section class="ltx_subsection" id="S4.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.2 </span>Performance Analysis</h3> <div class="ltx_para" id="S4.SS2.p1"> <p class="ltx_p" id="S4.SS2.p1.1">We conducted an internal performance analysis to guide platform-specific optimizations. 
To this end, we employ a roofline model, which exposes bottlenecks and helps us refine the design for a given hardware platform.</p> </div> <div class="ltx_para" id="S4.SS2.p2"> <p class="ltx_p" id="S4.SS2.p2.3">The Roofline Model <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib34" title="">34</a>]</cite> shows whether an application is limited by compute resources or by memory bandwidth. It does so by plotting attainable performance (in FLOP/s) against arithmetic intensity (<math alttext="I" class="ltx_Math" display="inline" id="S4.SS2.p2.1.m1.1"><semantics id="S4.SS2.p2.1.m1.1a"><mi id="S4.SS2.p2.1.m1.1.1" xref="S4.SS2.p2.1.m1.1.1.cmml">I</mi><annotation-xml encoding="MathML-Content" id="S4.SS2.p2.1.m1.1b"><ci id="S4.SS2.p2.1.m1.1.1.cmml" xref="S4.SS2.p2.1.m1.1.1">𝐼</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p2.1.m1.1c">I</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p2.1.m1.1d">italic_I</annotation></semantics></math>, defined as the ratio of floating-point operations to bytes of data moved). 
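</p>
<p class="ltx_p">In its basic form, the roofline bound is P(I) = min(P<sub>peak</sub>, I · B), where B is the memory bandwidth; the machine balance M<sub>b</sub> = P<sub>peak</sub> / B is the intensity at which the bandwidth roof meets the compute roof. A small C++ sketch of these relations follows; the function names and the numbers in the example are hypothetical and do not describe any measured platform.</p>

```cpp
#include <algorithm>
#include <cassert>

// Attainable performance under the roofline model:
//   P(I) = min(P_peak, I * B), with I in FLOP/byte and B in byte/s.
double attainable_flops(double intensity, double peak_flops, double bandwidth_bytes_per_s) {
    assert(intensity >= 0.0 && peak_flops > 0.0 && bandwidth_bytes_per_s > 0.0);
    return std::min(peak_flops, intensity * bandwidth_bytes_per_s);
}

// Machine balance M_b = P_peak / B: the intensity where the two roofs meet.
double machine_balance(double peak_flops, double bandwidth_bytes_per_s) {
    assert(peak_flops > 0.0 && bandwidth_bytes_per_s > 0.0);
    return peak_flops / bandwidth_bytes_per_s;
}

// Below the machine balance the bandwidth roof is the binding limit.
bool is_memory_bound(double intensity, double peak_flops, double bandwidth_bytes_per_s) {
    return intensity < machine_balance(peak_flops, bandwidth_bytes_per_s);
}
```

<p class="ltx_p">For example, with a peak of 10<sup>12</sup> FLOP/s and 400 GB/s of bandwidth (illustrative values only), M<sub>b</sub> = 2.5 FLOP/byte, so a kernel with I = 1 FLOP/byte is memory-bound at 4 × 10<sup>11</sup> FLOP/s, while one with I = 10 FLOP/byte reaches the compute roof.</p>
<p class="ltx_p">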
On conventional architectures, if <math alttext="I" class="ltx_Math" display="inline" id="S4.SS2.p2.2.m2.1"><semantics id="S4.SS2.p2.2.m2.1a"><mi id="S4.SS2.p2.2.m2.1.1" xref="S4.SS2.p2.2.m2.1.1.cmml">I</mi><annotation-xml encoding="MathML-Content" id="S4.SS2.p2.2.m2.1b"><ci id="S4.SS2.p2.2.m2.1.1.cmml" xref="S4.SS2.p2.2.m2.1.1">𝐼</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p2.2.m2.1c">I</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p2.2.m2.1d">italic_I</annotation></semantics></math> is lower than the machine balance <math alttext="M_{b}" class="ltx_Math" display="inline" id="S4.SS2.p2.3.m3.1"><semantics id="S4.SS2.p2.3.m3.1a"><msub id="S4.SS2.p2.3.m3.1.1" xref="S4.SS2.p2.3.m3.1.1.cmml"><mi id="S4.SS2.p2.3.m3.1.1.2" xref="S4.SS2.p2.3.m3.1.1.2.cmml">M</mi><mi id="S4.SS2.p2.3.m3.1.1.3" xref="S4.SS2.p2.3.m3.1.1.3.cmml">b</mi></msub><annotation-xml encoding="MathML-Content" id="S4.SS2.p2.3.m3.1b"><apply id="S4.SS2.p2.3.m3.1.1.cmml" xref="S4.SS2.p2.3.m3.1.1"><csymbol cd="ambiguous" id="S4.SS2.p2.3.m3.1.1.1.cmml" xref="S4.SS2.p2.3.m3.1.1">subscript</csymbol><ci id="S4.SS2.p2.3.m3.1.1.2.cmml" xref="S4.SS2.p2.3.m3.1.1.2">𝑀</ci><ci id="S4.SS2.p2.3.m3.1.1.3.cmml" xref="S4.SS2.p2.3.m3.1.1.3">𝑏</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p2.3.m3.1c">M_{b}</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p2.3.m3.1d">italic_M start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT</annotation></semantics></math>, the application is memory-bound; otherwise, it is compute-bound <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib4" title="">4</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib10" title="">10</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib29" title="">29</a>]</cite>.</p> </div> <div class="ltx_para" id="S4.SS2.p3"> <p class="ltx_p" id="S4.SS2.p3.7">Adapting 
this model to FPGAs is non-trivial. Unlike fixed architectures, an FPGA’s theoretical peak compute performance <math alttext="C_{FPGA}" class="ltx_Math" display="inline" id="S4.SS2.p3.1.m1.1"><semantics id="S4.SS2.p3.1.m1.1a"><msub id="S4.SS2.p3.1.m1.1.1" xref="S4.SS2.p3.1.m1.1.1.cmml"><mi id="S4.SS2.p3.1.m1.1.1.2" xref="S4.SS2.p3.1.m1.1.1.2.cmml">C</mi><mrow id="S4.SS2.p3.1.m1.1.1.3" xref="S4.SS2.p3.1.m1.1.1.3.cmml"><mi id="S4.SS2.p3.1.m1.1.1.3.2" xref="S4.SS2.p3.1.m1.1.1.3.2.cmml">F</mi><mo id="S4.SS2.p3.1.m1.1.1.3.1" xref="S4.SS2.p3.1.m1.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p3.1.m1.1.1.3.3" xref="S4.SS2.p3.1.m1.1.1.3.3.cmml">P</mi><mo id="S4.SS2.p3.1.m1.1.1.3.1a" xref="S4.SS2.p3.1.m1.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p3.1.m1.1.1.3.4" xref="S4.SS2.p3.1.m1.1.1.3.4.cmml">G</mi><mo id="S4.SS2.p3.1.m1.1.1.3.1b" xref="S4.SS2.p3.1.m1.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p3.1.m1.1.1.3.5" xref="S4.SS2.p3.1.m1.1.1.3.5.cmml">A</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S4.SS2.p3.1.m1.1b"><apply id="S4.SS2.p3.1.m1.1.1.cmml" xref="S4.SS2.p3.1.m1.1.1"><csymbol cd="ambiguous" id="S4.SS2.p3.1.m1.1.1.1.cmml" xref="S4.SS2.p3.1.m1.1.1">subscript</csymbol><ci id="S4.SS2.p3.1.m1.1.1.2.cmml" xref="S4.SS2.p3.1.m1.1.1.2">𝐶</ci><apply id="S4.SS2.p3.1.m1.1.1.3.cmml" xref="S4.SS2.p3.1.m1.1.1.3"><times id="S4.SS2.p3.1.m1.1.1.3.1.cmml" xref="S4.SS2.p3.1.m1.1.1.3.1"></times><ci id="S4.SS2.p3.1.m1.1.1.3.2.cmml" xref="S4.SS2.p3.1.m1.1.1.3.2">𝐹</ci><ci id="S4.SS2.p3.1.m1.1.1.3.3.cmml" xref="S4.SS2.p3.1.m1.1.1.3.3">𝑃</ci><ci id="S4.SS2.p3.1.m1.1.1.3.4.cmml" xref="S4.SS2.p3.1.m1.1.1.3.4">𝐺</ci><ci id="S4.SS2.p3.1.m1.1.1.3.5.cmml" xref="S4.SS2.p3.1.m1.1.1.3.5">𝐴</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p3.1.m1.1c">C_{FPGA}</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p3.1.m1.1d">italic_C start_POSTSUBSCRIPT italic_F italic_P italic_G italic_A end_POSTSUBSCRIPT</annotation></semantics></math> depends on how 
many operations can be mapped onto its available resources and on the operating frequency <math alttext="f_{imp}" class="ltx_Math" display="inline" id="S4.SS2.p3.2.m2.1"><semantics id="S4.SS2.p3.2.m2.1a"><msub id="S4.SS2.p3.2.m2.1.1" xref="S4.SS2.p3.2.m2.1.1.cmml"><mi id="S4.SS2.p3.2.m2.1.1.2" xref="S4.SS2.p3.2.m2.1.1.2.cmml">f</mi><mrow id="S4.SS2.p3.2.m2.1.1.3" xref="S4.SS2.p3.2.m2.1.1.3.cmml"><mi id="S4.SS2.p3.2.m2.1.1.3.2" xref="S4.SS2.p3.2.m2.1.1.3.2.cmml">i</mi><mo id="S4.SS2.p3.2.m2.1.1.3.1" xref="S4.SS2.p3.2.m2.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p3.2.m2.1.1.3.3" xref="S4.SS2.p3.2.m2.1.1.3.3.cmml">m</mi><mo id="S4.SS2.p3.2.m2.1.1.3.1a" xref="S4.SS2.p3.2.m2.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p3.2.m2.1.1.3.4" xref="S4.SS2.p3.2.m2.1.1.3.4.cmml">p</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S4.SS2.p3.2.m2.1b"><apply id="S4.SS2.p3.2.m2.1.1.cmml" xref="S4.SS2.p3.2.m2.1.1"><csymbol cd="ambiguous" id="S4.SS2.p3.2.m2.1.1.1.cmml" xref="S4.SS2.p3.2.m2.1.1">subscript</csymbol><ci id="S4.SS2.p3.2.m2.1.1.2.cmml" xref="S4.SS2.p3.2.m2.1.1.2">𝑓</ci><apply id="S4.SS2.p3.2.m2.1.1.3.cmml" xref="S4.SS2.p3.2.m2.1.1.3"><times id="S4.SS2.p3.2.m2.1.1.3.1.cmml" xref="S4.SS2.p3.2.m2.1.1.3.1"></times><ci id="S4.SS2.p3.2.m2.1.1.3.2.cmml" xref="S4.SS2.p3.2.m2.1.1.3.2">𝑖</ci><ci id="S4.SS2.p3.2.m2.1.1.3.3.cmml" xref="S4.SS2.p3.2.m2.1.1.3.3">𝑚</ci><ci id="S4.SS2.p3.2.m2.1.1.3.4.cmml" xref="S4.SS2.p3.2.m2.1.1.3.4">𝑝</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p3.2.m2.1c">f_{imp}</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p3.2.m2.1d">italic_f start_POSTSUBSCRIPT italic_i italic_m italic_p end_POSTSUBSCRIPT</annotation></semantics></math>. 
We start with the number of available resources <math alttext="R_{A}" class="ltx_Math" display="inline" id="S4.SS2.p3.3.m3.1"><semantics id="S4.SS2.p3.3.m3.1a"><msub id="S4.SS2.p3.3.m3.1.1" xref="S4.SS2.p3.3.m3.1.1.cmml"><mi id="S4.SS2.p3.3.m3.1.1.2" xref="S4.SS2.p3.3.m3.1.1.2.cmml">R</mi><mi id="S4.SS2.p3.3.m3.1.1.3" xref="S4.SS2.p3.3.m3.1.1.3.cmml">A</mi></msub><annotation-xml encoding="MathML-Content" id="S4.SS2.p3.3.m3.1b"><apply id="S4.SS2.p3.3.m3.1.1.cmml" xref="S4.SS2.p3.3.m3.1.1"><csymbol cd="ambiguous" id="S4.SS2.p3.3.m3.1.1.1.cmml" xref="S4.SS2.p3.3.m3.1.1">subscript</csymbol><ci id="S4.SS2.p3.3.m3.1.1.2.cmml" xref="S4.SS2.p3.3.m3.1.1.2">𝑅</ci><ci id="S4.SS2.p3.3.m3.1.1.3.cmml" xref="S4.SS2.p3.3.m3.1.1.3">𝐴</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p3.3.m3.1c">R_{A}</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p3.3.m3.1d">italic_R start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT</annotation></semantics></math> and the resource requirement per operation <math alttext="R_{O}" class="ltx_Math" display="inline" id="S4.SS2.p3.4.m4.1"><semantics id="S4.SS2.p3.4.m4.1a"><msub id="S4.SS2.p3.4.m4.1.1" xref="S4.SS2.p3.4.m4.1.1.cmml"><mi id="S4.SS2.p3.4.m4.1.1.2" xref="S4.SS2.p3.4.m4.1.1.2.cmml">R</mi><mi id="S4.SS2.p3.4.m4.1.1.3" xref="S4.SS2.p3.4.m4.1.1.3.cmml">O</mi></msub><annotation-xml encoding="MathML-Content" id="S4.SS2.p3.4.m4.1b"><apply id="S4.SS2.p3.4.m4.1.1.cmml" xref="S4.SS2.p3.4.m4.1.1"><csymbol cd="ambiguous" id="S4.SS2.p3.4.m4.1.1.1.cmml" xref="S4.SS2.p3.4.m4.1.1">subscript</csymbol><ci id="S4.SS2.p3.4.m4.1.1.2.cmml" xref="S4.SS2.p3.4.m4.1.1.2">𝑅</ci><ci id="S4.SS2.p3.4.m4.1.1.3.cmml" xref="S4.SS2.p3.4.m4.1.1.3">𝑂</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p3.4.m4.1c">R_{O}</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p3.4.m4.1d">italic_R start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT</annotation></semantics></math>. 
The ratio <math alttext="R_{A}/R_{O}" class="ltx_Math" display="inline" id="S4.SS2.p3.5.m5.1"><semantics id="S4.SS2.p3.5.m5.1a"><mrow id="S4.SS2.p3.5.m5.1.1" xref="S4.SS2.p3.5.m5.1.1.cmml"><msub id="S4.SS2.p3.5.m5.1.1.2" xref="S4.SS2.p3.5.m5.1.1.2.cmml"><mi id="S4.SS2.p3.5.m5.1.1.2.2" xref="S4.SS2.p3.5.m5.1.1.2.2.cmml">R</mi><mi id="S4.SS2.p3.5.m5.1.1.2.3" xref="S4.SS2.p3.5.m5.1.1.2.3.cmml">A</mi></msub><mo id="S4.SS2.p3.5.m5.1.1.1" xref="S4.SS2.p3.5.m5.1.1.1.cmml">/</mo><msub id="S4.SS2.p3.5.m5.1.1.3" xref="S4.SS2.p3.5.m5.1.1.3.cmml"><mi id="S4.SS2.p3.5.m5.1.1.3.2" xref="S4.SS2.p3.5.m5.1.1.3.2.cmml">R</mi><mi id="S4.SS2.p3.5.m5.1.1.3.3" xref="S4.SS2.p3.5.m5.1.1.3.3.cmml">O</mi></msub></mrow><annotation-xml encoding="MathML-Content" id="S4.SS2.p3.5.m5.1b"><apply id="S4.SS2.p3.5.m5.1.1.cmml" xref="S4.SS2.p3.5.m5.1.1"><divide id="S4.SS2.p3.5.m5.1.1.1.cmml" xref="S4.SS2.p3.5.m5.1.1.1"></divide><apply id="S4.SS2.p3.5.m5.1.1.2.cmml" xref="S4.SS2.p3.5.m5.1.1.2"><csymbol cd="ambiguous" id="S4.SS2.p3.5.m5.1.1.2.1.cmml" xref="S4.SS2.p3.5.m5.1.1.2">subscript</csymbol><ci id="S4.SS2.p3.5.m5.1.1.2.2.cmml" xref="S4.SS2.p3.5.m5.1.1.2.2">𝑅</ci><ci id="S4.SS2.p3.5.m5.1.1.2.3.cmml" xref="S4.SS2.p3.5.m5.1.1.2.3">𝐴</ci></apply><apply id="S4.SS2.p3.5.m5.1.1.3.cmml" xref="S4.SS2.p3.5.m5.1.1.3"><csymbol cd="ambiguous" id="S4.SS2.p3.5.m5.1.1.3.1.cmml" xref="S4.SS2.p3.5.m5.1.1.3">subscript</csymbol><ci id="S4.SS2.p3.5.m5.1.1.3.2.cmml" xref="S4.SS2.p3.5.m5.1.1.3.2">𝑅</ci><ci id="S4.SS2.p3.5.m5.1.1.3.3.cmml" xref="S4.SS2.p3.5.m5.1.1.3.3">𝑂</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p3.5.m5.1c">R_{A}/R_{O}</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p3.5.m5.1d">italic_R start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT / italic_R start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT</annotation></semantics></math> indicates how many such operations can run in parallel. 
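As a rough illustration only, the following Python sketch evaluates this ratio per resource type and, anticipating Eq. (2) below, derates each ratio by a utilization factor, takes the minimum over resource types, and multiplies by the implemented frequency. It also applies the memory-bound/compute-bound test stated above, under the assumption that the machine balance is the usual roofline quantity M_b = C / B (peak compute divided by memory bandwidth). The Resource class, the function names peak_compute and classify, and every device figure in it (LUT and DSP counts, per-operator costs, 300 MHz, 400 GB/s) are hypothetical placeholders, not values from this work.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """One FPGA resource type i (e.g. LUT or DSP); hypothetical helper."""
    available: float    # R_A^i: units of this resource on the device
    per_op: float       # R_O^i: units one floating-point operator consumes
    utilization: float  # U_R^i: usable fraction, e.g. 0.8 for ~80%

    def parallel_ops(self) -> float:
        # Operators this resource alone could host, derated by U_R^i.
        return self.available / self.per_op * self.utilization


def peak_compute(f_imp_hz: float, resources: dict[str, Resource]) -> tuple[float, str]:
    """Return (C_FPGA in ops/s, name of the limiting resource type)."""
    # The scarcest resource caps parallelism: min over i of R_A^i / R_O^i * U_R^i.
    bottleneck = min(resources, key=lambda name: resources[name].parallel_ops())
    return f_imp_hz * resources[bottleneck].parallel_ops(), bottleneck


def classify(intensity: float, c_peak: float, bandwidth: float) -> str:
    """Roofline test, assuming M_b = C / B in operations per byte."""
    machine_balance = c_peak / bandwidth
    return "compute-bound" if intensity >= machine_balance else "memory-bound"


# Hypothetical device figures, chosen only to exercise the formula.
resources = {
    "LUT": Resource(available=1_000_000, per_op=500, utilization=0.8),
    "DSP": Resource(available=8_000, per_op=5, utilization=0.8),
}
c_fpga, limit = peak_compute(300e6, resources)
print(f"C_FPGA = {c_fpga / 1e9:.0f} GOPS, limited by {limit}")
print(classify(0.25, c_fpga, 400e9))
print(classify(2.0, c_fpga, 400e9))
```

With these made-up numbers the DSP term (1,280 operators) is smaller than the LUT term (1,600), so DSPs set the compute roof at 384 GOPS. Against the assumed 400 GB/s this gives a machine balance of 0.96 operations per byte, so a kernel with intensity 0.25 is memory-bound and one with intensity 2.0 is compute-bound.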
Incorporating a utilization factor <math alttext="U_{R}^{i}" class="ltx_Math" display="inline" id="S4.SS2.p3.6.m6.1"><semantics id="S4.SS2.p3.6.m6.1a"><msubsup id="S4.SS2.p3.6.m6.1.1" xref="S4.SS2.p3.6.m6.1.1.cmml"><mi id="S4.SS2.p3.6.m6.1.1.2.2" xref="S4.SS2.p3.6.m6.1.1.2.2.cmml">U</mi><mi id="S4.SS2.p3.6.m6.1.1.2.3" xref="S4.SS2.p3.6.m6.1.1.2.3.cmml">R</mi><mi id="S4.SS2.p3.6.m6.1.1.3" xref="S4.SS2.p3.6.m6.1.1.3.cmml">i</mi></msubsup><annotation-xml encoding="MathML-Content" id="S4.SS2.p3.6.m6.1b"><apply id="S4.SS2.p3.6.m6.1.1.cmml" xref="S4.SS2.p3.6.m6.1.1"><csymbol cd="ambiguous" id="S4.SS2.p3.6.m6.1.1.1.cmml" xref="S4.SS2.p3.6.m6.1.1">superscript</csymbol><apply id="S4.SS2.p3.6.m6.1.1.2.cmml" xref="S4.SS2.p3.6.m6.1.1"><csymbol cd="ambiguous" id="S4.SS2.p3.6.m6.1.1.2.1.cmml" xref="S4.SS2.p3.6.m6.1.1">subscript</csymbol><ci id="S4.SS2.p3.6.m6.1.1.2.2.cmml" xref="S4.SS2.p3.6.m6.1.1.2.2">𝑈</ci><ci id="S4.SS2.p3.6.m6.1.1.2.3.cmml" xref="S4.SS2.p3.6.m6.1.1.2.3">𝑅</ci></apply><ci id="S4.SS2.p3.6.m6.1.1.3.cmml" xref="S4.SS2.p3.6.m6.1.1.3">𝑖</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p3.6.m6.1c">U_{R}^{i}</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p3.6.m6.1d">italic_U start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT</annotation></semantics></math> for each resource type <math alttext="i" class="ltx_Math" display="inline"><semantics><mi>i</mi><annotation encoding="application/x-tex">i</annotation></semantics></math> (to account for routing congestion and practical limits, often around 80%) and the implemented frequency <math alttext="f_{imp}" class="ltx_Math" display="inline" id="S4.SS2.p3.7.m7.1"><semantics id="S4.SS2.p3.7.m7.1a"><msub id="S4.SS2.p3.7.m7.1.1" xref="S4.SS2.p3.7.m7.1.1.cmml"><mi id="S4.SS2.p3.7.m7.1.1.2" xref="S4.SS2.p3.7.m7.1.1.2.cmml">f</mi><mrow id="S4.SS2.p3.7.m7.1.1.3" xref="S4.SS2.p3.7.m7.1.1.3.cmml"><mi id="S4.SS2.p3.7.m7.1.1.3.2" xref="S4.SS2.p3.7.m7.1.1.3.2.cmml">i</mi><mo id="S4.SS2.p3.7.m7.1.1.3.1" xref="S4.SS2.p3.7.m7.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p3.7.m7.1.1.3.3" 
xref="S4.SS2.p3.7.m7.1.1.3.3.cmml">m</mi><mo id="S4.SS2.p3.7.m7.1.1.3.1a" xref="S4.SS2.p3.7.m7.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p3.7.m7.1.1.3.4" xref="S4.SS2.p3.7.m7.1.1.3.4.cmml">p</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S4.SS2.p3.7.m7.1b"><apply id="S4.SS2.p3.7.m7.1.1.cmml" xref="S4.SS2.p3.7.m7.1.1"><csymbol cd="ambiguous" id="S4.SS2.p3.7.m7.1.1.1.cmml" xref="S4.SS2.p3.7.m7.1.1">subscript</csymbol><ci id="S4.SS2.p3.7.m7.1.1.2.cmml" xref="S4.SS2.p3.7.m7.1.1.2">𝑓</ci><apply id="S4.SS2.p3.7.m7.1.1.3.cmml" xref="S4.SS2.p3.7.m7.1.1.3"><times id="S4.SS2.p3.7.m7.1.1.3.1.cmml" xref="S4.SS2.p3.7.m7.1.1.3.1"></times><ci id="S4.SS2.p3.7.m7.1.1.3.2.cmml" xref="S4.SS2.p3.7.m7.1.1.3.2">𝑖</ci><ci id="S4.SS2.p3.7.m7.1.1.3.3.cmml" xref="S4.SS2.p3.7.m7.1.1.3.3">𝑚</ci><ci id="S4.SS2.p3.7.m7.1.1.3.4.cmml" xref="S4.SS2.p3.7.m7.1.1.3.4">𝑝</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p3.7.m7.1c">f_{imp}</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p3.7.m7.1d">italic_f start_POSTSUBSCRIPT italic_i italic_m italic_p end_POSTSUBSCRIPT</annotation></semantics></math>, we have:</p> </div> <div class="ltx_para" id="S4.SS2.p4"> <table class="ltx_equation ltx_eqn_table" id="S4.E2"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="C_{FPGA}=f_{imp}\times\min_{i}\left(\frac{R_{A}^{i}}{R_{O}^{i}}\times U_{R}^{i% }\right)" class="ltx_Math" display="block" id="S4.E2.m1.2"><semantics id="S4.E2.m1.2a"><mrow id="S4.E2.m1.2.2" xref="S4.E2.m1.2.2.cmml"><msub id="S4.E2.m1.2.2.4" xref="S4.E2.m1.2.2.4.cmml"><mi id="S4.E2.m1.2.2.4.2" xref="S4.E2.m1.2.2.4.2.cmml">C</mi><mrow id="S4.E2.m1.2.2.4.3" xref="S4.E2.m1.2.2.4.3.cmml"><mi id="S4.E2.m1.2.2.4.3.2" xref="S4.E2.m1.2.2.4.3.2.cmml">F</mi><mo id="S4.E2.m1.2.2.4.3.1" xref="S4.E2.m1.2.2.4.3.1.cmml">⁢</mo><mi id="S4.E2.m1.2.2.4.3.3" 
xref="S4.E2.m1.2.2.4.3.3.cmml">P</mi><mo id="S4.E2.m1.2.2.4.3.1a" xref="S4.E2.m1.2.2.4.3.1.cmml">⁢</mo><mi id="S4.E2.m1.2.2.4.3.4" xref="S4.E2.m1.2.2.4.3.4.cmml">G</mi><mo id="S4.E2.m1.2.2.4.3.1b" xref="S4.E2.m1.2.2.4.3.1.cmml">⁢</mo><mi id="S4.E2.m1.2.2.4.3.5" xref="S4.E2.m1.2.2.4.3.5.cmml">A</mi></mrow></msub><mo id="S4.E2.m1.2.2.3" xref="S4.E2.m1.2.2.3.cmml">=</mo><mrow id="S4.E2.m1.2.2.2" xref="S4.E2.m1.2.2.2.cmml"><msub id="S4.E2.m1.2.2.2.4" xref="S4.E2.m1.2.2.2.4.cmml"><mi id="S4.E2.m1.2.2.2.4.2" xref="S4.E2.m1.2.2.2.4.2.cmml">f</mi><mrow id="S4.E2.m1.2.2.2.4.3" xref="S4.E2.m1.2.2.2.4.3.cmml"><mi id="S4.E2.m1.2.2.2.4.3.2" xref="S4.E2.m1.2.2.2.4.3.2.cmml">i</mi><mo id="S4.E2.m1.2.2.2.4.3.1" xref="S4.E2.m1.2.2.2.4.3.1.cmml">⁢</mo><mi id="S4.E2.m1.2.2.2.4.3.3" xref="S4.E2.m1.2.2.2.4.3.3.cmml">m</mi><mo id="S4.E2.m1.2.2.2.4.3.1a" xref="S4.E2.m1.2.2.2.4.3.1.cmml">⁢</mo><mi id="S4.E2.m1.2.2.2.4.3.4" xref="S4.E2.m1.2.2.2.4.3.4.cmml">p</mi></mrow></msub><mo id="S4.E2.m1.2.2.2.3" lspace="0.222em" rspace="0.222em" xref="S4.E2.m1.2.2.2.3.cmml">×</mo><mrow id="S4.E2.m1.2.2.2.2.2" xref="S4.E2.m1.2.2.2.2.3.cmml"><munder id="S4.E2.m1.1.1.1.1.1.1" xref="S4.E2.m1.1.1.1.1.1.1.cmml"><mi id="S4.E2.m1.1.1.1.1.1.1.2" xref="S4.E2.m1.1.1.1.1.1.1.2.cmml">min</mi><mi id="S4.E2.m1.1.1.1.1.1.1.3" xref="S4.E2.m1.1.1.1.1.1.1.3.cmml">i</mi></munder><mo id="S4.E2.m1.2.2.2.2.2a" xref="S4.E2.m1.2.2.2.2.3.cmml">⁡</mo><mrow id="S4.E2.m1.2.2.2.2.2.2" xref="S4.E2.m1.2.2.2.2.3.cmml"><mo id="S4.E2.m1.2.2.2.2.2.2.2" xref="S4.E2.m1.2.2.2.2.3.cmml">(</mo><mrow id="S4.E2.m1.2.2.2.2.2.2.1" xref="S4.E2.m1.2.2.2.2.2.2.1.cmml"><mfrac id="S4.E2.m1.2.2.2.2.2.2.1.2" xref="S4.E2.m1.2.2.2.2.2.2.1.2.cmml"><msubsup id="S4.E2.m1.2.2.2.2.2.2.1.2.2" xref="S4.E2.m1.2.2.2.2.2.2.1.2.2.cmml"><mi id="S4.E2.m1.2.2.2.2.2.2.1.2.2.2.2" xref="S4.E2.m1.2.2.2.2.2.2.1.2.2.2.2.cmml">R</mi><mi id="S4.E2.m1.2.2.2.2.2.2.1.2.2.2.3" xref="S4.E2.m1.2.2.2.2.2.2.1.2.2.2.3.cmml">A</mi><mi id="S4.E2.m1.2.2.2.2.2.2.1.2.2.3" 
xref="S4.E2.m1.2.2.2.2.2.2.1.2.2.3.cmml">i</mi></msubsup><msubsup id="S4.E2.m1.2.2.2.2.2.2.1.2.3" xref="S4.E2.m1.2.2.2.2.2.2.1.2.3.cmml"><mi id="S4.E2.m1.2.2.2.2.2.2.1.2.3.2.2" xref="S4.E2.m1.2.2.2.2.2.2.1.2.3.2.2.cmml">R</mi><mi id="S4.E2.m1.2.2.2.2.2.2.1.2.3.2.3" xref="S4.E2.m1.2.2.2.2.2.2.1.2.3.2.3.cmml">O</mi><mi id="S4.E2.m1.2.2.2.2.2.2.1.2.3.3" xref="S4.E2.m1.2.2.2.2.2.2.1.2.3.3.cmml">i</mi></msubsup></mfrac><mo id="S4.E2.m1.2.2.2.2.2.2.1.1" lspace="0.222em" rspace="0.222em" xref="S4.E2.m1.2.2.2.2.2.2.1.1.cmml">×</mo><msubsup id="S4.E2.m1.2.2.2.2.2.2.1.3" xref="S4.E2.m1.2.2.2.2.2.2.1.3.cmml"><mi id="S4.E2.m1.2.2.2.2.2.2.1.3.2.2" xref="S4.E2.m1.2.2.2.2.2.2.1.3.2.2.cmml">U</mi><mi id="S4.E2.m1.2.2.2.2.2.2.1.3.2.3" xref="S4.E2.m1.2.2.2.2.2.2.1.3.2.3.cmml">R</mi><mi id="S4.E2.m1.2.2.2.2.2.2.1.3.3" xref="S4.E2.m1.2.2.2.2.2.2.1.3.3.cmml">i</mi></msubsup></mrow><mo id="S4.E2.m1.2.2.2.2.2.2.3" xref="S4.E2.m1.2.2.2.2.3.cmml">)</mo></mrow></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S4.E2.m1.2b"><apply id="S4.E2.m1.2.2.cmml" xref="S4.E2.m1.2.2"><eq id="S4.E2.m1.2.2.3.cmml" xref="S4.E2.m1.2.2.3"></eq><apply id="S4.E2.m1.2.2.4.cmml" xref="S4.E2.m1.2.2.4"><csymbol cd="ambiguous" id="S4.E2.m1.2.2.4.1.cmml" xref="S4.E2.m1.2.2.4">subscript</csymbol><ci id="S4.E2.m1.2.2.4.2.cmml" xref="S4.E2.m1.2.2.4.2">𝐶</ci><apply id="S4.E2.m1.2.2.4.3.cmml" xref="S4.E2.m1.2.2.4.3"><times id="S4.E2.m1.2.2.4.3.1.cmml" xref="S4.E2.m1.2.2.4.3.1"></times><ci id="S4.E2.m1.2.2.4.3.2.cmml" xref="S4.E2.m1.2.2.4.3.2">𝐹</ci><ci id="S4.E2.m1.2.2.4.3.3.cmml" xref="S4.E2.m1.2.2.4.3.3">𝑃</ci><ci id="S4.E2.m1.2.2.4.3.4.cmml" xref="S4.E2.m1.2.2.4.3.4">𝐺</ci><ci id="S4.E2.m1.2.2.4.3.5.cmml" xref="S4.E2.m1.2.2.4.3.5">𝐴</ci></apply></apply><apply id="S4.E2.m1.2.2.2.cmml" xref="S4.E2.m1.2.2.2"><times id="S4.E2.m1.2.2.2.3.cmml" xref="S4.E2.m1.2.2.2.3"></times><apply id="S4.E2.m1.2.2.2.4.cmml" xref="S4.E2.m1.2.2.2.4"><csymbol cd="ambiguous" id="S4.E2.m1.2.2.2.4.1.cmml" 
xref="S4.E2.m1.2.2.2.4">subscript</csymbol><ci id="S4.E2.m1.2.2.2.4.2.cmml" xref="S4.E2.m1.2.2.2.4.2">𝑓</ci><apply id="S4.E2.m1.2.2.2.4.3.cmml" xref="S4.E2.m1.2.2.2.4.3"><times id="S4.E2.m1.2.2.2.4.3.1.cmml" xref="S4.E2.m1.2.2.2.4.3.1"></times><ci id="S4.E2.m1.2.2.2.4.3.2.cmml" xref="S4.E2.m1.2.2.2.4.3.2">𝑖</ci><ci id="S4.E2.m1.2.2.2.4.3.3.cmml" xref="S4.E2.m1.2.2.2.4.3.3">𝑚</ci><ci id="S4.E2.m1.2.2.2.4.3.4.cmml" xref="S4.E2.m1.2.2.2.4.3.4">𝑝</ci></apply></apply><apply id="S4.E2.m1.2.2.2.2.3.cmml" xref="S4.E2.m1.2.2.2.2.2"><apply id="S4.E2.m1.1.1.1.1.1.1.cmml" xref="S4.E2.m1.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S4.E2.m1.1.1.1.1.1.1.1.cmml" xref="S4.E2.m1.1.1.1.1.1.1">subscript</csymbol><min id="S4.E2.m1.1.1.1.1.1.1.2.cmml" xref="S4.E2.m1.1.1.1.1.1.1.2"></min><ci id="S4.E2.m1.1.1.1.1.1.1.3.cmml" xref="S4.E2.m1.1.1.1.1.1.1.3">𝑖</ci></apply><apply id="S4.E2.m1.2.2.2.2.2.2.1.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1"><times id="S4.E2.m1.2.2.2.2.2.2.1.1.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.1"></times><apply id="S4.E2.m1.2.2.2.2.2.2.1.2.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.2"><divide id="S4.E2.m1.2.2.2.2.2.2.1.2.1.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.2"></divide><apply id="S4.E2.m1.2.2.2.2.2.2.1.2.2.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.2.2"><csymbol cd="ambiguous" id="S4.E2.m1.2.2.2.2.2.2.1.2.2.1.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.2.2">superscript</csymbol><apply id="S4.E2.m1.2.2.2.2.2.2.1.2.2.2.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.2.2"><csymbol cd="ambiguous" id="S4.E2.m1.2.2.2.2.2.2.1.2.2.2.1.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.2.2">subscript</csymbol><ci id="S4.E2.m1.2.2.2.2.2.2.1.2.2.2.2.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.2.2.2.2">𝑅</ci><ci id="S4.E2.m1.2.2.2.2.2.2.1.2.2.2.3.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.2.2.2.3">𝐴</ci></apply><ci id="S4.E2.m1.2.2.2.2.2.2.1.2.2.3.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.2.2.3">𝑖</ci></apply><apply id="S4.E2.m1.2.2.2.2.2.2.1.2.3.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.2.3"><csymbol cd="ambiguous" id="S4.E2.m1.2.2.2.2.2.2.1.2.3.1.cmml" 
xref="S4.E2.m1.2.2.2.2.2.2.1.2.3">superscript</csymbol><apply id="S4.E2.m1.2.2.2.2.2.2.1.2.3.2.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.2.3"><csymbol cd="ambiguous" id="S4.E2.m1.2.2.2.2.2.2.1.2.3.2.1.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.2.3">subscript</csymbol><ci id="S4.E2.m1.2.2.2.2.2.2.1.2.3.2.2.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.2.3.2.2">𝑅</ci><ci id="S4.E2.m1.2.2.2.2.2.2.1.2.3.2.3.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.2.3.2.3">𝑂</ci></apply><ci id="S4.E2.m1.2.2.2.2.2.2.1.2.3.3.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.2.3.3">𝑖</ci></apply></apply><apply id="S4.E2.m1.2.2.2.2.2.2.1.3.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.3"><csymbol cd="ambiguous" id="S4.E2.m1.2.2.2.2.2.2.1.3.1.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.3">superscript</csymbol><apply id="S4.E2.m1.2.2.2.2.2.2.1.3.2.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.3"><csymbol cd="ambiguous" id="S4.E2.m1.2.2.2.2.2.2.1.3.2.1.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.3">subscript</csymbol><ci id="S4.E2.m1.2.2.2.2.2.2.1.3.2.2.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.3.2.2">𝑈</ci><ci id="S4.E2.m1.2.2.2.2.2.2.1.3.2.3.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.3.2.3">𝑅</ci></apply><ci id="S4.E2.m1.2.2.2.2.2.2.1.3.3.cmml" xref="S4.E2.m1.2.2.2.2.2.2.1.3.3">𝑖</ci></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.E2.m1.2c">C_{FPGA}=f_{imp}\times\min_{i}\left(\frac{R_{A}^{i}}{R_{O}^{i}}\times U_{R}^{i% }\right)</annotation><annotation encoding="application/x-llamapun" id="S4.E2.m1.2d">italic_C start_POSTSUBSCRIPT italic_F italic_P italic_G italic_A end_POSTSUBSCRIPT = italic_f start_POSTSUBSCRIPT italic_i italic_m italic_p end_POSTSUBSCRIPT × roman_min start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( divide start_ARG italic_R start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT end_ARG start_ARG italic_R start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT end_ARG × italic_U start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT 
start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT )</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(2)</span></td> </tr></tbody> </table> </div> <div class="ltx_para" id="S4.SS2.p5"> <p class="ltx_p" id="S4.SS2.p5.1">If we focus on DSPs and LUTs as the primary resources for floating-point operations, this simplifies to:</p> </div> <div class="ltx_para" id="S4.SS2.p6"> <table class="ltx_equation ltx_eqn_table" id="S4.E3"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="C_{FPGA}=f_{imp}\times\min\left(\frac{R_{A}^{LUT}}{R_{O}^{LUT}}\times U_{R}^{% LUT},\frac{R_{A}^{DSP}}{R_{O}^{DSP}}\times U_{R}^{DSP}\right)" class="ltx_Math" display="block" id="S4.E3.m1.3"><semantics id="S4.E3.m1.3a"><mrow id="S4.E3.m1.3.3" xref="S4.E3.m1.3.3.cmml"><msub id="S4.E3.m1.3.3.4" xref="S4.E3.m1.3.3.4.cmml"><mi id="S4.E3.m1.3.3.4.2" xref="S4.E3.m1.3.3.4.2.cmml">C</mi><mrow id="S4.E3.m1.3.3.4.3" xref="S4.E3.m1.3.3.4.3.cmml"><mi id="S4.E3.m1.3.3.4.3.2" xref="S4.E3.m1.3.3.4.3.2.cmml">F</mi><mo id="S4.E3.m1.3.3.4.3.1" xref="S4.E3.m1.3.3.4.3.1.cmml">⁢</mo><mi id="S4.E3.m1.3.3.4.3.3" xref="S4.E3.m1.3.3.4.3.3.cmml">P</mi><mo id="S4.E3.m1.3.3.4.3.1a" xref="S4.E3.m1.3.3.4.3.1.cmml">⁢</mo><mi id="S4.E3.m1.3.3.4.3.4" xref="S4.E3.m1.3.3.4.3.4.cmml">G</mi><mo id="S4.E3.m1.3.3.4.3.1b" xref="S4.E3.m1.3.3.4.3.1.cmml">⁢</mo><mi id="S4.E3.m1.3.3.4.3.5" xref="S4.E3.m1.3.3.4.3.5.cmml">A</mi></mrow></msub><mo id="S4.E3.m1.3.3.3" xref="S4.E3.m1.3.3.3.cmml">=</mo><mrow id="S4.E3.m1.3.3.2" xref="S4.E3.m1.3.3.2.cmml"><msub id="S4.E3.m1.3.3.2.4" xref="S4.E3.m1.3.3.2.4.cmml"><mi id="S4.E3.m1.3.3.2.4.2" xref="S4.E3.m1.3.3.2.4.2.cmml">f</mi><mrow id="S4.E3.m1.3.3.2.4.3" xref="S4.E3.m1.3.3.2.4.3.cmml"><mi 
id="S4.E3.m1.3.3.2.4.3.2" xref="S4.E3.m1.3.3.2.4.3.2.cmml">i</mi><mo id="S4.E3.m1.3.3.2.4.3.1" xref="S4.E3.m1.3.3.2.4.3.1.cmml">⁢</mo><mi id="S4.E3.m1.3.3.2.4.3.3" xref="S4.E3.m1.3.3.2.4.3.3.cmml">m</mi><mo id="S4.E3.m1.3.3.2.4.3.1a" xref="S4.E3.m1.3.3.2.4.3.1.cmml">⁢</mo><mi id="S4.E3.m1.3.3.2.4.3.4" xref="S4.E3.m1.3.3.2.4.3.4.cmml">p</mi></mrow></msub><mo id="S4.E3.m1.3.3.2.3" lspace="0.222em" rspace="0.222em" xref="S4.E3.m1.3.3.2.3.cmml">×</mo><mrow id="S4.E3.m1.3.3.2.2.2" xref="S4.E3.m1.3.3.2.2.3.cmml"><mi id="S4.E3.m1.1.1" xref="S4.E3.m1.1.1.cmml">min</mi><mo id="S4.E3.m1.3.3.2.2.2a" xref="S4.E3.m1.3.3.2.2.3.cmml">⁡</mo><mrow id="S4.E3.m1.3.3.2.2.2.2" xref="S4.E3.m1.3.3.2.2.3.cmml"><mo id="S4.E3.m1.3.3.2.2.2.2.3" xref="S4.E3.m1.3.3.2.2.3.cmml">(</mo><mrow id="S4.E3.m1.2.2.1.1.1.1.1" xref="S4.E3.m1.2.2.1.1.1.1.1.cmml"><mfrac id="S4.E3.m1.2.2.1.1.1.1.1.2" xref="S4.E3.m1.2.2.1.1.1.1.1.2.cmml"><msubsup id="S4.E3.m1.2.2.1.1.1.1.1.2.2" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.cmml"><mi id="S4.E3.m1.2.2.1.1.1.1.1.2.2.2.2" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.2.2.cmml">R</mi><mi id="S4.E3.m1.2.2.1.1.1.1.1.2.2.2.3" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.2.3.cmml">A</mi><mrow id="S4.E3.m1.2.2.1.1.1.1.1.2.2.3" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.cmml"><mi id="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.2" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.2.cmml">L</mi><mo id="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.1" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.1.cmml">⁢</mo><mi id="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.3" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.3.cmml">U</mi><mo id="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.1a" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.1.cmml">⁢</mo><mi id="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.4" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.4.cmml">T</mi></mrow></msubsup><msubsup id="S4.E3.m1.2.2.1.1.1.1.1.2.3" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.cmml"><mi id="S4.E3.m1.2.2.1.1.1.1.1.2.3.2.2" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.2.2.cmml">R</mi><mi id="S4.E3.m1.2.2.1.1.1.1.1.2.3.2.3" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.2.3.cmml">O</mi><mrow 
id="S4.E3.m1.2.2.1.1.1.1.1.2.3.3" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.cmml"><mi id="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.2" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.2.cmml">L</mi><mo id="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.1" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.1.cmml">⁢</mo><mi id="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.3" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.3.cmml">U</mi><mo id="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.1a" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.1.cmml">⁢</mo><mi id="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.4" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.4.cmml">T</mi></mrow></msubsup></mfrac><mo id="S4.E3.m1.2.2.1.1.1.1.1.1" lspace="0.222em" rspace="0.222em" xref="S4.E3.m1.2.2.1.1.1.1.1.1.cmml">×</mo><msubsup id="S4.E3.m1.2.2.1.1.1.1.1.3" xref="S4.E3.m1.2.2.1.1.1.1.1.3.cmml"><mi id="S4.E3.m1.2.2.1.1.1.1.1.3.2.2" xref="S4.E3.m1.2.2.1.1.1.1.1.3.2.2.cmml">U</mi><mi id="S4.E3.m1.2.2.1.1.1.1.1.3.2.3" xref="S4.E3.m1.2.2.1.1.1.1.1.3.2.3.cmml">R</mi><mrow id="S4.E3.m1.2.2.1.1.1.1.1.3.3" xref="S4.E3.m1.2.2.1.1.1.1.1.3.3.cmml"><mi id="S4.E3.m1.2.2.1.1.1.1.1.3.3.2" xref="S4.E3.m1.2.2.1.1.1.1.1.3.3.2.cmml">L</mi><mo id="S4.E3.m1.2.2.1.1.1.1.1.3.3.1" xref="S4.E3.m1.2.2.1.1.1.1.1.3.3.1.cmml">⁢</mo><mi id="S4.E3.m1.2.2.1.1.1.1.1.3.3.3" xref="S4.E3.m1.2.2.1.1.1.1.1.3.3.3.cmml">U</mi><mo id="S4.E3.m1.2.2.1.1.1.1.1.3.3.1a" xref="S4.E3.m1.2.2.1.1.1.1.1.3.3.1.cmml">⁢</mo><mi id="S4.E3.m1.2.2.1.1.1.1.1.3.3.4" xref="S4.E3.m1.2.2.1.1.1.1.1.3.3.4.cmml">T</mi></mrow></msubsup></mrow><mo id="S4.E3.m1.3.3.2.2.2.2.4" xref="S4.E3.m1.3.3.2.2.3.cmml">,</mo><mrow id="S4.E3.m1.3.3.2.2.2.2.2" xref="S4.E3.m1.3.3.2.2.2.2.2.cmml"><mfrac id="S4.E3.m1.3.3.2.2.2.2.2.2" xref="S4.E3.m1.3.3.2.2.2.2.2.2.cmml"><msubsup id="S4.E3.m1.3.3.2.2.2.2.2.2.2" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.cmml"><mi id="S4.E3.m1.3.3.2.2.2.2.2.2.2.2.2" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.2.2.cmml">R</mi><mi id="S4.E3.m1.3.3.2.2.2.2.2.2.2.2.3" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.2.3.cmml">A</mi><mrow id="S4.E3.m1.3.3.2.2.2.2.2.2.2.3" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.cmml"><mi 
id="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.2" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.2.cmml">D</mi><mo id="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.1" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.1.cmml">⁢</mo><mi id="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.3" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.3.cmml">S</mi><mo id="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.1a" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.1.cmml">⁢</mo><mi id="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.4" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.4.cmml">P</mi></mrow></msubsup><msubsup id="S4.E3.m1.3.3.2.2.2.2.2.2.3" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.cmml"><mi id="S4.E3.m1.3.3.2.2.2.2.2.2.3.2.2" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.2.2.cmml">R</mi><mi id="S4.E3.m1.3.3.2.2.2.2.2.2.3.2.3" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.2.3.cmml">O</mi><mrow id="S4.E3.m1.3.3.2.2.2.2.2.2.3.3" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.cmml"><mi id="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.2" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.2.cmml">D</mi><mo id="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.1" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.1.cmml">⁢</mo><mi id="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.3" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.3.cmml">S</mi><mo id="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.1a" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.1.cmml">⁢</mo><mi id="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.4" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.4.cmml">P</mi></mrow></msubsup></mfrac><mo id="S4.E3.m1.3.3.2.2.2.2.2.1" lspace="0.222em" rspace="0.222em" xref="S4.E3.m1.3.3.2.2.2.2.2.1.cmml">×</mo><msubsup id="S4.E3.m1.3.3.2.2.2.2.2.3" xref="S4.E3.m1.3.3.2.2.2.2.2.3.cmml"><mi id="S4.E3.m1.3.3.2.2.2.2.2.3.2.2" xref="S4.E3.m1.3.3.2.2.2.2.2.3.2.2.cmml">U</mi><mi id="S4.E3.m1.3.3.2.2.2.2.2.3.2.3" xref="S4.E3.m1.3.3.2.2.2.2.2.3.2.3.cmml">R</mi><mrow id="S4.E3.m1.3.3.2.2.2.2.2.3.3" xref="S4.E3.m1.3.3.2.2.2.2.2.3.3.cmml"><mi id="S4.E3.m1.3.3.2.2.2.2.2.3.3.2" xref="S4.E3.m1.3.3.2.2.2.2.2.3.3.2.cmml">D</mi><mo id="S4.E3.m1.3.3.2.2.2.2.2.3.3.1" xref="S4.E3.m1.3.3.2.2.2.2.2.3.3.1.cmml">⁢</mo><mi id="S4.E3.m1.3.3.2.2.2.2.2.3.3.3" xref="S4.E3.m1.3.3.2.2.2.2.2.3.3.3.cmml">S</mi><mo id="S4.E3.m1.3.3.2.2.2.2.2.3.3.1a" 
xref="S4.E3.m1.3.3.2.2.2.2.2.3.3.1.cmml">⁢</mo><mi id="S4.E3.m1.3.3.2.2.2.2.2.3.3.4" xref="S4.E3.m1.3.3.2.2.2.2.2.3.3.4.cmml">P</mi></mrow></msubsup></mrow><mo id="S4.E3.m1.3.3.2.2.2.2.5" xref="S4.E3.m1.3.3.2.2.3.cmml">)</mo></mrow></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S4.E3.m1.3b"><apply id="S4.E3.m1.3.3.cmml" xref="S4.E3.m1.3.3"><eq id="S4.E3.m1.3.3.3.cmml" xref="S4.E3.m1.3.3.3"></eq><apply id="S4.E3.m1.3.3.4.cmml" xref="S4.E3.m1.3.3.4"><csymbol cd="ambiguous" id="S4.E3.m1.3.3.4.1.cmml" xref="S4.E3.m1.3.3.4">subscript</csymbol><ci id="S4.E3.m1.3.3.4.2.cmml" xref="S4.E3.m1.3.3.4.2">𝐶</ci><apply id="S4.E3.m1.3.3.4.3.cmml" xref="S4.E3.m1.3.3.4.3"><times id="S4.E3.m1.3.3.4.3.1.cmml" xref="S4.E3.m1.3.3.4.3.1"></times><ci id="S4.E3.m1.3.3.4.3.2.cmml" xref="S4.E3.m1.3.3.4.3.2">𝐹</ci><ci id="S4.E3.m1.3.3.4.3.3.cmml" xref="S4.E3.m1.3.3.4.3.3">𝑃</ci><ci id="S4.E3.m1.3.3.4.3.4.cmml" xref="S4.E3.m1.3.3.4.3.4">𝐺</ci><ci id="S4.E3.m1.3.3.4.3.5.cmml" xref="S4.E3.m1.3.3.4.3.5">𝐴</ci></apply></apply><apply id="S4.E3.m1.3.3.2.cmml" xref="S4.E3.m1.3.3.2"><times id="S4.E3.m1.3.3.2.3.cmml" xref="S4.E3.m1.3.3.2.3"></times><apply id="S4.E3.m1.3.3.2.4.cmml" xref="S4.E3.m1.3.3.2.4"><csymbol cd="ambiguous" id="S4.E3.m1.3.3.2.4.1.cmml" xref="S4.E3.m1.3.3.2.4">subscript</csymbol><ci id="S4.E3.m1.3.3.2.4.2.cmml" xref="S4.E3.m1.3.3.2.4.2">𝑓</ci><apply id="S4.E3.m1.3.3.2.4.3.cmml" xref="S4.E3.m1.3.3.2.4.3"><times id="S4.E3.m1.3.3.2.4.3.1.cmml" xref="S4.E3.m1.3.3.2.4.3.1"></times><ci id="S4.E3.m1.3.3.2.4.3.2.cmml" xref="S4.E3.m1.3.3.2.4.3.2">𝑖</ci><ci id="S4.E3.m1.3.3.2.4.3.3.cmml" xref="S4.E3.m1.3.3.2.4.3.3">𝑚</ci><ci id="S4.E3.m1.3.3.2.4.3.4.cmml" xref="S4.E3.m1.3.3.2.4.3.4">𝑝</ci></apply></apply><apply id="S4.E3.m1.3.3.2.2.3.cmml" xref="S4.E3.m1.3.3.2.2.2"><min id="S4.E3.m1.1.1.cmml" xref="S4.E3.m1.1.1"></min><apply id="S4.E3.m1.2.2.1.1.1.1.1.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1"><times id="S4.E3.m1.2.2.1.1.1.1.1.1.cmml" 
xref="S4.E3.m1.2.2.1.1.1.1.1.1"></times><apply id="S4.E3.m1.2.2.1.1.1.1.1.2.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2"><divide id="S4.E3.m1.2.2.1.1.1.1.1.2.1.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2"></divide><apply id="S4.E3.m1.2.2.1.1.1.1.1.2.2.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2"><csymbol cd="ambiguous" id="S4.E3.m1.2.2.1.1.1.1.1.2.2.1.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2">superscript</csymbol><apply id="S4.E3.m1.2.2.1.1.1.1.1.2.2.2.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2"><csymbol cd="ambiguous" id="S4.E3.m1.2.2.1.1.1.1.1.2.2.2.1.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2">subscript</csymbol><ci id="S4.E3.m1.2.2.1.1.1.1.1.2.2.2.2.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.2.2">𝑅</ci><ci id="S4.E3.m1.2.2.1.1.1.1.1.2.2.2.3.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.2.3">𝐴</ci></apply><apply id="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.3"><times id="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.1.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.1"></times><ci id="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.2.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.2">𝐿</ci><ci id="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.3.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.3">𝑈</ci><ci id="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.4.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.2.3.4">𝑇</ci></apply></apply><apply id="S4.E3.m1.2.2.1.1.1.1.1.2.3.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3"><csymbol cd="ambiguous" id="S4.E3.m1.2.2.1.1.1.1.1.2.3.1.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3">superscript</csymbol><apply id="S4.E3.m1.2.2.1.1.1.1.1.2.3.2.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3"><csymbol cd="ambiguous" id="S4.E3.m1.2.2.1.1.1.1.1.2.3.2.1.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3">subscript</csymbol><ci id="S4.E3.m1.2.2.1.1.1.1.1.2.3.2.2.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.2.2">𝑅</ci><ci id="S4.E3.m1.2.2.1.1.1.1.1.2.3.2.3.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.2.3">𝑂</ci></apply><apply id="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.3"><times id="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.1.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.1"></times><ci 
id="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.2.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.2">𝐿</ci><ci id="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.3.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.3">𝑈</ci><ci id="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.4.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.2.3.3.4">𝑇</ci></apply></apply></apply><apply id="S4.E3.m1.2.2.1.1.1.1.1.3.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S4.E3.m1.2.2.1.1.1.1.1.3.1.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.3">superscript</csymbol><apply id="S4.E3.m1.2.2.1.1.1.1.1.3.2.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.3"><csymbol cd="ambiguous" id="S4.E3.m1.2.2.1.1.1.1.1.3.2.1.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.3">subscript</csymbol><ci id="S4.E3.m1.2.2.1.1.1.1.1.3.2.2.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.3.2.2">𝑈</ci><ci id="S4.E3.m1.2.2.1.1.1.1.1.3.2.3.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.3.2.3">𝑅</ci></apply><apply id="S4.E3.m1.2.2.1.1.1.1.1.3.3.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.3.3"><times id="S4.E3.m1.2.2.1.1.1.1.1.3.3.1.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.3.3.1"></times><ci id="S4.E3.m1.2.2.1.1.1.1.1.3.3.2.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.3.3.2">𝐿</ci><ci id="S4.E3.m1.2.2.1.1.1.1.1.3.3.3.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.3.3.3">𝑈</ci><ci id="S4.E3.m1.2.2.1.1.1.1.1.3.3.4.cmml" xref="S4.E3.m1.2.2.1.1.1.1.1.3.3.4">𝑇</ci></apply></apply></apply><apply id="S4.E3.m1.3.3.2.2.2.2.2.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2"><times id="S4.E3.m1.3.3.2.2.2.2.2.1.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.1"></times><apply id="S4.E3.m1.3.3.2.2.2.2.2.2.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2"><divide id="S4.E3.m1.3.3.2.2.2.2.2.2.1.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2"></divide><apply id="S4.E3.m1.3.3.2.2.2.2.2.2.2.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2"><csymbol cd="ambiguous" id="S4.E3.m1.3.3.2.2.2.2.2.2.2.1.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2">superscript</csymbol><apply id="S4.E3.m1.3.3.2.2.2.2.2.2.2.2.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2"><csymbol cd="ambiguous" id="S4.E3.m1.3.3.2.2.2.2.2.2.2.2.1.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2">subscript</csymbol><ci 
id="S4.E3.m1.3.3.2.2.2.2.2.2.2.2.2.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.2.2">𝑅</ci><ci id="S4.E3.m1.3.3.2.2.2.2.2.2.2.2.3.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.2.3">𝐴</ci></apply><apply id="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.3"><times id="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.1.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.1"></times><ci id="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.2.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.2">𝐷</ci><ci id="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.3.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.3">𝑆</ci><ci id="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.4.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.2.3.4">𝑃</ci></apply></apply><apply id="S4.E3.m1.3.3.2.2.2.2.2.2.3.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3"><csymbol cd="ambiguous" id="S4.E3.m1.3.3.2.2.2.2.2.2.3.1.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3">superscript</csymbol><apply id="S4.E3.m1.3.3.2.2.2.2.2.2.3.2.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3"><csymbol cd="ambiguous" id="S4.E3.m1.3.3.2.2.2.2.2.2.3.2.1.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3">subscript</csymbol><ci id="S4.E3.m1.3.3.2.2.2.2.2.2.3.2.2.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.2.2">𝑅</ci><ci id="S4.E3.m1.3.3.2.2.2.2.2.2.3.2.3.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.2.3">𝑂</ci></apply><apply id="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.3"><times id="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.1.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.1"></times><ci id="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.2.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.2">𝐷</ci><ci id="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.3.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.3">𝑆</ci><ci id="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.4.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.2.3.3.4">𝑃</ci></apply></apply></apply><apply id="S4.E3.m1.3.3.2.2.2.2.2.3.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.3"><csymbol cd="ambiguous" id="S4.E3.m1.3.3.2.2.2.2.2.3.1.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.3">superscript</csymbol><apply id="S4.E3.m1.3.3.2.2.2.2.2.3.2.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.3"><csymbol cd="ambiguous" 
id="S4.E3.m1.3.3.2.2.2.2.2.3.2.1.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.3">subscript</csymbol><ci id="S4.E3.m1.3.3.2.2.2.2.2.3.2.2.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.3.2.2">𝑈</ci><ci id="S4.E3.m1.3.3.2.2.2.2.2.3.2.3.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.3.2.3">𝑅</ci></apply><apply id="S4.E3.m1.3.3.2.2.2.2.2.3.3.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.3.3"><times id="S4.E3.m1.3.3.2.2.2.2.2.3.3.1.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.3.3.1"></times><ci id="S4.E3.m1.3.3.2.2.2.2.2.3.3.2.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.3.3.2">𝐷</ci><ci id="S4.E3.m1.3.3.2.2.2.2.2.3.3.3.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.3.3.3">𝑆</ci><ci id="S4.E3.m1.3.3.2.2.2.2.2.3.3.4.cmml" xref="S4.E3.m1.3.3.2.2.2.2.2.3.3.4">𝑃</ci></apply></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.E3.m1.3c">C_{FPGA}=f_{imp}\times\min\left(\frac{R_{A}^{LUT}}{R_{O}^{LUT}}\times U_{R}^{LUT},\frac{R_{A}^{DSP}}{R_{O}^{DSP}}\times U_{R}^{DSP}\right)</annotation><annotation encoding="application/x-llamapun" id="S4.E3.m1.3d">italic_C start_POSTSUBSCRIPT italic_F italic_P italic_G italic_A end_POSTSUBSCRIPT = italic_f start_POSTSUBSCRIPT italic_i italic_m italic_p end_POSTSUBSCRIPT × roman_min ( divide start_ARG italic_R start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L italic_U italic_T end_POSTSUPERSCRIPT end_ARG start_ARG italic_R start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L italic_U italic_T end_POSTSUPERSCRIPT end_ARG × italic_U start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L italic_U italic_T end_POSTSUPERSCRIPT , divide start_ARG italic_R start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_D italic_S italic_P end_POSTSUPERSCRIPT end_ARG start_ARG italic_R start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_D italic_S italic_P end_POSTSUPERSCRIPT end_ARG × italic_U start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT
italic_D italic_S italic_P end_POSTSUPERSCRIPT )</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(3)</span></td> </tr></tbody> </table> </div> <div class="ltx_para" id="S4.SS2.p7"> <p class="ltx_p" id="S4.SS2.p7.4">Next, we determine the FPGA’s memory bandwidth <math alttext="B_{HBM}" class="ltx_Math" display="inline" id="S4.SS2.p7.1.m1.1"><semantics id="S4.SS2.p7.1.m1.1a"><msub id="S4.SS2.p7.1.m1.1.1" xref="S4.SS2.p7.1.m1.1.1.cmml"><mi id="S4.SS2.p7.1.m1.1.1.2" xref="S4.SS2.p7.1.m1.1.1.2.cmml">B</mi><mrow id="S4.SS2.p7.1.m1.1.1.3" xref="S4.SS2.p7.1.m1.1.1.3.cmml"><mi id="S4.SS2.p7.1.m1.1.1.3.2" xref="S4.SS2.p7.1.m1.1.1.3.2.cmml">H</mi><mo id="S4.SS2.p7.1.m1.1.1.3.1" xref="S4.SS2.p7.1.m1.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p7.1.m1.1.1.3.3" xref="S4.SS2.p7.1.m1.1.1.3.3.cmml">B</mi><mo id="S4.SS2.p7.1.m1.1.1.3.1a" xref="S4.SS2.p7.1.m1.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p7.1.m1.1.1.3.4" xref="S4.SS2.p7.1.m1.1.1.3.4.cmml">M</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S4.SS2.p7.1.m1.1b"><apply id="S4.SS2.p7.1.m1.1.1.cmml" xref="S4.SS2.p7.1.m1.1.1"><csymbol cd="ambiguous" id="S4.SS2.p7.1.m1.1.1.1.cmml" xref="S4.SS2.p7.1.m1.1.1">subscript</csymbol><ci id="S4.SS2.p7.1.m1.1.1.2.cmml" xref="S4.SS2.p7.1.m1.1.1.2">𝐵</ci><apply id="S4.SS2.p7.1.m1.1.1.3.cmml" xref="S4.SS2.p7.1.m1.1.1.3"><times id="S4.SS2.p7.1.m1.1.1.3.1.cmml" xref="S4.SS2.p7.1.m1.1.1.3.1"></times><ci id="S4.SS2.p7.1.m1.1.1.3.2.cmml" xref="S4.SS2.p7.1.m1.1.1.3.2">𝐻</ci><ci id="S4.SS2.p7.1.m1.1.1.3.3.cmml" xref="S4.SS2.p7.1.m1.1.1.3.3">𝐵</ci><ci id="S4.SS2.p7.1.m1.1.1.3.4.cmml" xref="S4.SS2.p7.1.m1.1.1.3.4">𝑀</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p7.1.m1.1c">B_{HBM}</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p7.1.m1.1d">italic_B 
start_POSTSUBSCRIPT italic_H italic_B italic_M end_POSTSUBSCRIPT</annotation></semantics></math> by considering the HBM frequency <math alttext="f_{HBM}" class="ltx_Math" display="inline" id="S4.SS2.p7.2.m2.1"><semantics id="S4.SS2.p7.2.m2.1a"><msub id="S4.SS2.p7.2.m2.1.1" xref="S4.SS2.p7.2.m2.1.1.cmml"><mi id="S4.SS2.p7.2.m2.1.1.2" xref="S4.SS2.p7.2.m2.1.1.2.cmml">f</mi><mrow id="S4.SS2.p7.2.m2.1.1.3" xref="S4.SS2.p7.2.m2.1.1.3.cmml"><mi id="S4.SS2.p7.2.m2.1.1.3.2" xref="S4.SS2.p7.2.m2.1.1.3.2.cmml">H</mi><mo id="S4.SS2.p7.2.m2.1.1.3.1" xref="S4.SS2.p7.2.m2.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p7.2.m2.1.1.3.3" xref="S4.SS2.p7.2.m2.1.1.3.3.cmml">B</mi><mo id="S4.SS2.p7.2.m2.1.1.3.1a" xref="S4.SS2.p7.2.m2.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p7.2.m2.1.1.3.4" xref="S4.SS2.p7.2.m2.1.1.3.4.cmml">M</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S4.SS2.p7.2.m2.1b"><apply id="S4.SS2.p7.2.m2.1.1.cmml" xref="S4.SS2.p7.2.m2.1.1"><csymbol cd="ambiguous" id="S4.SS2.p7.2.m2.1.1.1.cmml" xref="S4.SS2.p7.2.m2.1.1">subscript</csymbol><ci id="S4.SS2.p7.2.m2.1.1.2.cmml" xref="S4.SS2.p7.2.m2.1.1.2">𝑓</ci><apply id="S4.SS2.p7.2.m2.1.1.3.cmml" xref="S4.SS2.p7.2.m2.1.1.3"><times id="S4.SS2.p7.2.m2.1.1.3.1.cmml" xref="S4.SS2.p7.2.m2.1.1.3.1"></times><ci id="S4.SS2.p7.2.m2.1.1.3.2.cmml" xref="S4.SS2.p7.2.m2.1.1.3.2">𝐻</ci><ci id="S4.SS2.p7.2.m2.1.1.3.3.cmml" xref="S4.SS2.p7.2.m2.1.1.3.3">𝐵</ci><ci id="S4.SS2.p7.2.m2.1.1.3.4.cmml" xref="S4.SS2.p7.2.m2.1.1.3.4">𝑀</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p7.2.m2.1c">f_{HBM}</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p7.2.m2.1d">italic_f start_POSTSUBSCRIPT italic_H italic_B italic_M end_POSTSUBSCRIPT</annotation></semantics></math>, data width <math alttext="W_{HBM}" class="ltx_Math" display="inline" id="S4.SS2.p7.3.m3.1"><semantics id="S4.SS2.p7.3.m3.1a"><msub id="S4.SS2.p7.3.m3.1.1" xref="S4.SS2.p7.3.m3.1.1.cmml"><mi id="S4.SS2.p7.3.m3.1.1.2" 
xref="S4.SS2.p7.3.m3.1.1.2.cmml">W</mi><mrow id="S4.SS2.p7.3.m3.1.1.3" xref="S4.SS2.p7.3.m3.1.1.3.cmml"><mi id="S4.SS2.p7.3.m3.1.1.3.2" xref="S4.SS2.p7.3.m3.1.1.3.2.cmml">H</mi><mo id="S4.SS2.p7.3.m3.1.1.3.1" xref="S4.SS2.p7.3.m3.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p7.3.m3.1.1.3.3" xref="S4.SS2.p7.3.m3.1.1.3.3.cmml">B</mi><mo id="S4.SS2.p7.3.m3.1.1.3.1a" xref="S4.SS2.p7.3.m3.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p7.3.m3.1.1.3.4" xref="S4.SS2.p7.3.m3.1.1.3.4.cmml">M</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S4.SS2.p7.3.m3.1b"><apply id="S4.SS2.p7.3.m3.1.1.cmml" xref="S4.SS2.p7.3.m3.1.1"><csymbol cd="ambiguous" id="S4.SS2.p7.3.m3.1.1.1.cmml" xref="S4.SS2.p7.3.m3.1.1">subscript</csymbol><ci id="S4.SS2.p7.3.m3.1.1.2.cmml" xref="S4.SS2.p7.3.m3.1.1.2">𝑊</ci><apply id="S4.SS2.p7.3.m3.1.1.3.cmml" xref="S4.SS2.p7.3.m3.1.1.3"><times id="S4.SS2.p7.3.m3.1.1.3.1.cmml" xref="S4.SS2.p7.3.m3.1.1.3.1"></times><ci id="S4.SS2.p7.3.m3.1.1.3.2.cmml" xref="S4.SS2.p7.3.m3.1.1.3.2">𝐻</ci><ci id="S4.SS2.p7.3.m3.1.1.3.3.cmml" xref="S4.SS2.p7.3.m3.1.1.3.3">𝐵</ci><ci id="S4.SS2.p7.3.m3.1.1.3.4.cmml" xref="S4.SS2.p7.3.m3.1.1.3.4">𝑀</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p7.3.m3.1c">W_{HBM}</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p7.3.m3.1d">italic_W start_POSTSUBSCRIPT italic_H italic_B italic_M end_POSTSUBSCRIPT</annotation></semantics></math>, and the number of channels <math alttext="Ch_{HBM}" class="ltx_Math" display="inline" id="S4.SS2.p7.4.m4.1"><semantics id="S4.SS2.p7.4.m4.1a"><mrow id="S4.SS2.p7.4.m4.1.1" xref="S4.SS2.p7.4.m4.1.1.cmml"><mi id="S4.SS2.p7.4.m4.1.1.2" xref="S4.SS2.p7.4.m4.1.1.2.cmml">C</mi><mo id="S4.SS2.p7.4.m4.1.1.1" xref="S4.SS2.p7.4.m4.1.1.1.cmml">⁢</mo><msub id="S4.SS2.p7.4.m4.1.1.3" xref="S4.SS2.p7.4.m4.1.1.3.cmml"><mi id="S4.SS2.p7.4.m4.1.1.3.2" xref="S4.SS2.p7.4.m4.1.1.3.2.cmml">h</mi><mrow id="S4.SS2.p7.4.m4.1.1.3.3" xref="S4.SS2.p7.4.m4.1.1.3.3.cmml"><mi 
id="S4.SS2.p7.4.m4.1.1.3.3.2" xref="S4.SS2.p7.4.m4.1.1.3.3.2.cmml">H</mi><mo id="S4.SS2.p7.4.m4.1.1.3.3.1" xref="S4.SS2.p7.4.m4.1.1.3.3.1.cmml">⁢</mo><mi id="S4.SS2.p7.4.m4.1.1.3.3.3" xref="S4.SS2.p7.4.m4.1.1.3.3.3.cmml">B</mi><mo id="S4.SS2.p7.4.m4.1.1.3.3.1a" xref="S4.SS2.p7.4.m4.1.1.3.3.1.cmml">⁢</mo><mi id="S4.SS2.p7.4.m4.1.1.3.3.4" xref="S4.SS2.p7.4.m4.1.1.3.3.4.cmml">M</mi></mrow></msub></mrow><annotation-xml encoding="MathML-Content" id="S4.SS2.p7.4.m4.1b"><apply id="S4.SS2.p7.4.m4.1.1.cmml" xref="S4.SS2.p7.4.m4.1.1"><times id="S4.SS2.p7.4.m4.1.1.1.cmml" xref="S4.SS2.p7.4.m4.1.1.1"></times><ci id="S4.SS2.p7.4.m4.1.1.2.cmml" xref="S4.SS2.p7.4.m4.1.1.2">𝐶</ci><apply id="S4.SS2.p7.4.m4.1.1.3.cmml" xref="S4.SS2.p7.4.m4.1.1.3"><csymbol cd="ambiguous" id="S4.SS2.p7.4.m4.1.1.3.1.cmml" xref="S4.SS2.p7.4.m4.1.1.3">subscript</csymbol><ci id="S4.SS2.p7.4.m4.1.1.3.2.cmml" xref="S4.SS2.p7.4.m4.1.1.3.2">ℎ</ci><apply id="S4.SS2.p7.4.m4.1.1.3.3.cmml" xref="S4.SS2.p7.4.m4.1.1.3.3"><times id="S4.SS2.p7.4.m4.1.1.3.3.1.cmml" xref="S4.SS2.p7.4.m4.1.1.3.3.1"></times><ci id="S4.SS2.p7.4.m4.1.1.3.3.2.cmml" xref="S4.SS2.p7.4.m4.1.1.3.3.2">𝐻</ci><ci id="S4.SS2.p7.4.m4.1.1.3.3.3.cmml" xref="S4.SS2.p7.4.m4.1.1.3.3.3">𝐵</ci><ci id="S4.SS2.p7.4.m4.1.1.3.3.4.cmml" xref="S4.SS2.p7.4.m4.1.1.3.3.4">𝑀</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p7.4.m4.1c">Ch_{HBM}</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p7.4.m4.1d">italic_C italic_h start_POSTSUBSCRIPT italic_H italic_B italic_M end_POSTSUBSCRIPT</annotation></semantics></math>:</p> </div> <div class="ltx_para" id="S4.SS2.p8"> <table class="ltx_equation ltx_eqn_table" id="S4.E4"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="B_{HBM}=f_{HBM}\times W_{HBM}\times Ch_{HBM}" class="ltx_Math" display="block" id="S4.E4.m1.1"><semantics 
id="S4.E4.m1.1a"><mrow id="S4.E4.m1.1.1" xref="S4.E4.m1.1.1.cmml"><msub id="S4.E4.m1.1.1.2" xref="S4.E4.m1.1.1.2.cmml"><mi id="S4.E4.m1.1.1.2.2" xref="S4.E4.m1.1.1.2.2.cmml">B</mi><mrow id="S4.E4.m1.1.1.2.3" xref="S4.E4.m1.1.1.2.3.cmml"><mi id="S4.E4.m1.1.1.2.3.2" xref="S4.E4.m1.1.1.2.3.2.cmml">H</mi><mo id="S4.E4.m1.1.1.2.3.1" xref="S4.E4.m1.1.1.2.3.1.cmml">⁢</mo><mi id="S4.E4.m1.1.1.2.3.3" xref="S4.E4.m1.1.1.2.3.3.cmml">B</mi><mo id="S4.E4.m1.1.1.2.3.1a" xref="S4.E4.m1.1.1.2.3.1.cmml">⁢</mo><mi id="S4.E4.m1.1.1.2.3.4" xref="S4.E4.m1.1.1.2.3.4.cmml">M</mi></mrow></msub><mo id="S4.E4.m1.1.1.1" xref="S4.E4.m1.1.1.1.cmml">=</mo><mrow id="S4.E4.m1.1.1.3" xref="S4.E4.m1.1.1.3.cmml"><mrow id="S4.E4.m1.1.1.3.2" xref="S4.E4.m1.1.1.3.2.cmml"><msub id="S4.E4.m1.1.1.3.2.2" xref="S4.E4.m1.1.1.3.2.2.cmml"><mi id="S4.E4.m1.1.1.3.2.2.2" xref="S4.E4.m1.1.1.3.2.2.2.cmml">f</mi><mrow id="S4.E4.m1.1.1.3.2.2.3" xref="S4.E4.m1.1.1.3.2.2.3.cmml"><mi id="S4.E4.m1.1.1.3.2.2.3.2" xref="S4.E4.m1.1.1.3.2.2.3.2.cmml">H</mi><mo id="S4.E4.m1.1.1.3.2.2.3.1" xref="S4.E4.m1.1.1.3.2.2.3.1.cmml">⁢</mo><mi id="S4.E4.m1.1.1.3.2.2.3.3" xref="S4.E4.m1.1.1.3.2.2.3.3.cmml">B</mi><mo id="S4.E4.m1.1.1.3.2.2.3.1a" xref="S4.E4.m1.1.1.3.2.2.3.1.cmml">⁢</mo><mi id="S4.E4.m1.1.1.3.2.2.3.4" xref="S4.E4.m1.1.1.3.2.2.3.4.cmml">M</mi></mrow></msub><mo id="S4.E4.m1.1.1.3.2.1" lspace="0.222em" rspace="0.222em" xref="S4.E4.m1.1.1.3.2.1.cmml">×</mo><msub id="S4.E4.m1.1.1.3.2.3" xref="S4.E4.m1.1.1.3.2.3.cmml"><mi id="S4.E4.m1.1.1.3.2.3.2" xref="S4.E4.m1.1.1.3.2.3.2.cmml">W</mi><mrow id="S4.E4.m1.1.1.3.2.3.3" xref="S4.E4.m1.1.1.3.2.3.3.cmml"><mi id="S4.E4.m1.1.1.3.2.3.3.2" xref="S4.E4.m1.1.1.3.2.3.3.2.cmml">H</mi><mo id="S4.E4.m1.1.1.3.2.3.3.1" xref="S4.E4.m1.1.1.3.2.3.3.1.cmml">⁢</mo><mi id="S4.E4.m1.1.1.3.2.3.3.3" xref="S4.E4.m1.1.1.3.2.3.3.3.cmml">B</mi><mo id="S4.E4.m1.1.1.3.2.3.3.1a" xref="S4.E4.m1.1.1.3.2.3.3.1.cmml">⁢</mo><mi id="S4.E4.m1.1.1.3.2.3.3.4" xref="S4.E4.m1.1.1.3.2.3.3.4.cmml">M</mi></mrow></msub><mo 
id="S4.E4.m1.1.1.3.2.1a" lspace="0.222em" rspace="0.222em" xref="S4.E4.m1.1.1.3.2.1.cmml">×</mo><mi id="S4.E4.m1.1.1.3.2.4" xref="S4.E4.m1.1.1.3.2.4.cmml">C</mi></mrow><mo id="S4.E4.m1.1.1.3.1" xref="S4.E4.m1.1.1.3.1.cmml">⁢</mo><msub id="S4.E4.m1.1.1.3.3" xref="S4.E4.m1.1.1.3.3.cmml"><mi id="S4.E4.m1.1.1.3.3.2" xref="S4.E4.m1.1.1.3.3.2.cmml">h</mi><mrow id="S4.E4.m1.1.1.3.3.3" xref="S4.E4.m1.1.1.3.3.3.cmml"><mi id="S4.E4.m1.1.1.3.3.3.2" xref="S4.E4.m1.1.1.3.3.3.2.cmml">H</mi><mo id="S4.E4.m1.1.1.3.3.3.1" xref="S4.E4.m1.1.1.3.3.3.1.cmml">⁢</mo><mi id="S4.E4.m1.1.1.3.3.3.3" xref="S4.E4.m1.1.1.3.3.3.3.cmml">B</mi><mo id="S4.E4.m1.1.1.3.3.3.1a" xref="S4.E4.m1.1.1.3.3.3.1.cmml">⁢</mo><mi id="S4.E4.m1.1.1.3.3.3.4" xref="S4.E4.m1.1.1.3.3.3.4.cmml">M</mi></mrow></msub></mrow></mrow><annotation-xml encoding="MathML-Content" id="S4.E4.m1.1b"><apply id="S4.E4.m1.1.1.cmml" xref="S4.E4.m1.1.1"><eq id="S4.E4.m1.1.1.1.cmml" xref="S4.E4.m1.1.1.1"></eq><apply id="S4.E4.m1.1.1.2.cmml" xref="S4.E4.m1.1.1.2"><csymbol cd="ambiguous" id="S4.E4.m1.1.1.2.1.cmml" xref="S4.E4.m1.1.1.2">subscript</csymbol><ci id="S4.E4.m1.1.1.2.2.cmml" xref="S4.E4.m1.1.1.2.2">𝐵</ci><apply id="S4.E4.m1.1.1.2.3.cmml" xref="S4.E4.m1.1.1.2.3"><times id="S4.E4.m1.1.1.2.3.1.cmml" xref="S4.E4.m1.1.1.2.3.1"></times><ci id="S4.E4.m1.1.1.2.3.2.cmml" xref="S4.E4.m1.1.1.2.3.2">𝐻</ci><ci id="S4.E4.m1.1.1.2.3.3.cmml" xref="S4.E4.m1.1.1.2.3.3">𝐵</ci><ci id="S4.E4.m1.1.1.2.3.4.cmml" xref="S4.E4.m1.1.1.2.3.4">𝑀</ci></apply></apply><apply id="S4.E4.m1.1.1.3.cmml" xref="S4.E4.m1.1.1.3"><times id="S4.E4.m1.1.1.3.1.cmml" xref="S4.E4.m1.1.1.3.1"></times><apply id="S4.E4.m1.1.1.3.2.cmml" xref="S4.E4.m1.1.1.3.2"><times id="S4.E4.m1.1.1.3.2.1.cmml" xref="S4.E4.m1.1.1.3.2.1"></times><apply id="S4.E4.m1.1.1.3.2.2.cmml" xref="S4.E4.m1.1.1.3.2.2"><csymbol cd="ambiguous" id="S4.E4.m1.1.1.3.2.2.1.cmml" xref="S4.E4.m1.1.1.3.2.2">subscript</csymbol><ci id="S4.E4.m1.1.1.3.2.2.2.cmml" xref="S4.E4.m1.1.1.3.2.2.2">𝑓</ci><apply 
id="S4.E4.m1.1.1.3.2.2.3.cmml" xref="S4.E4.m1.1.1.3.2.2.3"><times id="S4.E4.m1.1.1.3.2.2.3.1.cmml" xref="S4.E4.m1.1.1.3.2.2.3.1"></times><ci id="S4.E4.m1.1.1.3.2.2.3.2.cmml" xref="S4.E4.m1.1.1.3.2.2.3.2">𝐻</ci><ci id="S4.E4.m1.1.1.3.2.2.3.3.cmml" xref="S4.E4.m1.1.1.3.2.2.3.3">𝐵</ci><ci id="S4.E4.m1.1.1.3.2.2.3.4.cmml" xref="S4.E4.m1.1.1.3.2.2.3.4">𝑀</ci></apply></apply><apply id="S4.E4.m1.1.1.3.2.3.cmml" xref="S4.E4.m1.1.1.3.2.3"><csymbol cd="ambiguous" id="S4.E4.m1.1.1.3.2.3.1.cmml" xref="S4.E4.m1.1.1.3.2.3">subscript</csymbol><ci id="S4.E4.m1.1.1.3.2.3.2.cmml" xref="S4.E4.m1.1.1.3.2.3.2">𝑊</ci><apply id="S4.E4.m1.1.1.3.2.3.3.cmml" xref="S4.E4.m1.1.1.3.2.3.3"><times id="S4.E4.m1.1.1.3.2.3.3.1.cmml" xref="S4.E4.m1.1.1.3.2.3.3.1"></times><ci id="S4.E4.m1.1.1.3.2.3.3.2.cmml" xref="S4.E4.m1.1.1.3.2.3.3.2">𝐻</ci><ci id="S4.E4.m1.1.1.3.2.3.3.3.cmml" xref="S4.E4.m1.1.1.3.2.3.3.3">𝐵</ci><ci id="S4.E4.m1.1.1.3.2.3.3.4.cmml" xref="S4.E4.m1.1.1.3.2.3.3.4">𝑀</ci></apply></apply><ci id="S4.E4.m1.1.1.3.2.4.cmml" xref="S4.E4.m1.1.1.3.2.4">𝐶</ci></apply><apply id="S4.E4.m1.1.1.3.3.cmml" xref="S4.E4.m1.1.1.3.3"><csymbol cd="ambiguous" id="S4.E4.m1.1.1.3.3.1.cmml" xref="S4.E4.m1.1.1.3.3">subscript</csymbol><ci id="S4.E4.m1.1.1.3.3.2.cmml" xref="S4.E4.m1.1.1.3.3.2">ℎ</ci><apply id="S4.E4.m1.1.1.3.3.3.cmml" xref="S4.E4.m1.1.1.3.3.3"><times id="S4.E4.m1.1.1.3.3.3.1.cmml" xref="S4.E4.m1.1.1.3.3.3.1"></times><ci id="S4.E4.m1.1.1.3.3.3.2.cmml" xref="S4.E4.m1.1.1.3.3.3.2">𝐻</ci><ci id="S4.E4.m1.1.1.3.3.3.3.cmml" xref="S4.E4.m1.1.1.3.3.3.3">𝐵</ci><ci id="S4.E4.m1.1.1.3.3.3.4.cmml" xref="S4.E4.m1.1.1.3.3.3.4">𝑀</ci></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.E4.m1.1c">B_{HBM}=f_{HBM}\times W_{HBM}\times Ch_{HBM}</annotation><annotation encoding="application/x-llamapun" id="S4.E4.m1.1d">italic_B start_POSTSUBSCRIPT italic_H italic_B italic_M end_POSTSUBSCRIPT = italic_f start_POSTSUBSCRIPT italic_H italic_B italic_M end_POSTSUBSCRIPT × 
italic_W start_POSTSUBSCRIPT italic_H italic_B italic_M end_POSTSUBSCRIPT × italic_C italic_h start_POSTSUBSCRIPT italic_H italic_B italic_M end_POSTSUBSCRIPT</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(4)</span></td> </tr></tbody> </table> </div> <div class="ltx_para" id="S4.SS2.p9"> <p class="ltx_p" id="S4.SS2.p9.1">Finally, the machine balance <math alttext="M_{b}" class="ltx_Math" display="inline" id="S4.SS2.p9.1.m1.1"><semantics id="S4.SS2.p9.1.m1.1a"><msub id="S4.SS2.p9.1.m1.1.1" xref="S4.SS2.p9.1.m1.1.1.cmml"><mi id="S4.SS2.p9.1.m1.1.1.2" xref="S4.SS2.p9.1.m1.1.1.2.cmml">M</mi><mi id="S4.SS2.p9.1.m1.1.1.3" xref="S4.SS2.p9.1.m1.1.1.3.cmml">b</mi></msub><annotation-xml encoding="MathML-Content" id="S4.SS2.p9.1.m1.1b"><apply id="S4.SS2.p9.1.m1.1.1.cmml" xref="S4.SS2.p9.1.m1.1.1"><csymbol cd="ambiguous" id="S4.SS2.p9.1.m1.1.1.1.cmml" xref="S4.SS2.p9.1.m1.1.1">subscript</csymbol><ci id="S4.SS2.p9.1.m1.1.1.2.cmml" xref="S4.SS2.p9.1.m1.1.1.2">𝑀</ci><ci id="S4.SS2.p9.1.m1.1.1.3.cmml" xref="S4.SS2.p9.1.m1.1.1.3">𝑏</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p9.1.m1.1c">M_{b}</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p9.1.m1.1d">italic_M start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT</annotation></semantics></math> for the FPGA is given by:</p> </div> <div class="ltx_para" id="S4.SS2.p10"> <table class="ltx_equation ltx_eqn_table" id="S4.E5"> <tbody><tr class="ltx_equation ltx_eqn_row ltx_align_baseline"> <td class="ltx_eqn_cell ltx_eqn_center_padleft"></td> <td class="ltx_eqn_cell ltx_align_center"><math alttext="M_{b}=\frac{C_{FPGA}}{B_{HBM}}=\frac{f_{imp}\times\min\left(\frac{R_{A}^{LUT}}{R_{O}^{LUT}}\times U_{R}^{LUT},\frac{R_{A}^{DSP}}{R_{O}^{DSP}}\times U_{R}^{DSP}\right)}{f_{HBM}\times W_{HBM}\times Ch_{HBM}}"
class="ltx_Math" display="block" id="S4.E5.m1.3"><semantics id="S4.E5.m1.3a"><mrow id="S4.E5.m1.3.4" xref="S4.E5.m1.3.4.cmml"><msub id="S4.E5.m1.3.4.2" xref="S4.E5.m1.3.4.2.cmml"><mi id="S4.E5.m1.3.4.2.2" xref="S4.E5.m1.3.4.2.2.cmml">M</mi><mi id="S4.E5.m1.3.4.2.3" xref="S4.E5.m1.3.4.2.3.cmml">b</mi></msub><mo id="S4.E5.m1.3.4.3" xref="S4.E5.m1.3.4.3.cmml">=</mo><mfrac id="S4.E5.m1.3.4.4" xref="S4.E5.m1.3.4.4.cmml"><msub id="S4.E5.m1.3.4.4.2" xref="S4.E5.m1.3.4.4.2.cmml"><mi id="S4.E5.m1.3.4.4.2.2" xref="S4.E5.m1.3.4.4.2.2.cmml">C</mi><mrow id="S4.E5.m1.3.4.4.2.3" xref="S4.E5.m1.3.4.4.2.3.cmml"><mi id="S4.E5.m1.3.4.4.2.3.2" xref="S4.E5.m1.3.4.4.2.3.2.cmml">F</mi><mo id="S4.E5.m1.3.4.4.2.3.1" xref="S4.E5.m1.3.4.4.2.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.4.4.2.3.3" xref="S4.E5.m1.3.4.4.2.3.3.cmml">P</mi><mo id="S4.E5.m1.3.4.4.2.3.1a" xref="S4.E5.m1.3.4.4.2.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.4.4.2.3.4" xref="S4.E5.m1.3.4.4.2.3.4.cmml">G</mi><mo id="S4.E5.m1.3.4.4.2.3.1b" xref="S4.E5.m1.3.4.4.2.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.4.4.2.3.5" xref="S4.E5.m1.3.4.4.2.3.5.cmml">A</mi></mrow></msub><msub id="S4.E5.m1.3.4.4.3" xref="S4.E5.m1.3.4.4.3.cmml"><mi id="S4.E5.m1.3.4.4.3.2" xref="S4.E5.m1.3.4.4.3.2.cmml">B</mi><mrow id="S4.E5.m1.3.4.4.3.3" xref="S4.E5.m1.3.4.4.3.3.cmml"><mi id="S4.E5.m1.3.4.4.3.3.2" xref="S4.E5.m1.3.4.4.3.3.2.cmml">H</mi><mo id="S4.E5.m1.3.4.4.3.3.1" xref="S4.E5.m1.3.4.4.3.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.4.4.3.3.3" xref="S4.E5.m1.3.4.4.3.3.3.cmml">B</mi><mo id="S4.E5.m1.3.4.4.3.3.1a" xref="S4.E5.m1.3.4.4.3.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.4.4.3.3.4" xref="S4.E5.m1.3.4.4.3.3.4.cmml">M</mi></mrow></msub></mfrac><mo id="S4.E5.m1.3.4.5" xref="S4.E5.m1.3.4.5.cmml">=</mo><mfrac id="S4.E5.m1.3.3" xref="S4.E5.m1.3.3.cmml"><mrow id="S4.E5.m1.3.3.3" xref="S4.E5.m1.3.3.3.cmml"><msub id="S4.E5.m1.3.3.3.5" xref="S4.E5.m1.3.3.3.5.cmml"><mi id="S4.E5.m1.3.3.3.5.2" xref="S4.E5.m1.3.3.3.5.2.cmml">f</mi><mrow id="S4.E5.m1.3.3.3.5.3" xref="S4.E5.m1.3.3.3.5.3.cmml"><mi 
id="S4.E5.m1.3.3.3.5.3.2" xref="S4.E5.m1.3.3.3.5.3.2.cmml">i</mi><mo id="S4.E5.m1.3.3.3.5.3.1" xref="S4.E5.m1.3.3.3.5.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.3.3.5.3.3" xref="S4.E5.m1.3.3.3.5.3.3.cmml">m</mi><mo id="S4.E5.m1.3.3.3.5.3.1a" xref="S4.E5.m1.3.3.3.5.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.3.3.5.3.4" xref="S4.E5.m1.3.3.3.5.3.4.cmml">p</mi></mrow></msub><mo id="S4.E5.m1.3.3.3.4" lspace="0.222em" rspace="0.222em" xref="S4.E5.m1.3.3.3.4.cmml">×</mo><mrow id="S4.E5.m1.3.3.3.3.2" xref="S4.E5.m1.3.3.3.3.3.cmml"><mi id="S4.E5.m1.1.1.1.1" xref="S4.E5.m1.1.1.1.1.cmml">min</mi><mo id="S4.E5.m1.3.3.3.3.2a" xref="S4.E5.m1.3.3.3.3.3.cmml">⁡</mo><mrow id="S4.E5.m1.3.3.3.3.2.2" xref="S4.E5.m1.3.3.3.3.3.cmml"><mo id="S4.E5.m1.3.3.3.3.2.2.3" xref="S4.E5.m1.3.3.3.3.3.cmml">(</mo><mrow id="S4.E5.m1.2.2.2.2.1.1.1" xref="S4.E5.m1.2.2.2.2.1.1.1.cmml"><mfrac id="S4.E5.m1.2.2.2.2.1.1.1.2" xref="S4.E5.m1.2.2.2.2.1.1.1.2.cmml"><msubsup id="S4.E5.m1.2.2.2.2.1.1.1.2.2" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.cmml"><mi id="S4.E5.m1.2.2.2.2.1.1.1.2.2.2.2" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.2.2.cmml">R</mi><mi id="S4.E5.m1.2.2.2.2.1.1.1.2.2.2.3" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.2.3.cmml">A</mi><mrow id="S4.E5.m1.2.2.2.2.1.1.1.2.2.3" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.cmml"><mi id="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.2" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.2.cmml">L</mi><mo id="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.1" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.1.cmml">⁢</mo><mi id="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.3" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.3.cmml">U</mi><mo id="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.1a" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.1.cmml">⁢</mo><mi id="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.4" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.4.cmml">T</mi></mrow></msubsup><msubsup id="S4.E5.m1.2.2.2.2.1.1.1.2.3" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.cmml"><mi id="S4.E5.m1.2.2.2.2.1.1.1.2.3.2.2" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.2.2.cmml">R</mi><mi id="S4.E5.m1.2.2.2.2.1.1.1.2.3.2.3" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.2.3.cmml">O</mi><mrow 
id="S4.E5.m1.2.2.2.2.1.1.1.2.3.3" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.cmml"><mi id="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.2" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.2.cmml">L</mi><mo id="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.1" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.1.cmml">⁢</mo><mi id="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.3" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.3.cmml">U</mi><mo id="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.1a" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.1.cmml">⁢</mo><mi id="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.4" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.4.cmml">T</mi></mrow></msubsup></mfrac><mo id="S4.E5.m1.2.2.2.2.1.1.1.1" lspace="0.222em" rspace="0.222em" xref="S4.E5.m1.2.2.2.2.1.1.1.1.cmml">×</mo><msubsup id="S4.E5.m1.2.2.2.2.1.1.1.3" xref="S4.E5.m1.2.2.2.2.1.1.1.3.cmml"><mi id="S4.E5.m1.2.2.2.2.1.1.1.3.2.2" xref="S4.E5.m1.2.2.2.2.1.1.1.3.2.2.cmml">U</mi><mi id="S4.E5.m1.2.2.2.2.1.1.1.3.2.3" xref="S4.E5.m1.2.2.2.2.1.1.1.3.2.3.cmml">R</mi><mrow id="S4.E5.m1.2.2.2.2.1.1.1.3.3" xref="S4.E5.m1.2.2.2.2.1.1.1.3.3.cmml"><mi id="S4.E5.m1.2.2.2.2.1.1.1.3.3.2" xref="S4.E5.m1.2.2.2.2.1.1.1.3.3.2.cmml">L</mi><mo id="S4.E5.m1.2.2.2.2.1.1.1.3.3.1" xref="S4.E5.m1.2.2.2.2.1.1.1.3.3.1.cmml">⁢</mo><mi id="S4.E5.m1.2.2.2.2.1.1.1.3.3.3" xref="S4.E5.m1.2.2.2.2.1.1.1.3.3.3.cmml">U</mi><mo id="S4.E5.m1.2.2.2.2.1.1.1.3.3.1a" xref="S4.E5.m1.2.2.2.2.1.1.1.3.3.1.cmml">⁢</mo><mi id="S4.E5.m1.2.2.2.2.1.1.1.3.3.4" xref="S4.E5.m1.2.2.2.2.1.1.1.3.3.4.cmml">T</mi></mrow></msubsup></mrow><mo id="S4.E5.m1.3.3.3.3.2.2.4" xref="S4.E5.m1.3.3.3.3.3.cmml">,</mo><mrow id="S4.E5.m1.3.3.3.3.2.2.2" xref="S4.E5.m1.3.3.3.3.2.2.2.cmml"><mfrac id="S4.E5.m1.3.3.3.3.2.2.2.2" xref="S4.E5.m1.3.3.3.3.2.2.2.2.cmml"><msubsup id="S4.E5.m1.3.3.3.3.2.2.2.2.2" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.cmml"><mi id="S4.E5.m1.3.3.3.3.2.2.2.2.2.2.2" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.2.2.cmml">R</mi><mi id="S4.E5.m1.3.3.3.3.2.2.2.2.2.2.3" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.2.3.cmml">A</mi><mrow id="S4.E5.m1.3.3.3.3.2.2.2.2.2.3" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.cmml"><mi 
id="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.2" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.2.cmml">D</mi><mo id="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.1" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.3" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.3.cmml">S</mi><mo id="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.1a" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.4" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.4.cmml">P</mi></mrow></msubsup><msubsup id="S4.E5.m1.3.3.3.3.2.2.2.2.3" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.cmml"><mi id="S4.E5.m1.3.3.3.3.2.2.2.2.3.2.2" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.2.2.cmml">R</mi><mi id="S4.E5.m1.3.3.3.3.2.2.2.2.3.2.3" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.2.3.cmml">O</mi><mrow id="S4.E5.m1.3.3.3.3.2.2.2.2.3.3" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.cmml"><mi id="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.2" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.2.cmml">D</mi><mo id="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.1" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.3" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.3.cmml">S</mi><mo id="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.1a" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.4" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.4.cmml">P</mi></mrow></msubsup></mfrac><mo id="S4.E5.m1.3.3.3.3.2.2.2.1" lspace="0.222em" rspace="0.222em" xref="S4.E5.m1.3.3.3.3.2.2.2.1.cmml">×</mo><msubsup id="S4.E5.m1.3.3.3.3.2.2.2.3" xref="S4.E5.m1.3.3.3.3.2.2.2.3.cmml"><mi id="S4.E5.m1.3.3.3.3.2.2.2.3.2.2" xref="S4.E5.m1.3.3.3.3.2.2.2.3.2.2.cmml">U</mi><mi id="S4.E5.m1.3.3.3.3.2.2.2.3.2.3" xref="S4.E5.m1.3.3.3.3.2.2.2.3.2.3.cmml">R</mi><mrow id="S4.E5.m1.3.3.3.3.2.2.2.3.3" xref="S4.E5.m1.3.3.3.3.2.2.2.3.3.cmml"><mi id="S4.E5.m1.3.3.3.3.2.2.2.3.3.2" xref="S4.E5.m1.3.3.3.3.2.2.2.3.3.2.cmml">D</mi><mo id="S4.E5.m1.3.3.3.3.2.2.2.3.3.1" xref="S4.E5.m1.3.3.3.3.2.2.2.3.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.3.3.3.2.2.2.3.3.3" xref="S4.E5.m1.3.3.3.3.2.2.2.3.3.3.cmml">S</mi><mo id="S4.E5.m1.3.3.3.3.2.2.2.3.3.1a" 
xref="S4.E5.m1.3.3.3.3.2.2.2.3.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.3.3.3.2.2.2.3.3.4" xref="S4.E5.m1.3.3.3.3.2.2.2.3.3.4.cmml">P</mi></mrow></msubsup></mrow><mo id="S4.E5.m1.3.3.3.3.2.2.5" xref="S4.E5.m1.3.3.3.3.3.cmml">)</mo></mrow></mrow></mrow><mrow id="S4.E5.m1.3.3.5" xref="S4.E5.m1.3.3.5.cmml"><mrow id="S4.E5.m1.3.3.5.2" xref="S4.E5.m1.3.3.5.2.cmml"><msub id="S4.E5.m1.3.3.5.2.2" xref="S4.E5.m1.3.3.5.2.2.cmml"><mi id="S4.E5.m1.3.3.5.2.2.2" xref="S4.E5.m1.3.3.5.2.2.2.cmml">f</mi><mrow id="S4.E5.m1.3.3.5.2.2.3" xref="S4.E5.m1.3.3.5.2.2.3.cmml"><mi id="S4.E5.m1.3.3.5.2.2.3.2" xref="S4.E5.m1.3.3.5.2.2.3.2.cmml">H</mi><mo id="S4.E5.m1.3.3.5.2.2.3.1" xref="S4.E5.m1.3.3.5.2.2.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.3.5.2.2.3.3" xref="S4.E5.m1.3.3.5.2.2.3.3.cmml">B</mi><mo id="S4.E5.m1.3.3.5.2.2.3.1a" xref="S4.E5.m1.3.3.5.2.2.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.3.5.2.2.3.4" xref="S4.E5.m1.3.3.5.2.2.3.4.cmml">M</mi></mrow></msub><mo id="S4.E5.m1.3.3.5.2.1" lspace="0.222em" rspace="0.222em" xref="S4.E5.m1.3.3.5.2.1.cmml">×</mo><msub id="S4.E5.m1.3.3.5.2.3" xref="S4.E5.m1.3.3.5.2.3.cmml"><mi id="S4.E5.m1.3.3.5.2.3.2" xref="S4.E5.m1.3.3.5.2.3.2.cmml">W</mi><mrow id="S4.E5.m1.3.3.5.2.3.3" xref="S4.E5.m1.3.3.5.2.3.3.cmml"><mi id="S4.E5.m1.3.3.5.2.3.3.2" xref="S4.E5.m1.3.3.5.2.3.3.2.cmml">H</mi><mo id="S4.E5.m1.3.3.5.2.3.3.1" xref="S4.E5.m1.3.3.5.2.3.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.3.5.2.3.3.3" xref="S4.E5.m1.3.3.5.2.3.3.3.cmml">B</mi><mo id="S4.E5.m1.3.3.5.2.3.3.1a" xref="S4.E5.m1.3.3.5.2.3.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.3.5.2.3.3.4" xref="S4.E5.m1.3.3.5.2.3.3.4.cmml">M</mi></mrow></msub><mo id="S4.E5.m1.3.3.5.2.1a" lspace="0.222em" rspace="0.222em" xref="S4.E5.m1.3.3.5.2.1.cmml">×</mo><mi id="S4.E5.m1.3.3.5.2.4" xref="S4.E5.m1.3.3.5.2.4.cmml">C</mi></mrow><mo id="S4.E5.m1.3.3.5.1" xref="S4.E5.m1.3.3.5.1.cmml">⁢</mo><msub id="S4.E5.m1.3.3.5.3" xref="S4.E5.m1.3.3.5.3.cmml"><mi id="S4.E5.m1.3.3.5.3.2" xref="S4.E5.m1.3.3.5.3.2.cmml">h</mi><mrow id="S4.E5.m1.3.3.5.3.3" 
xref="S4.E5.m1.3.3.5.3.3.cmml"><mi id="S4.E5.m1.3.3.5.3.3.2" xref="S4.E5.m1.3.3.5.3.3.2.cmml">H</mi><mo id="S4.E5.m1.3.3.5.3.3.1" xref="S4.E5.m1.3.3.5.3.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.3.5.3.3.3" xref="S4.E5.m1.3.3.5.3.3.3.cmml">B</mi><mo id="S4.E5.m1.3.3.5.3.3.1a" xref="S4.E5.m1.3.3.5.3.3.1.cmml">⁢</mo><mi id="S4.E5.m1.3.3.5.3.3.4" xref="S4.E5.m1.3.3.5.3.3.4.cmml">M</mi></mrow></msub></mrow></mfrac></mrow><annotation-xml encoding="MathML-Content" id="S4.E5.m1.3b"><apply id="S4.E5.m1.3.4.cmml" xref="S4.E5.m1.3.4"><and id="S4.E5.m1.3.4a.cmml" xref="S4.E5.m1.3.4"></and><apply id="S4.E5.m1.3.4b.cmml" xref="S4.E5.m1.3.4"><eq id="S4.E5.m1.3.4.3.cmml" xref="S4.E5.m1.3.4.3"></eq><apply id="S4.E5.m1.3.4.2.cmml" xref="S4.E5.m1.3.4.2"><csymbol cd="ambiguous" id="S4.E5.m1.3.4.2.1.cmml" xref="S4.E5.m1.3.4.2">subscript</csymbol><ci id="S4.E5.m1.3.4.2.2.cmml" xref="S4.E5.m1.3.4.2.2">𝑀</ci><ci id="S4.E5.m1.3.4.2.3.cmml" xref="S4.E5.m1.3.4.2.3">𝑏</ci></apply><apply id="S4.E5.m1.3.4.4.cmml" xref="S4.E5.m1.3.4.4"><divide id="S4.E5.m1.3.4.4.1.cmml" xref="S4.E5.m1.3.4.4"></divide><apply id="S4.E5.m1.3.4.4.2.cmml" xref="S4.E5.m1.3.4.4.2"><csymbol cd="ambiguous" id="S4.E5.m1.3.4.4.2.1.cmml" xref="S4.E5.m1.3.4.4.2">subscript</csymbol><ci id="S4.E5.m1.3.4.4.2.2.cmml" xref="S4.E5.m1.3.4.4.2.2">𝐶</ci><apply id="S4.E5.m1.3.4.4.2.3.cmml" xref="S4.E5.m1.3.4.4.2.3"><times id="S4.E5.m1.3.4.4.2.3.1.cmml" xref="S4.E5.m1.3.4.4.2.3.1"></times><ci id="S4.E5.m1.3.4.4.2.3.2.cmml" xref="S4.E5.m1.3.4.4.2.3.2">𝐹</ci><ci id="S4.E5.m1.3.4.4.2.3.3.cmml" xref="S4.E5.m1.3.4.4.2.3.3">𝑃</ci><ci id="S4.E5.m1.3.4.4.2.3.4.cmml" xref="S4.E5.m1.3.4.4.2.3.4">𝐺</ci><ci id="S4.E5.m1.3.4.4.2.3.5.cmml" xref="S4.E5.m1.3.4.4.2.3.5">𝐴</ci></apply></apply><apply id="S4.E5.m1.3.4.4.3.cmml" xref="S4.E5.m1.3.4.4.3"><csymbol cd="ambiguous" id="S4.E5.m1.3.4.4.3.1.cmml" xref="S4.E5.m1.3.4.4.3">subscript</csymbol><ci id="S4.E5.m1.3.4.4.3.2.cmml" xref="S4.E5.m1.3.4.4.3.2">𝐵</ci><apply id="S4.E5.m1.3.4.4.3.3.cmml" 
xref="S4.E5.m1.3.4.4.3.3"><times id="S4.E5.m1.3.4.4.3.3.1.cmml" xref="S4.E5.m1.3.4.4.3.3.1"></times><ci id="S4.E5.m1.3.4.4.3.3.2.cmml" xref="S4.E5.m1.3.4.4.3.3.2">𝐻</ci><ci id="S4.E5.m1.3.4.4.3.3.3.cmml" xref="S4.E5.m1.3.4.4.3.3.3">𝐵</ci><ci id="S4.E5.m1.3.4.4.3.3.4.cmml" xref="S4.E5.m1.3.4.4.3.3.4">𝑀</ci></apply></apply></apply></apply><apply id="S4.E5.m1.3.4c.cmml" xref="S4.E5.m1.3.4"><eq id="S4.E5.m1.3.4.5.cmml" xref="S4.E5.m1.3.4.5"></eq><share href="https://arxiv.org/html/2503.01561v1#S4.E5.m1.3.4.4.cmml" id="S4.E5.m1.3.4d.cmml" xref="S4.E5.m1.3.4"></share><apply id="S4.E5.m1.3.3.cmml" xref="S4.E5.m1.3.3"><divide id="S4.E5.m1.3.3.4.cmml" xref="S4.E5.m1.3.3"></divide><apply id="S4.E5.m1.3.3.3.cmml" xref="S4.E5.m1.3.3.3"><times id="S4.E5.m1.3.3.3.4.cmml" xref="S4.E5.m1.3.3.3.4"></times><apply id="S4.E5.m1.3.3.3.5.cmml" xref="S4.E5.m1.3.3.3.5"><csymbol cd="ambiguous" id="S4.E5.m1.3.3.3.5.1.cmml" xref="S4.E5.m1.3.3.3.5">subscript</csymbol><ci id="S4.E5.m1.3.3.3.5.2.cmml" xref="S4.E5.m1.3.3.3.5.2">𝑓</ci><apply id="S4.E5.m1.3.3.3.5.3.cmml" xref="S4.E5.m1.3.3.3.5.3"><times id="S4.E5.m1.3.3.3.5.3.1.cmml" xref="S4.E5.m1.3.3.3.5.3.1"></times><ci id="S4.E5.m1.3.3.3.5.3.2.cmml" xref="S4.E5.m1.3.3.3.5.3.2">𝑖</ci><ci id="S4.E5.m1.3.3.3.5.3.3.cmml" xref="S4.E5.m1.3.3.3.5.3.3">𝑚</ci><ci id="S4.E5.m1.3.3.3.5.3.4.cmml" xref="S4.E5.m1.3.3.3.5.3.4">𝑝</ci></apply></apply><apply id="S4.E5.m1.3.3.3.3.3.cmml" xref="S4.E5.m1.3.3.3.3.2"><min id="S4.E5.m1.1.1.1.1.cmml" xref="S4.E5.m1.1.1.1.1"></min><apply id="S4.E5.m1.2.2.2.2.1.1.1.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1"><times id="S4.E5.m1.2.2.2.2.1.1.1.1.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.1"></times><apply id="S4.E5.m1.2.2.2.2.1.1.1.2.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2"><divide id="S4.E5.m1.2.2.2.2.1.1.1.2.1.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2"></divide><apply id="S4.E5.m1.2.2.2.2.1.1.1.2.2.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2"><csymbol cd="ambiguous" id="S4.E5.m1.2.2.2.2.1.1.1.2.2.1.cmml" 
xref="S4.E5.m1.2.2.2.2.1.1.1.2.2">superscript</csymbol><apply id="S4.E5.m1.2.2.2.2.1.1.1.2.2.2.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2"><csymbol cd="ambiguous" id="S4.E5.m1.2.2.2.2.1.1.1.2.2.2.1.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2">subscript</csymbol><ci id="S4.E5.m1.2.2.2.2.1.1.1.2.2.2.2.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.2.2">𝑅</ci><ci id="S4.E5.m1.2.2.2.2.1.1.1.2.2.2.3.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.2.3">𝐴</ci></apply><apply id="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.3"><times id="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.1.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.1"></times><ci id="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.2.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.2">𝐿</ci><ci id="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.3.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.3">𝑈</ci><ci id="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.4.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.2.3.4">𝑇</ci></apply></apply><apply id="S4.E5.m1.2.2.2.2.1.1.1.2.3.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3"><csymbol cd="ambiguous" id="S4.E5.m1.2.2.2.2.1.1.1.2.3.1.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3">superscript</csymbol><apply id="S4.E5.m1.2.2.2.2.1.1.1.2.3.2.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3"><csymbol cd="ambiguous" id="S4.E5.m1.2.2.2.2.1.1.1.2.3.2.1.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3">subscript</csymbol><ci id="S4.E5.m1.2.2.2.2.1.1.1.2.3.2.2.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.2.2">𝑅</ci><ci id="S4.E5.m1.2.2.2.2.1.1.1.2.3.2.3.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.2.3">𝑂</ci></apply><apply id="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.3"><times id="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.1.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.1"></times><ci id="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.2.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.2">𝐿</ci><ci id="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.3.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.3">𝑈</ci><ci id="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.4.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.2.3.3.4">𝑇</ci></apply></apply></apply><apply id="S4.E5.m1.2.2.2.2.1.1.1.3.cmml" 
xref="S4.E5.m1.2.2.2.2.1.1.1.3"><csymbol cd="ambiguous" id="S4.E5.m1.2.2.2.2.1.1.1.3.1.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.3">superscript</csymbol><apply id="S4.E5.m1.2.2.2.2.1.1.1.3.2.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.3"><csymbol cd="ambiguous" id="S4.E5.m1.2.2.2.2.1.1.1.3.2.1.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.3">subscript</csymbol><ci id="S4.E5.m1.2.2.2.2.1.1.1.3.2.2.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.3.2.2">𝑈</ci><ci id="S4.E5.m1.2.2.2.2.1.1.1.3.2.3.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.3.2.3">𝑅</ci></apply><apply id="S4.E5.m1.2.2.2.2.1.1.1.3.3.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.3.3"><times id="S4.E5.m1.2.2.2.2.1.1.1.3.3.1.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.3.3.1"></times><ci id="S4.E5.m1.2.2.2.2.1.1.1.3.3.2.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.3.3.2">𝐿</ci><ci id="S4.E5.m1.2.2.2.2.1.1.1.3.3.3.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.3.3.3">𝑈</ci><ci id="S4.E5.m1.2.2.2.2.1.1.1.3.3.4.cmml" xref="S4.E5.m1.2.2.2.2.1.1.1.3.3.4">𝑇</ci></apply></apply></apply><apply id="S4.E5.m1.3.3.3.3.2.2.2.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2"><times id="S4.E5.m1.3.3.3.3.2.2.2.1.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.1"></times><apply id="S4.E5.m1.3.3.3.3.2.2.2.2.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2"><divide id="S4.E5.m1.3.3.3.3.2.2.2.2.1.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2"></divide><apply id="S4.E5.m1.3.3.3.3.2.2.2.2.2.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2"><csymbol cd="ambiguous" id="S4.E5.m1.3.3.3.3.2.2.2.2.2.1.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2">superscript</csymbol><apply id="S4.E5.m1.3.3.3.3.2.2.2.2.2.2.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2"><csymbol cd="ambiguous" id="S4.E5.m1.3.3.3.3.2.2.2.2.2.2.1.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2">subscript</csymbol><ci id="S4.E5.m1.3.3.3.3.2.2.2.2.2.2.2.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.2.2">𝑅</ci><ci id="S4.E5.m1.3.3.3.3.2.2.2.2.2.2.3.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.2.3">𝐴</ci></apply><apply id="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.3"><times id="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.1.cmml" 
xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.1"></times><ci id="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.2.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.2">𝐷</ci><ci id="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.3.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.3">𝑆</ci><ci id="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.4.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.2.3.4">𝑃</ci></apply></apply><apply id="S4.E5.m1.3.3.3.3.2.2.2.2.3.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3"><csymbol cd="ambiguous" id="S4.E5.m1.3.3.3.3.2.2.2.2.3.1.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3">superscript</csymbol><apply id="S4.E5.m1.3.3.3.3.2.2.2.2.3.2.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3"><csymbol cd="ambiguous" id="S4.E5.m1.3.3.3.3.2.2.2.2.3.2.1.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3">subscript</csymbol><ci id="S4.E5.m1.3.3.3.3.2.2.2.2.3.2.2.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.2.2">𝑅</ci><ci id="S4.E5.m1.3.3.3.3.2.2.2.2.3.2.3.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.2.3">𝑂</ci></apply><apply id="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.3"><times id="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.1.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.1"></times><ci id="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.2.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.2">𝐷</ci><ci id="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.3.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.3">𝑆</ci><ci id="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.4.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.2.3.3.4">𝑃</ci></apply></apply></apply><apply id="S4.E5.m1.3.3.3.3.2.2.2.3.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.3"><csymbol cd="ambiguous" id="S4.E5.m1.3.3.3.3.2.2.2.3.1.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.3">superscript</csymbol><apply id="S4.E5.m1.3.3.3.3.2.2.2.3.2.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.3"><csymbol cd="ambiguous" id="S4.E5.m1.3.3.3.3.2.2.2.3.2.1.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.3">subscript</csymbol><ci id="S4.E5.m1.3.3.3.3.2.2.2.3.2.2.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.3.2.2">𝑈</ci><ci id="S4.E5.m1.3.3.3.3.2.2.2.3.2.3.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.3.2.3">𝑅</ci></apply><apply id="S4.E5.m1.3.3.3.3.2.2.2.3.3.cmml" 
xref="S4.E5.m1.3.3.3.3.2.2.2.3.3"><times id="S4.E5.m1.3.3.3.3.2.2.2.3.3.1.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.3.3.1"></times><ci id="S4.E5.m1.3.3.3.3.2.2.2.3.3.2.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.3.3.2">𝐷</ci><ci id="S4.E5.m1.3.3.3.3.2.2.2.3.3.3.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.3.3.3">𝑆</ci><ci id="S4.E5.m1.3.3.3.3.2.2.2.3.3.4.cmml" xref="S4.E5.m1.3.3.3.3.2.2.2.3.3.4">𝑃</ci></apply></apply></apply></apply></apply><apply id="S4.E5.m1.3.3.5.cmml" xref="S4.E5.m1.3.3.5"><times id="S4.E5.m1.3.3.5.1.cmml" xref="S4.E5.m1.3.3.5.1"></times><apply id="S4.E5.m1.3.3.5.2.cmml" xref="S4.E5.m1.3.3.5.2"><times id="S4.E5.m1.3.3.5.2.1.cmml" xref="S4.E5.m1.3.3.5.2.1"></times><apply id="S4.E5.m1.3.3.5.2.2.cmml" xref="S4.E5.m1.3.3.5.2.2"><csymbol cd="ambiguous" id="S4.E5.m1.3.3.5.2.2.1.cmml" xref="S4.E5.m1.3.3.5.2.2">subscript</csymbol><ci id="S4.E5.m1.3.3.5.2.2.2.cmml" xref="S4.E5.m1.3.3.5.2.2.2">𝑓</ci><apply id="S4.E5.m1.3.3.5.2.2.3.cmml" xref="S4.E5.m1.3.3.5.2.2.3"><times id="S4.E5.m1.3.3.5.2.2.3.1.cmml" xref="S4.E5.m1.3.3.5.2.2.3.1"></times><ci id="S4.E5.m1.3.3.5.2.2.3.2.cmml" xref="S4.E5.m1.3.3.5.2.2.3.2">𝐻</ci><ci id="S4.E5.m1.3.3.5.2.2.3.3.cmml" xref="S4.E5.m1.3.3.5.2.2.3.3">𝐵</ci><ci id="S4.E5.m1.3.3.5.2.2.3.4.cmml" xref="S4.E5.m1.3.3.5.2.2.3.4">𝑀</ci></apply></apply><apply id="S4.E5.m1.3.3.5.2.3.cmml" xref="S4.E5.m1.3.3.5.2.3"><csymbol cd="ambiguous" id="S4.E5.m1.3.3.5.2.3.1.cmml" xref="S4.E5.m1.3.3.5.2.3">subscript</csymbol><ci id="S4.E5.m1.3.3.5.2.3.2.cmml" xref="S4.E5.m1.3.3.5.2.3.2">𝑊</ci><apply id="S4.E5.m1.3.3.5.2.3.3.cmml" xref="S4.E5.m1.3.3.5.2.3.3"><times id="S4.E5.m1.3.3.5.2.3.3.1.cmml" xref="S4.E5.m1.3.3.5.2.3.3.1"></times><ci id="S4.E5.m1.3.3.5.2.3.3.2.cmml" xref="S4.E5.m1.3.3.5.2.3.3.2">𝐻</ci><ci id="S4.E5.m1.3.3.5.2.3.3.3.cmml" xref="S4.E5.m1.3.3.5.2.3.3.3">𝐵</ci><ci id="S4.E5.m1.3.3.5.2.3.3.4.cmml" xref="S4.E5.m1.3.3.5.2.3.3.4">𝑀</ci></apply></apply><ci id="S4.E5.m1.3.3.5.2.4.cmml" xref="S4.E5.m1.3.3.5.2.4">𝐶</ci></apply><apply id="S4.E5.m1.3.3.5.3.cmml" 
xref="S4.E5.m1.3.3.5.3"><csymbol cd="ambiguous" id="S4.E5.m1.3.3.5.3.1.cmml" xref="S4.E5.m1.3.3.5.3">subscript</csymbol><ci id="S4.E5.m1.3.3.5.3.2.cmml" xref="S4.E5.m1.3.3.5.3.2">ℎ</ci><apply id="S4.E5.m1.3.3.5.3.3.cmml" xref="S4.E5.m1.3.3.5.3.3"><times id="S4.E5.m1.3.3.5.3.3.1.cmml" xref="S4.E5.m1.3.3.5.3.3.1"></times><ci id="S4.E5.m1.3.3.5.3.3.2.cmml" xref="S4.E5.m1.3.3.5.3.3.2">𝐻</ci><ci id="S4.E5.m1.3.3.5.3.3.3.cmml" xref="S4.E5.m1.3.3.5.3.3.3">𝐵</ci><ci id="S4.E5.m1.3.3.5.3.3.4.cmml" xref="S4.E5.m1.3.3.5.3.3.4">𝑀</ci></apply></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.E5.m1.3c">M_{b}=\frac{C_{FPGA}}{B_{HBM}}=\frac{f_{imp}\times\min\left(\frac{R_{A}^{LUT}}% {R_{O}^{LUT}}\times U_{R}^{LUT},\frac{R_{A}^{DSP}}{R_{O}^{DSP}}\times U_{R}^{% DSP}\right)}{f_{HBM}\times W_{HBM}\times Ch_{HBM}}</annotation><annotation encoding="application/x-llamapun" id="S4.E5.m1.3d">italic_M start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT = divide start_ARG italic_C start_POSTSUBSCRIPT italic_F italic_P italic_G italic_A end_POSTSUBSCRIPT end_ARG start_ARG italic_B start_POSTSUBSCRIPT italic_H italic_B italic_M end_POSTSUBSCRIPT end_ARG = divide start_ARG italic_f start_POSTSUBSCRIPT italic_i italic_m italic_p end_POSTSUBSCRIPT × roman_min ( divide start_ARG italic_R start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L italic_U italic_T end_POSTSUPERSCRIPT end_ARG start_ARG italic_R start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L italic_U italic_T end_POSTSUPERSCRIPT end_ARG × italic_U start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L italic_U italic_T end_POSTSUPERSCRIPT , divide start_ARG italic_R start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_D italic_S italic_P end_POSTSUPERSCRIPT end_ARG start_ARG italic_R start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_D italic_S italic_P 
end_POSTSUPERSCRIPT end_ARG × italic_U start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_D italic_S italic_P end_POSTSUPERSCRIPT ) end_ARG start_ARG italic_f start_POSTSUBSCRIPT italic_H italic_B italic_M end_POSTSUBSCRIPT × italic_W start_POSTSUBSCRIPT italic_H italic_B italic_M end_POSTSUBSCRIPT × italic_C italic_h start_POSTSUBSCRIPT italic_H italic_B italic_M end_POSTSUBSCRIPT end_ARG</annotation></semantics></math></td> <td class="ltx_eqn_cell ltx_eqn_center_padright"></td> <td class="ltx_eqn_cell ltx_eqn_eqno ltx_align_middle ltx_align_right" rowspan="1"><span class="ltx_tag ltx_tag_equation ltx_align_right">(5)</span></td> </tr></tbody> </table> </div> <div class="ltx_para" id="S4.SS2.p11"> <p class="ltx_p" id="S4.SS2.p11.3">By placing our kernel’s arithmetic intensity <math alttext="I" class="ltx_Math" display="inline" id="S4.SS2.p11.1.m1.1"><semantics id="S4.SS2.p11.1.m1.1a"><mi id="S4.SS2.p11.1.m1.1.1" xref="S4.SS2.p11.1.m1.1.1.cmml">I</mi><annotation-xml encoding="MathML-Content" id="S4.SS2.p11.1.m1.1b"><ci id="S4.SS2.p11.1.m1.1.1.cmml" xref="S4.SS2.p11.1.m1.1.1">𝐼</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p11.1.m1.1c">I</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p11.1.m1.1d">italic_I</annotation></semantics></math> on the Roofline plot and comparing it to <math alttext="M_{b}" class="ltx_Math" display="inline" id="S4.SS2.p11.2.m2.1"><semantics id="S4.SS2.p11.2.m2.1a"><msub id="S4.SS2.p11.2.m2.1.1" xref="S4.SS2.p11.2.m2.1.1.cmml"><mi id="S4.SS2.p11.2.m2.1.1.2" xref="S4.SS2.p11.2.m2.1.1.2.cmml">M</mi><mi id="S4.SS2.p11.2.m2.1.1.3" xref="S4.SS2.p11.2.m2.1.1.3.cmml">b</mi></msub><annotation-xml encoding="MathML-Content" id="S4.SS2.p11.2.m2.1b"><apply id="S4.SS2.p11.2.m2.1.1.cmml" xref="S4.SS2.p11.2.m2.1.1"><csymbol cd="ambiguous" id="S4.SS2.p11.2.m2.1.1.1.cmml" xref="S4.SS2.p11.2.m2.1.1">subscript</csymbol><ci id="S4.SS2.p11.2.m2.1.1.2.cmml" 
xref="S4.SS2.p11.2.m2.1.1.2">𝑀</ci><ci id="S4.SS2.p11.2.m2.1.1.3.cmml" xref="S4.SS2.p11.2.m2.1.1.3">𝑏</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p11.2.m2.1c">M_{b}</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p11.2.m2.1d">italic_M start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT</annotation></semantics></math>, we can determine if it is operating in a memory-bound or compute-bound region for our particular FPGA implementation. This helps guide subsequent optimizations, either by increasing arithmetic intensity (e.g., reusing data to reduce memory traffic) or by improving the resource utilization and frequency (to push <math alttext="C_{FPGA}" class="ltx_Math" display="inline" id="S4.SS2.p11.3.m3.1"><semantics id="S4.SS2.p11.3.m3.1a"><msub id="S4.SS2.p11.3.m3.1.1" xref="S4.SS2.p11.3.m3.1.1.cmml"><mi id="S4.SS2.p11.3.m3.1.1.2" xref="S4.SS2.p11.3.m3.1.1.2.cmml">C</mi><mrow id="S4.SS2.p11.3.m3.1.1.3" xref="S4.SS2.p11.3.m3.1.1.3.cmml"><mi id="S4.SS2.p11.3.m3.1.1.3.2" xref="S4.SS2.p11.3.m3.1.1.3.2.cmml">F</mi><mo id="S4.SS2.p11.3.m3.1.1.3.1" xref="S4.SS2.p11.3.m3.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p11.3.m3.1.1.3.3" xref="S4.SS2.p11.3.m3.1.1.3.3.cmml">P</mi><mo id="S4.SS2.p11.3.m3.1.1.3.1a" xref="S4.SS2.p11.3.m3.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p11.3.m3.1.1.3.4" xref="S4.SS2.p11.3.m3.1.1.3.4.cmml">G</mi><mo id="S4.SS2.p11.3.m3.1.1.3.1b" xref="S4.SS2.p11.3.m3.1.1.3.1.cmml">⁢</mo><mi id="S4.SS2.p11.3.m3.1.1.3.5" xref="S4.SS2.p11.3.m3.1.1.3.5.cmml">A</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S4.SS2.p11.3.m3.1b"><apply id="S4.SS2.p11.3.m3.1.1.cmml" xref="S4.SS2.p11.3.m3.1.1"><csymbol cd="ambiguous" id="S4.SS2.p11.3.m3.1.1.1.cmml" xref="S4.SS2.p11.3.m3.1.1">subscript</csymbol><ci id="S4.SS2.p11.3.m3.1.1.2.cmml" xref="S4.SS2.p11.3.m3.1.1.2">𝐶</ci><apply id="S4.SS2.p11.3.m3.1.1.3.cmml" xref="S4.SS2.p11.3.m3.1.1.3"><times id="S4.SS2.p11.3.m3.1.1.3.1.cmml" xref="S4.SS2.p11.3.m3.1.1.3.1"></times><ci 
id="S4.SS2.p11.3.m3.1.1.3.2.cmml" xref="S4.SS2.p11.3.m3.1.1.3.2">𝐹</ci><ci id="S4.SS2.p11.3.m3.1.1.3.3.cmml" xref="S4.SS2.p11.3.m3.1.1.3.3">𝑃</ci><ci id="S4.SS2.p11.3.m3.1.1.3.4.cmml" xref="S4.SS2.p11.3.m3.1.1.3.4">𝐺</ci><ci id="S4.SS2.p11.3.m3.1.1.3.5.cmml" xref="S4.SS2.p11.3.m3.1.1.3.5">𝐴</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p11.3.m3.1c">C_{FPGA}</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p11.3.m3.1d">italic_C start_POSTSUBSCRIPT italic_F italic_P italic_G italic_A end_POSTSUBSCRIPT</annotation></semantics></math> closer to its theoretical peak).</p> </div> <section class="ltx_subsubsection" id="S4.SS2.SSS1"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection">4.2.1 </span>Theoretical Performance and Bandwidth.</h4> <div class="ltx_para" id="S4.SS2.SSS1.p1"> <p class="ltx_p" id="S4.SS2.SSS1.p1.1">In this work, we implemented the kernel in single-precision floating point, although future work could readily use other number representations. The theoretical peak performance can be estimated from the cost of a multiply-accumulate (MAC) operation, which consists of one addition and one multiplication. This method is similar to the evaluation in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib4" title="">4</a>]</cite>. According to the floating-point resource utilization reported by Xilinx <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib36" title="">36</a>]</cite> for our <span class="ltx_glossaryref" title="">FPGA</span>, an addition requires 192 LUTs and 2 DSPs, whereas a multiplication requires 74 LUTs and 3 DSPs, so one MAC costs 266 LUTs and 5 DSPs. The Xilinx Alveo U55C provides 1146240 LUTs and 8376 DSPs, so the DSPs are the limiting resource. 
Therefore, at an implementation frequency of 100 MHz and assuming a maximum utilization of 80%, the computation performance <math alttext="C" class="ltx_Math" display="inline" id="S4.SS2.SSS1.p1.1.m1.1"><semantics id="S4.SS2.SSS1.p1.1.m1.1a"><mi id="S4.SS2.SSS1.p1.1.m1.1.1" xref="S4.SS2.SSS1.p1.1.m1.1.1.cmml">C</mi><annotation-xml encoding="MathML-Content" id="S4.SS2.SSS1.p1.1.m1.1b"><ci id="S4.SS2.SSS1.p1.1.m1.1.1.cmml" xref="S4.SS2.SSS1.p1.1.m1.1.1">𝐶</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.SSS1.p1.1.m1.1c">C</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.SSS1.p1.1.m1.1d">italic_C</annotation></semantics></math> is 288.77 GFLOP/s. Moreover, the HBM of the Xilinx Alveo U55C has 32 pseudo-channels, each 256 bits (32 bytes) wide and nominally clocked at 450 MHz, so its peak bandwidth is 32 × 32 B × 450 MHz ≈ 460 GB/s <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib35" title="">35</a>]</cite>.</p> </div> </section> </section> </section> <section class="ltx_section" id="S5"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">5 </span>Experimental Setup</h2> <div class="ltx_para" id="S5.p1"> <p class="ltx_p" id="S5.p1.1">We implemented the <span class="ltx_glossaryref" title="">BCPNN</span> kernel with three distinct models, each with a different dataset and network configuration, to demonstrate its reconfigurability and adaptability (although nothing prevents our framework from generating accelerators for other models). As shown in Table <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S5.T1" title="Table 1 ‣ 5 Experimental Setup ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_tag">1</span></a>, these models differ in input dimension, hidden-layer size, number of output classes, dataset scale, and the number of epochs used for unsupervised training. 
The parameter <span class="ltx_text ltx_font_italic" id="S5.p1.1.1">nactHi</span> defines the sparsity of the input, both with and without structural plasticity. In our approach, we adopt a semi-unsupervised setup: the listed epoch count refers to the unsupervised training phase, while the supervised training phase is performed once per configuration. MNIST comprises 28x28 grayscale images of handwritten digits from 0 to 9 <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib14" title="">14</a>]</cite>. The Pneumonia and Breast datasets are part of the MedMNIST collection <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib37" title="">37</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib38" title="">38</a>]</cite>. The Pneumonia dataset contains pediatric chest X-ray images for a binary classification task: distinguishing healthy (normal) lungs from pneumonia-infected ones <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib37" title="">37</a>]</cite>. The Breast dataset contains breast ultrasound images originally split into three classes (normal, benign, and malignant). For binary classification, we combined normal and benign into a single positive class and treated malignant cases as negative <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib37" title="">37</a>]</cite>. 
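A minimal sketch of the binary relabelling described above. It assumes, hypothetically, that the original labels are available as the class names "normal", "benign" and "malignant", and that the positive class is encoded as 1 and the negative class as 0; neither the name strings nor the integer codes are taken from the MedMNIST release or from our code.

```python
"""Hypothetical relabelling of three-class breast-ultrasound labels to a binary task."""

# Assumed integer codes for the binary task (illustrative, not from MedMNIST).
POSITIVE = 1  # normal or benign
NEGATIVE = 0  # malignant

# Assumed class-name strings for the original three-class labels.
BINARY_LABEL = {
    "normal": POSITIVE,
    "benign": POSITIVE,
    "malignant": NEGATIVE,
}


def to_binary(labels):
    """Map each three-class label name to its binary label.

    An unknown class name raises KeyError instead of being silently mislabelled.
    """
    return [BINARY_LABEL[name] for name in labels]


if __name__ == "__main__":
    print(to_binary(["normal", "benign", "malignant", "benign"]))  # [1, 1, 0, 1]
```

Raising on an unknown name, rather than defaulting to one of the two classes, keeps an unexpected sample from silently entering either category.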
<span class="ltx_text ltx_font_italic" id="S5.p1.1.2">This is the first time BCPNN theory has been applied to the pneumonia and breast-ultrasound use cases.</span></p> </div> <figure class="ltx_table" id="S5.T1"> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 1: </span>Model Configurations and Dataset Details</figcaption> <table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle" id="S5.T1.1"> <thead class="ltx_thead"> <tr class="ltx_tr" id="S5.T1.1.1.1"> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r" id="S5.T1.1.1.1.1"><span class="ltx_text ltx_font_bold" id="S5.T1.1.1.1.1.1">Model</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r" id="S5.T1.1.1.1.2"><span class="ltx_text ltx_font_bold" id="S5.T1.1.1.1.2.1">Dataset</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r" id="S5.T1.1.1.1.3"><span class="ltx_text ltx_font_bold" id="S5.T1.1.1.1.3.1">Input size</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_l ltx_border_r" colspan="2" id="S5.T1.1.1.1.4"><span class="ltx_text ltx_font_bold" id="S5.T1.1.1.1.4.1">Hidden Layer</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r" id="S5.T1.1.1.1.5"><span class="ltx_text ltx_font_bold" id="S5.T1.1.1.1.5.1">nactHi</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r" id="S5.T1.1.1.1.6"><span class="ltx_text ltx_font_bold" id="S5.T1.1.1.1.6.1">Out (classes)</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_l ltx_border_r" colspan="2" id="S5.T1.1.1.1.7"><span class="ltx_text ltx_font_bold" id="S5.T1.1.1.1.7.1">Data size</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S5.T1.1.1.1.8"><span class="ltx_text ltx_font_bold" id="S5.T1.1.1.1.8.1">Epochs</span></th> </tr> <tr class="ltx_tr" id="S5.T1.1.2.2"> <th class="ltx_td ltx_th ltx_th_column ltx_border_r" 
id="S5.T1.1.2.2.1"></th> <th class="ltx_td ltx_th ltx_th_column ltx_border_r" id="S5.T1.1.2.2.2"></th> <th class="ltx_td ltx_th ltx_th_column ltx_border_r" id="S5.T1.1.2.2.3"></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t" id="S5.T1.1.2.2.4"><span class="ltx_text ltx_font_bold" id="S5.T1.1.2.2.4.1">Hyper</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t" id="S5.T1.1.2.2.5"><span class="ltx_text ltx_font_bold" id="S5.T1.1.2.2.5.1">Mini</span></th> <th class="ltx_td ltx_th ltx_th_column ltx_border_r" id="S5.T1.1.2.2.6"></th> <th class="ltx_td ltx_th ltx_th_column ltx_border_r" id="S5.T1.1.2.2.7"></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t" id="S5.T1.1.2.2.8"><span class="ltx_text ltx_font_bold" id="S5.T1.1.2.2.8.1">Train</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t" id="S5.T1.1.2.2.9"><span class="ltx_text ltx_font_bold" id="S5.T1.1.2.2.9.1">Test</span></th> <th class="ltx_td ltx_th ltx_th_column" id="S5.T1.1.2.2.10"></th> </tr> </thead> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S5.T1.1.3.1"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S5.T1.1.3.1.1">Model 1</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S5.T1.1.3.1.2">MNIST</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S5.T1.1.3.1.3">28x28</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S5.T1.1.3.1.4">32</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S5.T1.1.3.1.5">128</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S5.T1.1.3.1.6">128</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S5.T1.1.3.1.7">10</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S5.T1.1.3.1.8">60000</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" 
id="S5.T1.1.3.1.9">10000</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S5.T1.1.3.1.10">5</td> </tr> <tr class="ltx_tr" id="S5.T1.1.4.2"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T1.1.4.2.1">Model 2</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T1.1.4.2.2">Pneumonia</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T1.1.4.2.3">28x28</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T1.1.4.2.4">32</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T1.1.4.2.5">256</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T1.1.4.2.6">128</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T1.1.4.2.7">2</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T1.1.4.2.8">4708</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T1.1.4.2.9">624</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T1.1.4.2.10">20</td> </tr> <tr class="ltx_tr" id="S5.T1.1.5.3"> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S5.T1.1.5.3.1">Model 3</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S5.T1.1.5.3.2">Breast</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S5.T1.1.5.3.3">64x64</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S5.T1.1.5.3.4">32</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S5.T1.1.5.3.5">128</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S5.T1.1.5.3.6">128</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S5.T1.1.5.3.7">2</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S5.T1.1.5.3.8">546</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" 
id="S5.T1.1.5.3.9">156</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_t" id="S5.T1.1.5.3.10">100</td> </tr> </tbody> </table> </figure> <div class="ltx_para" id="S5.p2"> <p class="ltx_p" id="S5.p2.1">To benchmark performance, we deployed our <span class="ltx_glossaryref" title="">BCPNN</span> kernel on an AMD Xilinx Alveo U55C <span class="ltx_glossaryref" title="">FPGA</span>, using the AMD Vitis Unified software platform v2023.2 and XRT v2.16.204. For comparison, we ran equivalent <span class="ltx_glossaryref" title="">CPU</span> and <span class="ltx_glossaryref" title="">GPU</span> implementations with identical configurations. The <span class="ltx_glossaryref" title="">CPU</span> experiments were run on a single core of an Intel Xeon Silver 4514Y, with the code compiled by g++ 11.4.0 using the -O3 flag. For the <span class="ltx_glossaryref" title="">GPU</span> experiments, we used an NVIDIA A100 and compiled the code with CUDA 12.6.0 at -O3. The <span class="ltx_glossaryref" title="">GPU</span> node was provided by the Alvis high-performance computing cluster <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib1" title="">1</a>]</cite>. We compared the three implementations in terms of latency, power, and energy. <span class="ltx_glossaryref" title="">CPU</span> power was not measured because the required measurement interfaces were not supported, and <span class="ltx_glossaryref" title="">GPU</span> power was recorded using the visualization tools provided by the Alvis cluster <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib1" title="">1</a>]</cite>. 
The <span class="ltx_glossaryref" title="">FPGA</span> power measurements relied on real-time reporting through the XRT tool, which allows power usage to be observed directly.</p> </div> <div class="ltx_para" id="S5.p3"> <p class="ltx_p" id="S5.p3.1">Our reference implementation, written by domain experts, is in C/C++ with a CUDA backend for GPU acceleration. We modified the code to rely solely on the BCP layer for supervised learning and selected model sizes that fit both the <span class="ltx_glossaryref" title="">FPGA</span> resource constraints and the dataset requirements. The <span class="ltx_glossaryref" title="">GPU</span> implementation was similarly optimized using standard techniques and restricted to a single <span class="ltx_glossaryref" title="">GPU</span>, ensuring a balanced comparison and a clear view of the efficiency gains offered by our <span class="ltx_glossaryref" title="">FPGA</span>-based solution.</p> </div> <div class="ltx_para" id="S5.p4"> <p class="ltx_p" id="S5.p4.1">Our <span class="ltx_glossaryref" title="">FPGA</span> is partitioned into three regions, each a <span class="ltx_glossaryref" title="">Super Logic Region (SLR)</span>. We employed the implementation strategy <span class="ltx_text ltx_font_italic" id="S5.p4.1.1">Performance_BalanceSLRs</span> to distribute logic evenly across them, mitigating routing congestion and improving the achievable clock frequency. 
Moreover, frequency selection was an iterative process in which resource utilization and routing complexity influenced the final operating frequency.</p> </div> </section> <section class="ltx_section" id="S6"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">6 </span>Result</h2> <div class="ltx_para" id="S6.p1"> <p class="ltx_p" id="S6.p1.1">Next, we evaluate our contributions in four areas: correctness, performance, analysis, and resource utilization.</p> </div> <section class="ltx_subsection" id="S6.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">6.1 </span>Correctness</h3> <div class="ltx_para" id="S6.SS1.p1"> <p class="ltx_p" id="S6.SS1.p1.1">As shown in Table <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S6.T2" title="Table 2 ‣ 6.2 Performance ‣ 6 Result ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_tag">2</span></a>, the FPGA implementation achieves virtually the same accuracy as the reference CPU and GPU versions, confirming that the stream-driven dataflow architecture preserves the correctness of the C++ reference code. Across all models, accuracy differences are negligible, typically fractions of a percentage point. These minor discrepancies are primarily due to compiler optimizations (e.g., <span class="ltx_text ltx_font_typewriter" id="S6.SS1.p1.1.1">unsafe-math-optimizations</span>) and slight variations in random number generation. Such factors can introduce small floating-point rounding differences and data-sampling patterns that differ from those on the CPU and GPU platforms. Importantly, these variations do not affect the underlying BCPNN algorithm or its probabilistic learning rules; the FPGA-based accelerator still faithfully replicates the intended model behaviour. 
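The mechanism behind such rounding differences can be sketched in a few lines. GCC's -funsafe-math-optimizations permits, among other things, reassociation of floating-point operations, and floating-point addition is not associative. The sketch below emulates single-precision accumulation in pure Python; the helpers f32 and sum_f32 and the operand values are our own illustrative choices, not part of the BCPNN code. It sums the same three numbers in two orders:

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (IEEE-754 double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_f32(values):
    """Sum left to right, rounding every partial sum to float32.

    For addition of two float32 operands, computing in double and then
    rounding to float32 gives the correctly rounded float32 result, so this
    mimics a float32 adder.
    """
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc


# The same three operands, accumulated in two different orders.
in_order = sum_f32([1e8, 1.0, -1e8])   # 1e8 + 1 rounds back to 1e8 in float32
reordered = sum_f32([1e8, -1e8, 1.0])  # the large terms cancel first

print(in_order, reordered)  # 0.0 1.0
```

This only illustrates why reordering changes results; it does not model the actual accumulation order of the CPU, GPU, or FPGA builds.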
Notably, the test accuracy on the Pneumonia and Breast datasets is comparable to that of the CNN-based models reported in <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib37" title="">37</a>]</cite>. Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S6.F5" title="Figure 5 ‣ 6.1 Correctness ‣ 6 Result ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_tag">5</span></a> shows how the receptive field of one HC evolves over time, indicating that structural plasticity works as intended, in line with prior work <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#bib.bib24" title="">24</a>]</cite>.</p> </div> <figure class="ltx_figure" id="S6.F5"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="302" id="S6.F5.g1" src="x1.png" width="941"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 5: </span>Structural plasticity can modify a hypercolumn’s receptive (or visual) field so that it extracts the most information from the data. 
Here we show how the receptive field of one HC changes over time, from a random (left) to a more refined (right) field.</figcaption> </figure> </section> <section class="ltx_subsection" id="S6.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">6.2 </span>Performance</h3> <figure class="ltx_table" id="S6.T2"> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table">Table 2: </span>Comparison of Model Implementations on Different Platforms (<span class="ltx_text ltx_font_bold" id="S6.T2.6.1">infer</span>=inference only, <span class="ltx_text ltx_font_bold" id="S6.T2.7.2">train</span>=with training, <span class="ltx_text ltx_font_bold" id="S6.T2.8.3">struct</span>=with training and structural plasticity, <span class="ltx_text ltx_font_bold" id="S6.T2.9.4">acc.</span>=accuracy, <span class="ltx_text ltx_font_bold" id="S6.T2.10.5">-</span>=not available)</figcaption> <table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle" id="S6.T2.11"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S6.T2.11.1.1"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_r" id="S6.T2.11.1.1.1"><span class="ltx_text ltx_font_bold" id="S6.T2.11.1.1.1.1">Model</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_row" id="S6.T2.11.1.1.2"><span class="ltx_text ltx_font_bold" id="S6.T2.11.1.1.2.1">Type</span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.1.1.3"><span class="ltx_text ltx_font_bold" id="S6.T2.11.1.1.3.1">Metric</span></th> <td class="ltx_td ltx_align_center" id="S6.T2.11.1.1.4"><span class="ltx_text ltx_font_bold" id="S6.T2.11.1.1.4.1">Unit</span></td> <td class="ltx_td ltx_align_center" id="S6.T2.11.1.1.5"><span class="ltx_text ltx_font_bold" id="S6.T2.11.1.1.5.1">CPU</span></td> <td class="ltx_td ltx_align_center" id="S6.T2.11.1.1.6"><span class="ltx_text ltx_font_bold" id="S6.T2.11.1.1.6.1">GPU</span></td> <td class="ltx_td ltx_align_center ltx_border_r" 
id="S6.T2.11.1.1.7"><span class="ltx_text ltx_font_bold" id="S6.T2.11.1.1.7.1">FPGA</span></td> <td class="ltx_td ltx_align_center" id="S6.T2.11.1.1.8"><span class="ltx_text ltx_font_bold" id="S6.T2.11.1.1.8.1">Impr.(over GPU)</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.2.2"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_r ltx_border_tt" id="S6.T2.11.2.2.1" rowspan="10"><span class="ltx_text" id="S6.T2.11.2.2.1.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T2.11.2.2.1.1.1" style="width:6.9pt;height:35.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:35.6pt;transform:translate(-14.31pt,-14.31pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T2.11.2.2.1.1.1.1">Model 1</span> </span></span></span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_tt" id="S6.T2.11.2.2.2" rowspan="2"><span class="ltx_text" id="S6.T2.11.2.2.2.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T2.11.2.2.2.1.1" style="width:6.9pt;height:20.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:20.6pt;transform:translate(-6.82pt,-6.82pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T2.11.2.2.2.1.1.1">Infer</span> </span></span></span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_tt" id="S6.T2.11.2.2.3">Latency</th> <td class="ltx_td ltx_align_center ltx_border_tt" id="S6.T2.11.2.2.4">ms</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S6.T2.11.2.2.5">2.644</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S6.T2.11.2.2.6">1.495</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S6.T2.11.2.2.7">0.280</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S6.T2.11.2.2.8"><span class="ltx_text" id="S6.T2.11.2.2.8.1" style="color:#008000;">+5.3x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.3.3"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.3.3.1">Energy/img</th> <td class="ltx_td 
ltx_align_center" id="S6.T2.11.3.3.2">mJ</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.3.3.3">-</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.3.3.4">124.4</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.3.3.5">7.5</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.3.3.6"><span class="ltx_text" id="S6.T2.11.3.3.6.1" style="color:#008000;">+16.5x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.4.4"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.4.4.1" rowspan="3"><span class="ltx_text" id="S6.T2.11.4.4.1.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T2.11.4.4.1.1.1" style="width:6.8pt;height:23.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:23.6pt;transform:translate(-8.4pt,-8.4pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T2.11.4.4.1.1.1.1">Train</span> </span></span></span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.4.4.2">Latency</th> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.4.4.3">ms</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.4.4.4">13.610</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.4.4.5">1.497</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T2.11.4.4.6">0.422</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.4.4.7"><span class="ltx_text" id="S6.T2.11.4.4.7.1" style="color:#008000;">+3.54x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.5.5"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.5.5.1">Energy/img</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.5.5.2">mJ</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.5.5.3">-</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.5.5.4">124.6</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.5.5.5">11.3</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.5.5.6"><span class="ltx_text" 
id="S6.T2.11.5.5.6.1" style="color:#008000;">+11.02x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.6.6"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.6.6.1">Total time</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.6.6.2">s</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.6.6.3">4302.9</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.6.6.4">572.2</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.6.6.5">314.9</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.6.6.6"><span class="ltx_text" id="S6.T2.11.6.6.6.1" style="color:#008000;">+1.81x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.7.7"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.7.7.1" rowspan="3"><span class="ltx_text" id="S6.T2.11.7.7.1.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T2.11.7.7.1.1.1" style="width:6.8pt;height:27.3pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:27.3pt;transform:translate(-10.21pt,-10.21pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T2.11.7.7.1.1.1.1">Struct</span> </span></span></span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.7.7.2">Latency</th> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.7.7.3">ms</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.7.7.4">40.362</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.7.7.5">1.520</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T2.11.7.7.6">0.508</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.7.7.7"><span class="ltx_text" id="S6.T2.11.7.7.7.1" style="color:#008000;">+2.99x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.8.8"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.8.8.1">Energy/img</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.8.8.2">mJ</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.8.8.3">-</td> <td 
class="ltx_td ltx_align_center" id="S6.T2.11.8.8.4">126.5</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.8.8.5">13.7</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.8.8.6"><span class="ltx_text" id="S6.T2.11.8.8.6.1" style="color:#008000;">+9.23x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.9.9"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.9.9.1">Total time</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.9.9.2">s</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.9.9.3">13286.8</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.9.9.4">621.6</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.9.9.5">473.9</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.9.9.6"><span class="ltx_text" id="S6.T2.11.9.9.6.1" style="color:#008000;">+1.31x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.10.10"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.10.10.1" rowspan="3"><span class="ltx_text" id="S6.T2.11.10.10.1.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T2.11.10.10.1.1.1" style="width:6.9pt;height:25.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:25.6pt;transform:translate(-9.32pt,-9.32pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T2.11.10.10.1.1.1.1">Other</span> </span></span></span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.10.10.2">Train acc.</th> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.10.10.3">%</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.10.10.4">94.5</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.10.10.5">94.6</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T2.11.10.10.6">94.5</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.10.10.7">-</td> </tr> <tr class="ltx_tr" id="S6.T2.11.11.11"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" 
id="S6.T2.11.11.11.1">Test acc.</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.11.11.2">%</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.11.11.3">94.6</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.11.11.4">94.5</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.11.11.5">94.5</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.11.11.6">-</td> </tr> <tr class="ltx_tr" id="S6.T2.11.12.12"> <th class="ltx_td ltx_th ltx_th_row ltx_border_r" id="S6.T2.11.12.12.1"></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.12.12.2">Power (W)</th> <td class="ltx_td" id="S6.T2.11.12.12.3"></td> <td class="ltx_td ltx_align_center" id="S6.T2.11.12.12.4">-</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.12.12.5">83.2</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.12.12.6">27.0</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.12.12.7"><span class="ltx_text" id="S6.T2.11.12.12.7.1" style="color:#008000;">-3.08x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.13.13"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_r ltx_border_tt" id="S6.T2.11.13.13.1" rowspan="10"><span class="ltx_text" id="S6.T2.11.13.13.1.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T2.11.13.13.1.1.1" style="width:6.9pt;height:35.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:35.6pt;transform:translate(-14.31pt,-14.31pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T2.11.13.13.1.1.1.1">Model 2</span> </span></span></span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_tt" id="S6.T2.11.13.13.2" rowspan="2"><span class="ltx_text" id="S6.T2.11.13.13.2.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T2.11.13.13.2.1.1" style="width:6.9pt;height:20.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:20.6pt;transform:translate(-6.82pt,-6.82pt) rotate(-90deg) ;"> <span class="ltx_p" 
id="S6.T2.11.13.13.2.1.1.1">Infer</span> </span></span></span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_tt" id="S6.T2.11.13.13.3">Latency</th> <td class="ltx_td ltx_align_center ltx_border_tt" id="S6.T2.11.13.13.4">ms</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S6.T2.11.13.13.5">4.721</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S6.T2.11.13.13.6">1.633</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S6.T2.11.13.13.7">0.504</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S6.T2.11.13.13.8"><span class="ltx_text" id="S6.T2.11.13.13.8.1" style="color:#008000;">+3.24x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.14.14"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.14.14.1">Energy/img</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.14.14.2">mJ</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.14.14.3">-</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.14.14.4">146.6</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.14.14.5">14.2</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.14.14.6"><span class="ltx_text" id="S6.T2.11.14.14.6.1" style="color:#008000;">+10.32x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.15.15"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.15.15.1" rowspan="3"><span class="ltx_text" id="S6.T2.11.15.15.1.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T2.11.15.15.1.1.1" style="width:6.8pt;height:23.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:23.6pt;transform:translate(-8.4pt,-8.4pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T2.11.15.15.1.1.1.1">Train</span> </span></span></span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.15.15.2">Latency</th> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.15.15.3">ms</td> <td class="ltx_td ltx_align_center ltx_border_t" 
id="S6.T2.11.15.15.4">27.4</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.15.15.5">1.646</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T2.11.15.15.6">0.552</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.15.15.7"><span class="ltx_text" id="S6.T2.11.15.15.7.1" style="color:#008000;">+2.98x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.16.16"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.16.16.1">Energy/img</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.16.16.2">mJ</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.16.16.3">-</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.16.16.4">147.8</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.16.16.5">15.5</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.16.16.6"><span class="ltx_text" id="S6.T2.11.16.16.6.1" style="color:#008000;">+9.53x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.17.17"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.17.17.1">Total time</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.17.17.2">s</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.17.17.3">2608.5</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.17.17.4">166.1</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.17.17.5">126.7</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.17.17.6"><span class="ltx_text" id="S6.T2.11.17.17.6.1" style="color:#008000;">+1.31x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.18.18"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.18.18.1" rowspan="3"><span class="ltx_text" id="S6.T2.11.18.18.1.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T2.11.18.18.1.1.1" style="width:6.8pt;height:27.3pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:27.3pt;transform:translate(-10.21pt,-10.21pt) rotate(-90deg) ;"> <span class="ltx_p" 
id="S6.T2.11.18.18.1.1.1.1">Struct</span> </span></span></span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.18.18.2">Latency</th> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.18.18.3">ms</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.18.18.4">55.258</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.18.18.5">1.631</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T2.11.18.18.6">0.609</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.18.18.7"><span class="ltx_text" id="S6.T2.11.18.18.7.1" style="color:#008000;">+2.68x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.19.19"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.19.19.1">Energy/img</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.19.19.2">mJ</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.19.19.3">-</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.19.19.4">146.5</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.19.19.5">17.1</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.19.19.6"><span class="ltx_text" id="S6.T2.11.19.19.6.1" style="color:#008000;">+8.56x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.20.20"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.20.20.1">Total time</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.20.20.2">s</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.20.20.3">5333.3</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.20.20.4">174.9</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.20.20.5">234.3</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.20.20.6"><span class="ltx_text" id="S6.T2.11.20.20.6.1" style="color:#800000;">-0.75x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.21.21"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.21.21.1" rowspan="3"><span class="ltx_text" id="S6.T2.11.21.21.1.1"> 
<span class="ltx_inline-block ltx_transformed_outer" id="S6.T2.11.21.21.1.1.1" style="width:6.9pt;height:25.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:25.6pt;transform:translate(-9.32pt,-9.32pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T2.11.21.21.1.1.1.1">Other</span> </span></span></span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.21.21.2">Train acc.</th> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.21.21.3">%</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.21.21.4">91.5</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.21.21.5">91.0</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T2.11.21.21.6">91.5</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.21.21.7">-</td> </tr> <tr class="ltx_tr" id="S6.T2.11.22.22"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.22.22.1">Test acc.</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.22.22.2">%</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.22.22.3">85.4</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.22.22.4">85.6</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.22.22.5">85.3</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.22.22.6">-</td> </tr> <tr class="ltx_tr" id="S6.T2.11.23.23"> <th class="ltx_td ltx_th ltx_th_row ltx_border_r" id="S6.T2.11.23.23.1"></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.23.23.2">Power (W)</th> <td class="ltx_td" id="S6.T2.11.23.23.3"></td> <td class="ltx_td ltx_align_center" id="S6.T2.11.23.23.4">-</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.23.23.5">89.8</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.23.23.6">28.1</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.23.23.7"><span class="ltx_text" id="S6.T2.11.23.23.7.1" style="color:#008000;">-3.19x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.24.24"> 
<th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_r ltx_border_tt" id="S6.T2.11.24.24.1" rowspan="10"><span class="ltx_text" id="S6.T2.11.24.24.1.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T2.11.24.24.1.1.1" style="width:6.9pt;height:35.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:35.6pt;transform:translate(-14.31pt,-14.31pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T2.11.24.24.1.1.1.1">Model 3</span> </span></span></span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_tt" id="S6.T2.11.24.24.2" rowspan="2"><span class="ltx_text" id="S6.T2.11.24.24.2.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T2.11.24.24.2.1.1" style="width:6.9pt;height:20.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:20.6pt;transform:translate(-6.82pt,-6.82pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T2.11.24.24.2.1.1.1">Infer</span> </span></span></span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_tt" id="S6.T2.11.24.24.3">Latency</th> <td class="ltx_td ltx_align_center ltx_border_tt" id="S6.T2.11.24.24.4">ms</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S6.T2.11.24.24.5">2.649</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S6.T2.11.24.24.6">1.541</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S6.T2.11.24.24.7">0.540</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S6.T2.11.24.24.8"><span class="ltx_text" id="S6.T2.11.24.24.8.1" style="color:#008000;">+2.85x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.25.25"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.25.25.1">Energy/img</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.25.25.2">mJ</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.25.25.3">-</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.25.25.4">105.4</td> <td class="ltx_td ltx_align_center ltx_border_r" 
id="S6.T2.11.25.25.5">14.1</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.25.25.6"><span class="ltx_text" id="S6.T2.11.25.25.6.1" style="color:#008000;">+7.48x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.26.26"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.26.26.1" rowspan="3"><span class="ltx_text" id="S6.T2.11.26.26.1.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T2.11.26.26.1.1.1" style="width:6.8pt;height:23.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:23.6pt;transform:translate(-8.4pt,-8.4pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T2.11.26.26.1.1.1.1">Train</span> </span></span></span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.26.26.2">Latency</th> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.26.26.3">ms</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.26.26.4">13.507</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.26.26.5">1.554</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T2.11.26.26.6">0.702</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.26.26.7"><span class="ltx_text" id="S6.T2.11.26.26.7.1" style="color:#008000;">+2.21x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.27.27"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.27.27.1">Energy/img</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.27.27.2">mJ</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.27.27.3">-</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.27.27.4">106.3</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.27.27.5">18.3</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.27.27.6"><span class="ltx_text" id="S6.T2.11.27.27.6.1" style="color:#008000;">+5.8x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.28.28"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.28.28.1">Total 
time</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.28.28.2">s</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.28.28.3">740.4</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.28.28.4">87.3</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.28.28.5">66.9</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.28.28.6"><span class="ltx_text" id="S6.T2.11.28.28.6.1" style="color:#008000;">+1.30x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.29.29"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.29.29.1" rowspan="3"><span class="ltx_text" id="S6.T2.11.29.29.1.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T2.11.29.29.1.1.1" style="width:6.8pt;height:27.3pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:27.3pt;transform:translate(-10.21pt,-10.21pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T2.11.29.29.1.1.1.1">Struct</span> </span></span></span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.29.29.2">Latency</th> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.29.29.3">ms</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.29.29.4">38.319</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.29.29.5">1.556</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T2.11.29.29.6">0.690</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.29.29.7"><span class="ltx_text" id="S6.T2.11.29.29.7.1" style="color:#008000;">+2.26x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.30.30"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.30.30.1">Energy/img</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.30.30.2">mJ</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.30.30.3">-</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.30.30.4">106.4</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.30.30.5">18.0</td> <td 
class="ltx_td ltx_align_center" id="S6.T2.11.30.30.6"><span class="ltx_text" id="S6.T2.11.30.30.6.1" style="color:#008000;">+5.91x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.31.31"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.31.31.1">Total time</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.31.31.2">s</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.31.31.3">2107.6</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.31.31.4">91.6</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.31.31.5">95.1</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.31.31.6"><span class="ltx_text" id="S6.T2.11.31.31.6.1" style="color:#800000;">-0.96x</span></td> </tr> <tr class="ltx_tr" id="S6.T2.11.32.32"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.32.32.1" rowspan="3"><span class="ltx_text" id="S6.T2.11.32.32.1.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T2.11.32.32.1.1.1" style="width:6.9pt;height:25.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:25.6pt;transform:translate(-9.32pt,-9.32pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T2.11.32.32.1.1.1.1">Other</span> </span></span></span></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_t" id="S6.T2.11.32.32.2">Train acc.</th> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.32.32.3">%</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.32.32.4">89.1</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.32.32.5">89.7</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T2.11.32.32.6">89.7</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T2.11.32.32.7">-</td> </tr> <tr class="ltx_tr" id="S6.T2.11.33.33"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.33.33.1">Test acc.</th> <td class="ltx_td ltx_align_center" id="S6.T2.11.33.33.2">%</td> <td class="ltx_td ltx_align_center" 
id="S6.T2.11.33.33.3">76.9</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.33.33.4">80.1</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.33.33.5">80.1</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.33.33.6">-</td> </tr> <tr class="ltx_tr" id="S6.T2.11.34.34"> <th class="ltx_td ltx_th ltx_th_row ltx_border_r" id="S6.T2.11.34.34.1"></th> <th class="ltx_td ltx_align_left ltx_th ltx_th_row" id="S6.T2.11.34.34.2">Power (W)</th> <td class="ltx_td" id="S6.T2.11.34.34.3"></td> <td class="ltx_td ltx_align_center" id="S6.T2.11.34.34.4">-</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.34.34.5">68.4</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T2.11.34.34.6">26.1</td> <td class="ltx_td ltx_align_center" id="S6.T2.11.34.34.7"><span class="ltx_text" id="S6.T2.11.34.34.7.1" style="color:#008000;">-2.62x</span></td> </tr> </tbody> </table> </figure> <div class="ltx_para" id="S6.SS2.p1"> <p class="ltx_p" id="S6.SS2.p1.1">Table <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S6.T2" title="Table 2 ‣ 6.2 Performance ‣ 6 Result ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_tag">2</span></a> compares the performance of each model across <span class="ltx_glossaryref" title="">CPU</span>, <span class="ltx_glossaryref" title="">GPU</span>, and <span class="ltx_glossaryref" title="">FPGA</span> platforms. The primary focus is on execution time, energy and power consumption. 
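</p> </div> <div class="ltx_para"> <p class="ltx_p">The “Impr. (over GPU)” column can be reproduced from the GPU and FPGA columns. The following is a minimal sketch, assuming the column is the GPU value divided by the FPGA value for each metric (this assumption matches the Model 1 entries, e.g. 1.495/0.280 ≈ 5.3 for inference latency); the function name <code>improvement_over_gpu</code> is hypothetical.</p>

```python
# Sketch (assumption): the "Impr. (over GPU)" column of Table 2 is
# GPU value / FPGA value, where lower is better for every metric.

def improvement_over_gpu(gpu_value: float, fpga_value: float) -> float:
    """Return GPU/FPGA; > 1 means the FPGA is better, < 1 means it is worse."""
    if fpga_value <= 0:
        raise ValueError("FPGA value must be positive")
    return gpu_value / fpga_value


# (GPU, FPGA) pairs copied from Table 2.
table2_samples = {
    "Model 1, infer latency (ms)": (1.495, 0.280),
    "Model 1, struct total time (s)": (621.6, 473.9),
    "Model 2, struct total time (s)": (174.9, 234.3),
}

for name, (gpu, fpga) in table2_samples.items():
    ratio = improvement_over_gpu(gpu, fpga)
    verdict = "FPGA faster" if ratio > 1 else "GPU faster"
    print(f"{name}: {ratio:.2f}x ({verdict})")
```

<p class="ltx_p">Ratios below 1, such as Model 2’s total time with structural plasticity (174.9/234.3 ≈ 0.75), mark the cases where the GPU finishes first; Table 2 shows these entries in red.</p> </div> <div class="ltx_para"> <p class="ltx_p">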
The <span class="ltx_glossaryref" title="">FPGA</span> implementation consistently achieves a lower average processing time per image (latency) than the <span class="ltx_glossaryref" title="">CPU</span> and <span class="ltx_glossaryref" title="">GPU</span> for both training and inference in all models. Total execution time covers unsupervised training for the defined number of epochs, one supervised training pass, and inference on the training and test data. It improves less than latency because it also includes the overhead of data transfers from the host to the FPGA and back. When a model runs with structural plasticity, the structural plasticity update is computed on the host at regular intervals during training, which adds significant overhead. This overhead weighs more heavily on small datasets, where the structural plasticity update runs relatively more often than on larger datasets. This is why, with structural plasticity, Models 2 and 3 take longer in total on the FPGA than on the GPU (improvement factors of 0.75x and 0.96x, respectively). The larger dataset of Model 1 is less affected: its total time with structural plasticity still outperforms the GPU (1.31x).</p> </div> <div class="ltx_para" id="S6.SS2.p2"> <p class="ltx_p" id="S6.SS2.p2.1">In terms of power and energy consumption, the <span class="ltx_glossaryref" title="">FPGA</span> demonstrates a substantial advantage over the <span class="ltx_glossaryref" title="">GPU</span>. Whereas the <span class="ltx_glossaryref" title="">GPU</span> draws 68.4–89.8 W, the <span class="ltx_glossaryref" title="">FPGA</span>’s power usage stays around 26.1–28.1 W. The reduction in energy per image is even larger: across all models and versions, the FPGA improves on the GPU by 5.8x to 16.5x. This efficiency, combined with competitive performance, underscores the <span class="ltx_glossaryref" title="">FPGA</span>’s suitability for energy-constrained environments. 
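</p> </div> <div class="ltx_para"> <p class="ltx_p">The energy-per-image entries in Table 2 are consistent with average power multiplied by per-image latency (W × ms = mJ). The sketch below illustrates that relation using the Model 2 inference-only row; the helper names are hypothetical, and the relation is inferred from the table values rather than stated by the authors.</p>

```python
# Sketch (inferred from Table 2): energy per image = average power x latency.
# Watts times milliseconds gives millijoules, so no unit conversion is needed.

def energy_per_image_mj(power_w: float, latency_ms: float) -> float:
    """Energy per processed image in millijoules."""
    return power_w * latency_ms


def energy_improvement(gpu_power_w: float, gpu_latency_ms: float,
                       fpga_power_w: float, fpga_latency_ms: float) -> float:
    """Energy gain factorizes into (latency speedup) x (power reduction)."""
    speedup = gpu_latency_ms / fpga_latency_ms
    power_reduction = gpu_power_w / fpga_power_w
    return speedup * power_reduction


# Model 2, inference only (power and latency values from Table 2).
gpu_mj = energy_per_image_mj(89.8, 1.633)
fpga_mj = energy_per_image_mj(28.1, 0.504)
gain = energy_improvement(89.8, 1.633, 28.1, 0.504)
print(f"GPU {gpu_mj:.1f} mJ, FPGA {fpga_mj:.1f} mJ, gain {gain:.1f}x")
```

<p class="ltx_p">With these inputs the sketch gives about 146.6 mJ for the GPU and 14.2 mJ for the FPGA, matching Table 2. It also shows why the energy gains exceed the latency gains: the roughly 10x energy improvement is the product of a roughly 3.2x latency speedup and a roughly 3.2x power reduction.</p> </div> <div class="ltx_para"> <p class="ltx_p">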
Given its significantly lower power consumption, the stream-based reconfigurable <span class="ltx_glossaryref" title="">FPGA</span> implementation of <span class="ltx_glossaryref" title="">BCPNN</span> is a promising candidate for edge applications.</p> </div> </section> <section class="ltx_subsection" id="S6.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">6.3 </span>Analysis</h3> <div class="ltx_para" id="S6.SS3.p1"> <p class="ltx_p" id="S6.SS3.p1.1">Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S6.F6" title="Figure 6 ‣ 6.3 Analysis ‣ 6 Result ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_tag">6</span></a> shows the roofline model for the three <span class="ltx_glossaryref" title="">BCPNN</span> models implemented on the stream-based <span class="ltx_glossaryref" title="">FPGA</span> accelerator, both with and without structural plasticity. It gives insight into the computational performance and memory-access efficiency of each <span class="ltx_glossaryref" title="">FPGA</span>-based model. Each model’s peak performance is derived assuming at most 80% utilization of LUTs and DSPs at that model’s operating frequency. Only the full-featured version of each <span class="ltx_glossaryref" title="">BCPNN</span> model is shown. None of the models reaches its peak performance, both because each uses less than 80% of the resources and because of specific algorithmic constraints. The design streams data through stream-based <span class="ltx_glossaryref" title="">FIFO</span> buffers so that every resource stays as busy as possible during operation. By partitioning large arrays across four HBM pseudo-channels, we push the <span class="ltx_glossaryref" title="">BCPNN</span> models closer to their peak performance. 
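</p> </div> <div class="ltx_para"> <p class="ltx_p">The roofline bound behind Figure 6 can be sketched as follows. This is a minimal illustration under stated assumptions: attainable performance is taken as the minimum of a compute ceiling and arithmetic intensity times aggregate HBM bandwidth, using 14.4 GB/s per pseudo-channel as quoted in this section. How the authors convert 80% of the LUTs and DSPs into FLOPs is not given, so <code>units</code> and <code>flops_per_cycle</code> are hypothetical parameters and the printed numbers are illustrative only.</p>

```python
# Minimal roofline sketch. Assumptions: 14.4 GB/s per HBM pseudo-channel
# (the figure quoted in Section 6.3); compute units and FLOPs/cycle are
# hypothetical placeholders, not values reported for the BCPNN accelerator.

HBM_PC_GBS = 14.4  # bandwidth per HBM pseudo-channel, GB/s


def compute_ceiling_gflops(units: int, flops_per_cycle: float,
                           freq_mhz: float, utilization: float = 0.8) -> float:
    """Hypothetical peak: utilization x units x FLOPs/cycle x clock, in GFLOP/s."""
    return utilization * units * flops_per_cycle * freq_mhz / 1e3


def memory_ceiling_gflops(intensity_flop_per_byte: float,
                          pseudo_channels: int) -> float:
    """Bandwidth-bound performance: intensity x aggregate HBM bandwidth."""
    return intensity_flop_per_byte * pseudo_channels * HBM_PC_GBS


def attainable_gflops(intensity_flop_per_byte: float, peak_gflops: float,
                      pseudo_channels: int) -> float:
    """Roofline: the lower of the compute and memory ceilings."""
    return min(peak_gflops,
               memory_ceiling_gflops(intensity_flop_per_byte, pseudo_channels))


if __name__ == "__main__":
    # Illustrative only: 1000 units, 2 FLOPs/cycle, at 60 MHz and 200 MHz.
    for freq in (60.0, 200.0):
        peak = compute_ceiling_gflops(1000, 2, freq)
        for ai in (0.5, 2.0, 8.0):
            four = attainable_gflops(ai, peak, 4)
            five = attainable_gflops(ai, peak, 5)
            print(f"{freq:5.0f} MHz  AI={ai:4.1f}  "
                  f"4 PCs: {four:6.1f}  5 PCs: {five:6.1f} GFLOP/s")
```

<p class="ltx_p">In this model, adding HBM pseudo-channels raises the bandwidth ceiling for low-intensity kernels, while lowering the clock frequency lowers the compute ceiling proportionally, regardless of the available bandwidth.</p> </div> <div class="ltx_para"> <p class="ltx_p">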
However, some operations in the <span class="ltx_glossaryref" title="">BCPNN</span> algorithm require accumulating entire arrays, which limits performance.</p> </div> <figure class="ltx_figure" id="S6.F6"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="295" id="S6.F6.g1" src="extracted/6222606/RooflineModel.png" width="393"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 6: </span>Roofline model of our accelerators for the different models, showing performance (y-axis) as a function of arithmetic intensity (x-axis) and how close each accelerator comes to its theoretical upper limit.</figcaption> </figure> <div class="ltx_para" id="S6.SS3.p2"> <p class="ltx_p" id="S6.SS3.p2.1">Model 1 falls further below its peak performance than the other models because it uses fewer hardware resources. Overall, the reconfigurable design has been optimized using data-driven and array partitioning techniques in HBM. We limit the HBM array partitioning factor to four, because partitioning further results in highly congested routing.</p> </div> <div class="ltx_para" id="S6.SS3.p3"> <p class="ltx_p" id="S6.SS3.p3.1">Models 2 and 3 achieve higher actual performance because they use more hardware resources. Model 2 lies closest to its peak performance because its combination of model dimensions yields a very high floating-point operation count, while the stream-based design keeps hardware utilization high. 
Model 3, on the other hand, has a lower peak performance because it can only be compiled at 60 MHz: its large input images require large allocations in the <span class="ltx_glossaryref" title="">FIFO</span>, which result in high <span class="ltx_glossaryref" title="">BRAM</span> utilization.</p> </div> <div class="ltx_para" id="S6.SS3.p4"> <p class="ltx_p" id="S6.SS3.p4.1">The models with structural plasticity achieve slightly higher computational performance because they perform additional computation on a sparsity array derived from the receptive field. This requires an additional HBM channel, raising the bandwidth demand by 14.4 GB/s compared to the models without structural plasticity.</p> </div> <div class="ltx_para" id="S6.SS3.p5"> <p class="ltx_p" id="S6.SS3.p5.1">In summary, the <span class="ltx_glossaryref" title="">FPGA</span>-based <span class="ltx_glossaryref" title="">BCPNN</span> implementation balances resource constraints and computational efficiency through a dataflow streaming approach and memory partitioning strategies. While the <span class="ltx_glossaryref" title="">FPGA</span> does not always outperform a well-optimized <span class="ltx_glossaryref" title="">GPU</span> in total execution time, it consistently delivers lower per-image latency and lower power consumption across all three models. The roofline analysis confirms that while current optimizations have moved the design closer to its theoretical limits, some algorithmic and architectural constraints remain. 
Addressing these constraints, for example by exploring different partitioning factors or optimizing data access patterns, may further improve performance and resource efficiency in future implementations.</p> </div> </section> <section class="ltx_subsection" id="S6.SS4"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">6.4 </span>Resource Consumption</h3> <div class="ltx_para" id="S6.SS4.p1"> <p class="ltx_p" id="S6.SS4.p1.1">For every model, we evaluated the resource consumption of three versions of the <span class="ltx_glossaryref" title="">BCPNN</span> kernel. The first version is a full-featured kernel supporting unsupervised, supervised, and inference modes but without structural plasticity. The second version is a full-featured kernel with structural plasticity. The third version is an inference-only kernel. The inference kernel’s reduced complexity enables higher operating frequencies and lower resource utilization. This design choice makes it suitable for edge applications, where hardware resources, power, and execution time are often constrained.</p> </div> <figure class="ltx_table" id="S6.T3"> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_table">Table 3: </span>FPGA utilization (<span class="ltx_text ltx_font_bold" id="S6.T3.4.1">Infer</span> = inference only, <span class="ltx_text ltx_font_bold" id="S6.T3.5.2">Train</span> = with training, <span class="ltx_text ltx_font_bold" id="S6.T3.6.3">Struct</span> = with training and structural plasticity) </figcaption> <table class="ltx_tabular ltx_centering ltx_guessed_headers ltx_align_middle" id="S6.T3.7"> <thead class="ltx_thead"> <tr class="ltx_tr" id="S6.T3.7.1.1"> <th class="ltx_td ltx_th ltx_th_column ltx_th_row ltx_border_r" id="S6.T3.7.1.1.1" style="padding:1pt 3.0pt;"></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r" id="S6.T3.7.1.1.2" style="padding:1pt 3.0pt;"><span class="ltx_text ltx_font_bold" id="S6.T3.7.1.1.2.1">Version</span></th> <th
class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r" id="S6.T3.7.1.1.3" style="padding:1pt 3.0pt;"><span class="ltx_text ltx_font_bold" id="S6.T3.7.1.1.3.1">LUT</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r" id="S6.T3.7.1.1.4" style="padding:1pt 3.0pt;"><span class="ltx_text ltx_font_bold" id="S6.T3.7.1.1.4.1">FF</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r" id="S6.T3.7.1.1.5" style="padding:1pt 3.0pt;"><span class="ltx_text ltx_font_bold" id="S6.T3.7.1.1.5.1">DSP</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r" id="S6.T3.7.1.1.6" style="padding:1pt 3.0pt;"><span class="ltx_text ltx_font_bold" id="S6.T3.7.1.1.6.1">BRAM</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column" id="S6.T3.7.1.1.7" style="padding:1pt 3.0pt;"><span class="ltx_text ltx_font_bold" id="S6.T3.7.1.1.7.1">Frequency</span></th> </tr> </thead> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S6.T3.7.2.1"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r ltx_border_tt" id="S6.T3.7.2.1.1" rowspan="3" style="padding:1pt 3.0pt;"><span class="ltx_text" id="S6.T3.7.2.1.1.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T3.7.2.1.1.1.1" style="width:6.9pt;height:35.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:35.6pt;transform:translate(-14.31pt,-14.31pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T3.7.2.1.1.1.1.1">Model 1</span> </span></span></span></th> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S6.T3.7.2.1.2" style="padding:1pt 3.0pt;">Infer</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S6.T3.7.2.1.3" style="padding:1pt 3.0pt;">174400 (15%)</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S6.T3.7.2.1.4" style="padding:1pt 3.0pt;">257462 (11%)</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S6.T3.7.2.1.5" 
style="padding:1pt 3.0pt;">550 (7%)</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_tt" id="S6.T3.7.2.1.6" style="padding:1pt 3.0pt;">327.5 (18%)</td> <td class="ltx_td ltx_align_center ltx_border_tt" id="S6.T3.7.2.1.7" style="padding:1pt 3.0pt;">200.0 MHz</td> </tr> <tr class="ltx_tr" id="S6.T3.7.3.2"> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.3.2.1" style="padding:1pt 3.0pt;">Train</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.3.2.2" style="padding:1pt 3.0pt;">454024 (40%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.3.2.3" style="padding:1pt 3.0pt;">546419 (24%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.3.2.4" style="padding:1pt 3.0pt;">3573 (43%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.3.2.5" style="padding:1pt 3.0pt;">437.5 (25%)</td> <td class="ltx_td ltx_align_center" id="S6.T3.7.3.2.6" style="padding:1pt 3.0pt;">150.0 MHz</td> </tr> <tr class="ltx_tr" id="S6.T3.7.4.3"> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.4.3.1" style="padding:1pt 3.0pt;">Struct</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.4.3.2" style="padding:1pt 3.0pt;">475074 (41%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.4.3.3" style="padding:1pt 3.0pt;">574657 (25%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.4.3.4" style="padding:1pt 3.0pt;">3765 (45%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.4.3.5" style="padding:1pt 3.0pt;">473.5 (27%)</td> <td class="ltx_td ltx_align_center" id="S6.T3.7.4.3.6" style="padding:1pt 3.0pt;">147.3 MHz</td> </tr> <tr class="ltx_tr" id="S6.T3.7.5.4"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S6.T3.7.5.4.1" rowspan="3" style="padding:1pt 3.0pt;"><span class="ltx_text" id="S6.T3.7.5.4.1.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T3.7.5.4.1.1.1" 
style="width:6.9pt;height:35.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:35.6pt;transform:translate(-14.31pt,-14.31pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T3.7.5.4.1.1.1.1">Model 2</span> </span></span></span></th> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T3.7.5.4.2" style="padding:1pt 3.0pt;">Infer</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T3.7.5.4.3" style="padding:1pt 3.0pt;">177201 (15%)</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T3.7.5.4.4" style="padding:1pt 3.0pt;">261754 (11%)</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T3.7.5.4.5" style="padding:1pt 3.0pt;">644 (8%)</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T3.7.5.4.6" style="padding:1pt 3.0pt;">701.5 (40%)</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T3.7.5.4.7" style="padding:1pt 3.0pt;">160 MHz</td> </tr> <tr class="ltx_tr" id="S6.T3.7.6.5"> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.6.5.1" style="padding:1pt 3.0pt;">Train</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.6.5.2" style="padding:1pt 3.0pt;">459419 (40%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.6.5.3" style="padding:1pt 3.0pt;">488973 (21%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.6.5.4" style="padding:1pt 3.0pt;">3573 (43%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.6.5.5" style="padding:1pt 3.0pt;">862.5 (49%)</td> <td class="ltx_td ltx_align_center" id="S6.T3.7.6.5.6" style="padding:1pt 3.0pt;">110 MHz</td> </tr> <tr class="ltx_tr" id="S6.T3.7.7.6"> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.7.6.1" style="padding:1pt 3.0pt;">Struct</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.7.6.2" style="padding:1pt 3.0pt;">479801 (42%)</td> <td class="ltx_td ltx_align_center ltx_border_r" 
id="S6.T3.7.7.6.3" style="padding:1pt 3.0pt;">513057 (22%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.7.6.4" style="padding:1pt 3.0pt;">3765 (45%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.7.6.5" style="padding:1pt 3.0pt;">898.5 (51%)</td> <td class="ltx_td ltx_align_center" id="S6.T3.7.7.6.6" style="padding:1pt 3.0pt;">107.8 MHz</td> </tr> <tr class="ltx_tr" id="S6.T3.7.8.7"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S6.T3.7.8.7.1" rowspan="3" style="padding:1pt 3.0pt;"><span class="ltx_text" id="S6.T3.7.8.7.1.1"> <span class="ltx_inline-block ltx_transformed_outer" id="S6.T3.7.8.7.1.1.1" style="width:6.9pt;height:35.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="width:35.6pt;transform:translate(-14.31pt,-14.31pt) rotate(-90deg) ;"> <span class="ltx_p" id="S6.T3.7.8.7.1.1.1.1">Model 3</span> </span></span></span></th> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T3.7.8.7.2" style="padding:1pt 3.0pt;">Infer</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T3.7.8.7.3" style="padding:1pt 3.0pt;">180365 (16%)</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T3.7.8.7.4" style="padding:1pt 3.0pt;">259592 (11%)</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T3.7.8.7.5" style="padding:1pt 3.0pt;">640 (8%)</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S6.T3.7.8.7.6" style="padding:1pt 3.0pt;">1419 (80%)</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S6.T3.7.8.7.7" style="padding:1pt 3.0pt;">84.4 MHz</td> </tr> <tr class="ltx_tr" id="S6.T3.7.9.8"> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.9.8.1" style="padding:1pt 3.0pt;">Train</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.9.8.2" style="padding:1pt 3.0pt;">463580 (40%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.9.8.3" 
style="padding:1pt 3.0pt;">406798 (18%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.9.8.4" style="padding:1pt 3.0pt;">3573 (43%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.9.8.5" style="padding:1pt 3.0pt;">1568.5 (88%)</td> <td class="ltx_td ltx_align_center" id="S6.T3.7.9.8.6" style="padding:1pt 3.0pt;">60.0 MHz</td> </tr> <tr class="ltx_tr" id="S6.T3.7.10.9"> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.10.9.1" style="padding:1pt 3.0pt;">Struct</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.10.9.2" style="padding:1pt 3.0pt;">481731 (42%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.10.9.3" style="padding:1pt 3.0pt;">430927 (19%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.10.9.4" style="padding:1pt 3.0pt;">3765 (45%)</td> <td class="ltx_td ltx_align_center ltx_border_r" id="S6.T3.7.10.9.5" style="padding:1pt 3.0pt;">1604.5 (90%)</td> <td class="ltx_td ltx_align_center" id="S6.T3.7.10.9.6" style="padding:1pt 3.0pt;">60.0 MHz</td> </tr> </tbody> </table> </figure> <div class="ltx_para" id="S6.SS4.p2"> <p class="ltx_p" id="S6.SS4.p2.1">Table <a class="ltx_ref" href="https://arxiv.org/html/2503.01561v1#S6.T3" title="Table 3 ‣ 6.4 Resource Consumption ‣ 6 Result ‣ A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks"><span class="ltx_text ltx_ref_tag">3</span></a> presents the <span class="ltx_glossaryref" title="">FPGA</span> utilization of the three kernel versions for each of the three models evaluated, offering a clear comparison of their resource demands. Among the three versions, the inference-only kernel stands out, consuming fewer resources and achieving higher operating frequencies than the full-featured kernels. This makes it well suited to edge application scenarios, where inference speed and hardware efficiency are critical.
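</p> </div> <div class="ltx_para"> <p class="ltx_p">To make these comparisons concrete, the short Python sketch below recomputes a few ratios from the LUT, BRAM, and frequency columns of Table 3. It reports the inference kernel's LUT count relative to the training kernel, its clock-frequency gain, and the extra LUTs and BRAM that structural plasticity adds on top of the training kernel. The dictionary layout and the helper name pct_increase are illustrative choices and are not part of the accelerator's code base.</p> </div>

```python
# LUT count, BRAM count, and clock frequency (MHz) copied from Table 3.
TABLE3 = {
    "Model 1": {"Infer": (174400, 327.5, 200.0),
                "Train": (454024, 437.5, 150.0),
                "Struct": (475074, 473.5, 147.3)},
    "Model 2": {"Infer": (177201, 701.5, 160.0),
                "Train": (459419, 862.5, 110.0),
                "Struct": (479801, 898.5, 107.8)},
    "Model 3": {"Infer": (180365, 1419.0, 84.4),
                "Train": (463580, 1568.5, 60.0),
                "Struct": (481731, 1604.5, 60.0)},
}


def pct_increase(new: float, old: float) -> float:
    """Relative increase of new over old, in percent."""
    return 100.0 * (new - old) / old


for model, rows in TABLE3.items():
    lut_infer, _, f_infer = rows["Infer"]
    lut_train, bram_train, f_train = rows["Train"]
    lut_struct, bram_struct, _ = rows["Struct"]
    print(f"{model}: Infer uses {lut_infer / lut_train:.0%} of the Train LUTs "
          f"at {f_infer / f_train:.2f}x its clock; Struct adds "
          f"{pct_increase(lut_struct, lut_train):.1f}% LUTs and "
          f"{pct_increase(bram_struct, bram_train):.1f}% BRAM over Train")
```

<div class="ltx_para"> <p class="ltx_p">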
Notably, adding the structural plasticity feature slightly increases resource consumption, reflecting the trade-off between using the feature and resource efficiency.</p> </div> <div class="ltx_para" id="S6.SS4.p3"> <p class="ltx_p" id="S6.SS4.p3.1">Resource utilization scales with model complexity. For example, Model 2’s larger number of minicolumns in the hidden layer requires only slightly more LUTs than Model 1 (and, for the inference kernel, more FFs and DSPs), whereas its <span class="ltx_glossaryref" title="">BRAM</span> usage roughly doubles in all three kernel versions. Comparing Models 1 and 3, increasing the input size from 28×28 to 64×64 significantly raises <span class="ltx_glossaryref" title="">BRAM</span> utilization (for the inference kernel, from 327.5 to 1419 BRAMs) due to the need to buffer and process larger input data streams. Even though the architecture uses a stream-based design, certain operations require preloading data, resulting in higher on-chip memory usage.</p> </div> </section> </section> <section class="ltx_section" id="S7"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">7 </span>Conclusion</h2> <div class="ltx_para" id="S7.p1"> <p class="ltx_p" id="S7.p1.1">In this paper, we introduced a reconfigurable stream-based FPGA accelerator for Bayesian Confidence Propagation Neural Networks (BCPNN), demonstrating its viability as a high-performance, energy-efficient platform for neuromorphic computing. <span class="ltx_text ltx_font_italic" id="S7.p1.1.1">This accelerator is currently the most power-efficient and fastest single-node implementation of the BCPNN theory</span>, opening up opportunities to deploy BCPNN in edge computing use cases and to explore computational neuroscience aspects of the theory. We achieved substantial gains over equivalent CPU and GPU implementations by leveraging a range of optimizations, such as stream-based <span class="ltx_glossaryref" title="">FIFO</span>s, dataflow parallelization, and strategic HBM channel partitioning.
We evaluated our accelerator using three BCPNN model sizes on the MNIST dataset and the Pneumonia and Breast medical datasets. In all cases, the FPGA-based system maintained comparable accuracy while substantially reducing latency, power, and energy usage. For example, on the MNIST dataset, the training time per image decreased from 1.497 ms on the GPU to 0.422 ms on the FPGA, and the corresponding training energy fell from 124.6 mJ to 11.3 mJ. Similar improvements were observed for the other datasets, underscoring the robustness and scalability of our design. Overall, our FPGA accelerator achieves speedups of 1.3x to 5.3x compared to an NVIDIA A100 GPU, while simultaneously cutting power consumption by 2.62x to 3.19x and energy usage by 5.8x to 16.5x. These results indicate that an optimized FPGA-based approach can extend BCPNN deployments into resource-constrained environments where power, energy, and latency are critical. By bridging neuromorphic principles with specialized hardware design, this work moves brain-inspired models closer to real-world applications in edge and energy-sensitive systems.</p> </div> <section class="ltx_subsubsection" id="S7.SS0.SSS1"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection">7.0.1 </span>Acknowledgements</h4> <div class="ltx_para" id="S7.SS0.SSS1.p1"> <p class="ltx_p" id="S7.SS0.SSS1.p1.1">This work was funded by the European Commission Directorate-General for Communications Networks, Content and Technology grant no. 101135809 (EXTRA-BRAIN), the Swedish Research Council grant no. 2021-04579 (Building Digital Brains), and the Swedish e-Science Research Centre (SeRC).
The computations were enabled by resources provided by the Chalmers e-Commons at Chalmers and the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725.</p> </div> </section> <section class="ltx_subsubsection" id="S7.SS0.SSS2"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection">7.0.2 </span>Disclosure of Interests </h4> <div class="ltx_para" id="S7.SS0.SSS2.p1"> <p class="ltx_p" id="S7.SS0.SSS2.p1.1">The authors have no competing interests to declare that are relevant to the content of this article.</p> </div> </section> </section> <section class="ltx_bibliography" id="bib"> <h2 class="ltx_title ltx_title_bibliography">References</h2> <ul class="ltx_biblist"> <li class="ltx_bibitem" id="bib.bib1"> <span class="ltx_tag ltx_tag_bibitem">[1]</span> <span class="ltx_bibblock"> Alvis - C3SE, <span class="ltx_ref ltx_nolink ltx_url ltx_font_typewriter ltx_ref_self">https://www.c3se.chalmers.se/about/Alvis/</span> </span> </li> <li class="ltx_bibitem" id="bib.bib2"> <span class="ltx_tag ltx_tag_bibitem">[2]</span> <span class="ltx_bibblock"> Alzubaidi, L., Zhang, J., Humaidi, A.J., Al-Dujaili, A., Duan, Y., Al-Shamma, O., Santamaría, J., Fadhel, M.A., Al-Amidie, M., Farhan, L.: Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data <span class="ltx_text ltx_font_bold" id="bib.bib2.1.1">8</span>, 1–74 (2021) </span> </li> <li class="ltx_bibitem" id="bib.bib3"> <span class="ltx_tag ltx_tag_bibitem">[3]</span> <span class="ltx_bibblock"> Bate, A., Lindquist, M., Edwards, I.R., Olsson, S., Orre, R., Lansner, A., De Freitas, R.M.: A Bayesian neural network method for adverse drug reaction signal generation.
European Journal of Clinical Pharmacology <span class="ltx_text ltx_font_bold" id="bib.bib3.1.1">54</span>, 315–321 (1998) </span> </li> <li class="ltx_bibitem" id="bib.bib4"> <span class="ltx_tag ltx_tag_bibitem">[4]</span> <span class="ltx_bibblock"> Calore, E., Schifano, S.F.: FER: A Benchmark for the Roofline Analysis of FPGA Based HPC Accelerators. IEEE Access <span class="ltx_text ltx_font_bold" id="bib.bib4.1.1">10</span>, 94220–94234 (2022). https://doi.org/10.1109/ACCESS.2022.3203566 </span> </li> <li class="ltx_bibitem" id="bib.bib5"> <span class="ltx_tag ltx_tag_bibitem">[5]</span> <span class="ltx_bibblock"> Citri, A., Malenka, R.C.: Synaptic plasticity: multiple forms, functions, and mechanisms. Neuropsychopharmacology <span class="ltx_text ltx_font_bold" id="bib.bib5.1.1">33</span>(1), 18–41 (2008) </span> </li> <li class="ltx_bibitem" id="bib.bib6"> <span class="ltx_tag ltx_tag_bibitem">[6]</span> <span class="ltx_bibblock"> Douglas, R.J., Martin, K.A.: Neuronal Circuits of the Neocortex. Annual Review of Neuroscience <span class="ltx_text ltx_font_bold" id="bib.bib6.1.1">27</span>(1), 419–451 (Jul 2004). https://doi.org/10.1146/annurev.neuro.27.070203.144152 </span> </li> <li class="ltx_bibitem" id="bib.bib7"> <span class="ltx_tag ltx_tag_bibitem">[7]</span> <span class="ltx_bibblock"> Douglas, R.J., Martin, K.A., Whitteridge, D.: A Canonical Microcircuit for Neocortex. Neural Computation <span class="ltx_text ltx_font_bold" id="bib.bib7.1.1">1</span>(4), 480–488 (Dec 1989). https://doi.org/10.1162/neco.1989.1.4.480 </span> </li> <li class="ltx_bibitem" id="bib.bib8"> <span class="ltx_tag ltx_tag_bibitem">[8]</span> <span class="ltx_bibblock"> Jia, Y.: Analysis of the impact of artificial intelligence on electricity consumption. In: 2024 3rd International Conference on Artificial Intelligence, Internet of Things and Cloud Computing Technology (AIoTC). pp. 57–60.
IEEE (2024) </span> </li> <li class="ltx_bibitem" id="bib.bib9"> <span class="ltx_tag ltx_tag_bibitem">[9]</span> <span class="ltx_bibblock"> Johansson, C., Lansner, A.: BCPNN implemented with fixed-point arithmetic. Department of Numerical Analysis and Computer Science, Royal Institute of Technology, Stockholm (2004) </span> </li> <li class="ltx_bibitem" id="bib.bib10"> <span class="ltx_tag ltx_tag_bibitem">[10]</span> <span class="ltx_bibblock"> McCalpin, J.: Memory bandwidth and machine balance in current high performance computers. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter pp. 19–25 (1995) </span> </li> <li class="ltx_bibitem" id="bib.bib11"> <span class="ltx_tag ltx_tag_bibitem">[11]</span> <span class="ltx_bibblock"> Kalyan, K.S., Rajasekharan, A., Sangeetha, S.: AMMUS: A survey of transformer-based pretrained models in natural language processing. arXiv preprint arXiv:2108.05542 (2021) </span> </li> <li class="ltx_bibitem" id="bib.bib12"> <span class="ltx_tag ltx_tag_bibitem">[12]</span> <span class="ltx_bibblock"> Lamprecht, R., LeDoux, J.: Structural plasticity and memory. Nature Reviews Neuroscience <span class="ltx_text ltx_font_bold" id="bib.bib12.1.1">5</span>(1), 45–54 (2004) </span> </li> <li class="ltx_bibitem" id="bib.bib13"> <span class="ltx_tag ltx_tag_bibitem">[13]</span> <span class="ltx_bibblock"> Lansner, A., Ekeberg, Ö.: A one-layer feedback artificial neural network with a Bayesian learning rule. International Journal of Neural Systems <span class="ltx_text ltx_font_bold" id="bib.bib13.1.1">1</span>(01), 77–87 (1989) </span> </li> <li class="ltx_bibitem" id="bib.bib14"> <span class="ltx_tag ltx_tag_bibitem">[14]</span> <span class="ltx_bibblock"> LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE <span class="ltx_text ltx_font_bold" id="bib.bib14.1.1">86</span>(11), 2278–2324 (Nov 1998).
https://doi.org/10.1109/5.726791 </span> </li> <li class="ltx_bibitem" id="bib.bib15"> <span class="ltx_tag ltx_tag_bibitem">[15]</span> <span class="ltx_bibblock"> LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature <span class="ltx_text ltx_font_bold" id="bib.bib15.1.1">521</span>(7553), 436–444 (2015) </span> </li> <li class="ltx_bibitem" id="bib.bib16"> <span class="ltx_tag ltx_tag_bibitem">[16]</span> <span class="ltx_bibblock"> Lindemann, B., Müller, T., Vietz, H., Jazdi, N., Weyrich, M.: A survey on long short-term memory networks for time series prediction. Procedia CIRP <span class="ltx_text ltx_font_bold" id="bib.bib16.1.1">99</span>, 650–655 (2021) </span> </li> <li class="ltx_bibitem" id="bib.bib17"> <span class="ltx_tag ltx_tag_bibitem">[17]</span> <span class="ltx_bibblock"> Liu, L., Wang, D., Wang, Y., Lansner, A., Hemani, A., Yang, Y., Hu, X., Zou, Z., Zheng, L.: A FPGA-based Hardware Accelerator for Bayesian Confidence Propagation Neural Network. In: 2020 IEEE Nordic Circuits and Systems Conference (NorCAS). pp. 1–6 (Oct 2020). https://doi.org/10.1109/NorCAS51424.2020.9265129 </span> </li> <li class="ltx_bibitem" id="bib.bib18"> <span class="ltx_tag ltx_tag_bibitem">[18]</span> <span class="ltx_bibblock"> Maass, W.: Networks of spiking neurons: the third generation of neural network models. Neural Networks <span class="ltx_text ltx_font_bold" id="bib.bib18.1.1">10</span>(9), 1659–1671 (1997) </span> </li> <li class="ltx_bibitem" id="bib.bib19"> <span class="ltx_tag ltx_tag_bibitem">[19]</span> <span class="ltx_bibblock"> Mountcastle, V.B.: The columnar organization of the neocortex.
Brain: A Journal of Neurology <span class="ltx_text ltx_font_bold" id="bib.bib19.1.1">120</span>(4), 701–722 (1997) </span> </li> <li class="ltx_bibitem" id="bib.bib20"> <span class="ltx_tag ltx_tag_bibitem">[20]</span> <span class="ltx_bibblock"> Nane, R., Sima, V.M., Pilato, C., Choi, J., Fort, B., Canis, A., Chen, Y.T., Hsiao, H., Brown, S., Ferrandi, F., et al.: A survey and evaluation of FPGA high-level synthesis tools. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems <span class="ltx_text ltx_font_bold" id="bib.bib20.1.1">35</span>(10), 1591–1604 (2015) </span> </li> <li class="ltx_bibitem" id="bib.bib21"> <span class="ltx_tag ltx_tag_bibitem">[21]</span> <span class="ltx_bibblock"> Orre, R., Bate, A., Norén, G.N., Swahn, E., Arnborg, S., Edwards, I.R.: A Bayesian recurrent neural network for unsupervised pattern recognition in large incomplete data sets. International Journal of Neural Systems <span class="ltx_text ltx_font_bold" id="bib.bib21.1.1">15</span>(03), 207–222 (2005) </span> </li> <li class="ltx_bibitem" id="bib.bib22"> <span class="ltx_tag ltx_tag_bibitem">[22]</span> <span class="ltx_bibblock"> Podobas, A., Svedin, M., Chien, S.W.D., Peng, I.B., Ravichandran, N.B., Herman, P., Lansner, A., Markidis, S.: StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs. In: Proceedings of the 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies. pp. 1–6. HEART ’21, Association for Computing Machinery, New York, NY, USA (Jun 2021). https://doi.org/10.1145/3468044.3468052 </span> </li> <li class="ltx_bibitem" id="bib.bib23"> <span class="ltx_tag ltx_tag_bibitem">[23]</span> <span class="ltx_bibblock"> Ravichandran, N., Lansner, A., Herman, P.: Unsupervised representation learning with Hebbian synaptic and structural plasticity in brain-like feedforward neural networks.
arXiv preprint arXiv:2406.04733 (2024) </span> </li> <li class="ltx_bibitem" id="bib.bib24"> <span class="ltx_tag ltx_tag_bibitem">[24]</span> <span class="ltx_bibblock"> Ravichandran, N.B., Lansner, A., Herman, P.: Brain-like approaches to unsupervised learning of hidden representations – a comparative study (Apr 2021). https://doi.org/10.48550/arXiv.2005.03476, <span class="ltx_ref ltx_nolink ltx_url ltx_font_typewriter ltx_ref_self">http://arxiv.org/abs/2005.03476</span>, arXiv:2005.03476 </span> </li> <li class="ltx_bibitem" id="bib.bib25"> <span class="ltx_tag ltx_tag_bibitem">[25]</span> <span class="ltx_bibblock"> Ravichandran, N.B., Lansner, A., Herman, P.: Brain-like approaches to unsupervised learning of hidden representations-a comparative study. In: International Conference on Artificial Neural Networks. pp. 162–173. Springer (2021) </span> </li> <li class="ltx_bibitem" id="bib.bib26"> <span class="ltx_tag ltx_tag_bibitem">[26]</span> <span class="ltx_bibblock"> Ravichandran, N.B., Lansner, A., Herman, P.: Brain-like combination of feedforward and recurrent network components achieves prototype extraction and robust pattern recognition. In: International Conference on Machine Learning, Optimization, and Data Science. pp. 488–501. Springer (2022) </span> </li> <li class="ltx_bibitem" id="bib.bib27"> <span class="ltx_tag ltx_tag_bibitem">[27]</span> <span class="ltx_bibblock"> Schuman, C.D., Potok, T.E., Patton, R.M., Birdwell, J.D., Dean, M.E., Rose, G.S., Plank, J.S.: A survey of neuromorphic computing and neural networks in hardware. arXiv preprint arXiv:1705.06963 (2017) </span> </li> <li class="ltx_bibitem" id="bib.bib28"> <span class="ltx_tag ltx_tag_bibitem">[28]</span> <span class="ltx_bibblock"> Sevilla, J., Heim, L., Ho, A., Besiroglu, T., Hobbhahn, M., Villalobos, P.: Compute trends across three eras of machine learning. In: 2022 International Joint Conference on Neural Networks (IJCNN). pp. 1–8. 
IEEE (2022) </span> </li> <li class="ltx_bibitem" id="bib.bib29"> <span class="ltx_tag ltx_tag_bibitem">[29]</span> <span class="ltx_bibblock"> Siracusa, M., Del Sozzo, E., Rabozzi, M., Di Tucci, L., Williams, S., Sciuto, D., Santambrogio, M.D.: A Comprehensive Methodology to Optimize FPGA Designs via the Roofline Model. IEEE Transactions on Computers <span class="ltx_text ltx_font_bold" id="bib.bib29.1.1">71</span>(8), 1903–1915 (Aug 2022). https://doi.org/10.1109/TC.2021.3111761 </span> </li> <li class="ltx_bibitem" id="bib.bib30"> <span class="ltx_tag ltx_tag_bibitem">[30]</span> <span class="ltx_bibblock"> Svedin, M., Podobas, A., Chien, S.W., Markidis, S.: Higgs boson classification: Brain-inspired BCPNN learning with StreamBrain. In: 2021 IEEE International Conference on Cluster Computing (CLUSTER). pp. 705–710. IEEE (2021) </span> </li> <li class="ltx_bibitem" id="bib.bib31"> <span class="ltx_tag ltx_tag_bibitem">[31]</span> <span class="ltx_bibblock"> Tully, P.J., Lindén, H., Hennig, M.H., Lansner, A.: Spike-based Bayesian-Hebbian learning of temporal sequences. PLoS Computational Biology <span class="ltx_text ltx_font_bold" id="bib.bib31.1.1">12</span>(5), e1004954 (2016) </span> </li> <li class="ltx_bibitem" id="bib.bib32"> <span class="ltx_tag ltx_tag_bibitem">[32]</span> <span class="ltx_bibblock"> Wang, D., Wang, Y., Yang, Y., Stathis, D., Hemani, A., Lansner, A., Xu, J., Zheng, L.R., Zou, Z.: FPGA-Based HPC for Associative Memory System. In: 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC). pp. 52–57 (Jan 2024). https://doi.org/10.1109/ASP-DAC58780.2024.10473880, ISSN: 2153-697X </span> </li> <li class="ltx_bibitem" id="bib.bib33"> <span class="ltx_tag ltx_tag_bibitem">[33]</span> <span class="ltx_bibblock"> Wang, D., Xu, J., Stathis, D., Zhang, L., Li, F., Lansner, A., Hemani, A., Yang, Y., Herman, P., Zou, Z.: Mapping the BCPNN learning rule to a memristor model.
Frontiers in Neuroscience <span class="ltx_text ltx_font_bold" id="bib.bib33.1.1">15</span>, 750458 (2021) </span> </li> <li class="ltx_bibitem" id="bib.bib34"> <span class="ltx_tag ltx_tag_bibitem">[34]</span> <span class="ltx_bibblock"> Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM <span class="ltx_text ltx_font_bold" id="bib.bib34.1.1">52</span>(4), 65–76 (Apr 2009). https://doi.org/10.1145/1498765.1498785 </span> </li> <li class="ltx_bibitem" id="bib.bib35"> <span class="ltx_tag ltx_tag_bibitem">[35]</span> <span class="ltx_bibblock"> Xilinx, Inc: AXI High Bandwidth Memory Controller LogiCORE IP Product Guide (PG276) (2022), <span class="ltx_ref ltx_nolink ltx_url ltx_font_typewriter ltx_ref_self">https://docs.amd.com/r/en-US/pg276-axi-hbm/HBM-Topology</span> </span> </li> <li class="ltx_bibitem" id="bib.bib36"> <span class="ltx_tag ltx_tag_bibitem">[36]</span> <span class="ltx_bibblock"> Xilinx, Inc: Performance and Resource Utilization for Floating-point v7.1. Tech. Rep. Vivado Design Suite Release 2023.2, Xilinx, San Jose (2023), <span class="ltx_ref ltx_nolink ltx_url ltx_font_typewriter ltx_ref_self">https://download.amd.com/docnav/documents/ip_attachments/floating-point.html</span> </span> </li> <li class="ltx_bibitem" id="bib.bib37"> <span class="ltx_tag ltx_tag_bibitem">[37]</span> <span class="ltx_bibblock"> Yang, J., Shi, R., Ni, B.: MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis. In: 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI). pp. 191–195 (Apr 2021). https://doi.org/10.1109/ISBI48211.2021.9434062 </span> </li> <li class="ltx_bibitem" id="bib.bib38"> <span class="ltx_tag ltx_tag_bibitem">[38]</span> <span class="ltx_bibblock"> Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification. 
Scientific Data <span class="ltx_text ltx_font_bold" id="bib.bib38.1.1">10</span>(1), 41 (Jan 2023). https://doi.org/10.1038/s41597-022-01721-8 </span> </li> <li class="ltx_bibitem" id="bib.bib39"> <span class="ltx_tag ltx_tag_bibitem">[39]</span> <span class="ltx_bibblock"> Yang, Y., Stathis, D., Jordão, R., Hemani, A., Lansner, A.: Optimizing BCPNN learning rule for memory access. Frontiers in Neuroscience <span class="ltx_text ltx_font_bold" id="bib.bib39.1.1">14</span>, 878 (2020) </span> </li> </ul> </section> </article> </div> <footer class="ltx_page_footer"> <div class="ltx_page_logo">Generated on Mon Mar 3 14:04:51 2025 by <a class="ltx_LaTeXML_logo" href="http://dlmf.nist.gov/LaTeXML/"><span style="letter-spacing:-0.2em; margin-right:0.1em;">L<span class="ltx_font_smallcaps" style="position:relative; bottom:2.2pt;">a</span>T<span class="ltx_font_smallcaps" style="font-size:120%;position:relative; bottom:-0.2ex;">e</span></span><span style="font-size:90%; position:relative; bottom:-0.2ex;">XML</span><img alt="Mascot Sammy" src=""/></a> </div></footer> </div> </body> </html>
