CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design

ltx_font_italic">Experimental Setup</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S7.SS2" title="In VII Evaluation Results ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">VII-B</span> </span><span class="ltx_text ltx_font_italic">CogSys Algorithm Optimization Performance</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S7.SS3" title="In VII Evaluation Results ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref"><span class="ltx_text">VII-C</span> </span><span class="ltx_text ltx_font_italic">CogSys Accelerator Performance</span></span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S8" title="In CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">VIII </span><span class="ltx_text ltx_font_smallcaps">Related Work</span></span></a></li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S9" title="In CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">IX </span><span class="ltx_text ltx_font_smallcaps">Conclusion</span></span></a></li> </ol></nav> </nav> <div class="ltx_page_main"> <div class="ltx_page_content"> <article class="ltx_document ltx_authors_1line"> <h1 class="ltx_title ltx_title_document">CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design </h1> <div class="ltx_authors"> <span class="ltx_creator ltx_role_author"> <span class="ltx_personname"> Zishen Wan<sup class="ltx_sup" id="id16.10.id1"><span class="ltx_text ltx_font_italic" id="id16.10.id1.1">†∗</span></sup>, Hanchen Yang<sup class="ltx_sup" id="id17.11.id2"><span class="ltx_text ltx_font_italic" id="id17.11.id2.1">†∗</span></sup>, Ritik Raj<sup class="ltx_sup" id="id18.12.id3"><span class="ltx_text ltx_font_italic" id="id18.12.id3.1">†∗</span></sup>, Che-Kai Liu<sup class="ltx_sup" id="id19.13.id4"><span class="ltx_text ltx_font_italic" id="id19.13.id4.1">†</span></sup>, Ananda Samajdar<sup class="ltx_sup" id="id20.14.id5"><span class="ltx_text ltx_font_italic" id="id20.14.id5.1">‡</span></sup>, <br class="ltx_break"/>Arijit Raychowdhury<sup class="ltx_sup" id="id21.15.id6"><span class="ltx_text ltx_font_italic" id="id21.15.id6.1">†</span></sup>, Tushar Krishna<sup class="ltx_sup" id="id22.16.id7"><span class="ltx_text ltx_font_italic" id="id22.16.id7.1">†</span></sup> </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_affiliation"><sup class="ltx_sup" id="id23.17.id1"><span class="ltx_text ltx_font_italic" id="id23.17.id1.1">†</span></sup><span class="ltx_text ltx_font_italic" id="id24.18.id2">Georgia Institute of Technology, Atlanta, GA</span>    <sup class="ltx_sup" id="id25.19.id3"><span class="ltx_text ltx_font_italic" id="id25.19.id3.1">‡</span></sup><span class="ltx_text ltx_font_italic" id="id26.20.id4">IBM Research, Yorktown Heights, NY</span> <br 
class="ltx_break"/>{zishenwan, hanchen, ritik.raj, che-kai}@gatech.edu, ananda.samajdar@ibm.com <br class="ltx_break"/>{arijit.raychowdhury, tushar}@ece.gatech.edu </span></span></span> </div> <div class="ltx_abstract"> <h6 class="ltx_title ltx_title_abstract">Abstract</h6> <p class="ltx_p" id="id27.id1">Neurosymbolic AI is an emerging compositional paradigm that fuses neural learning with symbolic reasoning to enhance the transparency, interpretability, and trustworthiness of AI. It also exhibits higher data efficiency making it promising for edge deployments. Despite the algorithmic promises and demonstrations, unfortunately executing neurosymbolic workloads on current hardware (CPU/GPU/TPU) is challenging due to higher memory intensity, greater compute heterogeneity and access pattern irregularity, leading to severe hardware underutilization. </p> <p class="ltx_p" id="id15.6">This work proposes CogSys, a characterization and co-design framework dedicated to neurosymbolic AI system acceleration, aiming to win both reasoning efficiency and scalability. <span class="ltx_text ltx_framed ltx_framed_underline" id="id15.6.1">On the algorithm side</span>, CogSys proposes an <span class="ltx_text ltx_font_italic" id="id15.6.2">efficient factorization technique</span> to alleviate compute and memory overhead. <span class="ltx_text ltx_framed ltx_framed_underline" id="id15.6.3">On the</span> <span class="ltx_text ltx_framed ltx_framed_underline" id="id15.6.4">hardware side</span>, CogSys proposes a scalable neurosymbolic architecture with <span class="ltx_text ltx_font_italic" id="id15.6.5">reconfigurable neuro/symbolic processing elements (nsPE)</span> and <span class="ltx_text ltx_font_italic" id="id15.6.6">bubble streaming (BS) dataflow</span> with <span class="ltx_text ltx_font_italic" id="id15.6.7">spatial-temporal (ST) mapping</span> for highly parallel and efficient neurosymbolic computation. <span class="ltx_text ltx_framed ltx_framed_underline" id="id15.6.8">On the system side</span>, CogSys features an <span class="ltx_text ltx_font_italic" id="id15.6.9">adaptive workload-aware scheduler (adSCH)</span> to orchestrate heterogeneous kernels and enhance resource utilization. 
Evaluated across cognitive workloads, CogSys enables reconfigurable support for neural and symbolic kernels and exhibits >75× speedup over a TPU-like systolic array with only <5% area overhead, as benchmarked under the TSMC 28nm technology node. CogSys achieves 4×-96× speedup compared to desktop and edge GPUs. For the first time, CogSys enables real-time abduction reasoning towards human fluid intelligence, requiring only 0.3 s per reasoning task with 4 mm² area and 1.48 W power consumption.

I. Introduction

The massive success of Large Language Models (LLMs), combined with concerns about interpretability and safety, has led to an emerging paradigm of "compositional AI" systems, especially for safety-critical domains such as robotics and healthcare.
The goal of such systems is to combine black-box neural networks with reasoning/rule-based AI methods [87, 61, 98, 44, 49, 43, 91, 38]. This approach mirrors human cognitive processes, which can be grouped into lower-level sensory processing (System 1, "thinking fast") and higher-level cognitive functions such as reasoning and deduction (System 2, "thinking slow") [17, 12]. The former can be modeled with neural networks, and the latter with symbolic frameworks.

One promising example of a compositional AI system is neurosymbolic AI, which synergistically integrates neural network learning with symbolic reasoning. Neural networks are adept at identifying patterns and handling perceptual tasks, but lack transparency and logical inference capabilities. Symbolic modules (e.g., coded knowledge, rules), in contrast, excel in reasoning and interpretability but struggle with adaptability and learning from raw data. Neurosymbolic AI bridges this gap by composing the strengths of both paradigms (Fig. 1).
<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S1.F1" title="Figure 1 ‣ I Introduction ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">1</span></a>).</p> </div> <div class="ltx_para" id="S1.p3"> <p class="ltx_p" id="S1.p3.1">Neurosymbolic AI has demonstrated superior capabilities in human-like reasoning and logical thinking across various domains, such as natural language processing, robotics, healthcare, etc <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib26" title="">26</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib87" title="">87</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib57" title="">57</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib29" title="">29</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib59" title="">59</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib93" title="">93</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib96" title="">96</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib76" title="">76</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib35" title="">35</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib33" title="">33</a>]</cite>. For example, IBM’s neuro-vector-symbolic system <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib33" title="">33</a>]</cite> achieves 98.8% accuracy on spatial-temporal reasoning tasks <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib95" title="">95</a>]</cite>, greatly surpassing human (84.4%), ResNet (53.4%) and GPT-4 (89.0%); Google DeepMind’s AlphaGeometry <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib83" title="">83</a>]</cite>, another neurosymbolic system, solves geometry problems at a level of human Olympiad gold medalists, while GPT-4 completely fails. 
Recently, there has also been a plethora of workshops focusing on neurosymbolic AI [5, 8, 6, 2, 1, 3, 4, 7].

Figure 1: Neurosymbolic AI is an emerging compositional system that integrates neural and symbolic modules, enabling superior cognitive intelligence compared to NNs. However, it suffers from inefficient TPU/GPU execution. CogSys is a reconfigurable neural/symbolic engine excelling in both reasoning efficiency and cognitive capability.

Figure 2: Neurosymbolic algorithm flow. Neural systems handle perception by processing raw data and extracting features, which are then utilized by symbolic reasoning systems to apply logical rules and knowledge. This compositionality enables the execution of complex cognitive tasks such as abstract deduction, ethical decision-making, and fluid intelligence.

Despite the impressive cognitive capabilities of neurosymbolic AI, demonstrated by past work on distributed GPU clusters, a recent study [86] identifies that enabling real-time and efficient neurosymbolic AI on edge devices, which is highly desirable for numerous reasoning and human-AI applications, remains a challenging open problem.
For example, a neuro-vector-symbolic system takes >3 minutes per task even on a TPU or desktop GPU [33]. This inefficiency threatens to hurt neurosymbolic AI deployment in the long run.

To understand this further, we systematically profile and analyze the runtime and memory behavior of various neurosymbolic workloads on multiple devices and identify the following system-level challenges. (1) Large Memory Footprint. Neurosymbolic AI systems typically rely on a vector-symbolic architecture (VSA) that uses vector operations to represent symbolic knowledge. The system generates an intermediate codebook that captures vast object combinations for higher reasoning capability (typically on the order of tens to hundreds of MB), making it impractical to cache on-chip in edge accelerators. (2) Compute Heterogeneity. Rather than the general matrix multiplications (GEMMs) and convolution operations that current neural hardware largely focuses on accelerating, neurosymbolic workloads typically consist of numerous holographic vector operations (e.g., circular convolution) that run inefficiently on GPUs and neural engines such as TPUs due to low data reuse, low compute-array utilization, and low parallelism.
<svg class="ltx_picture" height="12.06" id="S1.p5.3.pic3" overflow="visible" version="1.1" width="12.06"><g fill="#000000" stroke="#000000" stroke-width="0.4pt" transform="translate(0,12.06) matrix(1 0 0 -1 0 0) translate(6.03,0) translate(0,6.03)"><path d="M 6.03 0 C 6.03 3.33 3.33 6.03 0 6.03 C -3.33 6.03 -6.03 3.33 -6.03 0 C -6.03 -3.33 -3.33 -6.03 0 -6.03 C 3.33 -6.03 6.03 -3.33 6.03 0 Z M 0 0" style="stroke:none"></path><g fill="#000000" stroke="#000000" transform="matrix(1.0 0.0 0.0 1.0 -3.46 -4.46)"><foreignobject height="8.92" overflow="visible" transform="matrix(1 0 0 -1 0 16.6)" width="6.92"><span class="ltx_text" id="S1.p5.3.pic3.1.1.1.1.1" style="color:#FFFFFF;">3</span></foreignobject></g></g></svg> <span class="ltx_text ltx_font_bold" id="S1.p5.3.3">Sequential Processing.</span> Typically, the symbolic-reasoning computation depends on the output of the neuro-perceptual modules, increasing the critical path during cognitive inference and underutilizing parts of the accelerator.</p> </div> <div class="ltx_para" id="S1.p6"> <p class="ltx_p" id="S1.p6.3">To address the aforementioned challenges, we develop an algorithm-hardware co-design framework, dubbed CogSys, which to the best of our knowledge is the <em class="ltx_emph ltx_font_bold ltx_font_italic" id="S1.p6.3.4">first</em> to achieve real-time efficiency and scalability of cognitive neurosymbolic systems, making it more deployable and facilitate neurosymbolic AI development. <span class="ltx_text ltx_font_bold" id="S1.p6.3.5">On the algorithm level,</span> CogSys proposes an <span class="ltx_text ltx_font_italic" id="S1.p6.3.6">efficient factorization scheme</span> to reduce memory footprint. This technique completely replaces the large-size symbolic knowledge codebook, by quickly factorizing vectors in an interactive manner when decomposing symbolic representations. <span class="ltx_text ltx_font_bold" id="S1.p6.3.7">On the hardware level,</span> CogSys proposes a scalable architecture with <span class="ltx_text ltx_font_italic" id="S1.p6.3.8">reconfigurable neuro/symbolic processing elements (nsPE)</span> and <span class="ltx_text ltx_font_italic" id="S1.p6.3.9">bubble streaming (BS) dataflow</span> with <span class="ltx_text ltx_font_italic" id="S1.p6.3.10">spatial-temporal (ST) mapping</span> for highly parallel and energy-efficient neurosymboic computation. The design is flexible to support heterogeneous neural and circular convolution symbolic kernels across vector dimensions and reduce runtime. <span class="ltx_text ltx_font_bold" id="S1.p6.3.11">On the system level,</span> CogSys also features an <span class="ltx_text ltx_font_italic" id="S1.p6.3.12">adaptive workload-aware scheduling (adSCH) scheme</span> with <span class="ltx_text ltx_font_italic" id="S1.p6.3.13">multi-level parallelism</span> to orchestra neural and symbolic kernels with improved hardware resource utilization and enables design scalability for evolving neurosymbolic AI. 
<em class="ltx_emph ltx_font_italic" id="S1.p6.3.3">Notably, with only <math alttext="&lt;" class="ltx_Math" display="inline" id="S1.p6.1.1.m1.1"><semantics id="S1.p6.1.1.m1.1a"><mo id="S1.p6.1.1.m1.1.1" xref="S1.p6.1.1.m1.1.1.cmml">&lt;</mo><annotation-xml encoding="MathML-Content" id="S1.p6.1.1.m1.1b"><lt id="S1.p6.1.1.m1.1.1.cmml" xref="S1.p6.1.1.m1.1.1"></lt></annotation-xml><annotation encoding="application/x-tex" id="S1.p6.1.1.m1.1c">&lt;</annotation><annotation encoding="application/x-llamapun" id="S1.p6.1.1.m1.1d">&lt;</annotation></semantics></math>5% area overhead over TPU-like systolic arrays, CogSys enables reconfigurable support for neural and symbolic kernels and demonstrates <math alttext="&gt;" class="ltx_Math" display="inline" id="S1.p6.2.2.m2.1"><semantics id="S1.p6.2.2.m2.1a"><mo id="S1.p6.2.2.m2.1.1" xref="S1.p6.2.2.m2.1.1.cmml">&gt;</mo><annotation-xml encoding="MathML-Content" id="S1.p6.2.2.m2.1b"><gt id="S1.p6.2.2.m2.1.1.cmml" xref="S1.p6.2.2.m2.1.1"></gt></annotation-xml><annotation encoding="application/x-tex" id="S1.p6.2.2.m2.1c">&gt;</annotation><annotation encoding="application/x-llamapun" id="S1.p6.2.2.m2.1d">&gt;</annotation></semantics></math>75<math alttext="\times" class="ltx_Math" display="inline" id="S1.p6.3.3.m3.1"><semantics id="S1.p6.3.3.m3.1a"><mo id="S1.p6.3.3.m3.1.1" xref="S1.p6.3.3.m3.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S1.p6.3.3.m3.1b"><times id="S1.p6.3.3.m3.1.1.cmml" xref="S1.p6.3.3.m3.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S1.p6.3.3.m3.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S1.p6.3.3.m3.1d">×</annotation></semantics></math> system speedup.</em></p> </div> <div class="ltx_para" id="S1.p7"> <p class="ltx_p" id="S1.p7.1">This paper, therefore, makes the following contributions:</p> </div> <div class="ltx_para" id="S1.p8"> <p class="ltx_p" id="S1.p8.1">• We perform comprehensive runtime and memory analysis of various neurosymbolic workloads across devices, and identify the primary cause of the inefficiency and optimization opportunities, which can also shed light on future neurosymbolic systems acceleration and innovations (Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S3" title="III Neurosymbolic AI System Profiling ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">III</span></a>).</p> </div> <div class="ltx_para" id="S1.p9"> <p class="ltx_p" id="S1.p9.1">• We propose an algorithm-hardware co-design framework, dubbed CogSys, which is the first to enable real-time, efficient, and scalable VSA-based neurosymbolic systems, making it more deployable and facilitate neurosymbolic AI development.</p> </div> <div class="ltx_para" id="S1.p10"> <p class="ltx_p" id="S1.p10.1">• CogSys innovates across the algorithm-level efficient symbolic factorization strategy (Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S4" title="IV CogSys: Algorithm Optimization ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">IV</span></a>), hardware-level reconfigurable neuro/symbolic architecture and dataflow (Sec. 
<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S5" title="V CogSys: Hardware Architecture ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">V</span></a>), and system-level scheduler (Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S6" title="VI CogSys: Scheduling Strategy ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">VI</span></a>) to reduce the memory footprint while improving hardware utilization and overall neurosymbolic processing efficiency.</p> </div> <div class="ltx_para" id="S1.p11"> <p class="ltx_p" id="S1.p11.4">• Evaluated across cognitive tasks, CogSys enables reconfigurable support for neural and symbolic operations, achieving 75.9<math alttext="\times" class="ltx_Math" display="inline" id="S1.p11.1.m1.1"><semantics id="S1.p11.1.m1.1a"><mo id="S1.p11.1.m1.1.1" xref="S1.p11.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S1.p11.1.m1.1b"><times id="S1.p11.1.m1.1.1.cmml" xref="S1.p11.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S1.p11.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S1.p11.1.m1.1d">×</annotation></semantics></math> speedup with only a 4.8% area overhead compared to TPU-like systolic arrays, and demonstrates 4<math alttext="\times" class="ltx_Math" display="inline" id="S1.p11.2.m2.1"><semantics id="S1.p11.2.m2.1a"><mo id="S1.p11.2.m2.1.1" xref="S1.p11.2.m2.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S1.p11.2.m2.1b"><times id="S1.p11.2.m2.1.1.cmml" xref="S1.p11.2.m2.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S1.p11.2.m2.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S1.p11.2.m2.1d">×</annotation></semantics></math>-95<math alttext="\times" class="ltx_Math" display="inline" id="S1.p11.3.m3.1"><semantics id="S1.p11.3.m3.1a"><mo id="S1.p11.3.m3.1.1" xref="S1.p11.3.m3.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S1.p11.3.m3.1b"><times id="S1.p11.3.m3.1.1.cmml" xref="S1.p11.3.m3.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S1.p11.3.m3.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S1.p11.3.m3.1d">×</annotation></semantics></math> speedup compared to GPUs. CogSys enables efficient neurosymbolic AI with 4mm<sup class="ltx_sup" id="S1.p11.4.1">2</sup> area and 1.48W power consumption (Sec. 
<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S7" title="VII Evaluation Results ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">VII</span></a>).</p> </div> </section> <section class="ltx_section" id="S2"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">II </span><span class="ltx_text ltx_font_smallcaps" id="S2.1.1">Neurosymbolic AI Background and Workload</span> </h2> <div class="ltx_para" id="S2.p1"> <p class="ltx_p" id="S2.p1.1">This section presents the preliminaries of neurosymbolic AI with its algorithm flow and key operations, then describes four representative neurosymbolic workloads for our analysis.</p> </div> <section class="ltx_subsection" id="S2.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S2.SS1.5.1.1">II-A</span> </span><span class="ltx_text ltx_font_italic" id="S2.SS1.6.2">Challenges with Neural Networks</span> </h3> <div class="ltx_para" id="S2.SS1.p1"> <p class="ltx_p" id="S2.SS1.p1.1">Neural methods are highly effective in extracting complex features from vision and language tasks, and excel in flexibility, scalability, and handling inconsistency <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib68" title="">68</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib100" title="">100</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib90" title="">90</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib88" title="">88</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib89" title="">89</a>]</cite>. However, neural methods often suffer from limitations such as hallucinations and lack of interpretability, and typically operate as black-box where their decision-making processes are not easily understandable by humans. This undermines the model output trustworthiness in cognitive and safety-critical applications <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib84" title="">84</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib39" title="">39</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib31" title="">31</a>]</cite>.</p> </div> </section> <section class="ltx_subsection" id="S2.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S2.SS2.5.1.1">II-B</span> </span><span class="ltx_text ltx_font_italic" id="S2.SS2.6.2">Neurosymbolic AI Algorithm Flow</span> </h3> <div class="ltx_para" id="S2.SS2.p1"> <p class="ltx_p" id="S2.SS2.p1.1">Neurosymbolic AI synergistically integrates the learning capability of neural networks with the reasoning capability of symbolic AI, offering advantages in data-efficient learning with transparent and logical decision-making compared to DNNs. 
Neurosymbolic AI leads to superior performance in a wide range of applications, such as complex question answering [57, 59], abstract deduction [96, 33], decision making [78, 64], and logical reasoning [83, 70], serving as a promising paradigm toward human-like fluid intelligence.

Fig. 2 extracts a unified neurosymbolic pipeline and illustrates how the neural and symbolic modules interact to perform complex cognitive tasks:

(1) Neural system. The process begins with the neural module handling perception: it interprets sensory data and generates meaningful scene and object representations, which are essential for the subsequent reasoning processes.
The neural module itself may suffer from the superposition catastrophe, preventing it from extracting an object's constituent attributes [33].

(2) Symbolic system. The extracted features are fed into the symbolic system for reasoning tasks. This step enhances explainability and reduces dependence on extensive training data by incorporating established models of the physical world (e.g., underlying rules, coded knowledge). Throughout this process, a knowledge codebook is maintained, which integrates knowledge learned by the neural network with symbolic rules, ensuring that the system can both learn from new data and apply logical reasoning based on existing knowledge. The outcomes of the symbolic reasoning process are then used to make decisions, generate responses, or control actions.

This neurosymbolic flow is one way to model human hierarchical reasoning procedures. It resembles the sense-reason-act cognitive cycle, which can be computationally modeled through a multi-layer framework [65, 37], where the perception layer fuses sensory inputs and maps them to high-level observations, the reasoning layer conducts deliberate and conscious thinking by applying symbolic rules and knowledge, and the action layer facilitates trustworthy and reliable execution. This compositional approach allows agents to tackle complex challenges that require both data-driven learning and logical reasoning.
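As an illustration of this flow (not the paper's implementation), the following minimal Python sketch wires a stand-in perception step to a VSA-style symbolic encoding (using circular-convolution binding, defined in Sec. II-C) and a toy rule check. The codebook contents, vector dimensionality, decision threshold, and all function names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 1024  # hypothetical vector dimensionality

# Symbolic codebook: one random high-dimensional vector per attribute value.
codebook = {name: rng.standard_normal(D) / np.sqrt(D)
            for name in ["circle", "square", "red", "blue"]}

def bind(a, b):
    """Bind two vectors with circular convolution (a core VSA operation)."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def perceive(image):
    """Stand-in for the neural perception module: returns attribute labels."""
    # A real system would run a neural network here; the output is hardcoded
    # purely for illustration.
    return ["circle", "red"]

def encode(attributes):
    """Symbolically encode an object by binding its attribute vectors."""
    vec = codebook[attributes[0]]
    for attr in attributes[1:]:
        vec = bind(vec, codebook[attr])
    return vec

def reason(obj_vec, query_attrs, threshold=0.5):
    """Toy symbolic check: does the object match the queried attribute set?"""
    return float(np.dot(obj_vec, encode(query_attrs))) > threshold

obj = encode(perceive(image=None))      # neural perception -> symbolic encoding
print(reason(obj, ["circle", "red"]))   # True: object satisfies the query
print(reason(obj, ["square", "blue"]))  # False: object does not satisfy it
```

In a full neurosymbolic workload, the reasoning step would apply learned or coded rules over many such composite vectors rather than a single dot-product test.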
II-C. VSA-Based Symbolic Operations

Vector-symbolic architecture (VSA). Within the compositional neurosymbolic AI flow, exploiting VSA together with neural dynamics has become a powerful approach [46, 33, 32, 24, 96, 60]. Specifically, VSA provides a means to represent symbolic information in a low- or high-dimensional vector space. By encoding symbolic structures as vectors with dimensionality-preserving algebraic operations, VSAs enable the combination of symbolic reasoning with neural networks, thereby facilitating cognitive tasks such as learning, memory, and reasoning in a unified system. Fig. 3 illustrates a simple example of the binding ambiguity of neural networks and the functionality of VSA structures.

Figure 3: Illustration of VSA functionality. A neural network suffers from binding ambiguity issues, whereas VSA constructs vector representations with circular convolution operations for the reasoning process.

Circular convolution. A key VSA operation is the blockwise circular convolution, which serves as a universal operation over vectors representing different symbols. Circular convolution combines two vectors in a way that preserves the information from both, making it suitable for representing composite symbols.
Mathematically, the circular convolution of two vectors <math alttext="\mathbf{A}" class="ltx_Math" display="inline" id="S2.SS3.p2.1.m1.1"><semantics id="S2.SS3.p2.1.m1.1a"><mi id="S2.SS3.p2.1.m1.1.1" xref="S2.SS3.p2.1.m1.1.1.cmml">𝐀</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.1.m1.1b"><ci id="S2.SS3.p2.1.m1.1.1.cmml" xref="S2.SS3.p2.1.m1.1.1">𝐀</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.1.m1.1c">\mathbf{A}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.1.m1.1d">bold_A</annotation></semantics></math> and <math alttext="\mathbf{B}" class="ltx_Math" display="inline" id="S2.SS3.p2.2.m2.1"><semantics id="S2.SS3.p2.2.m2.1a"><mi id="S2.SS3.p2.2.m2.1.1" xref="S2.SS3.p2.2.m2.1.1.cmml">𝐁</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.2.m2.1b"><ci id="S2.SS3.p2.2.m2.1.1.cmml" xref="S2.SS3.p2.2.m2.1.1">𝐁</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.2.m2.1c">\mathbf{B}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.2.m2.1d">bold_B</annotation></semantics></math> (each of dimension <math alttext="N" class="ltx_Math" display="inline" id="S2.SS3.p2.3.m3.1"><semantics id="S2.SS3.p2.3.m3.1a"><mi id="S2.SS3.p2.3.m3.1.1" xref="S2.SS3.p2.3.m3.1.1.cmml">N</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.3.m3.1b"><ci id="S2.SS3.p2.3.m3.1.1.cmml" xref="S2.SS3.p2.3.m3.1.1">𝑁</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.3.m3.1c">N</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.3.m3.1d">italic_N</annotation></semantics></math>) generates vector <math alttext="\mathbf{C}" class="ltx_Math" display="inline" id="S2.SS3.p2.4.m4.1"><semantics id="S2.SS3.p2.4.m4.1a"><mi id="S2.SS3.p2.4.m4.1.1" xref="S2.SS3.p2.4.m4.1.1.cmml">𝐂</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.4.m4.1b"><ci id="S2.SS3.p2.4.m4.1.1.cmml" xref="S2.SS3.p2.4.m4.1.1">𝐂</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.4.m4.1c">\mathbf{C}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.4.m4.1d">bold_C</annotation></semantics></math> as <math alttext="C[n]=\sum_{k=0}^{N-1}A[k]\cdot B[(n-k)\mod N]" class="ltx_Math" display="inline" id="S2.SS3.p2.5.m5.3"><semantics id="S2.SS3.p2.5.m5.3a"><mrow id="S2.SS3.p2.5.m5.3.3" xref="S2.SS3.p2.5.m5.3.3.cmml"><mrow id="S2.SS3.p2.5.m5.3.3.3" xref="S2.SS3.p2.5.m5.3.3.3.cmml"><mi id="S2.SS3.p2.5.m5.3.3.3.2" xref="S2.SS3.p2.5.m5.3.3.3.2.cmml">C</mi><mo id="S2.SS3.p2.5.m5.3.3.3.1" xref="S2.SS3.p2.5.m5.3.3.3.1.cmml">⁢</mo><mrow id="S2.SS3.p2.5.m5.3.3.3.3.2" xref="S2.SS3.p2.5.m5.3.3.3.3.1.cmml"><mo id="S2.SS3.p2.5.m5.3.3.3.3.2.1" stretchy="false" xref="S2.SS3.p2.5.m5.3.3.3.3.1.1.cmml">[</mo><mi id="S2.SS3.p2.5.m5.1.1" xref="S2.SS3.p2.5.m5.1.1.cmml">n</mi><mo id="S2.SS3.p2.5.m5.3.3.3.3.2.2" stretchy="false" xref="S2.SS3.p2.5.m5.3.3.3.3.1.1.cmml">]</mo></mrow></mrow><mo id="S2.SS3.p2.5.m5.3.3.2" rspace="0.111em" xref="S2.SS3.p2.5.m5.3.3.2.cmml">=</mo><mrow id="S2.SS3.p2.5.m5.3.3.1" xref="S2.SS3.p2.5.m5.3.3.1.cmml"><msubsup id="S2.SS3.p2.5.m5.3.3.1.2" xref="S2.SS3.p2.5.m5.3.3.1.2.cmml"><mo id="S2.SS3.p2.5.m5.3.3.1.2.2.2" xref="S2.SS3.p2.5.m5.3.3.1.2.2.2.cmml">∑</mo><mrow id="S2.SS3.p2.5.m5.3.3.1.2.2.3" xref="S2.SS3.p2.5.m5.3.3.1.2.2.3.cmml"><mi id="S2.SS3.p2.5.m5.3.3.1.2.2.3.2" xref="S2.SS3.p2.5.m5.3.3.1.2.2.3.2.cmml">k</mi><mo id="S2.SS3.p2.5.m5.3.3.1.2.2.3.1" xref="S2.SS3.p2.5.m5.3.3.1.2.2.3.1.cmml">=</mo><mn id="S2.SS3.p2.5.m5.3.3.1.2.2.3.3" 
xref="S2.SS3.p2.5.m5.3.3.1.2.2.3.3.cmml">0</mn></mrow><mrow id="S2.SS3.p2.5.m5.3.3.1.2.3" xref="S2.SS3.p2.5.m5.3.3.1.2.3.cmml"><mi id="S2.SS3.p2.5.m5.3.3.1.2.3.2" xref="S2.SS3.p2.5.m5.3.3.1.2.3.2.cmml">N</mi><mo id="S2.SS3.p2.5.m5.3.3.1.2.3.1" xref="S2.SS3.p2.5.m5.3.3.1.2.3.1.cmml">−</mo><mn id="S2.SS3.p2.5.m5.3.3.1.2.3.3" xref="S2.SS3.p2.5.m5.3.3.1.2.3.3.cmml">1</mn></mrow></msubsup><mrow id="S2.SS3.p2.5.m5.3.3.1.1" xref="S2.SS3.p2.5.m5.3.3.1.1.cmml"><mrow id="S2.SS3.p2.5.m5.3.3.1.1.3" xref="S2.SS3.p2.5.m5.3.3.1.1.3.cmml"><mrow id="S2.SS3.p2.5.m5.3.3.1.1.3.2" xref="S2.SS3.p2.5.m5.3.3.1.1.3.2.cmml"><mi id="S2.SS3.p2.5.m5.3.3.1.1.3.2.2" xref="S2.SS3.p2.5.m5.3.3.1.1.3.2.2.cmml">A</mi><mo id="S2.SS3.p2.5.m5.3.3.1.1.3.2.1" xref="S2.SS3.p2.5.m5.3.3.1.1.3.2.1.cmml">⁢</mo><mrow id="S2.SS3.p2.5.m5.3.3.1.1.3.2.3.2" xref="S2.SS3.p2.5.m5.3.3.1.1.3.2.3.1.cmml"><mo id="S2.SS3.p2.5.m5.3.3.1.1.3.2.3.2.1" stretchy="false" xref="S2.SS3.p2.5.m5.3.3.1.1.3.2.3.1.1.cmml">[</mo><mi id="S2.SS3.p2.5.m5.2.2" xref="S2.SS3.p2.5.m5.2.2.cmml">k</mi><mo id="S2.SS3.p2.5.m5.3.3.1.1.3.2.3.2.2" rspace="0.055em" stretchy="false" xref="S2.SS3.p2.5.m5.3.3.1.1.3.2.3.1.1.cmml">]</mo></mrow></mrow><mo id="S2.SS3.p2.5.m5.3.3.1.1.3.1" rspace="0.222em" xref="S2.SS3.p2.5.m5.3.3.1.1.3.1.cmml">⋅</mo><mi id="S2.SS3.p2.5.m5.3.3.1.1.3.3" xref="S2.SS3.p2.5.m5.3.3.1.1.3.3.cmml">B</mi></mrow><mo id="S2.SS3.p2.5.m5.3.3.1.1.2" xref="S2.SS3.p2.5.m5.3.3.1.1.2.cmml">⁢</mo><mrow id="S2.SS3.p2.5.m5.3.3.1.1.1.1" xref="S2.SS3.p2.5.m5.3.3.1.1.1.2.cmml"><mo id="S2.SS3.p2.5.m5.3.3.1.1.1.1.2" stretchy="false" xref="S2.SS3.p2.5.m5.3.3.1.1.1.2.1.cmml">[</mo><mrow id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.cmml"><mrow id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.cmml"><mo id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.2" stretchy="false" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.cmml">(</mo><mrow id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.cmml"><mi id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.2" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.2.cmml">n</mi><mo id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.1" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.1.cmml">−</mo><mi id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.3" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.3.cmml">k</mi></mrow><mo id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.3" stretchy="false" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.cmml">)</mo></mrow><mo id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.2" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.2.cmml">mod</mo><mi id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.3" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.3.cmml">N</mi></mrow><mo id="S2.SS3.p2.5.m5.3.3.1.1.1.1.3" stretchy="false" xref="S2.SS3.p2.5.m5.3.3.1.1.1.2.1.cmml">]</mo></mrow></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.5.m5.3b"><apply id="S2.SS3.p2.5.m5.3.3.cmml" xref="S2.SS3.p2.5.m5.3.3"><eq id="S2.SS3.p2.5.m5.3.3.2.cmml" xref="S2.SS3.p2.5.m5.3.3.2"></eq><apply id="S2.SS3.p2.5.m5.3.3.3.cmml" xref="S2.SS3.p2.5.m5.3.3.3"><times id="S2.SS3.p2.5.m5.3.3.3.1.cmml" xref="S2.SS3.p2.5.m5.3.3.3.1"></times><ci id="S2.SS3.p2.5.m5.3.3.3.2.cmml" xref="S2.SS3.p2.5.m5.3.3.3.2">𝐶</ci><apply id="S2.SS3.p2.5.m5.3.3.3.3.1.cmml" xref="S2.SS3.p2.5.m5.3.3.3.3.2"><csymbol cd="latexml" id="S2.SS3.p2.5.m5.3.3.3.3.1.1.cmml" xref="S2.SS3.p2.5.m5.3.3.3.3.2.1">delimited-[]</csymbol><ci id="S2.SS3.p2.5.m5.1.1.cmml" xref="S2.SS3.p2.5.m5.1.1">𝑛</ci></apply></apply><apply id="S2.SS3.p2.5.m5.3.3.1.cmml" xref="S2.SS3.p2.5.m5.3.3.1"><apply id="S2.SS3.p2.5.m5.3.3.1.2.cmml" 
xref="S2.SS3.p2.5.m5.3.3.1.2"><csymbol cd="ambiguous" id="S2.SS3.p2.5.m5.3.3.1.2.1.cmml" xref="S2.SS3.p2.5.m5.3.3.1.2">superscript</csymbol><apply id="S2.SS3.p2.5.m5.3.3.1.2.2.cmml" xref="S2.SS3.p2.5.m5.3.3.1.2"><csymbol cd="ambiguous" id="S2.SS3.p2.5.m5.3.3.1.2.2.1.cmml" xref="S2.SS3.p2.5.m5.3.3.1.2">subscript</csymbol><sum id="S2.SS3.p2.5.m5.3.3.1.2.2.2.cmml" xref="S2.SS3.p2.5.m5.3.3.1.2.2.2"></sum><apply id="S2.SS3.p2.5.m5.3.3.1.2.2.3.cmml" xref="S2.SS3.p2.5.m5.3.3.1.2.2.3"><eq id="S2.SS3.p2.5.m5.3.3.1.2.2.3.1.cmml" xref="S2.SS3.p2.5.m5.3.3.1.2.2.3.1"></eq><ci id="S2.SS3.p2.5.m5.3.3.1.2.2.3.2.cmml" xref="S2.SS3.p2.5.m5.3.3.1.2.2.3.2">𝑘</ci><cn id="S2.SS3.p2.5.m5.3.3.1.2.2.3.3.cmml" type="integer" xref="S2.SS3.p2.5.m5.3.3.1.2.2.3.3">0</cn></apply></apply><apply id="S2.SS3.p2.5.m5.3.3.1.2.3.cmml" xref="S2.SS3.p2.5.m5.3.3.1.2.3"><minus id="S2.SS3.p2.5.m5.3.3.1.2.3.1.cmml" xref="S2.SS3.p2.5.m5.3.3.1.2.3.1"></minus><ci id="S2.SS3.p2.5.m5.3.3.1.2.3.2.cmml" xref="S2.SS3.p2.5.m5.3.3.1.2.3.2">𝑁</ci><cn id="S2.SS3.p2.5.m5.3.3.1.2.3.3.cmml" type="integer" xref="S2.SS3.p2.5.m5.3.3.1.2.3.3">1</cn></apply></apply><apply id="S2.SS3.p2.5.m5.3.3.1.1.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1"><times id="S2.SS3.p2.5.m5.3.3.1.1.2.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.2"></times><apply id="S2.SS3.p2.5.m5.3.3.1.1.3.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.3"><ci id="S2.SS3.p2.5.m5.3.3.1.1.3.1.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.3.1">⋅</ci><apply id="S2.SS3.p2.5.m5.3.3.1.1.3.2.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.3.2"><times id="S2.SS3.p2.5.m5.3.3.1.1.3.2.1.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.3.2.1"></times><ci id="S2.SS3.p2.5.m5.3.3.1.1.3.2.2.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.3.2.2">𝐴</ci><apply id="S2.SS3.p2.5.m5.3.3.1.1.3.2.3.1.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.3.2.3.2"><csymbol cd="latexml" id="S2.SS3.p2.5.m5.3.3.1.1.3.2.3.1.1.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.3.2.3.2.1">delimited-[]</csymbol><ci id="S2.SS3.p2.5.m5.2.2.cmml" xref="S2.SS3.p2.5.m5.2.2">𝑘</ci></apply></apply><ci id="S2.SS3.p2.5.m5.3.3.1.1.3.3.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.3.3">𝐵</ci></apply><apply id="S2.SS3.p2.5.m5.3.3.1.1.1.2.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1"><csymbol cd="latexml" id="S2.SS3.p2.5.m5.3.3.1.1.1.2.1.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.2">delimited-[]</csymbol><apply id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1"><csymbol cd="latexml" id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.2.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.2">modulo</csymbol><apply id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1"><minus id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.1.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.1"></minus><ci id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.2.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.2">𝑛</ci><ci id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.3.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.1.1.1.3">𝑘</ci></apply><ci id="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.3.cmml" xref="S2.SS3.p2.5.m5.3.3.1.1.1.1.1.3">𝑁</ci></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.5.m5.3c">C[n]=\sum_{k=0}^{N-1}A[k]\cdot B[(n-k)\mod N]</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.5.m5.3d">italic_C [ italic_n ] = ∑ start_POSTSUBSCRIPT italic_k = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N - 1 end_POSTSUPERSCRIPT italic_A [ italic_k ] ⋅ italic_B [ ( italic_n - italic_k ) roman_mod italic_N ]</annotation></semantics></math> where each element of <math alttext="\mathbf{C}" class="ltx_Math" display="inline" 
id="S2.SS3.p2.6.m6.1"><semantics id="S2.SS3.p2.6.m6.1a"><mi id="S2.SS3.p2.6.m6.1.1" xref="S2.SS3.p2.6.m6.1.1.cmml">𝐂</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.6.m6.1b"><ci id="S2.SS3.p2.6.m6.1.1.cmml" xref="S2.SS3.p2.6.m6.1.1">𝐂</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.6.m6.1c">\mathbf{C}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.6.m6.1d">bold_C</annotation></semantics></math> is obtained by multiplying the elements of <math alttext="\mathbf{A}" class="ltx_Math" display="inline" id="S2.SS3.p2.7.m7.1"><semantics id="S2.SS3.p2.7.m7.1a"><mi id="S2.SS3.p2.7.m7.1.1" xref="S2.SS3.p2.7.m7.1.1.cmml">𝐀</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.7.m7.1b"><ci id="S2.SS3.p2.7.m7.1.1.cmml" xref="S2.SS3.p2.7.m7.1.1">𝐀</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.7.m7.1c">\mathbf{A}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.7.m7.1d">bold_A</annotation></semantics></math> with the circularly shifted elements of <math alttext="\mathbf{B}" class="ltx_Math" display="inline" id="S2.SS3.p2.8.m8.1"><semantics id="S2.SS3.p2.8.m8.1a"><mi id="S2.SS3.p2.8.m8.1.1" xref="S2.SS3.p2.8.m8.1.1.cmml">𝐁</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.8.m8.1b"><ci id="S2.SS3.p2.8.m8.1.1.cmml" xref="S2.SS3.p2.8.m8.1.1">𝐁</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.8.m8.1c">\mathbf{B}</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.8.m8.1d">bold_B</annotation></semantics></math>, and then summing up. This process is repeated for each element <math alttext="n" class="ltx_Math" display="inline" id="S2.SS3.p2.9.m9.1"><semantics id="S2.SS3.p2.9.m9.1a"><mi id="S2.SS3.p2.9.m9.1.1" xref="S2.SS3.p2.9.m9.1.1.cmml">n</mi><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.9.m9.1b"><ci id="S2.SS3.p2.9.m9.1.1.cmml" xref="S2.SS3.p2.9.m9.1.1">𝑛</ci></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.9.m9.1c">n</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.9.m9.1d">italic_n</annotation></semantics></math> (<math alttext="0" class="ltx_Math" display="inline" id="S2.SS3.p2.10.m10.1"><semantics id="S2.SS3.p2.10.m10.1a"><mn id="S2.SS3.p2.10.m10.1.1" xref="S2.SS3.p2.10.m10.1.1.cmml">0</mn><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.10.m10.1b"><cn id="S2.SS3.p2.10.m10.1.1.cmml" type="integer" xref="S2.SS3.p2.10.m10.1.1">0</cn></annotation-xml></semantics></math> to <math alttext="N-1" class="ltx_Math" display="inline" id="S2.SS3.p2.11.m11.1"><semantics id="S2.SS3.p2.11.m11.1a"><mrow id="S2.SS3.p2.11.m11.1.1" xref="S2.SS3.p2.11.m11.1.1.cmml"><mi id="S2.SS3.p2.11.m11.1.1.2" xref="S2.SS3.p2.11.m11.1.1.2.cmml">N</mi><mo id="S2.SS3.p2.11.m11.1.1.1" xref="S2.SS3.p2.11.m11.1.1.1.cmml">−</mo><mn id="S2.SS3.p2.11.m11.1.1.3" xref="S2.SS3.p2.11.m11.1.1.3.cmml">1</mn></mrow><annotation-xml encoding="MathML-Content" id="S2.SS3.p2.11.m11.1b"><apply id="S2.SS3.p2.11.m11.1.1.cmml" xref="S2.SS3.p2.11.m11.1.1"><minus id="S2.SS3.p2.11.m11.1.1.1.cmml" xref="S2.SS3.p2.11.m11.1.1.1"></minus><ci id="S2.SS3.p2.11.m11.1.1.2.cmml" xref="S2.SS3.p2.11.m11.1.1.2">𝑁</ci><cn id="S2.SS3.p2.11.m11.1.1.3.cmml" type="integer" xref="S2.SS3.p2.11.m11.1.1.3">1</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S2.SS3.p2.11.m11.1c">N-1</annotation><annotation encoding="application/x-llamapun" id="S2.SS3.p2.11.m11.1d">italic_N - 1</annotation></semantics></math>). 
Circular convolution has commutativity and associativity properties, making it particularly effective in hierarchical reasoning tasks where manipulating structured information is critical.</p> </div> <div class="ltx_para" id="S2.SS3.p3"> <p class="ltx_p" id="S2.SS3.p3.1"><span class="ltx_text ltx_font_bold" id="S2.SS3.p3.1.1">Symbolic knowledge codebook.</span> Symbolic knowledge is typically represented as a set of codebooks for the attributes of interest (Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S2.F3" title="Figure 3 ‣ II-C VSA-Based Symbolic Operations ‣ II Neurosymbolic AI Background and Workload ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">3</span></a>). To describe an object with various attributes, a product vector can be computed by binding knowledge codebooks via circular convolution. Due to the properties of multiplicative binding, the co-activated VSA representations result in minimal interference, allowing each object to be accurately recovered. The query vector generated by the neural network is then compared with all codebook vectors to derive attributes for further reasoning. The codebook is typically on the order of tens to hundreds of MB, making it impractical to be cached on-chip in edge accelerators for complex tasks.</p> </div> <figure class="ltx_table" id="S2.T1"> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S2.T1.3.1.1" style="font-size:90%;">TABLE I</span>: </span><span class="ltx_text ltx_font_bold" id="S2.T1.4.2" style="font-size:90%;">Neurosymbolic models.<span class="ltx_text ltx_font_medium" id="S2.T1.4.2.1"> Selected neurosymbolic AI workloads for analysis, representing a diverse set of application scenarios.</span></span></figcaption> <div class="ltx_inline-block ltx_transformed_outer" id="S2.T1.5" style="width:433.6pt;height:83.4pt;vertical-align:-0.2pt;"><span class="ltx_transformed_inner" style="transform:translate(-1041.0pt,199.6pt) scale(0.172365769516782,0.172365769516782) ;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S2.T1.5.1.1.1"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" colspan="2" id="S2.T1.5.1.1.1.1" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.1.1.1.1"> <tr class="ltx_tr" id="S2.T1.5.1.1.1.1.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.1.1.1.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.1.1.1.1.1" style="font-size:298%;">Representative Neuro-</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.1.1.1.1.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.1.1.1.1.2.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.1.1.2.1.1" style="font-size:298%;">Symbolic AI Workloads</span></td> </tr> </table> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.1.1.2" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.1.1.2.1"> <tr class="ltx_tr" id="S2.T1.5.1.1.1.2.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.1.1.2.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.2.1.1.1.1" style="font-size:298%;">Neuro-Vector-Symbolic</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.1.1.2.1.2"> <td class="ltx_td ltx_nopad_r 
ltx_align_center" id="S2.T1.5.1.1.1.2.1.2.1" style="padding:2.25pt 5.0pt;"> <span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.2.1.2.1.1" style="font-size:298%;">Architecture </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.2.1.2.1.2.1" style="font-size:298%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib33" title="">33</a><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.2.1.2.1.3.2" style="font-size:298%;">]</span></cite> </td> </tr> </table> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.1.1.3" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.1.1.3.1"> <tr class="ltx_tr" id="S2.T1.5.1.1.1.3.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.1.1.3.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.3.1.1.1.1" style="font-size:298%;">Multiple-Input-Multiple-Output</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.1.1.3.1.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.1.1.3.1.2.1" style="padding:2.25pt 5.0pt;"> <span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.3.1.2.1.1" style="font-size:298%;">Neural Networks </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.3.1.2.1.2.1" style="font-size:298%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib60" title="">60</a><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.3.1.2.1.3.2" style="font-size:298%;">]</span></cite> </td> </tr> </table> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.1.1.4" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.1.1.4.1"> <tr class="ltx_tr" id="S2.T1.5.1.1.1.4.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.1.1.4.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.4.1.1.1.1" style="font-size:298%;">Probabilistic Abduction via Learning</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.1.1.4.1.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.1.1.4.1.2.1" style="padding:2.25pt 5.0pt;"> <span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.4.1.2.1.1" style="font-size:298%;">Rules in Vector-symbolic Architecture </span><cite class="ltx_cite ltx_citemacro_cite"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.4.1.2.1.2.1" style="font-size:298%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib32" title="">32</a><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.4.1.2.1.3.2" style="font-size:298%;">]</span></cite> </td> </tr> </table> </td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.5.1.1.1.5" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.1.1.5.1"> <tr class="ltx_tr" id="S2.T1.5.1.1.1.5.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.1.1.5.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.5.1.1.1.1" style="font-size:298%;">Probabilistic Abduction</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.1.1.5.1.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.1.1.5.1.2.1" style="padding:2.25pt 5.0pt;"> <span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.5.1.2.1.1" style="font-size:298%;">and Execution Learner </span><cite class="ltx_cite ltx_citemacro_cite"><span 
class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.5.1.2.1.2.1" style="font-size:298%;">[</span><a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib96" title="">96</a><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.1.1.5.1.2.1.3.2" style="font-size:298%;">]</span></cite> </td> </tr> </table> </td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.2.2"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" colspan="2" id="S2.T1.5.1.2.2.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.2.2.1.1" style="font-size:298%;">Abbreviation</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.2.2.2" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.2.2.2.1" style="font-size:298%;">NVSA</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.2.2.3" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.2.2.3.1" style="font-size:298%;">MIMONet</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.2.2.4" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.2.2.4.1" style="font-size:298%;">LVRF</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.5.1.2.2.5" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.2.2.5.1" style="font-size:298%;">PrAE</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.3.3"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" colspan="2" id="S2.T1.5.1.3.3.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.3.3.1.1" style="font-size:298%;">Learning Approach</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.3.3.2" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.3.3.2.1"> <tr class="ltx_tr" id="S2.T1.5.1.3.3.2.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.3.3.2.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.3.3.2.1.1.1.1" style="font-size:298%;">Supervised/Unsupervised</span></td> </tr> </table> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.3.3.3" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.3.3.3.1" style="font-size:298%;">Supervised</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.3.3.4" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.3.3.4.1" style="font-size:298%;">Supervised</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.5.1.3.3.5" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.3.3.5.1"> <tr class="ltx_tr" id="S2.T1.5.1.3.3.5.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.3.3.5.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.3.3.5.1.1.1.1" style="font-size:298%;">Supervised/Unsupervised</span></td> </tr> </table> </td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.4.4"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.4.4.1" rowspan="2" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.4.4.1.1" style="font-size:298%;"> <span class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.4.4.1.1.1"> <span class="ltx_tr" id="S2.T1.5.1.4.4.1.1.1.1"> <span class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.4.4.1.1.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" 
id="S2.T1.5.1.4.4.1.1.1.1.1.1">Compute</span></span></span> <span class="ltx_tr" id="S2.T1.5.1.4.4.1.1.1.2"> <span class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.4.4.1.1.1.2.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.4.4.1.1.1.2.1.1">Pattern</span></span></span> </span></span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.4.4.2" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.4.4.2.1" style="font-size:298%;">Neuro</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.4.4.3" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.4.4.3.1" style="font-size:298%;">CNN</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.4.4.4" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.4.4.4.1" style="font-size:298%;">CNN/Transformer</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.4.4.5" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.4.4.5.1" style="font-size:298%;">CNN</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.5.1.4.4.6" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.4.4.6.1" style="font-size:298%;">CNN</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.5.5"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.5.5.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.5.5.1.1" style="font-size:298%;">Symbolic</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.5.5.2" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.5.5.2.1" style="font-size:298%;">VSA binding/unbinding (Circular Conv)</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.5.5.3" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.5.5.3.1" style="font-size:298%;">VSA binding (Circular Conv)</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.5.5.4" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.5.5.4.1" style="font-size:298%;">VSA binding/unbinding (Circular Conv)</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.5.1.5.5.5" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.5.5.5.1" style="font-size:298%;">Probabilistic abduction</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.6.6"> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S2.T1.5.1.6.6.1" rowspan="3" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.6.6.1.1" style="font-size:298%;"> <span class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.6.6.1.1.1"> <span class="ltx_tr" id="S2.T1.5.1.6.6.1.1.1.1"> <span class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.6.6.1.1.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.6.6.1.1.1.1.1.1">Application</span></span></span> <span class="ltx_tr" id="S2.T1.5.1.6.6.1.1.1.2"> <span class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.6.6.1.1.1.2.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.6.6.1.1.1.2.1.1">Scenario</span></span></span> </span></span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.6.6.2" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.6.6.2.1" 
style="font-size:298%;">Use Case</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.6.6.3" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.6.6.3.1"> <tr class="ltx_tr" id="S2.T1.5.1.6.6.3.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.6.6.3.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.6.6.3.1.1.1.1" style="font-size:298%;">Spatial-temporal reasoning, Fluid</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.6.6.3.1.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.6.6.3.1.2.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.6.6.3.1.2.1.1" style="font-size:298%;">intelligence, Abstract reasoning</span></td> </tr> </table> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.6.6.4" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.6.6.4.1"> <tr class="ltx_tr" id="S2.T1.5.1.6.6.4.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.6.6.4.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.6.6.4.1.1.1.1" style="font-size:298%;">Multi-input simultaneously processing</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.6.6.4.1.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.6.6.4.1.2.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.6.6.4.1.2.1.1" style="font-size:298%;">with single CNN/Transformer</span></td> </tr> </table> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S2.T1.5.1.6.6.5" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.6.6.5.1"> <tr class="ltx_tr" id="S2.T1.5.1.6.6.5.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.6.6.5.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.6.6.5.1.1.1.1" style="font-size:298%;">Probabilistic reasoning, Analogy reasoning,</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.6.6.5.1.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.6.6.5.1.2.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.6.6.5.1.2.1.1" style="font-size:298%;">Out-of-distribution (OOD) data processing</span></td> </tr> </table> </td> <td class="ltx_td ltx_align_center ltx_border_t" id="S2.T1.5.1.6.6.6" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.6.6.6.1"> <tr class="ltx_tr" id="S2.T1.5.1.6.6.6.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.6.6.6.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.6.6.6.1.1.1.1" style="font-size:298%;">Spatial-temporal reasoning, Fluid</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.6.6.6.1.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.6.6.6.1.2.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.6.6.6.1.2.1.1" style="font-size:298%;">intelligence, Abstract reasoning</span></td> </tr> </table> </td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.7.7"> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_r ltx_border_t" id="S2.T1.5.1.7.7.1" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.7.7.1.1"> <tr class="ltx_tr" id="S2.T1.5.1.7.7.1.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.7.7.1.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.7.7.1.1.1.1.1" style="font-size:298%;">Advantage 
vs.</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.7.7.1.1.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.7.7.1.1.2.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text ltx_font_bold" id="S2.T1.5.1.7.7.1.1.2.1.1" style="font-size:298%;">Neural Model</span></td> </tr> </table> </td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_r ltx_border_t" id="S2.T1.5.1.7.7.2" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.7.7.2.1"> <tr class="ltx_tr" id="S2.T1.5.1.7.7.2.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.7.7.2.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.7.7.2.1.1.1.1" style="font-size:298%;">Higher joint representation efficiency,</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.7.7.2.1.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.7.7.2.1.2.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.7.7.2.1.2.1.1" style="font-size:298%;">Better reasoning capability, Transparency</span></td> </tr> </table> </td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_r ltx_border_t" id="S2.T1.5.1.7.7.3" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.7.7.3.1"> <tr class="ltx_tr" id="S2.T1.5.1.7.7.3.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.7.7.3.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.7.7.3.1.1.1.1" style="font-size:298%;">Higher throughput, Lower latency,</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.7.7.3.1.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.7.7.3.1.2.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.7.7.3.1.2.1.1" style="font-size:298%;">Compositional compute, Transparency</span></td> </tr> </table> </td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_r ltx_border_t" id="S2.T1.5.1.7.7.4" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.7.7.4.1"> <tr class="ltx_tr" id="S2.T1.5.1.7.7.4.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.7.7.4.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.7.7.4.1.1.1.1" style="font-size:298%;">Stronger OOD handling capability, One-pass</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.7.7.4.1.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.7.7.4.1.2.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.7.7.4.1.2.1.1" style="font-size:298%;">learning, Higher flexibility, Transparency</span></td> </tr> </table> </td> <td class="ltx_td ltx_align_center ltx_border_bb ltx_border_t" id="S2.T1.5.1.7.7.5" style="padding:2.25pt 5.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S2.T1.5.1.7.7.5.1"> <tr class="ltx_tr" id="S2.T1.5.1.7.7.5.1.1"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.7.7.5.1.1.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.7.7.5.1.1.1.1" style="font-size:298%;">Higher generalization, Transparency,</span></td> </tr> <tr class="ltx_tr" id="S2.T1.5.1.7.7.5.1.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S2.T1.5.1.7.7.5.1.2.1" style="padding:2.25pt 5.0pt;"><span class="ltx_text" id="S2.T1.5.1.7.7.5.1.2.1.1" style="font-size:298%;">Interpretability, Robustness</span></td> </tr> </table> </td> </tr> </tbody> </table> </span></div> </figure> </section> <section class="ltx_subsection" id="S2.SS4"> <h3 class="ltx_title ltx_title_subsection"> <span 
class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S2.SS4.5.1.1">II-D</span> </span><span class="ltx_text ltx_font_italic" id="S2.SS4.6.2">Representative Neurosymbolic AI Models</span> </h3> <div class="ltx_para" id="S2.SS4.p1"> <p class="ltx_p" id="S2.SS4.p1.1">Following the flow in Fig.<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S1.F2" title="Figure 2 ‣ I Introduction ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">2</span></a>, we analyze four VSA-based neurosymbolic workloads in detail: NVSA<cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib33" title="">33</a>]</cite> for spatial-temporal reasoning, MIMONet for multi-input simultaneous processing <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib60" title="">60</a>]</cite>, LVRF for probabilistic abduction <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib32" title="">32</a>]</cite>, and PrAE for abstract reasoning <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib96" title="">96</a>]</cite>. These workloads achieve state-of-the-art performance and unlock advanced reasoning capabilities. Our goal is to understand their system and architectural challenges to enable scalable neurosymbolic deployment, where latency and efficiency are critical factors.</p> </div> <figure class="ltx_figure" id="S2.F4"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="333" id="S2.F4.g1" src="x4.png" width="1702"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S2.F4.7.1.1" style="font-size:90%;">Figure 4</span>: </span><span class="ltx_text ltx_font_bold" id="S2.F4.8.2" style="font-size:90%;">End-to-end neurosymbolic runtime, memory, and roofline characterization.<span class="ltx_text ltx_font_medium" id="S2.F4.8.2.1"> </span>(a)<span class="ltx_text ltx_font_medium" id="S2.F4.8.2.2"> Benchmark neurosymbolic models on CPU+GPU system, showing symbolic may serve as system bottleneck. </span>(b)<span class="ltx_text ltx_font_medium" id="S2.F4.8.2.3"> Benchmark neurosymbolic models on Coral TPU, TX2, NX, and 2080Ti GPU, showing that real-time performance cannot be satisfied. </span>(c)<span class="ltx_text ltx_font_medium" id="S2.F4.8.2.4"> Benchmark models on various task sizes, indicating the potential scalability problem. </span>(d)<span class="ltx_text ltx_font_medium" id="S2.F4.8.2.5"> Benchmark memory footprint of neurosymbolic models, showing large memory overhead of symbolic knowledge codebook.</span></span></figcaption> </figure> <div class="ltx_para" id="S2.SS4.p2"> <p class="ltx_p" id="S2.SS4.p2.1">Tab. 
<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S2.T1" title="TABLE I ‣ II-C VSA-Based Symbolic Operations ‣ II Neurosymbolic AI Background and Workload ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">I</span></a> lists the details of selected representative workloads:</p> </div> <section class="ltx_subsubsection" id="S2.SS4.SSS1"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection"><span class="ltx_text" id="S2.SS4.SSS1.5.1.1">II-D</span>1 </span>Neuro-Vector-Symbolic Architecture (NVSA)</h4> <div class="ltx_para" id="S2.SS4.SSS1.p1"> <p class="ltx_p" id="S2.SS4.SSS1.p1.1">NVSA is a neurosymbolic system advancing spatial-temporal abduction reasoning <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib33" title="">33</a>]</cite>. Its <em class="ltx_emph ltx_font_italic" id="S2.SS4.SSS1.p1.1.1">neural</em> module handles visual perception, while the <em class="ltx_emph ltx_font_italic" id="S2.SS4.SSS1.p1.1.2">symbolic</em> module uses VSA-based operations for probabilistic inference, symbolic rule reasoning, and execution. NVSA bypasses the superposition catastrophe <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib67" title="">67</a>]</cite> and surpasses neural-only methods, achieving human-level performance on key fluid intelligence reasoning tests <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib95" title="">95</a>]</cite>.</p> </div> </section> <section class="ltx_subsubsection" id="S2.SS4.SSS2"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection"><span class="ltx_text" id="S2.SS4.SSS2.5.1.1">II-D</span>2 </span>Multiple-Input-Multiple-Output Networks (MIMONet)</h4> <div class="ltx_para" id="S2.SS4.SSS2.p1"> <p class="ltx_p" id="S2.SS4.SSS2.p1.1">MIMONet is a neurosymbolic model designed to handle multiple inputs and reduce computational cost per input <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib60" title="">60</a>]</cite>. Its <span class="ltx_text ltx_font_italic" id="S2.SS4.SSS2.p1.1.1">neural</span> modules use CNN/Transformer architectures, while its <span class="ltx_text ltx_font_italic" id="S2.SS4.SSS2.p1.1.2">symbolic</span> modules employ VSA binding/unbinding for encoding/decoding, enabling computation in superposition. 
MIMONet achieves 2-4<math alttext="\times" class="ltx_Math" display="inline" id="S2.SS4.SSS2.p1.1.m1.1"><semantics id="S2.SS4.SSS2.p1.1.m1.1a"><mo id="S2.SS4.SSS2.p1.1.m1.1.1" xref="S2.SS4.SSS2.p1.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S2.SS4.SSS2.p1.1.m1.1b"><times id="S2.SS4.SSS2.p1.1.m1.1.1.cmml" xref="S2.SS4.SSS2.p1.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S2.SS4.SSS2.p1.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S2.SS4.SSS2.p1.1.m1.1d">×</annotation></semantics></math> speedup with higher accuracy on LRA benchmarks compared to neural-only methods <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib82" title="">82</a>]</cite>.</p> </div> </section> <section class="ltx_subsubsection" id="S2.SS4.SSS3"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection"><span class="ltx_text" id="S2.SS4.SSS3.5.1.1">II-D</span>3 </span>Probabilistic Abduction via Learning Rules in Vector-symbolic Architectures (LVRF)</h4> <div class="ltx_para" id="S2.SS4.SSS3.p1"> <p class="ltx_p" id="S2.SS4.SSS3.p1.1">LVRF is a neurosymbolic architecture for visual reasoning and handling out-of-distribution data <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib32" title="">32</a>]</cite>. Its <span class="ltx_text ltx_font_italic" id="S2.SS4.SSS3.p1.1.1">neural</span> modules handle visual perception, while <span class="ltx_text ltx_font_italic" id="S2.SS4.SSS3.p1.1.2">symbolic</span> modules use VSA for probabilistic abduction reasoning. LVRF outperforms neural-only methods in unseen reasoning tasks <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib36" title="">36</a>]</cite>, offering greater flexibility and interpretability.</p> </div> </section> <section class="ltx_subsubsection" id="S2.SS4.SSS4"> <h4 class="ltx_title ltx_title_subsubsection"> <span class="ltx_tag ltx_tag_subsubsection"><span class="ltx_text" id="S2.SS4.SSS4.5.1.1">II-D</span>4 </span>Probabilistic Abduction and Execution (PrAE) Learner</h4> <div class="ltx_para" id="S2.SS4.SSS4.p1"> <p class="ltx_p" id="S2.SS4.SSS4.p1.1">PrAE is a neurosymbolic learner for spatial-temporal cognitive reasoning <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib96" title="">96</a>]</cite>. Its <span class="ltx_text ltx_font_italic" id="S2.SS4.SSS4.p1.1.1">neural</span> modules handle visual perception and produce scene representations, while the <span class="ltx_text ltx_font_italic" id="S2.SS4.SSS4.p1.1.2">symbolic</span> modules conduct probabilistic reasoning and abduct rules. 
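As a rough toy example of what probabilistic rule abduction over attribute distributions can look like (our simplified illustration only, not the actual PrAE or LVRF formulation), candidate rules can be scored by how well each rule's prediction from the first two panels of a row matches the observed distribution of the third:
<pre class="ltx_verbatim ltx_font_typewriter">
import numpy as np

def predict_constant(p1, p2):
    # "Constant" rule: the attribute keeps the same value across the row.
    pred = p1 * p2
    return pred / (pred.sum() + 1e-12)

def predict_arithmetic(p1, p2):
    # "Arithmetic plus" rule: value3 = value1 + value2 (distribution of a sum).
    pred = np.convolve(p1, p2)[:p1.size]
    return pred / (pred.sum() + 1e-12)

def abduce_rule(p1, p2, p3):
    rules = {"constant": predict_constant, "arithmetic_plus": predict_arithmetic}
    scores = {name: float(np.dot(fn(p1, p2), p3)) for name, fn in rules.items()}
    return max(scores, key=scores.get), scores

# Attribute (e.g., object count) distributions for one row of an RPM-style puzzle.
p1 = np.array([0.05, 0.85, 0.05, 0.03, 0.02])   # almost surely value 1
p2 = np.array([0.04, 0.06, 0.82, 0.05, 0.03])   # almost surely value 2
p3 = np.array([0.02, 0.03, 0.05, 0.85, 0.05])   # almost surely value 3 = 1 + 2
print(abduce_rule(p1, p2, p3)[0])   # arithmetic_plus should win here
</pre>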
PrAE offers human-level generalizability, transparency, and interpretability, which classic neural networks struggle to achieve.</p> </div> </section> </section> </section> <section class="ltx_section" id="S3"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">III </span><span class="ltx_text ltx_font_smallcaps" id="S3.1.1">Neurosymbolic AI System Profiling</span> </h2> <div class="ltx_para" id="S3.p1"> <p class="ltx_p" id="S3.p1.1">Building upon a prior profiling study <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib87" title="">87</a>]</cite>, this section characterizes the system behavior of various vector-symbolic-based neurosymbolic models (Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S3.SS1" title="III-A Profiling Setup ‣ III Neurosymbolic AI System Profiling ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">III-A</span></span></a>-<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S3.SS4" title="III-D Symbolic Operation and Inefficiency Analysis ‣ III Neurosymbolic AI System Profiling ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">III-D</span></span></a>), and provides workload insights for computer architects (Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S3.SS5" title="III-E Unique Characteristics of Neurosymbolic vs ML Workloads ‣ III Neurosymbolic AI System Profiling ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">III-E</span></span></a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S3.SS6" title="III-F Identified Opportunities for Neurosymbolic Optimization ‣ III Neurosymbolic AI System Profiling ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">III-F</span></span></a>).</p> </div> <section class="ltx_subsection" id="S3.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S3.SS1.5.1.1">III-A</span> </span><span class="ltx_text ltx_font_italic" id="S3.SS1.6.2">Profiling Setup</span> </h3> <div class="ltx_para" id="S3.SS1.p1"> <p class="ltx_p" id="S3.SS1.p1.1">To understand the real-device efficiency of neurosymbolic AI workloads, we profile four representative models as elaborated in Sec. 
<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S2.SS4" title="II-D Representative Neurosymbolic AI Models ‣ II Neurosymbolic AI Background and Workload ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">II-D</span></span></a>, in terms of runtime, memory, and compute operators, for solving cognitive reasoning problems on four devices, including Coral edge TPU (4 W), Jetson TX2 (15 W), Xavier NX (20 W), and RTX 2080Ti (250 W), respectively.</p> </div> </section> <section class="ltx_subsection" id="S3.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S3.SS2.5.1.1">III-B</span> </span><span class="ltx_text ltx_font_italic" id="S3.SS2.6.2">Compute Latency Analysis</span> </h3> <div class="ltx_para" id="S3.SS2.p1"> <p class="ltx_p" id="S3.SS2.p1.1"><span class="ltx_text ltx_font_bold" id="S3.SS2.p1.1.1">End-to-end latency breakdown.</span> Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S2.F4" title="Figure 4 ‣ II-D Representative Neurosymbolic AI Models ‣ II Neurosymbolic AI Background and Workload ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">4</span></a><span class="ltx_text" id="S3.SS2.p1.1.2" style="color:#0000FF;">a</span> and Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S2.F4" title="Figure 4 ‣ II-D Representative Neurosymbolic AI Models ‣ II Neurosymbolic AI Background and Workload ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">4</span></a><span class="ltx_text" id="S3.SS2.p1.1.3" style="color:#0000FF;">b</span> illustrate the end-to-end latency breakdown of four neurosymbolic workloads. We can observe that (1) <span class="ltx_text ltx_font_italic" id="S3.SS2.p1.1.4">The real-time performance cannot be satisfied</span> on all four devices. Even if more computing resources are available to reduce NN runtime, the significant overhead of symbolic reasoning still prohibits real-time execution. (2) <span class="ltx_text ltx_font_italic" id="S3.SS2.p1.1.5">Symbolic operations consistently dominate runtime.</span> For example, the symbolic modules count for 87% of total NVSA inference time while its floating-point operations (FLOPS) count for only 19% of total NVSA FLOPS, indicating that the symbolic operations may not be well executed by GPU/TPU. 
(3) <span class="ltx_text ltx_font_italic" id="S3.SS2.p1.1.6">Symbolic reasoning computation lies on the critical path</span> due to its dependence on the neuro workloads.</p> </div> <figure class="ltx_figure" id="S3.F6"> <div class="ltx_flex_figure"> <div class="ltx_flex_cell ltx_flex_size_2"> <figure class="ltx_figure ltx_figure_panel ltx_minipage ltx_align_center ltx_align_bottom" id="S3.F6.1" style="width:203.8pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="624" id="S3.F6.1.g1" src="x5.png" width="706"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S3.F6.1.2.1.1" style="font-size:90%;">Figure 5</span>: </span><span class="ltx_text ltx_font_bold" id="S3.F6.1.3.2" style="font-size:90%;">Roofline analysis.<span class="ltx_text ltx_font_medium" id="S3.F6.1.3.2.1"> End-to-end neurosymbolic roofline characterization on RTX 2080Ti GPU, indicating that typically neuro is compute-bounded and symbolic is memory-bounded.</span></span></figcaption> </figure> </div> <div class="ltx_flex_cell ltx_flex_size_2"> <figure class="ltx_figure ltx_figure_panel ltx_minipage ltx_align_center ltx_align_bottom" id="S3.F6.2" style="width:212.5pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="605" id="S3.F6.2.g1" src="x6.png" width="697"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S3.F6.2.2.1.1" style="font-size:90%;">Figure 6</span>: </span><span class="ltx_text ltx_font_bold" id="S3.F6.2.3.2" style="font-size:90%;">Symbolic operation analysis.<span class="ltx_text ltx_font_medium" id="S3.F6.2.3.2.1"> Symbolic operations are dominated by vector-symbolic circular convolution and vector-vector multiplication stemming from hypervector representations.</span></span></figcaption> </figure> </div> </div> </figure> <figure class="ltx_table" id="S3.T2"> <figcaption class="ltx_caption ltx_centering" style="font-size:70%;"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S3.T2.5.1.1" style="font-size:129%;">TABLE II</span>: </span><span class="ltx_text ltx_font_bold" id="S3.T2.6.2" style="font-size:129%;">Hardware inefficiency analysis.<span class="ltx_text ltx_font_medium" id="S3.T2.6.2.1"> The compute, memory, and communication characteristics of representative neural and symbolic kernels on CPU/GPU platform.</span></span></figcaption> <div class="ltx_inline-block ltx_align_center ltx_transformed_outer" id="S3.T2.7" style="width:433.6pt;height:275.1pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="transform:translate(89.1pt,-56.5pt) scale(1.69793479431174,1.69793479431174) ;"> <table class="ltx_tabular ltx_guessed_headers ltx_align_middle" id="S3.T2.7.1"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S3.T2.7.1.1.1"> <th class="ltx_td ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S3.T2.7.1.1.1.1" rowspan="2" style="padding:0.2pt 2.3pt;"></th> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" colspan="2" id="S3.T2.7.1.1.1.2" style="padding:0.2pt 2.3pt;"><span class="ltx_text ltx_font_bold" id="S3.T2.7.1.1.1.2.1" style="font-size:70%;">Neural Kernel</span></td> <td class="ltx_td ltx_align_center ltx_border_t" colspan="2" id="S3.T2.7.1.1.1.3" style="padding:0.2pt 2.3pt;"><span class="ltx_text ltx_font_bold" id="S3.T2.7.1.1.1.3.1" style="font-size:70%;">Symbolic Kernel</span></td> </tr> <tr class="ltx_tr" id="S3.T2.7.1.2.2"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" 
id="S3.T2.7.1.2.2.1" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.2.2.1.1" style="font-size:70%;">sgemm_nn</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.7.1.2.2.2" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.2.2.2.1" style="font-size:70%;">relu_nn</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.7.1.2.2.3" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.2.2.3.1" style="font-size:70%;">vectorized_elem</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.7.1.2.2.4" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.2.2.4.1" style="font-size:70%;">elementwise</span></td> </tr> <tr class="ltx_tr" id="S3.T2.7.1.3.3"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S3.T2.7.1.3.3.1" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.3.3.1.1" style="font-size:70%;">Compute Throughput (%)</span></th> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.7.1.3.3.2" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.3.3.2.1" style="font-size:70%;">95.1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.7.1.3.3.3" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.3.3.3.1" style="font-size:70%;">92.9</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.7.1.3.3.4" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.3.3.4.1" style="font-size:70%;">3.0</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.7.1.3.3.5" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.3.3.5.1" style="font-size:70%;">2.3</span></td> </tr> <tr class="ltx_tr" id="S3.T2.7.1.4.4"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S3.T2.7.1.4.4.1" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.4.4.1.1" style="font-size:70%;">ALU Utilization (%)</span></th> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.7.1.4.4.2" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.4.4.2.1" style="font-size:70%;">90.1</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.7.1.4.4.3" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.4.4.3.1" style="font-size:70%;">48.3</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.7.1.4.4.4" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.4.4.4.1" style="font-size:70%;">5.9</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.7.1.4.4.5" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.4.4.5.1" style="font-size:70%;">4.5</span></td> </tr> <tr class="ltx_tr" id="S3.T2.7.1.5.5"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S3.T2.7.1.5.5.1" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.5.5.1.1" style="font-size:70%;">L1 Cache Throughput (%)</span></th> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.7.1.5.5.2" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.5.5.2.1" style="font-size:70%;">79.7</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.7.1.5.5.3" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.5.5.3.1" style="font-size:70%;">82.6</span></td> <td class="ltx_td 
ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.7.1.5.5.4" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.5.5.4.1" style="font-size:70%;">28.4</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.7.1.5.5.5" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.5.5.5.1" style="font-size:70%;">10.8</span></td> </tr> <tr class="ltx_tr" id="S3.T2.7.1.6.6"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r" id="S3.T2.7.1.6.6.1" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.6.6.1.1" style="font-size:70%;">L2 Cache Throughput (%)</span></th> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T2.7.1.6.6.2" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.6.6.2.1" style="font-size:70%;">19.2</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T2.7.1.6.6.3" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.6.6.3.1" style="font-size:70%;">17.5</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T2.7.1.6.6.4" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.6.6.4.1" style="font-size:70%;">29.8</span></td> <td class="ltx_td ltx_align_center" id="S3.T2.7.1.6.6.5" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.6.6.5.1" style="font-size:70%;">22.8</span></td> </tr> <tr class="ltx_tr" id="S3.T2.7.1.7.7"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r ltx_border_t" id="S3.T2.7.1.7.7.1" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.7.7.1.1" style="font-size:70%;">L1 Cache Hit Rate (%)</span></th> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.7.1.7.7.2" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.7.7.2.1" style="font-size:70%;">1.6</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.7.1.7.7.3" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.7.7.3.1" style="font-size:70%;">51.6</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S3.T2.7.1.7.7.4" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.7.7.4.1" style="font-size:70%;">29.5</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S3.T2.7.1.7.7.5" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.7.7.5.1" style="font-size:70%;">33.3</span></td> </tr> <tr class="ltx_tr" id="S3.T2.7.1.8.8"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_r" id="S3.T2.7.1.8.8.1" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.8.8.1.1" style="font-size:70%;">L2 Cache Hit Rate (%)</span></th> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T2.7.1.8.8.2" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.8.8.2.1" style="font-size:70%;">86.8</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T2.7.1.8.8.3" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.8.8.3.1" style="font-size:70%;">65.5</span></td> <td class="ltx_td ltx_align_center ltx_border_r" id="S3.T2.7.1.8.8.4" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.8.8.4.1" style="font-size:70%;">48.6</span></td> <td class="ltx_td ltx_align_center" id="S3.T2.7.1.8.8.5" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.8.8.5.1" style="font-size:70%;">34.3</span></td> </tr> <tr class="ltx_tr" id="S3.T2.7.1.9.9"> <th class="ltx_td ltx_align_left ltx_th ltx_th_row ltx_border_b ltx_border_r ltx_border_t" 
id="S3.T2.7.1.9.9.1" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.9.9.1.1" style="font-size:70%;">DRAM BW Utilization (%)</span></th> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S3.T2.7.1.9.9.2" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.9.9.2.1" style="font-size:70%;">14.9</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S3.T2.7.1.9.9.3" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.9.9.3.1" style="font-size:70%;">24.2</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S3.T2.7.1.9.9.4" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.9.9.4.1" style="font-size:70%;">90.9</span></td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_t" id="S3.T2.7.1.9.9.5" style="padding:0.2pt 2.3pt;"><span class="ltx_text" id="S3.T2.7.1.9.9.5.1" style="font-size:70%;">78.4</span></td> </tr> </tbody> </table> </span></div> </figure> <figure class="ltx_figure" id="S3.F7"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="368" id="S3.F7.g1" src="x7.png" width="1702"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S3.F7.7.1.1" style="font-size:90%;">Figure 7</span>: </span><span class="ltx_text ltx_font_bold" id="S3.F7.8.2" style="font-size:90%;">CogSys system overview.<span class="ltx_text ltx_font_medium" id="S3.F7.8.2.1"> CogSys is an algorithm-hardware co-design framework for neurosymbolic AI with the <span class="ltx_text ltx_framed ltx_framed_underline" id="S3.F7.8.2.1.1">goal</span> to achieve efficient and scalable human-fluid intelligence and cognition systems. CogSys addresses the <span class="ltx_text ltx_framed ltx_framed_underline" id="S3.F7.8.2.1.2">challenges</span> of redundant storage, symbolic and vector operation latency bottleneck, sequential processing and hardware underutilization, by proposing <span class="ltx_text ltx_framed ltx_framed_underline" id="S3.F7.8.2.1.3">methodologies</span> including efficient factorization, reconfigurable PE, efficient dataflow, mapping, scalable architecture, multi-level parallelism and scheduler. CogSys is <span class="ltx_text ltx_framed ltx_framed_underline" id="S3.F7.8.2.1.4">deployed</span> across cognitive applications and consistently demonstrates performance-efficiency-accuracy improvements for neurosymbolic systems. </span></span></figcaption> </figure> <div class="ltx_para" id="S3.SS2.p2"> <p class="ltx_p" id="S3.SS2.p2.3"><span class="ltx_text ltx_font_bold" id="S3.SS2.p2.3.1">End-to-end latency scalability.</span> Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S2.F4" title="Figure 4 ‣ II-D Representative Neurosymbolic AI Models ‣ II Neurosymbolic AI Background and Workload ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">4</span></a><span class="ltx_text" id="S3.SS2.p2.3.2" style="color:#0000FF;">c</span> indicates that the neuro vs. symbolic runtime proportion remains relatively stable across various tasks and sizes. 
For example, when the Raven's Progressive Matrices (RPM) [95] task size increases from 2×2 to 3×3, the runtime share of the NVSA symbolic modules changes from 91.6% to 87.4%, while the total runtime increases by 5.02× on average across 14 test scenarios, indicating the scalability bottleneck of neurosymbolic models.

III-C Memory and System Analysis

Memory footprint. Fig. 4d characterizes the memory footprint of neurosymbolic workloads. We observe that (1) neural weights and symbolic codebooks typically consume a large storage footprint, because neural perception can express far more object combinations than the vector space has dimensions, requiring the codebook to be large enough to ensure vector quasi-orthogonality; and (2) symbolic modules consume large memory because a large number of vector operations depend on intermediate results and exhaustive search.

System roofline analysis. Fig. 6 employs the roofline model of the RTX 2080Ti GPU to quantify the neurosymbolic workloads. We observe that symbolic modules are memory-bound while neuro modules are compute-bound. This is mainly because symbolic operations require streaming vector elements, which increases memory bandwidth pressure and results in hardware underutilization (Sec. III-D).
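To make the roofline classification concrete, the following is a minimal sketch (not from the paper) that estimates arithmetic intensity and compares it against a GPU's machine balance. The peak-compute and bandwidth figures are assumed values roughly in the range of an RTX 2080Ti-class GPU, and the per-kernel FLOP and byte counts are illustrative placeholders rather than measured data.

```python
# Minimal roofline-style classification sketch (illustrative, not measured data).
# Assumed machine parameters, roughly in the range of an RTX 2080Ti-class GPU.
PEAK_FLOPS = 13.4e12      # assumed FP32 peak, FLOP/s
PEAK_BW    = 616e9        # assumed DRAM bandwidth, byte/s
MACHINE_BALANCE = PEAK_FLOPS / PEAK_BW   # FLOP/byte at the roofline ridge point

def attainable_gflops(flops, bytes_moved):
    """Attainable performance under the simple roofline model."""
    ai = flops / bytes_moved                      # arithmetic intensity (FLOP/byte)
    perf = min(PEAK_FLOPS, ai * PEAK_BW)          # compute roof vs. memory roof
    bound = "compute-bound" if ai >= MACHINE_BALANCE else "memory-bound"
    return ai, perf / 1e9, bound

# Hypothetical kernels: a GEMM-like neural kernel reuses each byte many times,
# while a VSA element-wise symbolic kernel touches each byte only once or twice.
for name, flops, nbytes in [
    ("neural GEMM (1024^3)",      2 * 1024**3, 3 * 4 * 1024**2),  # high reuse
    ("symbolic element-wise op",  2 * 2**20,   3 * 4 * 2**20),    # ~0.17 FLOP/byte
]:
    ai, gflops, bound = attainable_gflops(flops, nbytes)
    print(f"{name}: AI={ai:.2f} FLOP/B, attainable={gflops:.1f} GFLOP/s, {bound}")
```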
<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S3.T2" title="TABLE II ‣ III-B Compute Latency Analysis ‣ III Neurosymbolic AI System Profiling ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">II</span></a>. <em class="ltx_emph ltx_font_italic" id="S3.SS4.p2.1.2">The system inefficiencies mainly come from ALU underutilization, low cache hit rate, and massive data movement of symbolic operations.</em> Symbolic data transfer accounts for half of total latency, where <math alttext="&gt;" class="ltx_Math" display="inline" id="S3.SS4.p2.1.m1.1"><semantics id="S3.SS4.p2.1.m1.1a"><mo id="S3.SS4.p2.1.m1.1.1" xref="S3.SS4.p2.1.m1.1.1.cmml">&gt;</mo><annotation-xml encoding="MathML-Content" id="S3.SS4.p2.1.m1.1b"><gt id="S3.SS4.p2.1.m1.1.1.cmml" xref="S3.SS4.p2.1.m1.1.1"></gt></annotation-xml><annotation encoding="application/x-tex" id="S3.SS4.p2.1.m1.1c">&gt;</annotation><annotation encoding="application/x-llamapun" id="S3.SS4.p2.1.m1.1d">&gt;</annotation></semantics></math>80% is from host to GPU, while neural kernels exhibit high utilization.</p> </div> </section> <section class="ltx_subsection" id="S3.SS5"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S3.SS5.5.1.1">III-E</span> </span><span class="ltx_text ltx_font_italic" id="S3.SS5.6.2">Unique Characteristics of Neurosymbolic vs ML Workloads</span> </h3> <div class="ltx_para" id="S3.SS5.p1"> <p class="ltx_p" id="S3.SS5.p1.1">To summarize, based on above characterization, neurosymbolic AI differs from ML workloads mainly in three aspects:</p> </div> <div class="ltx_para" id="S3.SS5.p2"> <p class="ltx_p" id="S3.SS5.p2.1"><span class="ltx_text ltx_font_bold" id="S3.SS5.p2.1.1">Compute kernels.</span> Neurosymbolic workloads consist of heterogeneous neural and symbolic kernels. Symbolic operations execute inefficiently on CPU/GPU/TPU with low hardware utilization and cache hit rate, resulting in latency bottleneck.</p> </div> <div class="ltx_para" id="S3.SS5.p3"> <p class="ltx_p" id="S3.SS5.p3.1"><span class="ltx_text ltx_font_bold" id="S3.SS5.p3.1.1">Memory.</span> Symbolic operations are memory-bounded due to large element streaming for vector-symbolic operations. Symbolic codebooks typically account for large memory footprints and require large intermediate caching during computation.</p> </div> <div class="ltx_para" id="S3.SS5.p4"> <p class="ltx_p" id="S3.SS5.p4.1"><span class="ltx_text ltx_font_bold" id="S3.SS5.p4.1.1">Dataflow and scalability.</span> Neurosymbolic workloads exhibit more complex control than CNNs. Symbolic modules typically have irregular dataflow, data dependency, and sequential processing, bringing low parallelism scalability and inefficiency.</p> </div> </section> <section class="ltx_subsection" id="S3.SS6"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S3.SS6.5.1.1">III-F</span> </span><span class="ltx_text ltx_font_italic" id="S3.SS6.6.2">Identified Opportunities for Neurosymbolic Optimization</span> </h3> <div class="ltx_para" id="S3.SS6.p1"> <p class="ltx_p" id="S3.SS6.p1.1">While neurosymbolic AI holds great promise, addressing its inefficiencies is critical for achieving real-time, scalable deployment and ensuring long-term development. 
Symbolic hardware inefficiency analysis. To further quantify the sources of hardware inefficiency, we analyze the behavior of the compute and memory units for representative neuro and symbolic kernels, as listed in Tab. II. The system inefficiencies mainly come from ALU underutilization, low cache hit rates, and the massive data movement of symbolic operations. Symbolic data transfer accounts for half of the total latency, of which >80% is from host to GPU, while neural kernels exhibit high utilization.

III-E Unique Characteristics of Neurosymbolic vs ML Workloads

To summarize, based on the above characterization, neurosymbolic AI differs from ML workloads mainly in three aspects:

Compute kernels. Neurosymbolic workloads consist of heterogeneous neural and symbolic kernels. Symbolic operations execute inefficiently on CPUs/GPUs/TPUs with low hardware utilization and cache hit rates, resulting in a latency bottleneck.

Memory. Symbolic operations are memory-bound due to the large number of elements streamed for vector-symbolic operations. Symbolic codebooks typically account for a large memory footprint and require substantial intermediate caching during computation.

Dataflow and scalability. Neurosymbolic workloads exhibit more complex control flow than CNNs. Symbolic modules typically have irregular dataflow, data dependencies, and sequential processing, which limit parallelism, scalability, and efficiency.

III-F Identified Opportunities for Neurosymbolic Optimization

While neurosymbolic AI holds great promise, addressing its inefficiencies is critical for achieving real-time, scalable deployment and ensuring long-term development. To this end, we propose CogSys, an algorithm-hardware co-design framework designed to enhance both reasoning energy efficiency and accuracy in neurosymbolic AI (Fig. 7). At the algorithm level, hardware-friendly codebook optimization reduces memory footprint and latency (Sec. IV). At the hardware level, the architecture and dataflow must be efficient for VSA operations and reconfigurable for neuro/symbolic kernels (Sec. V). At the system level, the architecture must efficiently and adaptively schedule diverse neurosymbolic workloads (Sec. VI). CogSys consistently demonstrates improvements in performance, efficiency, and accuracy across reasoning applications (Sec. VII).

IV CogSys: Algorithm Optimization

This section presents our proposed CogSys algorithm optimizations for efficient and scalable neurosymbolic systems. We propose an efficient vector-symbolic factorization strategy to reduce the large memory footprint (Sec. IV-A), and explore stochasticity injection and low-precision operation to accelerate neurosymbolic systems (Sec. IV-B).
<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S4.SS2" title="IV-B Stochasticity and Low-Precision Operation ‣ IV CogSys: Algorithm Optimization ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">IV-B</span></span></a>).</p> </div> <section class="ltx_subsection" id="S4.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S4.SS1.5.1.1">IV-A</span> </span><span class="ltx_text ltx_font_italic" id="S4.SS1.6.2">Symbolic Factorization Strategy</span> </h3> <div class="ltx_para" id="S4.SS1.p1"> <p class="ltx_p" id="S4.SS1.p1.1"><span class="ltx_text ltx_font_bold" id="S4.SS1.p1.1.1">Overall pipeline.</span> To address the large memory footprint of symbolic codebooks (Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S3.SS3" title="III-C Memory and System Analysis ‣ III Neurosymbolic AI System Profiling ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">III-C</span></span></a>), we propose an efficient factorization strategy. The key idea is to disentangle the large volume of object combination vectors in symbolic knowledge codebook into the small volume of basis attribute vectors, thus lowering computational time and space complexity (Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S4.F8" title="Figure 8 ‣ IV-A Symbolic Factorization Strategy ‣ IV CogSys: Algorithm Optimization ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">8</span></a>).</p> </div> <div class="ltx_para" id="S4.SS1.p2"> <p class="ltx_p" id="S4.SS1.p2.8">Specifically, given an entangled query vector <math alttext="\boldsymbol{q}" class="ltx_Math" display="inline" id="S4.SS1.p2.1.m1.1"><semantics id="S4.SS1.p2.1.m1.1a"><mi id="S4.SS1.p2.1.m1.1.1" xref="S4.SS1.p2.1.m1.1.1.cmml">𝒒</mi><annotation-xml encoding="MathML-Content" id="S4.SS1.p2.1.m1.1b"><ci id="S4.SS1.p2.1.m1.1.1.cmml" xref="S4.SS1.p2.1.m1.1.1">𝒒</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.SS1.p2.1.m1.1c">\boldsymbol{q}</annotation><annotation encoding="application/x-llamapun" id="S4.SS1.p2.1.m1.1d">bold_italic_q</annotation></semantics></math> (e.g., scene representation) generated from neural module and the set of symbolic codebooks <math alttext="\{X^{i}\}_{i=1}^{F}" class="ltx_Math" display="inline" id="S4.SS1.p2.2.m2.1"><semantics id="S4.SS1.p2.2.m2.1a"><msubsup id="S4.SS1.p2.2.m2.1.1" xref="S4.SS1.p2.2.m2.1.1.cmml"><mrow id="S4.SS1.p2.2.m2.1.1.1.1.1" xref="S4.SS1.p2.2.m2.1.1.1.1.2.cmml"><mo id="S4.SS1.p2.2.m2.1.1.1.1.1.2" stretchy="false" xref="S4.SS1.p2.2.m2.1.1.1.1.2.cmml">{</mo><msup id="S4.SS1.p2.2.m2.1.1.1.1.1.1" xref="S4.SS1.p2.2.m2.1.1.1.1.1.1.cmml"><mi id="S4.SS1.p2.2.m2.1.1.1.1.1.1.2" xref="S4.SS1.p2.2.m2.1.1.1.1.1.1.2.cmml">X</mi><mi id="S4.SS1.p2.2.m2.1.1.1.1.1.1.3" xref="S4.SS1.p2.2.m2.1.1.1.1.1.1.3.cmml">i</mi></msup><mo id="S4.SS1.p2.2.m2.1.1.1.1.1.3" stretchy="false" xref="S4.SS1.p2.2.m2.1.1.1.1.2.cmml">}</mo></mrow><mrow id="S4.SS1.p2.2.m2.1.1.1.3" xref="S4.SS1.p2.2.m2.1.1.1.3.cmml"><mi id="S4.SS1.p2.2.m2.1.1.1.3.2" xref="S4.SS1.p2.2.m2.1.1.1.3.2.cmml">i</mi><mo id="S4.SS1.p2.2.m2.1.1.1.3.1" xref="S4.SS1.p2.2.m2.1.1.1.3.1.cmml">=</mo><mn id="S4.SS1.p2.2.m2.1.1.1.3.3" xref="S4.SS1.p2.2.m2.1.1.1.3.3.cmml">1</mn></mrow><mi 
id="S4.SS1.p2.2.m2.1.1.3" xref="S4.SS1.p2.2.m2.1.1.3.cmml">F</mi></msubsup><annotation-xml encoding="MathML-Content" id="S4.SS1.p2.2.m2.1b"><apply id="S4.SS1.p2.2.m2.1.1.cmml" xref="S4.SS1.p2.2.m2.1.1"><csymbol cd="ambiguous" id="S4.SS1.p2.2.m2.1.1.2.cmml" xref="S4.SS1.p2.2.m2.1.1">superscript</csymbol><apply id="S4.SS1.p2.2.m2.1.1.1.cmml" xref="S4.SS1.p2.2.m2.1.1"><csymbol cd="ambiguous" id="S4.SS1.p2.2.m2.1.1.1.2.cmml" xref="S4.SS1.p2.2.m2.1.1">subscript</csymbol><set id="S4.SS1.p2.2.m2.1.1.1.1.2.cmml" xref="S4.SS1.p2.2.m2.1.1.1.1.1"><apply id="S4.SS1.p2.2.m2.1.1.1.1.1.1.cmml" xref="S4.SS1.p2.2.m2.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S4.SS1.p2.2.m2.1.1.1.1.1.1.1.cmml" xref="S4.SS1.p2.2.m2.1.1.1.1.1.1">superscript</csymbol><ci id="S4.SS1.p2.2.m2.1.1.1.1.1.1.2.cmml" xref="S4.SS1.p2.2.m2.1.1.1.1.1.1.2">𝑋</ci><ci id="S4.SS1.p2.2.m2.1.1.1.1.1.1.3.cmml" xref="S4.SS1.p2.2.m2.1.1.1.1.1.1.3">𝑖</ci></apply></set><apply id="S4.SS1.p2.2.m2.1.1.1.3.cmml" xref="S4.SS1.p2.2.m2.1.1.1.3"><eq id="S4.SS1.p2.2.m2.1.1.1.3.1.cmml" xref="S4.SS1.p2.2.m2.1.1.1.3.1"></eq><ci id="S4.SS1.p2.2.m2.1.1.1.3.2.cmml" xref="S4.SS1.p2.2.m2.1.1.1.3.2">𝑖</ci><cn id="S4.SS1.p2.2.m2.1.1.1.3.3.cmml" type="integer" xref="S4.SS1.p2.2.m2.1.1.1.3.3">1</cn></apply></apply><ci id="S4.SS1.p2.2.m2.1.1.3.cmml" xref="S4.SS1.p2.2.m2.1.1.3">𝐹</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS1.p2.2.m2.1c">\{X^{i}\}_{i=1}^{F}</annotation><annotation encoding="application/x-llamapun" id="S4.SS1.p2.2.m2.1d">{ italic_X start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_F end_POSTSUPERSCRIPT</annotation></semantics></math> (each with <math alttext="M" class="ltx_Math" display="inline" id="S4.SS1.p2.3.m3.1"><semantics id="S4.SS1.p2.3.m3.1a"><mi id="S4.SS1.p2.3.m3.1.1" xref="S4.SS1.p2.3.m3.1.1.cmml">M</mi><annotation-xml encoding="MathML-Content" id="S4.SS1.p2.3.m3.1b"><ci id="S4.SS1.p2.3.m3.1.1.cmml" xref="S4.SS1.p2.3.m3.1.1">𝑀</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.SS1.p2.3.m3.1c">M</annotation><annotation encoding="application/x-llamapun" id="S4.SS1.p2.3.m3.1d">italic_M</annotation></semantics></math> possible solutions and <math alttext="F" class="ltx_Math" display="inline" id="S4.SS1.p2.4.m4.1"><semantics id="S4.SS1.p2.4.m4.1a"><mi id="S4.SS1.p2.4.m4.1.1" xref="S4.SS1.p2.4.m4.1.1.cmml">F</mi><annotation-xml encoding="MathML-Content" id="S4.SS1.p2.4.m4.1b"><ci id="S4.SS1.p2.4.m4.1.1.cmml" xref="S4.SS1.p2.4.m4.1.1">𝐹</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.SS1.p2.4.m4.1c">F</annotation><annotation encoding="application/x-llamapun" id="S4.SS1.p2.4.m4.1d">italic_F</annotation></semantics></math> codebooks in total), this creates a combinatorial search and storage involving <math alttext="M^{F}" class="ltx_Math" display="inline" id="S4.SS1.p2.5.m5.1"><semantics id="S4.SS1.p2.5.m5.1a"><msup id="S4.SS1.p2.5.m5.1.1" xref="S4.SS1.p2.5.m5.1.1.cmml"><mi id="S4.SS1.p2.5.m5.1.1.2" xref="S4.SS1.p2.5.m5.1.1.2.cmml">M</mi><mi id="S4.SS1.p2.5.m5.1.1.3" xref="S4.SS1.p2.5.m5.1.1.3.cmml">F</mi></msup><annotation-xml encoding="MathML-Content" id="S4.SS1.p2.5.m5.1b"><apply id="S4.SS1.p2.5.m5.1.1.cmml" xref="S4.SS1.p2.5.m5.1.1"><csymbol cd="ambiguous" id="S4.SS1.p2.5.m5.1.1.1.cmml" xref="S4.SS1.p2.5.m5.1.1">superscript</csymbol><ci id="S4.SS1.p2.5.m5.1.1.2.cmml" xref="S4.SS1.p2.5.m5.1.1.2">𝑀</ci><ci id="S4.SS1.p2.5.m5.1.1.3.cmml" 
xref="S4.SS1.p2.5.m5.1.1.3">𝐹</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS1.p2.5.m5.1c">M^{F}</annotation><annotation encoding="application/x-llamapun" id="S4.SS1.p2.5.m5.1d">italic_M start_POSTSUPERSCRIPT italic_F end_POSTSUPERSCRIPT</annotation></semantics></math> vectors. Instead of searching along all possible combinations, our factorization method iteratively searches in superposition to find the valid <math alttext="\boldsymbol{\hat{x}}^{i}\in X^{i}" class="ltx_Math" display="inline" id="S4.SS1.p2.6.m6.1"><semantics id="S4.SS1.p2.6.m6.1a"><mrow id="S4.SS1.p2.6.m6.1.1" xref="S4.SS1.p2.6.m6.1.1.cmml"><msup id="S4.SS1.p2.6.m6.1.1.2" xref="S4.SS1.p2.6.m6.1.1.2.cmml"><mover accent="true" id="S4.SS1.p2.6.m6.1.1.2.2" xref="S4.SS1.p2.6.m6.1.1.2.2.cmml"><mi id="S4.SS1.p2.6.m6.1.1.2.2.2" xref="S4.SS1.p2.6.m6.1.1.2.2.2.cmml">𝒙</mi><mo class="ltx_mathvariant_bold" id="S4.SS1.p2.6.m6.1.1.2.2.1" mathvariant="bold" xref="S4.SS1.p2.6.m6.1.1.2.2.1.cmml">^</mo></mover><mi id="S4.SS1.p2.6.m6.1.1.2.3" xref="S4.SS1.p2.6.m6.1.1.2.3.cmml">i</mi></msup><mo id="S4.SS1.p2.6.m6.1.1.1" xref="S4.SS1.p2.6.m6.1.1.1.cmml">∈</mo><msup id="S4.SS1.p2.6.m6.1.1.3" xref="S4.SS1.p2.6.m6.1.1.3.cmml"><mi id="S4.SS1.p2.6.m6.1.1.3.2" xref="S4.SS1.p2.6.m6.1.1.3.2.cmml">X</mi><mi id="S4.SS1.p2.6.m6.1.1.3.3" xref="S4.SS1.p2.6.m6.1.1.3.3.cmml">i</mi></msup></mrow><annotation-xml encoding="MathML-Content" id="S4.SS1.p2.6.m6.1b"><apply id="S4.SS1.p2.6.m6.1.1.cmml" xref="S4.SS1.p2.6.m6.1.1"><in id="S4.SS1.p2.6.m6.1.1.1.cmml" xref="S4.SS1.p2.6.m6.1.1.1"></in><apply id="S4.SS1.p2.6.m6.1.1.2.cmml" xref="S4.SS1.p2.6.m6.1.1.2"><csymbol cd="ambiguous" id="S4.SS1.p2.6.m6.1.1.2.1.cmml" xref="S4.SS1.p2.6.m6.1.1.2">superscript</csymbol><apply id="S4.SS1.p2.6.m6.1.1.2.2.cmml" xref="S4.SS1.p2.6.m6.1.1.2.2"><ci id="S4.SS1.p2.6.m6.1.1.2.2.1.cmml" xref="S4.SS1.p2.6.m6.1.1.2.2.1">bold-^</ci><ci id="S4.SS1.p2.6.m6.1.1.2.2.2.cmml" xref="S4.SS1.p2.6.m6.1.1.2.2.2">𝒙</ci></apply><ci id="S4.SS1.p2.6.m6.1.1.2.3.cmml" xref="S4.SS1.p2.6.m6.1.1.2.3">𝑖</ci></apply><apply id="S4.SS1.p2.6.m6.1.1.3.cmml" xref="S4.SS1.p2.6.m6.1.1.3"><csymbol cd="ambiguous" id="S4.SS1.p2.6.m6.1.1.3.1.cmml" xref="S4.SS1.p2.6.m6.1.1.3">superscript</csymbol><ci id="S4.SS1.p2.6.m6.1.1.3.2.cmml" xref="S4.SS1.p2.6.m6.1.1.3.2">𝑋</ci><ci id="S4.SS1.p2.6.m6.1.1.3.3.cmml" xref="S4.SS1.p2.6.m6.1.1.3.3">𝑖</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS1.p2.6.m6.1c">\boldsymbol{\hat{x}}^{i}\in X^{i}</annotation><annotation encoding="application/x-llamapun" id="S4.SS1.p2.6.m6.1d">overbold_^ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT ∈ italic_X start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT</annotation></semantics></math> such that the estimated vector <math alttext="\boldsymbol{\hat{q}}=\boldsymbol{\hat{x}}^{1}\odot\boldsymbol{\hat{x}}^{2}% \odot\dots\odot\boldsymbol{\hat{x}}^{f}" class="ltx_Math" display="inline" id="S4.SS1.p2.7.m7.1"><semantics id="S4.SS1.p2.7.m7.1a"><mrow id="S4.SS1.p2.7.m7.1.1" xref="S4.SS1.p2.7.m7.1.1.cmml"><mover accent="true" id="S4.SS1.p2.7.m7.1.1.2" xref="S4.SS1.p2.7.m7.1.1.2.cmml"><mi id="S4.SS1.p2.7.m7.1.1.2.2" xref="S4.SS1.p2.7.m7.1.1.2.2.cmml">𝒒</mi><mo class="ltx_mathvariant_bold" id="S4.SS1.p2.7.m7.1.1.2.1" mathvariant="bold" xref="S4.SS1.p2.7.m7.1.1.2.1.cmml">^</mo></mover><mo id="S4.SS1.p2.7.m7.1.1.1" xref="S4.SS1.p2.7.m7.1.1.1.cmml">=</mo><mrow id="S4.SS1.p2.7.m7.1.1.3" xref="S4.SS1.p2.7.m7.1.1.3.cmml"><msup id="S4.SS1.p2.7.m7.1.1.3.2" 
xref="S4.SS1.p2.7.m7.1.1.3.2.cmml"><mover accent="true" id="S4.SS1.p2.7.m7.1.1.3.2.2" xref="S4.SS1.p2.7.m7.1.1.3.2.2.cmml"><mi id="S4.SS1.p2.7.m7.1.1.3.2.2.2" xref="S4.SS1.p2.7.m7.1.1.3.2.2.2.cmml">𝒙</mi><mo class="ltx_mathvariant_bold" id="S4.SS1.p2.7.m7.1.1.3.2.2.1" mathvariant="bold" xref="S4.SS1.p2.7.m7.1.1.3.2.2.1.cmml">^</mo></mover><mn id="S4.SS1.p2.7.m7.1.1.3.2.3" xref="S4.SS1.p2.7.m7.1.1.3.2.3.cmml">1</mn></msup><mo id="S4.SS1.p2.7.m7.1.1.3.1" lspace="0.222em" rspace="0.222em" xref="S4.SS1.p2.7.m7.1.1.3.1.cmml">⊙</mo><msup id="S4.SS1.p2.7.m7.1.1.3.3" xref="S4.SS1.p2.7.m7.1.1.3.3.cmml"><mover accent="true" id="S4.SS1.p2.7.m7.1.1.3.3.2" xref="S4.SS1.p2.7.m7.1.1.3.3.2.cmml"><mi id="S4.SS1.p2.7.m7.1.1.3.3.2.2" xref="S4.SS1.p2.7.m7.1.1.3.3.2.2.cmml">𝒙</mi><mo class="ltx_mathvariant_bold" id="S4.SS1.p2.7.m7.1.1.3.3.2.1" mathvariant="bold" xref="S4.SS1.p2.7.m7.1.1.3.3.2.1.cmml">^</mo></mover><mn id="S4.SS1.p2.7.m7.1.1.3.3.3" xref="S4.SS1.p2.7.m7.1.1.3.3.3.cmml">2</mn></msup><mo id="S4.SS1.p2.7.m7.1.1.3.1a" lspace="0.222em" rspace="0.222em" xref="S4.SS1.p2.7.m7.1.1.3.1.cmml">⊙</mo><mi id="S4.SS1.p2.7.m7.1.1.3.4" mathvariant="normal" xref="S4.SS1.p2.7.m7.1.1.3.4.cmml">⋯</mi><mo id="S4.SS1.p2.7.m7.1.1.3.1b" lspace="0.222em" rspace="0.222em" xref="S4.SS1.p2.7.m7.1.1.3.1.cmml">⊙</mo><msup id="S4.SS1.p2.7.m7.1.1.3.5" xref="S4.SS1.p2.7.m7.1.1.3.5.cmml"><mover accent="true" id="S4.SS1.p2.7.m7.1.1.3.5.2" xref="S4.SS1.p2.7.m7.1.1.3.5.2.cmml"><mi id="S4.SS1.p2.7.m7.1.1.3.5.2.2" xref="S4.SS1.p2.7.m7.1.1.3.5.2.2.cmml">𝒙</mi><mo class="ltx_mathvariant_bold" id="S4.SS1.p2.7.m7.1.1.3.5.2.1" mathvariant="bold" xref="S4.SS1.p2.7.m7.1.1.3.5.2.1.cmml">^</mo></mover><mi id="S4.SS1.p2.7.m7.1.1.3.5.3" xref="S4.SS1.p2.7.m7.1.1.3.5.3.cmml">f</mi></msup></mrow></mrow><annotation-xml encoding="MathML-Content" id="S4.SS1.p2.7.m7.1b"><apply id="S4.SS1.p2.7.m7.1.1.cmml" xref="S4.SS1.p2.7.m7.1.1"><eq id="S4.SS1.p2.7.m7.1.1.1.cmml" xref="S4.SS1.p2.7.m7.1.1.1"></eq><apply id="S4.SS1.p2.7.m7.1.1.2.cmml" xref="S4.SS1.p2.7.m7.1.1.2"><ci id="S4.SS1.p2.7.m7.1.1.2.1.cmml" xref="S4.SS1.p2.7.m7.1.1.2.1">bold-^</ci><ci id="S4.SS1.p2.7.m7.1.1.2.2.cmml" xref="S4.SS1.p2.7.m7.1.1.2.2">𝒒</ci></apply><apply id="S4.SS1.p2.7.m7.1.1.3.cmml" xref="S4.SS1.p2.7.m7.1.1.3"><csymbol cd="latexml" id="S4.SS1.p2.7.m7.1.1.3.1.cmml" xref="S4.SS1.p2.7.m7.1.1.3.1">direct-product</csymbol><apply id="S4.SS1.p2.7.m7.1.1.3.2.cmml" xref="S4.SS1.p2.7.m7.1.1.3.2"><csymbol cd="ambiguous" id="S4.SS1.p2.7.m7.1.1.3.2.1.cmml" xref="S4.SS1.p2.7.m7.1.1.3.2">superscript</csymbol><apply id="S4.SS1.p2.7.m7.1.1.3.2.2.cmml" xref="S4.SS1.p2.7.m7.1.1.3.2.2"><ci id="S4.SS1.p2.7.m7.1.1.3.2.2.1.cmml" xref="S4.SS1.p2.7.m7.1.1.3.2.2.1">bold-^</ci><ci id="S4.SS1.p2.7.m7.1.1.3.2.2.2.cmml" xref="S4.SS1.p2.7.m7.1.1.3.2.2.2">𝒙</ci></apply><cn id="S4.SS1.p2.7.m7.1.1.3.2.3.cmml" type="integer" xref="S4.SS1.p2.7.m7.1.1.3.2.3">1</cn></apply><apply id="S4.SS1.p2.7.m7.1.1.3.3.cmml" xref="S4.SS1.p2.7.m7.1.1.3.3"><csymbol cd="ambiguous" id="S4.SS1.p2.7.m7.1.1.3.3.1.cmml" xref="S4.SS1.p2.7.m7.1.1.3.3">superscript</csymbol><apply id="S4.SS1.p2.7.m7.1.1.3.3.2.cmml" xref="S4.SS1.p2.7.m7.1.1.3.3.2"><ci id="S4.SS1.p2.7.m7.1.1.3.3.2.1.cmml" xref="S4.SS1.p2.7.m7.1.1.3.3.2.1">bold-^</ci><ci id="S4.SS1.p2.7.m7.1.1.3.3.2.2.cmml" xref="S4.SS1.p2.7.m7.1.1.3.3.2.2">𝒙</ci></apply><cn id="S4.SS1.p2.7.m7.1.1.3.3.3.cmml" type="integer" xref="S4.SS1.p2.7.m7.1.1.3.3.3">2</cn></apply><ci id="S4.SS1.p2.7.m7.1.1.3.4.cmml" xref="S4.SS1.p2.7.m7.1.1.3.4">⋯</ci><apply id="S4.SS1.p2.7.m7.1.1.3.5.cmml" 
xref="S4.SS1.p2.7.m7.1.1.3.5"><csymbol cd="ambiguous" id="S4.SS1.p2.7.m7.1.1.3.5.1.cmml" xref="S4.SS1.p2.7.m7.1.1.3.5">superscript</csymbol><apply id="S4.SS1.p2.7.m7.1.1.3.5.2.cmml" xref="S4.SS1.p2.7.m7.1.1.3.5.2"><ci id="S4.SS1.p2.7.m7.1.1.3.5.2.1.cmml" xref="S4.SS1.p2.7.m7.1.1.3.5.2.1">bold-^</ci><ci id="S4.SS1.p2.7.m7.1.1.3.5.2.2.cmml" xref="S4.SS1.p2.7.m7.1.1.3.5.2.2">𝒙</ci></apply><ci id="S4.SS1.p2.7.m7.1.1.3.5.3.cmml" xref="S4.SS1.p2.7.m7.1.1.3.5.3">𝑓</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS1.p2.7.m7.1c">\boldsymbol{\hat{q}}=\boldsymbol{\hat{x}}^{1}\odot\boldsymbol{\hat{x}}^{2}% \odot\dots\odot\boldsymbol{\hat{x}}^{f}</annotation><annotation encoding="application/x-llamapun" id="S4.SS1.p2.7.m7.1d">overbold_^ start_ARG bold_italic_q end_ARG = overbold_^ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ⊙ overbold_^ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⊙ ⋯ ⊙ overbold_^ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT</annotation></semantics></math> resembles with the highest similarity to the input query <math alttext="\boldsymbol{q}" class="ltx_Math" display="inline" id="S4.SS1.p2.8.m8.1"><semantics id="S4.SS1.p2.8.m8.1a"><mi id="S4.SS1.p2.8.m8.1.1" xref="S4.SS1.p2.8.m8.1.1.cmml">𝒒</mi><annotation-xml encoding="MathML-Content" id="S4.SS1.p2.8.m8.1b"><ci id="S4.SS1.p2.8.m8.1.1.cmml" xref="S4.SS1.p2.8.m8.1.1">𝒒</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.SS1.p2.8.m8.1c">\boldsymbol{q}</annotation><annotation encoding="application/x-llamapun" id="S4.SS1.p2.8.m8.1d">bold_italic_q</annotation></semantics></math>. By exploiting the quasi-orthogonality of the vectors, our factorization module is able to rapidly search through the various combinations in superposition by iteratively unbinding all but one of the factors from the product vector, and then projecting it into the space of possible solutions of the considered factor that is used for the following reasoning procedure. In this way, we can replace the original symbolic codebook and greatly reduce its storage.</p> </div> <div class="ltx_para" id="S4.SS1.p3"> <p class="ltx_p" id="S4.SS1.p3.1"><span class="ltx_text ltx_font_bold" id="S4.SS1.p3.1.1">Detailed steps.</span> Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S4.F8" title="Figure 8 ‣ IV-A Symbolic Factorization Strategy ‣ IV CogSys: Algorithm Optimization ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">8</span></a> illustrates our symbolic knowledge codebook factorization strategy, consisting of three steps:</p> </div> <figure class="ltx_figure" id="S4.F8"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="419" id="S4.F8.g1" src="x8.png" width="830"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S4.F8.3.1.1" style="font-size:90%;">Figure 8</span>: </span><span class="ltx_text ltx_font_bold" id="S4.F8.4.2" style="font-size:90%;">Proposed symbolic codebook factorization strategy<span class="ltx_text ltx_font_medium" id="S4.F8.4.2.1">. 
Detailed steps. Fig. 8 illustrates our symbolic knowledge codebook factorization strategy, consisting of three steps (a combined code sketch follows Step 3):

Figure 8: Proposed symbolic codebook factorization strategy. The efficient factorization technique quickly factorizes a product vector in an iterative manner and significantly reduces the memory footprint of the symbolic codebook when decomposing query vectors.

Step 1: Factor unbinding via element-wise multiplication ($\oslash$). For a given factor, unbinding is performed by taking the product vector $\boldsymbol{q}$ and unbinding the contribution of the other factors' latest estimates: $\boldsymbol{\tilde{x}}^{i}(t)=\boldsymbol{q}\oslash\prod_{f=1}^{F}\boldsymbol{\hat{x}}^{f}(t)$ ($f\neq i$).
style="color:#FFFFFF;">2</span></foreignobject></g></g></svg>: Similarity search via matrix–vector multiplication. The similarity vector <math alttext="\boldsymbol{\alpha}^{f}(t)" class="ltx_Math" display="inline" id="S4.SS1.p5.2.m1.1"><semantics id="S4.SS1.p5.2.m1.1a"><mrow id="S4.SS1.p5.2.m1.1.2" xref="S4.SS1.p5.2.m1.1.2.cmml"><msup id="S4.SS1.p5.2.m1.1.2.2" xref="S4.SS1.p5.2.m1.1.2.2.cmml"><mi id="S4.SS1.p5.2.m1.1.2.2.2" xref="S4.SS1.p5.2.m1.1.2.2.2.cmml">𝜶</mi><mi id="S4.SS1.p5.2.m1.1.2.2.3" xref="S4.SS1.p5.2.m1.1.2.2.3.cmml">f</mi></msup><mo id="S4.SS1.p5.2.m1.1.2.1" xref="S4.SS1.p5.2.m1.1.2.1.cmml">⁢</mo><mrow id="S4.SS1.p5.2.m1.1.2.3.2" xref="S4.SS1.p5.2.m1.1.2.cmml"><mo id="S4.SS1.p5.2.m1.1.2.3.2.1" stretchy="false" xref="S4.SS1.p5.2.m1.1.2.cmml">(</mo><mi id="S4.SS1.p5.2.m1.1.1" xref="S4.SS1.p5.2.m1.1.1.cmml">t</mi><mo id="S4.SS1.p5.2.m1.1.2.3.2.2" stretchy="false" xref="S4.SS1.p5.2.m1.1.2.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S4.SS1.p5.2.m1.1b"><apply id="S4.SS1.p5.2.m1.1.2.cmml" xref="S4.SS1.p5.2.m1.1.2"><times id="S4.SS1.p5.2.m1.1.2.1.cmml" xref="S4.SS1.p5.2.m1.1.2.1"></times><apply id="S4.SS1.p5.2.m1.1.2.2.cmml" xref="S4.SS1.p5.2.m1.1.2.2"><csymbol cd="ambiguous" id="S4.SS1.p5.2.m1.1.2.2.1.cmml" xref="S4.SS1.p5.2.m1.1.2.2">superscript</csymbol><ci id="S4.SS1.p5.2.m1.1.2.2.2.cmml" xref="S4.SS1.p5.2.m1.1.2.2.2">𝜶</ci><ci id="S4.SS1.p5.2.m1.1.2.2.3.cmml" xref="S4.SS1.p5.2.m1.1.2.2.3">𝑓</ci></apply><ci id="S4.SS1.p5.2.m1.1.1.cmml" xref="S4.SS1.p5.2.m1.1.1">𝑡</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS1.p5.2.m1.1c">\boldsymbol{\alpha}^{f}(t)</annotation><annotation encoding="application/x-llamapun" id="S4.SS1.p5.2.m1.1d">bold_italic_α start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT ( italic_t )</annotation></semantics></math> is calculated for each unbound estimate: <math alttext="\boldsymbol{\alpha}^{f}(t)=\boldsymbol{\tilde{x}}^{f}(t)\cdot X^{f}" class="ltx_Math" display="inline" id="S4.SS1.p5.3.m2.2"><semantics id="S4.SS1.p5.3.m2.2a"><mrow id="S4.SS1.p5.3.m2.2.3" xref="S4.SS1.p5.3.m2.2.3.cmml"><mrow id="S4.SS1.p5.3.m2.2.3.2" xref="S4.SS1.p5.3.m2.2.3.2.cmml"><msup id="S4.SS1.p5.3.m2.2.3.2.2" xref="S4.SS1.p5.3.m2.2.3.2.2.cmml"><mi id="S4.SS1.p5.3.m2.2.3.2.2.2" xref="S4.SS1.p5.3.m2.2.3.2.2.2.cmml">𝜶</mi><mi id="S4.SS1.p5.3.m2.2.3.2.2.3" xref="S4.SS1.p5.3.m2.2.3.2.2.3.cmml">f</mi></msup><mo id="S4.SS1.p5.3.m2.2.3.2.1" xref="S4.SS1.p5.3.m2.2.3.2.1.cmml">⁢</mo><mrow id="S4.SS1.p5.3.m2.2.3.2.3.2" xref="S4.SS1.p5.3.m2.2.3.2.cmml"><mo id="S4.SS1.p5.3.m2.2.3.2.3.2.1" stretchy="false" xref="S4.SS1.p5.3.m2.2.3.2.cmml">(</mo><mi id="S4.SS1.p5.3.m2.1.1" xref="S4.SS1.p5.3.m2.1.1.cmml">t</mi><mo id="S4.SS1.p5.3.m2.2.3.2.3.2.2" stretchy="false" xref="S4.SS1.p5.3.m2.2.3.2.cmml">)</mo></mrow></mrow><mo id="S4.SS1.p5.3.m2.2.3.1" xref="S4.SS1.p5.3.m2.2.3.1.cmml">=</mo><mrow id="S4.SS1.p5.3.m2.2.3.3" xref="S4.SS1.p5.3.m2.2.3.3.cmml"><mrow id="S4.SS1.p5.3.m2.2.3.3.2" xref="S4.SS1.p5.3.m2.2.3.3.2.cmml"><msup id="S4.SS1.p5.3.m2.2.3.3.2.2" xref="S4.SS1.p5.3.m2.2.3.3.2.2.cmml"><mover accent="true" id="S4.SS1.p5.3.m2.2.3.3.2.2.2" xref="S4.SS1.p5.3.m2.2.3.3.2.2.2.cmml"><mi id="S4.SS1.p5.3.m2.2.3.3.2.2.2.2" xref="S4.SS1.p5.3.m2.2.3.3.2.2.2.2.cmml">𝒙</mi><mo class="ltx_mathvariant_bold" id="S4.SS1.p5.3.m2.2.3.3.2.2.2.1" mathvariant="bold" xref="S4.SS1.p5.3.m2.2.3.3.2.2.2.1.cmml">~</mo></mover><mi id="S4.SS1.p5.3.m2.2.3.3.2.2.3" xref="S4.SS1.p5.3.m2.2.3.3.2.2.3.cmml">f</mi></msup><mo id="S4.SS1.p5.3.m2.2.3.3.2.1" 
xref="S4.SS1.p5.3.m2.2.3.3.2.1.cmml">⁢</mo><mrow id="S4.SS1.p5.3.m2.2.3.3.2.3.2" xref="S4.SS1.p5.3.m2.2.3.3.2.cmml"><mo id="S4.SS1.p5.3.m2.2.3.3.2.3.2.1" stretchy="false" xref="S4.SS1.p5.3.m2.2.3.3.2.cmml">(</mo><mi id="S4.SS1.p5.3.m2.2.2" xref="S4.SS1.p5.3.m2.2.2.cmml">t</mi><mo id="S4.SS1.p5.3.m2.2.3.3.2.3.2.2" rspace="0.055em" stretchy="false" xref="S4.SS1.p5.3.m2.2.3.3.2.cmml">)</mo></mrow></mrow><mo id="S4.SS1.p5.3.m2.2.3.3.1" rspace="0.222em" xref="S4.SS1.p5.3.m2.2.3.3.1.cmml">⋅</mo><msup id="S4.SS1.p5.3.m2.2.3.3.3" xref="S4.SS1.p5.3.m2.2.3.3.3.cmml"><mi id="S4.SS1.p5.3.m2.2.3.3.3.2" xref="S4.SS1.p5.3.m2.2.3.3.3.2.cmml">X</mi><mi id="S4.SS1.p5.3.m2.2.3.3.3.3" xref="S4.SS1.p5.3.m2.2.3.3.3.3.cmml">f</mi></msup></mrow></mrow><annotation-xml encoding="MathML-Content" id="S4.SS1.p5.3.m2.2b"><apply id="S4.SS1.p5.3.m2.2.3.cmml" xref="S4.SS1.p5.3.m2.2.3"><eq id="S4.SS1.p5.3.m2.2.3.1.cmml" xref="S4.SS1.p5.3.m2.2.3.1"></eq><apply id="S4.SS1.p5.3.m2.2.3.2.cmml" xref="S4.SS1.p5.3.m2.2.3.2"><times id="S4.SS1.p5.3.m2.2.3.2.1.cmml" xref="S4.SS1.p5.3.m2.2.3.2.1"></times><apply id="S4.SS1.p5.3.m2.2.3.2.2.cmml" xref="S4.SS1.p5.3.m2.2.3.2.2"><csymbol cd="ambiguous" id="S4.SS1.p5.3.m2.2.3.2.2.1.cmml" xref="S4.SS1.p5.3.m2.2.3.2.2">superscript</csymbol><ci id="S4.SS1.p5.3.m2.2.3.2.2.2.cmml" xref="S4.SS1.p5.3.m2.2.3.2.2.2">𝜶</ci><ci id="S4.SS1.p5.3.m2.2.3.2.2.3.cmml" xref="S4.SS1.p5.3.m2.2.3.2.2.3">𝑓</ci></apply><ci id="S4.SS1.p5.3.m2.1.1.cmml" xref="S4.SS1.p5.3.m2.1.1">𝑡</ci></apply><apply id="S4.SS1.p5.3.m2.2.3.3.cmml" xref="S4.SS1.p5.3.m2.2.3.3"><ci id="S4.SS1.p5.3.m2.2.3.3.1.cmml" xref="S4.SS1.p5.3.m2.2.3.3.1">⋅</ci><apply id="S4.SS1.p5.3.m2.2.3.3.2.cmml" xref="S4.SS1.p5.3.m2.2.3.3.2"><times id="S4.SS1.p5.3.m2.2.3.3.2.1.cmml" xref="S4.SS1.p5.3.m2.2.3.3.2.1"></times><apply id="S4.SS1.p5.3.m2.2.3.3.2.2.cmml" xref="S4.SS1.p5.3.m2.2.3.3.2.2"><csymbol cd="ambiguous" id="S4.SS1.p5.3.m2.2.3.3.2.2.1.cmml" xref="S4.SS1.p5.3.m2.2.3.3.2.2">superscript</csymbol><apply id="S4.SS1.p5.3.m2.2.3.3.2.2.2.cmml" xref="S4.SS1.p5.3.m2.2.3.3.2.2.2"><ci id="S4.SS1.p5.3.m2.2.3.3.2.2.2.1.cmml" xref="S4.SS1.p5.3.m2.2.3.3.2.2.2.1">bold-~</ci><ci id="S4.SS1.p5.3.m2.2.3.3.2.2.2.2.cmml" xref="S4.SS1.p5.3.m2.2.3.3.2.2.2.2">𝒙</ci></apply><ci id="S4.SS1.p5.3.m2.2.3.3.2.2.3.cmml" xref="S4.SS1.p5.3.m2.2.3.3.2.2.3">𝑓</ci></apply><ci id="S4.SS1.p5.3.m2.2.2.cmml" xref="S4.SS1.p5.3.m2.2.2">𝑡</ci></apply><apply id="S4.SS1.p5.3.m2.2.3.3.3.cmml" xref="S4.SS1.p5.3.m2.2.3.3.3"><csymbol cd="ambiguous" id="S4.SS1.p5.3.m2.2.3.3.3.1.cmml" xref="S4.SS1.p5.3.m2.2.3.3.3">superscript</csymbol><ci id="S4.SS1.p5.3.m2.2.3.3.3.2.cmml" xref="S4.SS1.p5.3.m2.2.3.3.3.2">𝑋</ci><ci id="S4.SS1.p5.3.m2.2.3.3.3.3.cmml" xref="S4.SS1.p5.3.m2.2.3.3.3.3">𝑓</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS1.p5.3.m2.2c">\boldsymbol{\alpha}^{f}(t)=\boldsymbol{\tilde{x}}^{f}(t)\cdot X^{f}</annotation><annotation encoding="application/x-llamapun" id="S4.SS1.p5.3.m2.2d">bold_italic_α start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT ( italic_t ) = overbold_~ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT ( italic_t ) ⋅ italic_X start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT</annotation></semantics></math>, <math alttext="\forall f\in[1,F]" class="ltx_Math" display="inline" id="S4.SS1.p5.4.m3.2"><semantics id="S4.SS1.p5.4.m3.2a"><mrow id="S4.SS1.p5.4.m3.2.3" xref="S4.SS1.p5.4.m3.2.3.cmml"><mrow id="S4.SS1.p5.4.m3.2.3.2" xref="S4.SS1.p5.4.m3.2.3.2.cmml"><mo id="S4.SS1.p5.4.m3.2.3.2.1" 
rspace="0.167em" xref="S4.SS1.p5.4.m3.2.3.2.1.cmml">∀</mo><mi id="S4.SS1.p5.4.m3.2.3.2.2" xref="S4.SS1.p5.4.m3.2.3.2.2.cmml">f</mi></mrow><mo id="S4.SS1.p5.4.m3.2.3.1" xref="S4.SS1.p5.4.m3.2.3.1.cmml">∈</mo><mrow id="S4.SS1.p5.4.m3.2.3.3.2" xref="S4.SS1.p5.4.m3.2.3.3.1.cmml"><mo id="S4.SS1.p5.4.m3.2.3.3.2.1" stretchy="false" xref="S4.SS1.p5.4.m3.2.3.3.1.cmml">[</mo><mn id="S4.SS1.p5.4.m3.1.1" xref="S4.SS1.p5.4.m3.1.1.cmml">1</mn><mo id="S4.SS1.p5.4.m3.2.3.3.2.2" xref="S4.SS1.p5.4.m3.2.3.3.1.cmml">,</mo><mi id="S4.SS1.p5.4.m3.2.2" xref="S4.SS1.p5.4.m3.2.2.cmml">F</mi><mo id="S4.SS1.p5.4.m3.2.3.3.2.3" stretchy="false" xref="S4.SS1.p5.4.m3.2.3.3.1.cmml">]</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S4.SS1.p5.4.m3.2b"><apply id="S4.SS1.p5.4.m3.2.3.cmml" xref="S4.SS1.p5.4.m3.2.3"><in id="S4.SS1.p5.4.m3.2.3.1.cmml" xref="S4.SS1.p5.4.m3.2.3.1"></in><apply id="S4.SS1.p5.4.m3.2.3.2.cmml" xref="S4.SS1.p5.4.m3.2.3.2"><csymbol cd="latexml" id="S4.SS1.p5.4.m3.2.3.2.1.cmml" xref="S4.SS1.p5.4.m3.2.3.2.1">for-all</csymbol><ci id="S4.SS1.p5.4.m3.2.3.2.2.cmml" xref="S4.SS1.p5.4.m3.2.3.2.2">𝑓</ci></apply><interval closure="closed" id="S4.SS1.p5.4.m3.2.3.3.1.cmml" xref="S4.SS1.p5.4.m3.2.3.3.2"><cn id="S4.SS1.p5.4.m3.1.1.cmml" type="integer" xref="S4.SS1.p5.4.m3.1.1">1</cn><ci id="S4.SS1.p5.4.m3.2.2.cmml" xref="S4.SS1.p5.4.m3.2.2">𝐹</ci></interval></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS1.p5.4.m3.2c">\forall f\in[1,F]</annotation><annotation encoding="application/x-llamapun" id="S4.SS1.p5.4.m3.2d">∀ italic_f ∈ [ 1 , italic_F ]</annotation></semantics></math>.</p> </div> <div class="ltx_para" id="S4.SS1.p6"> <p class="ltx_p" id="S4.SS1.p6.5">Step <svg class="ltx_picture" height="12.06" id="S4.SS1.p6.1.pic1" overflow="visible" version="1.1" width="12.06"><g fill="#000000" stroke="#000000" stroke-width="0.4pt" transform="translate(0,12.06) matrix(1 0 0 -1 0 0) translate(6.03,0) translate(0,6.03)"><path d="M 6.03 0 C 6.03 3.33 3.33 6.03 0 6.03 C -3.33 6.03 -6.03 3.33 -6.03 0 C -6.03 -3.33 -3.33 -6.03 0 -6.03 C 3.33 -6.03 6.03 -3.33 6.03 0 Z M 0 0" style="stroke:none"></path><g fill="#000000" stroke="#000000" transform="matrix(1.0 0.0 0.0 1.0 -3.46 -4.46)"><foreignobject height="8.92" overflow="visible" transform="matrix(1 0 0 -1 0 16.6)" width="6.92"><span class="ltx_text" id="S4.SS1.p6.1.pic1.1.1.1.1.1" style="color:#FFFFFF;">3</span></foreignobject></g></g></svg>: Factor projection via matrix–vector multiplication. 
The estimates for the factors for the subsequent time step are given by the linear combination of all the codevectors with the similarity vectors acting as weights: <math alttext="\boldsymbol{\hat{x}}^{f}(t+1)=\text{sign}(\boldsymbol{\alpha}^{f}(t)\cdot(X^{f% })^{T})" class="ltx_Math" display="inline" id="S4.SS1.p6.2.m1.3"><semantics id="S4.SS1.p6.2.m1.3a"><mrow id="S4.SS1.p6.2.m1.3.3" xref="S4.SS1.p6.2.m1.3.3.cmml"><mrow id="S4.SS1.p6.2.m1.2.2.1" xref="S4.SS1.p6.2.m1.2.2.1.cmml"><msup id="S4.SS1.p6.2.m1.2.2.1.3" xref="S4.SS1.p6.2.m1.2.2.1.3.cmml"><mover accent="true" id="S4.SS1.p6.2.m1.2.2.1.3.2" xref="S4.SS1.p6.2.m1.2.2.1.3.2.cmml"><mi id="S4.SS1.p6.2.m1.2.2.1.3.2.2" xref="S4.SS1.p6.2.m1.2.2.1.3.2.2.cmml">𝒙</mi><mo class="ltx_mathvariant_bold" id="S4.SS1.p6.2.m1.2.2.1.3.2.1" mathvariant="bold" xref="S4.SS1.p6.2.m1.2.2.1.3.2.1.cmml">^</mo></mover><mi id="S4.SS1.p6.2.m1.2.2.1.3.3" xref="S4.SS1.p6.2.m1.2.2.1.3.3.cmml">f</mi></msup><mo id="S4.SS1.p6.2.m1.2.2.1.2" xref="S4.SS1.p6.2.m1.2.2.1.2.cmml">⁢</mo><mrow id="S4.SS1.p6.2.m1.2.2.1.1.1" xref="S4.SS1.p6.2.m1.2.2.1.1.1.1.cmml"><mo id="S4.SS1.p6.2.m1.2.2.1.1.1.2" stretchy="false" xref="S4.SS1.p6.2.m1.2.2.1.1.1.1.cmml">(</mo><mrow id="S4.SS1.p6.2.m1.2.2.1.1.1.1" xref="S4.SS1.p6.2.m1.2.2.1.1.1.1.cmml"><mi id="S4.SS1.p6.2.m1.2.2.1.1.1.1.2" xref="S4.SS1.p6.2.m1.2.2.1.1.1.1.2.cmml">t</mi><mo id="S4.SS1.p6.2.m1.2.2.1.1.1.1.1" xref="S4.SS1.p6.2.m1.2.2.1.1.1.1.1.cmml">+</mo><mn id="S4.SS1.p6.2.m1.2.2.1.1.1.1.3" xref="S4.SS1.p6.2.m1.2.2.1.1.1.1.3.cmml">1</mn></mrow><mo id="S4.SS1.p6.2.m1.2.2.1.1.1.3" stretchy="false" xref="S4.SS1.p6.2.m1.2.2.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S4.SS1.p6.2.m1.3.3.3" xref="S4.SS1.p6.2.m1.3.3.3.cmml">=</mo><mrow id="S4.SS1.p6.2.m1.3.3.2" xref="S4.SS1.p6.2.m1.3.3.2.cmml"><mtext id="S4.SS1.p6.2.m1.3.3.2.3" xref="S4.SS1.p6.2.m1.3.3.2.3a.cmml">sign</mtext><mo id="S4.SS1.p6.2.m1.3.3.2.2" xref="S4.SS1.p6.2.m1.3.3.2.2.cmml">⁢</mo><mrow id="S4.SS1.p6.2.m1.3.3.2.1.1" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.cmml"><mo id="S4.SS1.p6.2.m1.3.3.2.1.1.2" stretchy="false" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.cmml">(</mo><mrow id="S4.SS1.p6.2.m1.3.3.2.1.1.1" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.cmml"><mrow id="S4.SS1.p6.2.m1.3.3.2.1.1.1.3" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.cmml"><msup id="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.2" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.2.cmml"><mi id="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.2.2" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.2.2.cmml">𝜶</mi><mi id="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.2.3" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.2.3.cmml">f</mi></msup><mo id="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.1" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.1.cmml">⁢</mo><mrow id="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.3.2" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.cmml"><mo id="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.3.2.1" stretchy="false" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.cmml">(</mo><mi id="S4.SS1.p6.2.m1.1.1" xref="S4.SS1.p6.2.m1.1.1.cmml">t</mi><mo id="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.3.2.2" rspace="0.055em" stretchy="false" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.cmml">)</mo></mrow></mrow><mo id="S4.SS1.p6.2.m1.3.3.2.1.1.1.2" rspace="0.222em" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.2.cmml">⋅</mo><msup id="S4.SS1.p6.2.m1.3.3.2.1.1.1.1" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.cmml"><mrow id="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.1.cmml"><mo id="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.2" stretchy="false" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.1.cmml">(</mo><msup id="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.1" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.1.cmml"><mi id="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.1.2" 
xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.1.2.cmml">X</mi><mi id="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.1.3" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.1.3.cmml">f</mi></msup><mo id="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.3" stretchy="false" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.1.cmml">)</mo></mrow><mi id="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.3" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.3.cmml">T</mi></msup></mrow><mo id="S4.SS1.p6.2.m1.3.3.2.1.1.3" stretchy="false" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.cmml">)</mo></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S4.SS1.p6.2.m1.3b"><apply id="S4.SS1.p6.2.m1.3.3.cmml" xref="S4.SS1.p6.2.m1.3.3"><eq id="S4.SS1.p6.2.m1.3.3.3.cmml" xref="S4.SS1.p6.2.m1.3.3.3"></eq><apply id="S4.SS1.p6.2.m1.2.2.1.cmml" xref="S4.SS1.p6.2.m1.2.2.1"><times id="S4.SS1.p6.2.m1.2.2.1.2.cmml" xref="S4.SS1.p6.2.m1.2.2.1.2"></times><apply id="S4.SS1.p6.2.m1.2.2.1.3.cmml" xref="S4.SS1.p6.2.m1.2.2.1.3"><csymbol cd="ambiguous" id="S4.SS1.p6.2.m1.2.2.1.3.1.cmml" xref="S4.SS1.p6.2.m1.2.2.1.3">superscript</csymbol><apply id="S4.SS1.p6.2.m1.2.2.1.3.2.cmml" xref="S4.SS1.p6.2.m1.2.2.1.3.2"><ci id="S4.SS1.p6.2.m1.2.2.1.3.2.1.cmml" xref="S4.SS1.p6.2.m1.2.2.1.3.2.1">bold-^</ci><ci id="S4.SS1.p6.2.m1.2.2.1.3.2.2.cmml" xref="S4.SS1.p6.2.m1.2.2.1.3.2.2">𝒙</ci></apply><ci id="S4.SS1.p6.2.m1.2.2.1.3.3.cmml" xref="S4.SS1.p6.2.m1.2.2.1.3.3">𝑓</ci></apply><apply id="S4.SS1.p6.2.m1.2.2.1.1.1.1.cmml" xref="S4.SS1.p6.2.m1.2.2.1.1.1"><plus id="S4.SS1.p6.2.m1.2.2.1.1.1.1.1.cmml" xref="S4.SS1.p6.2.m1.2.2.1.1.1.1.1"></plus><ci id="S4.SS1.p6.2.m1.2.2.1.1.1.1.2.cmml" xref="S4.SS1.p6.2.m1.2.2.1.1.1.1.2">𝑡</ci><cn id="S4.SS1.p6.2.m1.2.2.1.1.1.1.3.cmml" type="integer" xref="S4.SS1.p6.2.m1.2.2.1.1.1.1.3">1</cn></apply></apply><apply id="S4.SS1.p6.2.m1.3.3.2.cmml" xref="S4.SS1.p6.2.m1.3.3.2"><times id="S4.SS1.p6.2.m1.3.3.2.2.cmml" xref="S4.SS1.p6.2.m1.3.3.2.2"></times><ci id="S4.SS1.p6.2.m1.3.3.2.3a.cmml" xref="S4.SS1.p6.2.m1.3.3.2.3"><mtext id="S4.SS1.p6.2.m1.3.3.2.3.cmml" xref="S4.SS1.p6.2.m1.3.3.2.3">sign</mtext></ci><apply id="S4.SS1.p6.2.m1.3.3.2.1.1.1.cmml" xref="S4.SS1.p6.2.m1.3.3.2.1.1"><ci id="S4.SS1.p6.2.m1.3.3.2.1.1.1.2.cmml" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.2">⋅</ci><apply id="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.cmml" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.3"><times id="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.1.cmml" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.1"></times><apply id="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.2.cmml" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.2"><csymbol cd="ambiguous" id="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.2.1.cmml" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.2">superscript</csymbol><ci id="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.2.2.cmml" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.2.2">𝜶</ci><ci id="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.2.3.cmml" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.3.2.3">𝑓</ci></apply><ci id="S4.SS1.p6.2.m1.1.1.cmml" xref="S4.SS1.p6.2.m1.1.1">𝑡</ci></apply><apply id="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.cmml" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.1"><csymbol cd="ambiguous" id="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.2.cmml" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.1">superscript</csymbol><apply id="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.1.cmml" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1"><csymbol cd="ambiguous" id="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.1.1.cmml" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1">superscript</csymbol><ci id="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.1.2.cmml" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.1.2">𝑋</ci><ci id="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.1.3.cmml" xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.1.1.1.3">𝑓</ci></apply><ci id="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.3.cmml" 
xref="S4.SS1.p6.2.m1.3.3.2.1.1.1.1.3">𝑇</ci></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS1.p6.2.m1.3c">\boldsymbol{\hat{x}}^{f}(t+1)=\text{sign}(\boldsymbol{\alpha}^{f}(t)\cdot(X^{f% })^{T})</annotation><annotation encoding="application/x-llamapun" id="S4.SS1.p6.2.m1.3d">overbold_^ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT ( italic_t + 1 ) = sign ( bold_italic_α start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT ( italic_t ) ⋅ ( italic_X start_POSTSUPERSCRIPT italic_f end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT )</annotation></semantics></math>, <math alttext="\forall f\in[1,F]" class="ltx_Math" display="inline" id="S4.SS1.p6.3.m2.2"><semantics id="S4.SS1.p6.3.m2.2a"><mrow id="S4.SS1.p6.3.m2.2.3" xref="S4.SS1.p6.3.m2.2.3.cmml"><mrow id="S4.SS1.p6.3.m2.2.3.2" xref="S4.SS1.p6.3.m2.2.3.2.cmml"><mo id="S4.SS1.p6.3.m2.2.3.2.1" rspace="0.167em" xref="S4.SS1.p6.3.m2.2.3.2.1.cmml">∀</mo><mi id="S4.SS1.p6.3.m2.2.3.2.2" xref="S4.SS1.p6.3.m2.2.3.2.2.cmml">f</mi></mrow><mo id="S4.SS1.p6.3.m2.2.3.1" xref="S4.SS1.p6.3.m2.2.3.1.cmml">∈</mo><mrow id="S4.SS1.p6.3.m2.2.3.3.2" xref="S4.SS1.p6.3.m2.2.3.3.1.cmml"><mo id="S4.SS1.p6.3.m2.2.3.3.2.1" stretchy="false" xref="S4.SS1.p6.3.m2.2.3.3.1.cmml">[</mo><mn id="S4.SS1.p6.3.m2.1.1" xref="S4.SS1.p6.3.m2.1.1.cmml">1</mn><mo id="S4.SS1.p6.3.m2.2.3.3.2.2" xref="S4.SS1.p6.3.m2.2.3.3.1.cmml">,</mo><mi id="S4.SS1.p6.3.m2.2.2" xref="S4.SS1.p6.3.m2.2.2.cmml">F</mi><mo id="S4.SS1.p6.3.m2.2.3.3.2.3" stretchy="false" xref="S4.SS1.p6.3.m2.2.3.3.1.cmml">]</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S4.SS1.p6.3.m2.2b"><apply id="S4.SS1.p6.3.m2.2.3.cmml" xref="S4.SS1.p6.3.m2.2.3"><in id="S4.SS1.p6.3.m2.2.3.1.cmml" xref="S4.SS1.p6.3.m2.2.3.1"></in><apply id="S4.SS1.p6.3.m2.2.3.2.cmml" xref="S4.SS1.p6.3.m2.2.3.2"><csymbol cd="latexml" id="S4.SS1.p6.3.m2.2.3.2.1.cmml" xref="S4.SS1.p6.3.m2.2.3.2.1">for-all</csymbol><ci id="S4.SS1.p6.3.m2.2.3.2.2.cmml" xref="S4.SS1.p6.3.m2.2.3.2.2">𝑓</ci></apply><interval closure="closed" id="S4.SS1.p6.3.m2.2.3.3.1.cmml" xref="S4.SS1.p6.3.m2.2.3.3.2"><cn id="S4.SS1.p6.3.m2.1.1.cmml" type="integer" xref="S4.SS1.p6.3.m2.1.1">1</cn><ci id="S4.SS1.p6.3.m2.2.2.cmml" xref="S4.SS1.p6.3.m2.2.2">𝐹</ci></interval></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS1.p6.3.m2.2c">\forall f\in[1,F]</annotation><annotation encoding="application/x-llamapun" id="S4.SS1.p6.3.m2.2d">∀ italic_f ∈ [ 1 , italic_F ]</annotation></semantics></math>. 
Repeat Steps ①–③ until convergence.

**Applicable across neurosymbolic workloads.** Our proposed efficient factorization module can be applied at various levels of the conceptual hierarchy, such as factoring time-varying pixel data of dynamic scenes [9], factoring sentence structure into roles and fillers [53], and cognitive analogical reasoning [33]. Given its broad applicability to perception and cognitive reasoning, we envision it becoming a core component of future large-scale neurosymbolic cognitive systems.

IV-B Stochasticity and Low-Precision Operation

**Factorization optimization via stochasticity.** To reduce the required number of factorization iterations, we propose to apply additive Gaussian noise. We observe that injecting stochasticity into both the Step ② similarity and Step ③ projection operations helps the factorization process escape limit cycles, allowing it to explore a much larger solution space and converge faster (Tab. VIII).
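As a hedged illustration of where such noise could enter, the similarity and projection steps of the sketch above might be perturbed as follows; the Gaussian scale and the decaying schedule are assumptions made for illustration, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_similarity(X, unbound, sigma):
    # Step 2 with additive Gaussian noise on the similarity scores.
    alpha = X @ unbound
    return alpha + rng.normal(0.0, sigma, size=alpha.shape)

def noisy_projection(alpha, X, sigma):
    # Step 3 with additive Gaussian noise injected before the sign().
    proj = alpha @ X + rng.normal(0.0, sigma, size=X.shape[1])
    return np.where(proj >= 0, 1, -1)

def sigma_at(iteration, sigma0=0.5, decay=0.97):
    # Assumed annealing schedule: explore early, settle near a fixed point.
    return sigma0 * decay ** iteration
```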
<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S7.T9" title="TABLE IX ‣ VII-B CogSys Algorithm Optimization Performance ‣ VII Evaluation Results ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">IX</span></a>).</p> </div> </section> <section class="ltx_subsection" id="S4.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S4.SS3.5.1.1">IV-C</span> </span><span class="ltx_text ltx_font_italic" id="S4.SS3.6.2">Algorithm Optimizations Discussion</span> </h3> <figure class="ltx_table" id="S4.T3"> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S4.T3.3.1.1" style="font-size:90%;">TABLE III</span>: </span><span class="ltx_text ltx_font_bold" id="S4.T3.4.2" style="font-size:90%;">Algorithm optimization impact.<span class="ltx_text ltx_font_medium" id="S4.T3.4.2.1"> Factorization, stochasticity, and quantization impact accuracy, latency, and memory.</span></span></figcaption> <div class="ltx_inline-block ltx_align_center ltx_transformed_outer" id="S4.T3.5" style="width:433.6pt;height:74.7pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="transform:translate(7.9pt,-1.4pt) scale(1.03805670713684,1.03805670713684) ;"> <table class="ltx_tabular ltx_guessed_headers ltx_align_middle" id="S4.T3.5.1"> <thead class="ltx_thead"> <tr class="ltx_tr" id="S4.T3.5.1.1.1"> <th class="ltx_td ltx_th ltx_th_column ltx_border_r ltx_border_t" id="S4.T3.5.1.1.1.1" style="padding:1pt 2.0pt;"></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t" id="S4.T3.5.1.1.1.2" style="padding:1pt 2.0pt;"> <span class="ltx_text ltx_font_bold" id="S4.T3.5.1.1.1.2.1">Accuracy</span> (higher is better)</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t" id="S4.T3.5.1.1.1.3" style="padding:1pt 2.0pt;"> <span class="ltx_text ltx_font_bold" id="S4.T3.5.1.1.1.3.1">Latency</span> (lower is better)</th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S4.T3.5.1.1.1.4" style="padding:1pt 2.0pt;"> <span class="ltx_text ltx_font_bold" id="S4.T3.5.1.1.1.4.1">Memory</span> (lower is better)</th> </tr> </thead> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S4.T3.5.1.2.1"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T3.5.1.2.1.1" style="padding:1pt 2.0pt;">Factorization</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T3.5.1.2.1.2" style="padding:1pt 2.0pt;">Increase</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T3.5.1.2.1.3" style="padding:1pt 2.0pt;">Reduce</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T3.5.1.2.1.4" style="padding:1pt 2.0pt;">Reduce</td> </tr> <tr class="ltx_tr" id="S4.T3.5.1.3.2"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T3.5.1.3.2.1" style="padding:1pt 2.0pt;">Stochasticity</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T3.5.1.3.2.2" style="padding:1pt 2.0pt;">Increase / Reduce</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S4.T3.5.1.3.2.3" style="padding:1pt 2.0pt;">Reduce</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S4.T3.5.1.3.2.4" style="padding:1pt 2.0pt;">No Impact</td> </tr> <tr class="ltx_tr" id="S4.T3.5.1.4.3"> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" 
id="S4.T3.5.1.4.3.1" style="padding:1pt 2.0pt;">Low-Precision</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T3.5.1.4.3.2" style="padding:1pt 2.0pt;">Reduce</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S4.T3.5.1.4.3.3" style="padding:1pt 2.0pt;">Reduce</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_t" id="S4.T3.5.1.4.3.4" style="padding:1pt 2.0pt;">Reduce</td> </tr> </tbody> </table> </span></div> </figure> <div class="ltx_para" id="S4.SS3.p1"> <p class="ltx_p" id="S4.SS3.p1.1"><span class="ltx_text ltx_font_bold" id="S4.SS3.p1.1.1">Impact on accuracy, latency, and memory.</span> The factorization, stochasticity, and quantization optimizations impact accuracy, latency, and memory requirements to varying extents. As shown in Tab. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S4.T3" title="TABLE III ‣ IV-C Algorithm Optimizations Discussion ‣ IV CogSys: Algorithm Optimization ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">III</span></a>, accuracy improves with factorization (due to precise attribute extraction, aiding downstream symbolic reasoning) and with stochasticity optimizations (due to improved convergence). However, quantization results in accuracy decreases due to data imprecision. Designers can balance speed and accuracy by tuning factorization convergence threshold.</p> </div> </section> </section> <section class="ltx_section" id="S5"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">V </span><span class="ltx_text ltx_font_smallcaps" id="S5.1.1">CogSys: Hardware Architecture</span> </h2> <div class="ltx_para" id="S5.p1"> <p class="ltx_p" id="S5.p1.1">This section presents CogSys architecture, the <em class="ltx_emph ltx_font_italic" id="S5.p1.1.1">first</em> hardware to enable efficient and scalable neurosymbolic processing. CogSys architecture features <span class="ltx_text ltx_font_italic" id="S5.p1.1.2">reconfigurable neuro/symbolic processing elements (nsPE)</span> (Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S5.SS2" title="V-B Reconfigurable Neuro/Symbolic Processing Element ‣ V CogSys: Hardware Architecture ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">V-B</span></span></a>), <span class="ltx_text ltx_font_italic" id="S5.p1.1.3">bubble streaming (BS) dataflow</span> (Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S5.SS3" title="V-C Efficient Bubble Streaming Dataflow ‣ V CogSys: Hardware Architecture ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">V-C</span></span></a>), <span class="ltx_text ltx_font_italic" id="S5.p1.1.4">adaptive spatial-temporal (ST) mapping</span> (Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S5.SS4" title="V-D Adaptive Spatial and Temporal Mapping Strategy ‣ V CogSys: Hardware Architecture ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">V-D</span></span></a>) with scalable array architecture (Sec. 
<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S5.SS5" title="V-E Adaptive Scale-Up and Scale-Out Strategy ‣ V CogSys: Hardware Architecture ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">V-E</span></span></a>) and customized SIMD units (Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S5.SS6" title="V-F Double-buffered Memory and Custom SIMD Unit ‣ V CogSys: Hardware Architecture ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">V-F</span></span></a>) for neurosymbolic processing.</p> </div> <figure class="ltx_figure" id="S5.F10"> <div class="ltx_flex_figure"> <div class="ltx_flex_cell ltx_flex_size_1"> <figure class="ltx_figure ltx_figure_panel ltx_minipage ltx_align_center ltx_align_top" id="S5.F10.1" style="width:498.7pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="522" id="S5.F10.1.g1" src="x9.png" width="789"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S5.F10.1.2.1.1" style="font-size:90%;">Figure 9</span>: </span><span class="ltx_text ltx_font_bold" id="S5.F10.1.3.2" style="font-size:90%;">Architecture overview.<span class="ltx_text ltx_font_medium" id="S5.F10.1.3.2.1"> CogSys system includes DRAM, host System-on-Chip (SoC), and CogSys accelerator that consists of five major components: reconfigurable and scalable neuro/symbolic compute arrays, custom SIMD units, double-buffered SRAMs, workload scheduler, and memory controller.</span></span></figcaption> </figure> </div> <div class="ltx_flex_break"></div> <div class="ltx_flex_cell ltx_flex_size_1"> <figure class="ltx_figure ltx_figure_panel ltx_minipage ltx_align_center ltx_align_top" id="S5.F10.2" style="width:359.9pt;"><img alt="Refer to caption" class="ltx_graphics ltx_img_square" height="736" id="S5.F10.2.g1" src="x10.png" width="789"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S5.F10.2.3.1.1" style="font-size:90%;">Figure 10</span>: </span><span class="ltx_text ltx_font_bold" id="S5.F10.2.4.2" style="font-size:90%;">Reconfigurable neuro/symbolic PE (<span class="ltx_text ltx_font_italic" id="S5.F10.2.4.2.1">nsPE</span>).<span class="ltx_text ltx_font_medium" id="S5.F10.2.4.2.2"> Each <span class="ltx_text ltx_font_italic" id="S5.F10.2.4.2.2.1">nsPE</span> includes four registers and supports three modes (load, neuro, symbolic) that provide reconfigurable support for neurosymbolic operations.</span></span></figcaption> </figure> </div> </div> </figure> <figure class="ltx_table" id="S5.T4"> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S5.T4.13.1.1" style="font-size:90%;">TABLE IV</span>: </span><span class="ltx_text ltx_font_bold" id="S5.T4.14.2" style="font-size:90%;">Comparison between CogSys with other accelerators.<span class="ltx_text ltx_font_medium" id="S5.T4.14.2.1"> CogSys is the first to enable efficient and scalable vector-symbolic circular convolution (CircConv) and neurosymbolic models. 
| Accelerators | Reconfigurable | Scale-up/Scale-out | Memory Footprint for Single CircConv (Vector Dimension = $d$) | CWP* Support for CircConv | ScWP** Support for CircConv | Neurosymbolic AI Support |
|---|---|---|---|---|---|---|
| Eyeriss [15] | No | Scale-up | No Support | No | No | No |
| Neuro Cube [45] | No | Scale-out | $O(d^2)$, GEMV | No | Yes | No |
| Brainwave [16] | No | Scale-out | $O(d^2)$, GEMV | No | Yes | No |
| SARA [72] | No | Both | $O(d^2)$, GEMV | No | Yes | No |
| TPU [42] | No | Scale-out | $O(d^2)$, GEMV | No | Yes | No |
| SIMBA [77] | No | Scale-out | $O(d^2)$, GEMV | No | Yes | No |
| MTIA [19] | No | Scale-out | $O(d^2)$, GEMV | No | Yes | No |
| CogSys (Ours) | Yes | Both | $O(d)$, BS Dataflow | Yes | Yes | Yes |

*Column-Wise Parallelism (within a systolic cell); **Systolic cell-Wise Parallelism

V-A Overview of Proposed CogSys Architecture

Neurosymbolic workloads feature much greater heterogeneity in compute kernels than DNNs, leading to an increasing divergence from current hardware that focuses on GEMMs and convolutions. As illustrated in Tab. IV, CogSys is proposed, for the first time, to support neurosymbolic workloads and achieve an efficient and scalable implementation of symbolic operations.

Aiming at a complete neurosymbolic acceleration system, our design includes DRAM, a host SoC, and the CogSys accelerator, which consists of five major components: a reconfigurable neuro/symbolic compute array, SIMD units, double-buffered SRAM, an adaptive scheduler, and a memory controller (Fig. 9). During the reasoning procedure, the host SoC streams tasks in; the reconfigurable arrays then perform neuro operations (e.g., GEMMs/convolutions) and vector-symbolic operations (e.g., circular convolution), while the SIMD units accelerate element- and vector-wise operations with multi-level parallelism and adaptive workload scheduling. It is worth noting that a monolithic systolic array (TPU-like) is extremely inefficient for symbolic workloads (Sec. V-C), whereas CogSys provides reconfigurable support for neural and symbolic kernels and demonstrates >75× speedup with only <5% area overhead over a systolic array architecture.
<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S5.SS3" title="V-C Efficient Bubble Streaming Dataflow ‣ V CogSys: Hardware Architecture ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">V-C</span></span></a>), while CogSys provides reconfigurable support for neural and symbolic kernels and demonstrates <math alttext="&gt;" class="ltx_Math" display="inline" id="S5.SS1.p2.1.1.m1.1"><semantics id="S5.SS1.p2.1.1.m1.1a"><mo id="S5.SS1.p2.1.1.m1.1.1" xref="S5.SS1.p2.1.1.m1.1.1.cmml">&gt;</mo><annotation-xml encoding="MathML-Content" id="S5.SS1.p2.1.1.m1.1b"><gt id="S5.SS1.p2.1.1.m1.1.1.cmml" xref="S5.SS1.p2.1.1.m1.1.1"></gt></annotation-xml><annotation encoding="application/x-tex" id="S5.SS1.p2.1.1.m1.1c">&gt;</annotation><annotation encoding="application/x-llamapun" id="S5.SS1.p2.1.1.m1.1d">&gt;</annotation></semantics></math>75<math alttext="\times" class="ltx_Math" display="inline" id="S5.SS1.p2.2.2.m2.1"><semantics id="S5.SS1.p2.2.2.m2.1a"><mo id="S5.SS1.p2.2.2.m2.1.1" xref="S5.SS1.p2.2.2.m2.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.SS1.p2.2.2.m2.1b"><times id="S5.SS1.p2.2.2.m2.1.1.cmml" xref="S5.SS1.p2.2.2.m2.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.SS1.p2.2.2.m2.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS1.p2.2.2.m2.1d">×</annotation></semantics></math> speedup with only <math alttext="&lt;" class="ltx_Math" display="inline" id="S5.SS1.p2.3.3.m3.1"><semantics id="S5.SS1.p2.3.3.m3.1a"><mo id="S5.SS1.p2.3.3.m3.1.1" xref="S5.SS1.p2.3.3.m3.1.1.cmml">&lt;</mo><annotation-xml encoding="MathML-Content" id="S5.SS1.p2.3.3.m3.1b"><lt id="S5.SS1.p2.3.3.m3.1.1.cmml" xref="S5.SS1.p2.3.3.m3.1.1"></lt></annotation-xml><annotation encoding="application/x-tex" id="S5.SS1.p2.3.3.m3.1c">&lt;</annotation><annotation encoding="application/x-llamapun" id="S5.SS1.p2.3.3.m3.1d">&lt;</annotation></semantics></math>5% area overhead over systolic array architecture.</em></p> </div> </section> <section class="ltx_subsection" id="S5.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S5.SS2.5.1.1">V-B</span> </span><span class="ltx_text ltx_font_italic" id="S5.SS2.6.2">Reconfigurable Neuro/Symbolic Processing Element</span> </h3> <div class="ltx_para" id="S5.SS2.p1"> <p class="ltx_p" id="S5.SS2.p1.4"><span class="ltx_text ltx_font_bold" id="S5.SS2.p1.4.1">Reconfigurable neuro/symbolic PE (<span class="ltx_text ltx_font_italic" id="S5.SS2.p1.4.1.1">nsPE</span>).</span> Instead of having separate PEs for neuro and symbolic operations that incur large area overhead, we propose <span class="ltx_text ltx_font_italic" id="S5.SS2.p1.4.2">nsPE</span> micro-architecture that provides reconfigurable support to both neuro and symbolic operations (Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S5.F10" title="Figure 10 ‣ V CogSys: Hardware Architecture ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">10</span></a>). Each <span class="ltx_text ltx_font_italic" id="S5.SS2.p1.4.3">nsPE</span> consists of four registers (stationary, passing, streaming, and partial sum registers) and supports three operation modes (load, GEMM, and circular convolution). 
During load mode, the input vector $\boldsymbol{A}$ (the GEMM weights) is passed into the stationary register through the 'top_in_A' links. Reconfigurability is achieved by selecting input $\boldsymbol{B}$ either from the 'left_in' link (GEMM mode) or from the passing register (circular convolution mode). During GEMM mode, the nsPE operates as a TPU-like architecture for efficient GEMM and convolution, with inputs streamed from left to right over the 'left_in' links. During circular convolution mode, input vector $\boldsymbol{B}$ is streamed from top to bottom over the 'top_in_B' links with a bubble through the passing register (Sec. V-C), enabling temporal reuse of the streaming input for efficient vector-symbolic circular convolution. The reconfigurable nsPE also supports efficient circular correlation by reversing the stationary vector $\boldsymbol{A}$. During both GEMM and circular convolution modes, partial products are reduced from top to bottom over the 'top_in_A' links.
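To make the two symbolic modes concrete, the minimal functional sketch below (ours, not the hardware; function names and the toy $d$=3 values are our own) shows the reference semantics that the circular-convolution mode and the reversed-$\boldsymbol{A}$ correlation mode must reproduce.

```python
# Reference semantics of the two symbolic nsPE modes (a behavioral sketch,
# not the RTL): circular convolution (binding) and circular correlation
# (unbinding) of two d-dimensional vectors.

def circular_convolution(a, b):
    d = len(a)
    # out[i] = sum_j a[j] * b[(i - j) mod d]
    return [sum(a[j] * b[(i - j) % d] for j in range(d)) for i in range(d)]

def circular_correlation(a, b):
    d = len(a)
    # out[i] = sum_j a[j] * b[(i + j) mod d]  (opposite-direction shift,
    # equivalent to convolving with the reversed stationary vector a)
    return [sum(a[j] * b[(i + j) % d] for j in range(d)) for i in range(d)]

if __name__ == "__main__":
    a = [1, 2, 3]                        # stationary vector A (d = 3)
    b = [4, 5, 6]                        # streaming vector B
    print(circular_convolution(a, b))    # [31, 31, 28]
    print(circular_correlation(a, b))    # [32, 29, 29]
```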
V-C Efficient Bubble Streaming Dataflow

Figure 11: Efficient bubble streaming (BS) dataflow, roofline analysis, and CogSys/TPU/GPU comparison. (a) Compute-cycle and mapping comparison of the TPU-like systolic array dataflow and the CogSys BS dataflow under multiple circular convolutions. (b) BS dataflow showing the circular convolution of two vectors $\boldsymbol{A}$ and $\boldsymbol{B}$ ($d$=3) in a 3$\times$1 CogSys nsPE array. (c) Roofline comparison of circular convolution implemented as the BS dataflow (compute-bound) on CogSys ($2^{14}$ PEs) against a GEMV implementation on a TPU systolic cell ($2^{14}$ PEs) and on a GPU (memory-bound).

Inefficiency of TPU-like systolic array. A TPU-like systolic array (SA) exhibits a high memory footprint and low parallelism for symbolic circular convolution operations. Fig. 11a shows a scenario with three circular convolutions.
The TPU-like systolic cell implements them as general matrix-vector (GEMV) multiplications in which each matrix contains circularly shifted copies of the stationary vector, giving a matrix memory footprint of $O(d^{2})$. Additionally, a TPU-like SA cannot parallelize multiple GEMVs within a systolic cell and must process them sequentially.

Bubble streaming (BS) dataflow. To efficiently support symbolic operations in the nsPE array, we propose the BS dataflow for circular convolution (Sec. II-C) and circular correlation (the opposite-direction circular shift), which constitute the vector-symbolic bottleneck. Fig. 11b presents an example of the BS dataflow performing the circular convolution of two vectors $\boldsymbol{A}$ and $\boldsymbol{B}$ ($d$=3) on a 3$\times$1 nsPE array. In the BS dataflow, vector $\boldsymbol{B}$ is streamed from one nsPE to the next through bubbles (passing registers), while vector $\boldsymbol{A}$ is held in the stationary registers. The passing register temporarily stores the streaming input for one cycle before it moves into the streaming register, and this value is transferred to the passing register of the next nsPE in the following cycle. The MAC unit multiplies the values in the stationary and streaming registers and adds the result to the partial sum. The procedure repeats until the final outputs are produced.
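To illustrate the mechanism end to end, the behavioral sketch below (ours, not the RTL) models a 1-D array of $d$ nsPEs: $\boldsymbol{B}$ advances one register stage (passing, then streaming) per cycle, partial sums flow down one nsPE per cycle, and each nsPE accumulates stationary $\times$ streaming. The wrap-around order of the streamed sequence and the partial-sum injection schedule are our assumptions, chosen so the skew lines up; the check confirms the array reproduces the circular convolution in $4d-1$ cycles including the load.

```python
def reference_circconv(A, B):
    d = len(A)
    return [sum(A[j] * B[(i - j) % d] for j in range(d)) for i in range(d)]

def bs_dataflow_sim(A, B):
    """Cycle-level behavioral sketch of the BS dataflow on a 1-D d-PE array."""
    d = len(A)
    # Streamed sequence for B: one wrapped pass of 2d-1 values
    # (the exact order/offset is our assumption, matched to the skew below).
    stream = [B[(m + 1) % d] for m in range(2 * d - 1)]
    # Two register stages per PE (passing, streaming) form a 2d-deep shift chain.
    pipe = [None] * (2 * d)
    psums = [None] * d              # psum token at each PE: [value, output index]
    outputs = [None] * d
    fed, injected, cycle = 0, 0, 0
    while any(o is None for o in outputs):
        cycle += 1
        # B advances one register stage per cycle (the bubble is the passing stage).
        pipe = [stream[fed] if fed < len(stream) else None] + pipe[:-1]
        fed += 1 if fed < len(stream) else 0
        # Partial sums move down one PE per cycle; a fresh one enters at the top.
        psums = [None] + psums[:-1]
        if cycle > d and injected < d:      # injection window (our convention)
            psums[0] = [0, injected]
            injected += 1
        # MAC: each PE multiplies its stationary element by its streaming register.
        for j in range(d):
            if psums[j] is not None and pipe[2 * j + 1] is not None:
                psums[j][0] += A[j] * pipe[2 * j + 1]
        # A psum just updated at the last PE has accumulated all d products.
        if psums[-1] is not None:
            val, idx = psums[-1]
            outputs[idx], psums[-1] = val, None
    return outputs, cycle

A, B = [1, 2, 3], [4, 5, 6]
out, streaming_cycles = bs_dataflow_sim(A, B)
assert out == reference_circconv(A, B)                    # [31, 31, 28]
print(out, "in", len(A) + streaming_cycles, "cycles")     # 4d-1 = 11 incl. load
```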
Improved arithmetic intensity of BS dataflow. The BS dataflow achieves higher arithmetic intensity than GEMV on a GPU or in a TPU-like systolic cell, as illustrated by the roofline analysis (Fig. 11c). This efficiency mainly comes from a reduced memory footprint and increased parallelism (Tab. IV). (1) Linear memory footprint: the bubble realizes the circular shift of vector $\boldsymbol{B}$ in place ($O(d)$), avoiding the overhead of creating and fetching the $O(d^{2})$ matrix of circularly shifted copies of $\boldsymbol{B}$ that a TPU-like systolic cell requires, and reducing the memory footprint by $d\times$. (2) Column-wise parallelism (CWP): the BS dataflow lets each column of a systolic cell execute one circular convolution, so multiple circular convolutions can be parallelized across columns, which is not possible for GEMV in a TPU-like systolic cell. The low arithmetic intensity and memory-bound behavior make GPUs inefficient for vector-symbolic circular convolution: for the circular convolution of two $d$-dimensional vectors, the GPU arithmetic intensity is $d\times(d+d-1)/(d\times d+d\times 1+d\times 1)$, while the CogSys arithmetic intensity is $d\times(d+d-1)/(d\times 1+d\times 1+d\times 1)$. As shown in Fig. 11c, CogSys reaches peak performance when fully utilized, while the GPU remains memory-bound. Additionally, GPUs require extra computation to handle the index calculations for the circularly shifted vector.
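To make the comparison concrete, the short sketch below (ours; it simply evaluates the two expressions above, counting operands in elements rather than bytes) computes both arithmetic intensities for a few vector dimensions and shows the roughly $(d+2)/3$-fold advantage of the BS dataflow.

```python
# Arithmetic intensity (ops per element accessed) for circular convolution of
# two d-dimensional vectors, using the operand counts from the text: GEMV reads
# the d x d shifted matrix plus two vectors, the BS dataflow reads/writes only
# three d-element vectors.

def ai_gemv(d):
    ops = d * (d + d - 1)              # d dot products: d mults + (d-1) adds each
    accesses = d * d + d * 1 + d * 1   # shifted matrix + input vector + output
    return ops / accesses

def ai_bs(d):
    ops = d * (d + d - 1)
    accesses = d * 1 + d * 1 + d * 1   # stationary A + streamed B + output
    return ops / accesses

for d in (64, 512, 2 ** 14):
    print(f"d={d:6d}  GEMV: {ai_gemv(d):5.2f}  BS: {ai_bs(d):9.2f}  "
          f"ratio: {ai_bs(d) / ai_gemv(d):7.1f}x")
# GEMV intensity saturates near 2, while BS intensity grows as (2d-1)/3.
```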
Cycle analysis of BS dataflow. Assume the nsPE array size $M$ equals the input vector size $d$ for a vector-symbolic circular convolution. Loading the stationary input takes $d$ cycles, followed by $2d$ cycles for the streaming input vector to reach the final nsPE. Meanwhile, partial sums are aggregated along the array for the first output, followed by the remaining ($d-1$) outputs, each generated in one cycle. Thus, the end-to-end latency of the vector-symbolic circular convolution of two $d$-dimensional vectors on a 1-D nsPE compute array is ($4d-1$) cycles. When $d\neq M$, the latency $T$ becomes ($3M+d-1$) cycles: $M$ cycles to load the stationary vector, $2M$ cycles for the streaming vector to reach the final nsPE, and ($d-1$) cycles for the remaining outputs.
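The latency model above is easy to encode; the sketch below (ours) returns the cycle count for one circular convolution on a 1-D array of $M$ nsPEs and checks that the matched case reduces to $4d-1$.

```python
def bs_latency_cycles(d, M):
    """Cycles for one d-dim circular convolution on a 1-D array of M nsPEs,
    per the cycle analysis: M to load the stationary vector, 2M for the
    streamed vector to reach the last nsPE, and d-1 for remaining outputs."""
    return 3 * M + d - 1

# A few illustrative points (d <= M so no tiling is involved).
for d, M in ((512, 512), (256, 512), (2 ** 14, 2 ** 14)):
    t = bs_latency_cycles(d, M)
    print(f"d={d}, M={M}: {t} cycles")
    if d == M:
        assert t == 4 * d - 1   # matched case from the text
```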
V-D Adaptive Spatial and Temporal Mapping Strategy

[Figure 12 graphic, with the accompanying comparison of the two mappings:]

                              (a) Spatial Mapping                               (b) Temporal Mapping
Latency                       $k\times\lceil d/(N\times M)\rceil\times T$       $\lceil k/N\rceil\times\lceil d/M\rceil\times T$
Mem Reads under full util.    $2d$ per $T$ cycles                               $(d+M)\times N$ per $T$ cycles

Figure 12: Adaptive spatial-temporal (ST) mapping. The ST mapping under BS dataflow.
CogSys adaptively chooses the mapping scheme based on <math alttext="(N,M,k,d)" class="ltx_Math" display="inline" id="S5.F12.7.6.1.1.m1.4"><semantics id="S5.F12.7.6.1.1.m1.4b"><mrow id="S5.F12.7.6.1.1.m1.4.5.2" xref="S5.F12.7.6.1.1.m1.4.5.1.cmml"><mo id="S5.F12.7.6.1.1.m1.4.5.2.1" stretchy="false" xref="S5.F12.7.6.1.1.m1.4.5.1.cmml">(</mo><mi id="S5.F12.7.6.1.1.m1.1.1" xref="S5.F12.7.6.1.1.m1.1.1.cmml">N</mi><mo id="S5.F12.7.6.1.1.m1.4.5.2.2" xref="S5.F12.7.6.1.1.m1.4.5.1.cmml">,</mo><mi id="S5.F12.7.6.1.1.m1.2.2" xref="S5.F12.7.6.1.1.m1.2.2.cmml">M</mi><mo id="S5.F12.7.6.1.1.m1.4.5.2.3" xref="S5.F12.7.6.1.1.m1.4.5.1.cmml">,</mo><mi id="S5.F12.7.6.1.1.m1.3.3" xref="S5.F12.7.6.1.1.m1.3.3.cmml">k</mi><mo id="S5.F12.7.6.1.1.m1.4.5.2.4" xref="S5.F12.7.6.1.1.m1.4.5.1.cmml">,</mo><mi id="S5.F12.7.6.1.1.m1.4.4" xref="S5.F12.7.6.1.1.m1.4.4.cmml">d</mi><mo id="S5.F12.7.6.1.1.m1.4.5.2.5" stretchy="false" xref="S5.F12.7.6.1.1.m1.4.5.1.cmml">)</mo></mrow><annotation-xml encoding="MathML-Content" id="S5.F12.7.6.1.1.m1.4c"><vector id="S5.F12.7.6.1.1.m1.4.5.1.cmml" xref="S5.F12.7.6.1.1.m1.4.5.2"><ci id="S5.F12.7.6.1.1.m1.1.1.cmml" xref="S5.F12.7.6.1.1.m1.1.1">𝑁</ci><ci id="S5.F12.7.6.1.1.m1.2.2.cmml" xref="S5.F12.7.6.1.1.m1.2.2">𝑀</ci><ci id="S5.F12.7.6.1.1.m1.3.3.cmml" xref="S5.F12.7.6.1.1.m1.3.3">𝑘</ci><ci id="S5.F12.7.6.1.1.m1.4.4.cmml" xref="S5.F12.7.6.1.1.m1.4.4">𝑑</ci></vector></annotation-xml><annotation encoding="application/x-tex" id="S5.F12.7.6.1.1.m1.4d">(N,M,k,d)</annotation><annotation encoding="application/x-llamapun" id="S5.F12.7.6.1.1.m1.4e">( italic_N , italic_M , italic_k , italic_d )</annotation></semantics></math> workload and hardware configurations.</span></span></figcaption> </figure> <div class="ltx_para" id="S5.SS4.p1"> <p class="ltx_p" id="S5.SS4.p1.33"><span class="ltx_text ltx_font_bold" id="S5.SS4.p1.33.1">Spatial-temporal (<span class="ltx_text ltx_font_italic" id="S5.SS4.p1.33.1.1">ST</span>) mapping flexibility.</span> To efficiently support the various dimensions of vector-symbolic operations, we propose <span class="ltx_text ltx_font_italic" id="S5.SS4.p1.33.2">ST</span> mapping featuring <span class="ltx_text ltx_framed ltx_framed_underline" id="S5.SS4.p1.33.3">spatial mapping mode</span> and <span class="ltx_text ltx_framed ltx_framed_underline" id="S5.SS4.p1.33.4">temporal mapping mode</span> (Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S5.F12.7" title="Figure 12 ‣ V-D Adaptive Spatial and Temporal Mapping Strategy ‣ V CogSys: Hardware Architecture ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">12</span></a>). The <span class="ltx_text ltx_font_italic" id="S5.SS4.p1.33.5">nsPE</span> array is easily expanded to <math alttext="N" class="ltx_Math" display="inline" id="S5.SS4.p1.1.m1.1"><semantics id="S5.SS4.p1.1.m1.1a"><mi id="S5.SS4.p1.1.m1.1.1" xref="S5.SS4.p1.1.m1.1.1.cmml">N</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.1.m1.1b"><ci id="S5.SS4.p1.1.m1.1.1.cmml" xref="S5.SS4.p1.1.m1.1.1">𝑁</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.1.m1.1c">N</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.1.m1.1d">italic_N</annotation></semantics></math> arrays. Spatial mapping, by parallelizing a single circular convolution into folds, reduces memory reads compared to temporal mapping, especially with many folds. 
Taking <math alttext="N" class="ltx_Math" display="inline" id="S5.SS4.p1.2.m2.1"><semantics id="S5.SS4.p1.2.m2.1a"><mi id="S5.SS4.p1.2.m2.1.1" xref="S5.SS4.p1.2.m2.1.1.cmml">N</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.2.m2.1b"><ci id="S5.SS4.p1.2.m2.1.1.cmml" xref="S5.SS4.p1.2.m2.1.1">𝑁</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.2.m2.1c">N</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.2.m2.1d">italic_N</annotation></semantics></math> arrays (<math alttext="M" class="ltx_Math" display="inline" id="S5.SS4.p1.3.m3.1"><semantics id="S5.SS4.p1.3.m3.1a"><mi id="S5.SS4.p1.3.m3.1.1" xref="S5.SS4.p1.3.m3.1.1.cmml">M</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.3.m3.1b"><ci id="S5.SS4.p1.3.m3.1.1.cmml" xref="S5.SS4.p1.3.m3.1.1">𝑀</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.3.m3.1c">M</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.3.m3.1d">italic_M</annotation></semantics></math> PEs each) as an example, spatial mapping requires <math alttext="B_{S}" class="ltx_Math" display="inline" id="S5.SS4.p1.4.m4.1"><semantics id="S5.SS4.p1.4.m4.1a"><msub id="S5.SS4.p1.4.m4.1.1" xref="S5.SS4.p1.4.m4.1.1.cmml"><mi id="S5.SS4.p1.4.m4.1.1.2" xref="S5.SS4.p1.4.m4.1.1.2.cmml">B</mi><mi id="S5.SS4.p1.4.m4.1.1.3" xref="S5.SS4.p1.4.m4.1.1.3.cmml">S</mi></msub><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.4.m4.1b"><apply id="S5.SS4.p1.4.m4.1.1.cmml" xref="S5.SS4.p1.4.m4.1.1"><csymbol cd="ambiguous" id="S5.SS4.p1.4.m4.1.1.1.cmml" xref="S5.SS4.p1.4.m4.1.1">subscript</csymbol><ci id="S5.SS4.p1.4.m4.1.1.2.cmml" xref="S5.SS4.p1.4.m4.1.1.2">𝐵</ci><ci id="S5.SS4.p1.4.m4.1.1.3.cmml" xref="S5.SS4.p1.4.m4.1.1.3">𝑆</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.4.m4.1c">B_{S}</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.4.m4.1d">italic_B start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT</annotation></semantics></math>=<math alttext="2d" class="ltx_Math" display="inline" id="S5.SS4.p1.5.m5.1"><semantics id="S5.SS4.p1.5.m5.1a"><mrow id="S5.SS4.p1.5.m5.1.1" xref="S5.SS4.p1.5.m5.1.1.cmml"><mn id="S5.SS4.p1.5.m5.1.1.2" xref="S5.SS4.p1.5.m5.1.1.2.cmml">2</mn><mo id="S5.SS4.p1.5.m5.1.1.1" xref="S5.SS4.p1.5.m5.1.1.1.cmml">⁢</mo><mi id="S5.SS4.p1.5.m5.1.1.3" xref="S5.SS4.p1.5.m5.1.1.3.cmml">d</mi></mrow><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.5.m5.1b"><apply id="S5.SS4.p1.5.m5.1.1.cmml" xref="S5.SS4.p1.5.m5.1.1"><times id="S5.SS4.p1.5.m5.1.1.1.cmml" xref="S5.SS4.p1.5.m5.1.1.1"></times><cn id="S5.SS4.p1.5.m5.1.1.2.cmml" type="integer" xref="S5.SS4.p1.5.m5.1.1.2">2</cn><ci id="S5.SS4.p1.5.m5.1.1.3.cmml" xref="S5.SS4.p1.5.m5.1.1.3">𝑑</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.5.m5.1c">2d</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.5.m5.1d">2 italic_d</annotation></semantics></math> memory reads per <math alttext="T" class="ltx_Math" display="inline" id="S5.SS4.p1.6.m6.1"><semantics id="S5.SS4.p1.6.m6.1a"><mi id="S5.SS4.p1.6.m6.1.1" xref="S5.SS4.p1.6.m6.1.1.cmml">T</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.6.m6.1b"><ci id="S5.SS4.p1.6.m6.1.1.cmml" xref="S5.SS4.p1.6.m6.1.1">𝑇</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.6.m6.1c">T</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.6.m6.1d">italic_T</annotation></semantics></math> cycles for <math alttext="d" class="ltx_Math" 
display="inline" id="S5.SS4.p1.7.m7.1"><semantics id="S5.SS4.p1.7.m7.1a"><mi id="S5.SS4.p1.7.m7.1.1" xref="S5.SS4.p1.7.m7.1.1.cmml">d</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.7.m7.1b"><ci id="S5.SS4.p1.7.m7.1.1.cmml" xref="S5.SS4.p1.7.m7.1.1">𝑑</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.7.m7.1c">d</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.7.m7.1d">italic_d</annotation></semantics></math>-dimensional vectors. while temporal mapping requires loading <math alttext="B_{T}" class="ltx_Math" display="inline" id="S5.SS4.p1.8.m8.1"><semantics id="S5.SS4.p1.8.m8.1a"><msub id="S5.SS4.p1.8.m8.1.1" xref="S5.SS4.p1.8.m8.1.1.cmml"><mi id="S5.SS4.p1.8.m8.1.1.2" xref="S5.SS4.p1.8.m8.1.1.2.cmml">B</mi><mi id="S5.SS4.p1.8.m8.1.1.3" xref="S5.SS4.p1.8.m8.1.1.3.cmml">T</mi></msub><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.8.m8.1b"><apply id="S5.SS4.p1.8.m8.1.1.cmml" xref="S5.SS4.p1.8.m8.1.1"><csymbol cd="ambiguous" id="S5.SS4.p1.8.m8.1.1.1.cmml" xref="S5.SS4.p1.8.m8.1.1">subscript</csymbol><ci id="S5.SS4.p1.8.m8.1.1.2.cmml" xref="S5.SS4.p1.8.m8.1.1.2">𝐵</ci><ci id="S5.SS4.p1.8.m8.1.1.3.cmml" xref="S5.SS4.p1.8.m8.1.1.3">𝑇</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.8.m8.1c">B_{T}</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.8.m8.1d">italic_B start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT</annotation></semantics></math>=(<math alttext="d" class="ltx_Math" display="inline" id="S5.SS4.p1.9.m9.1"><semantics id="S5.SS4.p1.9.m9.1a"><mi id="S5.SS4.p1.9.m9.1.1" xref="S5.SS4.p1.9.m9.1.1.cmml">d</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.9.m9.1b"><ci id="S5.SS4.p1.9.m9.1.1.cmml" xref="S5.SS4.p1.9.m9.1.1">𝑑</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.9.m9.1c">d</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.9.m9.1d">italic_d</annotation></semantics></math>+<math alttext="M" class="ltx_Math" display="inline" id="S5.SS4.p1.10.m10.1"><semantics id="S5.SS4.p1.10.m10.1a"><mi id="S5.SS4.p1.10.m10.1.1" xref="S5.SS4.p1.10.m10.1.1.cmml">M</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.10.m10.1b"><ci id="S5.SS4.p1.10.m10.1.1.cmml" xref="S5.SS4.p1.10.m10.1.1">𝑀</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.10.m10.1c">M</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.10.m10.1d">italic_M</annotation></semantics></math>)<math alttext="\times" class="ltx_Math" display="inline" id="S5.SS4.p1.11.m11.1"><semantics id="S5.SS4.p1.11.m11.1a"><mo id="S5.SS4.p1.11.m11.1.1" xref="S5.SS4.p1.11.m11.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.11.m11.1b"><times id="S5.SS4.p1.11.m11.1.1.cmml" xref="S5.SS4.p1.11.m11.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.11.m11.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.11.m11.1d">×</annotation></semantics></math><math alttext="N" class="ltx_Math" display="inline" id="S5.SS4.p1.12.m12.1"><semantics id="S5.SS4.p1.12.m12.1a"><mi id="S5.SS4.p1.12.m12.1.1" xref="S5.SS4.p1.12.m12.1.1.cmml">N</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.12.m12.1b"><ci id="S5.SS4.p1.12.m12.1.1.cmml" xref="S5.SS4.p1.12.m12.1.1">𝑁</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.12.m12.1c">N</annotation><annotation encoding="application/x-llamapun" 
id="S5.SS4.p1.12.m12.1d">italic_N</annotation></semantics></math> elements per <math alttext="T" class="ltx_Math" display="inline" id="S5.SS4.p1.13.m13.1"><semantics id="S5.SS4.p1.13.m13.1a"><mi id="S5.SS4.p1.13.m13.1.1" xref="S5.SS4.p1.13.m13.1.1.cmml">T</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.13.m13.1b"><ci id="S5.SS4.p1.13.m13.1.1.cmml" xref="S5.SS4.p1.13.m13.1.1">𝑇</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.13.m13.1c">T</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.13.m13.1d">italic_T</annotation></semantics></math> cycles. Given neurosymbolic workloads typically have <math alttext="d" class="ltx_Math" display="inline" id="S5.SS4.p1.14.m14.1"><semantics id="S5.SS4.p1.14.m14.1a"><mi id="S5.SS4.p1.14.m14.1.1" xref="S5.SS4.p1.14.m14.1.1.cmml">d</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.14.m14.1b"><ci id="S5.SS4.p1.14.m14.1.1.cmml" xref="S5.SS4.p1.14.m14.1.1">𝑑</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.14.m14.1c">d</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.14.m14.1d">italic_d</annotation></semantics></math><math alttext="&gt;" class="ltx_Math" display="inline" id="S5.SS4.p1.15.m15.1"><semantics id="S5.SS4.p1.15.m15.1a"><mo id="S5.SS4.p1.15.m15.1.1" xref="S5.SS4.p1.15.m15.1.1.cmml">&gt;</mo><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.15.m15.1b"><gt id="S5.SS4.p1.15.m15.1.1.cmml" xref="S5.SS4.p1.15.m15.1.1"></gt></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.15.m15.1c">&gt;</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.15.m15.1d">&gt;</annotation></semantics></math>1000, the bandwidth requirement (memory reads per <math alttext="T" class="ltx_Math" display="inline" id="S5.SS4.p1.16.m16.1"><semantics id="S5.SS4.p1.16.m16.1a"><mi id="S5.SS4.p1.16.m16.1.1" xref="S5.SS4.p1.16.m16.1.1.cmml">T</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.16.m16.1b"><ci id="S5.SS4.p1.16.m16.1.1.cmml" xref="S5.SS4.p1.16.m16.1.1">𝑇</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.16.m16.1c">T</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.16.m16.1d">italic_T</annotation></semantics></math> cycles) is reduced by <math alttext="(N/2)\times" class="ltx_math_unparsed" display="inline" id="S5.SS4.p1.17.m17.1"><semantics id="S5.SS4.p1.17.m17.1a"><mrow id="S5.SS4.p1.17.m17.1b"><mrow id="S5.SS4.p1.17.m17.1.1"><mo id="S5.SS4.p1.17.m17.1.1.1" stretchy="false">(</mo><mi id="S5.SS4.p1.17.m17.1.1.2">N</mi><mo id="S5.SS4.p1.17.m17.1.1.3">/</mo><mn id="S5.SS4.p1.17.m17.1.1.4">2</mn><mo id="S5.SS4.p1.17.m17.1.1.5" rspace="0.055em" stretchy="false">)</mo></mrow><mo id="S5.SS4.p1.17.m17.1.2">×</mo></mrow><annotation encoding="application/x-tex" id="S5.SS4.p1.17.m17.1c">(N/2)\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.17.m17.1d">( italic_N / 2 ) ×</annotation></semantics></math> via spatial mapping. 
Temporal mapping, on the other hand, outperforms spatial mapping when <math alttext="d" class="ltx_Math" display="inline" id="S5.SS4.p1.18.m18.1"><semantics id="S5.SS4.p1.18.m18.1a"><mi id="S5.SS4.p1.18.m18.1.1" xref="S5.SS4.p1.18.m18.1.1.cmml">d</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.18.m18.1b"><ci id="S5.SS4.p1.18.m18.1.1.cmml" xref="S5.SS4.p1.18.m18.1.1">𝑑</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.18.m18.1c">d</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.18.m18.1d">italic_d</annotation></semantics></math><math alttext="&lt;" class="ltx_Math" display="inline" id="S5.SS4.p1.19.m19.1"><semantics id="S5.SS4.p1.19.m19.1a"><mo id="S5.SS4.p1.19.m19.1.1" xref="S5.SS4.p1.19.m19.1.1.cmml">&lt;</mo><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.19.m19.1b"><lt id="S5.SS4.p1.19.m19.1.1.cmml" xref="S5.SS4.p1.19.m19.1.1"></lt></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.19.m19.1c">&lt;</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.19.m19.1d">&lt;</annotation></semantics></math><math alttext="M" class="ltx_Math" display="inline" id="S5.SS4.p1.20.m20.1"><semantics id="S5.SS4.p1.20.m20.1a"><mi id="S5.SS4.p1.20.m20.1.1" xref="S5.SS4.p1.20.m20.1.1.cmml">M</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.20.m20.1b"><ci id="S5.SS4.p1.20.m20.1.1.cmml" xref="S5.SS4.p1.20.m20.1.1">𝑀</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.20.m20.1c">M</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.20.m20.1d">italic_M</annotation></semantics></math> by enabling the parallelization of multiple convolutions. For <math alttext="k" class="ltx_Math" display="inline" id="S5.SS4.p1.21.m21.1"><semantics id="S5.SS4.p1.21.m21.1a"><mi id="S5.SS4.p1.21.m21.1.1" xref="S5.SS4.p1.21.m21.1.1.cmml">k</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.21.m21.1b"><ci id="S5.SS4.p1.21.m21.1.1.cmml" xref="S5.SS4.p1.21.m21.1.1">𝑘</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.21.m21.1c">k</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.21.m21.1d">italic_k</annotation></semantics></math> vector-symbolic circular convolutions, temporal folding takes <math alttext="C_{T}" class="ltx_Math" display="inline" id="S5.SS4.p1.22.m22.1"><semantics id="S5.SS4.p1.22.m22.1a"><msub id="S5.SS4.p1.22.m22.1.1" xref="S5.SS4.p1.22.m22.1.1.cmml"><mi id="S5.SS4.p1.22.m22.1.1.2" xref="S5.SS4.p1.22.m22.1.1.2.cmml">C</mi><mi id="S5.SS4.p1.22.m22.1.1.3" xref="S5.SS4.p1.22.m22.1.1.3.cmml">T</mi></msub><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.22.m22.1b"><apply id="S5.SS4.p1.22.m22.1.1.cmml" xref="S5.SS4.p1.22.m22.1.1"><csymbol cd="ambiguous" id="S5.SS4.p1.22.m22.1.1.1.cmml" xref="S5.SS4.p1.22.m22.1.1">subscript</csymbol><ci id="S5.SS4.p1.22.m22.1.1.2.cmml" xref="S5.SS4.p1.22.m22.1.1.2">𝐶</ci><ci id="S5.SS4.p1.22.m22.1.1.3.cmml" xref="S5.SS4.p1.22.m22.1.1.3">𝑇</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.22.m22.1c">C_{T}</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.22.m22.1d">italic_C start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT</annotation></semantics></math>=<math alttext="\lceil k/N\rceil" class="ltx_Math" display="inline" id="S5.SS4.p1.23.m23.1"><semantics id="S5.SS4.p1.23.m23.1a"><mrow id="S5.SS4.p1.23.m23.1.1.1" xref="S5.SS4.p1.23.m23.1.1.2.cmml"><mo id="S5.SS4.p1.23.m23.1.1.1.2" stretchy="false" 
xref="S5.SS4.p1.23.m23.1.1.2.1.cmml">⌈</mo><mrow id="S5.SS4.p1.23.m23.1.1.1.1" xref="S5.SS4.p1.23.m23.1.1.1.1.cmml"><mi id="S5.SS4.p1.23.m23.1.1.1.1.2" xref="S5.SS4.p1.23.m23.1.1.1.1.2.cmml">k</mi><mo id="S5.SS4.p1.23.m23.1.1.1.1.1" xref="S5.SS4.p1.23.m23.1.1.1.1.1.cmml">/</mo><mi id="S5.SS4.p1.23.m23.1.1.1.1.3" xref="S5.SS4.p1.23.m23.1.1.1.1.3.cmml">N</mi></mrow><mo id="S5.SS4.p1.23.m23.1.1.1.3" stretchy="false" xref="S5.SS4.p1.23.m23.1.1.2.1.cmml">⌉</mo></mrow><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.23.m23.1b"><apply id="S5.SS4.p1.23.m23.1.1.2.cmml" xref="S5.SS4.p1.23.m23.1.1.1"><ceiling id="S5.SS4.p1.23.m23.1.1.2.1.cmml" xref="S5.SS4.p1.23.m23.1.1.1.2"></ceiling><apply id="S5.SS4.p1.23.m23.1.1.1.1.cmml" xref="S5.SS4.p1.23.m23.1.1.1.1"><divide id="S5.SS4.p1.23.m23.1.1.1.1.1.cmml" xref="S5.SS4.p1.23.m23.1.1.1.1.1"></divide><ci id="S5.SS4.p1.23.m23.1.1.1.1.2.cmml" xref="S5.SS4.p1.23.m23.1.1.1.1.2">𝑘</ci><ci id="S5.SS4.p1.23.m23.1.1.1.1.3.cmml" xref="S5.SS4.p1.23.m23.1.1.1.1.3">𝑁</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.23.m23.1c">\lceil k/N\rceil</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.23.m23.1d">⌈ italic_k / italic_N ⌉</annotation></semantics></math><math alttext="\times" class="ltx_Math" display="inline" id="S5.SS4.p1.24.m24.1"><semantics id="S5.SS4.p1.24.m24.1a"><mo id="S5.SS4.p1.24.m24.1.1" xref="S5.SS4.p1.24.m24.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.24.m24.1b"><times id="S5.SS4.p1.24.m24.1.1.cmml" xref="S5.SS4.p1.24.m24.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.24.m24.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.24.m24.1d">×</annotation></semantics></math><math alttext="\lceil d/M\rceil" class="ltx_Math" display="inline" id="S5.SS4.p1.25.m25.1"><semantics id="S5.SS4.p1.25.m25.1a"><mrow id="S5.SS4.p1.25.m25.1.1.1" xref="S5.SS4.p1.25.m25.1.1.2.cmml"><mo id="S5.SS4.p1.25.m25.1.1.1.2" stretchy="false" xref="S5.SS4.p1.25.m25.1.1.2.1.cmml">⌈</mo><mrow id="S5.SS4.p1.25.m25.1.1.1.1" xref="S5.SS4.p1.25.m25.1.1.1.1.cmml"><mi id="S5.SS4.p1.25.m25.1.1.1.1.2" xref="S5.SS4.p1.25.m25.1.1.1.1.2.cmml">d</mi><mo id="S5.SS4.p1.25.m25.1.1.1.1.1" xref="S5.SS4.p1.25.m25.1.1.1.1.1.cmml">/</mo><mi id="S5.SS4.p1.25.m25.1.1.1.1.3" xref="S5.SS4.p1.25.m25.1.1.1.1.3.cmml">M</mi></mrow><mo id="S5.SS4.p1.25.m25.1.1.1.3" stretchy="false" xref="S5.SS4.p1.25.m25.1.1.2.1.cmml">⌉</mo></mrow><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.25.m25.1b"><apply id="S5.SS4.p1.25.m25.1.1.2.cmml" xref="S5.SS4.p1.25.m25.1.1.1"><ceiling id="S5.SS4.p1.25.m25.1.1.2.1.cmml" xref="S5.SS4.p1.25.m25.1.1.1.2"></ceiling><apply id="S5.SS4.p1.25.m25.1.1.1.1.cmml" xref="S5.SS4.p1.25.m25.1.1.1.1"><divide id="S5.SS4.p1.25.m25.1.1.1.1.1.cmml" xref="S5.SS4.p1.25.m25.1.1.1.1.1"></divide><ci id="S5.SS4.p1.25.m25.1.1.1.1.2.cmml" xref="S5.SS4.p1.25.m25.1.1.1.1.2">𝑑</ci><ci id="S5.SS4.p1.25.m25.1.1.1.1.3.cmml" xref="S5.SS4.p1.25.m25.1.1.1.1.3">𝑀</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.25.m25.1c">\lceil d/M\rceil</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.25.m25.1d">⌈ italic_d / italic_M ⌉</annotation></semantics></math><math alttext="\times" class="ltx_Math" display="inline" id="S5.SS4.p1.26.m26.1"><semantics id="S5.SS4.p1.26.m26.1a"><mo id="S5.SS4.p1.26.m26.1.1" xref="S5.SS4.p1.26.m26.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" 
id="S5.SS4.p1.26.m26.1b"><times id="S5.SS4.p1.26.m26.1.1.cmml" xref="S5.SS4.p1.26.m26.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.26.m26.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.26.m26.1d">×</annotation></semantics></math><math alttext="T" class="ltx_Math" display="inline" id="S5.SS4.p1.27.m27.1"><semantics id="S5.SS4.p1.27.m27.1a"><mi id="S5.SS4.p1.27.m27.1.1" xref="S5.SS4.p1.27.m27.1.1.cmml">T</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.27.m27.1b"><ci id="S5.SS4.p1.27.m27.1.1.cmml" xref="S5.SS4.p1.27.m27.1.1">𝑇</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.27.m27.1c">T</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.27.m27.1d">italic_T</annotation></semantics></math> cycles, while spatial folding takes <math alttext="C_{S}" class="ltx_Math" display="inline" id="S5.SS4.p1.28.m28.1"><semantics id="S5.SS4.p1.28.m28.1a"><msub id="S5.SS4.p1.28.m28.1.1" xref="S5.SS4.p1.28.m28.1.1.cmml"><mi id="S5.SS4.p1.28.m28.1.1.2" xref="S5.SS4.p1.28.m28.1.1.2.cmml">C</mi><mi id="S5.SS4.p1.28.m28.1.1.3" xref="S5.SS4.p1.28.m28.1.1.3.cmml">S</mi></msub><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.28.m28.1b"><apply id="S5.SS4.p1.28.m28.1.1.cmml" xref="S5.SS4.p1.28.m28.1.1"><csymbol cd="ambiguous" id="S5.SS4.p1.28.m28.1.1.1.cmml" xref="S5.SS4.p1.28.m28.1.1">subscript</csymbol><ci id="S5.SS4.p1.28.m28.1.1.2.cmml" xref="S5.SS4.p1.28.m28.1.1.2">𝐶</ci><ci id="S5.SS4.p1.28.m28.1.1.3.cmml" xref="S5.SS4.p1.28.m28.1.1.3">𝑆</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.28.m28.1c">C_{S}</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.28.m28.1d">italic_C start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT</annotation></semantics></math>=<math alttext="k" class="ltx_Math" display="inline" id="S5.SS4.p1.29.m29.1"><semantics id="S5.SS4.p1.29.m29.1a"><mi id="S5.SS4.p1.29.m29.1.1" xref="S5.SS4.p1.29.m29.1.1.cmml">k</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.29.m29.1b"><ci id="S5.SS4.p1.29.m29.1.1.cmml" xref="S5.SS4.p1.29.m29.1.1">𝑘</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.29.m29.1c">k</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.29.m29.1d">italic_k</annotation></semantics></math><math alttext="\times" class="ltx_Math" display="inline" id="S5.SS4.p1.30.m30.1"><semantics id="S5.SS4.p1.30.m30.1a"><mo id="S5.SS4.p1.30.m30.1.1" xref="S5.SS4.p1.30.m30.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.30.m30.1b"><times id="S5.SS4.p1.30.m30.1.1.cmml" xref="S5.SS4.p1.30.m30.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.30.m30.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.30.m30.1d">×</annotation></semantics></math><math alttext="\lceil d/(N\times M)\rceil" class="ltx_Math" display="inline" id="S5.SS4.p1.31.m31.1"><semantics id="S5.SS4.p1.31.m31.1a"><mrow id="S5.SS4.p1.31.m31.1.1.1" xref="S5.SS4.p1.31.m31.1.1.2.cmml"><mo id="S5.SS4.p1.31.m31.1.1.1.2" stretchy="false" xref="S5.SS4.p1.31.m31.1.1.2.1.cmml">⌈</mo><mrow id="S5.SS4.p1.31.m31.1.1.1.1" xref="S5.SS4.p1.31.m31.1.1.1.1.cmml"><mi id="S5.SS4.p1.31.m31.1.1.1.1.3" xref="S5.SS4.p1.31.m31.1.1.1.1.3.cmml">d</mi><mo id="S5.SS4.p1.31.m31.1.1.1.1.2" xref="S5.SS4.p1.31.m31.1.1.1.1.2.cmml">/</mo><mrow id="S5.SS4.p1.31.m31.1.1.1.1.1.1" xref="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.cmml"><mo 
id="S5.SS4.p1.31.m31.1.1.1.1.1.1.2" stretchy="false" xref="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.cmml">(</mo><mrow id="S5.SS4.p1.31.m31.1.1.1.1.1.1.1" xref="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.cmml"><mi id="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.2" xref="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.2.cmml">N</mi><mo id="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.1" lspace="0.222em" rspace="0.222em" xref="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.1.cmml">×</mo><mi id="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.3" xref="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.3.cmml">M</mi></mrow><mo id="S5.SS4.p1.31.m31.1.1.1.1.1.1.3" stretchy="false" xref="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S5.SS4.p1.31.m31.1.1.1.3" stretchy="false" xref="S5.SS4.p1.31.m31.1.1.2.1.cmml">⌉</mo></mrow><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.31.m31.1b"><apply id="S5.SS4.p1.31.m31.1.1.2.cmml" xref="S5.SS4.p1.31.m31.1.1.1"><ceiling id="S5.SS4.p1.31.m31.1.1.2.1.cmml" xref="S5.SS4.p1.31.m31.1.1.1.2"></ceiling><apply id="S5.SS4.p1.31.m31.1.1.1.1.cmml" xref="S5.SS4.p1.31.m31.1.1.1.1"><divide id="S5.SS4.p1.31.m31.1.1.1.1.2.cmml" xref="S5.SS4.p1.31.m31.1.1.1.1.2"></divide><ci id="S5.SS4.p1.31.m31.1.1.1.1.3.cmml" xref="S5.SS4.p1.31.m31.1.1.1.1.3">𝑑</ci><apply id="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.cmml" xref="S5.SS4.p1.31.m31.1.1.1.1.1.1"><times id="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.1.cmml" xref="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.1"></times><ci id="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.2.cmml" xref="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.2">𝑁</ci><ci id="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.3.cmml" xref="S5.SS4.p1.31.m31.1.1.1.1.1.1.1.3">𝑀</ci></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.31.m31.1c">\lceil d/(N\times M)\rceil</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.31.m31.1d">⌈ italic_d / ( italic_N × italic_M ) ⌉</annotation></semantics></math><math alttext="\times" class="ltx_Math" display="inline" id="S5.SS4.p1.32.m32.1"><semantics id="S5.SS4.p1.32.m32.1a"><mo id="S5.SS4.p1.32.m32.1.1" xref="S5.SS4.p1.32.m32.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.32.m32.1b"><times id="S5.SS4.p1.32.m32.1.1.cmml" xref="S5.SS4.p1.32.m32.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.32.m32.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.32.m32.1d">×</annotation></semantics></math><math alttext="T" class="ltx_Math" display="inline" id="S5.SS4.p1.33.m33.1"><semantics id="S5.SS4.p1.33.m33.1a"><mi id="S5.SS4.p1.33.m33.1.1" xref="S5.SS4.p1.33.m33.1.1.cmml">T</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p1.33.m33.1b"><ci id="S5.SS4.p1.33.m33.1.1.cmml" xref="S5.SS4.p1.33.m33.1.1">𝑇</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p1.33.m33.1c">T</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p1.33.m33.1d">italic_T</annotation></semantics></math> cycles.</p> </div> <figure class="ltx_figure" id="S5.F13"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="195" id="S5.F13.g1" src="x13.png" width="830"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S5.F13.9.1.1" style="font-size:90%;">Figure 13</span>: </span><span class="ltx_text ltx_font_bold" id="S5.F13.10.2" style="font-size:90%;">Adaptive workload-aware scheduling (<span class="ltx_text ltx_font_italic" id="S5.F13.10.2.1">adSCH</span>) strategy.<span class="ltx_text ltx_font_medium" id="S5.F13.10.2.2"> </span>(a)<span 
class="ltx_text ltx_font_medium" id="S5.F13.10.2.3"> Neurosymbolic system-level challenges. The <span class="ltx_text ltx_font_italic" id="S5.F13.10.2.3.1">adSCH</span> scheme enables </span>(b)<span class="ltx_text ltx_font_medium" id="S5.F13.10.2.4"> interleaved neural/symbolic processing and </span>(c)<span class="ltx_text ltx_font_medium" id="S5.F13.10.2.5"> cell-wise partition across CogSys arrays with multi-level parallelism. </span>(d)<span class="ltx_text ltx_font_medium" id="S5.F13.10.2.6"> An <span class="ltx_text ltx_font_italic" id="S5.F13.10.2.6.1">adSCH</span> example on NVSA algorithm. The scheduling method ensures CogSys is adaptable and scalable across neurosymbolic workloads and tasks.</span></span></figcaption> </figure> <div class="ltx_para" id="S5.SS4.p2"> <p class="ltx_p" id="S5.SS4.p2.4">To efficiently process symbolic operations and balance between bandwidth and latency, we conduct an adaptive search between spatial and temporal mapping based on workloads and CogSys configurations. For example, For <math alttext="N" class="ltx_Math" display="inline" id="S5.SS4.p2.1.m1.1"><semantics id="S5.SS4.p2.1.m1.1a"><mi id="S5.SS4.p2.1.m1.1.1" xref="S5.SS4.p2.1.m1.1.1.cmml">N</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p2.1.m1.1b"><ci id="S5.SS4.p2.1.m1.1.1.cmml" xref="S5.SS4.p2.1.m1.1.1">𝑁</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p2.1.m1.1c">N</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p2.1.m1.1d">italic_N</annotation></semantics></math>=32 and <math alttext="d" class="ltx_Math" display="inline" id="S5.SS4.p2.2.m2.1"><semantics id="S5.SS4.p2.2.m2.1a"><mi id="S5.SS4.p2.2.m2.1.1" xref="S5.SS4.p2.2.m2.1.1.cmml">d</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p2.2.m2.1b"><ci id="S5.SS4.p2.2.m2.1.1.cmml" xref="S5.SS4.p2.2.m2.1.1">𝑑</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p2.2.m2.1c">d</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p2.2.m2.1d">italic_d</annotation></semantics></math>=1024 in NVSA (<math alttext="k" class="ltx_Math" display="inline" id="S5.SS4.p2.3.m3.1"><semantics id="S5.SS4.p2.3.m3.1a"><mi id="S5.SS4.p2.3.m3.1.1" xref="S5.SS4.p2.3.m3.1.1.cmml">k</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p2.3.m3.1b"><ci id="S5.SS4.p2.3.m3.1.1.cmml" xref="S5.SS4.p2.3.m3.1.1">𝑘</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p2.3.m3.1c">k</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p2.3.m3.1d">italic_k</annotation></semantics></math>=210) and LVRF (<math alttext="k" class="ltx_Math" display="inline" id="S5.SS4.p2.4.m4.1"><semantics id="S5.SS4.p2.4.m4.1a"><mi id="S5.SS4.p2.4.m4.1.1" xref="S5.SS4.p2.4.m4.1.1.cmml">k</mi><annotation-xml encoding="MathML-Content" id="S5.SS4.p2.4.m4.1b"><ci id="S5.SS4.p2.4.m4.1.1.cmml" xref="S5.SS4.p2.4.m4.1.1">𝑘</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS4.p2.4.m4.1c">k</annotation><annotation encoding="application/x-llamapun" id="S5.SS4.p2.4.m4.1d">italic_k</annotation></semantics></math>=2575) workloads, CogSys opts for temporal mapping with 32 parallel circular convolutions.</p> </div> </section> <section class="ltx_subsection" id="S5.SS5"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S5.SS5.5.1.1">V-E</span> </span><span class="ltx_text ltx_font_italic" id="S5.SS5.6.2">Adaptive Scale-Up and Scale-Out Strategy</span> </h3> <div class="ltx_para" id="S5.SS5.p1"> 
<p class="ltx_p" id="S5.SS5.p1.3"><span class="ltx_text ltx_font_bold" id="S5.SS5.p1.3.1">Scale-up and scale-out flexibility.</span> To enhance design utilization and scalability, CogSys proposes to operate as a combination of scale-up and scaled-out reconfigurable arrays, with the support of <span class="ltx_text ltx_framed ltx_framed_underline" id="S5.SS5.p1.3.2">systolic cell-wise parallelism (ScWP)</span> and <span class="ltx_text ltx_framed ltx_framed_underline" id="S5.SS5.p1.3.3">column-wise parallelism (CWP)</span> for circular convolutions. The (<math alttext="N" class="ltx_Math" display="inline" id="S5.SS5.p1.1.m1.1"><semantics id="S5.SS5.p1.1.m1.1a"><mi id="S5.SS5.p1.1.m1.1.1" xref="S5.SS5.p1.1.m1.1.1.cmml">N</mi><annotation-xml encoding="MathML-Content" id="S5.SS5.p1.1.m1.1b"><ci id="S5.SS5.p1.1.m1.1.1.cmml" xref="S5.SS5.p1.1.m1.1.1">𝑁</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS5.p1.1.m1.1c">N</annotation><annotation encoding="application/x-llamapun" id="S5.SS5.p1.1.m1.1d">italic_N</annotation></semantics></math>=32, <math alttext="M" class="ltx_Math" display="inline" id="S5.SS5.p1.2.m2.1"><semantics id="S5.SS5.p1.2.m2.1a"><mi id="S5.SS5.p1.2.m2.1.1" xref="S5.SS5.p1.2.m2.1.1.cmml">M</mi><annotation-xml encoding="MathML-Content" id="S5.SS5.p1.2.m2.1b"><ci id="S5.SS5.p1.2.m2.1.1.cmml" xref="S5.SS5.p1.2.m2.1.1">𝑀</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS5.p1.2.m2.1c">M</annotation><annotation encoding="application/x-llamapun" id="S5.SS5.p1.2.m2.1d">italic_M</annotation></semantics></math>=512) configuration is constructed from 16 32<math alttext="\times" class="ltx_Math" display="inline" id="S5.SS5.p1.3.m3.1"><semantics id="S5.SS5.p1.3.m3.1a"><mo id="S5.SS5.p1.3.m3.1.1" xref="S5.SS5.p1.3.m3.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.SS5.p1.3.m3.1b"><times id="S5.SS5.p1.3.m3.1.1.cmml" xref="S5.SS5.p1.3.m3.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.SS5.p1.3.m3.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS5.p1.3.m3.1d">×</annotation></semantics></math>32 cells by configuring the muxes to choose among five schemes, i.e., scale-up GEMM, scale-out GEMM, scale-up Conv, scale-out Conv, and scale-out GEMM+Conv, enabled by the systolic cell-wise heterogeneous partitioning scheme in Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S6.SS2" title="VI-B Adaptive Workload-Aware Scheduling (adSCH) Strategy ‣ VI CogSys: Scheduling Strategy ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">VI-B</span></span></a>. For GEMM, the scale-out scheme enables higher utilization and ScWP. For symbolic circular convolution, the scaled-out scheme enables ScWP and CWP for low-dimensional vectors.</p> </div> <div class="ltx_para" id="S5.SS5.p2"> <p class="ltx_p" id="S5.SS5.p2.7"><span class="ltx_text ltx_font_bold" id="S5.SS5.p2.7.1">Design space exploration.</span> We search scale-out/scale-up schemes based on workloads and CogSys configurations to increase utilization and parallelism. 
For instance, the 16 32<math alttext="\times" class="ltx_Math" display="inline" id="S5.SS5.p2.1.m1.1"><semantics id="S5.SS5.p2.1.m1.1a"><mo id="S5.SS5.p2.1.m1.1.1" xref="S5.SS5.p2.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.SS5.p2.1.m1.1b"><times id="S5.SS5.p2.1.m1.1.1.cmml" xref="S5.SS5.p2.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.SS5.p2.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS5.p2.1.m1.1d">×</annotation></semantics></math>32 scaled-out cells achieve 91.26% utilization, with 10.71<math alttext="\times" class="ltx_Math" display="inline" id="S5.SS5.p2.2.m2.1"><semantics id="S5.SS5.p2.2.m2.1a"><mo id="S5.SS5.p2.2.m2.1.1" xref="S5.SS5.p2.2.m2.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.SS5.p2.2.m2.1b"><times id="S5.SS5.p2.2.m2.1.1.cmml" xref="S5.SS5.p2.2.m2.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.SS5.p2.2.m2.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS5.p2.2.m2.1d">×</annotation></semantics></math> and 7.83<math alttext="\times" class="ltx_Math" display="inline" id="S5.SS5.p2.3.m3.1"><semantics id="S5.SS5.p2.3.m3.1a"><mo id="S5.SS5.p2.3.m3.1.1" xref="S5.SS5.p2.3.m3.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.SS5.p2.3.m3.1b"><times id="S5.SS5.p2.3.m3.1.1.cmml" xref="S5.SS5.p2.3.m3.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.SS5.p2.3.m3.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS5.p2.3.m3.1d">×</annotation></semantics></math> speedup over one 128<math alttext="\times" class="ltx_Math" display="inline" id="S5.SS5.p2.4.m4.1"><semantics id="S5.SS5.p2.4.m4.1a"><mo id="S5.SS5.p2.4.m4.1.1" xref="S5.SS5.p2.4.m4.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.SS5.p2.4.m4.1b"><times id="S5.SS5.p2.4.m4.1.1.cmml" xref="S5.SS5.p2.4.m4.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.SS5.p2.4.m4.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS5.p2.4.m4.1d">×</annotation></semantics></math>128 scaled-up and four 64<math alttext="\times" class="ltx_Math" display="inline" id="S5.SS5.p2.5.m5.1"><semantics id="S5.SS5.p2.5.m5.1a"><mo id="S5.SS5.p2.5.m5.1.1" xref="S5.SS5.p2.5.m5.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.SS5.p2.5.m5.1b"><times id="S5.SS5.p2.5.m5.1.1.cmml" xref="S5.SS5.p2.5.m5.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.SS5.p2.5.m5.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.SS5.p2.5.m5.1d">×</annotation></semantics></math>64 scaled-out cells, respectively, for NVSA and LVRF neural modules. 
For vector-symbolic operations, CogSys chooses a scale-up scheme for NVSA and LVRF (high-dimensional vector processing, <math alttext="d" class="ltx_Math" display="inline" id="S5.SS5.p2.6.m6.1"><semantics id="S5.SS5.p2.6.m6.1a"><mi id="S5.SS5.p2.6.m6.1.1" xref="S5.SS5.p2.6.m6.1.1.cmml">d</mi><annotation-xml encoding="MathML-Content" id="S5.SS5.p2.6.m6.1b"><ci id="S5.SS5.p2.6.m6.1.1.cmml" xref="S5.SS5.p2.6.m6.1.1">𝑑</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS5.p2.6.m6.1c">d</annotation><annotation encoding="application/x-llamapun" id="S5.SS5.p2.6.m6.1d">italic_d</annotation></semantics></math>=1024) and a scale-out scheme for MIMONet (low-dimensional vector processing, <math alttext="d" class="ltx_Math" display="inline" id="S5.SS5.p2.7.m7.1"><semantics id="S5.SS5.p2.7.m7.1a"><mi id="S5.SS5.p2.7.m7.1.1" xref="S5.SS5.p2.7.m7.1.1.cmml">d</mi><annotation-xml encoding="MathML-Content" id="S5.SS5.p2.7.m7.1b"><ci id="S5.SS5.p2.7.m7.1.1.cmml" xref="S5.SS5.p2.7.m7.1.1">𝑑</ci></annotation-xml><annotation encoding="application/x-tex" id="S5.SS5.p2.7.m7.1c">d</annotation><annotation encoding="application/x-llamapun" id="S5.SS5.p2.7.m7.1d">italic_d</annotation></semantics></math>=64).</p> </div> </section> <section class="ltx_subsection" id="S5.SS6"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S5.SS6.5.1.1">V-F</span> </span><span class="ltx_text ltx_font_italic" id="S5.SS6.6.2">Double-buffered Memory and Custom SIMD Unit</span> </h3> <div class="ltx_para" id="S5.SS6.p1"> <p class="ltx_p" id="S5.SS6.p1.1"><span class="ltx_text ltx_font_bold" id="S5.SS6.p1.1.1">Double-buffered memory.</span> CogSys array is backed by three double-buffered SRAMs (Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S5.F10" title="Figure 10 ‣ V CogSys: Hardware Architecture ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">10</span></a><span class="ltx_text" id="S5.SS6.p1.1.2" style="color:#0000FF;">b</span>). The double-buffered memory is effective in reducing off-chip accesses and stalls due to loads and stores to reduce latency. SRAM A is common for all cells to utilize weights temporal reuse while SRAM B is distributed across cells. Through design space exploration, CogSys opts for 256kB for SRAM A and 4MB for SRAM B.</p> </div> <div class="ltx_para" id="S5.SS6.p2"> <p class="ltx_p" id="S5.SS6.p2.1"><span class="ltx_text ltx_font_bold" id="S5.SS6.p2.1.1">Custom SIMD units.</span> CogSys employs a custom SIMD unit to execute vector reductions and element-wise operations (Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S5.F10" title="Figure 10 ‣ V CogSys: Hardware Architecture ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">10</span></a><span class="ltx_text" id="S5.SS6.p2.1.2" style="color:#0000FF;">a</span>). The SIMD unit efficiently handles data transfer between the CogSys array output and input SRAM, enabling the array to seamlessly access data for subsequent operations. The SIMD unit is comprised of multiple PEs, each designed with compact logic circuits (i.e., sum, mult/div, exp/log/tanh, norm, softmax, etc) to perform vector operations on quantized data. The adaptive workload-aware scheduling (Sec. 
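To make the symbolic-side selection concrete, below is a hedged sketch of a dimension-based rule that reproduces these examples. The actual CogSys decision comes from a design-space search over workload and hardware configurations; the threshold used here (the scaled-up array width, 512 for the N=32, M=512 design) is only an illustrative assumption.

```python
def choose_array_scheme(d, scaled_up_width=512):
    """Return the array organization for a d-dimensional circular convolution.

    Illustrative rule (assumption, not the CogSys search): long vectors keep a
    single scaled-up array fully utilized, while short vectors are better
    served by scale-out cells exploiting ScWP/CWP across independent
    convolutions.
    """
    return "scale-up" if d >= scaled_up_width else "scale-out"

print(choose_array_scheme(1024))   # NVSA / LVRF symbolic vectors -> scale-up
print(choose_array_scheme(64))     # MIMONet symbolic vectors     -> scale-out
```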
<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S6" title="VI CogSys: Scheduling Strategy ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">VI</span></a>) scheme schedules workloads across CogSys array and SIMD units to balance the runtime of neural and symbolic operations.</p> </div> <figure class="ltx_table" id="S5.T5"> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S5.T5.22.1.1" style="font-size:90%;">TABLE V</span>: </span><span class="ltx_text ltx_font_bold" id="S5.T5.23.2" style="font-size:90%;">Design choice discussion.<span class="ltx_text ltx_font_medium" id="S5.T5.23.2.1"> Area, latency, energy, and utilization comparison with reconfigurable and heterogeneous PEs.</span></span></figcaption> <div class="ltx_inline-block ltx_align_center ltx_transformed_outer" id="S5.T5.19" style="width:433.6pt;height:153.2pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="transform:translate(15.6pt,-5.5pt) scale(1.07753750866991,1.07753750866991) ;"> <table class="ltx_tabular ltx_guessed_headers ltx_align_middle" id="S5.T5.19.19"> <thead class="ltx_thead"> <tr class="ltx_tr" id="S5.T5.19.19.20.1"> <th class="ltx_td ltx_th ltx_th_column ltx_border_r ltx_border_t" id="S5.T5.19.19.20.1.1" style="padding:0.5pt 2.0pt;"></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t" id="S5.T5.19.19.20.1.2" style="padding:0.5pt 2.0pt;"><span class="ltx_text ltx_font_bold" id="S5.T5.19.19.20.1.2.1">Configuration</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t" id="S5.T5.19.19.20.1.3" style="padding:0.5pt 2.0pt;"><span class="ltx_text ltx_font_bold" id="S5.T5.19.19.20.1.3.1">Area</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t" id="S5.T5.19.19.20.1.4" style="padding:0.5pt 2.0pt;"><span class="ltx_text ltx_font_bold" id="S5.T5.19.19.20.1.4.1">Latency</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_r ltx_border_t" id="S5.T5.19.19.20.1.5" style="padding:0.5pt 2.0pt;"><span class="ltx_text ltx_font_bold" id="S5.T5.19.19.20.1.5.1">Energy</span></th> <th class="ltx_td ltx_align_center ltx_th ltx_th_column ltx_border_t" id="S5.T5.19.19.20.1.6" style="padding:0.5pt 2.0pt;"><span class="ltx_text ltx_font_bold" id="S5.T5.19.19.20.1.6.1">Utilization</span></th> </tr> </thead> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S5.T5.5.5.5"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T5.5.5.5.6" style="padding:0.5pt 2.0pt;">Reconfigurable PE (CogSys)</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T5.2.2.2.2" style="padding:0.5pt 2.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S5.T5.2.2.2.2.2"> <tr class="ltx_tr" id="S5.T5.2.2.2.2.2.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S5.T5.2.2.2.2.2.2.2" style="padding:0.5pt 2.0pt;">16<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.1.1.1.1.1.1.1.m1.1"><semantics id="S5.T5.1.1.1.1.1.1.1.m1.1a"><mo id="S5.T5.1.1.1.1.1.1.1.m1.1.1" xref="S5.T5.1.1.1.1.1.1.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.1.1.1.1.1.1.1.m1.1b"><times id="S5.T5.1.1.1.1.1.1.1.m1.1.1.cmml" xref="S5.T5.1.1.1.1.1.1.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.1.1.1.1.1.1.1.m1.1c">\times</annotation><annotation 
encoding="application/x-llamapun" id="S5.T5.1.1.1.1.1.1.1.m1.1d">×</annotation></semantics></math>32<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.2.2.2.2.2.2.2.m2.1"><semantics id="S5.T5.2.2.2.2.2.2.2.m2.1a"><mo id="S5.T5.2.2.2.2.2.2.2.m2.1.1" xref="S5.T5.2.2.2.2.2.2.2.m2.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.2.2.2.2.2.2.2.m2.1b"><times id="S5.T5.2.2.2.2.2.2.2.m2.1.1.cmml" xref="S5.T5.2.2.2.2.2.2.2.m2.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.2.2.2.2.2.2.2.m2.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.2.2.2.2.2.2.2.m2.1d">×</annotation></semantics></math>32 reconfigurable</td> </tr> <tr class="ltx_tr" id="S5.T5.2.2.2.2.2.3"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S5.T5.2.2.2.2.2.3.1" style="padding:0.5pt 2.0pt;">neuro/symbolic PE</td> </tr> </table> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T5.3.3.3.3" style="padding:0.5pt 2.0pt;">1<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.3.3.3.3.m1.1"><semantics id="S5.T5.3.3.3.3.m1.1a"><mo id="S5.T5.3.3.3.3.m1.1.1" xref="S5.T5.3.3.3.3.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.3.3.3.3.m1.1b"><times id="S5.T5.3.3.3.3.m1.1.1.cmml" xref="S5.T5.3.3.3.3.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.3.3.3.3.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.3.3.3.3.m1.1d">×</annotation></semantics></math> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T5.4.4.4.4" style="padding:0.5pt 2.0pt;">1<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.4.4.4.4.m1.1"><semantics id="S5.T5.4.4.4.4.m1.1a"><mo id="S5.T5.4.4.4.4.m1.1.1" xref="S5.T5.4.4.4.4.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.4.4.4.4.m1.1b"><times id="S5.T5.4.4.4.4.m1.1.1.cmml" xref="S5.T5.4.4.4.4.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.4.4.4.4.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.4.4.4.4.m1.1d">×</annotation></semantics></math> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T5.5.5.5.5" style="padding:0.5pt 2.0pt;">1<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.5.5.5.5.m1.1"><semantics id="S5.T5.5.5.5.5.m1.1a"><mo id="S5.T5.5.5.5.5.m1.1.1" xref="S5.T5.5.5.5.5.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.5.5.5.5.m1.1b"><times id="S5.T5.5.5.5.5.m1.1.1.cmml" xref="S5.T5.5.5.5.5.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.5.5.5.5.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.5.5.5.5.m1.1d">×</annotation></semantics></math> </td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T5.5.5.5.7" style="padding:0.5pt 2.0pt;">90%</td> </tr> <tr class="ltx_tr" id="S5.T5.12.12.12"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T5.12.12.12.8" style="padding:0.5pt 2.0pt;">Heterogeneous PE</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T5.9.9.9.4" style="padding:0.5pt 2.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S5.T5.9.9.9.4.4"> <tr class="ltx_tr" id="S5.T5.7.7.7.2.2.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S5.T5.7.7.7.2.2.2.2" style="padding:0.5pt 2.0pt;">16<math alttext="\times" class="ltx_Math" display="inline" 
id="S5.T5.6.6.6.1.1.1.1.m1.1"><semantics id="S5.T5.6.6.6.1.1.1.1.m1.1a"><mo id="S5.T5.6.6.6.1.1.1.1.m1.1.1" xref="S5.T5.6.6.6.1.1.1.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.6.6.6.1.1.1.1.m1.1b"><times id="S5.T5.6.6.6.1.1.1.1.m1.1.1.cmml" xref="S5.T5.6.6.6.1.1.1.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.6.6.6.1.1.1.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.6.6.6.1.1.1.1.m1.1d">×</annotation></semantics></math>32<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.7.7.7.2.2.2.2.m2.1"><semantics id="S5.T5.7.7.7.2.2.2.2.m2.1a"><mo id="S5.T5.7.7.7.2.2.2.2.m2.1.1" xref="S5.T5.7.7.7.2.2.2.2.m2.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.7.7.7.2.2.2.2.m2.1b"><times id="S5.T5.7.7.7.2.2.2.2.m2.1.1.cmml" xref="S5.T5.7.7.7.2.2.2.2.m2.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.7.7.7.2.2.2.2.m2.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.7.7.7.2.2.2.2.m2.1d">×</annotation></semantics></math>32 neuro PE</td> </tr> <tr class="ltx_tr" id="S5.T5.9.9.9.4.4.4"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S5.T5.9.9.9.4.4.4.2" style="padding:0.5pt 2.0pt;">16<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.8.8.8.3.3.3.1.m1.1"><semantics id="S5.T5.8.8.8.3.3.3.1.m1.1a"><mo id="S5.T5.8.8.8.3.3.3.1.m1.1.1" xref="S5.T5.8.8.8.3.3.3.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.8.8.8.3.3.3.1.m1.1b"><times id="S5.T5.8.8.8.3.3.3.1.m1.1.1.cmml" xref="S5.T5.8.8.8.3.3.3.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.8.8.8.3.3.3.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.8.8.8.3.3.3.1.m1.1d">×</annotation></semantics></math>32<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.9.9.9.4.4.4.2.m2.1"><semantics id="S5.T5.9.9.9.4.4.4.2.m2.1a"><mo id="S5.T5.9.9.9.4.4.4.2.m2.1.1" xref="S5.T5.9.9.9.4.4.4.2.m2.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.9.9.9.4.4.4.2.m2.1b"><times id="S5.T5.9.9.9.4.4.4.2.m2.1.1.cmml" xref="S5.T5.9.9.9.4.4.4.2.m2.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.9.9.9.4.4.4.2.m2.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.9.9.9.4.4.4.2.m2.1d">×</annotation></semantics></math>32 symbolic PE</td> </tr> </table> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T5.10.10.10.5" style="padding:0.5pt 2.0pt;">1.96<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.10.10.10.5.m1.1"><semantics id="S5.T5.10.10.10.5.m1.1a"><mo id="S5.T5.10.10.10.5.m1.1.1" xref="S5.T5.10.10.10.5.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.10.10.10.5.m1.1b"><times id="S5.T5.10.10.10.5.m1.1.1.cmml" xref="S5.T5.10.10.10.5.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.10.10.10.5.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.10.10.10.5.m1.1d">×</annotation></semantics></math> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T5.11.11.11.6" style="padding:0.5pt 2.0pt;">1<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.11.11.11.6.m1.1"><semantics id="S5.T5.11.11.11.6.m1.1a"><mo id="S5.T5.11.11.11.6.m1.1.1" xref="S5.T5.11.11.11.6.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" 
id="S5.T5.11.11.11.6.m1.1b"><times id="S5.T5.11.11.11.6.m1.1.1.cmml" xref="S5.T5.11.11.11.6.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.11.11.11.6.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.11.11.11.6.m1.1d">×</annotation></semantics></math> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S5.T5.12.12.12.7" style="padding:0.5pt 2.0pt;">1.3<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.12.12.12.7.m1.1"><semantics id="S5.T5.12.12.12.7.m1.1a"><mo id="S5.T5.12.12.12.7.m1.1.1" xref="S5.T5.12.12.12.7.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.12.12.12.7.m1.1b"><times id="S5.T5.12.12.12.7.m1.1.1.cmml" xref="S5.T5.12.12.12.7.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.12.12.12.7.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.12.12.12.7.m1.1d">×</annotation></semantics></math> </td> <td class="ltx_td ltx_align_center ltx_border_t" id="S5.T5.12.12.12.9" style="padding:0.5pt 2.0pt;">45%</td> </tr> <tr class="ltx_tr" id="S5.T5.19.19.19"> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S5.T5.19.19.19.8" style="padding:0.5pt 2.0pt;">Heterogeneous PE</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S5.T5.16.16.16.4" style="padding:0.5pt 2.0pt;"> <table class="ltx_tabular ltx_align_middle" id="S5.T5.16.16.16.4.4"> <tr class="ltx_tr" id="S5.T5.14.14.14.2.2.2"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S5.T5.14.14.14.2.2.2.2" style="padding:0.5pt 2.0pt;">8<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.13.13.13.1.1.1.1.m1.1"><semantics id="S5.T5.13.13.13.1.1.1.1.m1.1a"><mo id="S5.T5.13.13.13.1.1.1.1.m1.1.1" xref="S5.T5.13.13.13.1.1.1.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.13.13.13.1.1.1.1.m1.1b"><times id="S5.T5.13.13.13.1.1.1.1.m1.1.1.cmml" xref="S5.T5.13.13.13.1.1.1.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.13.13.13.1.1.1.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.13.13.13.1.1.1.1.m1.1d">×</annotation></semantics></math>32<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.14.14.14.2.2.2.2.m2.1"><semantics id="S5.T5.14.14.14.2.2.2.2.m2.1a"><mo id="S5.T5.14.14.14.2.2.2.2.m2.1.1" xref="S5.T5.14.14.14.2.2.2.2.m2.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.14.14.14.2.2.2.2.m2.1b"><times id="S5.T5.14.14.14.2.2.2.2.m2.1.1.cmml" xref="S5.T5.14.14.14.2.2.2.2.m2.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.14.14.14.2.2.2.2.m2.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.14.14.14.2.2.2.2.m2.1d">×</annotation></semantics></math>32 neuro PE</td> </tr> <tr class="ltx_tr" id="S5.T5.16.16.16.4.4.4"> <td class="ltx_td ltx_nopad_r ltx_align_center" id="S5.T5.16.16.16.4.4.4.2" style="padding:0.5pt 2.0pt;">8<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.15.15.15.3.3.3.1.m1.1"><semantics id="S5.T5.15.15.15.3.3.3.1.m1.1a"><mo id="S5.T5.15.15.15.3.3.3.1.m1.1.1" xref="S5.T5.15.15.15.3.3.3.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.15.15.15.3.3.3.1.m1.1b"><times id="S5.T5.15.15.15.3.3.3.1.m1.1.1.cmml" xref="S5.T5.15.15.15.3.3.3.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" 
id="S5.T5.15.15.15.3.3.3.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.15.15.15.3.3.3.1.m1.1d">×</annotation></semantics></math>32<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.16.16.16.4.4.4.2.m2.1"><semantics id="S5.T5.16.16.16.4.4.4.2.m2.1a"><mo id="S5.T5.16.16.16.4.4.4.2.m2.1.1" xref="S5.T5.16.16.16.4.4.4.2.m2.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.16.16.16.4.4.4.2.m2.1b"><times id="S5.T5.16.16.16.4.4.4.2.m2.1.1.cmml" xref="S5.T5.16.16.16.4.4.4.2.m2.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.16.16.16.4.4.4.2.m2.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.16.16.16.4.4.4.2.m2.1d">×</annotation></semantics></math>32 symbolic PE</td> </tr> </table> </td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S5.T5.17.17.17.5" style="padding:0.5pt 2.0pt;">0.98<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.17.17.17.5.m1.1"><semantics id="S5.T5.17.17.17.5.m1.1a"><mo id="S5.T5.17.17.17.5.m1.1.1" xref="S5.T5.17.17.17.5.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.17.17.17.5.m1.1b"><times id="S5.T5.17.17.17.5.m1.1.1.cmml" xref="S5.T5.17.17.17.5.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.17.17.17.5.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.17.17.17.5.m1.1d">×</annotation></semantics></math> </td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S5.T5.18.18.18.6" style="padding:0.5pt 2.0pt;">2<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.18.18.18.6.m1.1"><semantics id="S5.T5.18.18.18.6.m1.1a"><mo id="S5.T5.18.18.18.6.m1.1.1" xref="S5.T5.18.18.18.6.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.18.18.18.6.m1.1b"><times id="S5.T5.18.18.18.6.m1.1.1.cmml" xref="S5.T5.18.18.18.6.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.18.18.18.6.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.18.18.18.6.m1.1d">×</annotation></semantics></math> </td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S5.T5.19.19.19.7" style="padding:0.5pt 2.0pt;">1.3<math alttext="\times" class="ltx_Math" display="inline" id="S5.T5.19.19.19.7.m1.1"><semantics id="S5.T5.19.19.19.7.m1.1a"><mo id="S5.T5.19.19.19.7.m1.1.1" xref="S5.T5.19.19.19.7.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S5.T5.19.19.19.7.m1.1b"><times id="S5.T5.19.19.19.7.m1.1.1.cmml" xref="S5.T5.19.19.19.7.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S5.T5.19.19.19.7.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S5.T5.19.19.19.7.m1.1d">×</annotation></semantics></math> </td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_t" id="S5.T5.19.19.19.9" style="padding:0.5pt 2.0pt;">45%</td> </tr> </tbody> </table> </span></div> </figure> </section> <section class="ltx_subsection" id="S5.SS7"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S5.SS7.5.1.1">V-G</span> </span><span class="ltx_text ltx_font_italic" id="S5.SS7.6.2">Design Choices Discussion</span> </h3> <div class="ltx_para" id="S5.SS7.p1"> <p class="ltx_p" id="S5.SS7.p1.1"><span class="ltx_text ltx_font_bold" id="S5.SS7.p1.1.1">Reconfigurable or heterogeneous PE.</span> While 
Reconfigurable or heterogeneous PE. While specialized PEs for neural and symbolic kernels may appear more efficient for simultaneous processing, our early-phase design space exploration reveals that separate PEs for GEMM and circular convolution lead to hardware underutilization and extra area overhead, because the neural and symbolic kernels execute sequentially and their proportions vary across workloads. At the same chip size, specialized PEs also incur longer latency, since they provide fewer effective compute units for either the neural or the symbolic kernels, as summarized in Tab. V. We therefore opt for the reconfigurable PE approach, which offers lower area overhead and higher hardware utilization and suits both neuro-heavy and symbolic-heavy workloads.
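To make this trade-off concrete, the following back-of-the-envelope model (our illustration, not the paper's design-space-exploration tool; the work amounts W_NEURO and W_SYM are arbitrary) shows why an equal-area 8+8 split roughly doubles latency and halves utilization when neural and symbolic kernels run sequentially, matching the trend in Tab. V.

```python
# Back-of-the-envelope model of the reconfigurable vs. heterogeneous PE choice.
# Illustrative work amounts only; this is not the paper's simulator.

W_NEURO, W_SYM = 1000.0, 200.0   # abstract "cell-cycles" of neural / symbolic work
CELLS = 16                        # total 32x32 PE cells at equal chip area

# Reconfigurable: every cell serves whichever kernel type is currently active.
t_reconfig = W_NEURO / CELLS + W_SYM / CELLS

# Heterogeneous 8+8 split: the 8 symbolic cells idle during the neural phase,
# and the 8 neural cells idle during the symbolic phase (phases are sequential).
t_split = W_NEURO / (CELLS // 2) + W_SYM / (CELLS // 2)

util_reconfig = (W_NEURO + W_SYM) / (t_reconfig * CELLS)   # ~1.0
util_split = (W_NEURO + W_SYM) / (t_split * CELLS)         # ~0.5

print(f"latency ratio (split / reconfig): {t_split / t_reconfig:.2f}x")      # ~2x
print(f"utilization: reconfig {util_reconfig:.0%}, split {util_split:.0%}")
```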
<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S5.F13" title="Figure 13 ‣ V-D Adaptive Spatial and Temporal Mapping Strategy ‣ V CogSys: Hardware Architecture ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">13</span></a><span class="ltx_text" id="S6.SS1.p1.1.1" style="color:#0000FF;">a</span>): <span class="ltx_text ltx_framed ltx_framed_underline" id="S6.SS1.p1.1.2">First</span>, the sequential execution and frequent interactions of neural and symbolic components results in long latency and low system throughput. <span class="ltx_text ltx_framed ltx_framed_underline" id="S6.SS1.p1.1.3">Second</span>, the heterogeneous neural and symbolic kernels result in low compute array utilization and efficiency of ML accelerator.</p> </div> </section> <section class="ltx_subsection" id="S6.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S6.SS2.5.1.1">VI-B</span> </span><span class="ltx_text ltx_font_italic" id="S6.SS2.6.2">Adaptive Workload-Aware Scheduling (adSCH) Strategy</span> </h3> <div class="ltx_para" id="S6.SS2.p1"> <p class="ltx_p" id="S6.SS2.p1.1"><span class="ltx_text ltx_font_bold" id="S6.SS2.p1.1.1">Adaptive scheduling (<span class="ltx_text ltx_font_italic" id="S6.SS2.p1.1.1.1">adSCH</span>) strategy.</span> To solve the system-level challenges, CogSys features an <span class="ltx_text ltx_font_italic" id="S6.SS2.p1.1.2">adSCH</span> scheme and greatly improves hardware utilization and performance. <span class="ltx_text ltx_framed ltx_framed_underline" id="S6.SS2.p1.1.3">(1) Interleaved neural/symbolic processing.</span> Despite the dependencies in neural and symbolic tasks, symbolic operations of other tasks can be interleaved within neural layer of current task via reconfigurable neuro/symbolic PE arrays (Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S5.F13" title="Figure 13 ‣ V-D Adaptive Spatial and Temporal Mapping Strategy ‣ V CogSys: Hardware Architecture ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">13</span></a><span class="ltx_text" id="S6.SS2.p1.1.4" style="color:#0000FF;">b</span>). <span class="ltx_text ltx_framed ltx_framed_underline" id="S6.SS2.p1.1.5">(2) Adaptive neuro/symbolic array partition strategy.</span> We propose to adaptively allocate CogSys cells to various neural and symbolic kernels (cell-wise partition), and allocate symbolic cell columns to parallel circular convolution operations (column-wise partition) (Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S5.F13" title="Figure 13 ‣ V-D Adaptive Spatial and Temporal Mapping Strategy ‣ V CogSys: Hardware Architecture ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">13</span></a><span class="ltx_text" id="S6.SS2.p1.1.6" style="color:#0000FF;">c</span>). This partition strategy is effective in handling both neural- and symbolic-intensive workloads and promotes parallelism and hardware utilization.</p> </div> <div class="ltx_para" id="S6.SS2.p2"> <p class="ltx_p" id="S6.SS2.p2.2"><span class="ltx_text ltx_font_bold" id="S6.SS2.p2.2.1">Scheduling Implementation.</span> CogSys workload-aware scheduling is performed offline by software. 
Scheduling Implementation. CogSys workload-aware scheduling is performed offline in software. Since the model architecture, size, and data are known prior to execution, the host CPU precomputes the mapping of operations and the array configurations, which are then offloaded to CogSys. This ensures optimal or near-optimal scheduling with zero runtime latency. The scheduler uses a greedy search: (1) generate an operation graph based on operation type, size, dependencies, and number of iterations; (2) assign ready operations (those not blocked by dependencies) to newly available cells, with runtime estimated analytically; (3) maximize utilization by prioritizing neural tasks for larger cell blocks and symbolic tasks for smaller ones. Because the search only considers the available blocks among the 16 array cells and the currently ready tasks, the search space is below O(10^3) candidates per time step, resulting in minimal offline overhead and no runtime overhead.
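A minimal sketch of such a greedy scheduler is shown below. The Op fields, the analytical runtime model, and the block-allocation heuristic (half of the free cells for a neural kernel, one cell per symbolic kernel) are illustrative assumptions rather than the exact CogSys implementation.

```python
# Sketch of an offline greedy scheduler over an operation graph.
# Assumes a well-formed (acyclic) dependency graph; cell counts are abstract.
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    kind: str                      # "neural" (conv/GEMM) or "symbolic" (circular conv)
    work: int                      # abstract work units for the analytical runtime model
    deps: list = field(default_factory=list)

def runtime(op: Op, cells: int) -> int:
    """Analytical runtime estimate: work spread over the assigned cells."""
    return -(-op.work // cells)    # ceiling division

def greedy_schedule(ops: dict, total_cells: int = 16):
    pending, done, schedule, running = dict(ops), set(), [], []
    free, t = total_cells, 0
    while pending or running:
        # (2) collect operations whose dependencies are already satisfied
        ready = [o for o in pending.values() if all(d in done for d in o.deps)]
        # (3) neural ops first and largest first, so they grab the bigger blocks
        ready.sort(key=lambda o: (o.kind != "neural", -o.work))
        for op in ready:
            want = max(free // 2, 1) if op.kind == "neural" else 1
            if free >= want:
                free -= want
                running.append((t + runtime(op, want), op.name, want))
                schedule.append((t, op.name, want))
                del pending[op.name]
        # advance time to the next completion and release its cells
        running.sort()
        finish, name, cells = running.pop(0)
        t, free = finish, free + cells
        done.add(name)
    return schedule, t

# Toy graph: symbolic kernels of the previous batch have no dependency on the
# current batch's neural layers, so they can be interleaved on idle cells.
ops = {
    "conv1_b1":  Op("conv1_b1", "neural", 960),
    "gemm_b1":   Op("gemm_b1", "neural", 480, deps=["conv1_b1"]),
    "circ_b0_a": Op("circ_b0_a", "symbolic", 60),
    "circ_b0_b": Op("circ_b0_b", "symbolic", 60),
}
plan, makespan = greedy_schedule(ops)
for start, name, cells in plan:
    print(f"t={start:4d}  {name:10s}  cells={cells}")
print("estimated makespan:", makespan)
```

On this toy graph, the two previous-batch symbolic kernels run on single cells while the current batch's convolution occupies a larger block, mirroring the interleaving illustrated in Fig. 13.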
<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S5.F13" title="Figure 13 ‣ V-D Adaptive Spatial and Temporal Mapping Strategy ‣ V CogSys: Hardware Architecture ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">13</span></a><span class="ltx_text" id="S6.SS2.p3.1.2" style="color:#0000FF;">d</span> presents a detailed example of <span class="ltx_text ltx_font_italic" id="S6.SS2.p3.1.3">adSCH</span> scheme with operations and cycle numbers in a NVSA segment <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib33" title="">33</a>]</cite>. CogSys reconfigurable array schedules neural (convolutions, GEMMs) and symbolic (circular convolutions), while element-wise operations are offloaded to SIMD units. To mitigate underutilization, CogSys executes VSA-based codebook and symbolic kernels of the previous batch on idle hardware pieces during neural layers of the current batch, thus eliminating symbolic bottleneck. Particularly, <em class="ltx_emph ltx_font_italic" id="S6.SS2.p3.1.4">multi-level parallelism</em> is adopted to process different symbolic rules and attributes to further improve efficiency.</p> </div> <figure class="ltx_table" id="S6.T6"> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S6.T6.7.1.1" style="font-size:90%;">TABLE VI</span>: </span><span class="ltx_text ltx_font_bold" id="S6.T6.8.2" style="font-size:90%;">Baseline.<span class="ltx_text ltx_font_medium" id="S6.T6.8.2.1"> The specifications of hardware baseline. </span></span></figcaption> <div class="ltx_inline-block ltx_align_center ltx_transformed_outer" id="S6.T6.4" style="width:433.6pt;height:69.1pt;vertical-align:-0.9pt;"><span class="ltx_transformed_inner" style="transform:translate(-12.2pt,1.9pt) scale(0.946870536719084,0.946870536719084) ;"> <table class="ltx_tabular ltx_align_middle" id="S6.T6.4.4"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S6.T6.4.4.5.1"> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T6.4.4.5.1.1" rowspan="2" style="padding:2pt 1.9pt;"><span class="ltx_text" id="S6.T6.4.4.5.1.1.1">HW</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" colspan="4" id="S6.T6.4.4.5.1.2" style="padding:2pt 1.9pt;">General Purpose Processor/SoC</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_rr ltx_border_t" id="S6.T6.4.4.5.1.3" rowspan="2" style="padding:2pt 1.9pt;"><span class="ltx_text" id="S6.T6.4.4.5.1.3.1"> <span class="ltx_tabular ltx_align_middle" id="S6.T6.4.4.5.1.3.1.1"> <span class="ltx_tr" id="S6.T6.4.4.5.1.3.1.1.1"> <span class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S6.T6.4.4.5.1.3.1.1.1.1" style="padding:2pt 1.9pt;">CogSys</span></span> <span class="ltx_tr" id="S6.T6.4.4.5.1.3.1.1.2"> <span class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S6.T6.4.4.5.1.3.1.1.2.1" style="padding:2pt 1.9pt;">(Ours)</span></span> </span></span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T6.4.4.5.1.4" rowspan="2" style="padding:2pt 1.9pt;"><span class="ltx_text" id="S6.T6.4.4.5.1.4.1">HW</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" colspan="3" id="S6.T6.4.4.5.1.5" style="padding:2pt 1.9pt;">ML Accelerator</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r 
ltx_align_center ltx_border_t" id="S6.T6.4.4.5.1.6" rowspan="2" style="padding:2pt 1.9pt;"><span class="ltx_text" id="S6.T6.4.4.5.1.6.1"> <span class="ltx_tabular ltx_align_middle" id="S6.T6.4.4.5.1.6.1.1"> <span class="ltx_tr" id="S6.T6.4.4.5.1.6.1.1.1"> <span class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S6.T6.4.4.5.1.6.1.1.1.1" style="padding:2pt 1.9pt;">CogSys</span></span> <span class="ltx_tr" id="S6.T6.4.4.5.1.6.1.1.2"> <span class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center" id="S6.T6.4.4.5.1.6.1.1.2.1" style="padding:2pt 1.9pt;">(Ours)</span></span> </span></span></td> </tr> <tr class="ltx_tr" id="S6.T6.4.4.6.2"> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T6.4.4.6.2.1" style="padding:2pt 1.9pt;">CPU Xeon</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T6.4.4.6.2.2" style="padding:2pt 1.9pt;">RTX GPU</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T6.4.4.6.2.3" style="padding:2pt 1.9pt;">TX2</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T6.4.4.6.2.4" style="padding:2pt 1.9pt;">NX</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T6.4.4.6.2.5" style="padding:2pt 1.9pt;">TPU-like</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T6.4.4.6.2.6" style="padding:2pt 1.9pt;">MTIA-like</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T6.4.4.6.2.7" style="padding:2pt 1.9pt;">Gemmini-like</td> </tr> <tr class="ltx_tr" id="S6.T6.4.4.7.3"> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S6.T6.4.4.7.3.1" rowspan="2" style="padding:2pt 1.9pt;"><span class="ltx_text" id="S6.T6.4.4.7.3.1.1">Power</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S6.T6.4.4.7.3.2" rowspan="2" style="padding:2pt 1.9pt;"><span class="ltx_text" id="S6.T6.4.4.7.3.2.1">145W</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S6.T6.4.4.7.3.3" rowspan="2" style="padding:2pt 1.9pt;"><span class="ltx_text" id="S6.T6.4.4.7.3.3.1">250W</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S6.T6.4.4.7.3.4" rowspan="2" style="padding:2pt 1.9pt;"><span class="ltx_text" id="S6.T6.4.4.7.3.4.1">15W</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S6.T6.4.4.7.3.5" rowspan="2" style="padding:2pt 1.9pt;"><span class="ltx_text" id="S6.T6.4.4.7.3.5.1">20W</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_rr ltx_border_t" id="S6.T6.4.4.7.3.6" rowspan="2" style="padding:2pt 1.9pt;"><span class="ltx_text" id="S6.T6.4.4.7.3.6.1">1.48W</span></td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T6.4.4.7.3.7" style="padding:2pt 1.9pt;">SRAM</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T6.4.4.7.3.8" style="padding:2pt 1.9pt;">4.5MB</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_r ltx_border_t" id="S6.T6.4.4.7.3.9" style="padding:2pt 1.9pt;">4.5MB</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center 
ltx_border_r ltx_border_t" id="S6.T6.4.4.7.3.10" style="padding:2pt 1.9pt;">4.5MB</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_t" id="S6.T6.4.4.7.3.11" style="padding:2pt 1.9pt;">4.5MB</td> </tr> <tr class="ltx_tr" id="S6.T6.4.4.4"> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S6.T6.4.4.4.5" style="padding:2pt 1.9pt;">#PE</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S6.T6.1.1.1.1" style="padding:2pt 1.9pt;">1 128<math alttext="\times" class="ltx_Math" display="inline" id="S6.T6.1.1.1.1.m1.1"><semantics id="S6.T6.1.1.1.1.m1.1a"><mo id="S6.T6.1.1.1.1.m1.1.1" xref="S6.T6.1.1.1.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S6.T6.1.1.1.1.m1.1b"><times id="S6.T6.1.1.1.1.m1.1.1.cmml" xref="S6.T6.1.1.1.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S6.T6.1.1.1.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S6.T6.1.1.1.1.m1.1d">×</annotation></semantics></math>128</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S6.T6.2.2.2.2" style="padding:2pt 1.9pt;">16 32<math alttext="\times" class="ltx_Math" display="inline" id="S6.T6.2.2.2.2.m1.1"><semantics id="S6.T6.2.2.2.2.m1.1a"><mo id="S6.T6.2.2.2.2.m1.1.1" xref="S6.T6.2.2.2.2.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S6.T6.2.2.2.2.m1.1b"><times id="S6.T6.2.2.2.2.m1.1.1.cmml" xref="S6.T6.2.2.2.2.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S6.T6.2.2.2.2.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S6.T6.2.2.2.2.m1.1d">×</annotation></semantics></math>32</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S6.T6.3.3.3.3" style="padding:2pt 1.9pt;">64 16<math alttext="\times" class="ltx_Math" display="inline" id="S6.T6.3.3.3.3.m1.1"><semantics id="S6.T6.3.3.3.3.m1.1a"><mo id="S6.T6.3.3.3.3.m1.1.1" xref="S6.T6.3.3.3.3.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S6.T6.3.3.3.3.m1.1b"><times id="S6.T6.3.3.3.3.m1.1.1.cmml" xref="S6.T6.3.3.3.3.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S6.T6.3.3.3.3.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S6.T6.3.3.3.3.m1.1d">×</annotation></semantics></math>16</td> <td class="ltx_td ltx_nopad_l ltx_nopad_r ltx_align_center ltx_border_b ltx_border_t" id="S6.T6.4.4.4.4" style="padding:2pt 1.9pt;">16 32<math alttext="\times" class="ltx_Math" display="inline" id="S6.T6.4.4.4.4.m1.1"><semantics id="S6.T6.4.4.4.4.m1.1a"><mo id="S6.T6.4.4.4.4.m1.1.1" xref="S6.T6.4.4.4.4.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S6.T6.4.4.4.4.m1.1b"><times id="S6.T6.4.4.4.4.m1.1.1.cmml" xref="S6.T6.4.4.4.4.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S6.T6.4.4.4.4.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S6.T6.4.4.4.4.m1.1d">×</annotation></semantics></math>32</td> </tr> </tbody> </table> </span></div> </figure> </section> <section class="ltx_subsection" id="S6.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S6.SS3.5.1.1">VI-C</span> </span><span class="ltx_text ltx_font_italic" id="S6.SS3.6.2">Scalability and Variability Support</span> </h3> <div class="ltx_para" 
id="S6.SS3.p1"> <p class="ltx_p" id="S6.SS3.p1.1"><span class="ltx_text ltx_font_bold" id="S6.SS3.p1.1.1">Scalable across neurosymbolic workloads and cognition tasks.</span> The <span class="ltx_text ltx_font_italic" id="S6.SS3.p1.1.2">adSCH</span> technique enables CogSys to be easily reconfigured across <span class="ltx_text ltx_framed ltx_framed_underline" id="S6.SS3.p1.1.3">(1) neurosymbolic workloads</span> (e.g., NVSA, MIMONets, LVRF, etc) and <span class="ltx_text ltx_framed ltx_framed_underline" id="S6.SS3.p1.1.4">(2) cognitive tasks</span> such as procedurally generated matrices (PGM) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib11" title="">11</a>]</cite>, compositional visual reasoning (CVR) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib94" title="">94</a>]</cite>, synthetic visual reasoning test (SVRT) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib20" title="">20</a>]</cite> with different attributes and rules (Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S1.F2" title="Figure 2 ‣ I Introduction ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">2</span></a>, Tab. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S2.T1" title="TABLE I ‣ II-C VSA-Based Symbolic Operations ‣ II Neurosymbolic AI Background and Workload ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">I</span></a>). Coupled with <span class="ltx_text ltx_font_italic" id="S6.SS3.p1.1.5">nsPE</span> reconfigurable arrays, <span class="ltx_text ltx_font_italic" id="S6.SS3.p1.1.6">BS</span> dataflow, and <span class="ltx_text ltx_font_italic" id="S6.SS3.p1.1.7">ST</span> mapping, <span class="ltx_text ltx_font_italic" id="S6.SS3.p1.1.8">adSCH</span> scheme ensures symbolic operations interleaved with neural operations with high throughput, enabling various kinds of VSA-based neurosymbolic workloads to be executed on CogSys with high efficiency and utilization, and adapt to different neuro-symbolic workload ratios and unpredictably changing workloads.</p> </div> </section> </section> <section class="ltx_section" id="S7"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">VII </span><span class="ltx_text ltx_font_smallcaps" id="S7.1.1">Evaluation Results</span> </h2> <div class="ltx_para" id="S7.p1"> <p class="ltx_p" id="S7.p1.1">This section first introduces the detailed settings for evaluating our proposed CogSys framework (Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S7.SS1" title="VII-A Experimental Setup ‣ VII Evaluation Results ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">VII-A</span></span></a>), and then benchmarks our proposed CogSys algorithm optimization (Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S7.SS2" title="VII-B CogSys Algorithm Optimization Performance ‣ VII Evaluation Results ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">VII-B</span></span></a>) and accelerator (Sec. 
<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S7.SS3" title="VII-C CogSys Accelerator Performance ‣ VII Evaluation Results ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">VII-C</span></span></a>), demonstrating the practical of efficient and scalable neurosymbolic systems.</p> </div> <section class="ltx_subsection" id="S7.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S7.SS1.5.1.1">VII-A</span> </span><span class="ltx_text ltx_font_italic" id="S7.SS1.6.2">Experimental Setup</span> </h3> <div class="ltx_para" id="S7.SS1.p1"> <p class="ltx_p" id="S7.SS1.p1.1"><span class="ltx_text ltx_font_bold" id="S7.SS1.p1.1.1">Datasets.</span> To evaluate the achieved cognitive reasoning capability of CogSys, we conduct experiments on the commonly-used spatial-temporal reasoning RAVEN <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib95" title="">95</a>]</cite>, I-RAVEN <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib36" title="">36</a>]</cite>, PGM <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib11" title="">11</a>]</cite>, CVR <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib94" title="">94</a>]</cite>, and SVRT <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib20" title="">20</a>]</cite>. The task performance is measured by the probabilistic abduction accuracy.</p> </div> <div class="ltx_para" id="S7.SS1.p2"> <p class="ltx_p" id="S7.SS1.p2.1"><span class="ltx_text ltx_font_bold" id="S7.SS1.p2.1.1">Algorithm setup.</span> We evaluate CogSys on three state-of-the-art VSA-based neurosymbolic workloads, i.e., NVSA <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib33" title="">33</a>]</cite>, MIMONet <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib60" title="">60</a>]</cite>, and LVRF <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib32" title="">32</a>]</cite>. Following <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib33" title="">33</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib60" title="">60</a>, <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib32" title="">32</a>]</cite>, we determine the training hyperparameters based on the end-to-end reasoning performance on the validation set.</p> </div> <div class="ltx_para" id="S7.SS1.p3"> <p class="ltx_p" id="S7.SS1.p3.1"><span class="ltx_text ltx_font_bold" id="S7.SS1.p3.1.1">Baselines.</span> We consider several hardware baselines, including TX2, Xavier NX, RTX GPU, Xeon CPU, and ML accelerators (TPU, MTIA, Gemmini). Tab. 
<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S6.T6" title="TABLE VI ‣ VI-B Adaptive Workload-Aware Scheduling (adSCH) Strategy ‣ VI CogSys: Scheduling Strategy ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">VI</span></a> lists their configurations.</p> </div> <div class="ltx_para" id="S7.SS1.p4"> <p class="ltx_p" id="S7.SS1.p4.1"><span class="ltx_text ltx_font_bold" id="S7.SS1.p4.1.1">Hardware setup.</span> To evaluate energy and area of CogSys accelerator, we implement CogSys in RTL, synthesize using Synopsys Design Compiler <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib79" title="">79</a>]</cite> with 0.8 GHz, and place and route using Cadence Innovus <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib13" title="">13</a>]</cite> based on TSMC 28nm node. Fig. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S7.F14" title="Figure 14 ‣ VII-A Experimental Setup ‣ VII Evaluation Results ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">14</span></a> illustrates the layout and specifications of CogSys accelerator. In addition, we develop a cycle-accurate simulator to estimate CogSys accelerator performance on different reasoning tasks. The proposed CogSys accelerator consumes an area of 4.0 mm<sup class="ltx_sup" id="S7.SS1.p4.1.2">2</sup> and an average power consumption of 1.48 W. <em class="ltx_emph ltx_font_italic" id="S7.SS1.p4.1.3">Compared with conventional systolic arrays that only support neural operations, CogSys provides reconfigurable support for neural and symbolic operations with only 4.8% area overhead.</em></p> </div> <figure class="ltx_figure" id="S7.F14"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="253" id="S7.F14.g1" src="x14.png" width="830"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure"><span class="ltx_text" id="S7.F14.3.1.1" style="font-size:90%;">Figure 14</span>: </span><span class="ltx_text ltx_font_bold" id="S7.F14.4.2" style="font-size:90%;">CogSys accelerator.<span class="ltx_text ltx_font_medium" id="S7.F14.4.2.1"> The layout and performance specifications of our proposed CogSys accelerator.</span></span></figcaption> </figure> </section> <section class="ltx_subsection" id="S7.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection"><span class="ltx_text" id="S7.SS2.5.1.1">VII-B</span> </span><span class="ltx_text ltx_font_italic" id="S7.SS2.6.2">CogSys Algorithm Optimization Performance</span> </h3> <figure class="ltx_table" id="S7.T7"> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S7.T7.5.1.1" style="font-size:90%;">TABLE VII</span>: </span><span class="ltx_text ltx_font_bold" id="S7.T7.6.2" style="font-size:90%;">Factorization accuracy comparison.<span class="ltx_text ltx_font_medium" id="S7.T7.6.2.1"> Factorization accuracy for object constituent attribute estimation across 14 scenarios.</span></span></figcaption> <div class="ltx_inline-block ltx_align_center ltx_transformed_outer" id="S7.T7.2" style="width:433.6pt;height:92.9pt;vertical-align:-0.9pt;"><span class="ltx_transformed_inner" style="transform:translate(-37.7pt,8.0pt) scale(0.85193646274522,0.85193646274522) 
;"> <table class="ltx_tabular ltx_align_middle" id="S7.T7.2.2"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S7.T7.2.2.2"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.2.3" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.2.3.1">Test</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.1.1.1.1" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.1.1.1.1.1">2<math alttext="\times" class="ltx_Math" display="inline" id="S7.T7.1.1.1.1.1.m1.1"><semantics id="S7.T7.1.1.1.1.1.m1.1a"><mo id="S7.T7.1.1.1.1.1.m1.1.1" xref="S7.T7.1.1.1.1.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S7.T7.1.1.1.1.1.m1.1b"><times id="S7.T7.1.1.1.1.1.m1.1.1.cmml" xref="S7.T7.1.1.1.1.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S7.T7.1.1.1.1.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S7.T7.1.1.1.1.1.m1.1d">×</annotation></semantics></math>2 Grid</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.2.2" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.2.2.1">3<math alttext="\times" class="ltx_Math" display="inline" id="S7.T7.2.2.2.2.1.m1.1"><semantics id="S7.T7.2.2.2.2.1.m1.1a"><mo id="S7.T7.2.2.2.2.1.m1.1.1" xref="S7.T7.2.2.2.2.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S7.T7.2.2.2.2.1.m1.1b"><times id="S7.T7.2.2.2.2.1.m1.1.1.cmml" xref="S7.T7.2.2.2.2.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S7.T7.2.2.2.2.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S7.T7.2.2.2.2.1.m1.1d">×</annotation></semantics></math>3 Grid</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.2.4" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.2.4.1">Left-Right</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.2.5" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.2.5.1">Up-Down</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.2.6" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.2.6.1">Center</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.2.7" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.2.7.1">O-IC</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.2.8" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.2.8.1">DistFour</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S7.T7.2.2.2.9" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.2.9.1">Average</span></td> </tr> <tr class="ltx_tr" id="S7.T7.2.2.3.1"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.3.1.1" style="padding:1.5pt 2.1pt;"><cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib50" title="">50</a>]</cite></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.3.1.2" style="padding:1.5pt 2.1pt;">95.8%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.3.1.3" style="padding:1.5pt 2.1pt;">94.7%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.3.1.4" 
style="padding:1.5pt 2.1pt;">96.1%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.3.1.5" style="padding:1.5pt 2.1pt;">95.6%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.3.1.6" style="padding:1.5pt 2.1pt;">94.9%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.3.1.7" style="padding:1.5pt 2.1pt;">95.3%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.3.1.8" style="padding:1.5pt 2.1pt;">94.5%</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S7.T7.2.2.3.1.9" style="padding:1.5pt 2.1pt;">95.3%</td> </tr> <tr class="ltx_tr" id="S7.T7.2.2.4.2"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.4.2.1" style="padding:1.5pt 2.1pt;">CogSys</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.4.2.2" style="padding:1.5pt 2.1pt;">95.7%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.4.2.3" style="padding:1.5pt 2.1pt;">95.2%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.4.2.4" style="padding:1.5pt 2.1pt;">96.1%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.4.2.5" style="padding:1.5pt 2.1pt;">95.7%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.4.2.6" style="padding:1.5pt 2.1pt;">95.3%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.4.2.7" style="padding:1.5pt 2.1pt;">95.5%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.4.2.8" style="padding:1.5pt 2.1pt;">94.4%</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S7.T7.2.2.4.2.9" style="padding:1.5pt 2.1pt;">95.4%</td> </tr> <tr class="ltx_tr" id="S7.T7.2.2.5.3"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.5.3.1" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.5.3.1.1">Test</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.5.3.2" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.5.3.2.1">Constant</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.5.3.3" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.5.3.3.1">Progression</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.5.3.4" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.5.3.4.1">XOR</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.5.3.5" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.5.3.5.1">AND</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.5.3.6" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.5.3.6.1">OR</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.5.3.7" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.5.3.7.1">Arithmetic</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.5.3.8" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.5.3.8.1">Distribution</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S7.T7.2.2.5.3.9" style="padding:1.5pt 2.1pt;"><span class="ltx_text ltx_font_bold" id="S7.T7.2.2.5.3.9.1">Average</span></td> </tr> <tr 
class="ltx_tr" id="S7.T7.2.2.6.4"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.6.4.1" style="padding:1.5pt 2.1pt;"><cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib50" title="">50</a>]</cite></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.6.4.2" style="padding:1.5pt 2.1pt;">93.3%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.6.4.3" style="padding:1.5pt 2.1pt;">93.5%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.6.4.4" style="padding:1.5pt 2.1pt;">93.9%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.6.4.5" style="padding:1.5pt 2.1pt;">93.7%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.6.4.6" style="padding:1.5pt 2.1pt;">93.5%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.6.4.7" style="padding:1.5pt 2.1pt;">93.1%</td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T7.2.2.6.4.8" style="padding:1.5pt 2.1pt;">92.7%</td> <td class="ltx_td ltx_align_center ltx_border_t" id="S7.T7.2.2.6.4.9" style="padding:1.5pt 2.1pt;">93.4%</td> </tr> <tr class="ltx_tr" id="S7.T7.2.2.7.5"> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S7.T7.2.2.7.5.1" style="padding:1.5pt 2.1pt;">CogSys</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S7.T7.2.2.7.5.2" style="padding:1.5pt 2.1pt;">93.3%</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S7.T7.2.2.7.5.3" style="padding:1.5pt 2.1pt;">93.6%</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S7.T7.2.2.7.5.4" style="padding:1.5pt 2.1pt;">93.9%</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S7.T7.2.2.7.5.5" style="padding:1.5pt 2.1pt;">93.6%</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S7.T7.2.2.7.5.6" style="padding:1.5pt 2.1pt;">93.7%</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S7.T7.2.2.7.5.7" style="padding:1.5pt 2.1pt;">93.4%</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S7.T7.2.2.7.5.8" style="padding:1.5pt 2.1pt;">92.7%</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_t" id="S7.T7.2.2.7.5.9" style="padding:1.5pt 2.1pt;">93.5%</td> </tr> </tbody> </table> </span></div> </figure> <div class="ltx_para" id="S7.SS2.p1"> <p class="ltx_p" id="S7.SS2.p1.1"><span class="ltx_text ltx_font_bold" id="S7.SS2.p1.1.1">Factorization accuracy.</span> To assess the effectiveness of our factorization and stochasticity methods, we compare CogSys with the state-of-the-art factorizer <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#bib.bib50" title="">50</a>]</cite> across 14 test cases (Tab. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S7.T7" title="TABLE VII ‣ VII-B CogSys Algorithm Optimization Performance ‣ VII Evaluation Results ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">VII</span></a>). 
The results show a slight improvement in factorization accuracy for object constituent attribute extraction.</p> </div> <div class="ltx_para" id="S7.SS2.p2"> <p class="ltx_p" id="S7.SS2.p2.3"><span class="ltx_text ltx_font_bold" id="S7.SS2.p2.3.1">Reasoning accuracy.</span> To evaluate CogSys algorithm optimization (Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S4" title="IV CogSys: Algorithm Optimization ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">IV</span></a>), we benchmark it on five reasoning tasks in terms of the achieved accuracy (Sec. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S7.SS1" title="VII-A Experimental Setup ‣ VII Evaluation Results ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag"><span class="ltx_text">VII-A</span></span></a>). Tab. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S7.T8" title="TABLE VIII ‣ VII-B CogSys Algorithm Optimization Performance ‣ VII Evaluation Results ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">VIII</span></a> uses NVSA as an example and benchmarks on RAVEN, I-RAVEN, and PGM datasets, we observe that CogSys achieves comparable reasoning accuracy through factorization and injected stochasticity. Through quantization, CogSys enables 4.75<math alttext="\times" class="ltx_Math" display="inline" id="S7.SS2.p2.1.m1.1"><semantics id="S7.SS2.p2.1.m1.1a"><mo id="S7.SS2.p2.1.m1.1.1" xref="S7.SS2.p2.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S7.SS2.p2.1.m1.1b"><times id="S7.SS2.p2.1.m1.1.1.cmml" xref="S7.SS2.p2.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S7.SS2.p2.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S7.SS2.p2.1.m1.1d">×</annotation></semantics></math> memory footprint savings as well as 7.71<math alttext="\times" class="ltx_Math" display="inline" id="S7.SS2.p2.2.m2.1"><semantics id="S7.SS2.p2.2.m2.1a"><mo id="S7.SS2.p2.2.m2.1.1" xref="S7.SS2.p2.2.m2.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S7.SS2.p2.2.m2.1b"><times id="S7.SS2.p2.2.m2.1.1.cmml" xref="S7.SS2.p2.2.m2.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S7.SS2.p2.2.m2.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S7.SS2.p2.2.m2.1d">×</annotation></semantics></math> area and 4.02<math alttext="\times" class="ltx_Math" display="inline" id="S7.SS2.p2.3.m3.1"><semantics id="S7.SS2.p2.3.m3.1a"><mo id="S7.SS2.p2.3.m3.1.1" xref="S7.SS2.p2.3.m3.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S7.SS2.p2.3.m3.1b"><times id="S7.SS2.p2.3.m3.1.1.cmml" xref="S7.SS2.p2.3.m3.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S7.SS2.p2.3.m3.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S7.SS2.p2.3.m3.1d">×</annotation></semantics></math> power savings (Tab. <a class="ltx_ref" href="https://arxiv.org/html/2503.01162v2#S7.T9" title="TABLE IX ‣ VII-B CogSys Algorithm Optimization Performance ‣ VII Evaluation Results ‣ CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design"><span class="ltx_text ltx_ref_tag">IX</span></a>) under TSMC 28nm technology node. 
Reasoning accuracy. To evaluate the CogSys algorithm optimization (Sec. IV), we benchmark it on the five reasoning tasks in terms of the achieved accuracy (Sec. VII-A). Tab. VIII uses NVSA as an example on the RAVEN, I-RAVEN, and PGM datasets; we observe that CogSys achieves comparable reasoning accuracy through the proposed factorization and injected stochasticity. Through quantization, CogSys further enables 4.75× memory footprint savings as well as 7.71× area and 4.02× power savings (Tab. IX) under the TSMC 28 nm technology node. We observe consistent results for the MIMONet and LVRF workloads on the CVR and SVRT datasets.

TABLE VIII: CogSys algorithm optimization performance. Compared with NVSA, CogSys exhibits comparable reasoning capability with a smaller memory footprint, achieved through the proposed factorization, stochasticity, and quantization techniques.

Datasets: NVSA [33] | CogSys (+Factorization & Stoch.) | CogSys (+Quant.)
RAVEN [95]: 98.5% | 98.7±0.3% | 98.6±0.4%
I-RAVEN [36]: 99.0% | 99.0±0.3% | 98.8±0.4%
PGM [11]: 68.3% | 68.6±0.8% | 68.4±1.0%
xref="S7.T8.6.6.6.2.m1.1.1.3.2">1.0</mtext></ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S7.T8.6.6.6.2.m1.1c">\text{68.4}_{\pm\text{1.0}}</annotation><annotation encoding="application/x-llamapun" id="S7.T8.6.6.6.2.m1.1d">68.4 start_POSTSUBSCRIPT ± 1.0 end_POSTSUBSCRIPT</annotation></semantics></math>%</td> </tr> <tr class="ltx_tr" id="S7.T8.6.6.8.1"> <th class="ltx_td ltx_align_center ltx_th ltx_th_row ltx_border_b ltx_border_r ltx_border_t" id="S7.T8.6.6.8.1.1" style="padding:1pt 2.1pt;">#Parameters</th> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S7.T8.6.6.8.1.2" style="padding:1pt 2.1pt;">38 MB</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" id="S7.T8.6.6.8.1.3" style="padding:1pt 2.1pt;">32 MB</td> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_t" id="S7.T8.6.6.8.1.4" style="padding:1pt 2.1pt;">8 MB</td> </tr> </tbody> </table> </span></div> </figure> <figure class="ltx_table" id="S7.T9"> <figcaption class="ltx_caption ltx_centering" style="font-size:70%;"><span class="ltx_tag ltx_tag_table"><span class="ltx_text" id="S7.T9.9.1.1" style="font-size:129%;">TABLE IX</span>: </span><span class="ltx_text ltx_font_bold" id="S7.T9.10.2" style="font-size:129%;">Efficiency improvement from optimized precision.<span class="ltx_text ltx_font_medium" id="S7.T9.10.2.1"> CogSys optimizes NVSA algorithm to INT8 to enable hardware area and power savings while maintaining the reasoning capability. </span></span></figcaption> <div class="ltx_inline-block ltx_align_center ltx_transformed_outer" id="S7.T9.4" style="width:433.6pt;height:233.6pt;vertical-align:-0.0pt;"><span class="ltx_transformed_inner" style="transform:translate(99.9pt,-53.8pt) scale(1.854000431243,1.854000431243) ;"> <table class="ltx_tabular ltx_align_middle" id="S7.T9.4.4"> <tbody class="ltx_tbody"> <tr class="ltx_tr" id="S7.T9.4.4.5.1"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" colspan="2" id="S7.T9.4.4.5.1.1" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.4.4.5.1.1.1" style="font-size:70%;">Arithmetic Precision</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T9.4.4.5.1.2" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text ltx_font_bold" id="S7.T9.4.4.5.1.2.1" style="font-size:70%;">FP32</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T9.4.4.5.1.3" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text ltx_font_bold" id="S7.T9.4.4.5.1.3.1" style="font-size:70%;">FP8</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S7.T9.4.4.5.1.4" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text ltx_font_bold" id="S7.T9.4.4.5.1.4.1" style="font-size:70%;">INT8</span></td> </tr> <tr class="ltx_tr" id="S7.T9.4.4.6.2"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" colspan="2" id="S7.T9.4.4.6.2.1" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.4.4.6.2.1.1" style="font-size:70%;">CogSys Accuracy (NVSA=98.5%)</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T9.4.4.6.2.2" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.4.4.6.2.2.1" style="font-size:70%;">98.9%</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T9.4.4.6.2.3" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" 
id="S7.T9.4.4.6.2.3.1" style="font-size:70%;">98.9%</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S7.T9.4.4.6.2.4" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.4.4.6.2.4.1" style="font-size:70%;">98.7%</span></td> </tr> <tr class="ltx_tr" id="S7.T9.2.2.2"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T9.1.1.1.1" rowspan="2" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.1.1.1.1.1" style="font-size:70%;"> <span class="ltx_tabular ltx_align_middle" id="S7.T9.1.1.1.1.1.1"> <span class="ltx_tr" id="S7.T9.1.1.1.1.1.1.2"> <span class="ltx_td ltx_nopad_r ltx_align_center" id="S7.T9.1.1.1.1.1.1.2.1" style="padding-top:0.5pt;padding-bottom:0.5pt;">Reconfigurable Array</span></span> <span class="ltx_tr" id="S7.T9.1.1.1.1.1.1.1"> <span class="ltx_td ltx_nopad_r ltx_align_center" id="S7.T9.1.1.1.1.1.1.1.1" style="padding-top:0.5pt;padding-bottom:0.5pt;">16 32<math alttext="\times" class="ltx_Math" display="inline" id="S7.T9.1.1.1.1.1.1.1.1.m1.1"><semantics id="S7.T9.1.1.1.1.1.1.1.1.m1.1a"><mo id="S7.T9.1.1.1.1.1.1.1.1.m1.1.1" xref="S7.T9.1.1.1.1.1.1.1.1.m1.1.1.cmml">×</mo><annotation-xml encoding="MathML-Content" id="S7.T9.1.1.1.1.1.1.1.1.m1.1b"><times id="S7.T9.1.1.1.1.1.1.1.1.m1.1.1.cmml" xref="S7.T9.1.1.1.1.1.1.1.1.m1.1.1"></times></annotation-xml><annotation encoding="application/x-tex" id="S7.T9.1.1.1.1.1.1.1.1.m1.1c">\times</annotation><annotation encoding="application/x-llamapun" id="S7.T9.1.1.1.1.1.1.1.1.m1.1d">×</annotation></semantics></math>32 PEs</span></span> </span></span></td> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S7.T9.2.2.2.2" style="padding-top:0.5pt;padding-bottom:0.5pt;"> <span class="ltx_text" id="S7.T9.2.2.2.2.1" style="font-size:70%;">Area (mm</span><sup class="ltx_sup" id="S7.T9.2.2.2.2.2"><span class="ltx_text" id="S7.T9.2.2.2.2.2.1" style="font-size:70%;">2</span></sup><span class="ltx_text" id="S7.T9.2.2.2.2.3" style="font-size:70%;">)</span> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T9.2.2.2.3" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.2.2.2.3.1" style="font-size:70%;">28.9</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T9.2.2.2.4" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.2.2.2.4.1" style="font-size:70%;">9.9</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S7.T9.2.2.2.5" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.2.2.2.5.1" style="font-size:70%;">3.8</span></td> </tr> <tr class="ltx_tr" id="S7.T9.4.4.7.3"> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S7.T9.4.4.7.3.1" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.4.4.7.3.1.1" style="font-size:70%;">Power (mW)</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T9.4.4.7.3.2" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.4.4.7.3.2.1" style="font-size:70%;">4468.5</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T9.4.4.7.3.3" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.4.4.7.3.3.1" style="font-size:70%;">1237.8</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S7.T9.4.4.7.3.4" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.4.4.7.3.4.1" 
style="font-size:70%;">1104.6</span></td> </tr> <tr class="ltx_tr" id="S7.T9.3.3.3"> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T9.3.3.3.2" rowspan="2" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.3.3.3.2.1" style="font-size:70%;"> <span class="ltx_tabular ltx_align_middle" id="S7.T9.3.3.3.2.1.1"> <span class="ltx_tr" id="S7.T9.3.3.3.2.1.1.1"> <span class="ltx_td ltx_nopad_r ltx_align_center" id="S7.T9.3.3.3.2.1.1.1.1" style="padding-top:0.5pt;padding-bottom:0.5pt;">Custom SIMD Unit</span></span> <span class="ltx_tr" id="S7.T9.3.3.3.2.1.1.2"> <span class="ltx_td ltx_nopad_r ltx_align_center" id="S7.T9.3.3.3.2.1.1.2.1" style="padding-top:0.5pt;padding-bottom:0.5pt;">512 PEs</span></span> </span></span></td> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S7.T9.3.3.3.1" style="padding-top:0.5pt;padding-bottom:0.5pt;"> <span class="ltx_text" id="S7.T9.3.3.3.1.1" style="font-size:70%;">Area (mm</span><sup class="ltx_sup" id="S7.T9.3.3.3.1.2"><span class="ltx_text" id="S7.T9.3.3.3.1.2.1" style="font-size:70%;">2</span></sup><span class="ltx_text" id="S7.T9.3.3.3.1.3" style="font-size:70%;">)</span> </td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T9.3.3.3.3" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.3.3.3.3.1" style="font-size:70%;">2.01</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T9.3.3.3.4" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.3.3.3.4.1" style="font-size:70%;">0.28</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S7.T9.3.3.3.5" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.3.3.3.5.1" style="font-size:70%;">0.21</span></td> </tr> <tr class="ltx_tr" id="S7.T9.4.4.8.4"> <td class="ltx_td ltx_align_left ltx_border_r ltx_border_t" id="S7.T9.4.4.8.4.1" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.4.4.8.4.1.1" style="font-size:70%;">Power (mW)</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T9.4.4.8.4.2" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.4.4.8.4.2.1" style="font-size:70%;">297.0</span></td> <td class="ltx_td ltx_align_center ltx_border_r ltx_border_t" id="S7.T9.4.4.8.4.3" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.4.4.8.4.3.1" style="font-size:70%;">64.8</span></td> <td class="ltx_td ltx_align_center ltx_border_t" id="S7.T9.4.4.8.4.4" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.4.4.8.4.4.1" style="font-size:70%;">80.4</span></td> </tr> <tr class="ltx_tr" id="S7.T9.4.4.4"> <td class="ltx_td ltx_align_center ltx_border_b ltx_border_r ltx_border_t" colspan="2" id="S7.T9.4.4.4.2" style="padding-top:0.5pt;padding-bottom:0.5pt;"><span class="ltx_text" id="S7.T9.4.4.4.2.1" style="font-size:70%;">Reconfig. Array Area Overhead vs. 
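Table IX reports CogSys accuracy and hardware cost with arithmetic reduced to INT8. As a rough illustration of what such a reduction involves (a generic symmetric per-tensor INT8 scheme of our own choosing, not necessarily the exact recipe CogSys uses), the sketch below quantizes an FP32 tensor to INT8 and dequantizes it back; storing parameters in INT8 rather than FP32 is a 4× reduction, consistent with the 32 MB to 8 MB drop in Table VIII.

```python
import torch

def quantize_int8(x: torch.Tensor):
    """Symmetric per-tensor INT8 quantization: returns (int8 values, scale)."""
    # Illustrative scheme only: map the largest magnitude to 127.
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

# Hypothetical tensor standing in for a neural-frontend weight matrix.
w = torch.randn(1024, 1024)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# INT8 storage is 4x smaller than FP32; the round-trip error stays small.
print("max abs error:", (w - w_hat).abs().max().item())
```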
VII-C CogSys Accelerator Performance

Performance improvement. We benchmark the CogSys accelerator against an RTX GPU, a Xeon CPU, and edge SoCs (Jetson TX2, Xavier NX) for accelerating neurosymbolic algorithms on five reasoning tasks of different difficulty levels (Fig. 15). For the GPU baseline, neuro kernels use PyTorch, which leverages CUDA and the cuBLAS/cuDNN libraries, while symbolic kernels are implemented as custom kernels optimized for vector-symbolic operations. The workload is tiled by cuDNN within PyTorch using block sizes that fit well in GPU memory.
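For intuition about what such a symbolic GPU baseline computes (a minimal sketch of our own, assuming FFT-based binding as in standard vector-symbolic architectures, not the custom kernels used in the paper), batched circular-convolution binding can be written directly in PyTorch:

```python
import torch

def circular_bind(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Batched circular convolution (VSA-style binding) along the last dim."""
    d = a.shape[-1]
    return torch.fft.irfft(torch.fft.rfft(a) * torch.fft.rfft(b), n=d)

# Hypothetical shapes: a batch of 512 bindings over 1024-dimensional vectors.
device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(512, 1024, device=device)
b = torch.randn(512, 1024, device=device)
bound = circular_bind(a, b)   # shape: (512, 1024)
```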
We observe that CogSys delivers consistent speedup across datasets, e.g., 90.82×/56.76× over TX2 and NX, indicating high efficiency and scalability. Furthermore, CogSys achieves real-time performance (<0.3 s) [33] for solving logical reasoning tasks, making it the first to enable a real-time neurosymbolic system with superior reasoning and generalization capability and offering a promising solution for future cognitive applications.

Figure 15: End-to-end runtime improvement. CogSys consistently outperforms Xeon CPU, RTX GPU, and edge SoCs (TX2, NX) in end-to-end runtime evaluated on five spatial-temporal reasoning tasks.

Energy efficiency improvement. We benchmark the CogSys accelerator's energy consumption and energy efficiency on five reasoning tasks (Fig. 16).
We observe that the CogSys accelerator achieves two orders of magnitude higher energy efficiency than the RTX GPU, Xeon CPU, TX2, and NX, indicating its efficiency and applicability to resource-constrained neurosymbolic systems. To further assess CogSys energy efficiency in long-term deployment, we run consecutive tests on mixed workloads that include both high-demand and low-activity periods, with 10-second idle intervals between scenarios. On average, CogSys achieves 730× higher energy efficiency than the RTX GPU. Compared to V100 and A100 GPUs, CogSys shows 4.43× and 1.43× speedup, with 748× and 241× higher energy efficiency, respectively.
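As a back-of-the-envelope consistency check of our own (assuming energy ≈ average power × runtime, which the paper does not state explicitly), the reported speedup and energy-efficiency figures imply that both datacenter GPUs draw on the order of 170× more average power than the CogSys accelerator on these workloads:

```python
# Implied average-power ratio = (energy-efficiency gain) / (speedup),
# under the assumption energy ~= average power x runtime.
reported = {
    "V100": {"speedup": 4.43, "energy_eff": 748},
    "A100": {"speedup": 1.43, "energy_eff": 241},
}
for gpu, r in reported.items():
    print(f"{gpu}: implied average-power ratio ~= {r['energy_eff'] / r['speedup']:.0f}x")
# Both come out to roughly 169x.
```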
Figure 16: Energy efficiency improvement. CogSys consistently reduces energy consumption and improves performance per watt compared to CPU and GPUs, evaluated on five reasoning tasks.

Comparison with TPU/GPU. We benchmark symbolic circular convolution on a TPU-like systolic array (SA, with the same number of PEs) and a GPU under different vector dimensions and numbers of operations (Fig. 17). The CogSys reconfigurable array achieves up to 75.96× and 18.90× speedup over the TPU-like SA and the GPU, respectively, and is effective for both low-dimensional and high-dimensional vector-symbolic operations.

Figure 17: Improved efficiency over TPU/GPU. Speedup comparison of circular convolution on CogSys, a TPU-like systolic array, and a GPU; CogSys shows up to 75.96× and 18.90× runtime improvement.
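One way to see why circular convolution is an awkward fit for GEMM-oriented hardware (our own illustration, not the mapping used in the paper): on a systolic array, binding two d-dimensional vectors effectively becomes a dense circulant matrix-vector product costing O(d²) multiply-accumulates, whereas the FFT route costs only O(d log d). The sketch below checks that the two formulations agree numerically.

```python
import numpy as np

d = 1024                                  # hypervector dimension (illustrative)
rng = np.random.default_rng(0)
a, b = rng.standard_normal(d), rng.standard_normal(d)

# GEMM view: binding = circulant(a) @ b, a dense d x d matrix-vector product
# (~d^2 MACs), which is roughly what a GEMM-oriented array would execute.
circulant = np.stack([np.roll(a, i) for i in range(d)], axis=1)
bound_gemm = circulant @ b

# FFT view: the same circular convolution in O(d log d).
bound_fft = np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=d)

assert np.allclose(bound_gemm, bound_fft)
```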
Comparison with ML accelerators. We benchmark the runtime of neural and symbolic operations on TPU [41], Gemmini [28], and MTIA [19]-like architectures over different neurosymbolic models and tasks (Fig. 18). For a fair comparison, all hardware configurations use the same number of PEs. Compared with current ML accelerators, CogSys achieves similar performance on neural operations while exhibiting superior symbolic operation efficiency and thus end-to-end speedup for neurosymbolic systems. Additionally, we compare CogSys with a hyperdimensional computing accelerator [37] across models and tasks and observe a 7.2× average speedup.
This improvement is mainly due to the lack of efficient neural and symbolic support, and of circular convolution handling, in hyperdimensional computing architectures.

Figure 18: Improved efficiency over ML accelerators. Speedup comparison of neural, symbolic, and end-to-end neurosymbolic execution over TPU [41], Gemmini [28], and MTIA [19]-like architectures.

Figure 19: Ablation study on CogSys accelerator techniques. The runtime achieved by CogSys without the adaptive scheduling (adSCH), scalable array (SO), and reconfigurable PE (nsPE) across tasks.

Ablation study on the proposed hardware techniques. As illustrated in Sec. V and Sec. VI, CogSys features reconfigurable neuro/symbolic PEs with a bubble-streaming dataflow and spatial-temporal mapping, a scalable array architecture, and an adaptive scheduling strategy to reduce compute latency and memory footprint for neural and symbolic kernels. To verify the effectiveness of these techniques, Fig. 19 summarizes the runtime of CogSys without the scheduling, the scalable architecture, and the reconfigurable PE. In particular, the proposed scheduling strategy alone trims the runtime by 28% on average.
Adding the proposed scalable array and reconfigurable PE further enlarges the runtime reduction to 61% and 71%, respectively, indicating that both techniques are necessary for the CogSys accelerator to achieve efficient and scalable reasoning.

Ablation study on the necessity of co-design. To the best of our knowledge, CogSys is the first algorithm-hardware co-design framework to achieve an efficient and scalable on-device neurosymbolic system. To verify the necessity of this co-design strategy, we summarize the runtime of CogSys without the proposed algorithm optimization or hardware techniques in Tab. X. With the CogSys algorithm optimization alone, the runtime drops to 89.5% of NVSA [33] on the same Xavier NX hardware and the RAVEN task. With both the algorithm optimization and the accelerator, the runtime drops to 1.76%, confirming the necessity of the co-design strategy of the CogSys framework.

TABLE X: Ablation study of the necessity of co-design. Normalized runtime (%) achieved by the CogSys framework with and without the proposed algorithm optimization and hardware techniques on different tasks.

| Algorithm @ Hardware                  | RAVEN [95] | I-RAVEN [36] | PGM [11] | CVR [94] | SVRT [20] |
|---------------------------------------|------------|--------------|----------|----------|-----------|
| NVSA [33] @ Xavier NX                 | 100        | 100          | 100      | 100      | 100       |
| CogSys Algorithm @ Xavier NX          | 89.5       | 88.9         | 90.7     | 87.6     | 88.4      |
| CogSys Algorithm @ CogSys Accelerator | 1.76       | 1.74         | 1.78     | 1.72     | 1.69      |
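The normalized runtimes in Table X convert directly into speedups. As a quick sanity check using plain arithmetic over the table, the co-designed system's 1.76% entry lines up closely with the 56.76× speedup over Xavier NX reported above:

```python
# Normalized runtimes (%) from Table X; NVSA on Xavier NX = 100%.
tasks              = ["RAVEN", "I-RAVEN", "PGM", "CVR", "SVRT"]
cogsys_alg_on_nx   = [89.5, 88.9, 90.7, 87.6, 88.4]   # algorithm optimization only
cogsys_full_system = [1.76, 1.74, 1.78, 1.72, 1.69]   # algorithm + accelerator

for t, alg, full in zip(tasks, cogsys_alg_on_nx, cogsys_full_system):
    print(f"{t}: algorithm-only speedup {100/alg:.2f}x, co-designed speedup {100/full:.2f}x")
# Algorithm-only gives ~1.1x; the full co-design gives ~56-59x,
# e.g., 100 / 1.76 = 56.8x on RAVEN.
```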
VIII Related Work

Neurosymbolic AI. Neurosymbolic AI holds significant potential for enhancing the trustworthiness, reasoning, and robustness of next-generation cognitive applications, in which agents make decisions in an explainable manner and intelligence is pervasively embedded in human-AI interactions [97, 10, 34, 18, 33, 66, 56, 92, 93]. Current neurosymbolic research focuses mostly on algorithms; however, the lack of attention to its inefficiency on off-the-shelf hardware may hinder neurosymbolic AI development in the long run. CogSys thus takes the first step to understand neurosymbolic architectural and system characteristics and proposes a co-design framework to make it more efficient and deployable at scale.

Accelerators for emerging applications. With the slowdown of technology scaling, custom architecture is a pragmatic approach to ensuring simultaneous improvements in performance and efficiency.
Beyond DNNs [71, 99, 80, 40, 77, 21, 69, 81], hardware acceleration has proven effective for emerging applications such as genome sequencing [23, 22], graph processing [25, 75], mobile vision [54, 55, 85], drones [48, 14, 47], robotics [63, 58, 51, 30, 52], and privacy and security [73, 74, 27, 62], among others.
Despite the presence of these accelerators, CogSys is the first to offer reconfigurable support for both neural and symbolic kernels, facilitating efficient and scalable neurosymbolic systems.

IX Conclusion

To enable efficient and scalable neurosymbolic AI for real-time cognitive applications, we propose CogSys, the first algorithm-hardware co-design framework dedicated to accelerating neurosymbolic AI. CogSys identifies unique opportunities for neurosymbolic acceleration, including efficient factorization, reconfigurable neural/symbolic PEs, a bubble-streaming dataflow, and an adaptive scheduler, and leverages them to develop algorithm optimizations and a dedicated accelerator. We believe CogSys opens up an exciting perspective toward efficient cognitive reasoning systems at scale.

Acknowledgements

This work was supported in part by CoCoSys, one of seven centers in JUMP 2.0, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.

References

[1] "NeurIPS Workshop on Neuro Causal and Symbolic AI (nCSI)," 2022. [Online]. Available: https://neurips.cc/virtual/2022/workshop/50011
[2] "AAAI Tutorial on Advances in Neuro Symbolic Reasoning and Learning," 2023. [Online]. Available: https://neurosymbolic.asu.edu/2023-aaai-tutorial-advances-in-neuro-symbolic-reasoning/
[3] "IBM Neuro-Symbolic AI Workshop," 2023. [Online]. Available: https://ibm.github.io/neuro-symbolic-ai/
[4] "Neuro-Symbolic AI Summer School," 2023. [Online]. Available: https://neurosymbolic.github.io/nsss2023/
[5] "1st International Conference on Neuro-symbolic Systems (NeuS)," May 2024. [Online]. Available: https://www.neusconference.org
[6] "AAAI Workshop on Neuro-Symbolic Learning and Reasoning in the Era of Large Language Models (NucLeaR)," 2024. [Online]. Available: https://nuclear-workshop.github.io/aaai2024/
[7] "IJCAI First International Workshop on Logical Foundations of Neuro-Symbolic AI (LNSAI 2024)," 2024. [Online]. Available: https://sites.google.com/view/lnsai2024/
[8] "International Conference on Neuro-symbolic Learning and Reasoning (NeSy)," 2024. [Online]. Available: https://sites.google.com/view/nesy2024
[9] A. G. Anderson, K. Ratnam, A. Roorda, and B. A. Olshausen, "High-acuity vision from retinal image motion," Journal of Vision, vol. 20, no. 7, pp. 34–34, 2020.
[10] S. Badreddine, A. d. Garcez, L. Serafini, and M. Spranger, "Logic tensor networks," Artificial Intelligence, vol. 303, p. 103649, 2022.
[11] D. Barrett, F. Hill, A. Santoro, A. Morcos, and T. Lillicrap, "Measuring abstract reasoning in neural networks," in International Conference on Machine Learning (ICML). PMLR, 2018, pp. 511–520.
[12] G. Booch, F. Fabiano, L. Horesh, K. Kate, J. Lenchner, N. Linck, A. Loreggia, K. Murgesan, N. Mattei, F. Rossi et al., "Thinking fast and slow in AI," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 17, 2021, pp. 15042–15046.
[13] Cadence, "Innovus Implementation System - Cadence," https://www.cadence.com/en_US/home/tools/digital-design-and-signoff/soc-implementation-and-floorplanning/innovus-implementation-system.html.
[14] M. Chang, A. S. Lele, S. D. Spetalnick, B. Crafton, S. Konno, Z. Wan, A. Bhat, W.-S. Khwa, Y.-D. Chih, M.-F. Chang et al., "A 73.53 TOPS/W 14.74 TOPS heterogeneous RRAM in-memory and SRAM near-memory SoC for hybrid frame and event-based target tracking," in 2023 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, 2023, pp. 426–428.
[15] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," IEEE Journal of Solid-State Circuits (JSSC), vol. 52, no. 1, pp. 127–138, 2016.
[16] E. Chung, J. Fowers, K. Ovtcharov, M. Papamichael, A. Caulfield, T. Massengill, M. Liu, D. Lo, S. Alkalay, M. Haselman et al., "Serving DNNs in real time at datacenter scale with Project Brainwave," IEEE Micro, vol. 38, no. 2, pp. 8–20, 2018.
[17] K. Daniel, Thinking, Fast and Slow, 2017.
[18] H. Dong, J. Mao, T. Lin, C. Wang, L. Li, and D. Zhou, "Neural logic machines," in International Conference on Learning Representations (ICLR), 2019.
[19] A. Firoozshahian, J. Coburn, R. Levenstein, R. Nattoji, A. Kamath, O. Wu, G. Grewal, H. Aepala, B. Jakka, B. Dreyer et al., "MTIA: First generation silicon targeting Meta's recommendation systems," in Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA), 2023, pp. 1–13.
[20] F. Fleuret, T. Li, C. Dubout, E. K. Wampler, S. Yantis, and D. Geman, "Comparing machines and humans on a visual categorization test," Proceedings of the National Academy of Sciences, vol. 108, no. 43, pp. 17621–17625, 2011.
Ghandi <em class="ltx_emph ltx_font_italic" id="bib.bib21.1.1">et al.</em>, “A configurable cloud-scale dnn processor for real-time ai,” in <em class="ltx_emph ltx_font_italic" id="bib.bib21.2.2">2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)</em>.   IEEE, 2018, pp. 1–14. </span> </li> <li class="ltx_bibitem" id="bib.bib22"> <span class="ltx_tag ltx_tag_bibitem">[22]</span> <span class="ltx_bibblock"> D. Fujiki, A. Subramaniyan, T. Zhang, Y. Zeng, R. Das, D. Blaauw, and S. Narayanasamy, “Genax: A genome sequencing accelerator,” in <em class="ltx_emph ltx_font_italic" id="bib.bib22.1.1">2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)</em>.   IEEE, 2018, pp. 69–82. </span> </li> <li class="ltx_bibitem" id="bib.bib23"> <span class="ltx_tag ltx_tag_bibitem">[23]</span> <span class="ltx_bibblock"> D. Fujiki, S. Wu, N. Ozog, K. Goliya, D. Blaauw, S. Narayanasamy, and R. Das, “Seedex: A genome sequencing accelerator for optimal alignments in subminimal space,” in <em class="ltx_emph ltx_font_italic" id="bib.bib23.1.1">2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)</em>.   IEEE, 2020, pp. 937–950. </span> </li> <li class="ltx_bibitem" id="bib.bib24"> <span class="ltx_tag ltx_tag_bibitem">[24]</span> <span class="ltx_bibblock"> P. M. Furlong and C. Eliasmith, “Modelling neural probabilistic computation using vector symbolic architectures,” <em class="ltx_emph ltx_font_italic" id="bib.bib24.1.1">Cognitive Neurodynamics</em>, pp. 1–24, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib25"> <span class="ltx_tag ltx_tag_bibitem">[25]</span> <span class="ltx_bibblock"> C. Gao, M. Afarin, S. Rahman, N. Abu-Ghazaleh, and R. Gupta, “Mega evolving graph accelerator,” in <em class="ltx_emph ltx_font_italic" id="bib.bib25.1.1">Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)</em>, 2023, pp. 310–323. </span> </li> <li class="ltx_bibitem" id="bib.bib26"> <span class="ltx_tag ltx_tag_bibitem">[26]</span> <span class="ltx_bibblock"> A. d. Garcez and L. C. Lamb, “Neurosymbolic ai: The 3 rd wave,” <em class="ltx_emph ltx_font_italic" id="bib.bib26.1.1">Artificial Intelligence Review</em>, pp. 1–20, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib27"> <span class="ltx_tag ltx_tag_bibitem">[27]</span> <span class="ltx_bibblock"> R. Geelen, M. Van Beirendonck, H. V. Pereira, B. Huffman, T. McAuley, B. Selfridge, D. Wagner, G. Dimou, I. Verbauwhede, F. Vercauteren <em class="ltx_emph ltx_font_italic" id="bib.bib27.1.1">et al.</em>, “Basalisc: Programmable asynchronous hardware accelerator for bgv fully homomorphic encryption,” <em class="ltx_emph ltx_font_italic" id="bib.bib27.2.2">Cryptology ePrint Archive</em>, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib28"> <span class="ltx_tag ltx_tag_bibitem">[28]</span> <span class="ltx_bibblock"> H. Genc, S. Kim, A. Amid, A. Haj-Ali, V. Iyer, P. Prakash, J. Zhao, D. Grubb, H. Liew, H. Mao <em class="ltx_emph ltx_font_italic" id="bib.bib28.1.1">et al.</em>, “Gemmini: Enabling systematic deep-learning architecture evaluation via full-stack integration,” in <em class="ltx_emph ltx_font_italic" id="bib.bib28.2.2">2021 58th ACM/IEEE Design Automation Conference (DAC)</em>.   IEEE, 2021, pp. 769–774. </span> </li> <li class="ltx_bibitem" id="bib.bib29"> <span class="ltx_tag ltx_tag_bibitem">[29]</span> <span class="ltx_bibblock"> C. Han, J. Mao, C. Gan, J. Tenenbaum, and J. 
Wu, “Visual concept-metaconcept learning,” <em class="ltx_emph ltx_font_italic" id="bib.bib29.1.1">Advances in Neural Information Processing Systems (NeurIPS)</em>, vol. 32, 2019. </span> </li> <li class="ltx_bibitem" id="bib.bib30"> <span class="ltx_tag ltx_tag_bibitem">[30]</span> <span class="ltx_bibblock"> Y. Hao, Y. Gan, B. Yu, Q. Liu, Y. Han, Z. Wan, and S. Liu, “Orianna: An accelerator generation framework for optimization-based robotic applications,” in <em class="ltx_emph ltx_font_italic" id="bib.bib30.1.1">Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Volume 2</em>, 2024, pp. 813–829. </span> </li> <li class="ltx_bibitem" id="bib.bib31"> <span class="ltx_tag ltx_tag_bibitem">[31]</span> <span class="ltx_bibblock"> V. Hassija, V. Chamola, A. Mahapatra, A. Singal, D. Goel, K. Huang, S. Scardapane, I. Spinelli, M. Mahmud, and A. Hussain, “Interpreting black-box models: a review on explainable artificial intelligence,” <em class="ltx_emph ltx_font_italic" id="bib.bib31.1.1">Cognitive Computation</em>, vol. 16, no. 1, pp. 45–74, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib32"> <span class="ltx_tag ltx_tag_bibitem">[32]</span> <span class="ltx_bibblock"> M. Hersche, F. Di Stefano, T. Hofmann, A. Sebastian, and A. Rahimi, “Probabilistic abduction for visual abstract reasoning via learning rules in vector-symbolic architectures,” <em class="ltx_emph ltx_font_italic" id="bib.bib32.1.1">Advances in Neural Information Processing Systems (NeurIPS)</em>, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib33"> <span class="ltx_tag ltx_tag_bibitem">[33]</span> <span class="ltx_bibblock"> M. Hersche, M. Zeqiri, L. Benini, A. Sebastian, and A. Rahimi, “A neuro-vector-symbolic architecture for solving raven’s progressive matrices,” <em class="ltx_emph ltx_font_italic" id="bib.bib33.1.1">Nature Machine Intelligence</em>, vol. 5, no. 4, pp. 363–375, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib34"> <span class="ltx_tag ltx_tag_bibitem">[34]</span> <span class="ltx_bibblock"> P. Hohenecker and T. Lukas, “Ontology reasoning with deep neural networks,” <em class="ltx_emph ltx_font_italic" id="bib.bib34.1.1">Journal of Artificial Intelligence Research</em>, vol. 68, pp. 503–540, 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib35"> <span class="ltx_tag ltx_tag_bibitem">[35]</span> <span class="ltx_bibblock"> J. Hsu, J. Mao, and J. Wu, “Ns3d: Neuro-symbolic grounding of 3d objects and relations,” in <em class="ltx_emph ltx_font_italic" id="bib.bib35.1.1">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</em>, 2023, pp. 2614–2623. </span> </li> <li class="ltx_bibitem" id="bib.bib36"> <span class="ltx_tag ltx_tag_bibitem">[36]</span> <span class="ltx_bibblock"> S. Hu, Y. Ma, X. Liu, Y. Wei, and S. Bai, “Stratified rule-aware network for abstract visual reasoning,” in <em class="ltx_emph ltx_font_italic" id="bib.bib36.1.1">Proceedings of the AAAI Conference on Artificial Intelligence</em>, vol. 35, no. 2, 2021, pp. 1567–1574. </span> </li> <li class="ltx_bibitem" id="bib.bib37"> <span class="ltx_tag ltx_tag_bibitem">[37]</span> <span class="ltx_bibblock"> M. Ibrahim, Y. Kim, and J. M. Rabaey, “Efficient design of a hyperdimensional processing unit for multi-layer cognition,” in <em class="ltx_emph ltx_font_italic" id="bib.bib37.1.1">2024 Design, Automation &amp; Test in Europe Conference &amp; Exhibition (DATE)</em>.   IEEE, 2024, pp. 1–6. 
</span> </li> <li class="ltx_bibitem" id="bib.bib38"> <span class="ltx_tag ltx_tag_bibitem">[38]</span> <span class="ltx_bibblock"> M. Ibrahim, Z. Wan, H. Li, P. Panda, T. Krishna, P. Kanerva, Y. Chen, and A. Raychowdhury, “Special session: Neuro-symbolic architecture meets large language models: A memory-centric perspective,” in <em class="ltx_emph ltx_font_italic" id="bib.bib38.1.1">2024 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ ISSS)</em>.   IEEE, 2024, pp. 11–20. </span> </li> <li class="ltx_bibitem" id="bib.bib39"> <span class="ltx_tag ltx_tag_bibitem">[39]</span> <span class="ltx_bibblock"> Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. J. Bang, A. Madotto, and P. Fung, “Survey of hallucination in natural language generation,” <em class="ltx_emph ltx_font_italic" id="bib.bib39.1.1">ACM Computing Surveys</em>, vol. 55, no. 12, pp. 1–38, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib40"> <span class="ltx_tag ltx_tag_bibitem">[40]</span> <span class="ltx_bibblock"> N. Jouppi, G. Kurian, S. Li, P. Ma, R. Nagarajan, L. Nai, N. Patil, S. Subramanian, A. Swing, B. Towles <em class="ltx_emph ltx_font_italic" id="bib.bib40.1.1">et al.</em>, “Tpu v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings,” in <em class="ltx_emph ltx_font_italic" id="bib.bib40.2.2">Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA)</em>, 2023, pp. 1–14. </span> </li> <li class="ltx_bibitem" id="bib.bib41"> <span class="ltx_tag ltx_tag_bibitem">[41]</span> <span class="ltx_bibblock"> N. P. Jouppi, D. H. Yoon, M. Ashcraft, M. Gottscho, T. B. Jablin, G. Kurian, J. Laudon, S. Li, P. Ma, X. Ma <em class="ltx_emph ltx_font_italic" id="bib.bib41.1.1">et al.</em>, “Ten lessons from three generations shaped google’s tpuv4i: Industrial product,” in <em class="ltx_emph ltx_font_italic" id="bib.bib41.2.2">2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)</em>.   IEEE, 2021, pp. 1–14. </span> </li> <li class="ltx_bibitem" id="bib.bib42"> <span class="ltx_tag ltx_tag_bibitem">[42]</span> <span class="ltx_bibblock"> N. P. Jouppi, D. H. Yoon, G. Kurian, S. Li, N. Patil, J. Laudon, C. Young, and D. Patterson, “A domain-specific supercomputer for training deep neural networks,” <em class="ltx_emph ltx_font_italic" id="bib.bib42.1.1">Communications of the ACM</em>, vol. 63, no. 7, pp. 67–78, 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib43"> <span class="ltx_tag ltx_tag_bibitem">[43]</span> <span class="ltx_bibblock"> A. Kalyanpur, K. Saravanakumar, V. Barres, J. Chu-Carroll, D. Melville, and D. Ferrucci, “Llm-arc: Enhancing llms with an automated reasoning critic,” <em class="ltx_emph ltx_font_italic" id="bib.bib43.1.1">arXiv preprint arXiv:2406.17663</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib44"> <span class="ltx_tag ltx_tag_bibitem">[44]</span> <span class="ltx_bibblock"> M. Kang and B. Li, “R<sup class="ltx_sup" id="bib.bib44.2.1">2</sup>-guard: Robust reasoning enabled llm guardrail via knowledge-enhanced logical reasoning,” <em class="ltx_emph ltx_font_italic" id="bib.bib44.3.2">arXiv preprint arXiv:2407.05557</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib45"> <span class="ltx_tag ltx_tag_bibitem">[45]</span> <span class="ltx_bibblock"> D. Kim, J. Kung, S. Chai, S. Yalamanchili, and S. 
Mukhopadhyay, “Neurocube: A programmable digital neuromorphic architecture with high-density 3d memory,” <em class="ltx_emph ltx_font_italic" id="bib.bib45.1.1">ACM SIGARCH Computer Architecture News</em>, vol. 44, no. 3, pp. 380–392, 2016. </span> </li> <li class="ltx_bibitem" id="bib.bib46"> <span class="ltx_tag ltx_tag_bibitem">[46]</span> <span class="ltx_bibblock"> D. Kleyko, M. Davies, E. P. Frady, P. Kanerva, S. J. Kent, B. A. Olshausen, E. Osipov, J. M. Rabaey, D. A. Rachkovskij, A. Rahimi <em class="ltx_emph ltx_font_italic" id="bib.bib46.1.1">et al.</em>, “Vector symbolic architectures as a computing framework for emerging hardware,” <em class="ltx_emph ltx_font_italic" id="bib.bib46.2.2">Proceedings of the IEEE</em>, vol. 110, no. 10, pp. 1538–1571, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib47"> <span class="ltx_tag ltx_tag_bibitem">[47]</span> <span class="ltx_bibblock"> S. Krishnan, Z. Wan, K. Bhardwaj, N. Jadhav, A. Faust, and V. J. Reddi, “Roofline model for uavs: A bottleneck analysis tool for onboard compute characterization of autonomous unmanned aerial vehicles,” in <em class="ltx_emph ltx_font_italic" id="bib.bib47.1.1">2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)</em>.   IEEE, 2022, pp. 162–174. </span> </li> <li class="ltx_bibitem" id="bib.bib48"> <span class="ltx_tag ltx_tag_bibitem">[48]</span> <span class="ltx_bibblock"> S. Krishnan, Z. Wan, K. Bhardwaj, P. Whatmough, A. Faust, S. Neuman, G.-Y. Wei, D. Brooks, and V. J. Reddi, “Automatic domain-specific soc design for autonomous unmanned aerial vehicles,” in <em class="ltx_emph ltx_font_italic" id="bib.bib48.1.1">2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)</em>.   IEEE, 2022, pp. 300–317. </span> </li> <li class="ltx_bibitem" id="bib.bib49"> <span class="ltx_tag ltx_tag_bibitem">[49]</span> <span class="ltx_bibblock"> J. Kwon, J. Tenenbaum, and S. Levine, “Neuro-symbolic models of human moral judgment,” in <em class="ltx_emph ltx_font_italic" id="bib.bib49.1.1">Proceedings of the Annual Meeting of the Cognitive Science Society</em>, vol. 46, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib50"> <span class="ltx_tag ltx_tag_bibitem">[50]</span> <span class="ltx_bibblock"> J. Langenegger, G. Karunaratne, M. Hersche, L. Benini, A. Sebastian, and A. Rahimi, “In-memory factorization of holographic perceptual representations,” <em class="ltx_emph ltx_font_italic" id="bib.bib50.1.1">Nature Nanotechnology</em>, vol. 18, no. 5, pp. 479–485, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib51"> <span class="ltx_tag ltx_tag_bibitem">[51]</span> <span class="ltx_bibblock"> Q. Liu, Z. Wan, B. Yu, W. Liu, S. Liu, and A. Raychowdhury, “An energy-efficient and runtime-reconfigurable fpga-based accelerator for robotic localization systems,” in <em class="ltx_emph ltx_font_italic" id="bib.bib51.1.1">2022 IEEE Custom Integrated Circuits Conference (CICC)</em>.   IEEE, 2022, pp. 01–02. </span> </li> <li class="ltx_bibitem" id="bib.bib52"> <span class="ltx_tag ltx_tag_bibitem">[52]</span> <span class="ltx_bibblock"> S. Liu, Z. Wan, B. Yu, and Y. Wang, <em class="ltx_emph ltx_font_italic" id="bib.bib52.1.1">Robotic computing on fpgas</em>.   Springer, 2021, vol. 16. </span> </li> <li class="ltx_bibitem" id="bib.bib53"> <span class="ltx_tag ltx_tag_bibitem">[53]</span> <span class="ltx_bibblock"> Y. Liu, R. Ryskin, R. Futrell, and E. 
Gibson, “A verb-frame frequency account of constraints on long-distance dependencies in english,” <em class="ltx_emph ltx_font_italic" id="bib.bib53.1.1">Cognition</em>, vol. 222, p. 104902, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib54"> <span class="ltx_tag ltx_tag_bibitem">[54]</span> <span class="ltx_bibblock"> Z.-G. Liu, P. N. Whatmough, and M. Mattina, “Systolic tensor array: An efficient structured-sparse gemm accelerator for mobile cnn inference,” <em class="ltx_emph ltx_font_italic" id="bib.bib54.1.1">IEEE Computer Architecture Letters</em>, vol. 19, no. 1, pp. 34–37, 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib55"> <span class="ltx_tag ltx_tag_bibitem">[55]</span> <span class="ltx_bibblock"> Z.-G. Liu, P. N. Whatmough, Y. Zhu, and M. Mattina, “S2ta: Exploiting structured sparsity for energy-efficient mobile cnn acceleration,” in <em class="ltx_emph ltx_font_italic" id="bib.bib55.1.1">2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)</em>.   IEEE, 2022, pp. 573–586. </span> </li> <li class="ltx_bibitem" id="bib.bib56"> <span class="ltx_tag ltx_tag_bibitem">[56]</span> <span class="ltx_bibblock"> R. Manhaeve, S. Dumančić, A. Kimmig, T. Demeester, and L. De Raedt, “Neural probabilistic logic programming in deepproblog,” <em class="ltx_emph ltx_font_italic" id="bib.bib56.1.1">Artificial Intelligence</em>, vol. 298, p. 103504, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib57"> <span class="ltx_tag ltx_tag_bibitem">[57]</span> <span class="ltx_bibblock"> J. Mao, C. Gan, P. Kohli, J. B. Tenenbaum, and J. Wu, “The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision,” <em class="ltx_emph ltx_font_italic" id="bib.bib57.1.1">International Conference on Learning Representations (ICLR)</em>, 2019. </span> </li> <li class="ltx_bibitem" id="bib.bib58"> <span class="ltx_tag ltx_tag_bibitem">[58]</span> <span class="ltx_bibblock"> V. Mayoral-Vilches, J. Jabbour, Y.-S. Hsiao, Z. Wan, M. Crespo-Álvarez, M. Stewart, J. M. Reina-Muñoz, P. Nagras, G. Vikhe, M. Bakhshalipour <em class="ltx_emph ltx_font_italic" id="bib.bib58.1.1">et al.</em>, “Robotperf: An open-source, vendor-agnostic, benchmarking suite for evaluating robotics computing system performance,” in <em class="ltx_emph ltx_font_italic" id="bib.bib58.2.2">2024 IEEE International Conference on Robotics and Automation (ICRA)</em>.   IEEE, 2024, pp. 8288–8297. </span> </li> <li class="ltx_bibitem" id="bib.bib59"> <span class="ltx_tag ltx_tag_bibitem">[59]</span> <span class="ltx_bibblock"> L. Mei, J. Mao, Z. Wang, C. Gan, and J. B. Tenenbaum, “Falcon: fast visual concept learning by integrating images, linguistic descriptions, and conceptual relations,” <em class="ltx_emph ltx_font_italic" id="bib.bib59.1.1">International Conference on Learning Representations (ICLR)</em>, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib60"> <span class="ltx_tag ltx_tag_bibitem">[60]</span> <span class="ltx_bibblock"> N. Menet, M. Hersche, G. Karunaratne, L. Benini, A. Sebastian, and A. Rahimi, “Mimonets: Multiple-input-multiple-output neural networks exploiting computation in superposition,” <em class="ltx_emph ltx_font_italic" id="bib.bib60.1.1">Advances in Neural Information Processing Systems (NeurIPS)</em>, vol. 36, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib61"> <span class="ltx_tag ltx_tag_bibitem">[61]</span> <span class="ltx_bibblock"> T. Mu, A. Helyar, J. Heidecke, J. Achiam, A. Vallone, I. Kivlichan, M. Lin, A. 
Beutel, J. Schulman, and L. Weng, “Rule based rewards for language model safety,” <em class="ltx_emph ltx_font_italic" id="bib.bib61.1.1">Open AI</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib62"> <span class="ltx_tag ltx_tag_bibitem">[62]</span> <span class="ltx_bibblock"> M. Nabeel, D. Soni, M. Ashraf, M. A. Gebremichael, H. Gamil, E. Chielle, R. Karri, M. Sanduleanu, and M. Maniatakos, “Cofhee: A co-processor for fully homomorphic encryption execution,” in <em class="ltx_emph ltx_font_italic" id="bib.bib62.1.1">2023 Design, Automation &amp; Test in Europe Conference &amp; Exhibition (DATE)</em>.   IEEE, 2023, pp. 1–2. </span> </li> <li class="ltx_bibitem" id="bib.bib63"> <span class="ltx_tag ltx_tag_bibitem">[63]</span> <span class="ltx_bibblock"> S. M. Neuman, R. Ghosal, T. Bourgeat, B. Plancher, and V. J. Reddi, “Roboshape: Using topology patterns to scalably and flexibly deploy accelerators across robots,” in <em class="ltx_emph ltx_font_italic" id="bib.bib63.1.1">Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA)</em>, 2023, pp. 1–13. </span> </li> <li class="ltx_bibitem" id="bib.bib64"> <span class="ltx_tag ltx_tag_bibitem">[64]</span> <span class="ltx_bibblock"> C. Núñez-Molina, P. Mesejo, and J. Fernández-Olivares, “A review of symbolic, subsymbolic and hybrid methods for sequential decision making,” <em class="ltx_emph ltx_font_italic" id="bib.bib64.1.1">ACM Computing Surveys</em>, vol. 56, no. 11, pp. 1–36, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib65"> <span class="ltx_tag ltx_tag_bibitem">[65]</span> <span class="ltx_bibblock"> L. I. G. Olascoaga, A. Menon, M. Ibrahim, and J. Rabaey, “A brain-inspired hierarchical reasoning framework for cognition-augmented prosthetic grasping,” in <em class="ltx_emph ltx_font_italic" id="bib.bib65.1.1">Combining Learning and Reasoning: Programming Languages, Formalisms, and Representations</em>, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib66"> <span class="ltx_tag ltx_tag_bibitem">[66]</span> <span class="ltx_bibblock"> C. Pryor, C. Dickens, E. Augustine, A. Albalak, W. Wang, and L. Getoor, “Neupsl: Neural probabilistic soft logic,” <em class="ltx_emph ltx_font_italic" id="bib.bib66.1.1">arXiv preprint arXiv:2205.14268</em>, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib67"> <span class="ltx_tag ltx_tag_bibitem">[67]</span> <span class="ltx_bibblock"> D. A. Rachkovskij and E. M. Kussul, “Binding and normalization of binary sparse distributed representations by context-dependent thinning,” <em class="ltx_emph ltx_font_italic" id="bib.bib67.1.1">Neural Computation</em>, vol. 13, no. 2, pp. 411–452, 2001. </span> </li> <li class="ltx_bibitem" id="bib.bib68"> <span class="ltx_tag ltx_tag_bibitem">[68]</span> <span class="ltx_bibblock"> A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark <em class="ltx_emph ltx_font_italic" id="bib.bib68.1.1">et al.</em>, “Learning transferable visual models from natural language supervision,” in <em class="ltx_emph ltx_font_italic" id="bib.bib68.2.2">International conference on machine learning (ICML)</em>.   PMLR, 2021, pp. 8748–8763. </span> </li> <li class="ltx_bibitem" id="bib.bib69"> <span class="ltx_tag ltx_tag_bibitem">[69]</span> <span class="ltx_bibblock"> A. Ramachandran, Z. Wan, G. Jeong, J. Gustafson, and T. 
Krishna, “Algorithm-hardware co-design of distribution-aware logarithmic-posit encodings for efficient dnn inference,” in <em class="ltx_emph ltx_font_italic" id="bib.bib69.1.1">Proceedings of the 61st ACM/IEEE Design Automation Conference (DAC)</em>, 2024, pp. 1–6. </span> </li> <li class="ltx_bibitem" id="bib.bib70"> <span class="ltx_tag ltx_tag_bibitem">[70]</span> <span class="ltx_bibblock"> B. Romera-Paredes, M. Barekatain, A. Novikov, M. Balog, M. P. Kumar, E. Dupont, F. J. Ruiz, J. S. Ellenberg, P. Wang, O. Fawzi <em class="ltx_emph ltx_font_italic" id="bib.bib70.1.1">et al.</em>, “Mathematical discoveries from program search with large language models,” <em class="ltx_emph ltx_font_italic" id="bib.bib70.2.2">Nature</em>, vol. 625, no. 7995, pp. 468–475, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib71"> <span class="ltx_tag ltx_tag_bibitem">[71]</span> <span class="ltx_bibblock"> A. Samajdar, P. Mannan, K. Garg, and T. Krishna, “Genesys: Enabling continuous learning through neural network evolution in hardware,” in <em class="ltx_emph ltx_font_italic" id="bib.bib71.1.1">2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)</em>.   IEEE, 2018, pp. 855–866. </span> </li> <li class="ltx_bibitem" id="bib.bib72"> <span class="ltx_tag ltx_tag_bibitem">[72]</span> <span class="ltx_bibblock"> A. Samajdar, E. Qin, M. Pellauer, and T. Krishna, “Self adaptive reconfigurable arrays (sara) learning flexible gemm accelerator configuration and mapping-space using ml,” in <em class="ltx_emph ltx_font_italic" id="bib.bib72.1.1">Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC)</em>, 2022, pp. 583–588. </span> </li> <li class="ltx_bibitem" id="bib.bib73"> <span class="ltx_tag ltx_tag_bibitem">[73]</span> <span class="ltx_bibblock"> N. Samardzic, A. Feldmann, A. Krastev, S. Devadas, R. Dreslinski, C. Peikert, and D. Sanchez, “F1: A fast and programmable accelerator for fully homomorphic encryption,” in <em class="ltx_emph ltx_font_italic" id="bib.bib73.1.1">54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)</em>, 2021, pp. 238–252. </span> </li> <li class="ltx_bibitem" id="bib.bib74"> <span class="ltx_tag ltx_tag_bibitem">[74]</span> <span class="ltx_bibblock"> N. Samardzic, A. Feldmann, A. Krastev, N. Manohar, N. Genise, S. Devadas, K. Eldefrawy, C. Peikert, and D. Sanchez, “Craterlake: a hardware accelerator for efficient unbounded computation on encrypted data,” in <em class="ltx_emph ltx_font_italic" id="bib.bib74.1.1">Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA)</em>, 2022, pp. 173–187. </span> </li> <li class="ltx_bibitem" id="bib.bib75"> <span class="ltx_tag ltx_tag_bibitem">[75]</span> <span class="ltx_bibblock"> N. Shah, W. Meert, and M. Verhelst, “Dpu-v2: Energy-efficient execution of irregular directed acyclic graphs,” in <em class="ltx_emph ltx_font_italic" id="bib.bib75.1.1">2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)</em>.   IEEE, 2022, pp. 1288–1307. </span> </li> <li class="ltx_bibitem" id="bib.bib76"> <span class="ltx_tag ltx_tag_bibitem">[76]</span> <span class="ltx_bibblock"> V. Shah, A. Sharma, G. Shroff, L. Vig, T. Dash, and A. Srinivasan, “Knowledge-based analogical reasoning in neuro-symbolic latent spaces,” <em class="ltx_emph ltx_font_italic" id="bib.bib76.1.1">arXiv preprint arXiv:2209.08750</em>, 2022. 
</span> </li> <li class="ltx_bibitem" id="bib.bib77"> <span class="ltx_tag ltx_tag_bibitem">[77]</span> <span class="ltx_bibblock"> Y. S. Shao, J. Clemons, R. Venkatesan, B. Zimmer, M. Fojtik, N. Jiang, B. Keller, A. Klinefelter, N. Pinckney, P. Raina <em class="ltx_emph ltx_font_italic" id="bib.bib77.1.1">et al.</em>, “Simba: Scaling deep-learning inference with multi-chip-module-based architecture,” in <em class="ltx_emph ltx_font_italic" id="bib.bib77.2.2">Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)</em>, 2019, pp. 14–27. </span> </li> <li class="ltx_bibitem" id="bib.bib78"> <span class="ltx_tag ltx_tag_bibitem">[78]</span> <span class="ltx_bibblock"> A. Sheth and K. Roy, “Neurosymbolic value-inspired artificial intelligence (why, what, and how),” <em class="ltx_emph ltx_font_italic" id="bib.bib78.1.1">IEEE Intelligent Systems</em>, vol. 39, no. 1, pp. 5–11, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib79"> <span class="ltx_tag ltx_tag_bibitem">[79]</span> <span class="ltx_bibblock"> Synopsys, “Design compiler - synopsys,” <a class="ltx_ref ltx_url ltx_font_typewriter" href="https://www.synopsys.com/implementation-and-signoff/rtl-synthesis-test/dc-ultra.html" title="">https://www.synopsys.com/implementation-and-signoff/rtl-synthesis-test/dc-ultra.html</a>. </span> </li> <li class="ltx_bibitem" id="bib.bib80"> <span class="ltx_tag ltx_tag_bibitem">[80]</span> <span class="ltx_bibblock"> T. Tambe, E.-Y. Yang, G. G. Ko, Y. Chai, C. Hooper, M. Donato, P. N. Whatmough, A. M. Rush, D. Brooks, and G.-Y. Wei, “A 16-nm soc for noise-robust speech and nlp edge ai inference with bayesian sound source separation and attention-based dnns,” <em class="ltx_emph ltx_font_italic" id="bib.bib80.1.1">IEEE Journal of Solid-State Circuits (JSSC)</em>, vol. 58, no. 2, pp. 569–581, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib81"> <span class="ltx_tag ltx_tag_bibitem">[81]</span> <span class="ltx_bibblock"> T. Tambe, E.-Y. Yang, Z. Wan, Y. Deng, V. J. Reddi, A. Rush, D. Brooks, and G.-Y. Wei, “Algorithm-hardware co-design of adaptive floating-point encodings for resilient deep learning inference,” in <em class="ltx_emph ltx_font_italic" id="bib.bib81.1.1">2020 57th ACM/IEEE Design Automation Conference (DAC)</em>.   IEEE, 2020, pp. 1–6. </span> </li> <li class="ltx_bibitem" id="bib.bib82"> <span class="ltx_tag ltx_tag_bibitem">[82]</span> <span class="ltx_bibblock"> Y. Tay, M. Dehghani, S. Abnar, Y. Shen, D. Bahri, P. Pham, J. Rao, L. Yang, S. Ruder, and D. Metzler, “Long range arena: A benchmark for efficient transformers,” <em class="ltx_emph ltx_font_italic" id="bib.bib82.1.1">International Conference on Learning Representations (ICLR)</em>, 2021. </span> </li> <li class="ltx_bibitem" id="bib.bib83"> <span class="ltx_tag ltx_tag_bibitem">[83]</span> <span class="ltx_bibblock"> T. H. Trinh, Y. Wu, Q. V. Le, H. He, and T. Luong, “Solving olympiad geometry without human demonstrations,” <em class="ltx_emph ltx_font_italic" id="bib.bib83.1.1">Nature</em>, vol. 625, no. 7995, pp. 476–482, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib84"> <span class="ltx_tag ltx_tag_bibitem">[84]</span> <span class="ltx_bibblock"> Z. Wan, A. Anwar, Y.-S. Hsiao, T. Jia, V. J. Reddi, and A. Raychowdhury, “Analyzing and improving fault tolerance of learning-based navigation systems,” in <em class="ltx_emph ltx_font_italic" id="bib.bib84.1.1">2021 58th ACM/IEEE Design Automation Conference (DAC)</em>.   IEEE, 2021, pp. 841–846. 
</span> </li> <li class="ltx_bibitem" id="bib.bib85"> <span class="ltx_tag ltx_tag_bibitem">[85]</span> <span class="ltx_bibblock"> Z. Wan, C.-K. Liu, M. Ibrahim, H. Yang, S. Spetalnick, T. Krishna, and A. Raychowdhury, “H3dfact: Heterogeneous 3d integrated cim for factorization with holographic perceptual representations,” in <em class="ltx_emph ltx_font_italic" id="bib.bib85.1.1">2024 Design, Automation &amp; Test in Europe Conference &amp; Exhibition (DATE)</em>.   IEEE, 2024, pp. 1–6. </span> </li> <li class="ltx_bibitem" id="bib.bib86"> <span class="ltx_tag ltx_tag_bibitem">[86]</span> <span class="ltx_bibblock"> Z. Wan, C.-K. Liu, H. Yang, R. Raj, C. Li, H. You, Y. Fu, C. Wan, S. Li, Y. Kim <em class="ltx_emph ltx_font_italic" id="bib.bib86.1.1">et al.</em>, “Towards efficient neuro-symbolic ai: From workload characterization to hardware architecture,” <em class="ltx_emph ltx_font_italic" id="bib.bib86.2.2">IEEE Transactions on Circuits and Systems for Artificial Intelligence (TCASAI)</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib87"> <span class="ltx_tag ltx_tag_bibitem">[87]</span> <span class="ltx_bibblock"> Z. Wan, C.-K. Liu, H. Yang, R. Raj, C. Li, H. You, Y. Fu, C. Wan, A. Samajdar, Y. C. Lin <em class="ltx_emph ltx_font_italic" id="bib.bib87.1.1">et al.</em>, “Towards cognitive ai systems: Workload and characterization of neuro-symbolic ai,” in <em class="ltx_emph ltx_font_italic" id="bib.bib87.2.2">2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)</em>.   IEEE, 2024, pp. 268–279. </span> </li> <li class="ltx_bibitem" id="bib.bib88"> <span class="ltx_tag ltx_tag_bibitem">[88]</span> <span class="ltx_bibblock"> W. Wang, H. Bao, L. Dong, J. Bjorck, Z. Peng, Q. Liu, K. Aggarwal, O. K. Mohammed, S. Singhal, S. Som <em class="ltx_emph ltx_font_italic" id="bib.bib88.1.1">et al.</em>, “Image as a foreign language: Beit pretraining for vision and vision-language tasks,” in <em class="ltx_emph ltx_font_italic" id="bib.bib88.2.2">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</em>, 2023, pp. 19 175–19 186. </span> </li> <li class="ltx_bibitem" id="bib.bib89"> <span class="ltx_tag ltx_tag_bibitem">[89]</span> <span class="ltx_bibblock"> Y. Wang, Y. Han, C. Wang, S. Song, Q. Tian, and G. Huang, “Computation-efficient deep learning for computer vision: A survey,” <em class="ltx_emph ltx_font_italic" id="bib.bib89.1.1">Cybernetics and Intelligence</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib90"> <span class="ltx_tag ltx_tag_bibitem">[90]</span> <span class="ltx_bibblock"> A. Xiao, J. Huang, D. Guan, X. Zhang, S. Lu, and L. Shao, “Unsupervised point cloud representation learning with deep neural networks: A survey,” <em class="ltx_emph ltx_font_italic" id="bib.bib90.1.1">IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)</em>, 2023. </span> </li> <li class="ltx_bibitem" id="bib.bib91"> <span class="ltx_tag ltx_tag_bibitem">[91]</span> <span class="ltx_bibblock"> H. Xiong, Z. Wang, X. Li, J. Bian, Z. Xie, S. Mumtaz, and L. E. Barnes, “Converging paradigms: The synergy of symbolic and connectionist ai in llm-empowered autonomous agents,” <em class="ltx_emph ltx_font_italic" id="bib.bib91.1.1">arXiv preprint arXiv:2407.08516</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib92"> <span class="ltx_tag ltx_tag_bibitem">[92]</span> <span class="ltx_bibblock"> Z. Yang, A. Ishay, and J. 
Lee, “Neurasp: Embracing neural networks into answer set programming,” in <em class="ltx_emph ltx_font_italic" id="bib.bib92.1.1">29th International Joint Conference on Artificial Intelligence (IJCAI 2020)</em>, 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib93"> <span class="ltx_tag ltx_tag_bibitem">[93]</span> <span class="ltx_bibblock"> K. Yi, C. Gan, Y. Li, P. Kohli, J. Wu, A. Torralba, and J. B. Tenenbaum, “Clevrer: Collision events for video representation and reasoning,” in <em class="ltx_emph ltx_font_italic" id="bib.bib93.1.1">International Conference on Learning Representations (ICLR)</em>, 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib94"> <span class="ltx_tag ltx_tag_bibitem">[94]</span> <span class="ltx_bibblock"> A. Zerroug, M. Vaishnav, J. Colin, S. Musslick, and T. Serre, “A benchmark for compositional visual reasoning,” <em class="ltx_emph ltx_font_italic" id="bib.bib94.1.1">Advances in Neural Information Processing Systems (NeurIPS)</em>, vol. 35, pp. 29 776–29 788, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib95"> <span class="ltx_tag ltx_tag_bibitem">[95]</span> <span class="ltx_bibblock"> C. Zhang, F. Gao, B. Jia, Y. Zhu, and S.-C. Zhu, “Raven: A dataset for relational and analogical visual reasoning,” in <em class="ltx_emph ltx_font_italic" id="bib.bib95.1.1">Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)</em>, 2019, pp. 5317–5327. </span> </li> <li class="ltx_bibitem" id="bib.bib96"> <span class="ltx_tag ltx_tag_bibitem">[96]</span> <span class="ltx_bibblock"> C. Zhang, B. Jia, S.-C. Zhu, and Y. Zhu, “Abstract spatial-temporal reasoning via probabilistic abduction and execution,” in <em class="ltx_emph ltx_font_italic" id="bib.bib96.1.1">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</em>, 2021, pp. 9736–9746. </span> </li> <li class="ltx_bibitem" id="bib.bib97"> <span class="ltx_tag ltx_tag_bibitem">[97]</span> <span class="ltx_bibblock"> H. Zhang and T. Yu, “Alphazero,” <em class="ltx_emph ltx_font_italic" id="bib.bib97.1.1">Deep Reinforcement Learning: Fundamentals, Research and Applications</em>, pp. 391–415, 2020. </span> </li> <li class="ltx_bibitem" id="bib.bib98"> <span class="ltx_tag ltx_tag_bibitem">[98]</span> <span class="ltx_bibblock"> P. Zhao, H. Zhang, Q. Yu, Z. Wang, Y. Geng, F. Fu, L. Yang, W. Zhang, and B. Cui, “Retrieval-augmented generation for ai-generated content: A survey,” <em class="ltx_emph ltx_font_italic" id="bib.bib98.1.1">arXiv preprint arXiv:2402.19473</em>, 2024. </span> </li> <li class="ltx_bibitem" id="bib.bib99"> <span class="ltx_tag ltx_tag_bibitem">[99]</span> <span class="ltx_bibblock"> C. Zhou, F. G. Redondo, J. Büchel, I. Boybat, X. T. Comas, S. Nandakumar, S. Das, A. Sebastian, M. Le Gallo, and P. N. Whatmough, “Ml-hw co-design of noise-robust tinyml models and always-on analog compute-in-memory edge accelerator,” <em class="ltx_emph ltx_font_italic" id="bib.bib99.1.1">IEEE Micro</em>, vol. 42, no. 6, pp. 76–87, 2022. </span> </li> <li class="ltx_bibitem" id="bib.bib100"> <span class="ltx_tag ltx_tag_bibitem">[100]</span> <span class="ltx_bibblock"> K. Zhou, J. Yang, C. C. Loy, and Z. Liu, “Learning to prompt for vision-language models,” <em class="ltx_emph ltx_font_italic" id="bib.bib100.1.1">International Journal of Computer Vision (IJCV)</em>, vol. 130, no. 9, pp. 2337–2348, 2022. 