class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2503.14354</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Hardware Architecture">cs.AR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Emerging Technologies">cs.ET</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> </div> </div> <p class="title is-5 mathjax"> Retrospective: A CORDIC Based Configurable Activation Function for NN Applications </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Kokane%2C+O">Omkar Kokane</a>, <a href="/search/cs?searchtype=author&amp;query=Raut%2C+G">Gopal Raut</a>, <a href="/search/cs?searchtype=author&amp;query=Ullah%2C+S">Salim Ullah</a>, <a href="/search/cs?searchtype=author&amp;query=Lokhande%2C+M">Mukul Lokhande</a>, <a href="/search/cs?searchtype=author&amp;query=Teman%2C+A">Adam Teman</a>, <a href="/search/cs?searchtype=author&amp;query=Kumar%2C+A">Akash Kumar</a>, <a href="/search/cs?searchtype=author&amp;query=Vishvakarma%2C+S+K">Santosh Kumar Vishvakarma</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2503.14354v1-abstract-short" style="display: inline;"> A CORDIC-based configuration for the design of Activation Functions (AF) was previously suggested to accelerate ASIC hardware design for resource-constrained systems by providing functional reconfigurability. 
Since its introduction, this new approach for neural network acceleration has gained widespread popularity, influencing numerous designs for activation functions in both academic and commerci&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2503.14354v1-abstract-full').style.display = 'inline'; document.getElementById('2503.14354v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2503.14354v1-abstract-full" style="display: none;"> A CORDIC-based configuration for the design of Activation Functions (AF) was previously suggested to accelerate ASIC hardware design for resource-constrained systems by providing functional reconfigurability. Since its introduction, this new approach for neural network acceleration has gained widespread popularity, influencing numerous designs for activation functions in both academic and commercial AI processors. In this retrospective analysis, we explore the foundational aspects of this initiative, summarize key developments over recent years, and introduce the DA-VINCI AF tailored for the evolving needs of AI applications. This new generation of dynamically configurable and precision-adjustable activation function cores promises greater adaptability for a range of activation functions in AI workloads, including Swish, SoftMax, SeLU, and GeLU, utilizing the Shift-and-Add CORDIC technique. The previously presented design has been optimized for MAC, Sigmoid, and Tanh functionalities and incorporated into ReLU AFs, culminating in an accumulative NEURIC compute unit. These enhancements position NEURIC as a fundamental component in the resource-efficient vector engine for the realization of AI accelerators that focus on DNNs, RNNs/LSTMs, and Transformers, achieving a quality of results (QoR) of 98.5%.
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2503.14354v1-abstract-full').style.display = 'none'; document.getElementById('2503.14354v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 18 March, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> March 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2503.11685</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Hardware Architecture">cs.AR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> </div> </div> <p class="title is-5 mathjax"> CORDIC Is All You Need </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Kokane%2C+O">Omkar Kokane</a>, <a href="/search/cs?searchtype=author&amp;query=Teman%2C+A">Adam Teman</a>, <a href="/search/cs?searchtype=author&amp;query=Jha%2C+A">Anushka Jha</a>, <a href="/search/cs?searchtype=author&amp;query=SL%2C+G+P">Guru Prasath SL</a>, <a href="/search/cs?searchtype=author&amp;query=Raut%2C+G">Gopal Raut</a>, <a href="/search/cs?searchtype=author&amp;query=Lokhande%2C+M">Mukul Lokhande</a>, <a href="/search/cs?searchtype=author&amp;query=Chand%2C+S+V+J">S. V. 
Jaya Chand</a>, <a href="/search/cs?searchtype=author&amp;query=Dewangan%2C+T">Tanushree Dewangan</a>, <a href="/search/cs?searchtype=author&amp;query=Vishvakarma%2C+S+K">Santosh Kumar Vishvakarma</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2503.11685v1-abstract-short" style="display: inline;"> Artificial intelligence necessitates adaptable hardware accelerators for efficient high-throughput million operations. We present a pipelined architecture with a CORDIC block for linear MAC computations and nonlinear iterative Activation Functions (AF) such as $tanh$, $sigmoid$, and $softmax$. This approach focuses on a Reconfigurable Processing Engine (RPE) based systolic array, with a 40\% pruning rat&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2503.11685v1-abstract-full').style.display = 'inline'; document.getElementById('2503.11685v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2503.11685v1-abstract-full" style="display: none;"> Artificial intelligence necessitates adaptable hardware accelerators for efficient high-throughput million operations. We present a pipelined architecture with a CORDIC block for linear MAC computations and nonlinear iterative Activation Functions (AF) such as $tanh$, $sigmoid$, and $softmax$. This approach focuses on a Reconfigurable Processing Engine (RPE) based systolic array, with a 40\% pruning rate, enhanced throughput up to 4.64$\times$, and reductions in power and area of 5.02 $\times$ and 4.06 $\times$ at CMOS 28 nm, with minor accuracy loss. The FPGA implementation achieves up to 2.5 $\times$ resource savings and 3 $\times$ lower power compared to prior works.
The Systolic CORDIC engine for Reconfigurability and Enhanced throughput (SYCore) deploys an output stationary dataflow with the CAESAR control engine for diverse AI workloads such as Transformers, RNNs/LSTMs, and DNNs for applications like image detection, LLMs, and speech recognition. The energy-efficient and flexible approach extends the enhanced approach for edge AI accelerators supporting emerging workloads. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2503.11685v1-abstract-full').style.display = 'none'; document.getElementById('2503.11685v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 March, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> March 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2412.11702</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Hardware Architecture">cs.AR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Distributed, Parallel, and Cluster Computing">cs.DC</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Emerging Technologies">cs.ET</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> </div> </div> <p class="title is-5 mathjax"> Flex-PE: Flexible and SIMD Multi-Precision Processing Element for AI Workloads </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Lokhande%2C+M">Mukul Lokhande</a>, <a 
href="/search/cs?searchtype=author&amp;query=Raut%2C+G">Gopal Raut</a>, <a href="/search/cs?searchtype=author&amp;query=Vishvakarma%2C+S+K">Santosh Kumar Vishvakarma</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2412.11702v1-abstract-short" style="display: inline;"> The rapid adaptation of data-driven AI models, such as deep learning inference, training, Vision Transformers (ViTs), and other HPC applications, drives a strong need for hardware support for diverse nonlinear activation functions (AFs) with runtime-configurable precision. Existing solutions support diverse precision or runtime AF reconfigurability but fail to address both simultaneously. This work propo&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.11702v1-abstract-full').style.display = 'inline'; document.getElementById('2412.11702v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2412.11702v1-abstract-full" style="display: none;"> The rapid adaptation of data-driven AI models, such as deep learning inference, training, Vision Transformers (ViTs), and other HPC applications, drives a strong need for hardware support for diverse nonlinear activation functions (AFs) with runtime-configurable precision. Existing solutions support diverse precision or runtime AF reconfigurability but fail to address both simultaneously. This work proposes a flexible and SIMD multiprecision processing element (FlexPE), which supports diverse runtime configurable AFs, including sigmoid, tanh, ReLU and softmax, and MAC operation. The proposed design achieves an improved throughput of up to 16X FxP4, 8X FxP8, 4X FxP16 and 1X FxP32 in pipeline mode with 100% time-multiplexed hardware.
This work proposes an area-efficient multiprecision iterative mode in SIMD systolic arrays for edge AI use cases. The design delivers superior performance with up to 62X and 371X reductions in DMA reads for input feature maps and weight filters in VGG16, with an energy efficiency of 8.42 GOPS/W, within an accuracy loss of 2%. The proposed architecture supports emerging 4-bit computations for DL inference while enhancing throughput in FxP8/16 modes for transformers and other HPC applications. The proposed approach enables future energy-efficient AI accelerators in edge and cloud environments. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.11702v1-abstract-full').style.display = 'none'; document.getElementById('2412.11702v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 16 December, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2024.
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">10 pages, 5 figures, Preprint, Submitted to TVLSI Regular papers</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.04976</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Hardware Architecture">cs.AR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> </div> </div> <p class="title is-5 mathjax"> HYDRA: Hybrid Data Multiplexing and Run-time Layer Configurable DNN Accelerator </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Kumar%2C+S">Sonu Kumar</a>, <a href="/search/cs?searchtype=author&amp;query=Gupta%2C+K">Komal Gupta</a>, <a href="/search/cs?searchtype=author&amp;query=Raut%2C+G">Gopal Raut</a>, <a href="/search/cs?searchtype=author&amp;query=Lokhande%2C+M">Mukul Lokhande</a>, <a href="/search/cs?searchtype=author&amp;query=Vishvakarma%2C+S+K">Santosh Kumar Vishvakarma</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.04976v1-abstract-short" style="display: inline;"> Deep neural networks (DNNs) offer plenty of challenges in executing efficient computation at edge nodes, primarily due to the huge hardware resource demands. 
The article proposes HYDRA, a hybrid data multiplexing and runtime layer-configurable DNN accelerator, to overcome these drawbacks. The work proposes a layer-multiplexed approach, which further reuses a single activation function within the exec&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.04976v1-abstract-full').style.display = 'inline'; document.getElementById('2409.04976v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.04976v1-abstract-full" style="display: none;"> Deep neural networks (DNNs) offer plenty of challenges in executing efficient computation at edge nodes, primarily due to the huge hardware resource demands. The article proposes HYDRA, a hybrid data multiplexing and runtime layer-configurable DNN accelerator, to overcome these drawbacks. The work proposes a layer-multiplexed approach, which further reuses a single activation function within the execution of a single layer with improved Fused-Multiply-Accumulate (FMA). The proposed approach works in iterative mode to reuse the same hardware and execute different layers in a configurable fashion. The proposed architectures achieve reductions over 90% of power consumption and resource utilization improvements of state-of-the-art works, with 35.21 TOPSW. The proposed architecture reduces the area overhead (N-1) times required in bandwidth, AF and layer architecture. This work shows that the HYDRA architecture supports optimal DNN computations while improving performance on resource-constrained edge devices.
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.04976v1-abstract-full').style.display = 'none'; document.getElementById('2409.04976v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 8 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.00806</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Hardware Architecture">cs.AR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> <div class="is-inline-block" style="margin-left: 0.5rem"> <div class="tags has-addons"> <span class="tag is-dark is-size-7">doi</span> <span class="tag is-light is-size-7"><a class="" href="">10.1109/VDAT63601.2024.10705729 <i class="fa fa-external-link" aria-hidden="true"></i></a></span> </div> </div> </div> <p class="title is-5 mathjax"> HOAA: Hybrid Overestimating Approximate Adder for Enhanced Performance Processing Engine </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Kokane%2C+O">Omkar Kokane</a>, <a href="/search/cs?searchtype=author&amp;query=Sati%2C+P">Prabhat Sati</a>, <a href="/search/cs?searchtype=author&amp;query=Lokhande%2C+M">Mukul Lokhande</a>, <a href="/search/cs?searchtype=author&amp;query=Vishvakarma%2C+S+K">Santosh Kumar Vishvakarma</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis 
has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.00806v1-abstract-short" style="display: inline;"> This paper presents the Hybrid Overestimating Approximate Adder designed to enhance the performance in processing engines, specifically focused on edge AI applications. A novel Plus One Adder design is proposed as an incremental adder in the RCA chain, incorporating a Full Adder with an excess 1 alongside inputs A, B, and Cin. The design approximates outputs to 2-bit values to reduce hardware comp&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.00806v1-abstract-full').style.display = 'inline'; document.getElementById('2408.00806v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.00806v1-abstract-full" style="display: none;"> This paper presents the Hybrid Overestimating Approximate Adder designed to enhance the performance in processing engines, specifically focused on edge AI applications. A novel Plus One Adder design is proposed as an incremental adder in the RCA chain, incorporating a Full Adder with an excess 1 alongside inputs A, B, and Cin. The design approximates outputs to 2-bit values to reduce hardware complexity and improve resource efficiency. The Plus One Adder is integrated into a dynamically reconfigurable HOAA, allowing runtime interchangeability between accurate and approximate overestimation modes. The proposed design is demonstrated for multiple applications, such as two's complement subtraction, round-to-even, and the configurable activation function, which are critical components of the processing engine. Our approach shows 21 percent improvement in area efficiency and 33 percent reduction in power consumption, compared to state-of-the-art designs with minimal accuracy loss.
Thus, the proposed HOAA could be a promising solution for resource-constrained environments, offering ideal trade-offs between hardware efficiency and computational accuracy. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.00806v1-abstract-full').style.display = 'none'; document.getElementById('2408.00806v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 29 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Journal ref:</span> 28th International Symposium on VLSI Design and Test (VDAT 2024) </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.21370</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Neural and Evolutionary Computing">cs.NE</span> </div> </div> <p class="title is-5 mathjax"> SHA-CNN: Scalable Hierarchical Aware Convolutional Neural Network for Edge AI </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Dhakad%2C+N+S">Narendra Singh Dhakad</a>, <a href="/search/cs?searchtype=author&amp;query=Malhotra%2C+Y">Yuvnish Malhotra</a>, <a href="/search/cs?searchtype=author&amp;query=Vishvakarma%2C+S+K">Santosh Kumar Vishvakarma</a>, <a href="/search/cs?searchtype=author&amp;query=Roy%2C+K">Kaushik Roy</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.21370v1-abstract-short" style="display: inline;"> This paper introduces a Scalable Hierarchical Aware
Convolutional Neural Network (SHA-CNN) model architecture for Edge AI applications. The proposed hierarchical CNN model is meticulously crafted to strike a balance between computational efficiency and accuracy, addressing the challenges posed by resource-constrained edge devices. SHA-CNN demonstrates its efficacy by achieving accuracy comparable&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.21370v1-abstract-full').style.display = 'inline'; document.getElementById('2407.21370v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.21370v1-abstract-full" style="display: none;"> This paper introduces a Scalable Hierarchical Aware Convolutional Neural Network (SHA-CNN) model architecture for Edge AI applications. The proposed hierarchical CNN model is meticulously crafted to strike a balance between computational efficiency and accuracy, addressing the challenges posed by resource-constrained edge devices. SHA-CNN demonstrates its efficacy by achieving accuracy comparable to state-of-the-art hierarchical models while outperforming baseline models in accuracy metrics. The key innovation lies in the model&#39;s hierarchical awareness, enabling it to discern and prioritize relevant features at multiple levels of abstraction. The proposed architecture classifies data in a hierarchical manner, facilitating a nuanced understanding of complex features within the datasets. Moreover, SHA-CNN exhibits a remarkable capacity for scalability, allowing for the seamless incorporation of new classes. This flexibility is particularly advantageous in dynamic environments where the model needs to adapt to evolving datasets and accommodate additional classes without the need for extensive retraining. Testing has been conducted on the PYNQ Z2 FPGA board to validate the proposed model. 
The results achieved an accuracy of 99.34%, 83.35%, and 63.66% for MNIST, CIFAR-10, and CIFAR-100 datasets, respectively. For CIFAR-100, our proposed architecture performs hierarchical classification with 10% less computation while giving up only 0.7% accuracy relative to the state of the art. The adaptability of SHA-CNN to FPGA architecture underscores its potential for deployment in edge devices, where computational resources are limited. The SHA-CNN framework thus emerges as a promising advancement in the intersection of hierarchical CNNs, scalability, and FPGA-based Edge AI. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.21370v1-abstract-full').style.display = 'none'; document.getElementById('2407.21370v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 31 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024.
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.20628</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Hardware Architecture">cs.AR</span> </div> <div class="is-inline-block" style="margin-left: 0.5rem"> <div class="tags has-addons"> <span class="tag is-dark is-size-7">doi</span> <span class="tag is-light is-size-7"><a class="" href="">10.1109/LES.2024.3485509 <i class="fa fa-external-link" aria-hidden="true"></i></a></span> </div> </div> </div> <p class="title is-5 mathjax"> Configurable Multi-Port Memory Architecture for High-Speed Data Communication </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Dhakad%2C+N+S">Narendra Singh Dhakad</a>, <a href="/search/cs?searchtype=author&amp;query=Vishvakarma%2C+S+K">Santosh Kumar Vishvakarma</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.20628v2-abstract-short" style="display: inline;"> Memory management is necessary with the increasing number of multi-connected AI devices and data bandwidth issues. For this purpose, high-speed multi-port memory is used. The traditional multi-port memory solutions are hard-bounded to a fixed number of ports for read or write operations. In this work, we proposed a pseudo-quad-port memory architecture. 
Here, ports can be configured (1-port, 2-port&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.20628v2-abstract-full').style.display = 'inline'; document.getElementById('2407.20628v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.20628v2-abstract-full" style="display: none;"> Memory management is necessary with the increasing number of multi-connected AI devices and data bandwidth issues. For this purpose, high-speed multi-port memory is used. The traditional multi-port memory solutions are hard-bounded to a fixed number of ports for read or write operations. In this work, we proposed a pseudo-quad-port memory architecture. Here, ports can be configured (1-port, 2-port, 3-port, 4-port) for all possible combinations of read/write operations for the 6T static random access memory (SRAM) memory array, which improves the speed and reduces the bandwidth for data transfer. The proposed architecture improves the bandwidth of data transfer by 4x. The proposed solution provides 1.3x and 2x area efficiency as compared to dual-port 8T and quad-port 12T SRAM. All the design and performance analyses are done using 65nm CMOS technology. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.20628v2-abstract-full').style.display = 'none'; document.getElementById('2407.20628v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 7 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 30 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">in IEEE Embedded Systems Letters</span> </p> </li> </ol> <div class="is-hidden-tablet"> <!-- feedback for mobile only --> <span class="help" style="display: inline-block;"><a href="">Search v0.5.6 released 