Distributed, Parallel, and Cluster Computing
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> <title>Distributed, Parallel, and Cluster Computing </title> <meta name="viewport" content="width=device-width, initial-scale=1"> <link rel="apple-touch-icon" sizes="180x180" href="/static/browse/0.3.4/images/icons/apple-touch-icon.png"> <link rel="icon" type="image/png" sizes="32x32" href="/static/browse/0.3.4/images/icons/favicon-32x32.png"> <link rel="icon" type="image/png" sizes="16x16" href="/static/browse/0.3.4/images/icons/favicon-16x16.png"> <link rel="manifest" href="/static/browse/0.3.4/images/icons/site.webmanifest"> <link rel="mask-icon" href="/static/browse/0.3.4/images/icons/safari-pinned-tab.svg" color="#5bbad5"> <meta name="msapplication-TileColor" content="#da532c"> <meta name="theme-color" content="#ffffff"> <link rel="stylesheet" type="text/css" media="screen" href="/static/browse/0.3.4/css/arXiv.css?v=20240822" /> <link rel="stylesheet" type="text/css" media="print" href="/static/browse/0.3.4/css/arXiv-print.css?v=20200611" /> <link rel="stylesheet" type="text/css" media="screen" href="/static/browse/0.3.4/css/browse_search.css" /> <script language="javascript" src="/static/browse/0.3.4/js/accordion.js" /></script> <script src="/static/browse/0.3.4/js/mathjaxToggle.min.js" type="text/javascript"></script> <script type="text/javascript" language="javascript">mathjaxToggle();</script> </head> <body class="with-cu-identity"> <div class="flex-wrap-footer"> <header> <a href="#content" class="is-sr-only">Skip to main content</a> <!-- start desktop header --> <div class="columns is-vcentered is-hidden-mobile" id="cu-identity"> <div class="column" id="cu-logo"> <a href="https://www.cornell.edu/"><img src="/static/browse/0.3.4/images/icons/cu/cornell-reduced-white-SMALL.svg" alt="Cornell University" /></a> </div><div class="column" id="support-ack"> <span id="support-ack-url">We gratefully acknowledge support from the Simons Foundation, <a href="https://info.arxiv.org/about/ourmembers.html">member institutions</a>, and all contributors.</span> <a href="https://info.arxiv.org/about/donate.html" class="btn-header-donate">Donate</a> </div> </div> <div id="header" class="is-hidden-mobile"> <a aria-hidden="true" tabindex="-1" href="/IgnoreMe"></a> <div class="header-breadcrumbs"> <a href="/"><img src="/static/browse/0.3.4/images/arxiv-logo-one-color-white.svg" alt="arxiv logo" style="height:40px;"/></a> <span>></span> <a href="/list/cs.DC/recent">cs.DC</a> </div> <div class="search-block level-right"> <form class="level-item mini-search" method="GET" action="https://arxiv.org/search"> <div class="field has-addons"> <div class="control"> <input class="input is-small" type="text" name="query" placeholder="Search..." 
aria-label="Search term or terms" /> <p class="help"><a href="https://info.arxiv.org/help">Help</a> | <a href="https://arxiv.org/search/advanced">Advanced Search</a></p> </div> <div class="control"> <div class="select is-small"> <select name="searchtype" aria-label="Field to search"> <option value="all" selected="selected">All fields</option> <option value="title">Title</option> <option value="author">Author</option> <option value="abstract">Abstract</option> <option value="comments">Comments</option> <option value="journal_ref">Journal reference</option> <option value="acm_class">ACM classification</option> <option value="msc_class">MSC classification</option> <option value="report_num">Report number</option> <option value="paper_id">arXiv identifier</option> <option value="doi">DOI</option> <option value="orcid">ORCID</option> <option value="author_id">arXiv author ID</option> <option value="help">Help pages</option> <option value="full_text">Full text</option> </select> </div> </div> <input type="hidden" name="source" value="header"> <button class="button is-small is-cul-darker">Search</button> </div> </form> </div> </div><!-- /end desktop header --> <div class="mobile-header"> <div class="columns is-mobile"> <div class="column logo-arxiv"><a href="https://arxiv.org/"><img src="/static/browse/0.3.4/images/arxiv-logomark-small-white.svg" alt="arXiv logo" style="height:60px;" /></a></div> <div class="column logo-cornell"><a href="https://www.cornell.edu/"> <picture> <source media="(min-width: 501px)" srcset="/static/browse/0.3.4/images/icons/cu/cornell-reduced-white-SMALL.svg 400w" sizes="400w" /> <source srcset="/static/browse/0.3.4/images/icons/cu/cornell_seal_simple_black.svg 2x" /> <img src="/static/browse/0.3.4/images/icons/cu/cornell-reduced-white-SMALL.svg" alt="Cornell University Logo" /> </picture> </a></div> <div class="column nav" id="toggle-container" role="menubar"> <button class="toggle-control"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512" class="icon filter-white"><title>open search</title><path d="M505 442.7L405.3 343c-4.5-4.5-10.6-7-17-7H372c27.6-35.3 44-79.7 44-128C416 93.1 322.9 0 208 0S0 93.1 0 208s93.1 208 208 208c48.3 0 92.7-16.4 128-44v16.3c0 6.4 2.5 12.5 7 17l99.7 99.7c9.4 9.4 24.6 9.4 33.9 0l28.3-28.3c9.4-9.4 9.4-24.6.1-34zM208 336c-70.7 0-128-57.2-128-128 0-70.7 57.2-128 128-128 70.7 0 128 57.2 128 128 0 70.7-57.2 128-128 128z"/></svg></button> <div class="mobile-toggle-block toggle-target"> <form class="mobile-search-form" method="GET" action="https://arxiv.org/search"> <div class="field has-addons"> <input class="input" type="text" name="query" placeholder="Search..." 
aria-label="Search term or terms" /> <input type="hidden" name="source" value="header"> <input type="hidden" name="searchtype" value="all"> <button class="button">GO</button> </div> </form> </div> <button class="toggle-control"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512" class="icon filter-white" role="menu"><title>open navigation menu</title><path d="M16 132h416c8.837 0 16-7.163 16-16V76c0-8.837-7.163-16-16-16H16C7.163 60 0 67.163 0 76v40c0 8.837 7.163 16 16 16zm0 160h416c8.837 0 16-7.163 16-16v-40c0-8.837-7.163-16-16-16H16c-8.837 0-16 7.163-16 16v40c0 8.837 7.163 16 16 16zm0 160h416c8.837 0 16-7.163 16-16v-40c0-8.837-7.163-16-16-16H16c-8.837 0-16 7.163-16 16v40c0 8.837 7.163 16 16 16z"/ ></svg></button> <div class="mobile-toggle-block toggle-target"> <nav class="mobile-menu" aria-labelledby="mobilemenulabel"> <h2 id="mobilemenulabel">quick links</h2> <ul> <li><a href="https://arxiv.org/login">Login</a></li> <li><a href="https://info.arxiv.org/help">Help Pages</a></li> <li><a href="https://info.arxiv.org/about">About</a></li> </ul> </nav> </div> </div> </div> </div><!-- /end mobile-header --> </header> <main> <div id="content"> <div id='content-inner'> <div id='dlpage'> <h1>Distributed, Parallel, and Cluster Computing</h1> <ul> <li><a href="#item0">New submissions</a></li> <li><a href="#item6">Cross-lists</a></li> <li><a href="#item11">Replacements</a></li> </ul> <p>See <a id="recent-cs.DC" aria-labelledby="recent-cs.DC" href="/list/cs.DC/recent">recent</a> articles</p> <h3>Showing new listings for Friday, 22 November 2024</h3> <div class='paging'>Total of 19 entries </div> <div class='morefewer'>Showing up to 2000 entries per page: <a href=/list/cs.DC/new?skip=0&show=1000 rel="nofollow"> fewer</a> | <span style="color: #454545">more</span> | <span style="color: #454545">all</span> </div> <dl id='articles'> <h3>New submissions (showing 5 of 5 entries)</h3> <dt> <a name='item1'>[1]</a> <a href ="/abs/2411.13809" title="Abstract" id="2411.13809"> arXiv:2411.13809 </a> [<a href="/pdf/2411.13809" title="Download PDF" id="pdf-2411.13809" aria-labelledby="pdf-2411.13809">pdf</a>, <a href="/format/2411.13809" title="Other formats" id="oth-2411.13809" aria-labelledby="oth-2411.13809">other</a>] </dt> <dd> <div class='meta'> <div class='list-title mathjax'><span class='descriptor'>Title:</span> DCSim: Computing and Networking Integration based Container Scheduling Simulator for Data Centers </div> <div class='list-authors'><a href="https://arxiv.org/search/cs?searchtype=author&query=Hu,+J">Jinlong Hu</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Rao,+Z">Zhizhe Rao</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Liu,+X">Xingchen Liu</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Deng,+L">Lihao Deng</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Dong,+S">Shoubin Dong</a></div> <div class='list-subjects'><span class='descriptor'>Subjects:</span> <span class="primary-subject">Distributed, Parallel, and Cluster Computing (cs.DC)</span> </div> <p class='mathjax'> The increasing prevalence of cloud-native technologies, particularly containers, has led to the widespread adoption of containerized deployments in data centers. The advancement of deep neural network models has increased the demand for container-based distributed model training and inference, where frequent data transmission among nodes has emerged as a significant performance bottleneck. 
However, traditional container scheduling simulators often overlook the influence of network modeling on the efficiency of container scheduling, concentrating primarily on modeling computational resources. In this paper, we focus on container scheduling based on collaboration between computing and networking within data centers, and we propose a new container scheduling simulator for data centers, named DCSim. The simulator consists of several modules: a data center module, a network simulation module, a container scheduling module, a discrete event-driven module, and a data collection and analysis module. Together, these modules provide heterogeneous computing power modeling and dynamic network simulation capabilities. We design a discrete event model using SimPy to represent various aspects of container processing within data centers, including container requests, scheduling, execution, pauses, communication, migration, and termination. For network simulation, lightweight virtualization technology based on Mininet is employed to construct a software-defined network. An experimental environment for container scheduling simulation was established, and functional and performance tests were conducted on the simulator to validate its scheduling simulation capabilities.
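The abstract's discrete-event design can be pictured with a few lines of SimPy. The sketch below is illustrative only: the process names, delays, and two-slot scheduler are assumptions, not DCSim's actual components.

```python
# A minimal SimPy sketch of a container lifecycle (request -> schedule ->
# execute -> terminate); names and timings are hypothetical, not DCSim's.
import simpy

def container_lifecycle(env, name, scheduler, exec_time):
    print(f"{env.now:5.1f}  {name}: request submitted")
    with scheduler.request() as slot:       # wait for a free scheduling slot
        yield slot
        print(f"{env.now:5.1f}  {name}: scheduled on a node")
        yield env.timeout(exec_time)        # container executes
    print(f"{env.now:5.1f}  {name}: terminated")

env = simpy.Environment()
scheduler = simpy.Resource(env, capacity=2)  # e.g. two containers run at once
for i, t in enumerate([3.0, 5.0, 2.0]):
    env.process(container_lifecycle(env, f"container-{i}", scheduler, t))
env.run()
```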
[2] arXiv:2411.13861
Title: Asynchronous Federated Learning Using Outdated Local Updates Over TDMA Channel
Authors: Jaeyoung Song, Jun-Pyo Hong
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In this paper, we consider asynchronous federated learning (FL) over time-division multiple access (TDMA)-based communication networks. Using TDMA to transmit local updates can introduce significant delays into conventional synchronous FL, where all devices start local training from a common global model. In the proposed asynchronous FL approach, we partition devices into multiple TDMA groups, enabling simultaneous local computation and communication across different groups. This enhances time efficiency at the expense of staleness of local updates. We derive the relationship between the staleness of local updates and the size of the TDMA group in a training round. Moreover, our convergence analysis shows that although outdated local updates hinder appropriate global model updates, asynchronous FL over the TDMA channel converges even in the presence of data heterogeneity. Notably, the analysis identifies the impact of outdated local updates on the convergence rate. Based on these observations, we refine the asynchronous FL strategy by introducing an intentional delay in local training. This refinement accelerates convergence by reducing the staleness of local updates. Our extensive simulation results demonstrate that asynchronous FL with the intentional delay can rapidly reduce global loss by lowering the staleness of local updates in resource-limited wireless communication networks.
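To make the staleness trade-off concrete, here is a toy sketch of staleness-weighted aggregation across TDMA groups; the 1/(1 + staleness) weighting and round-robin slot order are assumptions for illustration, not the paper's update rule.

```python
# Toy staleness-aware aggregation over TDMA groups: each group trains on the
# global model it last received, so its update arrives stale by the number
# of rounds elapsed since then. The decay weight below is an assumed choice.
import numpy as np

rng = np.random.default_rng(0)
dim, num_groups = 4, 3
global_model = np.zeros(dim)
pending = [(rng.normal(size=dim), 0) for _ in range(num_groups)]  # (delta, round started)

for current_round in range(1, 7):
    delta, started = pending.pop(0)           # group whose TDMA slot is up
    staleness = current_round - started
    weight = 1.0 / (1.0 + staleness)          # downweight outdated updates
    global_model += weight * delta
    pending.append((rng.normal(size=dim), current_round))  # group retrains
    print(f"round {current_round}: staleness={staleness}, weight={weight:.2f}")
```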
[3] arXiv:2411.13979
Title: FedRAV: Hierarchically Federated Region-Learning for Traffic Object Classification of Autonomous Vehicles
Authors: Yijun Zhai, Pengzhan Zhou, Yuepeng He, Fang Qu, Zhida Qin, Xianlong Jiao, Guiyan Liu, Songtao Guo
Comments: 8 pages, 4 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

Emerging federated learning enables distributed autonomous vehicles to collaboratively train equipped deep learning models without exposing their raw data, providing great potential for utilizing explosively growing autonomous driving data. However, given the complicated traffic environments and driving scenarios, deploying federated learning for autonomous vehicles is inevitably challenged by the non-independent and identically distributed (Non-IID) data of vehicles, which may lead to failed convergence and low training accuracy. In this paper, we propose FedRAV, a novel two-stage hierarchically Federated Region-learning framework for Autonomous Vehicles, which adaptively divides a large area containing vehicles into sub-regions based on a defined region-wise distance, and achieves personalized vehicular models and regional models. This approach ensures that each personalized vehicular model adopts beneficial models while discarding unprofitable ones. We validate our FedRAV framework against existing federated learning algorithms on three real-world autonomous driving datasets in various heterogeneous settings. The experimental results demonstrate that our framework outperforms those known algorithms and improves accuracy by at least 3.69%. The source code of FedRAV is available at https://github.com/yjzhai-cs/FedRAV.
</p> </div> </dd> <dt> <a name='item5'>[5]</a> <a href ="/abs/2411.14420" title="Abstract" id="2411.14420"> arXiv:2411.14420 </a> [<a href="/pdf/2411.14420" title="Download PDF" id="pdf-2411.14420" aria-labelledby="pdf-2411.14420">pdf</a>, <a href="/format/2411.14420" title="Other formats" id="oth-2411.14420" aria-labelledby="oth-2411.14420">other</a>] </dt> <dd> <div class='meta'> <div class='list-title mathjax'><span class='descriptor'>Title:</span> Aggregating Funnels for Faster Fetch&Add and Queues </div> <div class='list-authors'><a href="https://arxiv.org/search/cs?searchtype=author&query=Roh,+Y">Younghun Roh</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Wei,+Y">Yuanhao Wei</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Ruppert,+E">Eric Ruppert</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Fatourou,+P">Panagiota Fatourou</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Jayanti,+S">Siddhartha Jayanti</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Shun,+J">Julian Shun</a></div> <div class='list-comments mathjax'><span class='descriptor'>Comments:</span> This is the full version of the paper appearing in PPoPP 2025 </div> <div class='list-subjects'><span class='descriptor'>Subjects:</span> <span class="primary-subject">Distributed, Parallel, and Cluster Computing (cs.DC)</span> </div> <p class='mathjax'> Many concurrent algorithms require processes to perform fetch-and-add operations on a single memory location, which can be a hot spot of contention. We present a novel algorithm called Aggregating Funnels that reduces this contention by spreading the fetch-and-add operations across multiple memory locations. It aggregates fetch-and-add operations into batches so that the batch can be performed by a single hardware fetch-and-add instruction on one location and all operations in the batch can efficiently compute their results by performing a fetch-and-add instruction on a different location. We show experimentally that this approach achieves higher throughput than previous combining techniques, such as Combining Funnels, and is substantially more scalable than applying hardware fetch-and-add instructions on a single memory location. We show that replacing the fetch-and-add instructions in the fastest state-of-the-art concurrent queue by our Aggregating Funnels eliminates a bottleneck and greatly improves the queue's overall throughput. 
</p> </div> </dd> </dl> <dl id='articles'> <h3>Cross submissions (showing 5 of 5 entries)</h3> <dt> <a name='item6'>[6]</a> <a href ="/abs/2411.12694" title="Abstract" id="2411.12694"> arXiv:2411.12694 </a> (cross-list from cs.DS) [<a href="/pdf/2411.12694" title="Download PDF" id="pdf-2411.12694" aria-labelledby="pdf-2411.12694">pdf</a>, <a href="https://arxiv.org/html/2411.12694v2" title="View HTML" id="html-2411.12694" aria-labelledby="html-2411.12694" rel="noopener noreferrer" target="_blank">html</a>, <a href="/format/2411.12694" title="Other formats" id="oth-2411.12694" aria-labelledby="oth-2411.12694">other</a>] </dt> <dd> <div class='meta'> <div class='list-title mathjax'><span class='descriptor'>Title:</span> Local Density and its Distributed Approximation </div> <div class='list-authors'><a href="https://arxiv.org/search/cs?searchtype=author&query=Christiansen,+A+B">Aleksander Bj酶rn Christiansen</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=van+der+Hoog,+I">Ivor van der Hoog</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Rotenberg,+E">Eva Rotenberg</a></div> <div class='list-subjects'><span class='descriptor'>Subjects:</span> <span class="primary-subject">Data Structures and Algorithms (cs.DS)</span>; Distributed, Parallel, and Cluster Computing (cs.DC) </div> <p class='mathjax'> The densest subgraph problem is a classic problem in combinatorial optimisation. Danisch, Chan, and Sozio propose a definition for \emph{local density} that assigns to each vertex $v$ a value $\rho^*(v)$. This local density is a generalisation of the maximum subgraph density of a graph. I.e., if $\rho(G)$ is the subgraph density of a finite graph $G$, then $\rho(G)$ equals the maximum local density $\rho^*(v)$ over vertices $v$ in $G$. They approximate the local density of each vertex with no theoretical (asymptotic) guarantees. <br>We provide an extensive study of this local density measure. Just as with (global) maximum subgraph density, we show that there is a dual relation between the local out-degrees and the minimum out-degree orientations of the graph. We introduce the definition of the local out-degree $g^*(v)$ of a vertex $v$, and show it to be equal to the local density $\rho^*(v)$. We consider the local out-degree to be conceptually simpler, shorter to define, and easier to compute. <br>Using the local out-degree we show a previously unknown fact: that existing algorithms already dynamically approximate the local density. Next, we provide the first distributed algorithms that compute the local density with provable guarantees: given any $\varepsilon$ such that $\varepsilon^{-1} \in O(poly \, n)$, we show a deterministic distributed algorithm in the LOCAL model where, after $O(\varepsilon^{-2} \log^2 n)$ rounds, every vertex $v$ outputs a $(1 + \varepsilon)$-approximation of their local density $\rho^*(v)$. In CONGEST, we show a deterministic distributed algorithm that requires $\text{poly}(\log n,\varepsilon^{-1}) \cdot 2^{O(\sqrt{\log n})}$ rounds, which is sublinear in $n$. <br>As a corollary, we obtain the first deterministic algorithm running in a sublinear number of rounds for $(1+\varepsilon)$-approximate densest subgraph detection in the CONGEST model. 
</p> </div> </dd> <dt> <a name='item7'>[7]</a> <a href ="/abs/2411.13583" title="Abstract" id="2411.13583"> arXiv:2411.13583 </a> (cross-list from cs.CR) [<a href="/pdf/2411.13583" title="Download PDF" id="pdf-2411.13583" aria-labelledby="pdf-2411.13583">pdf</a>, <a href="https://arxiv.org/html/2411.13583v1" title="View HTML" id="html-2411.13583" aria-labelledby="html-2411.13583" rel="noopener noreferrer" target="_blank">html</a>, <a href="/format/2411.13583" title="Other formats" id="oth-2411.13583" aria-labelledby="oth-2411.13583">other</a>] </dt> <dd> <div class='meta'> <div class='list-title mathjax'><span class='descriptor'>Title:</span> Enhanced FIWARE-Based Architecture for Cyberphysical Systems With Tiny Machine Learning and Machine Learning Operations: A Case Study on Urban Mobility Systems </div> <div class='list-authors'><a href="https://arxiv.org/search/cs?searchtype=author&query=Conde,+J">Javier Conde</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Munoz-Arcentales,+A">Andr茅s Munoz-Arcentales</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Alonso,+%C3%81">脕lvaro Alonso</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Salvach%C3%BAa,+J">Joaqu铆n Salvach煤a</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Huecas,+G">Gabriel Huecas</a></div> <div class='list-journal-ref'><span class='descriptor'>Journal-ref:</span> IT Professional ( Volume: 26, Issue: 5, Sept.-Oct. 2024) </div> <div class='list-subjects'><span class='descriptor'>Subjects:</span> <span class="primary-subject">Cryptography and Security (cs.CR)</span>; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI) </div> <p class='mathjax'> The rise of AI and the Internet of Things is accelerating the digital transformation of society. Mobility computing presents specific barriers due to its real-time requirements, decentralization, and connectivity through wireless networks. New research on edge computing and tiny machine learning (tinyML) explores the execution of AI models on low-performance devices to address these issues. However, there are not many studies proposing agnostic architectures that manage the entire lifecycle of intelligent cyberphysical systems. This article extends a previous architecture based on FIWARE software components to implement the machine learning operations flow, enabling the management of the entire tinyML lifecycle in cyberphysical systems. We also provide a use case to showcase how to implement the FIWARE architecture through a complete example of a smart traffic system. We conclude that the FIWARE ecosystem constitutes a real reference option for developing tinyML and edge computing in cyberphysical systems. 
</p> </div> </dd> <dt> <a name='item8'>[8]</a> <a href ="/abs/2411.13740" title="Abstract" id="2411.13740"> arXiv:2411.13740 </a> (cross-list from cs.LG) [<a href="/pdf/2411.13740" title="Download PDF" id="pdf-2411.13740" aria-labelledby="pdf-2411.13740">pdf</a>, <a href="https://arxiv.org/html/2411.13740v1" title="View HTML" id="html-2411.13740" aria-labelledby="html-2411.13740" rel="noopener noreferrer" target="_blank">html</a>, <a href="/format/2411.13740" title="Other formats" id="oth-2411.13740" aria-labelledby="oth-2411.13740">other</a>] </dt> <dd> <div class='meta'> <div class='list-title mathjax'><span class='descriptor'>Title:</span> Federated Continual Learning for Edge-AI: A Comprehensive Survey </div> <div class='list-authors'><a href="https://arxiv.org/search/cs?searchtype=author&query=Wang,+Z">Zi Wang</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Wu,+F">Fei Wu</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Yu,+F">Feng Yu</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Zhou,+Y">Yurui Zhou</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Hu,+J">Jia Hu</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Min,+G">Geyong Min</a></div> <div class='list-subjects'><span class='descriptor'>Subjects:</span> <span class="primary-subject">Machine Learning (cs.LG)</span>; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI) </div> <p class='mathjax'> Edge-AI, the convergence of edge computing and artificial intelligence (AI), has become a promising paradigm that enables the deployment of advanced AI models at the network edge, close to users. In Edge-AI, federated continual learning (FCL) has emerged as an imperative framework, which fuses knowledge from different clients while preserving data privacy and retaining knowledge from previous tasks as it learns new ones. By so doing, FCL aims to ensure stable and reliable performance of learning models in dynamic and distributed environments. In this survey, we thoroughly review the state-of-the-art research and present the first comprehensive survey of FCL for Edge-AI. We categorize FCL methods based on three task characteristics: federated class continual learning, federated domain continual learning, and federated task continual learning. For each category, an in-depth investigation and review of the representative methods are provided, covering background, challenges, problem formalisation, solutions, and limitations. Besides, existing real-world applications empowered by FCL are reviewed, indicating the current progress and potential of FCL in diverse application domains. Furthermore, we discuss and highlight several prospective research directions of FCL such as algorithm-hardware co-design for FCL and FCL with foundation models, which could provide insights into the future development and practical deployment of FCL in the era of Edge-AI. 
</p> </div> </dd> <dt> <a name='item9'>[9]</a> <a href ="/abs/2411.13820" title="Abstract" id="2411.13820"> arXiv:2411.13820 </a> (cross-list from cs.CL) [<a href="/pdf/2411.13820" title="Download PDF" id="pdf-2411.13820" aria-labelledby="pdf-2411.13820">pdf</a>, <a href="https://arxiv.org/html/2411.13820v1" title="View HTML" id="html-2411.13820" aria-labelledby="html-2411.13820" rel="noopener noreferrer" target="_blank">html</a>, <a href="/format/2411.13820" title="Other formats" id="oth-2411.13820" aria-labelledby="oth-2411.13820">other</a>] </dt> <dd> <div class='meta'> <div class='list-title mathjax'><span class='descriptor'>Title:</span> InstCache: A Predictive Cache for LLM Serving </div> <div class='list-authors'><a href="https://arxiv.org/search/cs?searchtype=author&query=Zou,+L">Longwei Zou</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Liu,+T">Tingfeng Liu</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Chen,+K">Kai Chen</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Kong,+J">Jiangang Kong</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Deng,+Y">Yangdong Deng</a></div> <div class='list-subjects'><span class='descriptor'>Subjects:</span> <span class="primary-subject">Computation and Language (cs.CL)</span>; Distributed, Parallel, and Cluster Computing (cs.DC) </div> <p class='mathjax'> Large language models are revolutionizing every aspect of human life. However, the unprecedented power comes at the cost of significant computing intensity, suggesting long latency and large energy footprint. Key-Value Cache and Semantic Cache have been proposed as a solution to the above problem, but both suffer from limited scalability due to significant memory cost for each token or instruction embeddings. Motivated by the observations that most instructions are short, repetitive and predictable by LLMs, we propose to predict user-instructions by an instruction-aligned LLM and store them in a predictive cache, so-called InstCache. We introduce an instruction pre-population algorithm based on the negative log likelihood of instructions, determining the cache size with regard to the hit rate. The proposed InstCache is efficiently implemented as a hash table with minimal lookup latency for deployment. Experimental results show that InstCache can achieve up to 51.34% hit rate on LMSys dataset, which corresponds to a 2x speedup, at a memory cost of only 4.5GB. 
</p> </div> </dd> <dt> <a name='item10'>[10]</a> <a href ="/abs/2411.14006" title="Abstract" id="2411.14006"> arXiv:2411.14006 </a> (cross-list from cs.DS) [<a href="/pdf/2411.14006" title="Download PDF" id="pdf-2411.14006" aria-labelledby="pdf-2411.14006">pdf</a>, <a href="https://arxiv.org/html/2411.14006v1" title="View HTML" id="html-2411.14006" aria-labelledby="html-2411.14006" rel="noopener noreferrer" target="_blank">html</a>, <a href="/format/2411.14006" title="Other formats" id="oth-2411.14006" aria-labelledby="oth-2411.14006">other</a>] </dt> <dd> <div class='meta'> <div class='list-title mathjax'><span class='descriptor'>Title:</span> Experimental comparison of graph-based approximate nearest neighbor search algorithms on edge devices </div> <div class='list-authors'><a href="https://arxiv.org/search/cs?searchtype=author&query=Ganbarov,+A">Ali Ganbarov</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Yuan,+J">Jicheng Yuan</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Le-Tuan,+A">Anh Le-Tuan</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Hauswirth,+M">Manfred Hauswirth</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Le-Phuoc,+D">Danh Le-Phuoc</a></div> <div class='list-subjects'><span class='descriptor'>Subjects:</span> <span class="primary-subject">Data Structures and Algorithms (cs.DS)</span>; Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF) </div> <p class='mathjax'> In this paper, we present an experimental comparison of various graph-based approximate nearest neighbor (ANN) search algorithms deployed on edge devices for real-time nearest neighbor search applications, such as smart city infrastructure and autonomous vehicles. To the best of our knowledge, this specific comparative analysis has not been previously conducted. While existing research has explored graph-based ANN algorithms, it has often been limited to single-threaded implementations on standard commodity hardware. Our study leverages the full computational and storage capabilities of edge devices, incorporating additional metrics such as insertion and deletion latency of new vectors and power consumption. This comprehensive evaluation aims to provide valuable insights into the performance and suitability of these algorithms for edge-based real-time tracking systems enhanced by nearest-neighbor search algorithms. 
</p> </div> </dd> </dl> <dl id='articles'> <h3>Replacement submissions (showing 9 of 9 entries)</h3> <dt> <a name='item11'>[11]</a> <a href ="/abs/2105.04086" title="Abstract" id="2105.04086"> arXiv:2105.04086 </a> (replaced) [<a href="/pdf/2105.04086" title="Download PDF" id="pdf-2105.04086" aria-labelledby="pdf-2105.04086">pdf</a>, <a href="https://arxiv.org/html/2105.04086v2" title="View HTML" id="html-2105.04086" aria-labelledby="html-2105.04086" rel="noopener noreferrer" target="_blank">html</a>, <a href="/format/2105.04086" title="Other formats" id="oth-2105.04086" aria-labelledby="oth-2105.04086">other</a>] </dt> <dd> <div class='meta'> <div class='list-title mathjax'><span class='descriptor'>Title:</span> Deep Reinforcement Learning-based Methods for Resource Scheduling in Cloud Computing: A Review and Future Directions </div> <div class='list-authors'><a href="https://arxiv.org/search/cs?searchtype=author&query=Zhou,+G">Guangyao Zhou</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Tian,+W">Wenhong Tian</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Buyya,+R">Rajkumar Buyya</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Xue,+R">Ruini Xue</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Song,+L">Liang Song</a></div> <div class='list-comments mathjax'><span class='descriptor'>Comments:</span> 22 pages,14 figures </div> <div class='list-journal-ref'><span class='descriptor'>Journal-ref:</span> Artif. Intell. Rev. 57 (5) (2024) 124 </div> <div class='list-subjects'><span class='descriptor'>Subjects:</span> <span class="primary-subject">Distributed, Parallel, and Cluster Computing (cs.DC)</span> </div> <p class='mathjax'> As the quantity and complexity of information processed by software systems increase, large-scale software systems have an increasing requirement for high-performance distributed computing systems. With the acceleration of the Internet in Web 2.0, Cloud computing as a paradigm to provide dynamic, uncertain and elastic services has shown superiorities to meet the computing needs dynamically. Without an appropriate scheduling approach, extensive Cloud computing may cause high energy consumptions and high cost, in addition that high energy consumption will cause massive carbon dioxide emissions. Moreover, inappropriate scheduling will reduce the service life of physical devices as well as increase response time to users' request. Hence, efficient scheduling of resource or optimal allocation of request, that usually a NP-hard problem, is one of the prominent issues in emerging trends of Cloud computing. Focusing on improving quality of service (QoS), reducing cost and abating contamination, researchers have conducted extensive work on resource scheduling problems of Cloud computing over years. Nevertheless, growing complexity of Cloud computing, that the super-massive distributed system, is limiting the application of scheduling approaches. Machine learning, a utility method to tackle problems in complex scenes, is used to resolve the resource scheduling of Cloud computing as an innovative idea in recent years. Deep reinforcement learning (DRL), a combination of deep learning (DL) and reinforcement learning (RL), is one branch of the machine learning and has a considerable prospect in resource scheduling of Cloud computing. 
This paper surveys resource scheduling methods with a focus on DRL-based scheduling approaches in Cloud computing, reviews applications of DRL, and discusses challenges and future directions of DRL in the scheduling of Cloud computing.

[12] arXiv:2406.19430 (replaced)
Title: Invitation to Local Algorithms
Authors: Václav Rozhoň
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)

This text provides an introduction to distributed local algorithms -- an area at the intersection of theoretical computer science and discrete mathematics. We collect recent results in the area and demonstrate how they lead to a clean theory. We also discuss many connections of local algorithms to fields such as parallel, distributed, and sublinear algorithms, or descriptive combinatorics.

[13] arXiv:2409.04022 (replaced)
Title: Heterogeneity-Aware Cooperative Federated Edge Learning with Adaptive Computation and Communication Compression
Authors: Zhenxiao Zhang, Zhidong Gao, Yuanxiong Guo, Yanmin Gong
Comments: 20 pages, 8 figures, accepted by IEEE Transactions on Mobile Computing
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Motivated by the drawbacks of cloud-based federated learning (FL), cooperative federated edge learning (CFEL) has been proposed to improve the efficiency of FL over mobile edge networks, where multiple edge servers collaboratively coordinate distributed model training across a large number of edge devices. However, CFEL faces critical challenges arising from dynamic and heterogeneous device properties, which slow down convergence and increase resource consumption.
This paper proposes a heterogeneity-aware CFEL scheme called Heterogeneity-Aware Cooperative Edge-based Federated Averaging (HCEF), which aims to maximize model accuracy while minimizing training time and energy consumption via adaptive computation and communication compression in CFEL. By theoretically analyzing how the local update frequency and gradient compression affect the convergence error bound in CFEL, we develop an efficient online control algorithm for HCEF that dynamically determines local update frequencies and compression ratios for heterogeneous devices. Experimental results show that, compared with prior schemes, the proposed HCEF scheme maintains higher model accuracy while reducing training latency and improving energy efficiency simultaneously.
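The two control knobs, per-device local update frequency and compression ratio, can be pictured with a top-k sparsifier. The adaptation heuristics below are assumptions for illustration, not HCEF's online control algorithm.

```python
# Illustrative per-device adaptation: faster devices run more local steps;
# updates are top-k sparsified before upload. All heuristics are assumed.
import numpy as np

def top_k_compress(update, ratio):
    """Keep only the largest-magnitude entries; zero the rest."""
    k = max(1, int(ratio * update.size))
    out = np.zeros_like(update)
    idx = np.argpartition(np.abs(update), -k)[-k:]
    out[idx] = update[idx]
    return out

rng = np.random.default_rng(1)
devices = [{"speed": s, "bandwidth": b}
           for s, b in [(1.0, 1.0), (0.5, 0.3), (0.25, 0.1)]]
updates = []
for d in devices:
    d["local_steps"] = max(1, int(8 * d["speed"]))   # compute-aware steps
    d["ratio"] = 0.05 + 0.25 * d["bandwidth"]        # link-aware compression
    grad = rng.normal(size=32)    # stands in for d["local_steps"] SGD steps
    updates.append(top_k_compress(grad, d["ratio"]))
global_update = np.mean(updates, axis=0)             # server-side averaging
print([(d["local_steps"], round(d["ratio"], 2)) for d in devices])
```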
[14] arXiv:2409.13503 (replaced)
Title: SatFed: A Resource-Efficient LEO Satellite-Assisted Heterogeneous Federated Learning Framework
Authors: Yuxin Zhang, Zheng Lin, Zhe Chen, Zihan Fang, Wenjun Zhu, Xianhao Chen, Jin Zhao, Yue Gao
Comments: 10 pages, 12 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Traditional federated learning (FL) frameworks rely heavily on terrestrial networks, where coverage limitations and increasing bandwidth congestion significantly hinder model convergence. Fortunately, the advancement of low-Earth orbit (LEO) satellite networks offers promising new communication avenues to augment traditional terrestrial FL. Despite this potential, the limited satellite-ground communication bandwidth and the heterogeneous operating environments of ground devices, including variations in data, bandwidth, and computing power, pose substantial challenges for effective and robust satellite-assisted FL. To address these challenges, we propose SatFed, a resource-efficient satellite-assisted heterogeneous FL framework. SatFed implements freshness-based model prioritization queues to optimize the use of highly constrained satellite-ground bandwidth, ensuring the transmission of the most critical models. Additionally, a multigraph is constructed to capture real-time heterogeneous relationships between devices, including data distribution, terrestrial bandwidth, and computing capability. This multigraph enables SatFed to aggregate satellite-transmitted models into peer guidance, enhancing local training in heterogeneous environments. Extensive experiments with real-world LEO satellite networks demonstrate that SatFed achieves superior performance and robustness compared to state-of-the-art benchmarks.
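A freshness-based prioritization queue can be sketched with a heap that pops the most recently produced model first; the freshness score here is simply negative age, whereas SatFed's actual priority may combine more signals.

```python
# Toy freshness-based transmission queue: under scarce satellite-ground
# bandwidth, send the freshest pending models first. Scoring is assumed.
import heapq, itertools

class FreshnessQueue:
    def __init__(self):
        self._heap, self._tie = [], itertools.count()

    def push(self, model_id, produced_at):
        # Max-heap on production time: newer models pop first.
        heapq.heappush(self._heap, (-produced_at, next(self._tie), model_id))

    def pop(self):
        return heapq.heappop(self._heap)[2]

q = FreshnessQueue()
q.push("device-3", produced_at=10)
q.push("device-7", produced_at=42)
q.push("device-1", produced_at=25)
print(q.pop(), q.pop(), q.pop())   # device-7 device-1 device-3
```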
[15] arXiv:2411.10003 (replaced)
Title: Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-scale MoE Models
Authors: Wei Wang, Zhiquan Lai, Shengwei Li, Weijie Liu, Keshi Ge, Ao Shen, Huayou Su, Dongsheng Li
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The size of deep learning models has been increasing to enhance model quality. The linear increase in training computation budget with model size means that training an extremely large-scale model is exceedingly time-consuming. Recently, the Mixture of Experts (MoE) approach has drawn significant attention, as it can scale models to extra-large sizes with a stable computation budget. However, inefficient distributed training of large-scale MoE models hinders their broader application. Specifically, considerable dynamic load imbalance occurs among devices during training, significantly reducing throughput. Several load-balancing works have been proposed to address this challenge. System-level solutions draw more attention for their hardware affinity and non-disruption of model convergence compared to algorithm-level ones. However, they are troubled by high communication costs and poor overlapping of communication and computation. To address these challenges, we propose a systematic load-balancing method, Pro-Prophet, which consists of a planner and a scheduler for efficient parallel training of large-scale MoE models. To adapt to dynamic load imbalance, we profile training statistics and use them to design Pro-Prophet. For lower communication volume, the Pro-Prophet planner determines a series of lightweight load-balancing strategies and efficiently searches for a communication-efficient one based on the statistics. For sufficient overlapping of communication and computation, the Pro-Prophet scheduler schedules data-dependent operations based on the statistics and operation features, further improving training throughput. Experimental results indicate that Pro-Prophet achieves up to a 2.66x speedup compared to DeepSpeed-MoE and FasterMoE. Additionally, Pro-Prophet achieves a load-balancing enhancement of up to 11.01x compared to FasterMoE.
[16] arXiv:2405.20988 (replaced)
Title: Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging
Authors: Michail Theologitis, Georgios Frangias, Georgios Anestis, Vasilis Samoladas, Antonios Deligiannakis
Comments: Accepted as research paper at EDBT 2025
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

The ever-growing volume and decentralized nature of data, coupled with the need to harness it and extract knowledge, have led to the extensive use of distributed deep learning (DDL) techniques for training. These techniques rely on local training performed at distributed nodes using locally collected data, followed by a periodic synchronization process that combines these models to create a unified global model. However, the frequent synchronization of deep learning models, encompassing millions to many billions of parameters, creates a communication bottleneck that severely hinders scalability. Worse yet, DDL algorithms typically waste valuable bandwidth and render themselves less practical in bandwidth-constrained federated settings by relying on overly simplistic, periodic, and rigid synchronization schedules. These inefficiencies make the training process increasingly impractical, as they demand excessive time for data communication. To address these shortcomings, we propose Federated Dynamic Averaging (FDA), a communication-efficient DDL strategy that dynamically triggers synchronization based on the value of the model variance. In essence, the costly synchronization step is triggered only if the local models, initialized from a common global model after each synchronization, have significantly diverged. This decision is facilitated by the transmission of a small local state from each distributed node. Through extensive experiments across a wide range of learning tasks, we demonstrate that FDA reduces communication cost by orders of magnitude compared to both traditional and cutting-edge communication-efficient algorithms. Additionally, we show that FDA maintains robust performance across diverse data heterogeneity settings.
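The trigger itself is simple to picture: estimate the variance of the local models around their mean and synchronize only when it crosses a threshold. A toy sketch follows; the threshold and the direct variance computation are simplifications, since FDA estimates this quantity from small transmitted states.

```python
# Toy variance-triggered synchronization check for FDA-style training.
import numpy as np

def should_sync(local_models, threshold):
    stacked = np.stack(local_models)
    variance = np.mean(np.sum((stacked - stacked.mean(axis=0)) ** 2, axis=1))
    return variance > threshold

models = [np.zeros(4) + 0.01 * i for i in range(8)]  # barely diverged
print(should_sync(models, threshold=0.1))            # False -> skip the sync
models[0] += 5.0                                     # one node drifted far
print(should_sync(models, threshold=0.1))            # True  -> synchronize
```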
[17] arXiv:2406.10916 (replaced)
Title: M-SET: Multi-Drone Swarm Intelligence Experimentation with Collision Avoidance Realism
Authors: Chuhao Qin, Alexander Robins, Callum Lillywhite-Roake, Adam Pearce, Hritik Mehta, Scott James, Tsz Ho Wong, Evangelos Pournaras
Comments: 7 pages, 7 figures. This work has been accepted by the 2024 IEEE 49th Conference on Local Computer Networks (LCN)
Subjects: Robotics (cs.RO); Distributed, Parallel, and Cluster Computing (cs.DC)

Distributed sensing by cooperative drone swarms is crucial for several Smart City applications, such as traffic monitoring and disaster response. Using an indoor lab with inexpensive drones, a testbed can support complex and ambitious studies on these systems while maintaining low cost, rigor, and external validity. This paper introduces the Multi-drone Sensing Experimentation Testbed (M-SET), a novel platform designed to prototype, develop, test, and evaluate distributed sensing with swarm intelligence. M-SET addresses the limitations of existing testbeds that fail to emulate collisions and thus lack realism relative to outdoor environments. By integrating a collision avoidance method based on a potential field algorithm, M-SET ensures collision-free navigation and sensing, further optimized via a multi-agent collective learning algorithm. Extensive evaluation demonstrates accurate energy consumption estimation and a low risk of collisions, providing a robust proof of concept. New insights show that M-SET has significant potential to support ambitious research with minimal cost, simplicity, and high sensing quality.
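The collision avoidance idea, a repulsive potential field, is standard enough to sketch: each drone feels a force pushing it away from neighbours inside a safety radius. The gains and radius below are illustrative, not M-SET's tuned values.

```python
# Classic repulsive potential-field force for collision avoidance; a drone
# is pushed away from any neighbour closer than safety_radius.
import numpy as np

def repulsion(pos, neighbours, safety_radius=1.0, gain=0.5):
    force = np.zeros_like(pos)
    for n in neighbours:
        diff = pos - n
        dist = np.linalg.norm(diff)
        if 0 < dist < safety_radius:
            # Magnitude grows sharply as the pair gets closer.
            force += gain * (1.0 / dist - 1.0 / safety_radius) * diff / dist**2
    return force

drone = np.array([0.0, 0.0])
others = [np.array([0.4, 0.0]), np.array([0.0, 3.0])]  # second one is far away
print(repulsion(drone, others))  # pushes along -x only: [-1.875  0.   ]
```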
</div> </dd> <dt> <a name='item18'>[18]</a> <a href="/abs/2410.09747" title="Abstract" id="2410.09747"> arXiv:2410.09747 </a> (replaced) [<a href="/pdf/2410.09747" title="Download PDF" id="pdf-2410.09747" aria-labelledby="pdf-2410.09747">pdf</a>, <a href="https://arxiv.org/html/2410.09747v3" title="View HTML" id="html-2410.09747" aria-labelledby="html-2410.09747" rel="noopener noreferrer" target="_blank">html</a>, <a href="/format/2410.09747" title="Other formats" id="oth-2410.09747" aria-labelledby="oth-2410.09747">other</a>] </dt> <dd> <div class='meta'> <div class='list-title mathjax'><span class='descriptor'>Title:</span> t-READi: Transformer-Powered Robust and Efficient Multimodal Inference for Autonomous Driving </div> <div class='list-authors'><a href="https://arxiv.org/search/cs?searchtype=author&query=Hu,+P">Pengfei Hu</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Qian,+Y">Yuhang Qian</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Zheng,+T">Tianyue Zheng</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Li,+A">Ang Li</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Chen,+Z">Zhe Chen</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Gao,+Y">Yue Gao</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Cheng,+X">Xiuzhen Cheng</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Luo,+J">Jun Luo</a></div> <div class='list-comments mathjax'><span class='descriptor'>Comments:</span> 14 pages, 16 figures </div> <div class='list-subjects'><span class='descriptor'>Subjects:</span> <span class="primary-subject">Computer Vision and Pattern Recognition (cs.CV)</span>; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Robotics (cs.RO) </div> <p class='mathjax'> Given the wide adoption of multimodal sensors (e.g., camera, lidar, radar) by autonomous vehicles (AVs), deep analytics that fuse their outputs for robust perception have become imperative. However, existing fusion methods often make two assumptions that rarely hold in practice: i) similar data distributions for all inputs and ii) constant availability of all sensors. Because, for example, lidars vary in resolution and radars may fail, such variability often causes significant performance degradation in fusion. To this end, we present t-READi, an adaptive inference system that accommodates the variability of multimodal sensory data and thus enables robust and efficient perception. t-READi identifies variation-sensitive yet structure-specific model parameters; it then adapts only these parameters while keeping the rest intact. t-READi also leverages a cross-modality contrastive learning method to compensate for the loss from missing modalities. Both functions are implemented to maintain compatibility with existing multimodal deep fusion methods. Extensive experiments demonstrate that, compared with status quo approaches, t-READi not only improves average inference accuracy by more than 6% but also reduces inference latency by almost 15x, at the cost of only 5% extra memory overhead in the worst case, under realistic data and modal variations. </p>
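<p>The abstract's core mechanism -- adapt only the variation-sensitive parameters and freeze the rest -- can be sketched in a few lines of PyTorch. Which parameters count as "variation-sensitive" is t-READi's contribution and is not given in the abstract, so the name filter below is a placeholder assumption, not the paper's selection rule.</p> <pre>
import torch

def mark_adaptable(model, sensitive_substrings):
    """Freeze every parameter except those whose names match the
    (hypothetical) variation-sensitive filter; return the trainable set."""
    for name, p in model.named_parameters():
        p.requires_grad = any(s in name for s in sensitive_substrings)
    return [p for p in model.parameters() if p.requires_grad]

# Hypothetical usage: adapt only normalization layers of a fusion model,
# leaving the structure-specific backbone weights intact.
# adapt_params = mark_adaptable(fusion_model, ["norm"])
# optimizer = torch.optim.Adam(adapt_params, lr=1e-4)
</pre>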
</div> </dd> <dt> <a name='item19'>[19]</a> <a href="/abs/2411.03832" title="Abstract" id="2411.03832"> arXiv:2411.03832 </a> (replaced) [<a href="/pdf/2411.03832" title="Download PDF" id="pdf-2411.03832" aria-labelledby="pdf-2411.03832">pdf</a>, <a href="https://arxiv.org/html/2411.03832v2" title="View HTML" id="html-2411.03832" aria-labelledby="html-2411.03832" rel="noopener noreferrer" target="_blank">html</a>, <a href="/format/2411.03832" title="Other formats" id="oth-2411.03832" aria-labelledby="oth-2411.03832">other</a>] </dt> <dd> <div class='meta'> <div class='list-title mathjax'><span class='descriptor'>Title:</span> Accelerating DNA Read Mapping with Digital Processing-in-Memory </div> <div class='list-authors'><a href="https://arxiv.org/search/cs?searchtype=author&query=Ben-Hur,+R">Rotem Ben-Hur</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Leitersdorf,+O">Orian Leitersdorf</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Ronen,+R">Ronny Ronen</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Goldshmidt,+L">Lidor Goldshmidt</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Magram,+I">Idan Magram</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Kaplun,+L">Lior Kaplun</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Yavitz,+L">Leonid Yavitz</a>, <a href="https://arxiv.org/search/cs?searchtype=author&query=Kvatinsky,+S">Shahar Kvatinsky</a></div> <div class='list-subjects'><span class='descriptor'>Subjects:</span> <span class="primary-subject">Hardware Architecture (cs.AR)</span>; Distributed, Parallel, and Cluster Computing (cs.DC); Quantitative Methods (q-bio.QM) </div> <p class='mathjax'> Genome analysis has revolutionized fields such as personalized medicine and forensics. Modern sequencing machines generate vast amounts of fragmented strings of genome data called reads. Aligning these reads into the complete DNA sequence of an organism (the read-mapping process) requires extensive data transfer between processing units and memory, leading to execution bottlenecks. Prior studies have primarily focused on accelerating specific stages of the read-mapping task. In contrast, this paper introduces a holistic framework called DART-PIM that accelerates the entire read-mapping process. DART-PIM uses digital processing-in-memory (PIM) for end-to-end acceleration, from indexing with a unique data organization scheme to filtering and read alignment with an optimized Wagner-Fischer algorithm. A comprehensive performance evaluation with real genomic data shows that DART-PIM achieves 5.7x and 257x improvements in throughput and 92x and 27x improvements in energy efficiency compared with state-of-the-art GPU and PIM implementations, respectively. </p>
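<p>Wagner-Fischer is the classical dynamic program for edit distance on which the alignment stage builds. For readers unfamiliar with it, the textbook CPU formulation follows; DART-PIM's contribution is an optimized in-memory variant of this recurrence, which the sketch does not attempt to capture.</p> <pre>
def wagner_fischer(read, reference):
    """Textbook Wagner-Fischer edit distance with a rolling row,
    so memory is O(len(reference)) instead of a full DP matrix."""
    m, n = len(read), len(reference)
    prev = list(range(n + 1))             # distances against the empty prefix
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if read[i - 1] == reference[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution / match
        prev = curr
    return prev[n]

# Example: one deletion separates these strings.
# wagner_fischer("GATTACA", "GATACA") == 1
</pre>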
</div> </dd> </dl> <div class='paging'>Total of 19 entries</div> </div> </div> </div> </main> </div> </body> </html>