<li><span class="pagination-ellipsis">&hellip;</span></li> </ul> </nav> <ol class="breathe-horizontal" start="1"> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.13407</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Transformer-Based Contextualized Language Models Joint with Neural Networks for Natural Language Inference in Vietnamese </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+D+V">Dat Van-Thanh Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Van+Huynh%2C+T">Tin Van Huynh</a>, <a href="/search/cs?searchtype=author&amp;query=Van+Nguyen%2C+K">Kiet Van Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+N+L">Ngan Luu-Thuy Nguyen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.13407v2-abstract-short" style="display: inline;"> Natural Language Inference (NLI) is a task within Natural Language Processing (NLP) that holds value for various AI applications. However, there have been limited studies on Natural Language Inference in Vietnamese that explore the concept of joint models. Therefore, we conducted experiments using various combinations of contextualized language models (CLM) and neural networks. 
We use CLM to creat&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.13407v2-abstract-full').style.display = 'inline'; document.getElementById('2411.13407v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.13407v2-abstract-full" style="display: none;"> Natural Language Inference (NLI) is a task within Natural Language Processing (NLP) that holds value for various AI applications. However, there have been limited studies on Natural Language Inference in Vietnamese that explore the concept of joint models. Therefore, we conducted experiments using various combinations of contextualized language models (CLM) and neural networks. We use CLM to create contextualized work presentations and use Neural Networks for classification. Furthermore, we have evaluated the strengths and weaknesses of each joint model and identified the model failure points in the Vietnamese context. The highest F1 score in this experiment, up to 82.78% in the benchmark dataset (ViNLI). By conducting experiments with various models, the most considerable size of the CLM is XLM-R (355M). That combination has consistently demonstrated superior performance compared to fine-tuning strong pre-trained language models like PhoBERT (+6.58%), mBERT (+19.08%), and XLM-R (+0.94%) in terms of F1-score. This article aims to introduce a novel approach or model that attains improved performance for Vietnamese NLI. Overall, we find that the joint approach of CLM and neural networks is simple yet capable of achieving high-quality performance, which makes it suitable for applications that require efficient resource utilization. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.13407v2-abstract-full').style.display = 'none'; document.getElementById('2411.13407v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 20 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 20 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.10509</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Pham%2C+Q+P+M">Quang P. M. Pham</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K+T+N">Khoi T. N. Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Ngo%2C+L+C">Lan C. 
Ngo</a>, <a href="/search/cs?searchtype=author&amp;query=Song%2C+D">Dezhen Song</a>, <a href="/search/cs?searchtype=author&amp;query=Do%2C+T">Truong Do</a>, <a href="/search/cs?searchtype=author&amp;query=Hy%2C+T+S">Truong Son Hy</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.10509v1-abstract-short" style="display: inline;"> Scene graphs have proven to be highly effective for various scene understanding tasks due to their compact and explicit representation of relational information. However, current methods often overlook the critical importance of preserving symmetry when generating scene graphs from 3D point clouds, which can lead to reduced accuracy and robustness, particularly when dealing with noisy, multi-view&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10509v1-abstract-full').style.display = 'inline'; document.getElementById('2411.10509v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.10509v1-abstract-full" style="display: none;"> Scene graphs have proven to be highly effective for various scene understanding tasks due to their compact and explicit representation of relational information. However, current methods often overlook the critical importance of preserving symmetry when generating scene graphs from 3D point clouds, which can lead to reduced accuracy and robustness, particularly when dealing with noisy, multi-view data. This work, to the best of our knowledge, presents the first implementation of an Equivariant Scene Graph Neural Network (ESGNN) to generate semantic scene graphs from 3D point clouds, specifically for enhanced scene understanding. 
Furthermore, a significant limitation of prior methods is the absence of temporal modeling to capture time-dependent relationships among dynamically evolving entities within a scene. To address this gap, we introduce a novel temporal layer that leverages the symmetry-preserving properties of ESGNN to fuse scene graphs across multiple sequences into a unified global representation by an approximate graph-matching algorithm. Our combined architecture, termed the Temporal Equivariant Scene Graph Neural Network (TESGNN), not only surpasses existing state-of-the-art methods in scene estimation accuracy but also achieves faster convergence. Importantly, TESGNN is computationally efficient and straightforward to implement using existing frameworks, making it well-suited for real-time applications in robotics and computer vision. This approach paves the way for more robust and scalable solutions to complex multi-view scene understanding challenges. Our source code is publicly available at: <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10509v1-abstract-full').style.display = 'none'; document.getElementById('2411.10509v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 15 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">arXiv admin note: text overlap with arXiv:2407.00609</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.05699</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Networking and Internet Architecture">cs.NI</span> </div> <div class="is-inline-block" style="margin-left: 0.5rem"> <div class="tags has-addons"> <span class="tag is-dark is-size-7">doi</span> <span class="tag is-light is-size-7"><a class="" href="">10.1109/TGCN.2024.3431989 <i class="fa fa-external-link" aria-hidden="true"></i></a></span> </div> </div> </div> <p class="title is-5 mathjax"> Renewable Energy Powered and Open RAN-based Architecture for 5G Fixed Wireless Access Provisioning in Rural Areas </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ndikumana%2C+A">Anselme Ndikumana</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K+K">Kim Khoa Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Cheriet%2C+M">Mohamed Cheriet</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.05699v1-abstract-short" style="display: inline;"> Due to the high costs of optical fiber deployment in Low-Density and Rural Areas (LDRAs), 5G Fixed Wireless Access (5G FWA) recently emerged as an affordable solution. A widely adopted deployment scenario of 5G FWA includes edge cloud that supports computing services and Radio Access Network (RAN) functions. Such edge cloud requires network and energy resources for 5G FWA. 
This paper proposes rene&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05699v1-abstract-full').style.display = 'inline'; document.getElementById('2411.05699v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.05699v1-abstract-full" style="display: none;"> Due to the high costs of optical fiber deployment in Low-Density and Rural Areas (LDRAs), 5G Fixed Wireless Access (5G FWA) recently emerged as an affordable solution. A widely adopted deployment scenario of 5G FWA includes edge cloud that supports computing services and Radio Access Network (RAN) functions. Such edge cloud requires network and energy resources for 5G FWA. This paper proposes renewable energy powered and Open RAN-based architecture for 5G FWA serving LDRAs using three-level closed-loops. Open RAN is a new 5G RAN architecture allowing Open Central Unit and Open Distributed Unit to be distributed in virtualized environment. The first closed-loop distributes radio resources to Open RAN instances and slices at the edge cloud. The second closed-loop allocates radio resources to houses. We design a new energy model that leverages renewable energy. We jointly optimize radio and energy resource allocation in closed-loop 3. We formulate ultra-small and small-time scale optimization problems that link closed-loops to maximize communication utility while minimizing energy costs. We propose reinforcement learning and successive convex approximation to solve the formulated problems. Then, we use solution data and continual learning to improve resource allocation on a large timescale. Our proposal satisfies 97.14% slice delay budget. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05699v1-abstract-full').style.display = 'none'; document.getElementById('2411.05699v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 8 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Journal ref:</span> IEEE Transactions on Green Communications and Networking ( Volume: 8, Issue: 3, September 2024) </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.05664</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Networking and Internet Architecture">cs.NI</span> </div> <div class="is-inline-block" style="margin-left: 0.5rem"> <div class="tags has-addons"> <span class="tag is-dark is-size-7">doi</span> <span class="tag is-light is-size-7"><a class="" href="">10.1109/TMC.2024.3482985 <i class="fa fa-external-link" aria-hidden="true"></i></a></span> </div> </div> </div> <p class="title is-5 mathjax"> Digital Twin Backed Closed-Loops for Energy-Aware and Open RAN-based Fixed Wireless Access Serving Rural Areas </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ndikumana%2C+A">Anselme Ndikumana</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K+K">Kim Khoa Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Cheriet%2C+M">Mohamed Cheriet</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" 
id="2411.05664v1-abstract-short" style="display: inline;"> Internet access in rural areas should be improved to support digital inclusion and 5G services. Due to the high deployment costs of fiber optics in these areas, Fixed Wireless Access (FWA) has become a preferable alternative. Additionally, the Open Radio Access Network (O-RAN) can facilitate the interoperability of FWA elements, allowing some FWA functions to be deployed at the edge cloud. However&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05664v1-abstract-full').style.display = 'inline'; document.getElementById('2411.05664v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.05664v1-abstract-full" style="display: none;"> Internet access in rural areas should be improved to support digital inclusion and 5G services. Due to the high deployment costs of fiber optics in these areas, Fixed Wireless Access (FWA) has become a preferable alternative. Additionally, the Open Radio Access Network (O-RAN) can facilitate the interoperability of FWA elements, allowing some FWA functions to be deployed at the edge cloud. However, deploying edge clouds in rural areas can increase network and energy costs. To address these challenges, we propose a closed-loop system assisted by a Digital Twin (DT) to automate energy-aware O-RAN based FWA resource management in rural areas. We consider the FWA and edge cloud as the Physical Twin (PT) and design a closed-loop that distributes radio resources to edge cloud instances for scheduling. We develop another closed-loop for intra-slice resource allocation to houses. We design an energy model that integrates radio resource allocation and formulate ultra-small and small-timescale optimizations for the PT to maximize slice requirement satisfaction while minimizing energy costs. 
We then design a reinforcement learning approach and successive convex approximation to address the formulated problems. We present a DT that replicates the PT by incorporating solution experiences into future states. The results show that our approach efficiently uses radio and energy resources. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05664v1-abstract-full').style.display = 'none'; document.getElementById('2411.05664v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 8 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Journal ref:</span> IEEE Transactions on Mobile Computing 01 (2024): 1-15 </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.05641</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Evaluating Large Language Model Capability in Vietnamese Fact-Checking Data Generation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=To%2C+L+T">Long Truong To</a>, <a href="/search/cs?searchtype=author&amp;query=Le%2C+H+T">Hung Tuan Le</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+D+V">Dat Van-Thanh Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+M+T">Manh Trong Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+T+T">Tri Thien Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Van+Huynh%2C+T">Tin Van Huynh</a>, <a 
href="/search/cs?searchtype=author&amp;query=Van+Nguyen%2C+K">Kiet Van Nguyen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.05641v1-abstract-short" style="display: inline;"> Large Language Models (LLMs), with gradually improving reading comprehension and reasoning capabilities, are being applied to a range of complex language tasks, including the automatic generation of language data for various purposes. However, research on applying LLMs for automatic data generation in low-resource languages like Vietnamese is still underdeveloped and lacks comprehensive evaluation&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05641v1-abstract-full').style.display = 'inline'; document.getElementById('2411.05641v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.05641v1-abstract-full" style="display: none;"> Large Language Models (LLMs), with gradually improving reading comprehension and reasoning capabilities, are being applied to a range of complex language tasks, including the automatic generation of language data for various purposes. However, research on applying LLMs for automatic data generation in low-resource languages like Vietnamese is still underdeveloped and lacks comprehensive evaluation. In this paper, we explore the use of LLMs for automatic data generation for the Vietnamese fact-checking task, which faces significant data limitations. Specifically, we focus on fact-checking data where claims are synthesized from multiple evidence sentences to assess the information synthesis capabilities of LLMs. We develop an automatic data construction process using simple prompt techniques on LLMs and explore several methods to improve the quality of the generated data. 
To evaluate the quality of the data generated by LLMs, we conduct both manual quality assessments and performance evaluations using language models. Experimental results and manual evaluations illustrate that while the quality of the generated data has significantly improved through fine-tuning techniques, LLMs still cannot match the data quality produced by humans. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05641v1-abstract-full').style.display = 'none'; document.getElementById('2411.05641v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 8 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.04270</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Quantum Physics">quant-ph</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Hardware Architecture">cs.AR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Optimization and Control">math.OC</span> </div> </div> <p class="title is-5 mathjax"> Optimizing Multi-level Magic State Factories for Fault-Tolerant Quantum Architectures </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Silva%2C+A">Allyson Silva</a>, <a href="/search/cs?searchtype=author&amp;query=Scherer%2C+A">Artur Scherer</a>, <a href="/search/cs?searchtype=author&amp;query=Webb%2C+Z">Zak Webb</a>, <a href="/search/cs?searchtype=author&amp;query=Khalid%2C+A">Abdullah Khalid</a>, <a href="/search/cs?searchtype=author&amp;query=Kulchytskyy%2C+B">Bohdan 
Kulchytskyy</a>, <a href="/search/cs?searchtype=author&amp;query=Kramer%2C+M">Mia Kramer</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Kevin Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Kong%2C+X">Xiangzhou Kong</a>, <a href="/search/cs?searchtype=author&amp;query=Dagnew%2C+G+A">Gebremedhin A. Dagnew</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yumeng Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+H+A">Huy Anh Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Olfert%2C+K">Katiemarie Olfert</a>, <a href="/search/cs?searchtype=author&amp;query=Ronagh%2C+P">Pooya Ronagh</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.04270v1-abstract-short" style="display: inline;"> We propose a novel technique for optimizing a modular fault-tolerant quantum computing architecture, taking into account any desired space-time trade--offs between the number of physical qubits and the fault-tolerant execution time of a quantum algorithm. We consider a concept architecture comprising a dedicated zone as a multi-level magic state factory and a core processor for efficient logical o&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.04270v1-abstract-full').style.display = 'inline'; document.getElementById('2411.04270v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.04270v1-abstract-full" style="display: none;"> We propose a novel technique for optimizing a modular fault-tolerant quantum computing architecture, taking into account any desired space-time trade--offs between the number of physical qubits and the fault-tolerant execution time of a quantum algorithm. 
We consider a concept architecture comprising a dedicated zone as a multi-level magic state factory and a core processor for efficient logical operations, forming a supply chain network for production and consumption of magic states. Using a heuristic algorithm, we solve the multi-objective optimization problem of minimizing space and time subject to a user-defined error budget for the success of the computation, taking the performance of various fault-tolerant protocols such as quantum memory, state preparation, magic state distillation, code growth, and logical operations into account. As an application, we show that physical quantum resource estimation reduces to a simple model involving a small number of key parameters, namely, the circuit volume, the error prefactors ($\mu$) and error suppression rates ($\Lambda$) of the fault-tolerant protocols, and an allowed slowdown factor ($\beta$). We show that, in the proposed architecture, $10^5$--$10^8$ physical qubits are required for quantum algorithms with $T$-counts in the range $10^6$--$10^{15}$ and logical qubit counts in the range $10^2$--$10^4$, when run on quantum computers with quantum memory $\Lambda$ in the range 3--10, for all slowdown factors $\beta \geq 0.2$. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.04270v1-abstract-full').style.display = 'none'; document.getElementById('2411.04270v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 6 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">21 pages, 6 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.03730</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Cryptography and Security">cs.CR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Tobaben%2C+M">Marlon Tobaben</a>, <a href="/search/cs?searchtype=author&amp;query=Souibgui%2C+M+A">Mohamed Ali Souibgui</a>, <a href="/search/cs?searchtype=author&amp;query=Tito%2C+R">Rubèn Tito</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khanh Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Kerkouche%2C+R">Raouf Kerkouche</a>, <a href="/search/cs?searchtype=author&amp;query=Jung%2C+K">Kangsoo Jung</a>, <a href="/search/cs?searchtype=author&amp;query=J%C3%A4lk%C3%B6%2C+J">Joonas Jälkö</a>, <a href="/search/cs?searchtype=author&amp;query=Kang%2C+L">Lei Kang</a>, <a href="/search/cs?searchtype=author&amp;query=Barsky%2C+A">Andrey Barsky</a>, <a href="/search/cs?searchtype=author&amp;query=d%27Andecy%2C+V+P">Vincent Poulain d&#39;Andecy</a>, <a href="/search/cs?searchtype=author&amp;query=Joseph%2C+A">Aurélie Joseph</a>, <a href="/search/cs?searchtype=author&amp;query=Muhamed%2C+A">Aashiq Muhamed</a>, <a 
href="/search/cs?searchtype=author&amp;query=Kuo%2C+K">Kevin Kuo</a>, <a href="/search/cs?searchtype=author&amp;query=Smith%2C+V">Virginia Smith</a>, <a href="/search/cs?searchtype=author&amp;query=Yamasaki%2C+Y">Yusuke Yamasaki</a>, <a href="/search/cs?searchtype=author&amp;query=Fukami%2C+T">Takumi Fukami</a>, <a href="/search/cs?searchtype=author&amp;query=Niwa%2C+K">Kenta Niwa</a>, <a href="/search/cs?searchtype=author&amp;query=Tyou%2C+I">Iifan Tyou</a>, <a href="/search/cs?searchtype=author&amp;query=Ishii%2C+H">Hiro Ishii</a>, <a href="/search/cs?searchtype=author&amp;query=Yokota%2C+R">Rio Yokota</a>, <a href="/search/cs?searchtype=author&amp;query=N%2C+R">Ragul N</a>, <a href="/search/cs?searchtype=author&amp;query=Kutum%2C+R">Rintu Kutum</a>, <a href="/search/cs?searchtype=author&amp;query=Llados%2C+J">Josep Llados</a>, <a href="/search/cs?searchtype=author&amp;query=Valveny%2C+E">Ernest Valveny</a>, <a href="/search/cs?searchtype=author&amp;query=Honkela%2C+A">Antti Honkela</a> , et al. (2 additional authors not shown) </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.03730v1-abstract-short" style="display: inline;"> The Privacy Preserving Federated Learning Document VQA (PFL-DocVQA) competition challenged the community to develop provably private and communication-efficient solutions in a federated setting for a real-life use case: invoice processing. 
The competition introduced a dataset of real invoice documents, along with associated questions and answers requiring information extraction and reasoning over&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.03730v1-abstract-full').style.display = 'inline'; document.getElementById('2411.03730v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.03730v1-abstract-full" style="display: none;"> The Privacy Preserving Federated Learning Document VQA (PFL-DocVQA) competition challenged the community to develop provably private and communication-efficient solutions in a federated setting for a real-life use case: invoice processing. The competition introduced a dataset of real invoice documents, along with associated questions and answers requiring information extraction and reasoning over the document images. Thereby, it brings together researchers and expertise from the document analysis, privacy, and federated learning communities. Participants fine-tuned a pre-trained, state-of-the-art Document Visual Question Answering model provided by the organizers for this new domain, mimicking a typical federated invoice processing setup. The base model is a multi-modal generative language model, and sensitive information could be exposed through either the visual or textual input modality. Participants proposed elegant solutions to reduce communication costs while maintaining a minimum utility threshold in track 1 and to protect all information from each document provider using differential privacy in track 2. The competition served as a new testbed for developing and testing private federated learning methods, simultaneously raising awareness about privacy within the document image analysis and recognition community. 
Ultimately, the competition analysis provides best practices and recommendations for successfully running privacy-focused federated learning challenges in the future. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.03730v1-abstract-full').style.display = 'none'; document.getElementById('2411.03730v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 6 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">27 pages, 6 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.00172</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> SeafloorAI: A Large-scale Vision-Language Dataset for Seafloor Geological Survey </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K+X">Kien X. 
Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Qiao%2C+F">Fengchun Qiao</a>, <a href="/search/cs?searchtype=author&amp;query=Trembanis%2C+A">Arthur Trembanis</a>, <a href="/search/cs?searchtype=author&amp;query=Peng%2C+X">Xi Peng</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.00172v2-abstract-short" style="display: inline;"> A major obstacle to the advancements of machine learning models in marine science, particularly in sonar imagery analysis, is the scarcity of AI-ready datasets. While there have been efforts to make AI-ready sonar image dataset publicly available, they suffer from limitations in terms of environment setting and scale. To bridge this gap, we introduce SeafloorAI, the first extensive AI-ready datase&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.00172v2-abstract-full').style.display = 'inline'; document.getElementById('2411.00172v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.00172v2-abstract-full" style="display: none;"> A major obstacle to the advancements of machine learning models in marine science, particularly in sonar imagery analysis, is the scarcity of AI-ready datasets. While there have been efforts to make AI-ready sonar image dataset publicly available, they suffer from limitations in terms of environment setting and scale. To bridge this gap, we introduce SeafloorAI, the first extensive AI-ready datasets for seafloor mapping across 5 geological layers that is curated in collaboration with marine scientists. We further extend the dataset to SeafloorGenAI by incorporating the language component in order to facilitate the development of both vision- and language-capable machine learning models for sonar imagery. 
The dataset consists of 62 geo-distributed data surveys spanning 17,300 square kilometers, with 696K sonar images, 827K annotated segmentation masks, 696K detailed language descriptions and approximately 7M question-answer pairs. By making our data processing source code publicly available, we aim to engage the marine science community to enrich the data pool and inspire the machine learning community to develop more robust models. This collaborative approach will enhance the capabilities and applications of our datasets within both fields. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.00172v2-abstract-full').style.display = 'none'; document.getElementById('2411.00172v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 6 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 31 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.21276</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computers and Society">cs.CY</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Sound">cs.SD</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Audio and Speech Processing">eess.AS</span> </div> </div> <p class="title is-5 mathjax"> GPT-4o System Card </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=OpenAI"> OpenAI</a>, <a href="/search/cs?searchtype=author&amp;query=%3A"> :</a>, <a href="/search/cs?searchtype=author&amp;query=Hurst%2C+A">Aaron Hurst</a>, <a href="/search/cs?searchtype=author&amp;query=Lerer%2C+A">Adam Lerer</a>, <a href="/search/cs?searchtype=author&amp;query=Goucher%2C+A+P">Adam P. 
Goucher</a>, <a href="/search/cs?searchtype=author&amp;query=Perelman%2C+A">Adam Perelman</a>, <a href="/search/cs?searchtype=author&amp;query=Ramesh%2C+A">Aditya Ramesh</a>, <a href="/search/cs?searchtype=author&amp;query=Clark%2C+A">Aidan Clark</a>, <a href="/search/cs?searchtype=author&amp;query=Ostrow%2C+A">AJ Ostrow</a>, <a href="/search/cs?searchtype=author&amp;query=Welihinda%2C+A">Akila Welihinda</a>, <a href="/search/cs?searchtype=author&amp;query=Hayes%2C+A">Alan Hayes</a>, <a href="/search/cs?searchtype=author&amp;query=Radford%2C+A">Alec Radford</a>, <a href="/search/cs?searchtype=author&amp;query=M%C4%85dry%2C+A">Aleksander Mądry</a>, <a href="/search/cs?searchtype=author&amp;query=Baker-Whitcomb%2C+A">Alex Baker-Whitcomb</a>, <a href="/search/cs?searchtype=author&amp;query=Beutel%2C+A">Alex Beutel</a>, <a href="/search/cs?searchtype=author&amp;query=Borzunov%2C+A">Alex Borzunov</a>, <a href="/search/cs?searchtype=author&amp;query=Carney%2C+A">Alex Carney</a>, <a href="/search/cs?searchtype=author&amp;query=Chow%2C+A">Alex Chow</a>, <a href="/search/cs?searchtype=author&amp;query=Kirillov%2C+A">Alex Kirillov</a>, <a href="/search/cs?searchtype=author&amp;query=Nichol%2C+A">Alex Nichol</a>, <a href="/search/cs?searchtype=author&amp;query=Paino%2C+A">Alex Paino</a>, <a href="/search/cs?searchtype=author&amp;query=Renzin%2C+A">Alex Renzin</a>, <a href="/search/cs?searchtype=author&amp;query=Passos%2C+A+T">Alex Tachard Passos</a>, <a href="/search/cs?searchtype=author&amp;query=Kirillov%2C+A">Alexander Kirillov</a>, <a href="/search/cs?searchtype=author&amp;query=Christakis%2C+A">Alexi Christakis</a> , et al. 
(395 additional authors not shown) </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.21276v1-abstract-short" style="display: inline;"> GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It&#39;s trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 mil&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.21276v1-abstract-full').style.display = 'inline'; document.getElementById('2410.21276v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.21276v1-abstract-full" style="display: none;"> GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It&#39;s trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50\% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. 
In this System Card, we provide a detailed look at GPT-4o&#39;s capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we&#39;ve implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o&#39;s text and vision capabilities. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.21276v1-abstract-full').style.display = 'none'; document.getElementById('2410.21276v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 25 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.18935</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Schema-Guided Culture-Aware Complex Event Simulation with Multi-Agent Role-Play </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Li%2C+S">Sha Li</a>, <a href="/search/cs?searchtype=author&amp;query=Reddy%2C+R+G">Revanth Gangi Reddy</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K+D">Khanh Duy Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Q">Qingyun Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Fung%2C+M">May Fung</a>, <a 
href="/search/cs?searchtype=author&amp;query=Han%2C+C">Chi Han</a>, <a href="/search/cs?searchtype=author&amp;query=Han%2C+J">Jiawei Han</a>, <a href="/search/cs?searchtype=author&amp;query=Natarajan%2C+K">Kartik Natarajan</a>, <a href="/search/cs?searchtype=author&amp;query=Voss%2C+C+R">Clare R. Voss</a>, <a href="/search/cs?searchtype=author&amp;query=Ji%2C+H">Heng Ji</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.18935v1-abstract-short" style="display: inline;"> Complex news events, such as natural disasters and socio-political conflicts, require swift responses from the government and society. Relying on historical events to project the future is insufficient as such events are sparse and do not cover all possible conditions and nuanced situations. Simulation of these complex events can help better prepare and reduce the negative impact. We develop a con&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.18935v1-abstract-full').style.display = 'inline'; document.getElementById('2410.18935v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.18935v1-abstract-full" style="display: none;"> Complex news events, such as natural disasters and socio-political conflicts, require swift responses from the government and society. Relying on historical events to project the future is insufficient as such events are sparse and do not cover all possible conditions and nuanced situations. Simulation of these complex events can help better prepare and reduce the negative impact. We develop a controllable complex news event simulator guided by both the event schema representing domain knowledge about the scenario and user-provided assumptions representing case-specific conditions. 
As event dynamics depend on the fine-grained social and cultural context, we further introduce a geo-diverse commonsense and cultural norm-aware knowledge enhancement component. To enhance the coherence of the simulation, apart from the global timeline of events, we take an agent-based approach to simulate the individual character states, plans, and actions. By incorporating the schema and cultural norms, our generated simulations achieve much higher coherence and appropriateness and are received favorably by participants from a humanitarian assistance organization. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.18935v1-abstract-full').style.display = 'none'; document.getElementById('2410.18935v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 24 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted as EMNLP 2024 Demo</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.16151</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Neural and Evolutionary Computing">cs.NE</span> </div> </div> <p class="title is-5 mathjax"> Small Contributions, Small Networks: Efficient Neural Network Pruning Based on Relative Importance </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Hussien%2C+M">Mostafa Hussien</a>, <a href="/search/cs?searchtype=author&amp;query=Afifi%2C+M">Mahmoud Afifi</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K+K">Kim Khoa Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Cheriet%2C+M">Mohamed Cheriet</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.16151v1-abstract-short" style="display: inline;"> Recent advancements have scaled neural networks to unprecedented sizes, achieving remarkable performance across a wide range of tasks. However, deploying these large-scale models on resource-constrained devices poses significant challenges due to substantial storage and computational requirements. 
Neural network pruning has emerged as an effective technique to mitigate these limitations by reducin&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.16151v1-abstract-full').style.display = 'inline'; document.getElementById('2410.16151v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.16151v1-abstract-full" style="display: none;"> Recent advancements have scaled neural networks to unprecedented sizes, achieving remarkable performance across a wide range of tasks. However, deploying these large-scale models on resource-constrained devices poses significant challenges due to substantial storage and computational requirements. Neural network pruning has emerged as an effective technique to mitigate these limitations by reducing model size and complexity. In this paper, we introduce an intuitive and interpretable pruning method based on activation statistics, rooted in information theory and statistical analysis. Our approach leverages the statistical properties of neuron activations to identify and remove weights with minimal contributions to neuron outputs. Specifically, we build a distribution of weight contributions across the dataset and utilize its parameters to guide the pruning process. Furthermore, we propose a Pruning-aware Training strategy that incorporates an additional regularization term to enhance the effectiveness of our pruning method. Extensive experiments on multiple datasets and network architectures demonstrate that our method consistently outperforms several baseline and state-of-the-art pruning techniques. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.16151v1-abstract-full').style.display = 'none'; document.getElementById('2410.16151v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.12522</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> MING: A Functional Approach to Learning Molecular Generative Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+V+K">Van Khoa Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Falkiewicz%2C+M">Maciej Falkiewicz</a>, <a href="/search/cs?searchtype=author&amp;query=Mercatali%2C+G">Giangiacomo Mercatali</a>, <a href="/search/cs?searchtype=author&amp;query=Kalousis%2C+A">Alexandros Kalousis</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.12522v1-abstract-short" style="display: inline;"> Traditional molecule generation methods often rely on sequence or graph-based representations, which can limit their expressive power or require complex permutation-equivariant architectures. This paper introduces a novel paradigm for learning molecule generative models based on functional representations. 
Specifically, we propose Molecular Implicit Neural Generation (MING), a diffusion-based mode&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.12522v1-abstract-full').style.display = 'inline'; document.getElementById('2410.12522v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.12522v1-abstract-full" style="display: none;"> Traditional molecule generation methods often rely on sequence or graph-based representations, which can limit their expressive power or require complex permutation-equivariant architectures. This paper introduces a novel paradigm for learning molecule generative models based on functional representations. Specifically, we propose Molecular Implicit Neural Generation (MING), a diffusion-based model that learns molecular distributions in function space. Unlike standard diffusion processes in data space, MING employs a novel functional denoising probabilistic process, which jointly denoises the information in both the function&#39;s input and output spaces by leveraging an expectation-maximization procedure for latent implicit neural representations of data. This approach allows for a simple yet effective model design that accurately captures underlying function distributions. Experimental results on molecule-related datasets demonstrate MING&#39;s superior performance and ability to generate plausible molecular samples, surpassing state-of-the-art data-space methods while offering a more streamlined architecture and significantly faster generation times. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.12522v1-abstract-full').style.display = 'none'; document.getElementById('2410.12522v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 16 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.12068</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Robotics">cs.RO</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> V3D-SLAM: Robust RGB-D SLAM in Dynamic Environments with 3D Semantic Geometry Voting </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Dang%2C+T">Tuan Dang</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khang Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Huber%2C+M">Mandfred Huber</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.12068v1-abstract-short" style="display: inline;"> Simultaneous localization and mapping (SLAM) in highly dynamic environments is challenging due to the correlation complexity between moving objects and the camera pose. 
Many methods have been proposed to deal with this problem; however, the moving properties of dynamic objects with a moving camera remain unclear. Therefore, to improve SLAM&#39;s performance, minimizing disruptive events of moving obje&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.12068v1-abstract-full').style.display = 'inline'; document.getElementById('2410.12068v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.12068v1-abstract-full" style="display: none;"> Simultaneous localization and mapping (SLAM) in highly dynamic environments is challenging due to the correlation complexity between moving objects and the camera pose. Many methods have been proposed to deal with this problem; however, the moving properties of dynamic objects with a moving camera remain unclear. Therefore, to improve SLAM&#39;s performance, minimizing disruptive events of moving objects with a physical understanding of 3D shapes and dynamics of objects is needed. In this paper, we propose a robust method, V3D-SLAM, to remove moving objects via two lightweight re-evaluation stages, including identifying potentially moving and static objects using a spatial-reasoned Hough voting mechanism and refining static objects by detecting dynamic noise caused by intra-object motions using Chamfer distances as similarity measurements. Our experiment on the TUM RGB-D benchmark on dynamic sequences with ground-truth camera trajectories showed that our methods outperform the most recent state-of-the-art SLAM methods. 
Our source code is available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.12068v1-abstract-full').style.display = 'none'; document.getElementById('2410.12068v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 15 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Journal ref:</span> IEEE/RSJ International Conference on Intelligent Robots and Systems 2024 </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.08464</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Robotics">cs.RO</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Chen%2C+S">Sirui Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+C">Chen Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Kaden Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Fei-Fei%2C+L">Li Fei-Fei</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+C+K">C. 
Karen Liu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.08464v1-abstract-short" style="display: inline;"> Recent progress in imitation learning from human demonstrations has shown promising results in teaching robots manipulation skills. To further scale up training datasets, recent works start to use portable data collection devices without the need for physical robot hardware. However, due to the absence of on-robot feedback during data collection, the data quality depends heavily on user expertise,&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.08464v1-abstract-full').style.display = 'inline'; document.getElementById('2410.08464v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.08464v1-abstract-full" style="display: none;"> Recent progress in imitation learning from human demonstrations has shown promising results in teaching robots manipulation skills. To further scale up training datasets, recent works start to use portable data collection devices without the need for physical robot hardware. However, due to the absence of on-robot feedback during data collection, the data quality depends heavily on user expertise, and many devices are limited to specific robot embodiments. We propose ARCap, a portable data collection system that provides visual feedback through augmented reality (AR) and haptic warnings to guide users in collecting high-quality demonstrations. Through extensive user studies, we show that ARCap enables novice users to collect robot-executable data that matches robot kinematics and avoids collisions with the scenes. With data collected from ARCap, robots can perform challenging tasks, such as manipulation in cluttered environments and long-horizon cross-embodiment manipulation. 
ARCap is fully open-source and easy to calibrate; all components are built from off-the-shelf products. More details and results can be found on our website: <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.08464v1-abstract-full').style.display = 'none'; document.getElementById('2410.08464v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">8 pages, 8 Figures, submitted to ICRA 2025</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.08050</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Multiagent Systems">cs.MA</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Distributed, Parallel, and Cluster Computing">cs.DC</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Physics and Society">physics.soc-ph</span> </div> </div> <p class="title is-5 mathjax"> Agent-based modeling for realistic reproduction of human mobility and contact behavior to evaluate test and isolation strategies in epidemic infectious disease spread </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Kerkmann%2C+D">David Kerkmann</a>, <a href="/search/cs?searchtype=author&amp;query=Korf%2C+S">Sascha Korf</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khoa Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Abele%2C+D">Daniel 
Abele</a>, <a href="/search/cs?searchtype=author&amp;query=Schengen%2C+A">Alain Schengen</a>, <a href="/search/cs?searchtype=author&amp;query=Gerstein%2C+C">Carlotta Gerstein</a>, <a href="/search/cs?searchtype=author&amp;query=G%C3%B6bbert%2C+J+H">Jens Henrik Göbbert</a>, <a href="/search/cs?searchtype=author&amp;query=Basermann%2C+A">Achim Basermann</a>, <a href="/search/cs?searchtype=author&amp;query=K%C3%BChn%2C+M+J">Martin J. Kühn</a>, <a href="/search/cs?searchtype=author&amp;query=Meyer-Hermann%2C+M">Michael Meyer-Hermann</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.08050v1-abstract-short" style="display: inline;"> Agent-based models have proven to be useful tools in supporting decision-making processes in different application domains. The advent of modern computers and supercomputers has enabled these bottom-up approaches to realistically model human mobility and contact behavior. The COVID-19 pandemic showcased the urgent need for detailed and informative models that can answer research questions on trans&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.08050v1-abstract-full').style.display = 'inline'; document.getElementById('2410.08050v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.08050v1-abstract-full" style="display: none;"> Agent-based models have proven to be useful tools in supporting decision-making processes in different application domains. The advent of modern computers and supercomputers has enabled these bottom-up approaches to realistically model human mobility and contact behavior. The COVID-19 pandemic showcased the urgent need for detailed and informative models that can answer research questions on transmission dynamics. 
We present a sophisticated agent-based model to simulate the spread of respiratory diseases. The model is highly modularized and can be used on various scales, from a small collection of buildings up to cities or countries. Although not being the focus of this paper, the model has undergone performance engineering on a single core and provides an efficient intra- and inter-simulation parallelization for time-critical decision-making processes. In order to allow answering research questions on individual level resolution, nonpharmaceutical intervention strategies such as face masks or venue closures can be implemented for particular locations or agents. In particular, we allow for sophisticated testing and isolation strategies to study the effects of minimal-invasive infectious disease mitigation. With realistic human mobility patterns for the region of Brunswick, Germany, we study the effects of different interventions between March 1st and May 30, 2021 in the SARS-CoV-2 pandemic. Our analyses suggest that symptom-independent testing has limited impact on the mitigation of disease dynamics if the dark figure in symptomatic cases is high. Furthermore, we found that quarantine length is more important than quarantine efficiency but that, with sufficient symptomatic control, also short quarantines can have a substantial effect. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.08050v1-abstract-full').style.display = 'none'; document.getElementById('2410.08050v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">35 pages, 13 figures, to be submitted to Elsevier</span> </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">ACM Class:</span> I.6.4; I.6.5; D.1.3 </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.03458</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Multi-Dialect Vietnamese: Task, Dataset, Baseline Models and Challenges </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Van+Dinh%2C+N">Nguyen Van Dinh</a>, <a href="/search/cs?searchtype=author&amp;query=Dang%2C+T+C">Thanh Chi Dang</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+L+T">Luan Thanh Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Van+Nguyen%2C+K">Kiet Van Nguyen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.03458v1-abstract-short" style="display: inline;"> Vietnamese, a low-resource language, is typically categorized into three primary dialect groups that belong to Northern, Central, and Southern Vietnam. However, each province within these regions exhibits its own distinct pronunciation variations. 
Despite the existence of various speech recognition datasets, none of them has provided a fine-grained classification of the 63 dialects specific to ind&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.03458v1-abstract-full').style.display = 'inline'; document.getElementById('2410.03458v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.03458v1-abstract-full" style="display: none;"> Vietnamese, a low-resource language, is typically categorized into three primary dialect groups that belong to Northern, Central, and Southern Vietnam. However, each province within these regions exhibits its own distinct pronunciation variations. Despite the existence of various speech recognition datasets, none of them has provided a fine-grained classification of the 63 dialects specific to individual provinces of Vietnam. To address this gap, we introduce Vietnamese Multi-Dialect (ViMD) dataset, a novel comprehensive dataset capturing the rich diversity of 63 provincial dialects spoken across Vietnam. Our dataset comprises 102.56 hours of audio, consisting of approximately 19,000 utterances, and the associated transcripts contain over 1.2 million words. To provide benchmarks and simultaneously demonstrate the challenges of our dataset, we fine-tune state-of-the-art pre-trained models for two downstream tasks: (1) Dialect identification and (2) Speech recognition. The empirical results suggest two implications including the influence of geographical factors on dialects, and the constraints of current approaches in speech recognition tasks involving multi-dialect speech data. Our dataset is available for research purposes. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.03458v1-abstract-full').style.display = 'none'; document.getElementById('2410.03458v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Main EMNLP 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.00348</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Revisiting the Role of Texture in 3D Person Re-identification </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+H">Huy Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Kien Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Pemasiri%2C+A">Akila Pemasiri</a>, <a href="/search/cs?searchtype=author&amp;query=Sridharan%2C+S">Sridha Sridharan</a>, <a href="/search/cs?searchtype=author&amp;query=Fookes%2C+C">Clinton Fookes</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.00348v1-abstract-short" style="display: inline;"> This study introduces a new framework for 3D person re-identification (re-ID) that leverages readily available high-resolution texture data in 3D reconstruction to improve the performance and 
explainability of the person re-ID task. We propose a method to emphasize texture in 3D person re-ID models by incorporating UVTexture mapping, which better differentiates human subjects. Our approach uniquel&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.00348v1-abstract-full').style.display = 'inline'; document.getElementById('2410.00348v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.00348v1-abstract-full" style="display: none;"> This study introduces a new framework for 3D person re-identification (re-ID) that leverages readily available high-resolution texture data in 3D reconstruction to improve the performance and explainability of the person re-ID task. We propose a method to emphasize texture in 3D person re-ID models by incorporating UVTexture mapping, which better differentiates human subjects. Our approach uniquely combines UVTexture and its heatmaps with 3D models to visualize and explain the person re-ID process. In particular, the visualization and explanation are achieved through activation maps and attribute-based attention maps, which highlight the important regions and features contributing to the person re-ID decision. Our contributions include: (1) a novel technique for emphasizing texture in 3D models using UVTexture processing, (2) an innovative method for explicating person re-ID matches through a combination of 3D models and UVTexture mapping, and (3) achieving state-of-the-art performance in 3D person re-ID. We ensure the reproducibility of our results by making all data, codes, and models publicly available. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.00348v1-abstract-full').style.display = 'none'; document.getElementById('2410.00348v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 30 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.20467</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> A Weakly Supervised Data Labeling Framework for Machine Lexical Normalization in Vietnamese Social Media </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+D+H">Dung Ha Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+A+T+H">Anh Thi Hoang Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Van+Nguyen%2C+K">Kiet Van Nguyen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.20467v1-abstract-short" style="display: inline;"> This study introduces an innovative automatic labeling framework to address the challenges of lexical normalization in social media texts for low-resource languages like Vietnamese. Social media data is rich and diverse, but the evolving and varied language used in these contexts makes manual labeling labor-intensive and expensive. 
To tackle these issues, we propose a framework that integrates sem&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.20467v1-abstract-full').style.display = 'inline'; document.getElementById('2409.20467v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.20467v1-abstract-full" style="display: none;"> This study introduces an innovative automatic labeling framework to address the challenges of lexical normalization in social media texts for low-resource languages like Vietnamese. Social media data is rich and diverse, but the evolving and varied language used in these contexts makes manual labeling labor-intensive and expensive. To tackle these issues, we propose a framework that integrates semi-supervised learning with weak supervision techniques. This approach enhances the quality of training dataset and expands its size while minimizing manual labeling efforts. Our framework automatically labels raw data, converting non-standard vocabulary into standardized forms, thereby improving the accuracy and consistency of the training data. Experimental results demonstrate the effectiveness of our weak supervision framework in normalizing Vietnamese text, especially when utilizing Pre-trained Language Models. The proposed framework achieves an impressive F1-score of 82.72% and maintains vocabulary integrity with an accuracy of up to 99.22%. Additionally, it effectively handles undiacritized text under various conditions. This framework significantly enhances natural language normalization quality and improves the accuracy of various NLP tasks, leading to an average accuracy increase of 1-3%. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.20467v1-abstract-full').style.display = 'none'; document.getElementById('2409.20467v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 30 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.14435</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Robotics">cs.RO</span> </div> </div> <p class="title is-5 mathjax"> Adaptive Compensation for Robotic Joint Failures Using Partially Observable Reinforcement Learning </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Pham%2C+T">Tan-Hanh Pham</a>, <a href="/search/cs?searchtype=author&amp;query=Aikins%2C+G">Godwyll Aikins</a>, <a href="/search/cs?searchtype=author&amp;query=Truong%2C+T">Tri Truong</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Kim-Doang Nguyen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.14435v1-abstract-short" style="display: inline;"> Robotic manipulators are widely used in various industries for complex and repetitive tasks. However, they remain vulnerable to unexpected hardware failures. In this study, we address the challenge of enabling a robotic manipulator to complete tasks despite joint malfunctions. 
Specifically, we develop a reinforcement learning (RL) framework to adaptively compensate for a non-functional joint durin&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.14435v1-abstract-full').style.display = 'inline'; document.getElementById('2409.14435v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.14435v1-abstract-full" style="display: none;"> Robotic manipulators are widely used in various industries for complex and repetitive tasks. However, they remain vulnerable to unexpected hardware failures. In this study, we address the challenge of enabling a robotic manipulator to complete tasks despite joint malfunctions. Specifically, we develop a reinforcement learning (RL) framework to adaptively compensate for a non-functional joint during task execution. Our experimental platform is the Franka robot with 7 degrees of freedom (DOFs). We formulate the problem as a partially observable Markov decision process (POMDP), where the robot is trained under various joint failure conditions and tested in both seen and unseen scenarios. We consider scenarios where a joint is permanently broken and where it functions intermittently. Additionally, we demonstrate the effectiveness of our approach by comparing it with traditional inverse kinematics-based control methods. The results show that the RL algorithm enables the robot to successfully complete tasks even with joint failures, achieving a high success rate with an average rate of 93.6%. This showcases its robustness and adaptability. Our findings highlight the potential of RL to enhance the resilience and reliability of robotic systems, making them better suited for unpredictable environments. All related codes and models are published online. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.14435v1-abstract-full').style.display = 'none'; document.getElementById('2409.14435v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 22 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">15 pages</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.07420</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">ps</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Cryptography and Security">cs.CR</span> </div> </div> <p class="title is-5 mathjax"> XDC Staking and Tokenomics -- Improvement Proposal: Enhancing Sustainability and Decentralization on the Eve of XDC 2.0 </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+V+K">Van Khanh Nguyen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.07420v1-abstract-short" style="display: inline;"> As the XDC network celebrates five years of stable mainnet operation and prepares for the highly anticipated launch of XDC 2.0, this research proposes a comprehensive improvement plan for the network&#39;s staking and tokenomics mechanisms. Our analysis reveals opportunities to optimize the current model, ensuring a more sustainable, decentralized, and resilient ecosystem. 
We introduce novel concepts,&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.07420v1-abstract-full').style.display = 'inline'; document.getElementById('2409.07420v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.07420v1-abstract-full" style="display: none;"> As the XDC network celebrates five years of stable mainnet operation and prepares for the highly anticipated launch of XDC 2.0, this research proposes a comprehensive improvement plan for the network&#39;s staking and tokenomics mechanisms. Our analysis reveals opportunities to optimize the current model, ensuring a more sustainable, decentralized, and resilient ecosystem. We introduce novel concepts, including validator NFTs, decentralized governance, and utility-based tokenomics, to increase validator node liquidity and promote staking participation. Our proposal aims to establish a robust foundation for XDC 2.0, fostering a thriving ecosystem that rewards validators, stakeholders, and users alike. By addressing the intricacies of staking and tokenomics, this research paves the way for XDC to solidify its position as a leading decentralized network, poised for long-term success and growth. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.07420v1-abstract-full').style.display = 'none'; document.getElementById('2409.07420v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 11 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.06422</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Cryptography and Security">cs.CR</span> </div> </div> <p class="title is-5 mathjax"> A Pervasive, Efficient and Private Future: Realizing Privacy-Preserving Machine Learning Through Hybrid Homomorphic Encryption </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khoa Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Budzys%2C+M">Mindaugas Budzys</a>, <a href="/search/cs?searchtype=author&amp;query=Frimpong%2C+E">Eugene Frimpong</a>, <a href="/search/cs?searchtype=author&amp;query=Khan%2C+T">Tanveer Khan</a>, <a href="/search/cs?searchtype=author&amp;query=Michalas%2C+A">Antonis Michalas</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.06422v1-abstract-short" style="display: inline;"> Machine Learning (ML) has become one of the most impactful fields of data science in recent years. However, a significant concern with ML is its privacy risks due to rising attacks against ML models. Privacy-Preserving Machine Learning (PPML) methods have been proposed to mitigate the privacy and security risks of ML models. A popular approach to achieving PPML uses Homomorphic Encryption (HE). 
Ho&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.06422v1-abstract-full').style.display = 'inline'; document.getElementById('2409.06422v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.06422v1-abstract-full" style="display: none;"> Machine Learning (ML) has become one of the most impactful fields of data science in recent years. However, a significant concern with ML is its privacy risks due to rising attacks against ML models. Privacy-Preserving Machine Learning (PPML) methods have been proposed to mitigate the privacy and security risks of ML models. A popular approach to achieving PPML uses Homomorphic Encryption (HE). However, the highly publicized inefficiencies of HE make it unsuitable for highly scalable scenarios with resource-constrained devices. Hence, Hybrid Homomorphic Encryption (HHE) -- a modern encryption scheme that combines symmetric cryptography with HE -- has recently been introduced to overcome these challenges. HHE potentially provides a foundation to build new efficient and privacy-preserving services that transfer expensive HE operations to the cloud. This work introduces HHE to the ML field by proposing resource-friendly PPML protocols for edge devices. More precisely, we utilize HHE as the primary building block of our PPML protocols. We assess the performance of our protocols by first extensively evaluating each party&#39;s communication and computational cost on a dummy dataset and show the efficiency of our protocols by comparing them with similar protocols implemented using plain BFV. Subsequently, we demonstrate the real-world applicability of our construction by building an actual PPML application that uses HHE as its foundation to classify heart disease based on sensitive ECG data. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.06422v1-abstract-full').style.display = 'none'; document.getElementById('2409.06422v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted in The 22nd IEEE International Conference on Dependable, Autonomic and Secure Computing (DASC 2024)</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.02840</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> R2GQA: Retriever-Reader-Generator Question Answering System to Support Students Understanding Legal Regulations in Higher Education </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Do%2C+P+P">Phuc-Tinh Pham Do</a>, <a href="/search/cs?searchtype=author&amp;query=Cao%2C+D+D">Duy-Ngoc Dinh Cao</a>, <a href="/search/cs?searchtype=author&amp;query=Tran%2C+K+Q">Khanh Quoc Tran</a>, <a href="/search/cs?searchtype=author&amp;query=Van+Nguyen%2C+K">Kiet Van Nguyen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.02840v1-abstract-short" style="display: inline;"> 
In this article, we propose the R2GQA system, a Retriever-Reader-Generator Question Answering system, consisting of three main components: Document Retriever, Machine Reader, and Answer Generator. The Retriever module employs advanced information retrieval techniques to extract the context of articles from a dataset of legal regulation documents. The Machine Reader module utilizes state-of-the-art&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.02840v1-abstract-full').style.display = 'inline'; document.getElementById('2409.02840v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.02840v1-abstract-full" style="display: none;"> In this article, we propose the R2GQA system, a Retriever-Reader-Generator Question Answering system, consisting of three main components: Document Retriever, Machine Reader, and Answer Generator. The Retriever module employs advanced information retrieval techniques to extract the context of articles from a dataset of legal regulation documents. The Machine Reader module utilizes state-of-the-art natural language understanding algorithms to comprehend the retrieved documents and extract answers. Finally, the Generator module synthesizes the extracted answers into concise and informative responses to questions of students regarding legal regulations. Furthermore, we built the ViRHE4QA dataset in the domain of university training regulations, comprising 9,758 question-answer pairs with a rigorous construction process. This is the first Vietnamese dataset in the higher regulations domain with various types of answers, both extractive and abstractive. In addition, the R2GQA system is the first system to offer abstractive answers in Vietnamese. This paper discusses the design and implementation of each module within the R2GQA system on the ViRHE4QA dataset, highlighting their functionalities and interactions. 
Furthermore, we present experimental results demonstrating the effectiveness and utility of the proposed system in supporting the comprehension of students of legal regulations in higher education settings. In general, the R2GQA system and the ViRHE4QA dataset promise to contribute significantly to related research and help students navigate complex legal documents and regulations, empowering them to make informed decisions and adhere to institutional policies effectively. Our dataset is available for research purposes. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.02840v1-abstract-full').style.display = 'none'; document.getElementById('2409.02840v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.01989</a> <span>&nbsp;[<a href="">pdf</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Disordered Systems and Neural Networks">cond-mat.dis-nn</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Materials Science">cond-mat.mtrl-sci</span> </div> </div> <p class="title is-5 mathjax"> Improving Electrolyte Performance for Target Cathode Loading Using Interpretable Data-Driven Approach </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Sharma%2C+V">Vidushi Sharma</a>, <a href="/search/cs?searchtype=author&amp;query=Tek%2C+A">Andy Tek</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khanh 
Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Giammona%2C+M">Max Giammona</a>, <a href="/search/cs?searchtype=author&amp;query=Zohair%2C+M">Murtaza Zohair</a>, <a href="/search/cs?searchtype=author&amp;query=Sundberg%2C+L">Linda Sundberg</a>, <a href="/search/cs?searchtype=author&amp;query=La%2C+Y">Young-Hye La</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.01989v1-abstract-short" style="display: inline;"> Higher loading of active electrode materials is desired in batteries, especially those based on conversion reactions, for enhanced energy density and cost efficiency. However, increasing active material loading in electrodes can cause significant performance depreciation due to internal resistance, shuttling, and parasitic side reactions, which can be alleviated to a certain extent by a compatible&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.01989v1-abstract-full').style.display = 'inline'; document.getElementById('2409.01989v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.01989v1-abstract-full" style="display: none;"> Higher loading of active electrode materials is desired in batteries, especially those based on conversion reactions, for enhanced energy density and cost efficiency. However, increasing active material loading in electrodes can cause significant performance depreciation due to internal resistance, shuttling, and parasitic side reactions, which can be alleviated to a certain extent by a compatible design of electrolytes. In this work, a data-driven approach is leveraged to find a high-performing electrolyte formulation for a novel interhalogen battery custom to the target cathode loading. 
An electrolyte design consisting of 4 solvents and 4 salts is experimentally devised for a novel interhalogen battery based on a multi-electron redox reaction. The experimental dataset with variable electrolyte compositions and active cathode loading, is used to train a graph-based deep learning model mapping changing variables in the battery&#39;s material design to its specific capacity. The trained model is used to further optimize the electrolyte formulation compositions for enhancing the battery capacity at a target cathode loading by a two-fold approach: large-scale screening and interpreting electrolyte design principles for different cathode loadings. The data-driven approach is demonstrated to bring about an additional 20% increment in the specific capacity of the battery over capacities obtained from the experimental optimization. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.01989v1-abstract-full').style.display = 'none'; document.getElementById('2409.01989v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 3 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">34 Pages, 5 Figures, 2 Tables</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.14176</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Dao%2C+T">Trung Dao</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+T+H">Thuan Hoang Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Le%2C+T">Thanh Le</a>, <a href="/search/cs?searchtype=author&amp;query=Vu%2C+D">Duc Vu</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khoi Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Pham%2C+C">Cuong Pham</a>, <a href="/search/cs?searchtype=author&amp;query=Tran%2C+A">Anh Tran</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.14176v2-abstract-short" style="display: inline;"> In this paper, we aim to enhance the performance of SwiftBrush, a prominent one-step text-to-image diffusion model, to be competitive with its multi-step Stable Diffusion counterpart. Initially, we explore the quality-diversity trade-off between SwiftBrush and SD Turbo: the former excels in image diversity, while the latter excels in image quality. 
This observation motivates our proposed modificat&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.14176v2-abstract-full').style.display = 'inline'; document.getElementById('2408.14176v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.14176v2-abstract-full" style="display: none;"> In this paper, we aim to enhance the performance of SwiftBrush, a prominent one-step text-to-image diffusion model, to be competitive with its multi-step Stable Diffusion counterpart. Initially, we explore the quality-diversity trade-off between SwiftBrush and SD Turbo: the former excels in image diversity, while the latter excels in image quality. This observation motivates our proposed modifications in the training methodology, including better weight initialization and efficient LoRA training. Moreover, our introduction of a novel clamped CLIP loss enhances image-text alignment and results in improved image quality. Remarkably, by combining the weights of models trained with efficient LoRA and full training, we achieve a new state-of-the-art one-step diffusion model, achieving an FID of 8.14 and surpassing all GAN-based and multi-step Stable Diffusion models. The project page is available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.14176v2-abstract-full').style.display = 'none'; document.getElementById('2408.14176v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 27 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 26 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted to ECCV&#39;24</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.12772</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Symmetric masking strategy enhances the performance of Masked Image Modeling </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khanh-Binh Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Park%2C+C+J">Chae Jung Park</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.12772v1-abstract-short" style="display: inline;"> Masked Image Modeling (MIM) is a technique in self-supervised learning that focuses on acquiring detailed visual representations from unlabeled images by estimating the missing pixels in randomly masked sections. It has proven to be a powerful tool for the preliminary training of Vision Transformers (ViTs), yielding impressive results across various tasks. 
Nevertheless, most MIM methods heavily de&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.12772v1-abstract-full').style.display = 'inline'; document.getElementById('2408.12772v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.12772v1-abstract-full" style="display: none;"> Masked Image Modeling (MIM) is a technique in self-supervised learning that focuses on acquiring detailed visual representations from unlabeled images by estimating the missing pixels in randomly masked sections. It has proven to be a powerful tool for the preliminary training of Vision Transformers (ViTs), yielding impressive results across various tasks. Nevertheless, most MIM methods heavily depend on the random masking strategy to formulate the pretext task. This strategy necessitates numerous trials to ascertain the optimal dropping ratio, which can be resource-intensive, requiring the model to be pre-trained for anywhere between 800 to 1600 epochs. Furthermore, this approach may not be suitable for all datasets. In this work, we propose a new masking strategy that effectively helps the model capture global and local features. Based on this masking strategy, SymMIM, our proposed training pipeline for MIM is introduced. SymMIM achieves a new SOTA accuracy of 85.9\% on ImageNet using ViT-Large and surpasses previous SOTA across downstream tasks such as image classification, semantic segmentation, object detection, instance segmentation tasks, and so on. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.12772v1-abstract-full').style.display = 'none'; document.getElementById('2408.12772v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 22 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted at ICPR 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.11747</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Open-Ended 3D Point Cloud Instance Segmentation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+P+D+A">Phuc D. A. 
Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Luu%2C+M">Minh Luu</a>, <a href="/search/cs?searchtype=author&amp;query=Tran%2C+A">Anh Tran</a>, <a href="/search/cs?searchtype=author&amp;query=Pham%2C+C">Cuong Pham</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khoi Nguyen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.11747v1-abstract-short" style="display: inline;"> Open-Vocab 3D Instance Segmentation methods (OV-3DIS) have recently demonstrated their ability to generalize to unseen objects. However, these methods still depend on predefined class names during testing, restricting the autonomy of agents. To mitigate this constraint, we propose a novel problem termed Open-Ended 3D Instance Segmentation (OE-3DIS), which eliminates the necessity for predefined cl&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.11747v1-abstract-full').style.display = 'inline'; document.getElementById('2408.11747v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.11747v1-abstract-full" style="display: none;"> Open-Vocab 3D Instance Segmentation methods (OV-3DIS) have recently demonstrated their ability to generalize to unseen objects. However, these methods still depend on predefined class names during testing, restricting the autonomy of agents. To mitigate this constraint, we propose a novel problem termed Open-Ended 3D Instance Segmentation (OE-3DIS), which eliminates the necessity for predefined class names during testing. Moreover, we contribute a comprehensive set of strong baselines, derived from OV-3DIS approaches and leveraging 2D Multimodal Large Language Models. 
To assess the performance of our OE-3DIS system, we introduce a novel Open-Ended score, evaluating both the semantic and geometric quality of predicted masks and their associated class names, alongside the standard AP score. Our approach demonstrates significant performance improvements over the baselines on the ScanNet200 and ScanNet++ datasets. Remarkably, our method surpasses the performance of Open3DIS, the current state-of-the-art method in OV-3DIS, even in the absence of ground-truth object class names. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.11747v1-abstract-full').style.display = 'none'; document.getElementById('2408.11747v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.11559</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Pham%2C+D">Duc-Hai Pham</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+D+D">Duc Dung Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Pham%2C+H">Hoang-Anh Pham</a>, <a href="/search/cs?searchtype=author&amp;query=Tuan%2C+H+L">Ho Lai Tuan</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+P+H">Phong Ha Nguyen</a>, <a 
href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khoi Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+R">Rang Nguyen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.11559v2-abstract-short" style="display: inline;"> Accurate prediction of 3D semantic occupancy from 2D visual images is vital in enabling autonomous agents to comprehend their surroundings for planning and navigation. State-of-the-art methods typically employ fully supervised approaches, necessitating a huge labeled dataset acquired through expensive LiDAR sensors and meticulous voxel-wise labeling by human annotators. The resource-intensive natu&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.11559v2-abstract-full').style.display = 'inline'; document.getElementById('2408.11559v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.11559v2-abstract-full" style="display: none;"> Accurate prediction of 3D semantic occupancy from 2D visual images is vital in enabling autonomous agents to comprehend their surroundings for planning and navigation. State-of-the-art methods typically employ fully supervised approaches, necessitating a huge labeled dataset acquired through expensive LiDAR sensors and meticulous voxel-wise labeling by human annotators. The resource-intensive nature of this annotating process significantly hampers the application and scalability of these methods. We introduce a novel semi-supervised framework to alleviate the dependency on densely annotated data. Our approach leverages 2D foundation models to generate essential 3D scene geometric and semantic cues, facilitating a more efficient training process. 
Our framework exhibits notable properties: (1) Generalizability, applicable to various 3D semantic scene completion approaches, including 2D-3D lifting and 3D-2D transformer methods. (2) Effectiveness, as demonstrated through experiments on SemanticKITTI and NYUv2, wherein our method achieves up to 85% of the fully-supervised performance using only 10% labeled data. This approach not only reduces the cost and labor associated with data annotation but also demonstrates the potential for broader adoption in camera-based systems for 3D semantic occupancy prediction. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.11559v2-abstract-full').style.display = 'none'; document.getElementById('2408.11559v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 12 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 21 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.05822</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Sampling Foundational Transformer: A Theoretical Perspective </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+V+A">Viet Anh Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Lenhat%2C+M">Minh Lenhat</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khoa Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Hieu%2C+D+D">Duong Duc Hieu</a>, <a href="/search/cs?searchtype=author&amp;query=Hung%2C+D+H">Dao Huu Hung</a>, <a href="/search/cs?searchtype=author&amp;query=Hy%2C+T+S">Truong Son Hy</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.05822v2-abstract-short" style="display: inline;"> The versatility of self-attention mechanism earned transformers great success in almost all data modalities, with limitations on the quadratic complexity and difficulty of training. To apply transformers across different data modalities, practitioners have to make specific clever data-modality-dependent constructions. 
In this paper, we propose Sampling Foundational Transformer (SFT) that can work&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.05822v2-abstract-full').style.display = 'inline'; document.getElementById('2408.05822v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.05822v2-abstract-full" style="display: none;"> The versatility of self-attention mechanism earned transformers great success in almost all data modalities, with limitations on the quadratic complexity and difficulty of training. To apply transformers across different data modalities, practitioners have to make specific clever data-modality-dependent constructions. In this paper, we propose Sampling Foundational Transformer (SFT) that can work on multiple data modalities (e.g., point cloud, graph, and sequence) and constraints (e.g., rotational-invariant). The existence of such model is important as contemporary foundational modeling requires operability on multiple data sources. For efficiency on large number of tokens, our model relies on our context aware sampling-without-replacement mechanism for both linear asymptotic computational complexity and real inference time gain. For efficiency, we rely on our newly discovered pseudoconvex formulation of transformer layer to increase model&#39;s convergence rate. As a model working on multiple data modalities, SFT has achieved competitive results on many benchmarks, while being faster in inference, compared to other very specialized models. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.05822v2-abstract-full').style.display = 'none'; document.getElementById('2408.05822v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 17 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 11 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.05391</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> SAMSA: Efficient Transformer for Many Data Modalities </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Lenhat%2C+M">Minh Lenhat</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+V+A">Viet Anh Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khoa Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Hieu%2C+D+D">Duong Duc Hieu</a>, <a href="/search/cs?searchtype=author&amp;query=Hung%2C+D+H">Dao Huu Hung</a>, <a href="/search/cs?searchtype=author&amp;query=Hy%2C+T+S">Truong Son Hy</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.05391v2-abstract-short" style="display: inline;"> The versatility of self-attention mechanism earned transformers great success in almost all data modalities, with limitations on the quadratic complexity and difficulty of training. 
Efficient transformers, on the other hand, often rely on clever data-modality-dependent construction to get over the quadratic complexity of transformers. This greatly hinders their applications on different data modal&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.05391v2-abstract-full').style.display = 'inline'; document.getElementById('2408.05391v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.05391v2-abstract-full" style="display: none;"> The versatility of self-attention mechanism earned transformers great success in almost all data modalities, with limitations on the quadratic complexity and difficulty of training. Efficient transformers, on the other hand, often rely on clever data-modality-dependent construction to get over the quadratic complexity of transformers. This greatly hinders their applications on different data modalities, which is one of the pillars of contemporary foundational modeling. In this paper, we lay the groundwork for efficient foundational modeling by proposing SAMSA - SAMpling-Self-Attention, a context-aware linear complexity self-attention mechanism that works well on multiple data modalities. Our mechanism is based on a differentiable sampling without replacement method we discovered. This enables the self-attention module to attend to the most important token set, where the importance is defined by data. Moreover, as differentiability is not needed in inference, the sparse formulation of our method costs little time overhead, further lowering computational costs. In short, SAMSA achieved competitive or even SOTA results on many benchmarks, while being faster in inference, compared to other very specialized models. Against full self-attention, real inference time significantly decreases while performance ranges from negligible degradation to outperformance. 
We release our source code in the repository: <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.05391v2-abstract-full').style.display = 'none'; document.getElementById('2408.05391v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 18 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 9 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.02349</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Active Sensing of Knee Osteoarthritis Progression with Reinforcement Learning </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khanh Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+H+H">Huy Hoang Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Panfilov%2C+E">Egor Panfilov</a>, <a href="/search/cs?searchtype=author&amp;query=Tiulpin%2C+A">Aleksei Tiulpin</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.02349v3-abstract-short" style="display: inline;"> Osteoarthritis (OA) is the most common musculoskeletal disease, which has no cure. 
Knee OA (KOA) is one of the highest causes of disability worldwide, and it costs billions of United States dollars to the global community. Prediction of KOA progression has been of high interest to the community for years, as it can advance treatment development through more efficient clinical trials and improve pa&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.02349v3-abstract-full').style.display = 'inline'; document.getElementById('2408.02349v3-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.02349v3-abstract-full" style="display: none;"> Osteoarthritis (OA) is the most common musculoskeletal disease, which has no cure. Knee OA (KOA) is one of the highest causes of disability worldwide, and it costs billions of United States dollars to the global community. Prediction of KOA progression has been of high interest to the community for years, as it can advance treatment development through more efficient clinical trials and improve patient outcomes through more efficient healthcare utilization. Existing approaches for predicting KOA, however, are predominantly static, i.e. consider data from a single time point to predict progression many years into the future, and knee level, i.e. consider progression in a single joint only. Due to these and related reasons, these methods fail to deliver the level of predictive performance, which is sufficient to result in cost savings and better patient outcomes. Collecting extensive data from all patients on a regular basis could address the issue, but it is limited by the high cost at a population level. In this work, we propose to go beyond static prediction models in OA, and bring a novel Active Sensing (AS) approach, designed to dynamically follow up patients with the objective of maximizing the number of informative data acquisitions, while minimizing their total cost over a period of time. 
Our approach is based on Reinforcement Learning (RL), and it leverages a novel reward function designed specifically for AS of disease progression in more than one part of a human body. Our method is end-to-end, relies on multi-modal Deep Learning, and requires no human input at inference time. Throughout an exhaustive experimental evaluation, we show that using RL can provide a higher monetary benefit when compared to state-of-the-art baselines. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.02349v3-abstract-full').style.display = 'none'; document.getElementById('2408.02349v3-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 22 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 5 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.02115</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Cryptography and Security">cs.CR</span> </div> </div> <p class="title is-5 mathjax"> Assessing the XDC Network: A Comprehensive Evaluation of its qualitative and technical aspects </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Khekade%2C+A">Atul Khekade</a>, <a href="/search/cs?searchtype=author&amp;query=Mestry%2C+O">Omkar Mestry</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+V+K">Van Khanh Nguyen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.02115v1-abstract-short" style="display: inline;"> This research provides a thorough assessment of the XDC Network, a delegated proof of stake (XDPoS) consensus-based blockchain technology, across its technical, security, and business dimensions. The study evaluates the network&#39;s decentralization, scalability, and security features, including its Nakamoto coefficient, validator participation, and client distribution. Additionally, it examines the&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.02115v1-abstract-full').style.display = 'inline'; document.getElementById('2408.02115v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.02115v1-abstract-full" style="display: none;"> This research provides a thorough assessment of the XDC Network, a delegated proof of stake (XDPoS) consensus-based blockchain technology, across its technical, security, and business dimensions. 
The study evaluates the network&#39;s decentralization, scalability, and security features, including its Nakamoto coefficient, validator participation, and client distribution. Additionally, it examines the developer ecosystem, including GitHub metrics, and business aspects such as transaction costs and predictability. The findings of this research will provide valuable insights into the strengths and weaknesses of the XDC Network, informing stakeholders and decision-makers about its suitability for various use cases, particularly in trade finance, asset tokenization, and enterprise blockchain solutions. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.02115v1-abstract-full').style.display = 'none'; document.getElementById('2408.02115v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.01934</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> A Survey and Evaluation of Adversarial Attacks for Object Detection </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K+N+T">Khoi Nguyen Tiet Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+W">Wenyu Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Lu%2C+K">Kangkang Lu</a>, <a href="/search/cs?searchtype=author&amp;query=Wu%2C+Y">Yuhuan Wu</a>, <a href="/search/cs?searchtype=author&amp;query=Zheng%2C+X">Xingjian Zheng</a>, <a href="/search/cs?searchtype=author&amp;query=Tan%2C+H+L">Hui Li Tan</a>, <a href="/search/cs?searchtype=author&amp;query=Zhen%2C+L">Liangli Zhen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.01934v2-abstract-short" style="display: inline;"> Deep learning models excel in various computer vision tasks but are susceptible to adversarial examples-subtle perturbations in input data that lead to incorrect predictions. This vulnerability poses significant risks in safety-critical applications such as autonomous vehicles, security surveillance, and aircraft health monitoring. 
While numerous surveys focus on adversarial attacks in image class&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.01934v2-abstract-full').style.display = 'inline'; document.getElementById('2408.01934v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.01934v2-abstract-full" style="display: none;"> Deep learning models excel in various computer vision tasks but are susceptible to adversarial examples-subtle perturbations in input data that lead to incorrect predictions. This vulnerability poses significant risks in safety-critical applications such as autonomous vehicles, security surveillance, and aircraft health monitoring. While numerous surveys focus on adversarial attacks in image classification, the literature on such attacks in object detection is limited. This paper offers a comprehensive taxonomy of adversarial attacks specific to object detection, reviews existing adversarial robustness evaluation metrics, and systematically assesses open-source attack methods and model robustness. Key observations are provided to enhance the understanding of attack effectiveness and corresponding countermeasures. Additionally, we identify crucial research challenges to guide future efforts in securing automated object detection systems. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.01934v2-abstract-full').style.display = 'none'; document.getElementById('2408.01934v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 4 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">14 pages</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.01026</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> PINNs for Medical Image Analysis: A Survey </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Banerjee%2C+C">Chayan Banerjee</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Kien Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Salvado%2C+O">Olivier Salvado</a>, <a href="/search/cs?searchtype=author&amp;query=Tran%2C+T">Truyen Tran</a>, <a href="/search/cs?searchtype=author&amp;query=Fookes%2C+C">Clinton Fookes</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.01026v1-abstract-short" style="display: inline;"> The incorporation of physical information in machine learning frameworks is transforming medical image analysis (MIA). By integrating fundamental knowledge and governing physical laws, these models achieve enhanced robustness and interpretability. 
In this work, we explore the utility of physics-informed approaches for MIA (PIMIA) tasks such as registration, generation, classification, and reconstr&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.01026v1-abstract-full').style.display = 'inline'; document.getElementById('2408.01026v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.01026v1-abstract-full" style="display: none;"> The incorporation of physical information in machine learning frameworks is transforming medical image analysis (MIA). By integrating fundamental knowledge and governing physical laws, these models achieve enhanced robustness and interpretability. In this work, we explore the utility of physics-informed approaches for MIA (PIMIA) tasks such as registration, generation, classification, and reconstruction. We present a systematic literature review of over 80 papers on physics-informed methods dedicated to MIA. We propose a unified taxonomy to investigate what physics knowledge and processes are modelled, how they are represented, and the strategies to incorporate them into MIA models. We delve deep into a wide range of image analysis tasks, from imaging, generation, prediction, inverse imaging (super-resolution and reconstruction), registration, and image analysis (segmentation and classification). For each task, we thoroughly examine and present in a tabular format the central physics-guided operation, the region of interest (with respect to human anatomy), the corresponding imaging modality, the dataset used for model training, the deep network architecture employed, and the primary physical process, equation, or principle utilized. Additionally, we also introduce a novel metric to compare the performance of PIMIA methods across different tasks and datasets. 
Based on this review, we summarize and distil our perspectives on the challenges, open research questions, and directions for future research. We highlight key open challenges in PIMIA, including selecting suitable physics priors and establishing a standardized benchmarking platform. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.01026v1-abstract-full').style.display = 'none'; document.getElementById('2408.01026v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 2 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.00118</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Gemma 2: Improving Open Language Models at a Practical Size </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Gemma+Team"> Gemma Team</a>, <a href="/search/cs?searchtype=author&amp;query=Riviere%2C+M">Morgane Riviere</a>, <a href="/search/cs?searchtype=author&amp;query=Pathak%2C+S">Shreya Pathak</a>, <a href="/search/cs?searchtype=author&amp;query=Sessa%2C+P+G">Pier Giuseppe Sessa</a>, <a href="/search/cs?searchtype=author&amp;query=Hardin%2C+C">Cassidy Hardin</a>, <a href="/search/cs?searchtype=author&amp;query=Bhupatiraju%2C+S">Surya Bhupatiraju</a>, <a href="/search/cs?searchtype=author&amp;query=Hussenot%2C+L">Léonard Hussenot</a>, <a 
href="/search/cs?searchtype=author&amp;query=Mesnard%2C+T">Thomas Mesnard</a>, <a href="/search/cs?searchtype=author&amp;query=Shahriari%2C+B">Bobak Shahriari</a>, <a href="/search/cs?searchtype=author&amp;query=Ram%C3%A9%2C+A">Alexandre Ramé</a>, <a href="/search/cs?searchtype=author&amp;query=Ferret%2C+J">Johan Ferret</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+P">Peter Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Tafti%2C+P">Pouya Tafti</a>, <a href="/search/cs?searchtype=author&amp;query=Friesen%2C+A">Abe Friesen</a>, <a href="/search/cs?searchtype=author&amp;query=Casbon%2C+M">Michelle Casbon</a>, <a href="/search/cs?searchtype=author&amp;query=Ramos%2C+S">Sabela Ramos</a>, <a href="/search/cs?searchtype=author&amp;query=Kumar%2C+R">Ravin Kumar</a>, <a href="/search/cs?searchtype=author&amp;query=Lan%2C+C+L">Charline Le Lan</a>, <a href="/search/cs?searchtype=author&amp;query=Jerome%2C+S">Sammy Jerome</a>, <a href="/search/cs?searchtype=author&amp;query=Tsitsulin%2C+A">Anton Tsitsulin</a>, <a href="/search/cs?searchtype=author&amp;query=Vieillard%2C+N">Nino Vieillard</a>, <a href="/search/cs?searchtype=author&amp;query=Stanczyk%2C+P">Piotr Stanczyk</a>, <a href="/search/cs?searchtype=author&amp;query=Girgin%2C+S">Sertan Girgin</a>, <a href="/search/cs?searchtype=author&amp;query=Momchev%2C+N">Nikola Momchev</a>, <a href="/search/cs?searchtype=author&amp;query=Hoffman%2C+M">Matt Hoffman</a> , et al. (173 additional authors not shown) </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.00118v3-abstract-short" style="display: inline;"> In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. 
In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.00118v3-abstract-full').style.display = 'inline'; document.getElementById('2408.00118v3-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.00118v3-abstract-full" style="display: none;"> In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.00118v3-abstract-full').style.display = 'none'; document.getElementById('2408.00118v3-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 2 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 31 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.21054</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Sound">cs.SD</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Audio and Speech Processing">eess.AS</span> </div> </div> <p class="title is-5 mathjax"> Sentiment Reasoning for Healthcare </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khai-Nguyen Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Le-Duc%2C+K">Khai Le-Duc</a>, <a href="/search/cs?searchtype=author&amp;query=Tat%2C+B+P">Bach Phan Tat</a>, <a href="/search/cs?searchtype=author&amp;query=Le%2C+D">Duy Le</a>, <a href="/search/cs?searchtype=author&amp;query=Vo-Dang%2C+L">Long Vo-Dang</a>, <a href="/search/cs?searchtype=author&amp;query=Hy%2C+T">Truong-Son Hy</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.21054v3-abstract-short" style="display: inline;"> Transparency in AI healthcare decision-making is crucial for building trust among AI and users. Incorporating reasoning capabilities enables Large Language Models (LLMs) to understand emotions in context, handle nuanced language, and infer unstated sentiments. 
In this work, we introduce a new task -- Sentiment Reasoning -- for both speech and text modalities, along with our proposed multimodal mul&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.21054v3-abstract-full').style.display = 'inline'; document.getElementById('2407.21054v3-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.21054v3-abstract-full" style="display: none;"> Transparency in AI healthcare decision-making is crucial for building trust among AI and users. Incorporating reasoning capabilities enables Large Language Models (LLMs) to understand emotions in context, handle nuanced language, and infer unstated sentiments. In this work, we introduce a new task -- Sentiment Reasoning -- for both speech and text modalities, along with our proposed multimodal multitask framework and dataset. Sentiment Reasoning is an auxiliary task in sentiment analysis where the model predicts both the sentiment label and generates the rationale behind it based on the input transcript. Our study conducted on both human transcripts and Automatic Speech Recognition (ASR) transcripts shows that Sentiment Reasoning helps improve model transparency by providing rationale for model prediction with quality semantically comparable to humans while also improving model performance (1% increase in both accuracy and macro-F1) via rationale-augmented fine-tuning. Also, no significant difference in the semantic quality of generated rationales between human and ASR transcripts. 
All code, data (English-translated and Vietnamese) and models are published online: <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.21054v3-abstract-full').style.display = 'none'; document.getElementById('2407.21054v3-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 11 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 24 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">NeurIPS AIM-FM Workshop, 20 pages</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.12240</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Adaptive Cascading Network for Continual Test-Time Adaptation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K+X">Kien X. 
Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Qiao%2C+F">Fengchun Qiao</a>, <a href="/search/cs?searchtype=author&amp;query=Peng%2C+X">Xi Peng</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.12240v2-abstract-short" style="display: inline;"> We study the problem of continual test-time adaption where the goal is to adapt a source pre-trained model to a sequence of unlabelled target domains at test time. Existing methods on test-time training suffer from several limitations: (1) Mismatch between the feature extractor and classifier; (2) Interference between the main and self-supervised tasks; (3) Lack of the ability to quickly adapt to&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.12240v2-abstract-full').style.display = 'inline'; document.getElementById('2407.12240v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.12240v2-abstract-full" style="display: none;"> We study the problem of continual test-time adaption where the goal is to adapt a source pre-trained model to a sequence of unlabelled target domains at test time. Existing methods on test-time training suffer from several limitations: (1) Mismatch between the feature extractor and classifier; (2) Interference between the main and self-supervised tasks; (3) Lack of the ability to quickly adapt to the current distribution. In light of these challenges, we propose a cascading paradigm that simultaneously updates the feature extractor and classifier at test time, mitigating the mismatch between them and enabling long-term model adaptation. 
The pre-training of our model is structured within a meta-learning framework, thereby minimizing the interference between the main and self-supervised tasks and encouraging fast adaptation in the presence of limited unlabelled data. Additionally, we introduce innovative evaluation metrics, average accuracy and forward transfer, to effectively measure the model&#39;s adaptation capabilities in dynamic, real-world scenarios. Extensive experiments and ablation studies demonstrate the superiority of our approach in a range of tasks including image classification, text classification, and speech recognition. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.12240v2-abstract-full').style.display = 'none'; document.getElementById('2407.12240v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 1 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 16 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">ACM Class:</span> I.5.1; I.5.2 </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.09828</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Enhancing Semantic Segmentation with Adaptive Focal Loss: A Novel Approach </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Islam%2C+M+R">Md Rakibul Islam</a>, <a href="/search/cs?searchtype=author&amp;query=Hassan%2C+R">Riad Hassan</a>, <a href="/search/cs?searchtype=author&amp;query=Nazib%2C+A">Abdullah Nazib</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Kien Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Fookes%2C+C">Clinton Fookes</a>, <a href="/search/cs?searchtype=author&amp;query=Islam%2C+M+Z">Md Zahidul Islam</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.09828v1-abstract-short" style="display: inline;"> Deep learning has achieved outstanding accuracy in medical image segmentation, particularly for objects like organs or tumors with smooth boundaries or large sizes. Whereas, it encounters significant difficulties with objects that have zigzag boundaries or are small in size, leading to a notable decrease in segmentation effectiveness. 
In this context, using a loss function that incorporates smooth&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.09828v1-abstract-full').style.display = 'inline'; document.getElementById('2407.09828v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.09828v1-abstract-full" style="display: none;"> Deep learning has achieved outstanding accuracy in medical image segmentation, particularly for objects like organs or tumors with smooth boundaries or large sizes. Whereas, it encounters significant difficulties with objects that have zigzag boundaries or are small in size, leading to a notable decrease in segmentation effectiveness. In this context, using a loss function that incorporates smoothness and volume information into a model&#39;s predictions offers a promising solution to these shortcomings. In this work, we introduce an Adaptive Focal Loss (A-FL) function designed to mitigate class imbalance by down-weighting the loss for easy examples that results in up-weighting the loss for hard examples and giving greater emphasis to challenging examples, such as small and irregularly shaped objects. The proposed A-FL involves dynamically adjusting a focusing parameter based on an object&#39;s surface smoothness, size information, and adjusting the class balancing parameter based on the ratio of targeted area to total area in an image. We evaluated the performance of the A-FL using ResNet50-encoded U-Net architecture on the Picai 2022 and BraTS 2018 datasets. On the Picai 2022 dataset, the A-FL achieved an Intersection over Union (IoU) of 0.696 and a Dice Similarity Coefficient (DSC) of 0.769, outperforming the regular Focal Loss (FL) by 5.5% and 5.4% respectively. It also surpassed the best baseline Dice-Focal by 2.0% and 1.2%. On the BraTS 2018 dataset, A-FL achieved an IoU of 0.883 and a DSC of 0.931. 
The comparative studies show that the proposed A-FL function surpasses conventional methods, including Dice Loss, Focal Loss, and their hybrid variants, in IoU, DSC, Sensitivity, and Specificity metrics. This work highlights A-FL&#39;s potential to improve deep learning models for segmenting clinically significant regions in medical images, leading to more precise and reliable diagnostic tools. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.09828v1-abstract-full').style.display = 'none'; document.getElementById('2407.09828v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">15 pages, 4 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.02004</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Sound">cs.SD</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Audio and Speech Processing">eess.AS</span> </div> </div> <p class="title is-5 mathjax"> SAVE: Segment Audio-Visual Easy way using Segment Anything Model </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khanh-Binh Nguyen</a>, <a 
href="/search/cs?searchtype=author&amp;query=Park%2C+C+J">Chae Jung Park</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.02004v2-abstract-short" style="display: inline;"> The primary aim of Audio-Visual Segmentation (AVS) is to precisely identify and locate auditory elements within visual scenes by accurately predicting segmentation masks at the pixel level. Achieving this involves comprehensively considering data and model aspects to address this task effectively. This study presents a lightweight approach, SAVE, which efficiently adapts the pre-trained segment an&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.02004v2-abstract-full').style.display = 'inline'; document.getElementById('2407.02004v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.02004v2-abstract-full" style="display: none;"> The primary aim of Audio-Visual Segmentation (AVS) is to precisely identify and locate auditory elements within visual scenes by accurately predicting segmentation masks at the pixel level. Achieving this involves comprehensively considering data and model aspects to address this task effectively. This study presents a lightweight approach, SAVE, which efficiently adapts the pre-trained segment anything model (SAM) to the AVS task. By incorporating an image encoder adapter into the transformer blocks to better capture the distinct dataset information and proposing a residual audio encoder adapter to encode the audio features as a sparse prompt, our proposed model achieves effective audio-visual fusion and interaction during the encoding stage. 
Our proposed method accelerates the training and inference speed by reducing the input resolution from 1024 to 256 pixels while achieving higher performance compared with the previous SOTA. Extensive experimentation validates our approach, demonstrating that our proposed model outperforms other SOTA methods significantly. Moreover, leveraging the pre-trained model on synthetic data enhances performance on real AVSBench data, achieving 84.59 mIoU on the S4 (V1S) subset and 70.28 mIoU on the MS3 (V1M) set with only 256 pixels for input images. This increases up to 86.16 mIoU on the S4 (V1S) and 70.83 mIoU on the MS3 (V1M) with inputs of 1024 pixels. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.02004v2-abstract-full').style.display = 'none'; document.getElementById('2407.02004v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 3 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 2 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.00609</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Pham%2C+Q+P+M">Quang P. M. 
Pham</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K+T+N">Khoi T. N. Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Ngo%2C+L+C">Lan C. Ngo</a>, <a href="/search/cs?searchtype=author&amp;query=Do%2C+T">Truong Do</a>, <a href="/search/cs?searchtype=author&amp;query=Hy%2C+T+S">Truong Son Hy</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.00609v1-abstract-short" style="display: inline;"> Scene graphs have been proven to be useful for various scene understanding tasks due to their compact and explicit nature. However, existing approaches often neglect the importance of maintaining the symmetry-preserving property when generating scene graphs from 3D point clouds. This oversight can diminish the accuracy and robustness of the resulting scene graphs, especially when handling noisy, m&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.00609v1-abstract-full').style.display = 'inline'; document.getElementById('2407.00609v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.00609v1-abstract-full" style="display: none;"> Scene graphs have been proven to be useful for various scene understanding tasks due to their compact and explicit nature. However, existing approaches often neglect the importance of maintaining the symmetry-preserving property when generating scene graphs from 3D point clouds. This oversight can diminish the accuracy and robustness of the resulting scene graphs, especially when handling noisy, multi-view 3D data. This work, to the best of our knowledge, is the first to implement an Equivariant Graph Neural Network in semantic scene graph generation from 3D point clouds for scene understanding. 
Our proposed method, ESGNN, outperforms existing state-of-the-art approaches, demonstrating a significant improvement in scene estimation with faster convergence. ESGNN demands low computational resources and is easy to implement from available frameworks, paving the way for real-time applications such as robotics and computer vision. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.00609v1-abstract-full').style.display = 'none'; document.getElementById('2407.00609v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 30 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.17716</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> ViANLI: Adversarial Natural Language Inference for Vietnamese </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Van+Huynh%2C+T">Tin Van Huynh</a>, <a href="/search/cs?searchtype=author&amp;query=Van+Nguyen%2C+K">Kiet Van Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+N+L">Ngan Luu-Thuy Nguyen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.17716v2-abstract-short" style="display: inline;"> The development of Natural Language Processing (NLI) datasets and models has been inspired by innovations in annotation design. 
With the rapid development of machine learning models today, the performance of existing machine learning models has quickly reached state-of-the-art results on a variety of tasks related to natural language processing, including natural language inference tasks. By using&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.17716v2-abstract-full').style.display = 'inline'; document.getElementById('2406.17716v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.17716v2-abstract-full" style="display: none;"> The development of Natural Language Processing (NLI) datasets and models has been inspired by innovations in annotation design. With the rapid development of machine learning models today, the performance of existing machine learning models has quickly reached state-of-the-art results on a variety of tasks related to natural language processing, including natural language inference tasks. By using a pre-trained model during the annotation process, it is possible to challenge current NLI models by having humans produce premise-hypothesis combinations that the machine model cannot correctly predict. To remain attractive and challenging in the research of natural language inference for Vietnamese, in this paper, we introduce the adversarial NLI dataset to the NLP research community with the name ViANLI. This data set contains more than 10K premise-hypothesis pairs and is built by a continuously adjusting process to obtain the most out of the patterns generated by the annotators. ViANLI dataset has brought many difficulties to many current SOTA models when the accuracy of the most powerful model on the test set only reached 48.4%. Additionally, the experimental results show that the models trained on our dataset have significantly improved the results on other Vietnamese NLI datasets. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.17716v2-abstract-full').style.display = 'none'; document.getElementById('2406.17716v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 1 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 25 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.15888</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Sound">cs.SD</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Audio and Speech Processing">eess.AS</span> </div> </div> <p class="title is-5 mathjax"> Real-time Speech Summarization for Medical Conversations </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Le-Duc%2C+K">Khai Le-Duc</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khai-Nguyen Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Vo-Dang%2C+L">Long Vo-Dang</a>, <a href="/search/cs?searchtype=author&amp;query=Hy%2C+T">Truong-Son Hy</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" 
id="2406.15888v1-abstract-short" style="display: inline;"> In doctor-patient conversations, identifying medically relevant information is crucial, posing the need for conversation summarization. In this work, we propose the first deployable real-time speech summarization system for real-world applications in industry, which generates a local summary after every N speech utterances within a conversation and a global summary after the end of a conversation.&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.15888v1-abstract-full').style.display = 'inline'; document.getElementById('2406.15888v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.15888v1-abstract-full" style="display: none;"> In doctor-patient conversations, identifying medically relevant information is crucial, posing the need for conversation summarization. In this work, we propose the first deployable real-time speech summarization system for real-world applications in industry, which generates a local summary after every N speech utterances within a conversation and a global summary after the end of a conversation. Our system could enhance user experience from a business standpoint, while also reducing computational costs from a technical perspective. Secondly, we present VietMed-Sum which, to our knowledge, is the first speech summarization dataset for medical conversations. Thirdly, we are the first to utilize LLM and human annotators collaboratively to create gold standard and synthetic summaries for medical conversation summarization. Finally, we present baseline results of state-of-the-art models on VietMed-Sum. 
All code, data (English-translated and Vietnamese) and models are available online: <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.15888v1-abstract-full').style.display = 'none'; document.getElementById('2406.15888v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 22 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Interspeech 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.14572</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Quantitative Methods">q-bio.QM</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Information Retrieval">cs.IR</span> </div> </div> <p class="title is-5 mathjax"> Bioptic -- A Target-Agnostic Potency-Based Small Molecules Search Engine </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Vinogradov%2C+V">Vlad Vinogradov</a>, <a href="/search/cs?searchtype=author&amp;query=Izmailov%2C+I">Ivan Izmailov</a>, <a href="/search/cs?searchtype=author&amp;query=Steshin%2C+S">Simon Steshin</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K+T">Kong T. 
Nguyen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.14572v3-abstract-short" style="display: inline;"> Recent successes in virtual screening have been made possible by large models and extensive chemical libraries. However, combining these elements is challenging: the larger the model, the more expensive it is to run, making ultra-large libraries unfeasible. To address this, we developed a target-agnostic, efficacy-based molecule search model, which allows us to find structurally dissimilar molecul&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.14572v3-abstract-full').style.display = 'inline'; document.getElementById('2406.14572v3-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.14572v3-abstract-full" style="display: none;"> Recent successes in virtual screening have been made possible by large models and extensive chemical libraries. However, combining these elements is challenging: the larger the model, the more expensive it is to run, making ultra-large libraries unfeasible. To address this, we developed a target-agnostic, efficacy-based molecule search model, which allows us to find structurally dissimilar molecules with similar biological activities. We used the best practices to design fast retrieval system, based on processor-optimized SIMD instructions, enabling us to screen the ultra-large 40B Enamine REAL library with 100\% recall rate. We extensively benchmarked our model and several state-of-the-art models for both speed performance and retrieval quality of novel molecules. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.14572v3-abstract-full').style.display = 'none'; document.getElementById('2406.14572v3-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 30 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 13 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.13337</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Audio and Speech Processing">eess.AS</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Sound">cs.SD</span> </div> </div> <p class="title is-5 mathjax"> Medical Spoken Named Entity Recognition </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Le-Duc%2C+K">Khai Le-Duc</a>, <a href="/search/cs?searchtype=author&amp;query=Thulke%2C+D">David Thulke</a>, <a href="/search/cs?searchtype=author&amp;query=Tran%2C+H">Hung-Phong Tran</a>, <a href="/search/cs?searchtype=author&amp;query=Vo-Dang%2C+L">Long Vo-Dang</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khai-Nguyen Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Hy%2C+T">Truong-Son Hy</a>, <a href="/search/cs?searchtype=author&amp;query=Schl%C3%BCter%2C+R">Ralf Schlüter</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis 
has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.13337v2-abstract-short" style="display: inline;"> Spoken Named Entity Recognition (NER) aims to extract named entities from speech and categorize them into types like person, location, organization, etc. In this work, we present VietMed-NER - the first spoken NER dataset in the medical domain. To the best of our knowledge, our real-world dataset is the largest spoken NER dataset in the world in terms of the number of entity types, featuring 18 dist&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.13337v2-abstract-full').style.display = 'inline'; document.getElementById('2406.13337v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.13337v2-abstract-full" style="display: none;"> Spoken Named Entity Recognition (NER) aims to extract named entities from speech and categorize them into types like person, location, organization, etc. In this work, we present VietMed-NER - the first spoken NER dataset in the medical domain. To the best of our knowledge, our real-world dataset is the largest spoken NER dataset in the world in terms of the number of entity types, featuring 18 distinct types. Secondly, we present baseline results using various state-of-the-art pre-trained models: encoder-only and sequence-to-sequence. We found that the pre-trained multilingual model XLM-R outperformed all monolingual models on both reference text and ASR output. In general, encoders also perform better than sequence-to-sequence models for the NER task. By simply translating, the transcript is applicable not just to Vietnamese but to other languages as well. 
All code, data and models are made publicly available here: <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.13337v2-abstract-full').style.display = 'none'; document.getElementById('2406.13337v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 20 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 19 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Preprint, 41 pages</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.10724</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Beyond the Visible: Jointly Attending to Spectral and Spatial Dimensions with HSI-Diffusion for the FINCH Spacecraft </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Vyse%2C+I">Ian Vyse</a>, <a href="/search/cs?searchtype=author&amp;query=Dagli%2C+R">Rishit Dagli</a>, <a href="/search/cs?searchtype=author&amp;query=Chadha%2C+D+V">Dav Vrat Chadha</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+J+P">John P. 
Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+H">Hector Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Ruparelia%2C+I">Isha Ruparelia</a>, <a href="/search/cs?searchtype=author&amp;query=Seran%2C+P">Prithvi Seran</a>, <a href="/search/cs?searchtype=author&amp;query=Xie%2C+M">Matthew Xie</a>, <a href="/search/cs?searchtype=author&amp;query=Aamer%2C+E">Eesa Aamer</a>, <a href="/search/cs?searchtype=author&amp;query=Armstrong%2C+A">Aidan Armstrong</a>, <a href="/search/cs?searchtype=author&amp;query=Black%2C+N">Naveen Black</a>, <a href="/search/cs?searchtype=author&amp;query=Borstein%2C+B">Ben Borstein</a>, <a href="/search/cs?searchtype=author&amp;query=Caldwell%2C+K">Kevin Caldwell</a>, <a href="/search/cs?searchtype=author&amp;query=Dahanaggamaarachchi%2C+O">Orrin Dahanaggamaarachchi</a>, <a href="/search/cs?searchtype=author&amp;query=Dai%2C+J">Joe Dai</a>, <a href="/search/cs?searchtype=author&amp;query=Fatima%2C+A">Abeer Fatima</a>, <a href="/search/cs?searchtype=author&amp;query=Lu%2C+S">Stephanie Lu</a>, <a href="/search/cs?searchtype=author&amp;query=Michet%2C+M">Maxime Michet</a>, <a href="/search/cs?searchtype=author&amp;query=Paul%2C+A">Anoushka Paul</a>, <a href="/search/cs?searchtype=author&amp;query=Po%2C+C+A">Carrie Ann Po</a>, <a href="/search/cs?searchtype=author&amp;query=Prakash%2C+S">Shivesh Prakash</a>, <a href="/search/cs?searchtype=author&amp;query=Prosser%2C+N">Noa Prosser</a>, <a href="/search/cs?searchtype=author&amp;query=Roy%2C+R">Riddhiman Roy</a>, <a href="/search/cs?searchtype=author&amp;query=Shinjo%2C+M">Mirai Shinjo</a>, <a href="/search/cs?searchtype=author&amp;query=Shofman%2C+I">Iliya Shofman</a> , et al. 
(4 additional authors not shown) </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.10724v1-abstract-short" style="display: inline;"> Satellite remote sensing missions have gained popularity over the past fifteen years due to their ability to cover large swaths of land at regular intervals, making them ideal for monitoring environmental trends. The FINCH mission, a 3U+ CubeSat equipped with a hyperspectral camera, aims to monitor crop residue cover in agricultural fields. Although hyperspectral imaging captures both spectral and&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.10724v1-abstract-full').style.display = 'inline'; document.getElementById('2406.10724v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.10724v1-abstract-full" style="display: none;"> Satellite remote sensing missions have gained popularity over the past fifteen years due to their ability to cover large swaths of land at regular intervals, making them ideal for monitoring environmental trends. The FINCH mission, a 3U+ CubeSat equipped with a hyperspectral camera, aims to monitor crop residue cover in agricultural fields. Although hyperspectral imaging captures both spectral and spatial information, it is prone to various types of noise, including random noise, stripe noise, and dead pixels. Effective denoising of these images is crucial for downstream scientific tasks. Traditional methods, including hand-crafted techniques encoding strong priors, learned 2D image denoising methods applied across different hyperspectral bands, or diffusion generative models applied independently on bands, often struggle with varying noise strengths across spectral bands, leading to significant spectral distortion. 
This paper presents a novel approach to hyperspectral image denoising using latent diffusion models that integrate spatial and spectral information. We particularly do so by building a 3D diffusion model and presenting a 3-stage training approach on real and synthetically crafted datasets. The proposed method preserves image structure while reducing noise. Evaluations on both popular hyperspectral denoising datasets and synthetically crafted datasets for the FINCH mission demonstrate the effectiveness of this approach. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.10724v1-abstract-full').style.display = 'none'; document.getElementById('2406.10724v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 15 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">To appear in 38th Annual Small Satellite Conference</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.03413</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> UnWave-Net: Unrolled Wavelet Network for Compton Tomography Image Reconstruction </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ayad%2C+I">Ishak Ayad</a>, <a 
href="/search/cs?searchtype=author&amp;query=Tarpau%2C+C">Cécilia Tarpau</a>, <a href="/search/cs?searchtype=author&amp;query=Cebeiro%2C+J">Javier Cebeiro</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+M+K">Maï K. Nguyen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.03413v1-abstract-short" style="display: inline;"> Computed tomography (CT) is a widely used medical imaging technique to scan internal structures of a body, typically involving collimation and mechanical rotation. Compton scatter tomography (CST) presents an interesting alternative to conventional CT by leveraging Compton physics instead of collimation to gather information from multiple directions. While CST introduces new imaging opportunities&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.03413v1-abstract-full').style.display = 'inline'; document.getElementById('2406.03413v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.03413v1-abstract-full" style="display: none;"> Computed tomography (CT) is a widely used medical imaging technique to scan internal structures of a body, typically involving collimation and mechanical rotation. Compton scatter tomography (CST) presents an interesting alternative to conventional CT by leveraging Compton physics instead of collimation to gather information from multiple directions. While CST introduces new imaging opportunities with several advantages such as high sensitivity, compactness, and entirely fixed systems, image reconstruction remains an open problem due to the mathematical challenges of CST modeling. In contrast, deep unrolling networks have demonstrated potential in CT image reconstruction, despite their computationally intensive nature. 
In this study, we investigate the efficiency of unrolling networks for CST image reconstruction. To address the important computational cost required for training, we propose UnWave-Net, a novel unrolled wavelet-based reconstruction network. This architecture includes a non-local regularization term based on wavelets, which captures long-range dependencies within images and emphasizes the multi-scale components of the wavelet transform. We evaluate our approach using a CST of circular geometry which stays completely static during data acquisition, where UnWave-Net facilitates image reconstruction in the absence of a specific reconstruction formula. Our method outperforms existing approaches and achieves state-of-the-art performance in terms of SSIM and PSNR, and offers an improved computational efficiency compared to traditional unrolling networks. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.03413v1-abstract-full').style.display = 'none'; document.getElementById('2406.03413v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">This paper has been early accepted by MICCAI 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.17002</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Van+Nguyen%2C+Q">Quan Van Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Pham%2C+H+Q">Huy Quang Pham</a>, <a href="/search/cs?searchtype=author&amp;query=Tran%2C+D+Q">Dan Quang Tran</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+T+K">Thang Kien-Bao Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen-Dang%2C+N">Nhat-Hao Nguyen-Dang</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen-Tat%2C+B">Bao-Thien Nguyen-Tat</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.17002v2-abstract-short" style="display: inline;"> Purpose: This study focuses on the development of automated text generation from radiology images, termed diagnostic captioning, to assist medical professionals in reducing clinical errors and improving productivity. The aim is to provide tools that enhance report quality and efficiency, which can significantly impact both clinical practice and deep learning research in the biomedical field. 
Metho&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.17002v2-abstract-full').style.display = 'inline'; document.getElementById('2405.17002v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.17002v2-abstract-full" style="display: none;"> Purpose: This study focuses on the development of automated text generation from radiology images, termed diagnostic captioning, to assist medical professionals in reducing clinical errors and improving productivity. The aim is to provide tools that enhance report quality and efficiency, which can significantly impact both clinical practice and deep learning research in the biomedical field. Methods: In our participation in the ImageCLEFmedical2024 Caption evaluation campaign, we explored caption prediction tasks using advanced Transformer-based models. We developed methods incorporating Transformer encoder-decoder and Query Transformer architectures. These models were trained and evaluated to generate diagnostic captions from radiology images. Results: Experimental evaluations demonstrated the effectiveness of our models, with the VisionDiagnostor-BioBART model achieving the highest BERTScore of 0.6267. This performance contributed to our team, DarkCow, achieving third place on the leaderboard. Conclusion: Our diagnostic captioning models show great promise in aiding medical professionals by generating high-quality reports efficiently. This approach can facilitate better data processing and performance optimization in medical imaging departments, ultimately benefiting healthcare delivery. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.17002v2-abstract-full').style.display = 'none'; document.getElementById('2405.17002v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 27 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 27 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.15311</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Retro: Reusing teacher projection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khanh-Binh Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Park%2C+C+J">Chae Jung Park</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.15311v3-abstract-short" style="display: inline;"> Self-supervised learning (SSL) is gaining attention for its ability to learn effective representations with large amounts of unlabeled data. Lightweight models can be distilled from larger self-supervised pre-trained models using contrastive and consistency constraints. 
Still, the different sizes of the projection heads make it challenging for students to mimic the teacher&#39;s embedding accurately.&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.15311v3-abstract-full').style.display = 'inline'; document.getElementById('2405.15311v3-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.15311v3-abstract-full" style="display: none;"> Self-supervised learning (SSL) is gaining attention for its ability to learn effective representations with large amounts of unlabeled data. Lightweight models can be distilled from larger self-supervised pre-trained models using contrastive and consistency constraints. Still, the different sizes of the projection heads make it challenging for students to mimic the teacher&#39;s embedding accurately. We propose \textsc{Retro}, which reuses the teacher&#39;s projection head for students, and our experimental results demonstrate significant improvements over the state-of-the-art on all lightweight models. For instance, when training EfficientNet-B0 using ResNet-50/101/152 as teachers, our approach improves the linear result on ImageNet to $66.9\%$, $69.3\%$, and $69.8\%$, respectively, with significantly fewer parameters. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.15311v3-abstract-full').style.display = 'none'; document.getElementById('2405.15311v3-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 24 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 24 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted at BMVC 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.13160</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">stat.ML</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Borrowing Strength in Distributionally Robust Optimization via Hierarchical Dirichlet Processes </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Bariletto%2C+N">Nicola Bariletto</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khai Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Ho%2C+N">Nhat Ho</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.13160v1-abstract-short" style="display: inline;"> This paper presents a novel optimization framework to address key challenges presented by modern machine learning applications: High dimensionality, distributional uncertainty, and data heterogeneity. Our approach unifies regularized estimation, distributionally robust optimization (DRO), and hierarchical Bayesian modeling in a single data-driven criterion. 
By employing a hierarchical Dirichlet pr&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.13160v1-abstract-full').style.display = 'inline'; document.getElementById('2405.13160v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.13160v1-abstract-full" style="display: none;"> This paper presents a novel optimization framework to address key challenges presented by modern machine learning applications: High dimensionality, distributional uncertainty, and data heterogeneity. Our approach unifies regularized estimation, distributionally robust optimization (DRO), and hierarchical Bayesian modeling in a single data-driven criterion. By employing a hierarchical Dirichlet process (HDP) prior, the method effectively handles multi-source data, achieving regularization, distributional robustness, and borrowing strength across diverse yet related data-generating processes. We demonstrate the method&#39;s advantages by establishing theoretical performance guarantees and tractable Monte Carlo approximations based on Dirichlet process (DP) theory. Numerical experiments validate the framework&#39;s efficacy in improving and stabilizing both prediction and parameter estimation accuracy, showcasing its potential for application in complex data environments. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.13160v1-abstract-full').style.display = 'none'; document.getElementById('2405.13160v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.10084</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Audio and Speech Processing">eess.AS</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Sound">cs.SD</span> </div> </div> <p class="title is-5 mathjax"> Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Luong%2C+M">Manh Luong</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+K">Khai Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Ho%2C+N">Nhat Ho</a>, <a href="/search/cs?searchtype=author&amp;query=Haf%2C+R">Reza Haf</a>, <a href="/search/cs?searchtype=author&amp;query=Phung%2C+D">Dinh Phung</a>, <a href="/search/cs?searchtype=author&amp;query=Qu%2C+L">Lizhen Qu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.10084v1-abstract-short" style="display: inline;"> The Learning-to-match (LTM) framework proves to be an effective inverse optimal transport approach for learning the underlying ground metric between two sources of data, facilitating subsequent matching. However, the conventional LTM framework faces scalability challenges, necessitating the use of the entire dataset each time the parameters of the ground metric are updated. 
In adapting LTM to the&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.10084v1-abstract-full').style.display = 'inline'; document.getElementById('2405.10084v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.10084v1-abstract-full" style="display: none;"> The Learning-to-match (LTM) framework proves to be an effective inverse optimal transport approach for learning the underlying ground metric between two sources of data, facilitating subsequent matching. However, the conventional LTM framework faces scalability challenges, necessitating the use of the entire dataset each time the parameters of the ground metric are updated. In adapting LTM to the deep learning context, we introduce the mini-batch Learning-to-match (m-LTM) framework for audio-text retrieval problems. This framework leverages mini-batch subsampling and a Mahalanobis-enhanced family of ground metrics. Moreover, to cope with misaligned training data in practice, we propose a variant using partial optimal transport to mitigate the harm of misaligned data pairs in the training data. We conduct extensive experiments on audio-text matching problems using three datasets: AudioCaps, Clotho, and ESC-50. Results demonstrate that our proposed method is capable of learning a rich and expressive joint embedding space, which achieves SOTA performance. Beyond this, the proposed m-LTM framework is able to close the modality gap between audio and text embeddings, surpassing both triplet and contrastive losses in the zero-shot sound event detection task on the ESC-50 dataset. Notably, our strategy of employing partial optimal transport with m-LTM demonstrates greater noise tolerance than contrastive loss, especially under varying noise ratios in the training data on the AudioCaps dataset. 
Our code is available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.10084v1-abstract-full').style.display = 'none'; document.getElementById('2405.10084v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 16 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.07615</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> ViWikiFC: Fact-Checking for Vietnamese Wikipedia-Based Textual Knowledge Source </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Le%2C+H+T">Hung Tuan Le</a>, <a href="/search/cs?searchtype=author&amp;query=To%2C+L+T">Long Truong To</a>, <a href="/search/cs?searchtype=author&amp;query=Nguyen%2C+M+T">Manh Trong Nguyen</a>, <a href="/search/cs?searchtype=author&amp;query=Van+Nguyen%2C+K">Kiet Van Nguyen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.07615v1-abstract-short" style="display: inline;"> Fact-checking is essential due to the explosion of misinformation in the media ecosystem. Although false information exists in every language and country, most research to solve the problem mainly concentrated on huge communities like English and Chinese. Low-resource languages like Vietnamese are necessary to explore corpora and models for fact verification. 
To bridge this gap, we construct ViWik&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.07615v1-abstract-full').style.display = 'inline'; document.getElementById('2405.07615v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.07615v1-abstract-full" style="display: none;"> Fact-checking is essential due to the explosion of misinformation in the media ecosystem. Although false information exists in every language and country, most research to solve the problem mainly concentrated on huge communities like English and Chinese. Low-resource languages like Vietnamese are necessary to explore corpora and models for fact verification. To bridge this gap, we construct ViWikiFC, the first manually annotated open-domain corpus for Vietnamese Wikipedia fact-checking, with more than 20K claims generated by converting evidence sentences extracted from Wikipedia articles. We analyze our corpus through several linguistic aspects, including the new dependency rate, the new n-gram rate, and the new word rate. We conducted various experiments for Vietnamese fact-checking, including evidence retrieval and verdict prediction. BM25 and InfoXLM (Large) achieved the best results in the two tasks. In the evidence retrieval task, BM25 achieved an accuracy of 88.30% for SUPPORTS, 86.93% for REFUTES, and only 56.67% for the NEI label, while InfoXLM (Large) achieved an F1 score of 86.51%. Furthermore, we also evaluated a pipeline approach, which achieved a strict accuracy of only 67.00% when using InfoXLM (Large) and BM25. These results demonstrate that our dataset is challenging for Vietnamese language models in fact-checking tasks. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.07615v1-abstract-full').style.display = 'none'; document.getElementById('2405.07615v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. </p> </li> </ol> <nav class="pagination is-small is-centered breathe-horizontal" role="navigation" aria-label="pagination"> <a href="" class="pagination-previous is-invisible">Previous </a> <a href="/search/?searchtype=author&amp;query=Nguyen%2C+K&amp;start=50" class="pagination-next" >Next </a> <ul class="pagination-list"> <li> <a href="/search/?searchtype=author&amp;query=Nguyen%2C+K&amp;start=0" class="pagination-link 