class="pagination-ellipsis">&hellip;</span></li> </ul> </nav> <ol class="breathe-horizontal" start="1"> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.20256</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Do computer vision foundation models learn the low-level characteristics of the human visual system? </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yancheng Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Yin%2C+F">Fei Yin</a>, <a href="/search/cs?searchtype=author&amp;query=Hammou%2C+D">Dounia Hammou</a>, <a href="/search/cs?searchtype=author&amp;query=Mantiuk%2C+R">Rafal Mantiuk</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.20256v1-abstract-short" style="display: inline;"> Computer vision foundation models, such as DINO or OpenCLIP, are trained in a self-supervised manner on large image datasets. Analogously, substantial evidence suggests that the human visual system (HVS) is influenced by the statistical distribution of colors and patterns in the natural world, characteristics also present in the training data of foundation models. 
The question we address in this p&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.20256v1-abstract-full').style.display = 'inline'; document.getElementById('2502.20256v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.20256v1-abstract-full" style="display: none;"> Computer vision foundation models, such as DINO or OpenCLIP, are trained in a self-supervised manner on large image datasets. Analogously, substantial evidence suggests that the human visual system (HVS) is influenced by the statistical distribution of colors and patterns in the natural world, characteristics also present in the training data of foundation models. The question we address in this paper is whether foundation models trained on natural images mimic some of the low-level characteristics of the human visual system, such as contrast detection, contrast masking, and contrast constancy. Specifically, we designed a protocol comprising nine test types to evaluate the image encoders of 45 foundation and generative models. Our results indicate that some foundation models (e.g., DINO, DINOv2, and OpenCLIP) share some of the characteristics of human vision, but other models show little resemblance. Foundation models tend to show lower sensitivity to low contrast and rather irregular responses to contrast across frequencies. The foundation models show the best agreement with human data in terms of contrast masking. Our findings suggest that human vision and computer vision may take both similar and different paths when learning to interpret images of the real world. Overall, while differences remain, foundation models trained on vision tasks start to align with low-level human vision, with DINOv2 showing the closest resemblance. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.20256v1-abstract-full').style.display = 'none'; document.getElementById('2502.20256v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 27 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted by CVPR 2025</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.18801</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Robotics">cs.RO</span> </div> </div> <p class="title is-5 mathjax"> Efficient and Distributed Large-Scale Point Cloud Bundle Adjustment via Majorization-Minimization </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Li%2C+R">Rundong Li</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+Z">Zheng Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Wei%2C+H">Hairuo Wei</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yixi Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+H">Haotian Li</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+F">Fu Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.18801v1-abstract-short" style="display: inline;"> Point cloud bundle adjustment is critical in large-scale point cloud mapping. 
However, it is both computationally and memory intensive, with its complexity growing cubically as the number of scan poses increases. This paper presents BALM3.0, an efficient and distributed large-scale point cloud bundle adjustment method. The proposed method employs the majorization-minimization algorithm to decouple&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.18801v1-abstract-full').style.display = 'inline'; document.getElementById('2502.18801v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.18801v1-abstract-full" style="display: none;"> Point cloud bundle adjustment is critical in large-scale point cloud mapping. However, it is both computationally and memory intensive, with its complexity growing cubically as the number of scan poses increases. This paper presents BALM3.0, an efficient and distributed large-scale point cloud bundle adjustment method. The proposed method employs the majorization-minimization algorithm to decouple the scan poses in the bundle adjustment process, which allows point cloud bundle adjustment to be performed on large-scale data with improved computational efficiency. The key difficulty in applying majorization-minimization to bundle adjustment is identifying a proper surrogate cost function. In this paper, the proposed surrogate cost function is based on the point-to-plane distance. The primary advantages of decoupling the scan poses via a majorization-minimization algorithm stem from two key aspects. First, the decoupling of scan poses reduces the optimization time complexity from cubic to linear, significantly enhancing the computational efficiency of the bundle adjustment process in large-scale environments. Second, it lays the theoretical foundation for distributed bundle adjustment. 
By distributing both data and computation across multiple devices, this approach helps overcome the limitations posed by large memory and computational requirements, which may be difficult for a single device to handle. The proposed method is extensively evaluated in both simulated and real-world environments. The results demonstrate that the proposed method achieves the same optimal residual with comparable accuracy while offering up to 704 times faster optimization and reducing memory usage to 1/8. Furthermore, this paper presents and implements a distributed bundle adjustment framework, which successfully optimizes large-scale data (21,436 poses with 70 GB of point clouds) on four consumer-level laptops. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.18801v1-abstract-full').style.display = 'none'; document.getElementById('2502.18801v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 25 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.18072</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Robotics">cs.RO</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Multiagent Systems">cs.MA</span> </div> </div> <p class="title is-5 mathjax"> MRBTP: Efficient Multi-Robot Behavior Tree Planning and Collaboration </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yishuai Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+X">Xinglin Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Z">Zhongxuan Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Mao%2C+Y">Yunxin Mao</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+M">Minglong Li</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+W">Wenjing Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+J">Ji Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.18072v1-abstract-short" style="display: inline;"> Multi-robot task planning and collaboration are critical challenges in robotics. While Behavior Trees (BTs) have been established as a popular control architecture and are plannable for a single robot, the development of effective multi-robot BT planning algorithms remains challenging due to the complexity of coordinating diverse action spaces. 
We propose the Multi-Robot Behavior Tree Planning (MR&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.18072v1-abstract-full').style.display = 'inline'; document.getElementById('2502.18072v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.18072v1-abstract-full" style="display: none;"> Multi-robot task planning and collaboration are critical challenges in robotics. While Behavior Trees (BTs) have been established as a popular control architecture and are plannable for a single robot, the development of effective multi-robot BT planning algorithms remains challenging due to the complexity of coordinating diverse action spaces. We propose the Multi-Robot Behavior Tree Planning (MRBTP) algorithm, with theoretical guarantees of both soundness and completeness. MRBTP features cross-tree expansion to coordinate heterogeneous actions across different BTs to achieve the team&#39;s goal. For homogeneous actions, we retain backup structures among BTs to ensure robustness and prevent redundant execution through intention sharing. While MRBTP is capable of generating BTs for both homogeneous and heterogeneous robot teams, its efficiency can be further improved. We then propose an optional plugin for MRBTP that, when Large Language Models (LLMs) are available, uses them to reason about the goal-related actions of each robot. These relevant actions can be pre-planned to form long-horizon subtrees, significantly enhancing the planning speed and collaboration efficiency of MRBTP. We evaluate our algorithm in warehouse management and everyday service scenarios. Results demonstrate MRBTP&#39;s robustness and execution efficiency under varying settings, as well as the ability of the pre-trained LLM to generate effective task-specific subtrees for MRBTP. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.18072v1-abstract-full').style.display = 'none'; document.getElementById('2502.18072v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 25 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.17809</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Science and Game Theory">cs.GT</span> </div> </div> <p class="title is-5 mathjax"> Information Disclosure Makes Simple Mechanisms Competitive </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yang Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+Y">Yingkai Li</a>, <a href="/search/cs?searchtype=author&amp;query=Wu%2C+J">Jinzhao Wu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.17809v1-abstract-short" style="display: inline;"> In classical mechanism design, the prevailing assumption is that the information structure about agents&#39; types is exogenous. This assumption introduces complexity, especially with multi-dimensional agent types, leading to mechanisms that, while optimal, may appear complex and unnatural. 
Furthermore, Hart and Nisan (2019) show that the gap between the performance of any simple mechanism and the opt&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.17809v1-abstract-full').style.display = 'inline'; document.getElementById('2502.17809v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.17809v1-abstract-full" style="display: none;"> In classical mechanism design, the prevailing assumption is that the information structure about agents&#39; types is exogenous. This assumption introduces complexity, especially with multi-dimensional agent types, leading to mechanisms that, while optimal, may appear complex and unnatural. Furthermore, Hart and Nisan (2019) show that the gap between the performance of any simple mechanism and the optimal solution can be unbounded. We challenge this conventional view by showing that simple mechanisms can be highly competitive if the information structure is endogenous and can be influenced by the designer. We study a multi-dimensional generalization of a single-dimensional model proposed by Bergemann and Pesendorfer (2007), where the designer can shape the information structure via information disclosure. Specifically, we consider a fundamental multi-dimensional mechanism design problem, where a seller is selling m items to a single unit-demand buyer to maximize her revenue. The buyer&#39;s values can be arbitrarily correlated across the items. Our main result shows that, following an appropriately chosen information disclosure scheme, item pricing (i.e., setting a take-it-or-leave-it price on each item) is highly competitive and is guaranteed to attain at least 50.1% of the optimal revenue. 
To our knowledge, this is the first result demonstrating the (approximate) optimality of simple mechanisms in this extensively studied multi-dimensional setting, without making any assumptions about the buyer&#39;s value distribution. We believe our result not only demonstrates the power of information disclosure in enhancing the performance of simple mechanisms but also suggests a new framework for reevaluating their efficacy in multi-dimensional settings. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.17809v1-abstract-full').style.display = 'none'; document.getElementById('2502.17809v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 24 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.16075</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">ps</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Optimization and Control">math.OC</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">stat.ML</span> </div> </div> <p class="title is-5 mathjax"> Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yuhang Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+K">Kangjie Zhou</a>, <a href="/search/cs?searchtype=author&amp;query=Wu%2C+J">Jingfeng Wu</a>, <a href="/search/cs?searchtype=author&amp;query=Mei%2C+S">Song Mei</a>, <a 
href="/search/cs?searchtype=author&amp;query=Lindsey%2C+M">Michael Lindsey</a>, <a href="/search/cs?searchtype=author&amp;query=Bartlett%2C+P+L">Peter L. Bartlett</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.16075v1-abstract-short" style="display: inline;"> We establish the asymptotic implicit bias of gradient descent (GD) for generic non-homogeneous deep networks under exponential loss. Specifically, we characterize three key properties of GD iterates starting from a sufficiently small empirical risk, where the threshold is determined by a measure of the network&#39;s non-homogeneity. First, we show that a normalized margin induced by the GD iterates in&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.16075v1-abstract-full').style.display = 'inline'; document.getElementById('2502.16075v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.16075v1-abstract-full" style="display: none;"> We establish the asymptotic implicit bias of gradient descent (GD) for generic non-homogeneous deep networks under exponential loss. Specifically, we characterize three key properties of GD iterates starting from a sufficiently small empirical risk, where the threshold is determined by a measure of the network&#39;s non-homogeneity. First, we show that a normalized margin induced by the GD iterates increases nearly monotonically. Second, we prove that while the norm of the GD iterates diverges to infinity, the iterates themselves converge in direction. Finally, we establish that this directional limit satisfies the Karush-Kuhn-Tucker (KKT) conditions of a margin maximization problem. 
Prior works on implicit bias have focused exclusively on homogeneous networks; in contrast, our results apply to a broad class of non-homogeneous networks satisfying a mild near-homogeneity condition. In particular, our results apply to networks with residual connections and non-homogeneous activation functions, thereby resolving an open problem posed by Ji and Telgarsky (2020). <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.16075v1-abstract-full').style.display = 'none'; document.getElementById('2502.16075v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">96 pages</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.15885</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zhu%2C+H">Hongjie Zhu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+Z">Zeyu Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Pang%2C+G">Guansong Pang</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+X">Xu Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Wen%2C+S">Shimin Wen</a>, <a 
href="/search/cs?searchtype=author&amp;query=Bai%2C+Y">Yu Bai</a>, <a href="/search/cs?searchtype=author&amp;query=Ergu%2C+D">Daji Ergu</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Ying Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+Y">Yang Zhao</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.15885v1-abstract-short" style="display: inline;"> Weakly supervised semantic segmentation (WSSS) typically utilizes limited semantic annotations to obtain initial Class Activation Maps (CAMs). However, due to the inadequate coupling between class activation responses and semantic information in high-dimensional space, the CAM is prone to object co-occurrence or under-activation, resulting in inferior recognition accuracy. To tackle this issue, we&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.15885v1-abstract-full').style.display = 'inline'; document.getElementById('2502.15885v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.15885v1-abstract-full" style="display: none;"> Weakly supervised semantic segmentation (WSSS) typically utilizes limited semantic annotations to obtain initial Class Activation Maps (CAMs). However, due to the inadequate coupling between class activation responses and semantic information in high-dimensional space, the CAM is prone to object co-occurrence or under-activation, resulting in inferior recognition accuracy. To tackle this issue, we propose DOEI, Dual Optimization of Embedding Information, a novel approach that reconstructs embedding representations through semantic-aware attention weight matrices to optimize the expression capability of embedding information. 
Specifically, DOEI amplifies tokens with high confidence and suppresses those with low confidence during the class-to-patch interaction. This alignment of activation responses with semantic information strengthens the propagation and decoupling of target features, enabling the generated embeddings to more accurately represent target features in high-level semantic space. In addition, we propose a hybrid-feature alignment module in DOEI that combines RGB values, embedding-guided features, and self-attention weights to increase the reliability of candidate tokens. Comprehensive experiments show that DOEI is an effective plug-and-play module that empowers state-of-the-art visual transformer-based WSSS models to significantly improve the quality of CAMs and segmentation performance on popular benchmarks, including PASCAL VOC (+3.6%, +1.5%, +1.2% mIoU) and MS COCO (+1.2%, +1.6% mIoU). Code will be available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.15885v1-abstract-full').style.display = 'none'; document.getElementById('2502.15885v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.15307</a> <span>&nbsp;[<a href="">pdf</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Road Traffic Sign Recognition method using Siamese network Combining Efficient-CNN based Encoder </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Xi%2C+Z">Zhenghao Xi</a>, <a href="/search/cs?searchtype=author&amp;query=Shao%2C+Y">Yuchao Shao</a>, <a href="/search/cs?searchtype=author&amp;query=Zheng%2C+Y">Yang Zheng</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+X">Xiang Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+Y">Yaqi Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yitong Cai</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.15307v1-abstract-short" style="display: inline;"> Traffic sign recognition (TSR) plays an essential role in assisted driving and intelligent transportation systems. However, noise in complex environments may cause motion blur or occlusion, which makes real-time recognition with high accuracy and robustness challenging. In this article, we propose IECES-network, which combines improved encoders with a Siamese network. 
The three-stage app&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.15307v1-abstract-full').style.display = 'inline'; document.getElementById('2502.15307v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.15307v1-abstract-full" style="display: none;"> Traffic sign recognition (TSR) plays an essential role in assisted driving and intelligent transportation systems. However, noise in complex environments may cause motion blur or occlusion, which makes real-time recognition with high accuracy and robustness challenging. In this article, we propose IECES-network, which combines improved encoders with a Siamese network. The three-stage approach of our method consists of Efficient-CNN-based encoders, a Siamese backbone, and fully connected layers. First, we use convolutional encoders to extract and encode traffic-sign features from augmented training samples and standard images. Then, we design a Siamese neural network with an Efficient-CNN-based encoder and a contrastive loss function; by computing the distance between inputs and templates, it can be trained to make TSR more robust to motion-blurred and occluded samples. Additionally, after training, the template branch of the network can be switched off during recognition, which raises the processing speed of our real-time model and reduces its computational cost and parameter scale. Finally, we combine the feature codes with a fully connected layer and a softmax function to classify the sample codes and recognize the traffic-sign category. Experiments on the Tsinghua-Tencent 100K dataset and the German Traffic Sign Recognition Benchmark dataset demonstrate the performance of the proposed IECES-network. 
Compared with other state-of-the-art methods in motion-blurred and occluded environments, the proposed method achieves competitive performance, with average precision, recall, and accuracy of 88.1%, 86.43%, and 86.1%, respectively, at a lightweight model scale of 2.9M. Moreover, our model processes each frame in 0.1 s, 1.5 times faster than existing methods. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.15307v1-abstract-full').style.display = 'none'; document.getElementById('2502.15307v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.14950</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Quantum Physics">quant-ph</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Statistics Theory">math.ST</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">stat.ML</span> </div> </div> <p class="title is-5 mathjax"> Symmetric observations without symmetric causal explanations </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=William%2C+C">Christian William</a>, <a href="/search/cs?searchtype=author&amp;query=Remy%2C+P">Patrick Remy</a>, <a href="/search/cs?searchtype=author&amp;query=Bancal%2C+J">Jean-Daniel Bancal</a>, <a
href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yu Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Brunner%2C+N">Nicolas Brunner</a>, <a href="/search/cs?searchtype=author&amp;query=Pozas-Kerstjens%2C+A">Alejandro Pozas-Kerstjens</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.14950v1-abstract-short" style="display: inline;"> Inferring causal models from observed correlations is a challenging task, crucial to many areas of science. In order to alleviate the effort, it is important to know whether symmetries in the observations correspond to symmetries in the underlying realization. Via an explicit example, we answer this question in the negative. We use a tripartite probability distribution over binary events that is r&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.14950v1-abstract-full').style.display = 'inline'; document.getElementById('2502.14950v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.14950v1-abstract-full" style="display: none;"> Inferring causal models from observed correlations is a challenging task, crucial to many areas of science. In order to alleviate the effort, it is important to know whether symmetries in the observations correspond to symmetries in the underlying realization. Via an explicit example, we answer this question in the negative. We use a tripartite probability distribution over binary events that is realized by using three (different) independent sources of classical randomness. 
We prove that even removing the condition that the sources distribute systems described by classical physics, the requirements that i) the sources distribute the same physical systems, ii) these physical systems respect relativistic causality, and iii) the correlations are the observed ones, are incompatible. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.14950v1-abstract-full').style.display = 'none'; document.getElementById('2502.14950v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 20 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">8 pages, 4 figures, RevTeX 4.2. The computational appendix is available at</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.14275</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Fact or Guesswork? 
Evaluating Large Language Model&#39;s Medical Knowledge with Structured One-Hop Judgment </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Li%2C+J">Jiaxi Li</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yiwei Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+K">Kai Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yujun Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Hooi%2C+B">Bryan Hooi</a>, <a href="/search/cs?searchtype=author&amp;query=Peng%2C+N">Nanyun Peng</a>, <a href="/search/cs?searchtype=author&amp;query=Chang%2C+K">Kai-Wei Chang</a>, <a href="/search/cs?searchtype=author&amp;query=Lu%2C+J">Jin Lu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.14275v1-abstract-short" style="display: inline;"> Large language models (LLMs) have been widely adopted in various downstream task domains. However, their ability to directly recall and apply factual medical knowledge remains under-explored. Most existing medical QA benchmarks assess complex reasoning or multi-hop inference, making it difficult to isolate LLMs&#39; inherent medical knowledge from their reasoning capabilities. Given the high-stakes na&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.14275v1-abstract-full').style.display = 'inline'; document.getElementById('2502.14275v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.14275v1-abstract-full" style="display: none;"> Large language models (LLMs) have been widely adopted in various downstream task domains. However, their ability to directly recall and apply factual medical knowledge remains under-explored. 
Most existing medical QA benchmarks assess complex reasoning or multi-hop inference, making it difficult to isolate LLMs&#39; inherent medical knowledge from their reasoning capabilities. Given the high-stakes nature of medical applications, where incorrect information can have critical consequences, it is essential to evaluate how well LLMs encode, retain, and recall fundamental medical facts. To bridge this gap, we introduce the Medical Knowledge Judgment (MKJ), a dataset specifically designed to measure LLMs&#39; one-hop factual medical knowledge. MKJ is constructed from the Unified Medical Language System (UMLS), a large-scale repository of standardized biomedical vocabularies and knowledge graphs. We frame knowledge assessment as a binary judgment task, requiring LLMs to verify the correctness of medical statements extracted from reliable and structured knowledge sources. Our experiments reveal that LLMs struggle with factual medical knowledge retention, exhibiting significant performance variance across different semantic categories, particularly for rare medical conditions. Furthermore, LLMs show poor calibration, often being overconfident in incorrect answers. To mitigate these issues, we explore retrieval-augmented generation, demonstrating its effectiveness in improving factual accuracy and reducing uncertainty in medical decision-making. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.14275v1-abstract-full').style.display = 'none'; document.getElementById('2502.14275v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 20 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025.
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">15 pages, 11 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.11158</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> AnyRefill: A Unified, Data-Efficient Framework for Left-Prompt-Guided Vision Tasks </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Xie%2C+M">Ming Xie</a>, <a href="/search/cs?searchtype=author&amp;query=Cao%2C+C">Chenjie Cao</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yunuo Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Xue%2C+X">Xiangyang Xue</a>, <a href="/search/cs?searchtype=author&amp;query=Jiang%2C+Y">Yu-Gang Jiang</a>, <a href="/search/cs?searchtype=author&amp;query=Fu%2C+Y">Yanwei Fu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.11158v2-abstract-short" style="display: inline;"> In this paper, we present a novel Left-Prompt-Guided (LPG) paradigm to address a diverse range of reference-based vision tasks. Inspired by the human creative process, we reformulate these tasks using a left-right stitching formulation to construct contextual input. 
Building upon this foundation, we propose AnyRefill, an extension of LeftRefill that effectively adapts Text-to-Image (T2I) models t&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.11158v2-abstract-full').style.display = 'inline'; document.getElementById('2502.11158v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.11158v2-abstract-full" style="display: none;"> In this paper, we present a novel Left-Prompt-Guided (LPG) paradigm to address a diverse range of reference-based vision tasks. Inspired by the human creative process, we reformulate these tasks using a left-right stitching formulation to construct contextual input. Building upon this foundation, we propose AnyRefill, an extension of LeftRefill that effectively adapts Text-to-Image (T2I) models to various vision tasks. AnyRefill leverages the inpainting priors of an advanced T2I model based on the Diffusion Transformer (DiT) architecture, and incorporates flexible components to enhance its capabilities. By combining task-specific LoRAs with the stitching input, AnyRefill unlocks its potential across diverse tasks, including conditional generation, visual perception, and image editing, without requiring additional visual encoders. Meanwhile, AnyRefill exhibits remarkable data efficiency, requiring minimal task-specific fine-tuning while maintaining high generative performance. Through extensive ablation studies, we demonstrate that AnyRefill outperforms other image condition injection methods and achieves competitive results compared to state-of-the-art open-source methods. Notably, AnyRefill delivers results comparable to advanced commercial tools, such as IC-Light and SeedEdit, even in challenging scenarios.
Comprehensive experiments and ablation studies across diverse tasks validate the strong generation capability of the proposed simple yet effective LPG formulation, establishing AnyRefill as a unified, highly data-efficient solution for reference-based vision tasks. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.11158v2-abstract-full').style.display = 'none'; document.getElementById('2502.11158v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 18 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 16 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">19 pages, submitted to TPAMI</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.10689</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Self-Explaining Hypergraph Neural Networks for Diagnosis Prediction </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Yu%2C+L">Leisheng Yu</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yanxiao Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+M">Minxing Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Hu%2C+X">Xia Hu</a> </p> <p class="abstract mathjax"> <span
class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.10689v1-abstract-short" style="display: inline;"> The burgeoning volume of electronic health records (EHRs) has enabled deep learning models to excel in predictive healthcare. However, for high-stakes applications such as diagnosis prediction, model interpretability remains paramount. Existing deep learning diagnosis prediction models with intrinsic interpretability often assign attention weights to every past diagnosis or hospital visit, providi&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.10689v1-abstract-full').style.display = 'inline'; document.getElementById('2502.10689v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.10689v1-abstract-full" style="display: none;"> The burgeoning volume of electronic health records (EHRs) has enabled deep learning models to excel in predictive healthcare. However, for high-stakes applications such as diagnosis prediction, model interpretability remains paramount. Existing deep learning diagnosis prediction models with intrinsic interpretability often assign attention weights to every past diagnosis or hospital visit, providing explanations lacking flexibility and succinctness. In this paper, we introduce SHy, a self-explaining hypergraph neural network model, designed to offer personalized, concise and faithful explanations that allow for interventions from clinical experts. By modeling each patient as a unique hypergraph and employing a message-passing mechanism, SHy captures higher-order disease interactions and extracts distinct temporal phenotypes as personalized explanations. It also addresses the incompleteness of the EHR data by accounting for essential false negatives in the original diagnosis record. 
A qualitative case study and extensive quantitative evaluations on two real-world EHR datasets demonstrate the superior predictive performance and interpretability of SHy over existing state-of-the-art models. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.10689v1-abstract-full').style.display = 'none'; document.getElementById('2502.10689v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 15 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.09782</a> <span>&nbsp;&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Audio and Speech Processing">eess.AS</span> </div> </div> <p class="title is-5 mathjax"> Improving Acoustic Side-Channel Attacks on Keyboards Using Transformers and Large Language Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Park%2C+J+H">Jin Hyun Park</a>, <a href="/search/cs?searchtype=author&amp;query=Ayati%2C+S+A">Seyyed Ali Ayati</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yichen Cai</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.09782v3-abstract-short" style="display: inline;"> The 
increasing prevalence of microphones in everyday devices and the growing reliance on online services have amplified the risk of acoustic side-channel attacks (ASCAs) targeting keyboards. This study explores deep learning techniques, specifically vision transformers (VTs) and large language models (LLMs), to enhance the effectiveness and applicability of such attacks. We present substantial imp&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.09782v3-abstract-full').style.display = 'inline'; document.getElementById('2502.09782v3-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.09782v3-abstract-full" style="display: none;"> The increasing prevalence of microphones in everyday devices and the growing reliance on online services have amplified the risk of acoustic side-channel attacks (ASCAs) targeting keyboards. This study explores deep learning techniques, specifically vision transformers (VTs) and large language models (LLMs), to enhance the effectiveness and applicability of such attacks. We present substantial improvements over prior research, with the CoAtNet model achieving state-of-the-art performance. Our CoAtNet shows a 5.0% improvement for keystrokes recorded via smartphone (Phone) and 5.9% for those recorded via Zoom compared to previous benchmarks. We also evaluate transformer architectures and language models, with the best VT model matching CoAtNet&#39;s performance. A key advancement is the introduction of a noise mitigation method for real-world scenarios. By using LLMs for contextual understanding, we detect and correct erroneous keystrokes in noisy environments, enhancing ASCA performance. Additionally, fine-tuned lightweight language models with Low-Rank Adaptation (LoRA) deliver comparable performance to heavyweight models with 67&times; more parameters.
This integration of VTs and LLMs improves the practical applicability of ASCA mitigation, marking the first use of these technologies to address ASCAs and error correction in real-world scenarios. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.09782v3-abstract-full').style.display = 'none'; document.getElementById('2502.09782v3-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 18 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 13 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">We would like to withdraw our paper due to a significant error in the experimental methodology, which impacts the validity of our results. 
The error specifically affects the analysis presented in Section 4, where an incorrect dataset preprocessing step led to misleading conclusions</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.08332</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Cryptography and Security">cs.CR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Modification and Generated-Text Detection: Achieving Dual Detection Capabilities for the Outputs of LLM by Watermark </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yuhang Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yaofei Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Hu%2C+D">Donghui Hu</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+G">Gu Chen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.08332v1-abstract-short" style="display: inline;"> The development of large language models (LLMs) has raised concerns about potential misuse. One practical solution is to embed a watermark in the text, allowing ownership verification through watermark extraction. Existing methods primarily focus on defending against modification attacks, often neglecting other spoofing attacks. 
For example, attackers can alter the watermarked text to produce harm&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.08332v1-abstract-full').style.display = 'inline'; document.getElementById('2502.08332v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.08332v1-abstract-full" style="display: none;"> The development of large language models (LLMs) has raised concerns about potential misuse. One practical solution is to embed a watermark in the text, allowing ownership verification through watermark extraction. Existing methods primarily focus on defending against modification attacks, often neglecting other spoofing attacks. For example, attackers can alter the watermarked text to produce harmful content without compromising the presence of the watermark, which could lead to false attribution of this malicious content to the LLM. This situation poses a serious threat to LLM service providers and highlights the importance of achieving modification detection and generated-text detection simultaneously. Therefore, we propose a modification-sensitive technique for detecting modifications in text carrying an unbiased watermark. We introduce a new metric called &#34;discarded tokens&#34;, which measures the number of tokens not included in watermark detection. When a modification occurs, this metric changes and can serve as evidence of the modification. Additionally, we improve the watermark detection process and introduce a novel method for unbiased watermarking. Our experiments demonstrate that we can achieve effective dual detection capabilities via watermarking: modification detection and generated-text detection.
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.08332v1-abstract-full').style.display = 'none'; document.getElementById('2502.08332v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 12 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.08180</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Enhancing LLM Character-Level Manipulation via Divide and Conquer </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Xiong%2C+Z">Zhen Xiong</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yujun Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Hooi%2C+B">Bryan Hooi</a>, <a href="/search/cs?searchtype=author&amp;query=Peng%2C+N">Nanyun Peng</a>, <a href="/search/cs?searchtype=author&amp;query=Chang%2C+K">Kai-Wei Chang</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+Z">Zhecheng Li</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yiwei Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.08180v1-abstract-short" style="display: inline;"> Large Language Models (LLMs) have demonstrated strong generalization capabilities across a wide range of natural language processing (NLP) 
tasks. However, they exhibit notable weaknesses in character-level string manipulation, struggling with fundamental operations such as character deletion, insertion, and substitution. These challenges stem primarily from tokenization constraints, despite the cr&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.08180v1-abstract-full').style.display = 'inline'; document.getElementById('2502.08180v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.08180v1-abstract-full" style="display: none;"> Large Language Models (LLMs) have demonstrated strong generalization capabilities across a wide range of natural language processing (NLP) tasks. However, they exhibit notable weaknesses in character-level string manipulation, struggling with fundamental operations such as character deletion, insertion, and substitution. These challenges stem primarily from tokenization constraints, despite the critical role of such operations in data preprocessing and code generation. Through systematic analysis, we derive two key insights: (1) LLMs face significant difficulties in leveraging intrinsic token knowledge for character-level reasoning, and (2) atomized word structures can substantially enhance LLMs&#39; ability to process token-level structural information. Building on these insights, we propose Character-Level Manipulation via Divide and Conquer, a novel approach designed to bridge the gap between token-level processing and character-level manipulation. Our method decomposes complex operations into explicit character-level subtasks coupled with controlled token reconstruction phases, leading to significant improvements in accuracy. Without additional training, our method significantly improves accuracies on the $\texttt{Deletion}$, $\texttt{Insertion}$, and $\texttt{Substitution}$ tasks. 
To support further research, we open-source our implementation and benchmarks. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.08180v1-abstract-full').style.display = 'none'; document.getElementById('2502.08180v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 12 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.06419</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Robotics">cs.RO</span> </div> </div> <p class="title is-5 mathjax"> Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Xu%2C+T">Tianshuo Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Lu%2C+H">Hao Lu</a>, <a href="/search/cs?searchtype=author&amp;query=Yan%2C+X">Xu Yan</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yingjie Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+B">Bingbing Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+Y">Yingcong Chen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.06419v1-abstract-short" style="display: inline;"> Large Language Models (LLMs) have made substantial advancements in the fields of robotics and autonomous driving.
This study presents the first Occupancy-based Large Language Model (Occ-LLM), which represents a pioneering effort to integrate LLMs with occupancy, an important representation. To effectively encode occupancy as input for the LLM and address the category imbalances associated with occupancy, we pr&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.06419v1-abstract-full').style.display = 'inline'; document.getElementById('2502.06419v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.06419v1-abstract-full" style="display: none;"> Large Language Models (LLMs) have made substantial advancements in the fields of robotics and autonomous driving. This study presents the first Occupancy-based Large Language Model (Occ-LLM), which represents a pioneering effort to integrate LLMs with occupancy, an important representation. To effectively encode occupancy as input for the LLM and address the category imbalances associated with occupancy, we propose the Motion Separation Variational Autoencoder (MS-VAE). This innovative approach utilizes prior knowledge to distinguish dynamic objects from static scenes before inputting them into a tailored Variational Autoencoder (VAE). This separation enhances the model&#39;s capacity to concentrate on dynamic trajectories while effectively reconstructing static scenes. The efficacy of Occ-LLM has been validated across key tasks, including 4D occupancy forecasting, self-ego planning, and occupancy-based scene question answering. Comprehensive evaluations demonstrate that Occ-LLM significantly surpasses existing state-of-the-art methodologies, achieving gains of about 6% in Intersection over Union (IoU) and 4% in mean Intersection over Union (mIoU) for the task of 4D occupancy forecasting. These findings highlight the transformative potential of Occ-LLM in reshaping current paradigms within robotics and autonomous driving.
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.06419v1-abstract-full').style.display = 'none'; document.getElementById('2502.06419v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted in 2025 IEEE International Conference on Robotics and Automation (ICRA)</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.05869</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> HyLiFormer: Hyperbolic Linear Attention for Skeleton-based Human Action Recognition </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Li%2C+Y">Yue Li</a>, <a href="/search/cs?searchtype=author&amp;query=Qu%2C+H">Haoxuan Qu</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+M">Mengyuan Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+J">Jun Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yujun Cai</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.05869v1-abstract-short" style="display: inline;"> Transformers have demonstrated remarkable performance in skeleton-based human action recognition, yet their quadratic computational complexity 
remains a bottleneck for real-world applications. To mitigate this, linear attention mechanisms have been explored but struggle to capture the hierarchical structure of skeleton data. Meanwhile, the Poincaré model, a typical model of hyperbolic geometry, offers a&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.05869v1-abstract-full').style.display = 'inline'; document.getElementById('2502.05869v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.05869v1-abstract-full" style="display: none;"> Transformers have demonstrated remarkable performance in skeleton-based human action recognition, yet their quadratic computational complexity remains a bottleneck for real-world applications. To mitigate this, linear attention mechanisms have been explored but struggle to capture the hierarchical structure of skeleton data. Meanwhile, the Poincaré model, a typical model of hyperbolic geometry, offers a powerful framework for modeling hierarchical structures but lacks well-defined operations for existing mainstream linear attention. In this paper, we propose HyLiFormer, a novel hyperbolic linear attention Transformer tailored for skeleton-based action recognition. Our approach incorporates a Hyperbolic Transformation with Curvatures (HTC) module to map skeleton data into hyperbolic space and a Hyperbolic Linear Attention (HLA) module for efficient long-range dependency modeling. Theoretical analysis and extensive experiments on NTU RGB+D and NTU RGB+D 120 datasets demonstrate that HyLiFormer significantly reduces computational complexity while preserving model accuracy, making it a promising solution for efficiency-critical applications.
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.05869v1-abstract-full').style.display = 'none'; document.getElementById('2502.05869v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 9 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.04952</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Software Engineering">cs.SE</span> </div> </div> <p class="title is-5 mathjax"> Boosting Path-Sensitive Value Flow Analysis via Removal of Redundant Summaries </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yongchao Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yuandao Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+C">Charles Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.04952v3-abstract-short" style="display: inline;"> Value flow analysis that tracks the flow of values via data dependence is a widely used technique for detecting a broad spectrum of software bugs. However, the scalability issue often deteriorates when high precision (i.e., path-sensitivity) is required, as the instantiation of function summaries becomes excessively time- and memory-intensive. 
The primary culprit, as we observe, is the existence o&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.04952v3-abstract-full').style.display = 'inline'; document.getElementById('2502.04952v3-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.04952v3-abstract-full" style="display: none;"> Value flow analysis that tracks the flow of values via data dependence is a widely used technique for detecting a broad spectrum of software bugs. However, the scalability issue often deteriorates when high precision (i.e., path-sensitivity) is required, as the instantiation of function summaries becomes excessively time- and memory-intensive. The primary culprit, as we observe, is the existence of redundant computations resulting from blindly computing summaries for a function, irrespective of whether they are related to bugs being checked. To address this problem, we present the first approach that can effectively identify and eliminate redundant summaries, thereby reducing the size of collected summaries from callee functions without compromising soundness or efficiency. Our evaluation on large programs demonstrates that our identification algorithm can significantly reduce the time and memory overhead of the state-of-the-art value flow analysis by 45% and 27%, respectively. Furthermore, the identification algorithm demonstrates remarkable efficiency by identifying nearly 80% of redundant summaries while incurring a minimal additional overhead. In the largest <i>mysqld</i> project, the identification algorithm reduces the time by 8107 seconds (2.25 hours) with a mere 17.31 seconds of additional overhead, leading to a ratio of time savings to paid overhead (i.e., performance gain) of 468.48 $\times$. In total, our method attains an average performance gain of 632.1 $\times$. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.04952v3-abstract-full').style.display = 'none'; document.getElementById('2502.04952v3-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 12 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 7 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.04364</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Human-Computer Interaction">cs.HC</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Lost in Edits? 
A $\lambda$-Compass for AIGC Provenance </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=You%2C+W">Wenhao You</a>, <a href="/search/cs?searchtype=author&amp;query=Hooi%2C+B">Bryan Hooi</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yiwei Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Choo%2C+E">Euijin Choo</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+M">Ming-Hsuan Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Yuan%2C+J">Junsong Yuan</a>, <a href="/search/cs?searchtype=author&amp;query=Huang%2C+Z">Zi Huang</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yujun Cai</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.04364v1-abstract-short" style="display: inline;"> Recent advancements in diffusion models have driven the growth of text-guided image editing tools, enabling precise and iterative modifications of synthesized content. However, as these tools become increasingly accessible, they also introduce significant risks of misuse, emphasizing the critical need for robust attribution methods to ensure content authenticity and traceability. Despite the creat&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.04364v1-abstract-full').style.display = 'inline'; document.getElementById('2502.04364v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.04364v1-abstract-full" style="display: none;"> Recent advancements in diffusion models have driven the growth of text-guided image editing tools, enabling precise and iterative modifications of synthesized content. 
However, as these tools become increasingly accessible, they also introduce significant risks of misuse, emphasizing the critical need for robust attribution methods to ensure content authenticity and traceability. Despite the creative potential of such tools, they pose significant challenges for attribution, particularly in adversarial settings where edits can be layered to obscure an image&#39;s origins. We propose LambdaTracer, a novel latent-space attribution method that robustly identifies and differentiates authentic outputs from manipulated ones without requiring any modifications to generative or editing pipelines. By adaptively calibrating reconstruction losses, LambdaTracer remains effective across diverse iterative editing processes, whether automated through text-guided editing tools such as InstructPix2Pix and ControlNet or performed manually with editing software such as Adobe Photoshop. Extensive experiments reveal that our method consistently outperforms baseline approaches in distinguishing maliciously edited images, providing a practical solution to safeguard ownership, creativity, and credibility in the open, fast-evolving AI ecosystems. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.04364v1-abstract-full').style.display = 'none'; document.getElementById('2502.04364v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.03122</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Robotics">cs.RO</span> </div> </div> <p class="title is-5 mathjax"> HiLo: Learning Whole-Body Human-like Locomotion with Motion Tracking Controller </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+Q">Qiyuan Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Weng%2C+C">Chenfan Weng</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+G">Guanwu Li</a>, <a href="/search/cs?searchtype=author&amp;query=He%2C+F">Fulai He</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yusheng Cai</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.03122v1-abstract-short" style="display: inline;"> Deep Reinforcement Learning (RL) has emerged as a promising method to develop humanoid robot locomotion controllers. Despite the robust and stable locomotion demonstrated by previous RL controllers, their behavior often lacks the natural and agile motion patterns necessary for human-centric scenarios. 
In this work, we propose HiLo (human-like locomotion with motion tracking), an effective framewor&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.03122v1-abstract-full').style.display = 'inline'; document.getElementById('2502.03122v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.03122v1-abstract-full" style="display: none;"> Deep Reinforcement Learning (RL) has emerged as a promising method to develop humanoid robot locomotion controllers. Despite the robust and stable locomotion demonstrated by previous RL controllers, their behavior often lacks the natural and agile motion patterns necessary for human-centric scenarios. In this work, we propose HiLo (human-like locomotion with motion tracking), an effective framework designed to learn RL policies that perform human-like locomotion. The primary challenges of human-like locomotion are complex reward engineering and domain randomization. HiLo overcomes these issues by developing an RL-based motion tracking controller and simple domain randomization through random force injection and action delay. Within the framework of HiLo, the whole-body control problem can be decomposed into two components: One part is solved using an open-loop control method, while the residual part is addressed with RL policies. A distributional value function is also implemented to stabilize the training process by improving the estimation of cumulative rewards under perturbed dynamics. Our experiments demonstrate that the motion tracking controller trained using HiLo can perform natural and agile human-like locomotion while exhibiting resilience to external disturbances in real-world systems. Furthermore, we show that the motion patterns of humanoid robots can be adapted through the residual mechanism without fine-tuning, allowing quick adjustments to task requirements. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.03122v1-abstract-full').style.display = 'none'; document.getElementById('2502.03122v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.02977</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Disentangling CLIP Features for Enhanced Localized Understanding </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Rawlekar%2C+S">Samyak Rawlekar</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yujun Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yiwei Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+M">Ming-Hsuan Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Ahuja%2C+N">Narendra Ahuja</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.02977v2-abstract-short" style="display: inline;"> Vision-language models (VLMs) demonstrate impressive capabilities in coarse-grained tasks like image classification and retrieval. However, they struggle with fine-grained tasks that require localized understanding. To investigate this weakness, we comprehensively analyze CLIP features and identify an important issue: semantic features are highly correlated. 
Specifically, the features of a class e&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.02977v2-abstract-full').style.display = 'inline'; document.getElementById('2502.02977v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.02977v2-abstract-full" style="display: none;"> Vision-language models (VLMs) demonstrate impressive capabilities in coarse-grained tasks like image classification and retrieval. However, they struggle with fine-grained tasks that require localized understanding. To investigate this weakness, we comprehensively analyze CLIP features and identify an important issue: semantic features are highly correlated. Specifically, the features of a class encode information about other classes, which we call mutual feature information (MFI). This mutual information becomes evident when we query a specific class and unrelated objects are activated along with the target class. To address this issue, we propose Unmix-CLIP, a novel framework designed to reduce MFI and improve feature disentanglement. We introduce an MFI loss, which explicitly separates text features by projecting them into a space where inter-class similarity is minimized. To ensure a corresponding separation in image features, we use multi-label recognition (MLR) to align the image features with the separated text features. This ensures that both image and text features are disentangled and aligned across modalities, improving feature separation for downstream tasks. For the COCO-14 dataset, Unmix-CLIP reduces feature similarity by 24.9%. We demonstrate its effectiveness through extensive evaluations of MLR and zero-shot semantic segmentation (ZS3). In MLR, our method performs competitively on VOC2007 and surpasses SOTA approaches on the COCO-14 dataset, using fewer training parameters. 
Additionally, Unmix-CLIP consistently outperforms existing ZS3 methods on COCO and VOC. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.02977v2-abstract-full').style.display = 'none'; document.getElementById('2502.02977v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 8 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 5 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.01191</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Towards Robust and Reliable Concept Representations: Reliability-Enhanced Concept Embedding Model </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yuxuan Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+X">Xiyu Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Tsutsui%2C+S">Satoshi Tsutsui</a>, <a href="/search/cs?searchtype=author&amp;query=Pang%2C+W">Winnie Pang</a>, <a href="/search/cs?searchtype=author&amp;query=Wen%2C+B">Bihan Wen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.01191v1-abstract-short" style="display: inline;"> Concept Bottleneck Models (CBMs) aim to enhance interpretability by predicting human-understandable concepts as intermediates for decision-making. 
However, these models often face challenges in ensuring reliable concept representations, which can propagate to downstream tasks and undermine robustness, especially under distribution shifts. Two inherent issues contribute to concept unreliability: se&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.01191v1-abstract-full').style.display = 'inline'; document.getElementById('2502.01191v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.01191v1-abstract-full" style="display: none;"> Concept Bottleneck Models (CBMs) aim to enhance interpretability by predicting human-understandable concepts as intermediates for decision-making. However, these models often face challenges in ensuring reliable concept representations, which can propagate to downstream tasks and undermine robustness, especially under distribution shifts. Two inherent issues contribute to concept unreliability: sensitivity to concept-irrelevant features (e.g., background variations) and lack of semantic consistency for the same concept across different samples. To address these limitations, we propose the Reliability-Enhanced Concept Embedding Model (RECEM), which introduces a two-fold strategy: Concept-Level Disentanglement to separate irrelevant features from concept-relevant information and a Concept Mixup mechanism to ensure semantic alignment across samples. These mechanisms work together to improve concept reliability, enabling the model to focus on meaningful object attributes and generate faithful concept representations. Experimental results demonstrate that RECEM consistently outperforms existing baselines across multiple datasets, showing superior performance under background and domain shifts. These findings highlight the effectiveness of disentanglement and alignment strategies in enhancing both reliability and robustness in CBMs. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.01191v1-abstract-full').style.display = 'none'; document.getElementById('2502.01191v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 3 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2502.01090</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Classic4Children: Adapting Chinese Literary Classics for Children with Large Language Model </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Chen%2C+J">Jiali Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Hei%2C+X">Xusen Hei</a>, <a href="/search/cs?searchtype=author&amp;query=Xue%2C+Y">Yuqi Xue</a>, <a href="/search/cs?searchtype=author&amp;query=Wu%2C+Z">Zihan Wu</a>, <a href="/search/cs?searchtype=author&amp;query=Xie%2C+J">Jiayuan Xie</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yi Cai</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2502.01090v1-abstract-short" style="display: inline;"> Chinese literary classics hold significant cultural and educational value, offering deep insights into morality, history, and human nature. 
These works often include classical Chinese and complex narratives, making them difficult for children to read. To bridge this gap, we introduce a child-friendly literary adaptation (CLA) task to adapt Chinese literary classics into engaging and accessible&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.01090v1-abstract-full').style.display = 'inline'; document.getElementById('2502.01090v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2502.01090v1-abstract-full" style="display: none;"> Chinese literary classics hold significant cultural and educational value, offering deep insights into morality, history, and human nature. These works often include classical Chinese and complex narratives, making them difficult for children to read. To bridge this gap, we introduce a child-friendly literary adaptation (CLA) task to adapt Chinese literary classics into engaging and accessible text for children. However, recent large language models (LLMs) overlook children&#39;s reading preferences (i.e., vivid character portrayals, concise narrative structures, and appropriate readability), which poses challenges in CLA. In this paper, we propose a method called InstructChild, which augments the LLM with these preferences for adaptation. Specifically, we first obtain the characters&#39; personalities and narrative structure as additional information for fine-grained instruction tuning. Then, we devise a readability metric as the reward to align the LLM with children&#39;s reading level. Finally, a lookahead decoding strategy is applied to improve the readability of the generated text during inference. To support the evaluation of the CLA task, we construct the Classic4Children dataset, which comprises both the original and child-friendly versions of the Four Great Classical Novels of Chinese literature. 
Experimental results show that our InstructChild significantly improves automatic and human evaluation performance. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2502.01090v1-abstract-full').style.display = 'none'; document.getElementById('2502.01090v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 3 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2025. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted at NAACL 2025 Findings</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2501.18280</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Neural and Evolutionary Computing">cs.NE</span> </div> </div> <p class="title is-5 mathjax"> Jailbreaking LLMs&#39; Safeguard with Universal Magic Words for Text Embedding Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Liang%2C+H">Haoyu Liang</a>, <a href="/search/cs?searchtype=author&amp;query=Sun%2C+Y">Youran Sun</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yunfeng Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Zhu%2C+J">Jun Zhu</a>, <a 
href="/search/cs?searchtype=author&amp;query=Zhang%2C+B">Bo Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2501.18280v2-abstract-short" style="display: inline;"> The security issue of large language models (LLMs) has gained significant attention recently, with various defense mechanisms developed to prevent harmful outputs, among which safeguards based on text embedding models serve as a fundamental defense. Through testing, we discover that the distribution of text embedding model outputs is significantly biased with a large mean. Inspired by this observa&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.18280v2-abstract-full').style.display = 'inline'; document.getElementById('2501.18280v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2501.18280v2-abstract-full" style="display: none;"> The security issue of large language models (LLMs) has gained significant attention recently, with various defense mechanisms developed to prevent harmful outputs, among which safeguards based on text embedding models serve as a fundamental defense. Through testing, we discover that the distribution of text embedding model outputs is significantly biased with a large mean. Inspired by this observation, we propose novel efficient methods to search for universal magic words that can attack text embedding models. Used as suffixes, the universal magic words can move the embedding of any text towards the bias direction, thereby manipulating the similarity of any text pair and misleading safeguards. By appending magic words to user prompts and requiring LLMs to end answers with magic words, attackers can jailbreak the safeguard. 
To eradicate this security risk, we also propose defense mechanisms against such attacks, which can correct the biased distribution of text embeddings in a training-free manner. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.18280v2-abstract-full').style.display = 'none'; document.getElementById('2501.18280v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 30 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2501.17889</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">stat.ML</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Knoop: Practical Enhancement of Knockoff with Over-Parameterization for Variable Selection </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+X">Xiaochen Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yunfeng Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Xiong%2C+H">Haoyi Xiong</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2501.17889v1-abstract-short" style="display: inline;"> Variable selection plays a crucial role in 
enhancing modeling effectiveness across diverse fields, addressing the challenges posed by high-dimensional datasets of correlated variables. This work introduces a novel approach, namely Knockoff with over-parameterization (Knoop), to enhance Knockoff filters for variable selection. Specifically, Knoop first generates multiple knockoff variables for each o&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.17889v1-abstract-full').style.display = 'inline'; document.getElementById('2501.17889v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2501.17889v1-abstract-full" style="display: none;"> Variable selection plays a crucial role in enhancing modeling effectiveness across diverse fields, addressing the challenges posed by high-dimensional datasets of correlated variables. This work introduces a novel approach, namely Knockoff with over-parameterization (Knoop), to enhance Knockoff filters for variable selection. Specifically, Knoop first generates multiple knockoff variables for each original variable and integrates them with the original variables into an over-parameterized ridgeless regression model. For each original variable, Knoop evaluates the coefficient distribution of its knockoffs and compares it with the original variable&#39;s coefficient to conduct an anomaly-based significance test, ensuring robust variable selection. Extensive experiments demonstrate superior performance compared to existing methods on both simulated and real-world datasets. Knoop achieves a notably higher Area under the Curve (AUC) of the Receiver Operating Characteristic (ROC) Curve in identifying relevant variables against the ground truth in controlled simulations, while showcasing enhanced predictive accuracy across diverse regression and classification tasks. The analytical results further back up our observations. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.17889v1-abstract-full').style.display = 'none'; document.getElementById('2501.17889v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 28 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2025. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">An earlier version of our paper at Machine Learning</span> </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Journal ref:</span> Machine Learning, Volume 114, article number 26 (2025) </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2501.15207</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Information Theory">cs.IT</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Signal Processing">eess.SP</span> </div> </div> <p class="title is-5 mathjax"> Hybrid Near/Far-Field Frequency-Dependent Beamforming via Joint Phase-Time Arrays </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yeyue Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Tao%2C+M">Meixia Tao</a>, <a href="/search/cs?searchtype=author&amp;query=Mo%2C+J">Jianhua Mo</a>, <a href="/search/cs?searchtype=author&amp;query=Sun%2C+S">Shu Sun</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2501.15207v1-abstract-short" style="display: 
inline;"> Joint phase-time arrays (JPTA) have emerged as a cost-effective and energy-efficient architecture for frequency-dependent beamforming in wideband communications by utilizing both true-time delay units and phase shifters. This paper exploits the potential of JPTA to simultaneously serve multiple users in both near- and far-field regions with a single radio frequency chain. The goal is to jointly optimize&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.15207v1-abstract-full').style.display = 'inline'; document.getElementById('2501.15207v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2501.15207v1-abstract-full" style="display: none;"> Joint phase-time arrays (JPTA) have emerged as a cost-effective and energy-efficient architecture for frequency-dependent beamforming in wideband communications by utilizing both true-time delay units and phase shifters. This paper exploits the potential of JPTA to simultaneously serve multiple users in both near- and far-field regions with a single radio frequency chain. The goal is to jointly optimize JPTA-based beamforming and subband allocation to maximize overall system performance. To this end, we formulate a system utility maximization problem, including sum-rate maximization and proportional fairness as special cases. We develop a 3-step alternating optimization (AO) algorithm and an efficient deep learning (DL) method for this problem. The DL approach includes a 2-layer convolutional neural network, a 3-layer graph attention network (GAT), and a normalization module for resource and beamforming optimization. The GAT efficiently captures the interactions between resource allocation and analog beamformers. Simulation results confirm that JPTA outperforms conventional phased arrays (PA) in user rate and strikes a good balance between PA and the fully-digital approach in terms of energy efficiency. 
Employing a logarithmic utility function for user rates ensures greater fairness than maximizing sum-rates. Furthermore, the DL network achieves comparable performance to the AO approach, while having orders of magnitude lower computational complexity. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.15207v1-abstract-full').style.display = 'none'; document.getElementById('2501.15207v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 25 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2501.13876</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Robotics">cs.RO</span> </div> </div> <p class="title is-5 mathjax"> FAST-LIVO2 on Resource-Constrained Platforms: LiDAR-Inertial-Visual Odometry with Efficient Memory and Computation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+B">Bingyang Zhou</a>, <a href="/search/cs?searchtype=author&amp;query=Zheng%2C+C">Chunran Zheng</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Z">Ziming Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhu%2C+F">Fangcheng Zhu</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yixi Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+F">Fu Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2501.13876v1-abstract-short" style="display: inline;"> This paper presents a lightweight 
LiDAR-inertial-visual odometry system optimized for resource-constrained platforms. It integrates a degeneration-aware adaptive visual frame selector into an error-state iterated Kalman filter (ESIKF) with sequential updates, significantly improving computational efficiency while maintaining a similar level of robustness. Additionally, a memory-efficient mapping struct&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.13876v1-abstract-full').style.display = 'inline'; document.getElementById('2501.13876v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2501.13876v1-abstract-full" style="display: none;"> This paper presents a lightweight LiDAR-inertial-visual odometry system optimized for resource-constrained platforms. It integrates a degeneration-aware adaptive visual frame selector into an error-state iterated Kalman filter (ESIKF) with sequential updates, significantly improving computational efficiency while maintaining a similar level of robustness. Additionally, a memory-efficient mapping structure combining a locally unified visual-LiDAR map and a long-term visual map achieves a good trade-off between performance and memory usage. Extensive experiments on x86 and ARM platforms demonstrate the system&#39;s robustness and efficiency. On the Hilti dataset, our system achieves a 33% reduction in per-frame runtime and 47% lower memory usage compared to FAST-LIVO2, with only a 3 cm increase in RMSE. Despite this slight accuracy trade-off, our system remains competitive, outperforming state-of-the-art (SOTA) LIO methods such as FAST-LIO2 and most existing LIVO systems. These results validate the system&#39;s capability for scalable deployment on resource-constrained edge computing platforms. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.13876v1-abstract-full').style.display = 'none'; document.getElementById('2501.13876v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 23 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2501.13727</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Multiagent Systems">cs.MA</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Scalable Safe Multi-Agent Reinforcement Learning for Multi-Agent System </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Du%2C+H">Haikuo Du</a>, <a href="/search/cs?searchtype=author&amp;query=Gou%2C+F">Fandi Gou</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yunze Cai</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2501.13727v1-abstract-short" style="display: inline;"> Safety and scalability are two critical challenges faced by practical Multi-Agent Systems (MAS). However, existing Multi-Agent Reinforcement Learning (MARL) algorithms that rely solely on reward shaping are ineffective in ensuring safety, and their scalability is rather limited due to the fixed-size network output. 
To address these issues, we propose a novel framework, Scalable Safe MARL (SS-MARL)&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.13727v1-abstract-full').style.display = 'inline'; document.getElementById('2501.13727v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2501.13727v1-abstract-full" style="display: none;"> Safety and scalability are two critical challenges faced by practical Multi-Agent Systems (MAS). However, existing Multi-Agent Reinforcement Learning (MARL) algorithms that rely solely on reward shaping are ineffective in ensuring safety, and their scalability is rather limited due to the fixed-size network output. To address these issues, we propose a novel framework, Scalable Safe MARL (SS-MARL), to enhance the safety and scalability of MARL methods. Leveraging the inherent graph structure of MAS, we design a multi-layer message-passing network to aggregate local observations and communications of varying sizes. Furthermore, we develop a constrained joint policy optimization method under local observations to improve safety. Simulation experiments demonstrate that SS-MARL achieves a better trade-off between optimality and safety compared to baselines, and that it scales significantly better than the latest methods in scenarios with a large number of agents. The feasibility of our method is also verified by a hardware implementation on Mecanum-wheeled vehicles. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.13727v1-abstract-full').style.display = 'none'; document.getElementById('2501.13727v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 23 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2501.12202</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+Z">Zibo Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Lai%2C+Z">Zeqiang Lai</a>, <a href="/search/cs?searchtype=author&amp;query=Lin%2C+Q">Qingxiang Lin</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+Y">Yunfei Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+H">Haolin Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+S">Shuhui Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Feng%2C+Y">Yifei Feng</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+M">Mingxin Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+S">Sheng Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+X">Xianghui Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Shi%2C+H">Huiwen Shi</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+S">Sicong Liu</a>, <a 
href="/search/cs?searchtype=author&amp;query=Wu%2C+J">Junta Wu</a>, <a href="/search/cs?searchtype=author&amp;query=Lian%2C+Y">Yihang Lian</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+F">Fan Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Tang%2C+R">Ruining Tang</a>, <a href="/search/cs?searchtype=author&amp;query=He%2C+Z">Zebin He</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+X">Xinzhou Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+J">Jian Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Zuo%2C+X">Xuhui Zuo</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+Z">Zhuo Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Lei%2C+B">Biwen Lei</a>, <a href="/search/cs?searchtype=author&amp;query=Weng%2C+H">Haohan Weng</a>, <a href="/search/cs?searchtype=author&amp;query=Xu%2C+J">Jing Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhu%2C+Y">Yiling Zhu</a> , et al. (49 additional authors not shown) </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2501.12202v3-abstract-short" style="display: inline;"> We present Hunyuan3D 2.0, an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets. This system includes two foundation components: a large-scale shape generation model -- Hunyuan3D-DiT, and a large-scale texture synthesis model -- Hunyuan3D-Paint. 
The shape generative model, built on a scalable flow-based diffusion transformer, aims to create geometry that pro&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.12202v3-abstract-full').style.display = 'inline'; document.getElementById('2501.12202v3-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2501.12202v3-abstract-full" style="display: none;"> We present Hunyuan3D 2.0, an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets. This system includes two foundation components: a large-scale shape generation model -- Hunyuan3D-DiT, and a large-scale texture synthesis model -- Hunyuan3D-Paint. The shape generative model, built on a scalable flow-based diffusion transformer, aims to create geometry that properly aligns with a given condition image, laying a solid foundation for downstream applications. The texture synthesis model, benefiting from strong geometric and diffusion priors, produces high-resolution and vibrant texture maps for either generated or hand-crafted meshes. Furthermore, we build Hunyuan3D-Studio -- a versatile, user-friendly production platform that simplifies the re-creation process of 3D assets. It allows both professional and amateur users to manipulate or even animate their meshes efficiently. We systematically evaluate our models, showing that Hunyuan3D 2.0 outperforms previous state-of-the-art models, including both open-source and closed-source models, in geometry detail, condition alignment, and texture quality. Hunyuan3D 2.0 is publicly released in order to fill the gaps in the open-source 3D community for large-scale foundation generative models. 
The code and pre-trained weights of our models are available at: <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.12202v3-abstract-full').style.display = 'none'; document.getElementById('2501.12202v3-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 26 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 21 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2025. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">GitHub link:</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2501.10881</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Cryptography and Security">cs.CR</span> </div> </div> <p class="title is-5 mathjax"> Addressing Network Packet-based Cheats in Multiplayer Games: A Secret Sharing Approach </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yaqi Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Markantonakis%2C+K">Konstantinos Markantonakis</a>, <a href="/search/cs?searchtype=author&amp;query=Shepherd%2C+C">Carlton Shepherd</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2501.10881v1-abstract-short" style="display: inline;"> Multiplayer online gaming has witnessed an explosion in popularity over the past two decades. 
However, security issues continue to give rise to in-game cheating, deterring honest gameplay, detracting from user experience, and ultimately bringing financial harm to game developers. In this paper, we present a new approach for detecting network packet-based cheats, such as forgery and timing cheats,&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.10881v1-abstract-full').style.display = 'inline'; document.getElementById('2501.10881v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2501.10881v1-abstract-full" style="display: none;"> Multiplayer online gaming has witnessed an explosion in popularity over the past two decades. However, security issues continue to give rise to in-game cheating, deterring honest gameplay, detracting from user experience, and ultimately bringing financial harm to game developers. In this paper, we present a new approach for detecting network packet-based cheats, such as forgery and timing cheats, within the context of multiplayer games using an application of secret sharing. Our developed protocols are subjected to formal verification using AVISPA, and we present simulation results using a Python-based implementation. We show that our proposal is practical in addressing some widely used attacks in online gaming. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.10881v1-abstract-full').style.display = 'none'; document.getElementById('2501.10881v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 18 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2025. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2501.09396</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Joint Transmission and Deblurring: A Semantic Communication Approach Using Events </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Yang%2C+P">Pujing Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+G">Guangyi Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yunlong Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Yu%2C+L">Lei Yu</a>, <a href="/search/cs?searchtype=author&amp;query=Yu%2C+G">Guanding Yu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2501.09396v1-abstract-short" style="display: inline;"> Deep learning-based joint source-channel coding (JSCC) is emerging as a promising technology for effective image transmission. However, most existing approaches focus on transmitting clear images, overlooking real-world challenges such as motion blur caused by camera shaking or fast-moving objects. Motion blur often degrades image quality, making transmission and reconstruction more challenging. 
E&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.09396v1-abstract-full').style.display = 'inline'; document.getElementById('2501.09396v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2501.09396v1-abstract-full" style="display: none;"> Deep learning-based joint source-channel coding (JSCC) is emerging as a promising technology for effective image transmission. However, most existing approaches focus on transmitting clear images, overlooking real-world challenges such as motion blur caused by camera shaking or fast-moving objects. Motion blur often degrades image quality, making transmission and reconstruction more challenging. Event cameras, which asynchronously record pixel intensity changes with extremely low latency, have shown great potential for motion deblurring tasks. However, the efficient transmission of the abundant data generated by event cameras remains a significant challenge. In this work, we propose a novel JSCC framework for the joint transmission of blurry images and events, aimed at achieving high-quality reconstructions under limited channel bandwidth. This approach is designed as a deblurring task-oriented JSCC system. Since RGB cameras and event cameras capture the same scene through different modalities, their outputs contain both shared and domain-specific information. To avoid repeatedly transmitting the shared information, we separately extract and transmit the shared information and the domain-specific information. At the receiver, the received signals are processed by a deblurring decoder to generate clear images. Additionally, we introduce a multi-stage training strategy to train the proposed model. Simulation results demonstrate that our method significantly outperforms existing JSCC-based image transmission schemes, addressing motion blur effectively. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.09396v1-abstract-full').style.display = 'none'; document.getElementById('2501.09396v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 16 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2501.08484</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Operating Systems">cs.OS</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Performance">cs.PF</span> </div> </div> <p class="title is-5 mathjax"> CORD: Co-design of Resource Allocation and Deadline Decomposition with Generative Profiling </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Gifford%2C+R">Robert Gifford</a>, <a href="/search/cs?searchtype=author&amp;query=Eisenklam%2C+A">Abby Eisenklam</a>, <a href="/search/cs?searchtype=author&amp;query=Bondar%2C+G+A">Georgiy A. 
Bondar</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yifan Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Sial%2C+T">Tushar Sial</a>, <a href="/search/cs?searchtype=author&amp;query=Phan%2C+L+T+X">Linh Thi Xuan Phan</a>, <a href="/search/cs?searchtype=author&amp;query=Halder%2C+A">Abhishek Halder</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2501.08484v1-abstract-short" style="display: inline;"> As multicore hardware is becoming increasingly common in real-time systems, traditional scheduling techniques that assume a single worst-case execution time for a task are no longer adequate, since they ignore the impact of shared resources on execution time. When tasks execute concurrently on different cores, their execution times often vary substantially with their allocated budgets of shared re&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.08484v1-abstract-full').style.display = 'inline'; document.getElementById('2501.08484v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2501.08484v1-abstract-full" style="display: none;"> As multicore hardware is becoming increasingly common in real-time systems, traditional scheduling techniques that assume a single worst-case execution time for a task are no longer adequate, since they ignore the impact of shared resources on execution time. When tasks execute concurrently on different cores, their execution times often vary substantially with their allocated budgets of shared resources, such as cache and memory bandwidth. Even under a specific resource allocation, the resource use pattern of a task also changes with time during a job execution. 
It is therefore important to consider the relationship between multicore resources and execution time in task modeling and scheduling algorithm design. In this paper, we propose a much more precise execution model for DAG-based real-time tasks that captures the time-varying resource use characteristics of a task under different budgets of shared resources. We present a generative resource profiling algorithm that efficiently predicts, from limited measurement data, the resource profile of a task at any time during its execution under a given resource budget. The generative profiles can then be used to construct the execution models for tasks, using which one can make informed resource allocation decisions. We further introduce a multicore resource allocation and deadline decomposition co-design technique for DAG-based tasks that leverages the generated execution models to jointly allocate resources and deadlines to subtasks, to maximize resource efficiency and schedulability. Our evaluation results show that our generative profiling algorithm achieves high accuracy while being efficient, and that our co-allocation technique substantially improves schedulability compared to a state-of-the-art deadline decomposition method. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.08484v1-abstract-full').style.display = 'none'; document.getElementById('2501.08484v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 14 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2025. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2501.05473</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Implicit Guidance and Explicit Representation of Semantic Information in Points Cloud: A Survey </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Tang%2C+J">Jingyuan Tang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+Y">Yuhuan Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Sun%2C+S">Songlin Sun</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yangang Cai</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2501.05473v1-abstract-short" style="display: inline;"> Point clouds, a prominent method of 3D representation, are extensively utilized across industries such as autonomous driving, surveying, electricity, architecture, and gaming, and have been rigorously investigated for their accuracy and resilience. The extraction of semantic information from scenes enhances both human understanding and machine perception. 
By integrating semantic information from t&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.05473v1-abstract-full').style.display = 'inline'; document.getElementById('2501.05473v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2501.05473v1-abstract-full" style="display: none;"> Point clouds, a prominent method of 3D representation, are extensively utilized across industries such as autonomous driving, surveying, electricity, architecture, and gaming, and have been rigorously investigated for their accuracy and resilience. The extraction of semantic information from scenes enhances both human understanding and machine perception. By integrating semantic information from two-dimensional scenes with three-dimensional point clouds, researchers aim to improve the precision and efficiency of various tasks. This paper provides a comprehensive review of the diverse applications and recent advancements in the integration of semantic information within point clouds. We explore the dual roles of semantic information in point clouds, encompassing both implicit guidance and explicit representation, across traditional and emerging tasks. Additionally, we offer a comparative analysis of publicly available datasets tailored to specific tasks and present notable observations. In conclusion, we discuss several challenges and potential issues that may arise in the future when fully utilizing semantic information in point clouds, providing our perspectives on these obstacles. 
The classified and organized articles related to semantic-based point cloud tasks, continuously updated with relevant achievements in different fields, can be accessed through <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.05473v1-abstract-full').style.display = 'none'; document.getElementById('2501.05473v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 7 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2025. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2501.05098</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Motion-X++: A Large-Scale Multimodal 3D Whole-body Human Motion Dataset </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+Y">Yuhong Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Lin%2C+J">Jing Lin</a>, <a href="/search/cs?searchtype=author&amp;query=Zeng%2C+A">Ailing Zeng</a>, <a href="/search/cs?searchtype=author&amp;query=Wu%2C+G">Guanlin Wu</a>, <a href="/search/cs?searchtype=author&amp;query=Lu%2C+S">Shunlin Lu</a>, <a href="/search/cs?searchtype=author&amp;query=Fu%2C+Y">Yurong Fu</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yuanhao Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+R">Ruimao Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+H">Haoqian Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+L">Lei Zhang</a> </p> <p class="abstract mathjax"> <span 
class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2501.05098v1-abstract-short" style="display: inline;"> In this paper, we introduce Motion-X++, a large-scale multimodal 3D expressive whole-body human motion dataset. Existing motion datasets predominantly capture body-only poses, lacking facial expressions, hand gestures, and fine-grained pose descriptions, and are typically limited to lab settings with manually labeled text descriptions, thereby restricting their scalability. To address this issue,&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.05098v1-abstract-full').style.display = 'inline'; document.getElementById('2501.05098v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2501.05098v1-abstract-full" style="display: none;"> In this paper, we introduce Motion-X++, a large-scale multimodal 3D expressive whole-body human motion dataset. Existing motion datasets predominantly capture body-only poses, lacking facial expressions, hand gestures, and fine-grained pose descriptions, and are typically limited to lab settings with manually labeled text descriptions, thereby restricting their scalability. To address this issue, we develop a scalable annotation pipeline that can automatically capture 3D whole-body human motion and comprehensive textual labels from RGB videos and build the Motion-X dataset comprising 81.1K text-motion pairs. Furthermore, we extend Motion-X into Motion-X++ by improving the annotation pipeline, introducing more data modalities, and scaling up the data volume. Motion-X++ provides 19.5M 3D whole-body pose annotations covering 120.5K motion sequences from a wide range of scenes, 80.8K RGB videos, 45.3K audio clips, 19.5M frame-level whole-body pose descriptions, and 120.5K sequence-level semantic labels. 
Comprehensive experiments validate the accuracy of our annotation pipeline and highlight Motion-X++&#39;s significant benefits for generating expressive, precise, and natural motion with paired multimodal labels supporting several downstream tasks, including text-driven whole-body motion generation, audio-driven motion generation, 3D whole-body human mesh recovery, and 2D whole-body keypoint estimation. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.05098v1-abstract-full').style.display = 'none'; document.getElementById('2501.05098v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 9 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2025. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">17 pages, 14 figures. This work extends and enhances the research published in the NeurIPS 2023 paper, &#34;Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset&#34;. 
arXiv admin note: substantial text overlap with arXiv:2307.00818</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2501.04641</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Statistics Theory">math.ST</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">stat.ML</span> </div> </div> <p class="title is-5 mathjax"> A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Oko%2C+K">Kazusato Oko</a>, <a href="/search/cs?searchtype=author&amp;query=Lin%2C+L">Licong Lin</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yuhang Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Mei%2C+S">Song Mei</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2501.04641v1-abstract-short" style="display: inline;"> Multi-modal generative AI systems, such as those combining vision and language, rely on contrastive pre-training to learn representations across different modalities. While their practical benefits are widely acknowledged, a rigorous theoretical understanding of the contrastive pre-training framework remains limited. 
This paper develops a theoretical framework to explain the success of contrastive&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.04641v1-abstract-full').style.display = 'inline'; document.getElementById('2501.04641v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2501.04641v1-abstract-full" style="display: none;"> Multi-modal generative AI systems, such as those combining vision and language, rely on contrastive pre-training to learn representations across different modalities. While their practical benefits are widely acknowledged, a rigorous theoretical understanding of the contrastive pre-training framework remains limited. This paper develops a theoretical framework to explain the success of contrastive pre-training in downstream tasks, such as zero-shot classification, conditional diffusion models, and vision-language models. We introduce the concept of approximate sufficient statistics, a generalization of the classical sufficient statistics, and show that near-minimizers of the contrastive pre-training loss are approximately sufficient, making them adaptable to diverse downstream tasks. We further propose the Joint Generative Hierarchical Model for the joint distribution of images and text, showing that transformers can efficiently approximate relevant functions within this model via belief propagation. Building on this framework, we derive sample complexity guarantees for multi-modal learning based on contrastively pre-trained representations. Numerical simulations validate these theoretical findings, demonstrating the strong generalization performance of contrastively pre-trained transformers in various multi-modal tasks. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.04641v1-abstract-full').style.display = 'none'; document.getElementById('2501.04641v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 8 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2025. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">108 pages</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2501.04102</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Enhancing Distribution and Label Consistency for Graph Out-of-Distribution Generalization </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Wang%2C+S">Song Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+X">Xiaodong Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Islam%2C+R">Rashidul Islam</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+H">Huiyuan Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Xu%2C+M">Minghua Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+J">Jundong Li</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yiwei Cai</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" 
id="2501.04102v1-abstract-short" style="display: inline;"> To deal with distribution shifts in graph data, various graph out-of-distribution (OOD) generalization techniques have been recently proposed. These methods often employ a two-step strategy that first creates augmented environments and subsequently identifies invariant subgraphs to improve generalizability. Nevertheless, this approach could be suboptimal from the perspective of consistency. First,&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.04102v1-abstract-full').style.display = 'inline'; document.getElementById('2501.04102v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2501.04102v1-abstract-full" style="display: none;"> To deal with distribution shifts in graph data, various graph out-of-distribution (OOD) generalization techniques have been recently proposed. These methods often employ a two-step strategy that first creates augmented environments and subsequently identifies invariant subgraphs to improve generalizability. Nevertheless, this approach could be suboptimal from the perspective of consistency. First, the process of augmenting environments by altering the graphs while preserving labels may lead to graphs that are not realistic or meaningfully related to the original distribution, thus lacking distribution consistency. Second, the extracted subgraphs are obtained by directly modifying graphs and may not necessarily maintain a consistent predictive relationship with their labels, thereby impacting label consistency. In response to these challenges, we introduce an innovative approach that aims to enhance these two types of consistency for graph OOD generalization. We propose a modifier to obtain both augmented and invariant graphs in a unified manner. 
With the augmented graphs, we enrich the training data without compromising the integrity of label-graph relationships. The label consistency enhancement in our framework further preserves the supervision information in the invariant graph. We conduct extensive experiments on real-world datasets to demonstrate the superiority of our framework over other state-of-the-art baselines. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.04102v1-abstract-full').style.display = 'none'; document.getElementById('2501.04102v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 7 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2025. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted by ICDM 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2501.02441</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">stat.ML</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Cryptography and Security">cs.CR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Statistics Theory">math.ST</span> </div> </div> <p class="title is-5 mathjax"> A Statistical Hypothesis Testing 
Framework for Data Misappropriation Detection in Large Language Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yinpeng Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+L">Lexin Li</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+L">Linjun Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2501.02441v1-abstract-short" style="display: inline;"> Large Language Models (LLMs) have rapidly gained enormous popularity in recent years. However, the training of LLMs has raised significant privacy and legal concerns, particularly regarding the inclusion of copyrighted materials in their training data without proper attribution or licensing, which falls under the broader issue of data misappropriation. In this article, we focus on a specific probl&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.02441v1-abstract-full').style.display = 'inline'; document.getElementById('2501.02441v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2501.02441v1-abstract-full" style="display: none;"> Large Language Models (LLMs) have rapidly gained enormous popularity in recent years. However, the training of LLMs has raised significant privacy and legal concerns, particularly regarding the inclusion of copyrighted materials in their training data without proper attribution or licensing, which falls under the broader issue of data misappropriation. In this article, we focus on a specific problem of data misappropriation detection, namely, determining whether a given LLM has incorporated data generated by another LLM. 
To address this issue, we propose embedding watermarks into the copyrighted training data and formulating the detection of data misappropriation as a hypothesis testing problem. We develop a general statistical testing framework, construct a pivotal statistic, determine the optimal rejection threshold, and explicitly control the type I and type II errors. Furthermore, we establish the asymptotic optimality properties of the proposed tests and demonstrate their empirical effectiveness through extensive numerical experiments. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.02441v1-abstract-full').style.display = 'none'; document.getElementById('2501.02441v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2025. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">29 pages, 5 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2501.01949</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Cong%2C+W">Wenyan Cong</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+K">Kevin Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Lei%2C+J">Jiahui Lei</a>, <a
href="/search/cs?searchtype=author&amp;query=Stearns%2C+C">Colton Stearns</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yuanhao Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+D">Dilin Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Ranjan%2C+R">Rakesh Ranjan</a>, <a href="/search/cs?searchtype=author&amp;query=Feiszli%2C+M">Matt Feiszli</a>, <a href="/search/cs?searchtype=author&amp;query=Guibas%2C+L">Leonidas Guibas</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Z">Zhangyang Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+W">Weiyao Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Fan%2C+Z">Zhiwen Fan</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2501.01949v1-abstract-short" style="display: inline;"> Efficiently reconstructing accurate 3D models from monocular video is a key challenge in computer vision, critical for advancing applications in virtual reality, robotics, and scene understanding. Existing approaches typically require pre-computed camera parameters and frame-by-frame reconstruction pipelines, which are prone to error accumulation and entail significant computational overhead. To a&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.01949v1-abstract-full').style.display = 'inline'; document.getElementById('2501.01949v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2501.01949v1-abstract-full" style="display: none;"> Efficiently reconstructing accurate 3D models from monocular video is a key challenge in computer vision, critical for advancing applications in virtual reality, robotics, and scene understanding. 
Existing approaches typically require pre-computed camera parameters and frame-by-frame reconstruction pipelines, which are prone to error accumulation and entail significant computational overhead. To address these limitations, we introduce VideoLifter, a novel framework that leverages geometric priors from a learnable model to incrementally optimize a globally sparse-to-dense 3D representation directly from video sequences. VideoLifter segments the video sequence into local windows, where it matches and registers frames, constructs consistent fragments, and aligns them hierarchically to produce a unified 3D model. By tracking and propagating sparse point correspondences across frames and fragments, VideoLifter incrementally refines camera poses and 3D structure, minimizing reprojection error for improved accuracy and robustness. This approach significantly accelerates the reconstruction process, reducing training time by over 82% while surpassing current state-of-the-art methods in visual fidelity and computational efficiency. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2501.01949v1-abstract-full').style.display = 'none'; document.getElementById('2501.01949v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 3 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2025. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">project page:</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2412.20846</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Are LLMs Really Not Knowledgable? Mining the Submerged Knowledge in LLMs&#39; Memory </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Tao%2C+X">Xingjian Tao</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yiwei Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yujun Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+Z">Zhicheng Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Tang%2C+J">Jing Tang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2412.20846v1-abstract-short" style="display: inline;"> Large language models (LLMs) have shown promise as potential knowledge bases, yet they often struggle with question-answering tasks and are prone to hallucinations. While previous research attributes these issues to knowledge gaps in the model&#39;s parameters, our investigation reveals a different phenomenon: LLMs often retain correct knowledge even when generating incorrect answers. 
Through analysis&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.20846v1-abstract-full').style.display = 'inline'; document.getElementById('2412.20846v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2412.20846v1-abstract-full" style="display: none;"> Large language models (LLMs) have shown promise as potential knowledge bases, yet they often struggle with question-answering tasks and are prone to hallucinations. While previous research attributes these issues to knowledge gaps in the model&#39;s parameters, our investigation reveals a different phenomenon: LLMs often retain correct knowledge even when generating incorrect answers. Through analysis of the model&#39;s internal representations, we find that correct answers frequently appear among high-probability tokens despite not being selected as final outputs. Based on this observation, we introduce Hits@k, a new metric to assess knowledge retention independent of expression accuracy. Our extensive experiments demonstrate that LLMs store significantly more knowledge than their QA performance suggests. Building on these findings, we develop SkipUnsure, a method to improve answer accuracy by leveraging detected but unexpressed knowledge. Experiments on both open-domain and specific-domain datasets show consistent improvements, with accuracy gains of up to 11.8% on DBPedia and 6.3% on IMDB, without requiring model retraining. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.20846v1-abstract-full').style.display = 'none'; document.getElementById('2412.20846v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 30 December, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2412.20471</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">ps</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Science and Game Theory">cs.GT</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Optimization and Control">math.OC</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">stat.ML</span> </div> </div> <p class="title is-5 mathjax"> On the Convergence of Min-Max Langevin Dynamics and Algorithm </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yang Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Mitra%2C+S">Siddharth Mitra</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+X">Xiuyuan Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Wibisono%2C+A">Andre Wibisono</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2412.20471v2-abstract-short" style="display: inline;"> We study zero-sum games in the space of probability distributions over the Euclidean space $\mathbb{R}^d$ with entropy regularization, in the setting when the interaction function between the players is smooth and strongly convex-strongly concave. 
We prove an exponential convergence guarantee for the mean-field min-max Langevin dynamics to compute the equilibrium distribution of the zero-sum game.&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.20471v2-abstract-full').style.display = 'inline'; document.getElementById('2412.20471v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2412.20471v2-abstract-full" style="display: none;"> We study zero-sum games in the space of probability distributions over the Euclidean space $\mathbb{R}^d$ with entropy regularization, in the setting when the interaction function between the players is smooth and strongly convex-strongly concave. We prove an exponential convergence guarantee for the mean-field min-max Langevin dynamics to compute the equilibrium distribution of the zero-sum game. We also study the finite-particle approximation of the mean-field min-max Langevin dynamics, both in continuous and discrete times. We prove biased convergence guarantees for the continuous-time finite-particle min-max Langevin dynamics to the stationary mean-field equilibrium distribution with an explicit bias term which does not scale with the number of particles. We also prove biased convergence guarantees for the discrete-time finite-particle min-max Langevin algorithm to the stationary mean-field equilibrium distribution with an additional bias term which scales with the step size and the number of particles. This provides an explicit iteration complexity for the average particle along the finite-particle algorithm to approximately compute the equilibrium distribution of the zero-sum game. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.20471v2-abstract-full').style.display = 'none'; document.getElementById('2412.20471v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 7 February, 2025; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 29 December, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">v2: Revised introduction and presentation of results</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2412.19990</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> SegKAN: High-Resolution Medical Image Segmentation with Long-Distance Dependencies </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Tan%2C+S">Shengbo Tan</a>, <a href="/search/cs?searchtype=author&amp;query=Xue%2C+R">Rundong Xue</a>, <a href="/search/cs?searchtype=author&amp;query=Luo%2C+S">Shipeng Luo</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+Z">Zeyu Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+X">Xinran Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+L">Lei Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Ergu%2C+D">Daji Ergu</a>, <a 
href="/search/cs?searchtype=author&amp;query=Yi%2C+Z">Zhang Yi</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+Y">Yang Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Ying Cai</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2412.19990v2-abstract-short" style="display: inline;"> Hepatic vessels in computed tomography scans often suffer from image fragmentation and noise interference, making it difficult to maintain vessel integrity and posing significant challenges for vessel segmentation. To address this issue, we propose an innovative model: SegKAN. First, we improve the conventional embedding module by adopting a novel convolutional network structure for image embeddin&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.19990v2-abstract-full').style.display = 'inline'; document.getElementById('2412.19990v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2412.19990v2-abstract-full" style="display: none;"> Hepatic vessels in computed tomography scans often suffer from image fragmentation and noise interference, making it difficult to maintain vessel integrity and posing significant challenges for vessel segmentation. To address this issue, we propose an innovative model: SegKAN. First, we improve the conventional embedding module by adopting a novel convolutional network structure for image embedding, which smooths out image noise and prevents issues such as gradient explosion in subsequent stages. Next, we transform the spatial relationships between Patch blocks into temporal relationships to solve the problem of capturing positional relationships between Patch blocks in traditional Vision Transformer models. 
We conducted experiments on a hepatic vessel dataset; compared with the existing state-of-the-art model, SegKAN improved the Dice score by 1.78%. These results demonstrate that the proposed new structure effectively enhances the segmentation performance of high-resolution extended objects. Code will be available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.19990v2-abstract-full').style.display = 'none'; document.getElementById('2412.19990v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 2 January, 2025; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 27 December, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2412.19537</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Human-Computer Interaction">cs.HC</span> </div> </div> <p class="title is-5 mathjax"> Finger in Camera Speaks Everything: Unconstrained Air-Writing for Real-World </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Wu%2C+M">Meiqi Wu</a>, <a href="/search/cs?searchtype=author&amp;query=Huang%2C+K">Kaiqi Huang</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yuanqiang Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Hu%2C+S">Shiyu Hu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+Y">Yuzhong Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+W">Weiqiang Wang</a> </p> <p
class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2412.19537v1-abstract-short" style="display: inline;"> Air-writing is a challenging task that combines the fields of computer vision and natural language processing, offering an intuitive and natural approach for human-computer interaction. However, current air-writing solutions face two primary challenges: (1) their dependency on complex sensors (e.g., radar and EEG) for capturing precise handwritten trajectories, and (2) the absence of a vi&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.19537v1-abstract-full').style.display = 'inline'; document.getElementById('2412.19537v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2412.19537v1-abstract-full" style="display: none;"> Air-writing is a challenging task that combines the fields of computer vision and natural language processing, offering an intuitive and natural approach for human-computer interaction. However, current air-writing solutions face two primary challenges: (1) their dependency on complex sensors (e.g., radar and EEG) for capturing precise handwritten trajectories, and (2) the absence of a video-based air-writing dataset that covers a comprehensive vocabulary range. These limitations impede their practicality in various real-world scenarios, including use on devices such as iPhones and laptops. To tackle these challenges, we present the groundbreaking air-writing Chinese character video dataset (AWCV-100K-UCAS2024), serving as a pioneering benchmark for video-based air-writing. This dataset captures handwritten trajectories in various real-world scenarios using commonly accessible RGB cameras, eliminating the need for complex sensors. 
AWCV-100K-UCAS2024 includes 8.8 million video frames, encompassing the complete set of 3,755 characters from the GB2312-80 level-1 set (GB1). Furthermore, we introduce our baseline approach, the video-based character recognizer (VCRec). VCRec adeptly extracts fingertip features from sparse visual cues and employs a spatio-temporal sequence module for analysis. Experimental results showcase the superior performance of VCRec compared to existing models in recognizing air-written characters, both quantitatively and qualitatively. This breakthrough paves the way for enhanced human-computer interaction in real-world contexts. Moreover, our approach leverages affordable RGB cameras, enabling its applicability in a diverse range of scenarios. The code and data examples will be made public at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.19537v1-abstract-full').style.display = 'none'; document.getElementById('2412.19537v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 27 December, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2412.18417</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Ultra-Low Complexity On-Orbit Compression for Remote Sensing Imagery via Block Modulated Imaging </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Z">Zhibin Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yanxin Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+J">Jiayi Zhou</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+Y">Yangming Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+T">Tianyu Li</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+W">Wei Li</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+X">Xun Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+G">Guoqing Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+Y">Yang Yang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2412.18417v1-abstract-short" style="display: inline;"> The growing field of remote sensing faces a challenge: the ever-increasing size and volume of imagery data are exceeding the storage and transmission capabilities of satellite platforms. Efficient compression of remote sensing imagery is a critical solution to alleviate these burdens on satellites. However, existing compression methods are often too computationally expensive for satellites. 
With t&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.18417v1-abstract-full').style.display = 'inline'; document.getElementById('2412.18417v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2412.18417v1-abstract-full" style="display: none;"> The growing field of remote sensing faces a challenge: the ever-increasing size and volume of imagery data are exceeding the storage and transmission capabilities of satellite platforms. Efficient compression of remote sensing imagery is a critical solution to alleviate these burdens on satellites. However, existing compression methods are often too computationally expensive for satellites. With the continued advancement of compressed sensing theory, single-pixel imaging emerges as a powerful tool that brings new possibilities for on-orbit image compression. However, it still suffers from prolonged imaging times and the inability to perform high-resolution imaging, hindering its practical application. This paper advances the study of compressed sensing in remote sensing image compression, proposing Block Modulated Imaging (BMI). By requiring only a single exposure, BMI significantly enhances imaging acquisition speeds. Additionally, BMI obviates the need for digital micromirror devices and surpasses limitations in image resolution. Furthermore, we propose a novel decoding network specifically designed to reconstruct images compressed under the BMI framework. Leveraging the gated 3D convolutions and promoting efficient information flow across stages through a Two-Way Cross-Attention module, our decoding network exhibits demonstrably superior reconstruction performance. Extensive experiments conducted on multiple renowned remote sensing datasets unequivocally demonstrate the efficacy of our proposed method. 
To further validate its practical applicability, we developed and tested a prototype of the BMI-based camera, which has shown promising potential for on-orbit image compression. The code is available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.18417v1-abstract-full').style.display = 'none'; document.getElementById('2412.18417v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 24 December, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2412.15272</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Information Retrieval">cs.IR</span> </div> </div> <p class="title is-5 mathjax"> SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yuzheng Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Guo%2C+Z">Zhenyue Guo</a>, <a href="/search/cs?searchtype=author&amp;query=Pei%2C+Y">Yiwen Pei</a>, <a href="/search/cs?searchtype=author&amp;query=Bian%2C+W">Wanrui Bian</a>, <a href="/search/cs?searchtype=author&amp;query=Zheng%2C+W">Weiguo Zheng</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" 
id="2412.15272v1-abstract-short" style="display: inline;"> Recent advances in large language models (LLMs) have shown impressive versatility across various tasks. To mitigate their hallucinations, retrieval-augmented generation (RAG) has emerged as a powerful approach, leveraging external knowledge sources such as knowledge graphs (KGs). In this paper, we study the task of KG-driven RAG and propose a novel Similar Graph Enhanced Retrieval-Augmented Genera&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.15272v1-abstract-full').style.display = 'inline'; document.getElementById('2412.15272v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2412.15272v1-abstract-full" style="display: none;"> Recent advances in large language models (LLMs) have shown impressive versatility across various tasks. To mitigate their hallucinations, retrieval-augmented generation (RAG) has emerged as a powerful approach, leveraging external knowledge sources such as knowledge graphs (KGs). In this paper, we study the task of KG-driven RAG and propose a novel Similar Graph Enhanced Retrieval-Augmented Generation (SimGRAG) method. It effectively addresses the challenge of aligning query texts and KG structures through a two-stage process: (1) query-to-pattern, which uses an LLM to transform queries into a desired graph pattern, and (2) pattern-to-subgraph, which quantifies the alignment between the pattern and candidate subgraphs using a graph semantic distance (GSD) metric. We also develop an optimized retrieval algorithm that efficiently identifies the top-$k$ subgraphs within 1 second on a 10-million-scale KG. Extensive experiments show that SimGRAG outperforms state-of-the-art KG-driven RAG methods in both question answering and fact verification, offering superior plug-and-play usability and scalability.
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.15272v1-abstract-full').style.display = 'none'; document.getElementById('2412.15272v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 17 December, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2412.15040</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Robotics">cs.RO</span> </div> <div class="is-inline-block" style="margin-left: 0.5rem"> <div class="tags has-addons"> <span class="tag is-dark is-size-7">doi</span> <span class="tag is-light is-size-7"><a class="" href="">10.1109/COINS61597.2024.10622644 <i class="fa fa-external-link" aria-hidden="true"></i></a></span> </div> </div> </div> <p class="title is-5 mathjax"> Noise Analysis and Modeling of the PMD Flexx2 Depth Camera for Robotic Applications </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yuke Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Plozza%2C+D">Davide Plozza</a>, <a href="/search/cs?searchtype=author&amp;query=Marty%2C+S">Steven Marty</a>, <a href="/search/cs?searchtype=author&amp;query=Joseph%2C+P">Paul Joseph</a>, <a href="/search/cs?searchtype=author&amp;query=Magno%2C+M">Michele Magno</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" 
id="2412.15040v1-abstract-short" style="display: inline;"> Time-of-Flight (ToF) cameras, renowned for their ability to capture real-time 3D information, have become indispensable for agile mobile robotics. These cameras utilize light signals to accurately measure distances, enabling robots to navigate complex environments with precision. Innovative depth cameras characterized by their compact size and lightweight design, such as the recently released PMD Flexx2, ar&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.15040v1-abstract-full').style.display = 'inline'; document.getElementById('2412.15040v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2412.15040v1-abstract-full" style="display: none;"> Time-of-Flight (ToF) cameras, renowned for their ability to capture real-time 3D information, have become indispensable for agile mobile robotics. These cameras utilize light signals to accurately measure distances, enabling robots to navigate complex environments with precision. Innovative depth cameras characterized by their compact size and lightweight design, such as the recently released PMD Flexx2, are particularly suited for mobile robots. Capable of achieving high frame rates while capturing depth information, this innovative sensor is suitable for tasks such as robot navigation and terrain mapping. Operating on the ToF measurement principle, the sensor offers multiple benefits over classic stereo-based depth cameras. However, the depth images produced by the camera are subject to noise from multiple sources, complicating their simulation. This paper proposes an accurate quantification and modeling of the non-systematic noise of the PMD Flexx2. We propose models for both axial and lateral noise across various camera modes, assuming Gaussian distributions. Axial noise, modeled as a function of distance and incidence angle, demonstrated a low average Kullback-Leibler (KL)
divergence of 0.015 nats, reflecting precise noise characterization. Lateral noise, deviating from a Gaussian distribution, was modeled conservatively, yielding a satisfactory KL divergence of 0.868 nats. These results validate our noise models, crucial for accurately simulating sensor behavior in virtual environments and reducing the sim-to-real gap in learning-based control approaches. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.15040v1-abstract-full').style.display = 'none'; document.getElementById('2412.15040v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 19 December, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted by COINS 2024</span> </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Journal ref:</span> IEEE International Conference on Omni-layer Intelligent Systems (COINS), 2024, pp.
422-427 </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2412.11479</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Information Theory">cs.IT</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Signal Processing">eess.SP</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Systems and Control">eess.SY</span> </div> <div class="is-inline-block" style="margin-left: 0.5rem"> <div class="tags has-addons"> <span class="tag is-dark is-size-7">doi</span> <span class="tag is-light is-size-7"><a class="" href="">10.1109/GCWkshps58843.2023.10464958 <i class="fa fa-external-link" aria-hidden="true"></i></a></span> </div> </div> </div> <p class="title is-5 mathjax"> Wireless Environmental Information Theory: A New Paradigm towards 6G Online and Proactive Environment Intelligence Communication </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+J">Jianhua Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Yu%2C+L">Li Yu</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+S">Shaoyi Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yichen Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+Y">Yuxiang Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Xing%2C+H">Hongbo Xing</a>, <a href="/search/cs?searchtype=author&amp;query=jiang%2C+T">Tao jiang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2412.11479v1-abstract-short" style="display: inline;"> The channel is one of the five critical components of a communication system, and its ergodic capacity is based on all realizations 
of the statistical channel model. This statistical paradigm has successfully guided the design of mobile communication systems from 1G to 5G. However, this approach relies on offline channel measurements in specific environments, and the system passively adapts to new envir&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.11479v1-abstract-full').style.display = 'inline'; document.getElementById('2412.11479v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2412.11479v1-abstract-full" style="display: none;"> The channel is one of the five critical components of a communication system, and its ergodic capacity is based on all realizations of the statistical channel model. This statistical paradigm has successfully guided the design of mobile communication systems from 1G to 5G. However, this approach relies on offline channel measurements in specific environments, and the system passively adapts to new environments, resulting in deviation from optimal performance. As 6G pursues higher capacity and data rates in ubiquitous environments, a new paradigm is urgently needed to combat channel randomness in a more proactive and online manner. Motivated by this, we propose an environment intelligence communication (EIC) paradigm based on wireless environmental information theory (WEIT) for 6G. The proposed EIC architecture consists of three steps. First, wireless environmental information (WEI) is acquired using sensing techniques. Second, leveraging WEI and channel data, AI techniques are employed to predict channel fading, thereby mitigating channel uncertainty. Third, the communication system autonomously determines the optimal air-interface transmission strategy based on real-time channel predictions, enabling intelligent interaction with the physical environment.
To move this paradigm shift from theory to practice, we establish WEIT for the first time by answering three key questions: How should WEI be defined? Can it be quantified? Does it share the properties of statistical communication information? Furthermore, EIC aided by WEI (EIC-WEI) is validated across multiple air-interface tasks, including CSI prediction, beam prediction, and radio resource management. Simulation results demonstrate that the proposed EIC-WEI significantly outperforms the statistical paradigm in both reducing overhead and optimizing performance. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.11479v1-abstract-full').style.display = 'none'; document.getElementById('2412.11479v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 16 December, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2024.
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2412.10347</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Biomolecules">q-bio.BM</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> COMET: Benchmark for Comprehensive Biological Multi-omics Evaluation Tasks and Language Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ren%2C+Y">Yuchen Ren</a>, <a href="/search/cs?searchtype=author&amp;query=Han%2C+W">Wenwei Han</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+Q">Qianyuan Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Tang%2C+Y">Yining Tang</a>, <a href="/search/cs?searchtype=author&amp;query=Bai%2C+W">Weiqiang Bai</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yuchen Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Qiao%2C+L">Lifeng Qiao</a>, <a href="/search/cs?searchtype=author&amp;query=Jiang%2C+H">Hao Jiang</a>, <a href="/search/cs?searchtype=author&amp;query=Yuan%2C+D">Dong Yuan</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+T">Tao Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Sun%2C+S">Siqi Sun</a>, <a href="/search/cs?searchtype=author&amp;query=Tan%2C+P">Pan Tan</a>, <a href="/search/cs?searchtype=author&amp;query=Ouyang%2C+W">Wanli Ouyang</a>, <a href="/search/cs?searchtype=author&amp;query=Dong%2C+N">Nanqing Dong</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+X">Xinzhu Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Ye%2C+P">Peng Ye</a> </p> <p class="abstract mathjax"> <span 
class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2412.10347v1-abstract-short" style="display: inline;"> As key elements within the central dogma, DNA, RNA, and proteins play crucial roles in maintaining life by guaranteeing accurate genetic expression and implementation. Although research on these molecules has profoundly impacted fields such as medicine, agriculture, and industry, the diversity of machine learning approaches, from traditional statistical methods to deep learning models and large langua&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.10347v1-abstract-full').style.display = 'inline'; document.getElementById('2412.10347v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2412.10347v1-abstract-full" style="display: none;"> As key elements within the central dogma, DNA, RNA, and proteins play crucial roles in maintaining life by guaranteeing accurate genetic expression and implementation. Although research on these molecules has profoundly impacted fields such as medicine, agriculture, and industry, the diversity of machine learning approaches, from traditional statistical methods to deep learning models and large language models, makes it difficult for researchers to choose the most suitable models for specific tasks, especially cross-omics and multi-omics tasks, owing to the lack of comprehensive benchmarks. To address this, we introduce COMET (Benchmark for Biological COmprehensive Multi-omics Evaluation Tasks and Language Models), the first comprehensive multi-omics benchmark, designed to evaluate models across single-omics, cross-omics, and multi-omics tasks.
First, we curate and develop a diverse collection of downstream tasks and datasets covering key structural and functional aspects of DNA, RNA, and proteins, including tasks that span multiple omics levels. Then, we evaluate existing foundation language models for DNA, RNA, and proteins, as well as the newly proposed multi-omics method, offering valuable insights into their performance in integrating and analyzing data from different biological modalities. This benchmark aims to define critical issues in multi-omics research and guide future directions, ultimately advancing the understanding of biological processes through integrated analysis of diverse omics data. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.10347v1-abstract-full').style.display = 'none'; document.getElementById('2412.10347v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 December, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2024.
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2412.07801</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> <div class="is-inline-block" style="margin-left: 0.5rem"> <div class="tags has-addons"> <span class="tag is-dark is-size-7">doi</span> <span class="tag is-light is-size-7"><a class="" href="">10.1145/3664647.3681590 <i class="fa fa-external-link" aria-hidden="true"></i></a></span> </div> </div> </div> <p class="title is-5 mathjax"> Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Chen%2C+J">Jiali Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Hei%2C+X">Xusen Hei</a>, <a href="/search/cs?searchtype=author&amp;query=Xue%2C+Y">Yuqi Xue</a>, <a href="/search/cs?searchtype=author&amp;query=Wei%2C+Y">Yuancheng Wei</a>, <a href="/search/cs?searchtype=author&amp;query=Xie%2C+J">Jiayuan Xie</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yi Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+Q">Qing Li</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2412.07801v1-abstract-short" style="display: inline;"> Large multimodal models (LMMs) have shown remarkable performance in the visual commonsense reasoning (VCR) task, which aims to answer a multiple-choice question based 
on visual commonsense within an image. However, the ability of LMMs to correct potential visual commonsense errors in the distractor upon their occurrence remains under-explored. Drawing inspiration from how a human teacher crafts cha&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.07801v1-abstract-full').style.display = 'inline'; document.getElementById('2412.07801v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2412.07801v1-abstract-full" style="display: none;"> Large multimodal models (LMMs) have shown remarkable performance in the visual commonsense reasoning (VCR) task, which aims to answer a multiple-choice question based on visual commonsense within an image. However, the ability of LMMs to correct potential visual commonsense errors in the distractor upon their occurrence remains under-explored. Drawing inspiration from how a human teacher crafts challenging distractors to test students&#39; comprehension of the concepts or skills and assists them in identifying and correcting errors toward the answer, we pioneer research on enabling LMMs to simulate this error-correction process. To this end, we employ GPT-4 as a &quot;teacher&quot; to collect the explainable feedback dataset VCR-DF for error correction, which serves as a benchmark to evaluate the ability of LMMs to identify misconceptions and clarify the reasons behind errors in VCR distractors toward final answers. In addition, we propose an LMM-based Pedagogical Expert Instructed Feedback Generation (PEIFG) model that incorporates learnable expert prompts and multimodal instructions as guidance for feedback generation. Experimental results show that our PEIFG significantly outperforms existing LMMs. We believe that our benchmark provides a new direction for evaluating the capabilities of LMMs.
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.07801v1-abstract-full').style.display = 'none'; document.getElementById('2412.07801v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 7 December, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted by ACM MM 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2412.07255</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Label-Confidence-Aware Uncertainty Estimation in Natural Language Generation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Lin%2C+Q">Qinhong Lin</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+L">Linna Zhou</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+Z">Zhongliang Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yuang Cai</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2412.07255v1-abstract-short" style="display: inline;"> Large Language Models (LLMs) display formidable capabilities in generative tasks but also pose potential risks due to their tendency to generate hallucinatory responses. 
Uncertainty Quantification (UQ), the evaluation of model output reliability, is crucial for ensuring the safety and robustness of AI systems. Recent studies have concentrated on model uncertainty by analyzing the relationship betw&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.07255v1-abstract-full').style.display = 'inline'; document.getElementById('2412.07255v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2412.07255v1-abstract-full" style="display: none;"> Large Language Models (LLMs) display formidable capabilities in generative tasks but also pose potential risks due to their tendency to generate hallucinatory responses. Uncertainty Quantification (UQ), the evaluation of model output reliability, is crucial for ensuring the safety and robustness of AI systems. Recent studies have concentrated on model uncertainty by analyzing the relationship between output entropy under various sampling conditions and the corresponding labels. However, these methods primarily focus on measuring model entropy with precision to capture response characteristics, often neglecting the uncertainties associated with greedy decoding results (the sources of model labels), which can lead to biased classification outcomes. In this paper, we explore the biases introduced by greedy decoding and propose a label-confidence-aware (LCA) uncertainty estimation method based on the Kullback-Leibler (KL) divergence, which bridges samples and the label source, thereby enhancing the reliability and stability of uncertainty assessments. Our empirical evaluations across a range of popular LLMs and NLP datasets reveal that different label sources can indeed affect classification, and that our approach can effectively capture differences between sampling results and label sources, yielding more effective uncertainty estimation.
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.07255v1-abstract-full').style.display = 'none'; document.getElementById('2412.07255v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 December, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2412.06568</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> CONDEN-FI: Consistency and Diversity Learning-based Multi-View Unsupervised Feature and Instance Co-Selection </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Huang%2C+Y">Yanyong Huang</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yuxin Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+D">Dongjie Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Yi%2C+X">Xiuwen Yi</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+T">Tianrui Li</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2412.06568v1-abstract-short" style="display: inline;"> The objective of multi-view unsupervised feature and instance co-selection is to simultaneously identify the most representative features and samples from multi-view unlabeled data, which aids in mitigating the curse of dimensionality and reducing instance size to improve the performance of downstream tasks. 
However, existing methods treat feature selection and instance selection as two separat&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.06568v1-abstract-full').style.display = 'inline'; document.getElementById('2412.06568v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2412.06568v1-abstract-full" style="display: none;"> The objective of multi-view unsupervised feature and instance co-selection is to simultaneously identify the most representative features and samples from multi-view unlabeled data, which aids in mitigating the curse of dimensionality and reducing instance size to improve the performance of downstream tasks. However, existing methods treat feature selection and instance selection as two separate processes, failing to leverage the potential interactions between the feature and instance spaces. Additionally, previous co-selection methods for multi-view data require concatenating different views, which overlooks the consistent information among them. In this paper, we propose a CONsistency and DivErsity learNing-based multi-view unsupervised Feature and Instance co-selection (CONDEN-FI) to address the above-mentioned issues. Specifically, CONDEN-FI reconstructs multi-view data from both the sample and feature spaces to learn representations that are consistent across views and specific to each view, enabling the simultaneous selection of the most important features and instances. Moreover, CONDEN-FI adaptively learns a view-consensus similarity graph to help select both dissimilar and similar samples in the reconstructed data space, leading to a more diverse selection of instances. An efficient algorithm is developed to solve the resultant optimization problem, and the comprehensive experimental results on real-world datasets demonstrate that CONDEN-FI is effective compared to state-of-the-art methods. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.06568v1-abstract-full').style.display = 'none'; document.getElementById('2412.06568v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 9 December, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2412.06512</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Software Engineering">cs.SE</span> </div> </div> <p class="title is-5 mathjax"> The Fusion of Large Language Models and Formal Methods for Trustworthy AI Agents: A Roadmap </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+Y">Yedi Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yufan Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Zuo%2C+X">Xinyue Zuo</a>, <a href="/search/cs?searchtype=author&amp;query=Luan%2C+X">Xiaokun Luan</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+K">Kailong Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Hou%2C+Z">Zhe Hou</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+Y">Yifan Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Wei%2C+Z">Zhiyuan Wei</a>, <a href="/search/cs?searchtype=author&amp;query=Sun%2C+M">Meng Sun</a>, <a href="/search/cs?searchtype=author&amp;query=Sun%2C+J">Jun Sun</a>, <a 
href="/search/cs?searchtype=author&amp;query=Sun%2C+J">Jing Sun</a>, <a href="/search/cs?searchtype=author&amp;query=Dong%2C+J+S">Jin Song Dong</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2412.06512v1-abstract-short" style="display: inline;"> Large Language Models (LLMs) have emerged as a transformative AI paradigm, profoundly influencing daily life through their exceptional language understanding and contextual generation capabilities. Despite their remarkable performance, LLMs face a critical challenge: the propensity to produce unreliable outputs due to the inherent limitations of their learning-based nature. Formal methods (FMs), o&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.06512v1-abstract-full').style.display = 'inline'; document.getElementById('2412.06512v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2412.06512v1-abstract-full" style="display: none;"> Large Language Models (LLMs) have emerged as a transformative AI paradigm, profoundly influencing daily life through their exceptional language understanding and contextual generation capabilities. Despite their remarkable performance, LLMs face a critical challenge: the propensity to produce unreliable outputs due to the inherent limitations of their learning-based nature. Formal methods (FMs), on the other hand, are a well-established computation paradigm that provides mathematically rigorous techniques for modeling, specifying, and verifying the correctness of systems. FMs have been extensively applied in mission-critical software engineering, embedded systems, and cybersecurity. 
However, the primary challenges impeding the deployment of FMs in real-world settings lie in their steep learning curves, the absence of user-friendly interfaces, and issues with efficiency and adaptability. This position paper outlines a roadmap for advancing the next generation of trustworthy AI systems by leveraging the mutual enhancement of LLMs and FMs. First, we illustrate how FMs, including reasoning and certification techniques, can help LLMs generate more reliable and formally certified outputs. Subsequently, we highlight how the advanced learning capabilities and adaptability of LLMs can significantly enhance the usability, efficiency, and scalability of existing FM tools. Finally, we show that unifying these two computation paradigms (integrating the flexibility and intelligence of LLMs with the rigorous reasoning abilities of FMs) has transformative potential for the development of trustworthy AI software systems. We argue that this integration has the potential to enhance both the trustworthiness and efficiency of software engineering practices while fostering the development of intelligent FM tools capable of addressing complex, real-world challenges. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2412.06512v1-abstract-full').style.display = 'none'; document.getElementById('2412.06512v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 9 December, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">24 pages, 4 figures</span> </p> </li> </ol> <nav class="pagination is-small is-centered breathe-horizontal" role="navigation" 