<span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div>
<p class="title is-5 mathjax">
Rethinking Normalization Strategies and Convolutional Kernels for Multimodal Image Fusion
</p>
<p class="authors">
<span class="search-hit">Authors:</span>
<a href="/search/cs?searchtype=author&amp;query=He%2C+D">Dan He</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+G">Guofen Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+W">Weisheng Li</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yucheng Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+W">Wenbo Li</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+L">Lijian Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Huang%2C+Y">Yuping Huang</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+F">Feiyan Li</a>
</p>
<p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.10036v1-abstract-short" style="display: inline;"> Multimodal image fusion (MMIF) aims to integrate information from different modalities to obtain a comprehensive image, aiding downstream tasks. However, existing methods tend to prioritize natural image fusion and focus on information complementary and network training strategies. 
They ignore the essential distinction between natural and medical image fusion and the influence of underlying compon&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10036v1-abstract-full').style.display = 'inline'; document.getElementById('2411.10036v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.10036v1-abstract-full" style="display: none;"> Multimodal image fusion (MMIF) aims to integrate information from different modalities to obtain a comprehensive image, aiding downstream tasks. However, existing methods tend to prioritize natural image fusion and focus on information complementary and network training strategies. They ignore the essential distinction between natural and medical image fusion and the influence of underlying components. This paper dissects the significant differences between the two tasks regarding fusion goals, statistical properties, and data distribution. Based on this, we rethink the suitability of the normalization strategy and convolutional kernels for end-to-end MMIF. Specifically, this paper proposes a mixture of instance normalization and group normalization to preserve sample independence and reinforce intrinsic feature correlation. This strategy promotes the potential of enriching feature maps, thus boosting fusion performance. To this end, we further introduce the large kernel convolution, effectively expanding receptive fields and enhancing the preservation of image detail. Moreover, the proposed multipath adaptive fusion module recalibrates the decoder input with features of various scales and receptive fields, ensuring the transmission of crucial information. Extensive experiments demonstrate that our method exhibits state-of-the-art performance in multiple fusion tasks and significantly improves downstream applications. 
The code is available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10036v1-abstract-full').style.display = 'none'; document.getElementById('2411.10036v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 15 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.02715</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> CIT: Rethinking Class-incremental Semantic Segmentation with a Class Independent Transformation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ge%2C+J">Jinchao Ge</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+B">Bowen Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+A">Akide Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Phan%2C+M+H">Minh Hieu Phan</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+Q">Qi Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yangyang Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+Y">Yang Zhao</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.02715v1-abstract-short" style="display: inline;"> Class-incremental semantic segmentation (CSS) requires that a model learn to segment new classes without forgetting how to segment previous ones: this is typically achieved by 
distilling the current knowledge and incorporating the latest data. However, bypassing iterative distillation by directly transferring outputs of initial classes to the current learning task is not supported in existing clas&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02715v1-abstract-full').style.display = 'inline'; document.getElementById('2411.02715v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.02715v1-abstract-full" style="display: none;"> Class-incremental semantic segmentation (CSS) requires that a model learn to segment new classes without forgetting how to segment previous ones: this is typically achieved by distilling the current knowledge and incorporating the latest data. However, bypassing iterative distillation by directly transferring outputs of initial classes to the current learning task is not supported in existing class-specific CSS methods. Via Softmax, they enforce dependency between classes and adjust the output distribution at each learning step, resulting in a large probability distribution gap between initial and current tasks. We introduce a simple, yet effective Class Independent Transformation (CIT) that converts the outputs of existing semantic segmentation models into class-independent forms with negligible cost or performance loss. By utilizing class-independent predictions facilitated by CIT, we establish an accumulative distillation framework, ensuring equitable incorporation of all class information. We conduct extensive experiments on various segmentation architectures, including DeepLabV3, Mask2Former, and SegViTv2. Results from these experiments show minimal task forgetting across different datasets, with less than 5% for ADE20K in the most challenging 11 task configurations and less than 1% across all configurations for the PASCAL VOC 2012 dataset. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02715v1-abstract-full').style.display = 'none'; document.getElementById('2411.02715v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">11 pages, 5 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.02183</a> <span>&nbsp;[<a href="">pdf</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Science and Game Theory">cs.GT</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Physics and Society">physics.soc-ph</span> </div> </div> <p class="title is-5 mathjax"> Vehicles, Pedestrians, and E-bikes: a Three-party Game at Right-turn-on-red Crossroads Revealing the Dual and Irrational Role of E-bikes that Risks Traffic Safety </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+G">Gangcheng Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yeshuo Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+K">Keyi Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yuxuan Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+D">Donghang Li</a>, <a href="/search/cs?searchtype=author&amp;query=Xu%2C+L">Liyan Xu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis 
has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.02183v1-abstract-short" style="display: inline;"> The widespread use of e-bikes has facilitated short-distance travel yet led to confusion and safety problems in road traffic. This study focuses on the dual characteristics of e-bikes in traffic conflicts: they resemble pedestrians when interacting with motor vehicles and behave like motor vehicles when in conflict with pedestrians, which raises the right of way concerns when potential conflicts a&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02183v1-abstract-full').style.display = 'inline'; document.getElementById('2411.02183v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.02183v1-abstract-full" style="display: none;"> The widespread use of e-bikes has facilitated short-distance travel yet led to confusion and safety problems in road traffic. This study focuses on the dual characteristics of e-bikes in traffic conflicts: they resemble pedestrians when interacting with motor vehicles and behave like motor vehicles when in conflict with pedestrians, which raises the right of way concerns when potential conflicts are at stake. Using the Quantal Response Equilibrium model, this research analyzes the behavioral choice differences of three groups of road users (vehicle-pedestrian, vehicle-e-bike, e-bike-pedestrian) at right-turn-on-red crossroads in right-turning lines and straight-going lines conflict scenarios. The results show that the behavior of e-bikes is more similar to that of motor vehicles than pedestrians overall, and their interactions with either pedestrians or motor vehicles do not establish a reasonable order, increasing the likelihood of confusion and conflict. 
In contrast, a mutual understanding has developed between motor vehicles and pedestrians, where motor vehicles tend to yield, and pedestrians tend to cross. By clarifying the game theoretical model and introducing the rationality parameter, this study precisely locates the role of e-bikes among road users, which provides a reliable theoretical basis for optimizing traffic regulations. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02183v1-abstract-full').style.display = 'none'; document.getElementById('2411.02183v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">12 pages, 4 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.01844</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Human-Computer Interaction">cs.HC</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Social and Information Networks">cs.SI</span> </div> </div> <p class="title is-5 mathjax"> DeMod: A Holistic Tool with Explainable Detection and Personalized Modification for Toxicity Censorship </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Li%2C+Y">Yaqiong Li</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+P">Peng Zhang</a>, <a 
href="/search/cs?searchtype=author&amp;query=Gu%2C+H">Hansu Gu</a>, <a href="/search/cs?searchtype=author&amp;query=Lu%2C+T">Tun Lu</a>, <a href="/search/cs?searchtype=author&amp;query=Qiao%2C+S">Siyuan Qiao</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yubo Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Shao%2C+Y">Yiyang Shao</a>, <a href="/search/cs?searchtype=author&amp;query=Gu%2C+N">Ning Gu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.01844v1-abstract-short" style="display: inline;"> Although there have been automated approaches and tools supporting toxicity censorship for social posts, most of them focus on detection. Toxicity censorship is a complex process, wherein detection is just an initial task and a user can have further needs such as rationale understanding and content modification. For this problem, we conduct a needfinding study to investigate people&#39;s diverse needs&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.01844v1-abstract-full').style.display = 'inline'; document.getElementById('2411.01844v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.01844v1-abstract-full" style="display: none;"> Although there have been automated approaches and tools supporting toxicity censorship for social posts, most of them focus on detection. Toxicity censorship is a complex process, wherein detection is just an initial task and a user can have further needs such as rationale understanding and content modification. For this problem, we conduct a needfinding study to investigate people&#39;s diverse needs in toxicity censorship and then build a ChatGPT-based censorship tool named DeMod accordingly. 
DeMod is equipped with the features of explainable Detection and personalized Modification, providing fine-grained detection results, detailed explanations, and personalized modification suggestions. We also implemented the tool and recruited 35 Weibo users for evaluation. The results suggest DeMod&#39;s multiple strengths like the richness of functionality, the accuracy of censorship, and ease of use. Based on the findings, we further propose several insights into the design of content censorship systems. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.01844v1-abstract-full').style.display = 'none'; document.getElementById('2411.01844v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.15997</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> MultiRC: Joint Learning for Time Series Anomaly Prediction and Detection with Multi-scale Reconstructive Contrast </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Hu%2C+S">Shiyan Hu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+K">Kai Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Qiu%2C+X">Xiangfei Qiu</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yang Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Hu%2C+J">Jilin Hu</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+B">Bin Yang</a>, <a 
href="/search/cs?searchtype=author&amp;query=Guo%2C+C">Chenjuan Guo</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.15997v1-abstract-short" style="display: inline;"> Many methods have been proposed for unsupervised time series anomaly detection. Despite some progress, research on predicting future anomalies is still relatively scarce. Predicting anomalies is particularly challenging due to the diverse reaction time and the lack of labeled data. To address these challenges, we propose MultiRC to integrate reconstructive and contrastive learning for joint learni&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.15997v1-abstract-full').style.display = 'inline'; document.getElementById('2410.15997v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.15997v1-abstract-full" style="display: none;"> Many methods have been proposed for unsupervised time series anomaly detection. Despite some progress, research on predicting future anomalies is still relatively scarce. Predicting anomalies is particularly challenging due to the diverse reaction time and the lack of labeled data. To address these challenges, we propose MultiRC to integrate reconstructive and contrastive learning for joint learning of anomaly prediction and detection, with multi-scale structure and adaptive dominant period mask to deal with the diverse reaction time. MultiRC also generates negative samples to provide essential training momentum for the anomaly prediction tasks and prevent model degradation. We evaluate seven benchmark datasets from different fields. For both anomaly prediction and detection tasks, MultiRC outperforms existing state-of-the-art methods. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.15997v1-abstract-full').style.display = 'none'; document.getElementById('2410.15997v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.11845</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">ps</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Distributed, Parallel, and Cluster Computing">cs.DC</span> </div> </div> <p class="title is-5 mathjax"> A Review on Edge Large Language Models: Design, Execution, and Applications </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zheng%2C+Y">Yue Zheng</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+Y">Yuhao Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Qian%2C+B">Bin Qian</a>, <a href="/search/cs?searchtype=author&amp;query=Shi%2C+X">Xiufang Shi</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yuanchao Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+J">Jiming Chen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.11845v1-abstract-short" style="display: inline;"> Large language models (LLMs) have revolutionized natural language processing with their exceptional capabilities. 
However, deploying LLMs on resource-constrained edge devices presents significant challenges due to computational limitations, memory constraints, and edge hardware heterogeneity. This survey summarizes recent developments in edge LLMs across their lifecycle, examining resource-efficie&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.11845v1-abstract-full').style.display = 'inline'; document.getElementById('2410.11845v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.11845v1-abstract-full" style="display: none;"> Large language models (LLMs) have revolutionized natural language processing with their exceptional capabilities. However, deploying LLMs on resource-constrained edge devices presents significant challenges due to computational limitations, memory constraints, and edge hardware heterogeneity. This survey summarizes recent developments in edge LLMs across their lifecycle, examining resource-efficient designs from pre-deployment techniques to runtime optimizations. Additionally, it explores on-device LLM applications in personal, enterprise, and industrial scenarios. By synthesizing advancements and identifying future directions, this survey aims to provide a comprehensive understanding of state-of-the-art methods for deploying LLMs on edge devices, bridging the gap between their immense potential and edge computing limitations. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.11845v1-abstract-full').style.display = 'none'; document.getElementById('2410.11845v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 29 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.11802</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Li%2C+Z">Zhe Li</a>, <a href="/search/cs?searchtype=author&amp;query=Qiu%2C+X">Xiangfei Qiu</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+P">Peng Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yihang Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Cheng%2C+H">Hanyin Cheng</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yang Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Hu%2C+J">Jilin Hu</a>, <a href="/search/cs?searchtype=author&amp;query=Guo%2C+C">Chenjuan Guo</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+A">Aoying Zhou</a>, <a href="/search/cs?searchtype=author&amp;query=Wen%2C+Q">Qingsong Wen</a>, <a href="/search/cs?searchtype=author&amp;query=Jensen%2C+C+S">Christian S. Jensen</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+B">Bin Yang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.11802v3-abstract-short" style="display: inline;"> Time Series Forecasting (TSF) is key functionality in numerous fields, including in finance, weather services, and energy management. While TSF methods are emerging these days, many of them require domain-specific data collection and model training and struggle with poor generalization performance on new domains. 
Foundation models aim to overcome this limitation. Pre-trained on large-scale languag&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.11802v3-abstract-full').style.display = 'inline'; document.getElementById('2410.11802v3-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.11802v3-abstract-full" style="display: none;"> Time Series Forecasting (TSF) is key functionality in numerous fields, including in finance, weather services, and energy management. While TSF methods are emerging these days, many of them require domain-specific data collection and model training and struggle with poor generalization performance on new domains. Foundation models aim to overcome this limitation. Pre-trained on large-scale language or time series data, they exhibit promising inferencing capabilities in new or unseen data. This has spurred a surge in new TSF foundation models. We propose a new benchmark, FoundTS, to enable thorough and fair evaluation and comparison of such models. FoundTS covers a variety of TSF foundation models, including those based on large language models and those pretrained on time series. Next, FoundTS supports different forecasting strategies, including zero-shot, few-shot, and full-shot, thereby facilitating more thorough evaluations. Finally, FoundTS offers a pipeline that standardizes evaluation processes such as dataset splitting, loading, normalization, and few-shot sampling, thereby facilitating fair evaluations. Building on this, we report on an extensive evaluation of TSF foundation models on a broad range of datasets from diverse domains and with different statistical characteristics. Specifically, we identify pros and cons and inherent limitations of existing foundation models, and we identify directions for future model design. 
We make our code and datasets available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.11802v3-abstract-full').style.display = 'none'; document.getElementById('2410.11802v3-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 1 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 15 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.10168</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Li%2C+Z">Zhenhang Li</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yan Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Zeng%2C+W">Weichao Zeng</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+D">Dongbao Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+Y">Yu Zhou</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.10168v1-abstract-short" style="display: inline;"> Diffusion models, known for their impressive image generation abilities, have played a pivotal role in the rise of visual text generation. 
Nevertheless, existing visual text generation methods often focus on generating entire images with text prompts, leading to imprecise control and limited practicality. A more promising direction is visual text blending, which focuses on seamlessly merging texts&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.10168v1-abstract-full').style.display = 'inline'; document.getElementById('2410.10168v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.10168v1-abstract-full" style="display: none;"> Diffusion models, known for their impressive image generation abilities, have played a pivotal role in the rise of visual text generation. Nevertheless, existing visual text generation methods often focus on generating entire images with text prompts, leading to imprecise control and limited practicality. A more promising direction is visual text blending, which focuses on seamlessly merging texts onto text-free backgrounds. However, existing visual text blending methods often struggle to generate high-fidelity and diverse images due to a shortage of backgrounds for synthesis and limited generalization capabilities. To overcome these challenges, we propose a new visual text blending paradigm including both creating backgrounds and rendering texts. Specifically, a background generator is developed to produce high-fidelity and text-free natural images. Moreover, a text renderer named GlyphOnly is designed for achieving visually plausible text-background integration. GlyphOnly, built on a Stable Diffusion framework, utilizes glyphs and backgrounds as conditions for accurate rendering and consistency control, as well as equipped with an adaptive text block exploration strategy for small-scale text rendering. 
We also explore several downstream applications based on our method, including scene text dataset synthesis for boosting scene text detectors, as well as text image customization and editing. Code and model will be available at \url{}. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.10168v1-abstract-full').style.display = 'none'; document.getElementById('2410.10168v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 14 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted to ECAI2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.10133</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zeng%2C+W">Weichao Zeng</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yan Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+Z">Zhenhang Li</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+D">Dongbao Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+Y">Yu Zhou</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.10133v1-abstract-short" 
style="display: inline;"> Centred on content modification and style preservation, Scene Text Editing (STE) remains a challenging task despite considerable progress in text-to-image synthesis and text-driven image manipulation recently. GAN-based STE methods generally encounter a common issue of model generalization, while Diffusion-based STE methods suffer from undesired style deviations. To address these problems, we prop&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.10133v1-abstract-full').style.display = 'inline'; document.getElementById('2410.10133v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.10133v1-abstract-full" style="display: none;"> Centred on content modification and style preservation, Scene Text Editing (STE) remains a challenging task despite considerable progress in text-to-image synthesis and text-driven image manipulation recently. GAN-based STE methods generally encounter a common issue of model generalization, while Diffusion-based STE methods suffer from undesired style deviations. To address these problems, we propose TextCtrl, a diffusion-based method that edits text with prior guidance control. Our method consists of two key components: (i) By constructing fine-grained text style disentanglement and robust text glyph structure representation, TextCtrl explicitly incorporates Style-Structure guidance into model design and network training, significantly improving text style consistency and rendering accuracy. (ii) To further leverage the style prior, a Glyph-adaptive Mutual Self-attention mechanism is proposed which deconstructs the implicit fine-grained features of the source image to enhance style consistency and vision quality during inference. 
Furthermore, to fill the vacancy of the real-world STE evaluation benchmark, we create the first real-world image-pair dataset termed ScenePair for fair comparisons. Experiments demonstrate the effectiveness of TextCtrl compared with previous methods concerning both style fidelity and text accuracy. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.10133v1-abstract-full').style.display = 'none'; document.getElementById('2410.10133v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.05243</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Gou%2C+B">Boyu Gou</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+R">Ruohan Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Zheng%2C+B">Boyuan Zheng</a>, <a href="/search/cs?searchtype=author&amp;query=Xie%2C+Y">Yanan Xie</a>, <a href="/search/cs?searchtype=author&amp;query=Chang%2C+C">Cheng Chang</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yiheng Shu</a>, 
<a href="/search/cs?searchtype=author&amp;query=Sun%2C+H">Huan Sun</a>, <a href="/search/cs?searchtype=author&amp;query=Su%2C+Y">Yu Su</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.05243v1-abstract-short" style="display: inline;"> Multimodal large language models (MLLMs) are transforming the capabilities of graphical user interface (GUI) agents, facilitating their transition from controlled simulations to complex, real-world applications across various platforms. However, the effectiveness of these agents hinges on the robustness of their grounding capability. Current GUI agents predominantly utilize text-based representati&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.05243v1-abstract-full').style.display = 'inline'; document.getElementById('2410.05243v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.05243v1-abstract-full" style="display: none;"> Multimodal large language models (MLLMs) are transforming the capabilities of graphical user interface (GUI) agents, facilitating their transition from controlled simulations to complex, real-world applications across various platforms. However, the effectiveness of these agents hinges on the robustness of their grounding capability. Current GUI agents predominantly utilize text-based representations such as HTML or accessibility trees, which, despite their utility, often introduce noise, incompleteness, and increased computational overhead. In this paper, we advocate a human-like embodiment for GUI agents that perceive the environment entirely visually and directly take pixel-level operations on the GUI. 
The key is visual grounding models that can accurately map diverse referring expressions of GUI elements to their coordinates on the GUI across different platforms. We show that a simple recipe, which includes web-based synthetic data and slight adaptation of the LLaVA architecture, is surprisingly effective for training such visual grounding models. We collect the largest dataset for GUI visual grounding so far, containing 10M GUI elements and their referring expressions over 1.3M screenshots, and use it to train UGround, a strong universal visual grounding model for GUI agents. Empirical results on six benchmarks spanning three categories (grounding, offline agent, and online agent) show that 1) UGround substantially outperforms existing visual grounding models for GUI agents, by up to 20% absolute, and 2) agents with UGround outperform state-of-the-art agents, despite the fact that existing agents use additional text-based input while ours only uses visual perception. These results provide strong support for the feasibility and promises of GUI agents that navigate the digital world as humans do. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.05243v1-abstract-full').style.display = 'none'; document.getElementById('2410.05243v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 7 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.17618</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Robotics">cs.RO</span> </div> </div> <p class="title is-5 mathjax"> Learning Occlusion-aware Decision-making from Agent Interaction via Active Perception </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Jia%2C+J">Jie Jia</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yiming Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Gan%2C+Z">Zhongxue Gan</a>, <a href="/search/cs?searchtype=author&amp;query=Ding%2C+W">Wenchao Ding</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.17618v2-abstract-short" style="display: inline;"> Occlusion-aware decision-making is essential in autonomous driving due to the high uncertainty of various occlusions. Recent occlusion-aware decision-making methods encounter issues such as high computational complexity, scenario scalability challenges, or reliance on limited expert data. Benefiting from automatically generating data by exploration randomization, we uncover that reinforcement lear&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.17618v2-abstract-full').style.display = 'inline'; document.getElementById('2409.17618v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.17618v2-abstract-full" style="display: none;"> Occlusion-aware decision-making is essential in autonomous driving due to the high uncertainty of various occlusions. 
Recent occlusion-aware decision-making methods encounter issues such as high computational complexity, scenario scalability challenges, or reliance on limited expert data. Benefiting from automatically generating data by exploration randomization, we uncover that reinforcement learning (RL) may show promise in occlusion-aware decision-making. However, previous occlusion-aware RL faces challenges in expanding to various dynamic and static occlusion scenarios, low learning efficiency, and lack of predictive ability. To address these issues, we introduce Pad-AI, a self-reinforcing framework to learn occlusion-aware decision-making through active perception. Pad-AI utilizes vectorized representation to represent occluded environments efficiently and learns over the semantic motion primitives to focus on high-level active perception exploration. Furthermore, Pad-AI integrates prediction and RL within a unified framework to provide risk-aware learning and security guarantees. Our framework was tested in challenging scenarios under both dynamic and static occlusions and demonstrated efficient and general perception-aware exploration performance compared to other strong baselines in closed-loop evaluations. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.17618v2-abstract-full').style.display = 'none'; document.getElementById('2409.17618v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 26 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 26 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.14485</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yan Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+P">Peitian Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+Z">Zheng Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Qin%2C+M">Minghao Qin</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+J">Junjie Zhou</a>, <a href="/search/cs?searchtype=author&amp;query=Huang%2C+T">Tiejun Huang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+B">Bo Zhao</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.14485v3-abstract-short" style="display: inline;"> Although current Multi-modal Large Language Models (MLLMs) demonstrate promising results in video understanding, processing extremely long videos remains an ongoing challenge. Typically, MLLMs struggle with handling thousands of visual tokens that exceed the maximum context length, and they suffer from the information decay due to token aggregation. 
Another challenge is the high computational cost&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.14485v3-abstract-full').style.display = 'inline'; document.getElementById('2409.14485v3-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.14485v3-abstract-full" style="display: none;"> Although current Multi-modal Large Language Models (MLLMs) demonstrate promising results in video understanding, processing extremely long videos remains an ongoing challenge. Typically, MLLMs struggle with handling thousands of visual tokens that exceed the maximum context length, and they suffer from the information decay due to token aggregation. Another challenge is the high computational cost stemming from the large number of video tokens. To tackle these issues, we propose Video-XL, an extra-long vision language model designed for efficient hour-scale video understanding. Specifically, we argue that LLMs can be adapted as effective visual condensers and propose Visual Context Latent Summarization which condenses visual contexts into highly compact forms. Extensive experiments demonstrate that our model achieves promising results on popular long video understanding benchmarks. For example, Video-XL outperforms the current state-of-the-art method on VNBench by nearly 10% in accuracy. Moreover, Video-XL presents an impressive balance between efficiency and effectiveness, processing 2048 frames on a single 80GB GPU while achieving nearly 95% accuracy in the Needle-in-a-Haystack evaluation. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.14485v3-abstract-full').style.display = 'none'; document.getElementById('2409.14485v3-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 18 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 22 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.08665</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Robotics">cs.RO</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Systems and Control">eess.SY</span> </div> </div> <p class="title is-5 mathjax"> Agile Decision-Making and Safety-Critical Motion Planning for Emergency Autonomous Vehicles </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yiming Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+J">Jingyuan Zhou</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+F">Fu Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.08665v3-abstract-short" style="display: inline;"> Efficiency is critical for autonomous vehicles (AVs), especially for emergency AVs. However, most existing methods focus on regular vehicles, overlooking the distinct strategies required by emergency vehicles to address the challenge of maximizing efficiency while ensuring safety. 
In this paper, we propose an Integrated Agile Decision-Making with Active and Safety-Critical Motion Planning System (&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.08665v3-abstract-full').style.display = 'inline'; document.getElementById('2409.08665v3-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.08665v3-abstract-full" style="display: none;"> Efficiency is critical for autonomous vehicles (AVs), especially for emergency AVs. However, most existing methods focus on regular vehicles, overlooking the distinct strategies required by emergency vehicles to address the challenge of maximizing efficiency while ensuring safety. In this paper, we propose an Integrated Agile Decision-Making with Active and Safety-Critical Motion Planning System (IDEAM). IDEAM focuses on enabling emergency AVs, such as ambulances, to actively attain efficiency in dense traffic scenarios with safety in mind. Firstly, the speed-centric decision-making algorithm named the long short-term spatio-temporal graph-centric decision-making (LSGM) is given. LSGM comprises conditional depth-first search (C-DFS) for multiple paths generation as well as methods for speed gains and risk evaluation for path selection, which presents a robust algorithm for high efficiency and safety consideration. Secondly, with an output path from LSGM, the motion planner reconsiders environmental conditions to decide constraints states for the final planning stage, among which the lane-probing state is designed for actively attaining spatial and speed advantage. 
Thirdly, under the Frenet-based model predictive control (MPC) framework with final constraints state and selected path, the safety-critical motion planner employs decoupled discrete control barrier functions (DCBFs) and linearized discrete-time high-order control barrier functions (DHOCBFs) to model the constraints associated with different driving behaviors, making the optimization problem convex. Finally, we extensively validate our system using scenarios from a randomly synthesized dataset, demonstrating its capability to achieve speed benefits and assure safety simultaneously. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.08665v3-abstract-full').style.display = 'none'; document.getElementById('2409.08665v3-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 22 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 13 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.06277</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yao Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Hu%2C+W">Wenyang Hu</a>, <a href="/search/cs?searchtype=author&amp;query=Ng%2C+S">See-Kiong Ng</a>, <a href="/search/cs?searchtype=author&amp;query=Low%2C+B+K+H">Bryan Kian Hsiang Low</a>, <a href="/search/cs?searchtype=author&amp;query=Yu%2C+F+R">Fei Richard Yu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.06277v2-abstract-short" style="display: inline;"> Large Language Models (LLMs) have become indispensable in numerous real-world applications. Unfortunately, fine-tuning these models at scale, especially in federated settings where data privacy and communication efficiency are critical, presents significant challenges. 
Existing methods often resort to parameter-efficient fine-tuning (PEFT) to mitigate communication overhead, but this typically com&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.06277v2-abstract-full').style.display = 'inline'; document.getElementById('2409.06277v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.06277v2-abstract-full" style="display: none;"> Large Language Models (LLMs) have become indispensable in numerous real-world applications. Unfortunately, fine-tuning these models at scale, especially in federated settings where data privacy and communication efficiency are critical, presents significant challenges. Existing methods often resort to parameter-efficient fine-tuning (PEFT) to mitigate communication overhead, but this typically comes at the cost of model accuracy. To address these limitations, we propose federated full-parameter tuning at scale for LLMs (Ferret), the first first-order method with shared randomness to enable scalable full-parameter tuning of LLMs across decentralized data sources while maintaining competitive model accuracy. Ferret accomplishes this through three aspects: (1) it employs widely applied first-order methods for efficient local updates; (2) it projects these updates into a low-dimensional space to considerably reduce communication overhead; and (3) it reconstructs local updates from this low-dimensional space with shared randomness to facilitate effective full-parameter global aggregation, ensuring fast convergence and competitive final performance. Our rigorous theoretical analyses and insights along with extensive experiments, show that Ferret significantly enhances the scalability of existing federated full-parameter tuning approaches by achieving high computational efficiency, reduced communication overhead, and fast convergence, all while maintaining competitive model accuracy. 
Our implementation is available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.06277v2-abstract-full').style.display = 'none'; document.getElementById('2409.06277v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 10 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.10774</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Flexora: Flexible Low Rank Adaptation for Large Language Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Wei%2C+C">Chenxing Wei</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yao Shu</a>, <a href="/search/cs?searchtype=author&amp;query=He%2C+Y+T">Ying Tiffany He</a>, <a href="/search/cs?searchtype=author&amp;query=Yu%2C+F+R">Fei Richard Yu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.10774v2-abstract-short" style="display: inline;"> Large Language Models (LLMs) are driving advancements in artificial intelligence by increasing the scale of model parameters, which has significantly enhanced generalization ability and unlocked new capabilities in practice. 
However, their performance in specific downstream tasks is usually hindered by their knowledge boundaries on these tasks. Thus, fine-tuning techniques, especially the widely u&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.10774v2-abstract-full').style.display = 'inline'; document.getElementById('2408.10774v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.10774v2-abstract-full" style="display: none;"> Large Language Models (LLMs) are driving advancements in artificial intelligence by increasing the scale of model parameters, which has significantly enhanced generalization ability and unlocked new capabilities in practice. However, their performance in specific downstream tasks is usually hindered by their knowledge boundaries on these tasks. Thus, fine-tuning techniques, especially the widely used Low-Rank Adaptation (LoRA) method, have been introduced to expand the boundaries on these tasks, whereas LoRA would underperform on certain tasks owing to its potential overfitting on these tasks. To overcome this overfitting and improve the performance of LoRA, we propose the flexible low rank adaptation (Flexora) method to automatically and flexibly select the most important layers needing to be fine-tuned to achieve the best performance on different downstream tasks. Specifically, Flexora firstly frames this layer selection problem as a well-defined hyperparameter optimization (HPO) problem, then addresses it using the unrolled differentiation (UD) method, and finally selects the most useful layers based on the optimized hyperparameters. Our extensive experiments on many pretrained models and natural language tasks show that Flexora is able to consistently improve over the existing baselines, indicating the effectiveness of our Flexora in practice. 
We additionally provide insightful theoretical results and many ablation studies to deliver a comprehensive understanding of our Flexora. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.10774v2-abstract-full').style.display = 'none'; document.getElementById('2408.10774v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 20 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">29 pages, 13 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.08538</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Information Retrieval">cs.IR</span> </div> </div> <p class="title is-5 mathjax"> Don&#39;t Click the Bait: Title Debiasing News Recommendation via Cross-Field Contrastive Learning </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yijie Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+X">Xiaokun Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Wu%2C+Y">Youlin Wu</a>, <a href="/search/cs?searchtype=author&amp;query=Xu%2C+B">Bo Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+L">Liang Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Lin%2C+H">Hongfei Lin</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span 
class="abstract-short has-text-grey-dark mathjax" id="2408.08538v1-abstract-short" style="display: inline;"> News recommendation emerges as a primary means for users to access content of interest from the vast amount of news. The title clickbait extensively exists in news domain and increases the difficulty for news recommendation to offer satisfactory services for users. Fortunately, we find that news abstract, as a critical field of news, aligns cohesively with the news authenticity. To this end, we pr&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.08538v1-abstract-full').style.display = 'inline'; document.getElementById('2408.08538v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.08538v1-abstract-full" style="display: none;"> News recommendation emerges as a primary means for users to access content of interest from the vast amount of news. The title clickbait extensively exists in news domain and increases the difficulty for news recommendation to offer satisfactory services for users. Fortunately, we find that news abstract, as a critical field of news, aligns cohesively with the news authenticity. To this end, we propose a Title Debiasing News Recommendation with Cross-field Contrastive learning (TDNR-C2) to overcome the title bias by incorporating news abstract. Specifically, a multi-field knowledge extraction module is devised to extract multi-view knowledge about news from various fields. Afterwards, we present a cross-field contrastive learning module to conduct bias removal via contrasting learned knowledge from title and abstract fields. Experimental results on a real-world dataset demonstrate the superiority of the proposed TDNR-C2 over existing state-of-the-art methods. Further analysis also indicates the significance of news abstract for title debiasing. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.08538v1-abstract-full').style.display = 'none'; document.getElementById('2408.08538v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 16 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.12817</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Sound">cs.SD</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Audio and Speech Processing">eess.AS</span> </div> </div> <p class="title is-5 mathjax"> Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yuchun Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Hu%2C+B">Bo Hu</a>, <a href="/search/cs?searchtype=author&amp;query=He%2C+Y">Yifeng He</a>, <a href="/search/cs?searchtype=author&amp;query=Shi%2C+H">Hao Shi</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+L">Longbiao Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Dang%2C+J">Jianwu Dang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.12817v1-abstract-short" style="display: inline;"> Accurately finding the wrong words in the automatic speech recognition (ASR) hypothesis 
and recovering them well-founded is the goal of speech error correction. In this paper, we propose a non-autoregressive speech error correction method. A Confidence Module measures the uncertainty of each word of the N-best ASR hypotheses as the reference to find the wrong word position. Besides, the acoustic f&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.12817v1-abstract-full').style.display = 'inline'; document.getElementById('2407.12817v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.12817v1-abstract-full" style="display: none;"> Accurately finding the wrong words in the automatic speech recognition (ASR) hypothesis and recovering them well-founded is the goal of speech error correction. In this paper, we propose a non-autoregressive speech error correction method. A Confidence Module measures the uncertainty of each word of the N-best ASR hypotheses as the reference to find the wrong word position. Besides, the acoustic feature from the ASR encoder is also used to provide the correct pronunciation references. N-best candidates from ASR are aligned using the edit path, to confirm each other and recover some missing character errors. Furthermore, the cross-attention mechanism fuses the information between error correction references and the ASR hypothesis. The experimental results show that both the acoustic and confidence references help with error correction. The proposed system reduces the error rate by 21% compared with the ASR model. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.12817v1-abstract-full').style.display = 'none'; document.getElementById('2407.12817v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 29 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.11948</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Rethinking Transformer-based Multi-document Summarization: An Empirical Investigation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ma%2C+C">Congbo Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+W+E">Wei Emma Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Pitawela%2C+D">Dileepa Pitawela</a>, <a href="/search/cs?searchtype=author&amp;query=Zhuang%2C+H">Haojie Zhuang</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yanfeng Shu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.11948v1-abstract-short" style="display: inline;"> The utilization of Transformer-based models prospers the growth of multi-document summarization (MDS). 
Given the huge impact and widespread adoption of Transformer-based models in various natural language processing tasks, investigating their performance and behaviors in the context of MDS becomes crucial for advancing the field and enhancing the quality of summary. To thoroughly examine the behav&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.11948v1-abstract-full').style.display = 'inline'; document.getElementById('2407.11948v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.11948v1-abstract-full" style="display: none;"> The utilization of Transformer-based models prospers the growth of multi-document summarization (MDS). Given the huge impact and widespread adoption of Transformer-based models in various natural language processing tasks, investigating their performance and behaviors in the context of MDS becomes crucial for advancing the field and enhancing the quality of summary. To thoroughly examine the behaviours of Transformer-based MDS models, this paper presents five empirical studies on (1) measuring the impact of document boundary separators quantitatively; (2) exploring the effectiveness of different mainstream Transformer structures; (3) examining the sensitivity of the encoder and decoder; (4) discussing different training strategies; and (5) discovering the repetition in a summary generation. The experimental results on prevalent MDS datasets and eleven evaluation metrics show the influence of document boundary separators, the granularity of different level features and different model training strategies. The results also reveal that the decoder exhibits greater sensitivity to noises compared to the encoder. This underscores the important role played by the decoder, suggesting a potential direction for future research in MDS. 
Furthermore, the experimental results indicate that the repetition problem in the generated summaries has correlations with the high uncertainty scores. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.11948v1-abstract-full').style.display = 'none'; document.getElementById('2407.11948v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 16 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.04331</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Sound">cs.SD</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Audio and Speech Processing">eess.AS</span> </div> </div> <p class="title is-5 mathjax"> MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yangyang Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Xu%2C+H">Haiming Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+Z">Ziqin Zhou</a>, <a href="/search/cs?searchtype=author&amp;query=Hengel%2C+A+v+d">Anton van den Hengel</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+L">Lingqiao Liu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.04331v1-abstract-short" 
style="display: inline;"> Automatically generating symbolic music-music scores tailored to specific human needs-can be highly beneficial for musicians and enthusiasts. Recent studies have shown promising results using extensive datasets and advanced transformer architectures. However, these state-of-the-art models generally offer only basic control over aspects like tempo and style for the entire composition, lacking the a&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.04331v1-abstract-full').style.display = 'inline'; document.getElementById('2407.04331v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.04331v1-abstract-full" style="display: none;"> Automatically generating symbolic music-music scores tailored to specific human needs-can be highly beneficial for musicians and enthusiasts. Recent studies have shown promising results using extensive datasets and advanced transformer architectures. However, these state-of-the-art models generally offer only basic control over aspects like tempo and style for the entire composition, lacking the ability to manage finer details, such as control at the level of individual bars. While fine-tuning a pre-trained symbolic music generation model might seem like a straightforward method for achieving this finer control, our research indicates challenges in this approach. The model often fails to respond adequately to new, fine-grained bar-level control signals. To address this, we propose two innovative solutions. First, we introduce a pre-training task designed to link control signals directly with corresponding musical tokens, which helps in achieving a more effective initialization for subsequent fine-tuning. Second, we implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts. 
Together, these techniques significantly enhance our ability to control music generation at the bar level, showing a 13.06\% improvement over conventional methods. Our subjective evaluations also confirm that this enhanced control does not compromise the musical quality of the original pre-trained generative model. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.04331v1-abstract-full').style.display = 'none'; document.getElementById('2407.04331v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Demo is available at:</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.14473</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Data-Centric AI in the Age of Large Language Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Xu%2C+X">Xinyi Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Wu%2C+Z">Zhaoxuan Wu</a>, <a href="/search/cs?searchtype=author&amp;query=Qiao%2C+R">Rui Qiao</a>, <a href="/search/cs?searchtype=author&amp;query=Verma%2C+A">Arun Verma</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yao Shu</a>, <a 
href="/search/cs?searchtype=author&amp;query=Wang%2C+J">Jingtan Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Niu%2C+X">Xinyuan Niu</a>, <a href="/search/cs?searchtype=author&amp;query=He%2C+Z">Zhenfeng He</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+J">Jiangwei Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+Z">Zijian Zhou</a>, <a href="/search/cs?searchtype=author&amp;query=Lau%2C+G+K+R">Gregory Kang Ruey Lau</a>, <a href="/search/cs?searchtype=author&amp;query=Dao%2C+H">Hieu Dao</a>, <a href="/search/cs?searchtype=author&amp;query=Agussurja%2C+L">Lucas Agussurja</a>, <a href="/search/cs?searchtype=author&amp;query=Sim%2C+R+H+L">Rachael Hwee Ling Sim</a>, <a href="/search/cs?searchtype=author&amp;query=Lin%2C+X">Xiaoqiang Lin</a>, <a href="/search/cs?searchtype=author&amp;query=Hu%2C+W">Wenyang Hu</a>, <a href="/search/cs?searchtype=author&amp;query=Dai%2C+Z">Zhongxiang Dai</a>, <a href="/search/cs?searchtype=author&amp;query=Koh%2C+P+W">Pang Wei Koh</a>, <a href="/search/cs?searchtype=author&amp;query=Low%2C+B+K+H">Bryan Kian Hsiang Low</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.14473v1-abstract-short" style="display: inline;"> This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. 
We identify four specific&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.14473v1-abstract-full').style.display = 'inline'; document.getElementById('2406.14473v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.14473v1-abstract-full" style="display: none;"> This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization. In each scenario, we underscore the importance of data, highlight promising research directions, and articulate the potential impacts on the research community and, where applicable, the society as a whole. For instance, we advocate for a suite of data-centric benchmarks tailored to the scale and complexity of data for LLMs. These benchmarks can be used to develop new data curation methods and document research efforts and results, which can help promote openness and transparency in AI and LLM research. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.14473v1-abstract-full').style.display = 'none'; document.getElementById('2406.14473v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 20 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Preprint</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.07438</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> DeformTime: Capturing Variable Dependencies with Deformable Attention for Time Series Forecasting </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yuxuan Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Lampos%2C+V">Vasileios Lampos</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.07438v2-abstract-short" style="display: inline;"> In multivariate time series (MTS) forecasting, existing state-of-the-art deep learning approaches tend to focus on autoregressive formulations and overlook the information within exogenous indicators. To address this limitation, we present DeformTime, a neural network architecture that attempts to capture correlated temporal patterns from the input space, and hence, improve forecasting accuracy. 
I&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.07438v2-abstract-full').style.display = 'inline'; document.getElementById('2406.07438v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.07438v2-abstract-full" style="display: none;"> In multivariate time series (MTS) forecasting, existing state-of-the-art deep learning approaches tend to focus on autoregressive formulations and overlook the information within exogenous indicators. To address this limitation, we present DeformTime, a neural network architecture that attempts to capture correlated temporal patterns from the input space, and hence, improve forecasting accuracy. It deploys two core operations performed by deformable attention blocks (DABs): learning dependencies across variables from different time steps (variable DAB), and preserving temporal dependencies in data from previous time steps (temporal DAB). Input data transformation is explicitly designed to enhance learning from the deformed series of information while passing through a DAB. We conduct extensive experiments on 6 MTS data sets, using previously established benchmarks as well as challenging infectious disease modelling tasks with more exogenous variables. The results demonstrate that DeformTime improves accuracy against previous competitive methods across the vast majority of MTS forecasting tasks, reducing the mean absolute error by 10% on average. Notably, performance gains remain consistent across longer forecasting horizons. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.07438v2-abstract-full').style.display = 'none'; document.getElementById('2406.07438v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 18 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 11 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">The code is available at</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.04264</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+J">Junjie Zhou</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yan Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+B">Bo Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Wu%2C+B">Boya Wu</a>, <a href="/search/cs?searchtype=author&amp;query=Xiao%2C+S">Shitao Xiao</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+X">Xi Yang</a>, <a 
href="/search/cs?searchtype=author&amp;query=Xiong%2C+Y">Yongping Xiong</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+B">Bo Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Huang%2C+T">Tiejun Huang</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+Z">Zheng Liu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.04264v2-abstract-short" style="display: inline;"> The evaluation of Long Video Understanding (LVU) performance poses an important but challenging research problem. Despite previous efforts, the existing video understanding benchmarks are severely constrained by several issues, especially the insufficient lengths of videos, a lack of diversity in video types and evaluation tasks, and the inappropriateness for evaluating LVU performances. To addres&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.04264v2-abstract-full').style.display = 'inline'; document.getElementById('2406.04264v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.04264v2-abstract-full" style="display: none;"> The evaluation of Long Video Understanding (LVU) performance poses an important but challenging research problem. Despite previous efforts, the existing video understanding benchmarks are severely constrained by several issues, especially the insufficient lengths of videos, a lack of diversity in video types and evaluation tasks, and the inappropriateness for evaluating LVU performances. To address the above problems, we propose a new benchmark, called MLVU (Multi-task Long Video Understanding Benchmark), for the comprehensive and in-depth evaluation of LVU. 
MLVU presents the following critical values: 1) The substantial and flexible extension of video lengths, which enables the benchmark to evaluate LVU performance across a wide range of durations. 2) The inclusion of various video genres, e.g., movies, surveillance footage, egocentric videos, cartoons, game videos, etc., which reflects the models&#39; LVU performances in different scenarios. 3) The development of diversified evaluation tasks, which enables a comprehensive examination of MLLMs&#39; key abilities in long-video understanding. The empirical study with 20 latest MLLMs reveals significant room for improvement in today&#39;s technique, as all existing methods struggle with most of the evaluation tasks and exhibit severe performance degradation when handling longer videos. Additionally, it suggests that factors such as context length, image-understanding quality, and the choice of LLM backbone can play critical roles in future advancements. We anticipate that MLVU will advance the research of long video understanding by providing a comprehensive and in-depth analysis of MLLMs. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.04264v2-abstract-full').style.display = 'none'; document.getElementById('2406.04264v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 19 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 6 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.02309</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Effects of Exponential Gaussian Distribution on (Double Sampling) Randomized Smoothing </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Youwei Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Xiao%2C+X">Xi Xiao</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+D">Derui Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Cao%2C+Y">Yuxin Cao</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+S">Siji Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Xue%2C+J">Jason Xue</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+L">Linyi Li</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+B">Bo Li</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.02309v2-abstract-short" style="display: inline;"> Randomized Smoothing (RS) is currently a scalable certified defense method providing robustness certification against adversarial examples. Although significant progress has been achieved in providing defenses against $\ell_p$ adversaries, the interaction between the smoothing distribution and the robustness certification still remains vague. 
In this work, we comprehensively study the effect of tw&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.02309v2-abstract-full').style.display = 'inline'; document.getElementById('2406.02309v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.02309v2-abstract-full" style="display: none;"> Randomized Smoothing (RS) is currently a scalable certified defense method providing robustness certification against adversarial examples. Although significant progress has been achieved in providing defenses against $\ell_p$ adversaries, the interaction between the smoothing distribution and the robustness certification still remains vague. In this work, we comprehensively study the effect of two families of distributions, named Exponential Standard Gaussian (ESG) and Exponential General Gaussian (EGG) distributions, on Randomized Smoothing and Double Sampling Randomized Smoothing (DSRS). We derive an analytic formula for ESG&#39;s certified radius, which converges to the origin formula of RS as the dimension $d$ increases. Additionally, we prove that EGG can provide tighter constant factors than DSRS in providing $\Omega(\sqrt{d})$ lower bounds of $\ell_2$ certified radius, and thus further addresses the curse of dimensionality in RS. Our experiments on real-world datasets confirm our theoretical analysis of the ESG distributions, that they provide almost the same certification under different exponents $\eta$ for both RS and DSRS. In addition, EGG brings a significant improvement to the DSRS certification, but the mechanism can be different when the classifier properties are different. Compared to the primitive DSRS, the increase in certified accuracy provided by EGG is prominent, up to 6.4% on ImageNet. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.02309v2-abstract-full').style.display = 'none'; document.getElementById('2406.02309v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 4 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">ICML 2024 Poster</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.19131</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Distributed, Parallel, and Cluster Computing">cs.DC</span> </div> </div> <p class="title is-5 mathjax"> Learning Interpretable Scheduling Algorithms for Data Processing Clusters </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Hu%2C+Z">Zhibo Hu</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+C">Chen Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Helen"> Helen</a>, <a href="/search/cs?searchtype=author&amp;query=Paik"> Paik</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yanfeng Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhu%2C+L">Liming Zhu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.19131v1-abstract-short" style="display: inline;"> Workloads in data processing clusters are often represented in the 
form of DAG (Directed Acyclic Graph) jobs. Scheduling DAG jobs is challenging. Simple heuristic scheduling algorithms are often adopted in practice in production data centres. There is much room for scheduling performance optimisation for cost saving. Recently, reinforcement learning approaches (like decima) have been attempted to&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.19131v1-abstract-full').style.display = 'inline'; document.getElementById('2405.19131v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.19131v1-abstract-full" style="display: none;"> Workloads in data processing clusters are often represented in the form of DAG (Directed Acyclic Graph) jobs. Scheduling DAG jobs is challenging. Simple heuristic scheduling algorithms are often adopted in practice in production data centres. There is much room for scheduling performance optimisation for cost saving. Recently, reinforcement learning approaches (like decima) have been attempted to optimise DAG job scheduling and demonstrate clear performance gain in comparison to traditional algorithms. However, reinforcement learning (RL) approaches face their own problems in real-world deployment. In particular, their black-box decision making processes and generalizability in unseen workloads may add a non-trivial burden to the cluster administrators. Moreover, adapting RL models on unseen workloads often requires significant amount of training data, which leaves edge cases run in a sub-optimal mode. To fill the gap, we propose a new method to distill a simple scheduling policy based on observations of the behaviours of a complex deep learning model. The simple model not only provides interpretability of scheduling decisions, but also adaptive to edge cases easily through tuning. 
We show that our method achieves high fidelity to the decisions made by deep learning models and outperforms these models when additional heuristics are taken into account. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.19131v1-abstract-full').style.display = 'none'; document.getElementById('2405.19131v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 29 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">20 pages, 18 figures</span> </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">MSC Class:</span> 68M20 <span class="has-text-black-bis has-text-weight-semibold">ACM Class:</span> I.2.8; D.4.1 </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.17478</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">stat.ML</span> </div> </div> <p class="title is-5 mathjax"> ROSE: Register Assisted General Time Series Forecasting with Decomposed Frequency Learning </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yihang Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Qiu%2C+Y">Yuying Qiu</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+P">Peng Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+K">Kai Zhao</a>, <a 
href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yang Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Rao%2C+Z">Zhongwen Rao</a>, <a href="/search/cs?searchtype=author&amp;query=Pan%2C+L">Lujia Pan</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+B">Bin Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Guo%2C+C">Chenjuan Guo</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.17478v2-abstract-short" style="display: inline;"> With the increasing collection of time series data from various domains, there arises a strong demand for general time series forecasting models pre-trained on a large number of time-series datasets to support a variety of downstream prediction tasks. Enabling general time series forecasting faces two challenges: how to obtain unified representations from multi-domain time series data, and how to&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.17478v2-abstract-full').style.display = 'inline'; document.getElementById('2405.17478v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.17478v2-abstract-full" style="display: none;"> With the increasing collection of time series data from various domains, there arises a strong demand for general time series forecasting models pre-trained on a large number of time-series datasets to support a variety of downstream prediction tasks. Enabling general time series forecasting faces two challenges: how to obtain unified representations from multi-domain time series data, and how to capture domain-specific features from time series data across various domains for adaptive transfer in downstream tasks. 
To address these challenges, we propose a Register Assisted General Time Series Forecasting Model with Decomposed Frequency Learning (ROSE), a novel pre-trained model for time series forecasting. ROSE employs Decomposed Frequency Learning for the pre-training task, which decomposes coupled semantic and periodic information in time series with frequency-based masking and reconstruction to obtain unified representations across domains. We also equip ROSE with a Time Series Register, which learns to generate a register codebook to capture domain-specific representations during pre-training and enhances domain-adaptive transfer by selecting related register tokens on downstream tasks. After pre-training on large-scale time series data, ROSE achieves state-of-the-art forecasting performance on 8 real-world benchmarks. Remarkably, even in few-shot scenarios, it demonstrates competitive or superior performance compared to existing methods trained with full data. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.17478v2-abstract-full').style.display = 'none'; document.getElementById('2405.17478v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 9 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 24 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.16122</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">stat.ML</span> </div> </div> <p class="title is-5 mathjax"> Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Wu%2C+Z">Zhaoxuan Wu</a>, <a href="/search/cs?searchtype=author&amp;query=Lin%2C+X">Xiaoqiang Lin</a>, <a href="/search/cs?searchtype=author&amp;query=Dai%2C+Z">Zhongxiang Dai</a>, <a href="/search/cs?searchtype=author&amp;query=Hu%2C+W">Wenyang Hu</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yao Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Ng%2C+S">See-Kiong Ng</a>, <a href="/search/cs?searchtype=author&amp;query=Jaillet%2C+P">Patrick Jaillet</a>, <a href="/search/cs?searchtype=author&amp;query=Low%2C+B+K+H">Bryan Kian Hsiang Low</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.16122v2-abstract-short" style="display: inline;"> Large language models (LLMs) have shown impressive capabilities in real-world applications. The capability of in-context learning (ICL) allows us to adapt an LLM to downstream tasks by including input-label exemplars in the prompt without model fine-tuning. 
However, the quality of these exemplars in the prompt greatly impacts performance, highlighting the need for an effective automated exemplar s&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.16122v2-abstract-full').style.display = 'inline'; document.getElementById('2405.16122v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.16122v2-abstract-full" style="display: none;"> Large language models (LLMs) have shown impressive capabilities in real-world applications. The capability of in-context learning (ICL) allows us to adapt an LLM to downstream tasks by including input-label exemplars in the prompt without model fine-tuning. However, the quality of these exemplars in the prompt greatly impacts performance, highlighting the need for an effective automated exemplar selection method. Recent studies have explored retrieval-based approaches to select exemplars tailored to individual test queries, which can be undesirable due to extra test-time computation and an increased risk of data exposure. Moreover, existing methods fail to adequately account for the impact of exemplar ordering on the performance. On the other hand, the impact of the instruction, another essential component in the prompt given to the LLM, is often overlooked in existing exemplar selection methods. To address these challenges, we propose a novel method named EASE, which leverages the hidden embedding from a pre-trained language model to represent ordered sets of exemplars and uses a neural bandit algorithm to optimize the sets of exemplars while accounting for exemplar ordering. Our EASE can efficiently find an ordered set of exemplars that performs well for all test queries from a given task, thereby eliminating test-time computation. Importantly, EASE can be readily extended to jointly optimize both the exemplars and the instruction. 
Through extensive empirical evaluations (including novel tasks), we demonstrate the superiority of EASE over existing methods, and reveal practical insights about the impact of exemplar selection on ICL, which may be of independent interest. Our code is available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.16122v2-abstract-full').style.display = 'none'; document.getElementById('2405.16122v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 29 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 25 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">28 pages, 1 figure, 35 tables</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.15273</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shentu%2C+Q">Qichao Shentu</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+B">Beibu Li</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+K">Kai Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yang Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Rao%2C+Z">Zhongwen Rao</a>, <a 
href="/search/cs?searchtype=author&amp;query=Pan%2C+L">Lujia Pan</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+B">Bin Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Guo%2C+C">Chenjuan Guo</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.15273v3-abstract-short" style="display: inline;"> Time series anomaly detection plays a vital role in a wide range of applications. Existing methods require training one specific model for each dataset, which exhibits limited generalization capability across different target datasets, hindering anomaly detection performance in various scenarios with scarce training data. Aiming at this problem, we propose constructing a general time series anomal&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.15273v3-abstract-full').style.display = 'inline'; document.getElementById('2405.15273v3-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.15273v3-abstract-full" style="display: none;"> Time series anomaly detection plays a vital role in a wide range of applications. Existing methods require training one specific model for each dataset, which exhibits limited generalization capability across different target datasets, hindering anomaly detection performance in various scenarios with scarce training data. Aiming at this problem, we propose constructing a general time series anomaly detection model, which is pre-trained on extensive multi-domain datasets and can subsequently apply to a multitude of downstream scenarios. 
The significant divergence of time series data across different domains presents two primary challenges in building such a general model: (1) meeting the diverse requirements of appropriate information bottlenecks tailored to different datasets in one unified model, and (2) enabling distinguishment between multiple normal and abnormal patterns, both are crucial for effective anomaly detection in various target scenarios. To tackle these two challenges, we propose a General time series anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders (DADA), which enables flexible selection of bottlenecks based on different data and explicitly enhances clear differentiation between normal and abnormal series. We conduct extensive experiments on nine target datasets from different domains. After pre-training on multi-domain data, DADA, serving as a zero-shot anomaly detector for these datasets, still achieves competitive or even superior results compared to those models tailored to each specific dataset. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.15273v3-abstract-full').style.display = 'none'; document.getElementById('2405.15273v3-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 8 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 24 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.14831</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Guti%C3%A9rrez%2C+B+J">Bernal Jiménez Gutiérrez</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yiheng Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Gu%2C+Y">Yu Gu</a>, <a href="/search/cs?searchtype=author&amp;query=Yasunaga%2C+M">Michihiro Yasunaga</a>, <a href="/search/cs?searchtype=author&amp;query=Su%2C+Y">Yu Su</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.14831v1-abstract-short" style="display: inline;"> In order to thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and continually integrate new information while avoiding catastrophic forgetting. 
Despite the impressive accomplishments, large language models (LLMs), even with retrieval-augmented generation (RAG), still struggle to efficiently and effectively integra&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.14831v1-abstract-full').style.display = 'inline'; document.getElementById('2405.14831v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.14831v1-abstract-full" style="display: none;"> In order to thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and continually integrate new information while avoiding catastrophic forgetting. Despite the impressive accomplishments, large language models (LLMs), even with retrieval-augmented generation (RAG), still struggle to efficiently and effectively integrate a large amount of new experiences after pre-training. In this work, we introduce HippoRAG, a novel retrieval framework inspired by the hippocampal indexing theory of human long-term memory to enable deeper and more efficient knowledge integration over new experiences. HippoRAG synergistically orchestrates LLMs, knowledge graphs, and the Personalized PageRank algorithm to mimic the different roles of neocortex and hippocampus in human memory. We compare HippoRAG with existing RAG methods on multi-hop question answering and show that our method outperforms the state-of-the-art methods remarkably, by up to 20%. Single-step retrieval with HippoRAG achieves comparable or better performance than iterative retrieval like IRCoT while being 10-30 times cheaper and 6-13 times faster, and integrating HippoRAG into IRCoT brings further substantial gains. Finally, we show that our method can tackle new types of scenarios that are out of reach of existing methods. 
Code and data are available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.14831v1-abstract-full').style.display = 'none'; document.getElementById('2405.14831v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 23 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.05733</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">stat.ML</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Batched Stochastic Bandit for Nondegenerate Functions </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Liu%2C+Y">Yu Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yunlu Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+T">Tianyu Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.05733v2-abstract-short" style="display: inline;"> This paper studies batched bandit learning problems for nondegenerate functions. We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally. More specifically, we introduce an algorithm, called Geometric Narrowing (GN), whose regret bound is of order $\widetilde{\mathcal{O}} ( A_{+}^d \sqrt{T} )$. 
In addition, GN only needs&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.05733v2-abstract-full').style.display = 'inline'; document.getElementById('2405.05733v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.05733v2-abstract-full" style="display: none;"> This paper studies batched bandit learning problems for nondegenerate functions. We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally. More specifically, we introduce an algorithm, called Geometric Narrowing (GN), whose regret bound is of order $\widetilde{\mathcal{O}} ( A_{+}^d \sqrt{T} )$. In addition, GN only needs $\mathcal{O} (\log \log T)$ batches to achieve this regret. We also provide lower bound analysis for this problem. More specifically, we prove that over some (compact) doubling metric space of doubling dimension $d$: 1. For any policy $π$, there exists a problem instance on which $π$ admits a regret of order $Ω ( A_-^d \sqrt{T})$; 2. No policy can achieve a regret of order $ A_-^d \sqrt{T} $ over all problem instances, using less than $ Ω( \log \log T ) $ rounds of communications. Our lower bound analysis shows that the GN algorithm achieves near optimal regret with minimal number of batches. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.05733v2-abstract-full').style.display = 'none'; document.getElementById('2405.05733v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 29 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 9 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">34 pages, 14 colored figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.00244</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yong Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Shen%2C+L">Liquan Shen</a>, <a href="/search/cs?searchtype=author&amp;query=Hu%2C+X">Xiangyu Hu</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+M">Mengyao Li</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+Z">Zihao Zhou</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.00244v1-abstract-short" style="display: inline;"> As an important and practical way to obtain high dynamic range (HDR) video, HDR video reconstruction from sequences with alternating exposures is still less explored, mainly due to the lack of large-scale real-world datasets. Existing methods are mostly trained on synthetic datasets, which perform poorly in real scenes. 
In this work, to facilitate the development of real-world HDR video reconstruc&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.00244v1-abstract-full').style.display = 'inline'; document.getElementById('2405.00244v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.00244v1-abstract-full" style="display: none;"> As an important and practical way to obtain high dynamic range (HDR) video, HDR video reconstruction from sequences with alternating exposures is still less explored, mainly due to the lack of large-scale real-world datasets. Existing methods are mostly trained on synthetic datasets, which perform poorly in real scenes. In this work, to facilitate the development of real-world HDR video reconstruction, we present Real-HDRV, a large-scale real-world benchmark dataset for HDR video reconstruction, featuring various scenes, diverse motion patterns, and high-quality labels. Specifically, our dataset contains 500 LDRs-HDRs video pairs, comprising about 28,000 LDR frames and 4,000 HDR labels, covering daytime, nighttime, indoor, and outdoor scenes. To our best knowledge, our dataset is the largest real-world HDR video reconstruction dataset. Correspondingly, we propose an end-to-end network for HDR video reconstruction, where a novel two-stage strategy is designed to perform alignment sequentially. Specifically, the first stage performs global alignment with the adaptively estimated global offsets, reducing the difficulty of subsequent alignment. The second stage implicitly performs local alignment in a coarse-to-fine manner at the feature level using the adaptive separable convolution. Extensive experiments demonstrate that: (1) models trained on our dataset can achieve better performance on real scenes than those trained on synthetic datasets; (2) our method outperforms previous state-of-the-art methods. 
Our dataset is available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.00244v1-abstract-full').style.display = 'none'; document.getElementById('2405.00244v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 30 April, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">This paper has been accepted by CVPR 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2403.20198</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Information Theory">cs.IT</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Systems and Control">eess.SY</span> </div> </div> <p class="title is-5 mathjax"> Minimizing End-to-End Latency for Joint Source-Channel Coding Systems </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Chi%2C+K">Kaiyi Chi</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+Q">Qianqian Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yuanchao Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+Z">Zhaohui Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Shi%2C+Z">Zhiguo Shi</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2403.20198v1-abstract-short" style="display: inline;"> While existing studies have highlighted the advantages of deep learning 
(DL)-based joint source-channel coding (JSCC) schemes in enhancing transmission efficiency, they often overlook the crucial aspect of resource management during the deployment phase. In this paper, we propose an approach to minimize the transmission latency in an uplink JSCC-based system. We first analyze the correlation betwe&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.20198v1-abstract-full').style.display = 'inline'; document.getElementById('2403.20198v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2403.20198v1-abstract-full" style="display: none;"> While existing studies have highlighted the advantages of deep learning (DL)-based joint source-channel coding (JSCC) schemes in enhancing transmission efficiency, they often overlook the crucial aspect of resource management during the deployment phase. In this paper, we propose an approach to minimize the transmission latency in an uplink JSCC-based system. We first analyze the correlation between end-to-end latency and task performance, based on which the end-to-end delay model for each device is established. Then, we formulate a non-convex optimization problem aiming at minimizing the maximum end-to-end latency across all devices, which is proved to be NP-hard. We then transform the original problem into a more tractable one, from which we derive the closed form solution on the optimal compression ratio, truncation threshold selection policy, and resource allocation strategy. We further introduce a heuristic algorithm with low complexity, leveraging insights from the structure of the optimal solution. Simulation results demonstrate that both the proposed optimal algorithm and the heuristic algorithm significantly reduce end-to-end latency. 
Notably, the proposed heuristic algorithm achieves nearly the same performance to the optimal solution but with considerably lower computational complexity. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.20198v1-abstract-full').style.display = 'none'; document.getElementById('2403.20198v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 29 March, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> March 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">7 Pages, 5 Figures, accepted by 2024 IEEE ICC Workshop</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2403.13677</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Retina Vision Transformer (RetinaViT): Introducing Scaled Patches into Vision Transformers </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yuyang Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Bain%2C+M+E">Michael E. Bain</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2403.13677v1-abstract-short" style="display: inline;"> Humans see low and high spatial frequency components at the same time, and combine the information from both to form a visual scene. 
Drawing on this neuroscientific inspiration, we propose an altered Vision Transformer architecture where patches from scaled down versions of the input image are added to the input of the first Transformer Encoder layer. We name this model Retina Vision Transformer (&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.13677v1-abstract-full').style.display = 'inline'; document.getElementById('2403.13677v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2403.13677v1-abstract-full" style="display: none;"> Humans see low and high spatial frequency components at the same time, and combine the information from both to form a visual scene. Drawing on this neuroscientific inspiration, we propose an altered Vision Transformer architecture where patches from scaled down versions of the input image are added to the input of the first Transformer Encoder layer. We name this model Retina Vision Transformer (RetinaViT) due to its inspiration from the human visual system. Our experiments show that when trained on the ImageNet-1K dataset with a moderate configuration, RetinaViT achieves a 3.3% performance improvement over the original ViT. We hypothesize that this improvement can be attributed to the inclusion of low spatial frequency components in the input, which improves the ability to capture structural features, and to select and forward important features to deeper layers. RetinaViT thereby opens doors to further investigations into vertical pathways and attention patterns. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.13677v1-abstract-full').style.display = 'none'; document.getElementById('2403.13677v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 20 March, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> March 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2403.07591</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Robustifying and Boosting Training-Free Neural Architecture Search </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=He%2C+Z">Zhenfeng He</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yao Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Dai%2C+Z">Zhongxiang Dai</a>, <a href="/search/cs?searchtype=author&amp;query=Low%2C+B+K+H">Bryan Kian Hsiang Low</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2403.07591v1-abstract-short" style="display: inline;"> Neural architecture search (NAS) has become a key component of AutoML and a standard tool to automate the design of deep neural networks. Recently, training-free NAS as an emerging paradigm has successfully reduced the search costs of standard training-based NAS by estimating the true architecture performance with only training-free metrics. 
Nevertheless, the estimation ability of these metrics ty&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.07591v1-abstract-full').style.display = 'inline'; document.getElementById('2403.07591v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2403.07591v1-abstract-full" style="display: none;"> Neural architecture search (NAS) has become a key component of AutoML and a standard tool to automate the design of deep neural networks. Recently, training-free NAS as an emerging paradigm has successfully reduced the search costs of standard training-based NAS by estimating the true architecture performance with only training-free metrics. Nevertheless, the estimation ability of these metrics typically varies across different tasks, making it challenging to achieve robust and consistently good search performance on diverse tasks with only a single training-free metric. Meanwhile, the estimation gap between training-free metrics and the true architecture performances limits training-free NAS to achieve superior performance. To address these challenges, we propose the robustifying and boosting training-free NAS (RoBoT) algorithm which (a) employs the optimized combination of existing training-free metrics explored from Bayesian optimization to develop a robust and consistently better-performing metric on diverse tasks, and (b) applies greedy search, i.e., the exploitation, on the newly developed metric to bridge the aforementioned gap and consequently to boost the search performance of standard training-free NAS further. Remarkably, the expected performance of our RoBoT can be theoretically guaranteed, which improves over the existing training-free NAS under mild conditions with additional interesting insights. Our extensive experiments on various NAS benchmark tasks yield substantial empirical evidence to support our theoretical results. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.07591v1-abstract-full').style.display = 'none'; document.getElementById('2403.07591v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 12 March, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> March 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted by ICLR 2024. Code available at</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2403.02993</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Localized Zeroth-Order Prompt Optimization </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Hu%2C+W">Wenyang Hu</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yao Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Yu%2C+Z">Zongmin Yu</a>, <a href="/search/cs?searchtype=author&amp;query=Wu%2C+Z">Zhaoxuan Wu</a>, <a href="/search/cs?searchtype=author&amp;query=Lin%2C+X">Xiangqiang Lin</a>, <a href="/search/cs?searchtype=author&amp;query=Dai%2C+Z">Zhongxiang Dai</a>, <a href="/search/cs?searchtype=author&amp;query=Ng%2C+S">See-Kiong Ng</a>, <a href="/search/cs?searchtype=author&amp;query=Low%2C+B+K+H">Bryan Kian Hsiang Low</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2403.02993v1-abstract-short" style="display: inline;"> 
The efficacy of large language models (LLMs) in understanding and generating natural language has aroused a wide interest in developing prompt-based methods to harness the power of black-box LLMs. Existing methodologies usually prioritize a global optimization for finding the global optimum, which however will perform poorly in certain tasks. This thus motivates us to re-think the necessity of fin&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.02993v1-abstract-full').style.display = 'inline'; document.getElementById('2403.02993v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2403.02993v1-abstract-full" style="display: none;"> The efficacy of large language models (LLMs) in understanding and generating natural language has aroused a wide interest in developing prompt-based methods to harness the power of black-box LLMs. Existing methodologies usually prioritize a global optimization for finding the global optimum, which however will perform poorly in certain tasks. This thus motivates us to re-think the necessity of finding a global optimum in prompt optimization. To answer this, we conduct a thorough empirical study on prompt optimization and draw two major insights. Contrasting with the rarity of global optimum, local optima are usually prevalent and well-performed, which can be more worthwhile for efficient prompt optimization (Insight I). The choice of the input domain, covering both the generation and the representation of prompts, affects the identification of well-performing local optima (Insight II). Inspired by these insights, we propose a novel algorithm, namely localized zeroth-order prompt optimization (ZOPO), which incorporates a Neural Tangent Kernel-based derived Gaussian process into standard zeroth-order optimization for an efficient search of well-performing local optima in prompt optimization. 
Remarkably, ZOPO outperforms existing baselines in terms of both the optimization performance and the query efficiency, which we demonstrate through extensive experiments. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.02993v1-abstract-full').style.display = 'none'; document.getElementById('2403.02993v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 March, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> March 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2403.01437</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> <div class="is-inline-block" style="margin-left: 0.5rem"> <div class="tags has-addons"> <span class="tag is-dark is-size-7">doi</span> <span class="tag is-light is-size-7"><a class="" href="">10.1109/LSP.2023.3340103 <i class="fa fa-external-link" aria-hidden="true"></i></a></span> </div> </div> </div> <p class="title is-5 mathjax"> GPTSee: Enhancing Moment Retrieval and Highlight Detection via Description-Based Similarity Features </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Sun%2C+Y">Yunzhuo Sun</a>, <a href="/search/cs?searchtype=author&amp;query=Xu%2C+Y">Yifang Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Xie%2C+Z">Zien Xie</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yukun Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Du%2C+S">Sidan Du</a> </p> <p 
class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2403.01437v2-abstract-short" style="display: inline;"> Moment retrieval (MR) and highlight detection (HD) aim to identify relevant moments and highlights in video from corresponding natural language query. Large language models (LLMs) have demonstrated proficiency in various computer vision tasks. However, existing methods for MR&amp;HD have not yet been integrated with LLMs. In this letter, we propose a novel two-stage model that takes the output of LLM&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.01437v2-abstract-full').style.display = 'inline'; document.getElementById('2403.01437v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2403.01437v2-abstract-full" style="display: none;"> Moment retrieval (MR) and highlight detection (HD) aim to identify relevant moments and highlights in video from corresponding natural language query. Large language models (LLMs) have demonstrated proficiency in various computer vision tasks. However, existing methods for MR&amp;HD have not yet been integrated with LLMs. In this letter, we propose a novel two-stage model that takes the output of LLMs as the input to the second-stage transformer encoder-decoder. First, MiniGPT-4 is employed to generate the detailed description of the video frame and rewrite the query statement, fed into the encoder as new features. Then, semantic similarity is computed between the generated description and the rewritten queries. Finally, continuous high-similarity video frames are converted into span anchors, serving as prior position information for the decoder. 
Experiments demonstrate that our approach achieves a state-of-the-art result, and by using only span anchors and similarity scores as outputs, positioning accuracy outperforms traditional methods, like Moment-DETR. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.01437v2-abstract-full').style.display = 'none'; document.getElementById('2403.01437v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 March, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 3 March, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> March 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">5 pages, 3 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2402.18292</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> FSL-Rectifier: Rectify Outliers in Few-Shot Learning via Test-Time Augmentation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Bai%2C+Y">Yunwei Bai</a>, <a href="/search/cs?searchtype=author&amp;query=Tan%2C+Y+K">Ying Kiat Tan</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+S">Shiming Chen</a>, <a 
href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yao Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+T">Tsuhan Chen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2402.18292v5-abstract-short" style="display: inline;"> Few-shot-learning (FSL) commonly requires a model to identify images (queries) that belong to classes unseen during training, based on a few labeled samples of the new classes (support set) as reference. So far, plenty of algorithms involve training data augmentation to improve the generalization capability of FSL models, but outlier queries or support images during inference can still pose great&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2402.18292v5-abstract-full').style.display = 'inline'; document.getElementById('2402.18292v5-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2402.18292v5-abstract-full" style="display: none;"> Few-shot-learning (FSL) commonly requires a model to identify images (queries) that belong to classes unseen during training, based on a few labeled samples of the new classes (support set) as reference. So far, plenty of algorithms involve training data augmentation to improve the generalization capability of FSL models, but outlier queries or support images during inference can still pose great generalization challenges. In this work, to reduce the bias caused by the outlier samples, we generate additional test-class samples by combining original samples with suitable train-class samples via a generative image combiner. Then, we obtain averaged features via an augmentor, which leads to more typical representations through the averaging. 
We experimentally and theoretically demonstrate the effectiveness of our method, e.g., obtaining a test accuracy improvement proportion of around 10% (e.g., from 46.86% to 53.28%) for trained FSL models. Importantly, given pretrained image combiner, our method is training-free for off-the-shelf FSL models, whose performance can be improved without extra datasets nor further training of the models themselves. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2402.18292v5-abstract-full').style.display = 'none'; document.getElementById('2402.18292v5-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 28 February, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2402.14672</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Gu%2C+Y">Yu Gu</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yiheng Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Yu%2C+H">Hao Yu</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+X">Xiao Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Dong%2C+Y">Yuxiao Dong</a>, <a 
href="/search/cs?searchtype=author&amp;query=Tang%2C+J">Jie Tang</a>, <a href="/search/cs?searchtype=author&amp;query=Srinivasa%2C+J">Jayanth Srinivasa</a>, <a href="/search/cs?searchtype=author&amp;query=Latapie%2C+H">Hugo Latapie</a>, <a href="/search/cs?searchtype=author&amp;query=Su%2C+Y">Yu Su</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2402.14672v2-abstract-short" style="display: inline;"> The applications of large language models (LLMs) have expanded well beyond the confines of text processing, signaling a new era where LLMs are envisioned as generalist agents capable of operating within complex environments. These environments are often highly expansive, making it impossible for the LLM to process them within its short-term memory. Motivated by recent research on extending the cap&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2402.14672v2-abstract-full').style.display = 'inline'; document.getElementById('2402.14672v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2402.14672v2-abstract-full" style="display: none;"> The applications of large language models (LLMs) have expanded well beyond the confines of text processing, signaling a new era where LLMs are envisioned as generalist agents capable of operating within complex environments. These environments are often highly expansive, making it impossible for the LLM to process them within its short-term memory. Motivated by recent research on extending the capabilities of LLMs with tools, we seek to investigate the intriguing potential of tools to augment LLMs in handling such complexity by introducing a novel class of tools, termed middleware, to aid in the proactive exploration within these massive environments. 
Such specialized tools can serve as a middleware layer shielding the LLM from environmental complexity. In two representative complex environments -- knowledge bases (KBs) and databases -- we demonstrate the significant potential of augmenting language agents with tools in complex environments. Notably, equipped with the middleware, GPT-4 achieves 2.8X the performance of the best baseline in tasks requiring access to database content and 2.2X in KB tasks. Our findings illuminate the path for advancing language agents in real-world applications. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2402.14672v2-abstract-full').style.display = 'none'; document.getElementById('2402.14672v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 22 February, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">EMNLP&#39;2024; 18 pages, 8 figures, 8 tables</span> </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">ACM Class:</span> I.2.7 </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2402.11427</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">stat.ML</span> </div> </div> <p class="title is-5 mathjax"> OptEx: Expediting First-Order Optimization with Approximately Parallelized Iterations </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yao Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Fang%2C+J">Jiongfeng Fang</a>, <a href="/search/cs?searchtype=author&amp;query=He%2C+Y+T">Ying Tiffany He</a>, <a href="/search/cs?searchtype=author&amp;query=Yu%2C+F+R">Fei Richard Yu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2402.11427v2-abstract-short" style="display: inline;"> First-order optimization (FOO) algorithms are pivotal in numerous computational domains such as machine learning and signal denoising. However, their application to complex tasks like neural network training often entails significant inefficiencies due to the need for many sequential iterations for convergence. 
In response, we introduce first-order optimization expedited with approximately paralle&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2402.11427v2-abstract-full').style.display = 'inline'; document.getElementById('2402.11427v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2402.11427v2-abstract-full" style="display: none;"> First-order optimization (FOO) algorithms are pivotal in numerous computational domains such as machine learning and signal denoising. However, their application to complex tasks like neural network training often entails significant inefficiencies due to the need for many sequential iterations for convergence. In response, we introduce first-order optimization expedited with approximately parallelized iterations (OptEx), the first framework that enhances the efficiency of FOO by leveraging parallel computing to mitigate its iterative bottleneck. OptEx employs kernelized gradient estimation to make use of gradient history for future gradient prediction, enabling parallelization of iterations -- a strategy once considered impractical because of the inherent iterative dependency in FOO. We provide theoretical guarantees for the reliability of our kernelized gradient estimation and the iteration complexity of SGD-based OptEx, confirming that estimation errors diminish to zero as historical gradients accumulate and that SGD-based OptEx enjoys an effective acceleration rate of $\Omega(\sqrt{N})$ over standard SGD given parallelism of N. We also use extensive empirical studies, including synthetic functions, reinforcement learning tasks, and neural network training across various datasets, to underscore the substantial efficiency improvements achieved by OptEx. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2402.11427v2-abstract-full').style.display = 'none'; document.getElementById('2402.11427v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 29 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 17 February, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Published as a conference paper at NeurIPS 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2402.07179</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Information Retrieval">cs.IR</span> </div> </div> <p class="title is-5 mathjax"> Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Hu%2C+Z">Zhibo Hu</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+C">Chen Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yanfeng Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Helen"> Helen</a>, <a href="/search/cs?searchtype=author&amp;query=Paik"> Paik</a>, <a href="/search/cs?searchtype=author&amp;query=Zhu%2C+L">Liming Zhu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short 
has-text-grey-dark mathjax" id="2402.07179v3-abstract-short" style="display: inline;"> The robustness of large language models (LLMs) becomes increasingly important as their use rapidly grows in a wide range of domains. Retrieval-Augmented Generation (RAG) is considered as a means to improve the trustworthiness of text generation from LLMs. However, how the outputs from RAG-based LLMs are affected by slightly different inputs is not well studied. In this work, we find that the inser&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2402.07179v3-abstract-full').style.display = 'inline'; document.getElementById('2402.07179v3-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2402.07179v3-abstract-full" style="display: none;"> The robustness of large language models (LLMs) becomes increasingly important as their use rapidly grows in a wide range of domains. Retrieval-Augmented Generation (RAG) is considered as a means to improve the trustworthiness of text generation from LLMs. However, how the outputs from RAG-based LLMs are affected by slightly different inputs is not well studied. In this work, we find that the insertion of even a short prefix to the prompt leads to the generation of outputs far away from factually correct answers. We systematically evaluate the effect of such prefixes on RAG by introducing a novel optimization technique called Gradient Guided Prompt Perturbation (GGPP). GGPP achieves a high success rate in steering outputs of RAG-based LLMs to targeted wrong answers. It can also cope with instructions in the prompts requesting to ignore irrelevant context. We also exploit LLMs&#39; neuron activation difference between prompts with and without GGPP perturbations to give a method that improves the robustness of RAG-based LLMs through a highly effective detector trained on neuron activation triggered by GGPP generated prompts. 
Our evaluation on open-sourced LLMs demonstrates the effectiveness of our methods. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2402.07179v3-abstract-full').style.display = 'none'; document.getElementById('2402.07179v3-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 23 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 11 February, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">12 pages, 9 figures</span> </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">ACM Class:</span> I.2.7; H.3.3 </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2402.05956</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Pathformer: Multi-scale Transformers with Adaptive Pathways for Time Series Forecasting </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Chen%2C+P">Peng Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+Y">Yingying Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Cheng%2C+Y">Yunyao Cheng</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yang Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yihang Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Wen%2C+Q">Qingsong Wen</a>, <a 
href="/search/cs?searchtype=author&amp;query=Yang%2C+B">Bin Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Guo%2C+C">Chenjuan Guo</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2402.05956v5-abstract-short" style="display: inline;"> Transformers for time series forecasting mainly model time series from limited or fixed scales, making it challenging to capture different characteristics spanning various scales. We propose Pathformer, a multi-scale Transformer with adaptive pathways. It integrates both temporal resolution and temporal distance for multi-scale modeling. Multi-scale division divides the time series into different&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2402.05956v5-abstract-full').style.display = 'inline'; document.getElementById('2402.05956v5-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2402.05956v5-abstract-full" style="display: none;"> Transformers for time series forecasting mainly model time series from limited or fixed scales, making it challenging to capture different characteristics spanning various scales. We propose Pathformer, a multi-scale Transformer with adaptive pathways. It integrates both temporal resolution and temporal distance for multi-scale modeling. Multi-scale division divides the time series into different temporal resolutions using patches of various sizes. Based on the division of each scale, dual attention is performed over these patches to capture global correlations and local details as temporal dependencies. We further enrich the multi-scale Transformer with adaptive pathways, which adaptively adjust the multi-scale modeling process based on the varying temporal dynamics of the input, improving the accuracy and generalization of Pathformer. 
Extensive experiments on eleven real-world datasets demonstrate that Pathformer not only achieves state-of-the-art performance by surpassing all current models but also exhibits stronger generalization abilities under various transfer scenarios. The code is made available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2402.05956v5-abstract-full').style.display = 'none'; document.getElementById('2402.05956v5-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 15 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 4 February, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted by the 12th International Conference on Learning Representations (ICLR 2024)</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2402.03082</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yan Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Zeng%2C+W">Weichao Zeng</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+Z">Zhenhang Li</a>, <a 
href="/search/cs?searchtype=author&amp;query=Zhao%2C+F">Fangmin Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+Y">Yu Zhou</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2402.03082v1-abstract-short" style="display: inline;"> Visual text, a pivotal element in both document and scene images, speaks volumes and attracts significant attention in the computer vision domain. Beyond visual text detection and recognition, the field of visual text processing has experienced a surge in research, driven by the advent of fundamental generative models. However, challenges persist due to the unique properties and features that dist&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2402.03082v1-abstract-full').style.display = 'inline'; document.getElementById('2402.03082v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2402.03082v1-abstract-full" style="display: none;"> Visual text, a pivotal element in both document and scene images, speaks volumes and attracts significant attention in the computer vision domain. Beyond visual text detection and recognition, the field of visual text processing has experienced a surge in research, driven by the advent of fundamental generative models. However, challenges persist due to the unique properties and features that distinguish text from general objects. Effectively leveraging these unique textual characteristics is crucial in visual text processing, as observed in our study. In this survey, we present a comprehensive, multi-perspective analysis of recent advancements in this field. Initially, we introduce a hierarchical taxonomy encompassing areas ranging from text image enhancement and restoration to text image manipulation, followed by different learning paradigms. 
Subsequently, we conduct an in-depth discussion of how specific textual features such as structure, stroke, semantics, style, and spatial context are seamlessly integrated into various tasks. Furthermore, we explore available public datasets and benchmark the reviewed methods on several widely-used datasets. Finally, we identify principal challenges and potential avenues for future research. Our aim is to establish this survey as a fundamental resource, fostering continued exploration and innovation in the dynamic area of visual text processing. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2402.03082v1-abstract-full').style.display = 'none'; document.getElementById('2402.03082v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 February, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2402.01157</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Source-Free Unsupervised Domain Adaptation with Hypothesis Consolidation of Prediction Rationale </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yangyang Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Cao%2C+X">Xiaofeng Cao</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+Q">Qi Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+B">Bowen Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+Z">Ziqin Zhou</a>, <a href="/search/cs?searchtype=author&amp;query=Hengel%2C+A+v+d">Anton van den Hengel</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+L">Lingqiao Liu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2402.01157v1-abstract-short" style="display: inline;"> Source-Free Unsupervised Domain Adaptation (SFUDA) is a challenging task where a model needs to be adapted to a new domain without access to target domain labels or source domain data. The primary difficulty in this task is that the model&#39;s predictions may be inaccurate, and using these inaccurate predictions for model adaptation can lead to misleading results. 
To address this issue, this paper pr&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2402.01157v1-abstract-full').style.display = 'inline'; document.getElementById('2402.01157v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2402.01157v1-abstract-full" style="display: none;"> Source-Free Unsupervised Domain Adaptation (SFUDA) is a challenging task where a model needs to be adapted to a new domain without access to target domain labels or source domain data. The primary difficulty in this task is that the model&#39;s predictions may be inaccurate, and using these inaccurate predictions for model adaptation can lead to misleading results. To address this issue, this paper proposes a novel approach that considers multiple prediction hypotheses for each sample and investigates the rationale behind each hypothesis. By consolidating these hypothesis rationales, we identify the most likely correct hypotheses, which we then use as a pseudo-labeled set to support a semi-supervised learning procedure for model adaptation. To achieve the optimal performance, we propose a three-step adaptation process: model pre-adaptation, hypothesis consolidation, and semi-supervised learning. Extensive experimental results demonstrate that our approach achieves state-of-the-art performance in the SFUDA task and can be easily integrated into existing approaches to improve their performance. The codes are available at \url{}. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2402.01157v1-abstract-full').style.display = 'none'; document.getElementById('2402.01157v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 2 February, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2401.07213</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">ps</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Depth-agnostic Single Image Dehazing </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Xu%2C+H">Honglei Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yan Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+S">Shaohui Liu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2401.07213v1-abstract-short" style="display: inline;"> Single image dehazing is a challenging ill-posed problem. Existing datasets for training deep learning-based methods can be generated by hand-crafted or synthetic schemes. However, the former often suffers from small scales, while the latter forces models to learn scene depth instead of haze distribution, decreasing their dehazing ability. 
To overcome the problem, we propose a simple yet novel syn&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2401.07213v1-abstract-full').style.display = 'inline'; document.getElementById('2401.07213v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2401.07213v1-abstract-full" style="display: none;"> Single image dehazing is a challenging ill-posed problem. Existing datasets for training deep learning-based methods can be generated by hand-crafted or synthetic schemes. However, the former often suffers from small scales, while the latter forces models to learn scene depth instead of haze distribution, decreasing their dehazing ability. To overcome the problem, we propose a simple yet novel synthetic method to decouple the relationship between haze density and scene depth, by which a depth-agnostic dataset (DA-HAZE) is generated. Meanwhile, a Global Shuffle Strategy (GSS) is proposed for generating differently scaled datasets, thereby enhancing the generalization ability of the model. Extensive experiments indicate that models trained on DA-HAZE achieve significant improvements on real-world benchmarks, with less discrepancy between SOTS and DA-SOTS (the test set of DA-HAZE). Additionally, depth-agnostic dehazing is a more complicated task because of the lack of a depth prior. Therefore, an efficient architecture with stronger feature modeling ability and lower computational cost is necessary. We revisit the U-Net-based architectures for dehazing, in which dedicatedly designed blocks are incorporated. However, the performance of these blocks is constrained by limited feature fusion methods. To this end, we propose a Convolutional Skip Connection (CSC) module, allowing vanilla feature fusion methods to achieve promising results with minimal costs. Extensive experimental results demonstrate that current state-of-the-art methods 
equipped with CSC can achieve better performance and reasonable computational expense, whether or not the haze distribution is relevant to the scene depth. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2401.07213v1-abstract-full').style.display = 'none'; document.getElementById('2401.07213v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 14 January, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2401.02594</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Unsupervised hard Negative Augmentation for contrastive learning </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yuxuan Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Lampos%2C+V">Vasileios Lampos</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2401.02594v1-abstract-short" style="display: inline;"> We present Unsupervised hard Negative Augmentation (UNA), a method that generates synthetic negative instances based on the term frequency-inverse document frequency (TF-IDF) retrieval model. UNA uses TF-IDF scores to ascertain the perceived importance of terms in a sentence and then produces negative samples by replacing terms with respect to that. 
Our experiments demonstrate that models trained&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2401.02594v1-abstract-full').style.display = 'inline'; document.getElementById('2401.02594v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2401.02594v1-abstract-full" style="display: none;"> We present Unsupervised hard Negative Augmentation (UNA), a method that generates synthetic negative instances based on the term frequency-inverse document frequency (TF-IDF) retrieval model. UNA uses TF-IDF scores to ascertain the perceived importance of terms in a sentence and then produces negative samples by replacing terms with respect to that. Our experiments demonstrate that models trained with UNA improve the overall performance in semantic textual similarity tasks. Additional performance gains are obtained when combining UNA with the paraphrasing augmentation. Further results show that our method is compatible with different backbone models. Ablation studies also support the choice of having a TF-IDF-driven control on negative augmentation. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2401.02594v1-abstract-full').style.display = 'none'; document.getElementById('2401.02594v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 January, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> January 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">The code and pre-trained models are available at</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2312.05927</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Digital Libraries">cs.DL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Social and Information Networks">cs.SI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Physics and Society">physics.soc-ph</span> </div> </div> <p class="title is-5 mathjax"> The survival of scientific stylization </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yuanyuan Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Pan%2C+T">Tianxing Pan</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2312.05927v1-abstract-short" style="display: inline;"> This study elaborates a text-based metric to quantify the unique position of stylized scientific research, characterized by its innovative integration of diverse knowledge components and potential to pivot established scientific paradigms. 
Our analysis reveals a concerning decline in stylized research, highlighted by its comparative undervaluation in terms of citation counts and protracted peer-re&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2312.05927v1-abstract-full').style.display = 'inline'; document.getElementById('2312.05927v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2312.05927v1-abstract-full" style="display: none;"> This study elaborates a text-based metric to quantify the unique position of stylized scientific research, characterized by its innovative integration of diverse knowledge components and potential to pivot established scientific paradigms. Our analysis reveals a concerning decline in stylized research, highlighted by its comparative undervaluation in terms of citation counts and protracted peer-review duration. Despite facing these challenges, the disruptive potential of stylized research remains robust, consistently introducing groundbreaking questions and theories. This paper posits that substantive reforms are necessary to incentivize and recognize the value of stylized research, including optimizations to the peer-review process and the criteria for evaluating scientific impact. Embracing these changes may be imperative to halt the downturn in stylized research and ensure enduring scholarly exploration in endless frontiers. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2312.05927v1-abstract-full').style.display = 'none'; document.getElementById('2312.05927v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 December, 2023; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2023. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">55 pages (23 main text, 32 SI)</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2312.00411</a> <span>&nbsp;[<a href="">pdf</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> A framework for mining lifestyle profiles through multi-dimensional and high-order mobility feature clustering </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yeshuo Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+G">Gangcheng Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+K">Keyi Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Tang%2C+J">Jintong Tang</a>, <a href="/search/cs?searchtype=author&amp;query=Xu%2C+L">Liyan Xu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2312.00411v1-abstract-short" style="display: inline;"> Human mobility demonstrates a high degree of regularity, which facilitates the discovery of lifestyle profiles. Existing research has yet to fully utilize the regularities embedded in high-order features extracted from human mobility records in such profiling. 
This study proposes a progressive feature extraction strategy that mines high-order mobility features from users&#39; moving trajectory records&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2312.00411v1-abstract-full').style.display = 'inline'; document.getElementById('2312.00411v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2312.00411v1-abstract-full" style="display: none;"> Human mobility demonstrates a high degree of regularity, which facilitates the discovery of lifestyle profiles. Existing research has yet to fully utilize the regularities embedded in high-order features extracted from human mobility records in such profiling. This study proposes a progressive feature extraction strategy that mines high-order mobility features from users&#39; moving trajectory records from the spatial, temporal, and semantic dimensions. Specific features are extracted such as travel motifs, rhythms decomposed by discrete Fourier transform (DFT) of mobility time series, and vectorized place semantics by word2vec, respectively to the three dimensions, and they are further clustered to reveal the users&#39; lifestyle characteristics. An experiment using a trajectory dataset of over 500k users in Shenzhen, China yields seven user clusters with different lifestyle profiles that can be well interpreted by common sense. The results suggest the possibility of fine-grained user profiling through cross-order trajectory feature engineering and clustering. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2312.00411v1-abstract-full').style.display = 'none'; document.getElementById('2312.00411v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 1 December, 2023; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> December 2023. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2311.13381</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Distributed, Parallel, and Cluster Computing">cs.DC</span> </div> </div> <p class="title is-5 mathjax"> Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Chen%2C+Y">Yuhao Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Yan%2C+Y">Yuxuan Yan</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+Q">Qianqian Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yuanchao Shu</a>, <a href="/search/cs?searchtype=author&amp;query=He%2C+S">Shibo He</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+J">Jiming Chen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2311.13381v1-abstract-short" style="display: inline;"> Transformer-based large language models (LLMs) have demonstrated impressive 
capabilities in a variety of natural language processing (NLP) tasks. Nonetheless, it is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets. In this paper, we propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2311.13381v1-abstract-full').style.display = 'inline'; document.getElementById('2311.13381v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2311.13381v1-abstract-full" style="display: none;"> Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks. Nonetheless, it is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets. In this paper, we propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art LLMs on commodity mobile devices like smartphones. Confidant partitions an LLM into several sub-models so that each fits into a mobile device&#39;s memory. A pipeline parallel training mechanism is further developed to ensure fast and efficient distributed training. In addition, we propose a novel backend scheduler to allocate different attention heads to heterogeneous compute hardware, including mobile CPU and GPUs, to maximize the compute resource utilization on each edge device. Our preliminary experimental results show that Confidant achieves at most 45.3% memory reduction and 8.03x inference speedup in practical settings. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2311.13381v1-abstract-full').style.display = 'none'; document.getElementById('2311.13381v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 22 November, 2023; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2023. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">6 pages, 7 figures; Submitted to HotMobile 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2311.11572</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Emerging Technologies">cs.ET</span> </div> </div> <p class="title is-5 mathjax"> Cryogenic quasi-static embedded DRAM for energy-efficient compute-in-memory applications </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yuhao Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+H">Hongtu Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Sun%2C+H">Hao Sun</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+M">Mengru Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+W">Wenfeng Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Deng%2C+Q">Qi Deng</a>, <a href="/search/cs?searchtype=author&amp;query=Tang%2C+Z">Zhidong Tang</a>, <a href="/search/cs?searchtype=author&amp;query=Yuan%2C+Y">Yumeng Yuan</a>, <a href="/search/cs?searchtype=author&amp;query=Hu%2C+Y">Yongqi Hu</a>, <a href="/search/cs?searchtype=author&amp;query=Gu%2C+Y">Yu Gu</a>, <a 
href="/search/cs?searchtype=author&amp;query=Kou%2C+X">Xufeng Kou</a>, <a href="/search/cs?searchtype=author&amp;query=Ha%2C+Y">Yajun Ha</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2311.11572v1-abstract-short" style="display: inline;"> Compute-in-memory (CIM) presents an attractive approach for energy-efficient computing in data-intensive applications. However, the development of suitable memory designs to achieve high-performance CIM remains a challenging task. Here, we propose a cryogenic quasi-static embedded DRAM to address the logic-memory mismatch of CIM. Guided by the re-calibrated cryogenic device model, the designed fou&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2311.11572v1-abstract-full').style.display = 'inline'; document.getElementById('2311.11572v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2311.11572v1-abstract-full" style="display: none;"> Compute-in-memory (CIM) presents an attractive approach for energy-efficient computing in data-intensive applications. However, the development of suitable memory designs to achieve high-performance CIM remains a challenging task. Here, we propose a cryogenic quasi-static embedded DRAM to address the logic-memory mismatch of CIM. Guided by the re-calibrated cryogenic device model, the designed four-transistor bit-cell achieves full-swing data storage, low power consumption, and extended retention time at cryogenic temperatures. Combined with the adoption of cryogenic write bitline biasing technique and readout circuitry optimization, our 4Kb cryogenic eDRAM chip demonstrates a 1.37$\times$10$^6$ times improvement in retention time, while achieving a 75 times improvement in retention variability, compared to room-temperature operation. 
Moreover, it also achieves outstanding power performance with a retention power of 112 fW and a dynamic power of 108 $\mu$W at 4.2 K, which can be further decreased by 7.1% and 13.6% using the dynamic voltage scaling technique. This work reveals the great potential of cryogenic CMOS for high-density data storage and lays a solid foundation for energy-efficient CIM implementations. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2311.11572v1-abstract-full').style.display = 'none'; document.getElementById('2311.11572v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 20 November, 2023; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2023. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2311.07090</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> CLiF-VQA: Enhancing Video Quality Assessment by Incorporating High-Level Semantic Information related to Human Feelings </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Mi%2C+Y">Yachun Mi</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+Y">Yu Li</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yan Shu</a>, <a href="/search/cs?searchtype=author&amp;query=Hui%2C+C">Chen Hui</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+P">Puchao Zhou</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+S">Shaohui Liu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span 
class="abstract-short has-text-grey-dark mathjax" id="2311.07090v1-abstract-short" style="display: inline;"> Video Quality Assessment (VQA) aims to simulate the process of perceiving video quality by the human visual system (HVS). The judgments made by HVS are always influenced by human subjective feelings. However, most of the current VQA research focuses on capturing various distortions in the spatial and temporal domains of videos, while ignoring the impact of human feelings. In this paper, we propose&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2311.07090v1-abstract-full').style.display = 'inline'; document.getElementById('2311.07090v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2311.07090v1-abstract-full" style="display: none;"> Video Quality Assessment (VQA) aims to simulate the process of perceiving video quality by the human visual system (HVS). The judgments made by HVS are always influenced by human subjective feelings. However, most of the current VQA research focuses on capturing various distortions in the spatial and temporal domains of videos, while ignoring the impact of human feelings. In this paper, we propose CLiF-VQA, which considers both features related to human feelings and spatial features of videos. In order to effectively extract features related to human feelings from videos, we explore the consistency between CLIP and human feelings in video perception for the first time. Specifically, we design multiple objective and subjective descriptions closely related to human feelings as prompts. Further we propose a novel CLIP-based semantic feature extractor (SFE) which extracts features related to human feelings by sliding over multiple regions of the video frame. In addition, we further capture the low-level-aware features of the video through a spatial feature extraction module. 
The two different features are then aggregated thereby obtaining the quality score of the video. Extensive experiments show that the proposed CLiF-VQA exhibits excellent performance on several VQA datasets. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2311.07090v1-abstract-full').style.display = 'none'; document.getElementById('2311.07090v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 November, 2023; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2023. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2311.05827</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Chen%2C+Y">Yuhao Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Yan%2C+Y">Yuxuan Yan</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+Q">Qianqian Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Shu%2C+Y">Yuanchao Shu</a>, <a href="/search/cs?searchtype=author&amp;query=He%2C+S">Shibo He</a>, <a href="/search/cs?searchtype=author&amp;query=Shi%2C+Z">Zhiguo Shi</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+J">Jiming Chen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2311.05827v1-abstract-short" style="display: inline;"> It is usually infeasible to fit and train an 
entire large deep neural network (DNN) model using a single edge device due to limited resources. To facilitate intelligent applications across edge devices, researchers have proposed partitioning a large model into several sub-models, and deploying each of them to a different edge device to collaboratively train a DNN model. However, the communicat&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2311.05827v1-abstract-full').style.display = 'inline'; document.getElementById('2311.05827v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2311.05827v1-abstract-full" style="display: none;"> It is usually infeasible to fit and train an entire large deep neural network (DNN) model using a single edge device due to limited resources. To facilitate intelligent applications across edge devices, researchers have proposed partitioning a large model into several sub-models, and deploying each of them to a different edge device to collaboratively train a DNN model. However, the communication overhead caused by the large amount of data transmitted from one device to another during training, as well as the sub-optimal partition point caused by inaccurate prediction of the computation latency at each edge device, can significantly slow down training. In this paper, we propose AccEPT, a scheme for accelerating edge collaborative pipeline-parallel training. In particular, we propose a lightweight adaptive latency predictor to accurately estimate the computation latency of each layer at different devices, which also adapts to unseen devices through continuous learning. The proposed latency predictor therefore leads to better model partitioning, which balances the computation loads across participating devices. 
Moreover, we propose a bit-level computation-efficient data compression scheme to compress the data to be transmitted between devices during training. Our numerical results demonstrate that the proposed acceleration approach speeds up edge pipeline-parallel training by up to 3 times in the considered experimental settings. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2311.05827v1-abstract-full').style.display = 'none'; document.getElementById('2311.05827v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 9 November, 2023; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2023. </p> </li> </ol> <nav class="pagination is-small is-centered breathe-horizontal" role="navigation" aria-label="pagination"> <a href="" class="pagination-previous is-invisible">Previous </a> <a href="/search/?searchtype=author&amp;query=Shu%2C+Y&amp;start=50" class="pagination-next" >Next </a> <ul class="pagination-list"> <li> <a 