start="1"> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.14405</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Zhao%2C+Y">Yu Zhao</a>, <a href="/search/cs?searchtype=author&query=Yin%2C+H">Huifeng Yin</a>, <a href="/search/cs?searchtype=author&query=Zeng%2C+B">Bo Zeng</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+H">Hao Wang</a>, <a href="/search/cs?searchtype=author&query=Shi%2C+T">Tianqi Shi</a>, <a href="/search/cs?searchtype=author&query=Lyu%2C+C">Chenyang Lyu</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Longyue Wang</a>, <a href="/search/cs?searchtype=author&query=Luo%2C+W">Weihua Luo</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+K">Kaifu Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.14405v1-abstract-short" style="display: inline;"> Currently OpenAI o1 has sparked a surge of interest in the study of large reasoning models (LRM). Building on this momentum, Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding -- which are well-suited for reinforcement learning (RL) -- but also places greater emphasis on open-ended resolutions. 
We aim to address the question: "Can the o1 model… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.14405v1-abstract-full').style.display = 'inline'; document.getElementById('2411.14405v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.14405v1-abstract-full" style="display: none;"> Currently OpenAI o1 has sparked a surge of interest in the study of large reasoning models (LRM). Building on this momentum, Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding -- which are well-suited for reinforcement learning (RL) -- but also places greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?" Marco-o1 is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies -- optimized for complex real-world problem-solving tasks. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.14405v1-abstract-full').style.display = 'none'; document.getElementById('2411.14405v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.13602</a> <span> [<a href="">pdf</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Large-scale cross-modality pretrained model enhances cardiovascular state estimation and cardiomyopathy detection from electrocardiograms: An AI system development and multi-center validation study </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Ding%2C+Z">Zhengyao Ding</a>, <a href="/search/cs?searchtype=author&query=Hu%2C+Y">Yujian Hu</a>, <a href="/search/cs?searchtype=author&query=Xu%2C+Y">Youyao Xu</a>, <a href="/search/cs?searchtype=author&query=Zhao%2C+C">Chengchen Zhao</a>, <a href="/search/cs?searchtype=author&query=Li%2C+Z">Ziyu Li</a>, <a href="/search/cs?searchtype=author&query=Mao%2C+Y">Yiheng Mao</a>, <a href="/search/cs?searchtype=author&query=Li%2C+H">Haitao Li</a>, <a href="/search/cs?searchtype=author&query=Li%2C+Q">Qian Li</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+J">Jing Wang</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+Y">Yue Chen</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+M">Mengjia Chen</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Longbo Wang</a>, <a href="/search/cs?searchtype=author&query=Chu%2C+X">Xuesen Chu</a>, <a href="/search/cs?searchtype=author&query=Pan%2C+W">Weichao Pan</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+Z">Ziyi Liu</a>, <a href="/search/cs?searchtype=author&query=Wu%2C+F">Fei Wu</a>, <a 
href="/search/cs?searchtype=author&query=Zhang%2C+H">Hongkun Zhang</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+T">Ting Chen</a>, <a href="/search/cs?searchtype=author&query=Huang%2C+Z">Zhengxing Huang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.13602v1-abstract-short" style="display: inline;"> Cardiovascular diseases (CVDs) present significant challenges for early and accurate diagnosis. While cardiac magnetic resonance imaging (CMR) is the gold standard for assessing cardiac function and diagnosing CVDs, its high cost and technical complexity limit accessibility. In contrast, electrocardiography (ECG) offers promise for large-scale early screening. This study introduces CardiacNets, an… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.13602v1-abstract-full').style.display = 'inline'; document.getElementById('2411.13602v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.13602v1-abstract-full" style="display: none;"> Cardiovascular diseases (CVDs) present significant challenges for early and accurate diagnosis. While cardiac magnetic resonance imaging (CMR) is the gold standard for assessing cardiac function and diagnosing CVDs, its high cost and technical complexity limit accessibility. In contrast, electrocardiography (ECG) offers promise for large-scale early screening. This study introduces CardiacNets, an innovative model that enhances ECG analysis by leveraging the diagnostic strengths of CMR through cross-modal contrastive learning and generative pretraining. 
CardiacNets serves two primary functions: (1) it evaluates detailed cardiac function indicators and screens for potential CVDs, including coronary artery disease, cardiomyopathy, pericarditis, heart failure and pulmonary hypertension, using ECG input; and (2) it enhances interpretability by generating high-quality CMR images from ECG data. We train and validate the proposed CardiacNets on two large-scale public datasets (the UK Biobank with 41,519 individuals and the MIMIC-IV-ECG comprising 501,172 samples) as well as three private datasets (FAHZU with 410 individuals, SAHZU with 464 individuals, and QPH with 338 individuals), and the findings demonstrate that CardiacNets consistently outperforms traditional ECG-only models, substantially improving screening accuracy. Furthermore, the generated CMR images provide valuable diagnostic support for physicians of all experience levels. This proof-of-concept study highlights how ECG can facilitate cross-modal insights into cardiac function assessment, paving the way for enhanced CVD screening and diagnosis at a population level. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.13602v1-abstract-full').style.display = 'none'; document.getElementById('2411.13602v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 19 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">23 pages, 8 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.13503</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Huang%2C+Z">Ziqi Huang</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+F">Fan Zhang</a>, <a href="/search/cs?searchtype=author&query=Xu%2C+X">Xiaojie Xu</a>, <a href="/search/cs?searchtype=author&query=He%2C+Y">Yinan He</a>, <a href="/search/cs?searchtype=author&query=Yu%2C+J">Jiashuo Yu</a>, <a href="/search/cs?searchtype=author&query=Dong%2C+Z">Ziyue Dong</a>, <a href="/search/cs?searchtype=author&query=Ma%2C+Q">Qianli Ma</a>, <a href="/search/cs?searchtype=author&query=Chanpaisit%2C+N">Nattapol Chanpaisit</a>, <a href="/search/cs?searchtype=author&query=Si%2C+C">Chenyang Si</a>, <a href="/search/cs?searchtype=author&query=Jiang%2C+Y">Yuming Jiang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+Y">Yaohui Wang</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+X">Xinyuan Chen</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+Y">Ying-Cong Chen</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Limin Wang</a>, <a href="/search/cs?searchtype=author&query=Lin%2C+D">Dahua Lin</a>, <a href="/search/cs?searchtype=author&query=Qiao%2C+Y">Yu Qiao</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+Z">Ziwei Liu</a> </p> <p class="abstract mathjax"> <span 
class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.13503v1-abstract-short" style="display: inline;"> Video generation has witnessed significant advancements, yet evaluating these models remains a challenge. A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) Existing metrics do not fully align with human perceptions; 2) An ideal evaluation system should provide insights to inform future developments of video generation. To this end, we present VBench, a… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.13503v1-abstract-full').style.display = 'inline'; document.getElementById('2411.13503v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.13503v1-abstract-full" style="display: none;"> Video generation has witnessed significant advancements, yet evaluating these models remains a challenge. A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) Existing metrics do not fully align with human perceptions; 2) An ideal evaluation system should provide insights to inform future developments of video generation. To this end, we present VBench, a comprehensive benchmark suite that dissects "video generation quality" into specific, hierarchical, and disentangled dimensions, each with tailored prompts and evaluation methods. VBench has several appealing properties: 1) Comprehensive Dimensions: VBench comprises 16 dimensions in video generation (e.g., subject identity inconsistency, motion smoothness, temporal flickering, and spatial relationship, etc). The evaluation metrics with fine-grained levels reveal individual models' strengths and weaknesses. 
2) Human Alignment: We also provide a dataset of human preference annotations to validate our benchmarks' alignment with human perception, for each evaluation dimension respectively. 3) Valuable Insights: We look into current models' ability across various evaluation dimensions, and various content types. We also investigate the gaps between video and image generation models. 4) Versatile Benchmarking: VBench++ supports evaluating text-to-video and image-to-video. We introduce a high-quality Image Suite with an adaptive aspect ratio to enable fair evaluations across different image-to-video generation settings. Beyond assessing technical quality, VBench++ evaluates the trustworthiness of video generative models, providing a more holistic view of model performance. 5) Full Open-Sourcing: We fully open-source VBench++ and continually add new video generation models to our leaderboard to drive forward the field of video generation. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.13503v1-abstract-full').style.display = 'none'; document.getElementById('2411.13503v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 20 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Leaderboard: Code: Project page: extension of arXiv:2311.17982. 
arXiv admin note: substantial text overlap with arXiv:2311.17982</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.12161</a> <span> [<a href="">pdf</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Distributed, Parallel, and Cluster Computing">cs.DC</span> </div> </div> <p class="title is-5 mathjax"> Adaptive Cache Management for Complex Storage Systems Using CNN-LSTM-Based Spatiotemporal Prediction </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+X">Xiaoye Wang</a>, <a href="/search/cs?searchtype=author&query=Li%2C+X">Xuan Li</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Linji Wang</a>, <a href="/search/cs?searchtype=author&query=Ruan%2C+T">Tingyi Ruan</a>, <a href="/search/cs?searchtype=author&query=Li%2C+P">Pochun Li</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.12161v1-abstract-short" style="display: inline;"> This paper proposes an intelligent cache management strategy based on CNN-LSTM to improve the performance and cache hit rate of storage systems. Through comparative experiments with traditional algorithms (such as LRU and LFU) and other deep learning models (such as RNN, GRU-RNN and LSTM), the results show that the CNN-LSTM model has significant advantages in cache demand prediction. 
The MSE and M… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.12161v1-abstract-full').style.display = 'inline'; document.getElementById('2411.12161v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.12161v1-abstract-full" style="display: none;"> This paper proposes an intelligent cache management strategy based on CNN-LSTM to improve the performance and cache hit rate of storage systems. Through comparative experiments with traditional algorithms (such as LRU and LFU) and other deep learning models (such as RNN, GRU-RNN and LSTM), the results show that the CNN-LSTM model has significant advantages in cache demand prediction. The MSE and MAE values of this model are significantly reduced, proving its effectiveness under complex data access patterns. This study not only verifies the potential of deep learning technology in storage system optimization, but also provides direction and reference for further optimizing and improving cache management strategies. This intelligent cache management strategy performs well in complex storage environments. By combining the spatial feature extraction capabilities of convolutional neural networks and the time series modeling capabilities of long short-term memory networks, the CNN-LSTM model can more accurately predict cache needs, thereby dynamically optimizing cache allocation to improve system response speed and resource utilization. This research provides theoretical support and practical reference for cache optimization under large-scale data access modes, and is of great significance to improving the performance of future storage systems. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.12161v1-abstract-full').style.display = 'none'; document.getElementById('2411.12161v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 18 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.11581</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> OASIS: Open Agents Social Interaction Simulations on One Million Agents </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Yang%2C+Z">Ziyi Yang</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+Z">Zaibin Zhang</a>, <a href="/search/cs?searchtype=author&query=Zheng%2C+Z">Zirui Zheng</a>, <a href="/search/cs?searchtype=author&query=Jiang%2C+Y">Yuxian Jiang</a>, <a href="/search/cs?searchtype=author&query=Gan%2C+Z">Ziyue Gan</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+Z">Zhiyu Wang</a>, <a href="/search/cs?searchtype=author&query=Ling%2C+Z">Zijian Ling</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+J">Jinsong Chen</a>, <a href="/search/cs?searchtype=author&query=Ma%2C+M">Martz Ma</a>, <a href="/search/cs?searchtype=author&query=Dong%2C+B">Bowen Dong</a>, <a href="/search/cs?searchtype=author&query=Gupta%2C+P">Prateek Gupta</a>, <a href="/search/cs?searchtype=author&query=Hu%2C+S">Shuyue Hu</a>, <a href="/search/cs?searchtype=author&query=Yin%2C+Z">Zhenfei Yin</a>, <a href="/search/cs?searchtype=author&query=Li%2C+G">Guohao Li</a>, <a 
href="/search/cs?searchtype=author&query=Jia%2C+X">Xu Jia</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lijun Wang</a>, <a href="/search/cs?searchtype=author&query=Ghanem%2C+B">Bernard Ghanem</a>, <a href="/search/cs?searchtype=author&query=Lu%2C+H">Huchuan Lu</a>, <a href="/search/cs?searchtype=author&query=Ouyang%2C+W">Wanli Ouyang</a>, <a href="/search/cs?searchtype=author&query=Qiao%2C+Y">Yu Qiao</a>, <a href="/search/cs?searchtype=author&query=Torr%2C+P">Philip Torr</a>, <a href="/search/cs?searchtype=author&query=Shao%2C+J">Jing Shao</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.11581v2-abstract-short" style="display: inline;"> There has been a growing interest in enhancing rule-based agent-based models (ABMs) for social media platforms (i.e., X, Reddit) with more realistic large language model (LLM) agents, thereby allowing for a more nuanced study of complex systems. As a result, several LLM-based ABMs have been proposed in the past year. While they hold promise, each simulator is specifically designed to study a parti… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.11581v2-abstract-full').style.display = 'inline'; document.getElementById('2411.11581v2-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.11581v2-abstract-full" style="display: none;"> There has been a growing interest in enhancing rule-based agent-based models (ABMs) for social media platforms (i.e., X, Reddit) with more realistic large language model (LLM) agents, thereby allowing for a more nuanced study of complex systems. As a result, several LLM-based ABMs have been proposed in the past year. 
While they hold promise, each simulator is specifically designed to study a particular scenario, making it time-consuming and resource-intensive to explore other phenomena using the same ABM. Additionally, these models simulate only a limited number of agents, whereas real-world social media platforms involve millions of users. To this end, we propose OASIS, a generalizable and scalable social media simulator. OASIS is designed based on real-world social media platforms, incorporating dynamically updated environments (i.e., dynamic social networks and post information), diverse action spaces (i.e., following, commenting), and recommendation systems (i.e., interest-based and hot-score-based). Additionally, OASIS supports large-scale user simulations, capable of modeling up to one million users. With these features, OASIS can be easily extended to different social media platforms to study large-scale group phenomena and behaviors. We replicate various social phenomena, including information spreading, group polarization, and herd effects across X and Reddit platforms. Moreover, we provide observations of social phenomena at different agent group scales. We observe that the larger agent group scale leads to more enhanced group dynamics and more diverse and helpful agents' opinions. These findings demonstrate OASIS's potential as a powerful tool for studying complex systems in digital environments. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.11581v2-abstract-full').style.display = 'none'; document.getElementById('2411.11581v2-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 18 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.11435</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=He%2C+J">Junwen He</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+Y">Yifan Wang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lijun Wang</a>, <a href="/search/cs?searchtype=author&query=Lu%2C+H">Huchuan Lu</a>, <a href="/search/cs?searchtype=author&query=He%2C+J">Jun-Yan He</a>, <a href="/search/cs?searchtype=author&query=Li%2C+C">Chenyang Li</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+H">Hanyuan Chen</a>, <a href="/search/cs?searchtype=author&query=Lan%2C+J">Jin-Peng Lan</a>, <a href="/search/cs?searchtype=author&query=Luo%2C+B">Bin Luo</a>, <a href="/search/cs?searchtype=author&query=Geng%2C+Y">Yifeng Geng</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.11435v1-abstract-short" style="display: inline;"> Text logo design heavily relies on the creativity and expertise of professional designers, in which arranging element layouts is one of the most important procedures. However, little attention has been paid to this specific task which needs to take precise textural details and user constraints into consideration, but only on the broader tasks such as document/poster layout generation. 
In this paper,… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.11435v1-abstract-full').style.display = 'inline'; document.getElementById('2411.11435v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.11435v1-abstract-full" style="display: none;"> Text logo design heavily relies on the creativity and expertise of professional designers, in which arranging element layouts is one of the most important procedures. However, little attention has been paid to this specific task which needs to take precise textural details and user constraints into consideration, but only on the broader tasks such as document/poster layout generation. In this paper, we propose a VLM-based framework that generates content-aware text logo layouts by integrating multi-modal inputs with user constraints, supporting a more flexible and stable layout design in real-world applications. We introduce two model techniques to reduce the computation for processing multiple glyph images simultaneously, without incurring performance degradation. To support instruction-tuning of our model, we construct two extensive text logo datasets, which are 5x larger than the existing public dataset. Besides the geometric annotations (e.g. text masks and character recognition), we also complement them with comprehensive layout descriptions in natural language format, for more effective training to have reasoning ability when dealing with complex layouts and custom user constraints. Experimental studies demonstrate the effectiveness of our proposed model and datasets, when compared with previous methods in various benchmarks to evaluate geometric aesthetics and human preferences. The code and datasets will be publicly available. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.11435v1-abstract-full').style.display = 'none'; document.getElementById('2411.11435v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 18 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.11348</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Fluid Dynamics">physics.flu-dyn</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Modeling Multivariable High-resolution 3D Urban Microclimate Using Localized Fourier Neural Operator </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Qin%2C+S">Shaoxiang Qin</a>, <a href="/search/cs?searchtype=author&query=Zhan%2C+D">Dongxue Zhan</a>, <a href="/search/cs?searchtype=author&query=Geng%2C+D">Dingyang Geng</a>, <a href="/search/cs?searchtype=author&query=Peng%2C+W">Wenhui Peng</a>, <a href="/search/cs?searchtype=author&query=Tian%2C+G">Geng Tian</a>, <a href="/search/cs?searchtype=author&query=Shi%2C+Y">Yurong Shi</a>, <a href="/search/cs?searchtype=author&query=Gao%2C+N">Naiping Gao</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+X">Xue Liu</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L+L">Liangzhu Leon Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.11348v1-abstract-short" style="display: inline;"> 
Accurate urban microclimate analysis with wind velocity and temperature is vital for energy-efficient urban planning, supporting carbon reduction, enhancing public health and comfort, and advancing the low-altitude economy. However, traditional computational fluid dynamics (CFD) simulations that couple velocity and temperature are computationally expensive. Recent machine learning advancements off… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.11348v1-abstract-full').style.display = 'inline'; document.getElementById('2411.11348v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.11348v1-abstract-full" style="display: none;"> Accurate urban microclimate analysis with wind velocity and temperature is vital for energy-efficient urban planning, supporting carbon reduction, enhancing public health and comfort, and advancing the low-altitude economy. However, traditional computational fluid dynamics (CFD) simulations that couple velocity and temperature are computationally expensive. Recent machine learning advancements offer promising alternatives for accelerating urban microclimate simulations. The Fourier neural operator (FNO) has shown efficiency and accuracy in predicting single-variable velocity magnitudes in urban wind fields. Yet, for multivariable high-resolution 3D urban microclimate prediction, FNO faces three key limitations: blurry output quality, high GPU memory demand, and substantial data requirements. To address these issues, we propose a novel localized Fourier neural operator (Local-FNO) model that employs local training, geometry encoding, and patch overlapping. Local-FNO provides accurate predictions for rapidly changing turbulence in urban microclimate over 60 seconds, four times the average turbulence integral time scale, with an average error of 0.35 m/s in velocity and 0.30 °C in temperature. 
It also accurately captures turbulent heat flux represented by the velocity-temperature correlation. In a 2 km by 2 km domain, Local-FNO resolves turbulence patterns down to a 10 m resolution. It provides high-resolution predictions with 150 million feature dimensions on a single 32 GB GPU at nearly 50 times the speed of a CFD solver. Compared to FNO, Local-FNO achieves a 23.9% reduction in prediction error and a 47.3% improvement in turbulent fluctuation correlation. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.11348v1-abstract-full').style.display = 'none'; document.getElementById('2411.11348v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 18 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.11189</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Freqformer: Frequency-Domain Transformer for 3-D Visualization and Quantification of Human Retinal Circulation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lingyun Wang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+B">Bingjie Wang</a>, <a href="/search/cs?searchtype=author&query=Chhablani%2C+J">Jay Chhablani</a>, <a href="/search/cs?searchtype=author&query=Sahel%2C+J+A">Jose Alain Sahel</a>, <a href="/search/cs?searchtype=author&query=Pi%2C+S">Shaohua 
Pi</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.11189v1-abstract-short" style="display: inline;"> We introduce Freqformer, a novel Transformer-based architecture designed for 3-D, high-definition visualization of human retinal circulation from a single scan in commercial optical coherence tomography angiography (OCTA). Freqformer addresses the challenge of limited signal-to-noise ratio in OCTA volume by utilizing a complex-valued frequency-domain module (CFDM) and a simplified multi-head atten… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.11189v1-abstract-full').style.display = 'inline'; document.getElementById('2411.11189v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.11189v1-abstract-full" style="display: none;"> We introduce Freqformer, a novel Transformer-based architecture designed for 3-D, high-definition visualization of human retinal circulation from a single scan in commercial optical coherence tomography angiography (OCTA). Freqformer addresses the challenge of limited signal-to-noise ratio in OCTA volume by utilizing a complex-valued frequency-domain module (CFDM) and a simplified multi-head attention (Sim-MHA) mechanism. Using merged volumes as ground truth, Freqformer enables accurate reconstruction of retinal vasculature across the depth planes, allowing for 3-D quantification of capillary segments (count, density, and length). Our method outperforms state-of-the-art convolutional neural networks (CNNs) and several Transformer-based models, with superior performance in peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and learned perceptual image patch similarity (LPIPS). 
Furthermore, Freqformer demonstrates excellent generalizability across lower scanning density, effectively enhancing OCTA scans with larger fields of view (from 3$\times$3 $mm^{2}$ to 6$\times$6 $mm^{2}$ and 12$\times$12 $mm^{2}$). These results suggest that Freqformer can significantly improve the understanding and characterization of retinal circulation, offering potential clinical applications in diagnosing and managing retinal vascular diseases. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.11189v1-abstract-full').style.display = 'none'; document.getElementById('2411.11189v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 17 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.10962</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Yang%2C+L">Lei Yang</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+X">Xinyu Zhang</a>, <a href="/search/cs?searchtype=author&query=Li%2C+J">Jun Li</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+C">Chen Wang</a>, <a href="/search/cs?searchtype=author&query=Song%2C+Z">Zhiying Song</a>, <a href="/search/cs?searchtype=author&query=Zhao%2C+T">Tong Zhao</a>, <a href="/search/cs?searchtype=author&query=Song%2C+Z">Ziying Song</a>, <a 
href="/search/cs?searchtype=author&query=Wang%2C+L">Li Wang</a>, <a href="/search/cs?searchtype=author&query=Zhou%2C+M">Mo Zhou</a>, <a href="/search/cs?searchtype=author&query=Shen%2C+Y">Yang Shen</a>, <a href="/search/cs?searchtype=author&query=Wu%2C+K">Kai Wu</a>, <a href="/search/cs?searchtype=author&query=Lv%2C+C">Chen Lv</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.10962v1-abstract-short" style="display: inline;"> Modern autonomous vehicle perception systems often struggle with occlusions and limited perception range. Previous studies have demonstrated the effectiveness of cooperative perception in extending the perception range and overcoming occlusions, thereby improving the safety of autonomous driving. In recent years, a series of cooperative perception datasets have emerged. However, these datasets onl… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10962v1-abstract-full').style.display = 'inline'; document.getElementById('2411.10962v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.10962v1-abstract-full" style="display: none;"> Modern autonomous vehicle perception systems often struggle with occlusions and limited perception range. Previous studies have demonstrated the effectiveness of cooperative perception in extending the perception range and overcoming occlusions, thereby improving the safety of autonomous driving. In recent years, a series of cooperative perception datasets have emerged. However, these datasets only focus on camera and LiDAR, overlooking 4D Radar, a sensor employed in single-vehicle autonomous driving for robust perception in adverse weather conditions. 
In this paper, to bridge the gap of missing 4D Radar datasets in cooperative perception, we present V2X-Radar, the first large real-world multi-modal dataset featuring 4D Radar. Our V2X-Radar dataset is collected using a connected vehicle platform and an intelligent roadside unit equipped with 4D Radar, LiDAR, and multi-view cameras. The collected data includes sunny and rainy weather conditions, spanning daytime, dusk, and nighttime, as well as typical challenging scenarios. The dataset comprises 20K LiDAR frames, 40K camera images, and 20K 4D Radar data, with 350K annotated bounding boxes across five categories. To facilitate diverse research domains, we establish V2X-Radar-C for cooperative perception, V2X-Radar-I for roadside perception, and V2X-Radar-V for single-vehicle perception. We further provide comprehensive benchmarks of recent perception algorithms on the above three sub-datasets. The dataset and benchmark codebase will be available at \url{}. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10962v1-abstract-full').style.display = 'none'; document.getElementById('2411.10962v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 16 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">11 pages, 5 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.10681</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Structured Dialogue System for Mental Health: An LLM Chatbot Leveraging the PM+ Guidelines </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Chen%2C+Y">Yixiang Chen</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+X">Xinyu Zhang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+J">Jinran Wang</a>, <a href="/search/cs?searchtype=author&query=Xie%2C+X">Xurong Xie</a>, <a href="/search/cs?searchtype=author&query=Yan%2C+N">Nan Yan</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+H">Hui Chen</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lan Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.10681v1-abstract-short" style="display: inline;"> The Structured Dialogue System, referred to as SuDoSys, is an innovative Large Language Model (LLM)-based chatbot designed to provide psychological counseling. SuDoSys leverages the World Health Organization (WHO)'s Problem Management Plus (PM+) guidelines to deliver stage-aware multi-turn dialogues. 
Existing methods for employing an LLM in multi-turn psychological counseling typically involve dir… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10681v1-abstract-full').style.display = 'inline'; document.getElementById('2411.10681v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.10681v1-abstract-full" style="display: none;"> The Structured Dialogue System, referred to as SuDoSys, is an innovative Large Language Model (LLM)-based chatbot designed to provide psychological counseling. SuDoSys leverages the World Health Organization (WHO)'s Problem Management Plus (PM+) guidelines to deliver stage-aware multi-turn dialogues. Existing methods for employing an LLM in multi-turn psychological counseling typically involve direct fine-tuning using generated dialogues, often neglecting the dynamic stage shifts of counseling sessions. Unlike previous approaches, SuDoSys considers the different stages of counseling and stores essential information throughout the counseling process, ensuring coherent and directed conversations. The system employs an LLM, a stage-aware instruction generator, a response unpacker, a topic database, and a stage controller to maintain dialogue flow. In addition, we propose a novel technique that simulates counseling clients to interact with the evaluated system and evaluate its performance automatically. When assessed using both objective and subjective evaluations, SuDoSys demonstrates its effectiveness in generating logically coherent responses. The system's code and program scripts for evaluation are open-sourced. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10681v1-abstract-full').style.display = 'none'; document.getElementById('2411.10681v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 15 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted to the 16th International Conference on Social Robotics (ICSR 2024)</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.10449</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Human-Computer Interaction">cs.HC</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Love in Action: Gamifying Public Video Cameras for Fostering Social Relationships in Real World </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Zhang%2C+Z">Zhang Zhang</a>, <a href="/search/cs?searchtype=author&query=Li%2C+D">Da Li</a>, <a href="/search/cs?searchtype=author&query=Wu%2C+G">Geng Wu</a>, <a href="/search/cs?searchtype=author&query=Li%2C+Y">Yaoning Li</a>, <a href="/search/cs?searchtype=author&query=Sun%2C+X">Xiaobing Sun</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Liang Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.10449v1-abstract-short" style="display: inline;"> In 
this paper, we create "Love in Action" (LIA), a body language-based social game utilizing video cameras installed in public spaces to enhance social relationships in the real world. In the game, participants assume dual roles, i.e., requesters, who issue social requests, and performers, who respond to social requests by performing specified body language. To mediate the communication between par… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10449v1-abstract-full').style.display = 'inline'; document.getElementById('2411.10449v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.10449v1-abstract-full" style="display: none;"> In this paper, we create "Love in Action" (LIA), a body language-based social game utilizing video cameras installed in public spaces to enhance social relationships in the real world. In the game, participants assume dual roles, i.e., requesters, who issue social requests, and performers, who respond to social requests by performing specified body language. To mediate the communication between participants, we build an AI-enhanced video analysis system incorporating multiple visual analysis modules like person detection, attribute recognition, and action recognition, to assess the performer's body language quality. A two-week field study involving 27 participants shows significant improvements in their social friendships, as indicated by self-reported questionnaires. Moreover, user experiences are investigated to highlight the potential of public video cameras as a novel communication medium for socializing in public spaces. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10449v1-abstract-full').style.display = 'none'; document.getElementById('2411.10449v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 30 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">accepted as a main track paper by EAI-ArtsIT 2024</span> </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">MSC Class:</span> 14J60 (Primary) 14F05; 14J26 (Secondary) </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.10362</a> <span> [<a href="">pdf</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Human-Computer Interaction">cs.HC</span> </div> </div> <p class="title is-5 mathjax"> Interactive Cycle Model -- The Linkage Combination among Automatic Speech Recognition, Large Language Models and Smart Glasses </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+L">Libo Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.10362v1-abstract-short" style="display: inline;"> This research proposes the interaction loop model "ASR-LLM-Smart Glasses", which model combines automatic speech recognition, large language model and smart glasses to facilitate seamless human-computer interaction. 
And the methodology of this research involves decomposing the interaction process into different stages and elements. Speech is captured and processed by ASR, then analyzed and interpr… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10362v1-abstract-full').style.display = 'inline'; document.getElementById('2411.10362v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.10362v1-abstract-full" style="display: none;"> This research proposes the interaction loop model "ASR-LLM-Smart Glasses", which model combines automatic speech recognition, large language model and smart glasses to facilitate seamless human-computer interaction. And the methodology of this research involves decomposing the interaction process into different stages and elements. Speech is captured and processed by ASR, then analyzed and interpreted by LLM. The results are then transmitted to smart glasses for display. The feedback loop is complete when the user interacts with the displayed data. Mathematical formulas are used to quantify the performance of the model that revolves around core evaluation points: accuracy, coherence, and latency during ASR speech-to-text conversion. The research results are provided theoretically to test and evaluate the feasibility and performance of the model. Although such human-computer interaction products have not yet appeared in the industry, the performance indicators of this model in enhancing user experience in fields that rely on human-computer interaction have also verified its utility as a technology to promote human-computer interaction. 
In addition, this research pioneered the idea of integrating cutting-edge technologies such as generative pre-trained Transformer models into unique interaction models; the LLM provides raw value through powerful evaluation techniques and innovative use, which provides a new perspective to evaluate and enhance human-computer interaction. Keywords: Automatic speech recognition, Large Language Model, Smart glasses, Interaction mechanism <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10362v1-abstract-full').style.display = 'none'; document.getElementById('2411.10362v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 15 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">OpenReview submitted. 
11 pages of text and 1 figure</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.10156</a> <span> [<a href="">pdf</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+L">Libo Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.10156v2-abstract-short" style="display: inline;"> To address the sycophancy problem caused by reinforcement learning from human feedback in large language models, this research applies synthetic data intervention technology to the decoder-only transformer architecture. Based on the research gaps in the existing literature, the researcher designed an experimental process to reduce the tendency of models to cater by generating diversified data, and… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10156v2-abstract-full').style.display = 'inline'; document.getElementById('2411.10156v2-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.10156v2-abstract-full" style="display: none;"> To address the sycophancy problem caused by reinforcement learning from human feedback in large language models, this research applies synthetic data intervention technology to the decoder-only transformer architecture. 
Based on the research gaps in the existing literature, the researcher designed an experimental process to reduce the tendency of models to cater by generating diversified data, and used GPT4o as an experimental tool for verification. The experiment used 100 true and false questions, and compared the performance of the model trained with synthetic data intervention and the original untrained model on multiple indicators. The results show that the SDI training model supports the technology in terms of accuracy rate and sycophancy rate and has significant effectiveness in reducing sycophancy phenomena. Notably, the data set, experimental process, code and data results have been uploaded to Github, the link is <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10156v2-abstract-full').style.display = 'none'; document.getElementById('2411.10156v2-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 20 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 15 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">This research is also submitted to OpenReview. 
The main text is 9 pages (excluding citations), 7 figures, and 1 table</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.09896</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Materials Science">cond-mat.mtrl-sci</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Revealing the Evolution of Order in Materials Microstructures Using Multi-Modal Computer Vision </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Ter-Petrosyan%2C+A">Arman Ter-Petrosyan</a>, <a href="/search/cs?searchtype=author&query=Holden%2C+M">Michael Holden</a>, <a href="/search/cs?searchtype=author&query=Bilbrey%2C+J+A">Jenna A. Bilbrey</a>, <a href="/search/cs?searchtype=author&query=Akers%2C+S">Sarah Akers</a>, <a href="/search/cs?searchtype=author&query=Doty%2C+C">Christina Doty</a>, <a href="/search/cs?searchtype=author&query=Yano%2C+K+H">Kayla H. Yano</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Le Wang</a>, <a href="/search/cs?searchtype=author&query=Paudel%2C+R">Rajendra Paudel</a>, <a href="/search/cs?searchtype=author&query=Lang%2C+E">Eric Lang</a>, <a href="/search/cs?searchtype=author&query=Hattar%2C+K">Khalid Hattar</a>, <a href="/search/cs?searchtype=author&query=Comes%2C+R+B">Ryan B. Comes</a>, <a href="/search/cs?searchtype=author&query=Du%2C+Y">Yingge Du</a>, <a href="/search/cs?searchtype=author&query=Matthews%2C+B+E">Bethany E. Matthews</a>, <a href="/search/cs?searchtype=author&query=Spurgeon%2C+S+R">Steven R. 
Spurgeon</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.09896v1-abstract-short" style="display: inline;"> The development of high-performance materials for microelectronics, energy storage, and extreme environments depends on our ability to describe and direct property-defining microstructural order. Our present understanding is typically derived from laborious manual analysis of imaging and spectroscopy data, which is difficult to scale, challenging to reproduce, and lacks the ability to reveal laten… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.09896v1-abstract-full').style.display = 'inline'; document.getElementById('2411.09896v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.09896v1-abstract-full" style="display: none;"> The development of high-performance materials for microelectronics, energy storage, and extreme environments depends on our ability to describe and direct property-defining microstructural order. Our present understanding is typically derived from laborious manual analysis of imaging and spectroscopy data, which is difficult to scale, challenging to reproduce, and lacks the ability to reveal latent associations needed for mechanistic models. Here, we demonstrate a multi-modal machine learning (ML) approach to describe order from electron microscopy analysis of the complex oxide La$_{1-x}$Sr$_x$FeO$_3$. We construct a hybrid pipeline based on fully and semi-supervised classification, allowing us to evaluate both the characteristics of each data modality and the value each modality adds to the ensemble. We observe distinct differences in the performance of uni- and multi-modal models, from which we draw general lessons in describing crystal order using computer vision. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.09896v1-abstract-full').style.display = 'none'; document.getElementById('2411.09896v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 14 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">30 pages, 5 figures, 2 tables</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.09593</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Chatterjee%2C+S">Soumick Chatterjee</a>, <a href="/search/cs?searchtype=author&query=Mattern%2C+H">Hendrik Mattern</a>, <a href="/search/cs?searchtype=author&query=D%C3%B6rner%2C+M">Marc Dörner</a>, <a href="/search/cs?searchtype=author&query=Sciarra%2C+A">Alessandro Sciarra</a>, <a href="/search/cs?searchtype=author&query=Dubost%2C+F">Florian Dubost</a>, <a href="/search/cs?searchtype=author&query=Schnurre%2C+H">Hannes Schnurre</a>, <a 
href="/search/cs?searchtype=author&query=Khatun%2C+R">Rupali Khatun</a>, <a href="/search/cs?searchtype=author&query=Yu%2C+C">Chun-Chih Yu</a>, <a href="/search/cs?searchtype=author&query=Hsieh%2C+T">Tsung-Lin Hsieh</a>, <a href="/search/cs?searchtype=author&query=Tsai%2C+Y">Yi-Shan Tsai</a>, <a href="/search/cs?searchtype=author&query=Fang%2C+Y">Yi-Zeng Fang</a>, <a href="/search/cs?searchtype=author&query=Yang%2C+Y">Yung-Ching Yang</a>, <a href="/search/cs?searchtype=author&query=Huang%2C+J">Juinn-Dar Huang</a>, <a href="/search/cs?searchtype=author&query=Xu%2C+M">Marshall Xu</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+S">Siyu Liu</a>, <a href="/search/cs?searchtype=author&query=Ribeiro%2C+F+L">Fernanda L. Ribeiro</a>, <a href="/search/cs?searchtype=author&query=Bollmann%2C+S">Saskia Bollmann</a>, <a href="/search/cs?searchtype=author&query=Chintalapati%2C+K+V">Karthikesh Varma Chintalapati</a>, <a href="/search/cs?searchtype=author&query=Radhakrishna%2C+C+M">Chethan Mysuru Radhakrishna</a>, <a href="/search/cs?searchtype=author&query=Kumara%2C+S+C+H+R">Sri Chandana Hudukula Ram Kumara</a>, <a href="/search/cs?searchtype=author&query=Sutrave%2C+R">Raviteja Sutrave</a>, <a href="/search/cs?searchtype=author&query=Qayyum%2C+A">Abdul Qayyum</a>, <a href="/search/cs?searchtype=author&query=Mazher%2C+M">Moona Mazher</a>, <a href="/search/cs?searchtype=author&query=Razzak%2C+I">Imran Razzak</a>, <a href="/search/cs?searchtype=author&query=Rodero%2C+C">Cristobal Rodero</a> , et al. (23 additional authors not shown) </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.09593v1-abstract-short" style="display: inline;"> The human brain receives nutrients and oxygen through an intricate network of blood vessels. 
Pathology affecting small vessels, at the mesoscopic scale, represents a critical vulnerability within the cerebral blood supply and can lead to severe conditions, such as Cerebral Small Vessel Diseases. The advent of 7 Tesla MRI systems has enabled the acquisition of higher spatial resolution images, maki… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.09593v1-abstract-full').style.display = 'inline'; document.getElementById('2411.09593v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.09593v1-abstract-full" style="display: none;"> The human brain receives nutrients and oxygen through an intricate network of blood vessels. Pathology affecting small vessels, at the mesoscopic scale, represents a critical vulnerability within the cerebral blood supply and can lead to severe conditions, such as Cerebral Small Vessel Diseases. The advent of 7 Tesla MRI systems has enabled the acquisition of higher spatial resolution images, making it possible to visualise such vessels in the brain. However, the lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms. To address this, the SMILE-UHURA challenge was organised. This challenge, held in conjunction with the ISBI 2023, in Cartagena de Indias, Colombia, aimed to provide a platform for researchers working on related topics. The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI. This dataset was created through a combination of automated pre-segmentation and extensive manual refinement. 
In this manuscript, sixteen submitted methods and two baseline methods are compared both quantitatively and qualitatively on two different datasets: held-out test MRAs from the same dataset as the training data (with labels kept secret) and a separate 7T ToF MRA dataset where both input volumes and labels are kept secret. The results demonstrate that most of the submitted deep learning methods, trained on the provided training dataset, achieved reliable segmentation performance. Dice scores reached up to 0.838 $\pm$ 0.066 and 0.716 $\pm$ 0.125 on the respective datasets, with an average performance of up to 0.804 $\pm$ 0.15. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.09593v1-abstract-full').style.display = 'none'; document.getElementById('2411.09593v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 14 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.09451</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Robotics">cs.RO</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> DiffRoad: Realistic and Diverse Road Scenario Generation for Autonomous Vehicle Testing </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Zhou%2C+J">Junjie Zhou</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lin Wang</a>, <a href="/search/cs?searchtype=author&query=Meng%2C+Q">Qiang Meng</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+X">Xiaofan Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.09451v1-abstract-short" style="display: inline;"> Generating realistic and diverse road scenarios is essential for autonomous vehicle testing and validation. Nevertheless, owing to the complexity and variability of real-world road environments, creating authentic and varied scenarios for intelligent driving testing is challenging. 
In this paper, we propose DiffRoad, a novel diffusion model designed to produce controllable and high-fidelity 3D roa… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.09451v1-abstract-full').style.display = 'inline'; document.getElementById('2411.09451v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.09451v1-abstract-full" style="display: none;"> Generating realistic and diverse road scenarios is essential for autonomous vehicle testing and validation. Nevertheless, owing to the complexity and variability of real-world road environments, creating authentic and varied scenarios for intelligent driving testing is challenging. In this paper, we propose DiffRoad, a novel diffusion model designed to produce controllable and high-fidelity 3D road scenarios. DiffRoad leverages the generative capabilities of diffusion models to synthesize road layouts from white noise through an inverse denoising process, preserving real-world spatial features. To enhance the quality of generated scenarios, we design the Road-UNet architecture, optimizing the balance between backbone and skip connections for high-realism scenario generation. Furthermore, we introduce a road scenario evaluation module that screens adequate and reasonable scenarios for intelligent driving testing using two critical metrics: road continuity and road reasonableness. Experimental results on multiple real-world datasets demonstrate DiffRoad's ability to generate realistic and smooth road structures while maintaining the original distribution. Additionally, the generated scenarios can be fully automated into the OpenDRIVE format, facilitating generalized autonomous vehicle simulation testing. DiffRoad provides a rich and diverse scenario library for large-scale autonomous vehicle testing and offers valuable insights for future infrastructure designs that are better suited for autonomous vehicles. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.09451v1-abstract-full').style.display = 'none'; document.getElementById('2411.09451v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 14 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">14 pages, 9 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.09111</a> <span> [<a href="">pdf</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Reducing Reasoning Costs -- The Path of Optimization for Chain of Thought via Sparse Attention Mechanism </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+L">Libo Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.09111v2-abstract-short" style="display: inline;"> In order to address the chain of thought in the large language model inference cost surge, this research proposes to use a sparse attention mechanism that only focuses on a few relevant tokens. The researcher constructed a new attention mechanism and used GiantRabbit trained with custom GPTs as an experimental tool. 
The experiment tested and compared the reasoning time, correctness score and chain… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.09111v2-abstract-full').style.display = 'inline'; document.getElementById('2411.09111v2-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.09111v2-abstract-full" style="display: none;"> In order to address the chain of thought in the large language model inference cost surge, this research proposes to use a sparse attention mechanism that only focuses on a few relevant tokens. The researcher constructed a new attention mechanism and used GiantRabbit trained with custom GPTs as an experimental tool. The experiment tested and compared the reasoning time, correctness score and chain of thought length of this model and o1 Preview in solving the linear algebra test questions of MIT OpenCourseWare. The results show that GiantRabbit's reasoning time and chain of thought length are significantly lower than o1 Preview, confirming the feasibility of the sparse attention mechanism in reducing chain of thought reasoning. Detailed architectural details and experimental process have been uploaded to Github, the link is: <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.09111v2-abstract-full').style.display = 'none'; document.getElementById('2411.09111v2-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 15 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 13 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">The main text is 9 pages, totaling 13 pages; 5 figures, 3 tables; preprints have been submitted to NeurIPS 2024 Workshop MusIML and OpenReview</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.08840</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Multimodal Instruction Tuning with Hybrid State Space Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Zhou%2C+J">Jianing Zhou</a>, <a href="/search/cs?searchtype=author&query=Li%2C+H">Han Li</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+S">Shuai Zhang</a>, <a href="/search/cs?searchtype=author&query=Xie%2C+N">Ning Xie</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+R">Ruijie Wang</a>, <a href="/search/cs?searchtype=author&query=Nie%2C+X">Xiaohan Nie</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+S">Sheng Liu</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lingyun Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.08840v1-abstract-short" style="display: inline;"> Handling lengthy context is crucial for enhancing the recognition and understanding capabilities of multimodal large language models (MLLMs) in applications such as processing high-resolution images or high frame rate videos. The rise in image resolution and frame rate substantially increases computational demands due to the increased number of input tokens. 
This challenge is further exacerbated b… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.08840v1-abstract-full').style.display = 'inline'; document.getElementById('2411.08840v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.08840v1-abstract-full" style="display: none;"> Handling lengthy context is crucial for enhancing the recognition and understanding capabilities of multimodal large language models (MLLMs) in applications such as processing high-resolution images or high frame rate videos. The rise in image resolution and frame rate substantially increases computational demands due to the increased number of input tokens. This challenge is further exacerbated by the quadratic complexity with respect to sequence length of the self-attention mechanism. Most prior works either pre-train models with long contexts, overlooking the efficiency problem, or attempt to reduce the context length via downsampling (e.g., identify the key image patches or frames) to decrease the context length, which may result in information loss. To circumvent this issue while keeping the remarkable effectiveness of MLLMs, we propose a novel approach using a hybrid transformer-MAMBA model to efficiently handle long contexts in multimodal applications. Our multimodal model can effectively process long context input exceeding 100k tokens, outperforming existing models across various benchmarks. Remarkably, our model enhances inference efficiency for high-resolution images and high-frame-rate videos by about 4 times compared to current models, with efficiency gains increasing as image resolution or video frames rise. Furthermore, our model is the first to be trained on low-resolution images or low-frame-rate videos while being capable of inference on high-resolution images and high-frame-rate videos, offering flexibility for inference in diverse scenarios. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.08840v1-abstract-full').style.display = 'none'; document.getElementById('2411.08840v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.08494</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Performance">cs.PF</span> </div> </div> <p class="title is-5 mathjax"> Achieving Consistent and Comparable CPU Evaluation Outcomes </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+C">Chenxi Wang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lei Wang</a>, <a href="/search/cs?searchtype=author&query=Gao%2C+W">Wanling Gao</a>, <a href="/search/cs?searchtype=author&query=Yang%2C+Y">Yikang Yang</a>, <a href="/search/cs?searchtype=author&query=Zhou%2C+Y">Yutong Zhou</a>, <a href="/search/cs?searchtype=author&query=Zhan%2C+J">Jianfeng Zhan</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.08494v1-abstract-short" style="display: inline;"> The SPEC CPU2017 benchmark suite is an industry standard for assessing CPU performance. It adheres strictly to some workload and system configurations - arbitrary specificity - while leaving other system configurations undefined - arbitrary ambiguity. 
This article reveals: (1) Arbitrary specificity proves not meaningful, obscuring many scenarios, as evidenced by significant performance variations,… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.08494v1-abstract-full').style.display = 'inline'; document.getElementById('2411.08494v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.08494v1-abstract-full" style="display: none;"> The SPEC CPU2017 benchmark suite is an industry standard for assessing CPU performance. It adheres strictly to some workload and system configurations - arbitrary specificity - while leaving other system configurations undefined - arbitrary ambiguity. This article reveals: (1) Arbitrary specificity proves not meaningful, obscuring many scenarios, as evidenced by significant performance variations, a 74.49x performance difference observed on the same CPU. (2) Arbitrary ambiguity is unfair as it fails to establish the same configurations for comparing different CPUs. We propose an innovative CPU evaluation methodology. It considers all workload and system configurations valid and mandates each configuration to be well-defined to avoid arbitrary specificity and ambiguity. To reduce the evaluation cost, a sampling approach is proposed to select a subset of the configurations. To expose CPU performance under different scenarios, it treats all outcomes under each configuration as equally important. Finally, it utilizes confidence level and confidence interval to report the outcomes to avoid bias. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.08494v1-abstract-full').style.display = 'none'; document.getElementById('2411.08494v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.08443</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Machine Unlearning on Pre-trained Models by Residual Feature Alignment Using LoRA </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Qin%2C+L">Laiqiao Qin</a>, <a href="/search/cs?searchtype=author&query=Zhu%2C+T">Tianqing Zhu</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Linlin Wang</a>, <a href="/search/cs?searchtype=author&query=Zhou%2C+W">Wanlei Zhou</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.08443v1-abstract-short" style="display: inline;"> Machine unlearning is a newly emerged technology that removes a subset of the training data from a trained model without affecting the model performance on the remaining data. This topic is becoming increasingly important in protecting user privacy and eliminating harmful or outdated data. 
The key challenge lies in effectively and efficiently unlearning specific information without compromising the mo… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.08443v1-abstract-full').style.display = 'inline'; document.getElementById('2411.08443v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.08443v1-abstract-full" style="display: none;"> Machine unlearning is a newly emerged technology that removes a subset of the training data from a trained model without affecting the model performance on the remaining data. This topic is becoming increasingly important in protecting user privacy and eliminating harmful or outdated data. The key challenge lies in effectively and efficiently unlearning specific information without compromising the model's utility on the retained data. For the pre-trained models, fine-tuning is an important way to achieve the unlearning target. Previous work typically fine-tuned the entire model's parameters, which incurs significant computation costs. In addition, the fine-tuning process may cause shifts in the intermediate layer features, affecting the model's overall utility. In this work, we propose a novel and efficient machine unlearning method on pre-trained models. We term the method Residual Feature Alignment Unlearning. Specifically, we leverage LoRA (Low-Rank Adaptation) to decompose the model's intermediate features into pre-trained features and residual features. By adjusting the residual features, we align the unlearned model with the pre-trained model at the intermediate feature level to achieve both unlearning and remaining targets. The method aims to learn the zero residuals on the retained set and shifted residuals on the unlearning set. Extensive experiments on numerous datasets validate the effectiveness of our approach. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.08443v1-abstract-full').style.display = 'none'; document.getElementById('2411.08443v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.08437</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Neural and Evolutionary Computing">cs.NE</span> </div> </div> <p class="title is-5 mathjax"> Evolutionary Algorithm with Detection Region Method for Constrained Multi-Objective Problems with Binary Constraints </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Huang%2C+W">Weixiong Huang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+R">Rui Wang</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+T">Tao Zhang</a>, <a href="/search/cs?searchtype=author&query=Qi%2C+S">Sheng Qi</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Ling Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.08437v1-abstract-short" style="display: inline;"> Solving constrained multi-objective optimization problems (CMOPs) is a challenging task. While many practical algorithms have been developed to tackle CMOPs, real-world scenarios often present cases where the constraint functions are unknown or unquantifiable, resulting in only binary outcomes (feasible or infeasible). 
This limitation reduces the effectiveness of constraint violation guidance, whi… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.08437v1-abstract-full').style.display = 'inline'; document.getElementById('2411.08437v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.08437v1-abstract-full" style="display: none;"> Solving constrained multi-objective optimization problems (CMOPs) is a challenging task. While many practical algorithms have been developed to tackle CMOPs, real-world scenarios often present cases where the constraint functions are unknown or unquantifiable, resulting in only binary outcomes (feasible or infeasible). This limitation reduces the effectiveness of constraint violation guidance, which can negatively impact the performance of existing algorithms that rely on this approach. Such challenges are particularly detrimental for algorithms employing the epsilon-based method, as they hinder effective relaxation of the feasible region. To address these challenges, this paper proposes a novel algorithm called DRMCMO based on the detection region method. In DRMCMO, detection regions dynamically monitor feasible solutions to enhance convergence, helping the population escape local optima. Additionally, these regions collaborate with the neighbor pairing strategy to improve population diversity within narrow feasible areas. We have modified three existing test suites to serve as benchmark test problems for CMOPs with binary constraints (CMOP/BC) and conducted comprehensive comparative experiments with state-of-the-art algorithms on these test suites and real-world problems. The results demonstrate the strong competitiveness of DRMCMO against state-of-the-art algorithms. Given the limited research on CMOP/BC, our study offers a new perspective for advancing this field. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.08437v1-abstract-full').style.display = 'none'; document.getElementById('2411.08437v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.08126</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">stat.ML</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> A Tale of Two Cities: Pessimism and Opportunism in Offline Dynamic Pricing </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Bian%2C+Z">Zeyu Bian</a>, <a href="/search/cs?searchtype=author&query=Qi%2C+Z">Zhengling Qi</a>, <a href="/search/cs?searchtype=author&query=Shi%2C+C">Cong Shi</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lan Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.08126v1-abstract-short" style="display: inline;"> This paper studies offline dynamic pricing without data coverage assumption, thereby allowing for any price including the optimal one not being observed in the offline data. Previous approaches that rely on the various coverage assumptions such as that the optimal prices are observable, would lead to suboptimal decisions and consequently, reduced profits. 
We address this challenge by framing the p… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.08126v1-abstract-full').style.display = 'inline'; document.getElementById('2411.08126v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.08126v1-abstract-full" style="display: none;"> This paper studies offline dynamic pricing without data coverage assumption, thereby allowing for any price including the optimal one not being observed in the offline data. Previous approaches that rely on the various coverage assumptions such as that the optimal prices are observable, would lead to suboptimal decisions and consequently, reduced profits. We address this challenge by framing the problem within a partial identification framework. Specifically, we establish a partial identification bound for the demand parameter whose associated price is unobserved by leveraging the inherent monotonicity property in the pricing problem. We further incorporate pessimistic and opportunistic strategies within the proposed partial identification framework to derive the estimated policy. Theoretically, we establish rate-optimal finite-sample regret guarantees for both strategies. Empirically, we demonstrate the superior performance of the newly proposed methods via a synthetic environment. This research provides practitioners with valuable insights into offline pricing strategies in the challenging no-coverage setting, ultimately fostering sustainable growth and profitability of the company. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.08126v1-abstract-full').style.display = 'none'; document.getElementById('2411.08126v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 12 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.07130</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Zou%2C+K">Kaijian Zou</a>, <a href="/search/cs?searchtype=author&query=Khalifa%2C+M">Muhammad Khalifa</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lu Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.07130v1-abstract-short" style="display: inline;"> Language models (LMs) have demonstrated an improved capacity to handle long-context information, yet existing long-context benchmarks primarily measure LMs' retrieval abilities with extended inputs, e.g., pinpointing a short phrase from long-form text. 
Therefore, they may fall short when evaluating models' global context understanding capacity, such as synthesizing and reasoning over content acros… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.07130v1-abstract-full').style.display = 'inline'; document.getElementById('2411.07130v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.07130v1-abstract-full" style="display: none;"> Language models (LMs) have demonstrated an improved capacity to handle long-context information, yet existing long-context benchmarks primarily measure LMs' retrieval abilities with extended inputs, e.g., pinpointing a short phrase from long-form text. Therefore, they may fall short when evaluating models' global context understanding capacity, such as synthesizing and reasoning over content across input to generate the response. In this paper, we study long-context language model (LCLM) evaluation through many-shot in-context learning (ICL). Concretely, we identify the skills each ICL task requires, and examine models' long-context capabilities on them. We first ask: What types of ICL tasks benefit from additional demonstrations, and are these tasks effective at evaluating LCLMs? We find that classification and summarization tasks show notable performance improvements with additional demonstrations, while translation and reasoning tasks do not exhibit clear trends. This suggests the classification tasks predominantly test models' retrieval skills. Next, we ask: To what extent does each task require retrieval skills versus global context understanding from LCLMs? We develop metrics to categorize ICL tasks into two groups: (i) retrieval tasks that require strong retrieval ability to pinpoint relevant examples, and (ii) global context understanding tasks that necessitate a deeper comprehension of the full input. 
We find that not all datasets can effectively evaluate these long-context capabilities. To address this gap, we introduce a new many-shot ICL benchmark, MANYICLBENCH, designed to characterize LCLMs' retrieval and global context understanding capabilities separately. Benchmarking 11 open-weight LCLMs with MANYICLBENCH, we find that while state-of-the-art models perform well in retrieval tasks up to 64k tokens, many show significant drops in global context tasks at just 16k tokens. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.07130v1-abstract-full').style.display = 'none'; document.getElementById('2411.07130v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 11 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.06652</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> LFSamba: Marry SAM with Mamba for Light Field Salient Object Detection </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Liu%2C+Z">Zhengyi Liu</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Longzhen Wang</a>, <a href="/search/cs?searchtype=author&query=Fang%2C+X">Xianyong Fang</a>, <a href="/search/cs?searchtype=author&query=Tu%2C+Z">Zhengzheng Tu</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Linbo Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short 
has-text-grey-dark mathjax" id="2411.06652v1-abstract-short" style="display: inline;"> A light field camera can reconstruct 3D scenes using captured multi-focus images that contain rich spatial geometric information, enhancing applications in stereoscopic photography, virtual reality, and robotic vision. In this work, a state-of-the-art salient object detection model for multi-focus light field images, called LFSamba, is introduced to emphasize four main insights: (a) Efficient feat… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.06652v1-abstract-full').style.display = 'inline'; document.getElementById('2411.06652v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.06652v1-abstract-full" style="display: none;"> A light field camera can reconstruct 3D scenes using captured multi-focus images that contain rich spatial geometric information, enhancing applications in stereoscopic photography, virtual reality, and robotic vision. In this work, a state-of-the-art salient object detection model for multi-focus light field images, called LFSamba, is introduced to emphasize four main insights: (a) Efficient feature extraction, where SAM is used to extract modality-aware discriminative features; (b) Inter-slice relation modeling, leveraging Mamba to capture long-range dependencies across multiple focal slices, thus extracting implicit depth cues; (c) Inter-modal relation modeling, utilizing Mamba to integrate all-focus and multi-focus images, enabling mutual enhancement; (d) Weakly supervised learning capability, developing a scribble annotation dataset from an existing pixel-level mask dataset, establishing the first scribble-supervised baseline for light field salient object detection. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.06652v1-abstract-full').style.display = 'none'; document.getElementById('2411.06652v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted by SPL</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.06471</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Graphics">cs.GR</span> </div> </div> <p class="title is-5 mathjax"> Towards Voronoi Diagrams of Surface Patches </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+P">Pengfei Wang</a>, <a href="/search/cs?searchtype=author&query=Song%2C+J">Jiantao Song</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lei Wang</a>, <a href="/search/cs?searchtype=author&query=Xin%2C+S">Shiqing Xin</a>, <a href="/search/cs?searchtype=author&query=Yan%2C+D">Dongming Yan</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+S">Shuangmin Chen</a>, <a href="/search/cs?searchtype=author&query=Tu%2C+C">Changhe Tu</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+W">Wenping Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.06471v1-abstract-short" style="display: inline;"> Extraction of a high-fidelity 3D medial axis is a crucial operation in CAD. 
When dealing with a polygonal model as input, ensuring accuracy and tidiness becomes challenging due to discretization errors inherent in the mesh surface. Commonly, existing approaches yield medial-axis surfaces with various artifacts, including zigzag boundaries, bumpy surfaces, unwanted spikes, and non-smooth stitching… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.06471v1-abstract-full').style.display = 'inline'; document.getElementById('2411.06471v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.06471v1-abstract-full" style="display: none;"> Extraction of a high-fidelity 3D medial axis is a crucial operation in CAD. When dealing with a polygonal model as input, ensuring accuracy and tidiness becomes challenging due to discretization errors inherent in the mesh surface. Commonly, existing approaches yield medial-axis surfaces with various artifacts, including zigzag boundaries, bumpy surfaces, unwanted spikes, and non-smooth stitching curves. Considering that the surface of a CAD model can be easily decomposed into a collection of surface patches, its 3D medial axis can be extracted by computing the Voronoi diagram of these surface patches, where each surface patch serves as a generator. However, no solver currently exists for accurately computing such an extended Voronoi diagram. Under the assumption that each generator defines a linear distance field over a sufficiently small range, our approach operates by tetrahedralizing the region of interest and computing the medial axis within each tetrahedral element. Just as SurfaceVoronoi computes surface-based Voronoi diagrams by cutting a 3D prism with 3D planes (each plane encodes a linear field in a triangle), the key operation in this paper is to conduct the hyperplane cutting process in 4D, where each hyperplane encodes a linear field in a tetrahedron. 
In comparison with the state-of-the-art, our algorithm produces better outcomes. Furthermore, it can also be used to compute the offset surface. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.06471v1-abstract-full').style.display = 'none'; document.getElementById('2411.06471v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.06456</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wu%2C+C">Chen Wu</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Ling Wang</a>, <a href="/search/cs?searchtype=author&query=Peng%2C+L">Long Peng</a>, <a href="/search/cs?searchtype=author&query=Lu%2C+D">Dianjie Lu</a>, <a href="/search/cs?searchtype=author&query=Zheng%2C+Z">Zhuoran Zheng</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.06456v1-abstract-short" style="display: inline;"> With the popularization of high-end mobile devices, Ultra-high-definition (UHD) images have become ubiquitous in our lives. 
The restoration of UHD images is a highly challenging problem due to the exaggerated pixel count, which often leads to memory overflow during processing. Existing methods either downsample UHD images at a high rate before processing or split them into multiple patches for sep… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.06456v1-abstract-full').style.display = 'inline'; document.getElementById('2411.06456v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.06456v1-abstract-full" style="display: none;"> With the popularization of high-end mobile devices, Ultra-high-definition (UHD) images have become ubiquitous in our lives. The restoration of UHD images is a highly challenging problem due to the exaggerated pixel count, which often leads to memory overflow during processing. Existing methods either downsample UHD images at a high rate before processing or split them into multiple patches for separate processing. However, high-rate downsampling leads to significant information loss, while patch-based approaches inevitably introduce boundary artifacts. In this paper, we propose a novel design paradigm to solve the UHD image restoration problem, called D2Net. D2Net enables direct full-resolution inference on UHD images without the need for high-rate downsampling or dividing the images into several patches. Specifically, we ingeniously utilize the characteristics of the frequency domain to establish long-range dependencies of features. Taking into account the richer local patterns in UHD images, we also design a multi-scale convolutional group to capture local features. Additionally, during the decoding stage, we dynamically incorporate features from the encoding stage to reduce the flow of irrelevant information. 
Extensive experiments on three UHD image restoration tasks, including low-light image enhancement, image dehazing, and image deblurring, show that our model achieves better quantitative and qualitative results than state-of-the-art methods. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.06456v1-abstract-full').style.display = 'none'; document.getElementById('2411.06456v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">WACV2025</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.06172</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Cryptography and Security">cs.CR</span> </div> </div> <p class="title is-5 mathjax"> IDU-Detector: A Synergistic Framework for Robust Masquerader Attack Detection </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Huang%2C+Z">Zilin Huang</a>, <a href="/search/cs?searchtype=author&query=Li%2C+X">Xiulai Li</a>, <a href="/search/cs?searchtype=author&query=Cao%2C+X">Xinyi Cao</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+K">Ke Chen</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Longjuan Wang</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+L+B">Logan Bo-Yee Liu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" 
id="2411.06172v1-abstract-short" style="display: inline;"> In the digital age, users store personal data in corporate databases, making data security central to enterprise management. Given the extensive attack surface, assets face challenges like weak authentication, vulnerabilities, and malware. Attackers may exploit vulnerabilities to gain unauthorized access, masquerading as legitimate users. Such attacks can lead to privacy breaches, business disrupt… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.06172v1-abstract-full').style.display = 'inline'; document.getElementById('2411.06172v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.06172v1-abstract-full" style="display: none;"> In the digital age, users store personal data in corporate databases, making data security central to enterprise management. Given the extensive attack surface, assets face challenges like weak authentication, vulnerabilities, and malware. Attackers may exploit vulnerabilities to gain unauthorized access, masquerading as legitimate users. Such attacks can lead to privacy breaches, business disruption, financial losses, and reputational damage. Complex attack vectors blur lines between insider and external threats. To address this, we introduce the IDU-Detector, integrating Intrusion Detection Systems (IDS) with User and Entity Behavior Analytics (UEBA). This integration monitors unauthorized access, bridges system gaps, ensures continuous monitoring, and enhances threat identification. Existing insider threat datasets lack depth and coverage of diverse attack vectors. This hinders detection technologies from addressing complex attack surfaces. We propose new, diverse datasets covering more attack scenarios, enhancing detection technologies. Testing our framework, the IDU-Detector achieved average accuracies of 98.96% and 99.12%. 
These results show effectiveness in detecting attacks, improving security and response speed, and providing higher asset safety assurance. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.06172v1-abstract-full').style.display = 'none'; document.getElementById('2411.06172v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 9 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.05895</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> One Small and One Large for Document-level Event Argument Extraction </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Peng%2C+J">Jiaren Peng</a>, <a href="/search/cs?searchtype=author&query=Sun%2C+H">Hongda Sun</a>, <a href="/search/cs?searchtype=author&query=Yang%2C+W">Wenzhong Yang</a>, <a href="/search/cs?searchtype=author&query=Wei%2C+F">Fuyuan Wei</a>, <a href="/search/cs?searchtype=author&query=He%2C+L">Liang He</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Liejun Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.05895v1-abstract-short" style="display: inline;"> Document-level Event Argument Extraction (EAE) faces two challenges due to increased input length: 1) difficulty in distinguishing semantic boundaries between events, and 2) interference from redundant information. 
To address these issues, we propose two methods. The first method introduces the Co and Structure Event Argument Extraction model (CsEAE) based on Small Language Models (SLMs). CsEAE in… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05895v1-abstract-full').style.display = 'inline'; document.getElementById('2411.05895v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.05895v1-abstract-full" style="display: none;"> Document-level Event Argument Extraction (EAE) faces two challenges due to increased input length: 1) difficulty in distinguishing semantic boundaries between events, and 2) interference from redundant information. To address these issues, we propose two methods. The first method introduces the Co and Structure Event Argument Extraction model (CsEAE) based on Small Language Models (SLMs). CsEAE includes a co-occurrences-aware module, which integrates information about all events present in the current input through context labeling and co-occurrences event prompts extraction. Additionally, CsEAE includes a structure-aware module that reduces interference from redundant information by establishing structural relationships between the sentence containing the trigger and other sentences in the document. The second method introduces new prompts to transform the extraction task into a generative task suitable for Large Language Models (LLMs), addressing gaps in EAE performance using LLMs under Supervised Fine-Tuning (SFT) conditions. We also fine-tuned multiple datasets to develop an LLM that performs better across most datasets. Finally, we applied insights from CsEAE to LLMs, achieving further performance improvements. This suggests that reliable insights validated on SLMs are also applicable to LLMs. We tested our models on the Rams, WikiEvents, and MLEE datasets. 
The CsEAE model achieved improvements of 2.1%, 2.3%, and 3.2% in the Arg-C F1 metric compared to the baseline, PAIE. For LLMs, we demonstrated that their performance on document-level datasets is comparable to that of SLMs~\footnote{All code is available at}. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05895v1-abstract-full').style.display = 'none'; document.getElementById('2411.05895v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 8 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.05540</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Software Engineering">cs.SE</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> CRepair: CVAE-based Automatic Vulnerability Repair Technology </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Liu%2C+P">Penghui Liu</a>, <a href="/search/cs?searchtype=author&query=Bi%2C+Y">Yingzhou Bi</a>, <a href="/search/cs?searchtype=author&query=Huang%2C+J">Jiangtao Huang</a>, <a href="/search/cs?searchtype=author&query=Jiang%2C+X">Xinxin Jiang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lianmei Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.05540v1-abstract-short" style="display: inline;"> Software vulnerabilities are flaws in computer software 
systems that pose significant threats to the integrity, security, and reliability of modern software and its application data. These vulnerabilities can lead to substantial economic losses across various industries. Manual vulnerability repair is not only time-consuming but also prone to errors. To address the challenges of vulnerability repa… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05540v1-abstract-full').style.display = 'inline'; document.getElementById('2411.05540v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.05540v1-abstract-full" style="display: none;"> Software vulnerabilities are flaws in computer software systems that pose significant threats to the integrity, security, and reliability of modern software and its application data. These vulnerabilities can lead to substantial economic losses across various industries. Manual vulnerability repair is not only time-consuming but also prone to errors. To address the challenges of vulnerability repair, researchers have proposed various solutions, with learning-based automatic vulnerability repair techniques gaining widespread attention. However, existing methods often focus on learning more vulnerability data to improve repair outcomes, while neglecting the diverse characteristics of vulnerable code, and suffer from imprecise vulnerability localization. To address these shortcomings, this paper proposes CRepair, a CVAE-based automatic vulnerability repair technology aimed at fixing security vulnerabilities in system code. We first preprocess the vulnerability data using a prompt-based method to serve as input to the model. Then, we apply causal inference techniques to map the vulnerability feature data to probability distributions. By employing multi-sample feature fusion, we capture diverse vulnerability feature information. 
Finally, conditional control is used to guide the model in repairing the vulnerabilities. Experimental results demonstrate that the proposed method significantly outperforms other benchmark models, achieving a perfect repair rate of 52%. The effectiveness of the approach is validated from multiple perspectives, advancing AI-driven code vulnerability repair and showing promising applications. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05540v1-abstract-full').style.display = 'none'; document.getElementById('2411.05540v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 8 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.05435</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Human-Computer Interaction">cs.HC</span> </div> </div> <p class="title is-5 mathjax"> StoryExplorer: A Visualization Framework for Storyline Generation of Textual Narratives </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Ye%2C+L">Li Ye</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lei Wang</a>, <a href="/search/cs?searchtype=author&query=Ruan%2C+S">Shaolun Ruan</a>, <a href="/search/cs?searchtype=author&query=Meng%2C+Y">Yuwei Meng</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+Y">Yigang Wang</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+W">Wei Chen</a>, <a href="/search/cs?searchtype=author&query=Zhou%2C+Z">Zhiguang Zhou</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: 
<span class="abstract-short has-text-grey-dark mathjax" id="2411.05435v1-abstract-short" style="display: inline;"> In the context of the exponentially increasing volume of narrative texts such as novels and news, readers struggle to extract and consistently remember the storyline from these intricate texts due to the constraints of human working memory and attention span. To tackle this issue, we propose a visualization approach, StoryExplorer, which facilitates the process of knowledge externalization of narrative… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05435v1-abstract-full').style.display = 'inline'; document.getElementById('2411.05435v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.05435v1-abstract-full" style="display: none;"> In the context of the exponentially increasing volume of narrative texts such as novels and news, readers struggle to extract and consistently remember the storyline from these intricate texts due to the constraints of human working memory and attention span. To tackle this issue, we propose a visualization approach, StoryExplorer, which facilitates the process of knowledge externalization of narrative texts and further makes the form of mental models more coherent. Through a formative study and close collaboration with 2 domain experts, we identified key challenges for the extraction of the storyline. Guided by the distilled requirements, we then propose a workflow (i.e., insight finding-scripting-storytelling) to enable users to interactively generate fragments of narrative structures. We then propose a visualization system, StoryExplorer, which combines stroke annotation and GPT-based visual hints to quickly extract story fragments and interactively construct the storyline. To evaluate the effectiveness and usefulness of StoryExplorer, we conducted 2 case studies and in-depth user interviews with 16 target users. 
The result shows that users can better extract the storyline by using StoryExplorer along with the proposed workflow. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05435v1-abstract-full').style.display = 'none'; document.getElementById('2411.05435v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 8 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.05185</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Cryptography and Security">cs.CR</span> </div> </div> <p class="title is-5 mathjax"> PentestAgent: Incorporating LLM Agents to Automated Penetration Testing </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Shen%2C+X">Xiangmin Shen</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lingzhi Wang</a>, <a href="/search/cs?searchtype=author&query=Li%2C+Z">Zhenyuan Li</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+Y">Yan Chen</a>, <a href="/search/cs?searchtype=author&query=Zhao%2C+W">Wencheng Zhao</a>, <a href="/search/cs?searchtype=author&query=Sun%2C+D">Dawei Sun</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+J">Jiashui Wang</a>, <a href="/search/cs?searchtype=author&query=Ruan%2C+W">Wei Ruan</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.05185v1-abstract-short" style="display: inline;"> Penetration testing is a critical technique for identifying security vulnerabilities, traditionally 
performed manually by skilled security specialists. This complex process involves gathering information about the target system, identifying entry points, exploiting the system, and reporting findings. Despite its effectiveness, manual penetration testing is time-consuming and expensive, often requi… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05185v1-abstract-full').style.display = 'inline'; document.getElementById('2411.05185v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.05185v1-abstract-full" style="display: none;"> Penetration testing is a critical technique for identifying security vulnerabilities, traditionally performed manually by skilled security specialists. This complex process involves gathering information about the target system, identifying entry points, exploiting the system, and reporting findings. Despite its effectiveness, manual penetration testing is time-consuming and expensive, often requiring significant expertise and resources that many organizations cannot afford. While automated penetration testing methods have been proposed, they often fall short in real-world applications due to limitations in flexibility, adaptability, and implementation. Recent advancements in large language models (LLMs) offer new opportunities for enhancing penetration testing through increased intelligence and automation. However, current LLM-based approaches still face significant challenges, including limited penetration testing knowledge and a lack of comprehensive automation capabilities. To address these gaps, we propose PentestAgent, a novel LLM-based automated penetration testing framework that leverages the power of LLMs and various LLM-based techniques like Retrieval Augmented Generation (RAG) to enhance penetration testing knowledge and automate various tasks. 
Our framework leverages multi-agent collaboration to automate intelligence gathering, vulnerability analysis, and exploitation stages, reducing manual intervention. We evaluate PentestAgent using a comprehensive benchmark, demonstrating superior performance in task completion and overall efficiency. This work significantly advances the practical applicability of automated penetration testing systems. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05185v1-abstract-full').style.display = 'none'; document.getElementById('2411.05185v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 7 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">14 pages, 13 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.04509</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> FedDP: Privacy-preserving method based on federated learning for histopathology image segmentation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Pan%2C+L">Liangrui Pan</a>, <a href="/search/cs?searchtype=author&query=Huang%2C+M">Mao Huang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lian Wang</a>, <a href="/search/cs?searchtype=author&query=Qin%2C+P">Pinle 
Qin</a>, <a href="/search/cs?searchtype=author&query=Peng%2C+S">Shaoliang Peng</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.04509v1-abstract-short" style="display: inline;"> Hematoxylin and Eosin (H&E) staining of whole slide images (WSIs) is considered the gold standard for pathologists and medical practitioners for tumor diagnosis, surgical planning, and post-operative assessment. With the rapid advancement of deep learning technologies, the development of numerous models based on convolutional neural networks and transformer-based models has been applied to the pre… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.04509v1-abstract-full').style.display = 'inline'; document.getElementById('2411.04509v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.04509v1-abstract-full" style="display: none;"> Hematoxylin and Eosin (H&E) staining of whole slide images (WSIs) is considered the gold standard for pathologists and medical practitioners for tumor diagnosis, surgical planning, and post-operative assessment. With the rapid advancement of deep learning technologies, the development of numerous models based on convolutional neural networks and transformer-based models has been applied to the precise segmentation of WSIs. However, due to privacy regulations and the need to protect patient confidentiality, centralized storage and processing of image data are impractical. Training a centralized model directly is challenging to implement in medical settings due to these privacy concerns. This paper addresses the dispersed nature and privacy sensitivity of medical image data by employing a federated learning framework, allowing medical institutions to collaboratively learn while protecting patient privacy. 
Additionally, to address the issue of original data reconstruction through gradient inversion during the federated learning training process, differential privacy introduces noise into the model updates, preventing attackers from inferring the contributions of individual samples, thereby protecting the privacy of the training data. Experimental results show that the proposed method, FedDP, minimally impacts model accuracy while effectively safeguarding the privacy of cancer pathology image data, with only a slight decrease in Dice, Jaccard, and Acc indices by 0.55%, 0.63%, and 0.42%, respectively. This approach facilitates cross-institutional collaboration and knowledge sharing while protecting sensitive data privacy, providing a viable solution for further research and application in the medical field. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.04509v1-abstract-full').style.display = 'none'; document.getElementById('2411.04509v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 7 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted in BIBM2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.04406</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Image Understanding Makes for A Good Tokenizer for Image Generation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+L">Luting Wang</a>, <a href="/search/cs?searchtype=author&query=Zhao%2C+Y">Yang Zhao</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+Z">Zijian Zhang</a>, <a href="/search/cs?searchtype=author&query=Feng%2C+J">Jiashi Feng</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+S">Si Liu</a>, <a href="/search/cs?searchtype=author&query=Kang%2C+B">Bingyi Kang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.04406v1-abstract-short" style="display: inline;"> Modern image generation (IG) models have been shown to capture rich semantics valuable for image understanding (IU) tasks. However, the potential of IU models to improve IG performance remains uncharted. We address this issue using a token-based IG framework, which relies on effective tokenizers to project images into token sequences. 
Currently, pixel reconstruction (e.g., VQGAN) dominate… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.04406v1-abstract-full').style.display = 'inline'; document.getElementById('2411.04406v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.04406v1-abstract-full" style="display: none;"> Modern image generation (IG) models have been shown to capture rich semantics valuable for image understanding (IU) tasks. However, the potential of IU models to improve IG performance remains uncharted. We address this issue using a token-based IG framework, which relies on effective tokenizers to project images into token sequences. Currently, pixel reconstruction (e.g., VQGAN) dominates the training objective for image tokenizers. In contrast, our approach adopts the feature reconstruction objective, where tokenizers are trained by distilling knowledge from pretrained IU encoders. Comprehensive comparisons indicate that tokenizers with strong IU capabilities achieve superior IG performance across a variety of metrics, datasets, tasks, and proposal networks. Notably, VQ-KD CLIP achieves $4.10$ FID on ImageNet-1k (IN-1k). Visualization suggests that the superiority of VQ-KD can be partly attributed to the rich semantics within the VQ-KD codebook. We further introduce a straightforward pipeline to directly transform IU encoders into tokenizers, demonstrating exceptional effectiveness for IG tasks. These discoveries may energize further exploration into image tokenizer research and inspire the community to reassess the relationship between IU and IG. 
The code is released at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.04406v1-abstract-full').style.display = 'none'; document.getElementById('2411.04406v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 6 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted by NeurIPS 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.04271</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Distributed, Parallel, and Cluster Computing">cs.DC</span> </div> </div> <p class="title is-5 mathjax"> OpenFLAME: Building a large scale federated localization and mapping service </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Bharadwaj%2C+S">Sagar Bharadwaj</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Luke Wang</a>, <a href="/search/cs?searchtype=author&query=Liang%2C+M">Michael Liang</a>, <a href="/search/cs?searchtype=author&query=Williams%2C+H">Harrison Williams</a>, <a href="/search/cs?searchtype=author&query=Liang%2C+I">Ivan Liang</a>, <a href="/search/cs?searchtype=author&query=Seshan%2C+S">Srinivasan Seshan</a>, <a href="/search/cs?searchtype=author&query=Rowe%2C+A">Anthony Rowe</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.04271v1-abstract-short" style="display: inline;"> The widespread 
availability of maps has enabled the development of numerous location-based applications, including navigation, ride-sharing, fitness tracking, gaming, robotics, and augmented reality. Today, the maps that power these services are predominantly controlled by a few large corporations and mostly cover outdoor spaces. As the use of these applications expands and indoor localization tec… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.04271v1-abstract-full').style.display = 'inline'; document.getElementById('2411.04271v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.04271v1-abstract-full" style="display: none;"> The widespread availability of maps has enabled the development of numerous location-based applications, including navigation, ride-sharing, fitness tracking, gaming, robotics, and augmented reality. Today, the maps that power these services are predominantly controlled by a few large corporations and mostly cover outdoor spaces. As the use of these applications expands and indoor localization technologies advance, we are seeing the need for a scalable, federated location management system that can extend into private spaces. We introduce OpenFLAME (Open Federated Localization and Mapping Engine), the first federated and decentralized localization service. OpenFLAME links servers that handle localization for specific regions, providing applications with a seamless global view. Creating a federated localization system poses challenges, such as discovering the appropriate servers for a region and integrating services managed by independent providers. To address these issues and ensure scalability, we leverage Domain Name System (DNS) for service discovery and implement map abstractions to retrieve and merge locations across different maps. 
Our trace-driven study demonstrates that federated localization across remote servers is feasible with acceptable query latencies. To highlight the potential of the system, we developed an augmented reality navigation application for a large indoor space, showing that OpenFLAME can successfully power location-based applications. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.04271v1-abstract-full').style.display = 'none'; document.getElementById('2411.04271v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 6 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.04106</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Systems and Control">eess.SY</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> A Comparative Study of Deep Reinforcement Learning for Crop Production Management </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Balderas%2C+J">Joseph Balderas</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+D">Dong Chen</a>, <a href="/search/cs?searchtype=author&query=Huang%2C+Y">Yanbo Huang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Li Wang</a>, <a href="/search/cs?searchtype=author&query=Li%2C+R">Ren-Cang Li</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.04106v1-abstract-short" style="display: 
inline;"> Crop production management is essential for optimizing yield and minimizing the environmental impact of crop fields, yet it remains challenging due to the complex and stochastic processes involved. Recently, researchers have turned to machine learning to address these complexities. Specifically, reinforcement learning (RL), a cutting-edge approach designed to learn optimal decision-making st… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.04106v1-abstract-full').style.display = 'inline'; document.getElementById('2411.04106v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.04106v1-abstract-full" style="display: none;"> Crop production management is essential for optimizing yield and minimizing the environmental impact of crop fields, yet it remains challenging due to the complex and stochastic processes involved. Recently, researchers have turned to machine learning to address these complexities. Specifically, reinforcement learning (RL), a cutting-edge approach designed to learn optimal decision-making strategies through trial and error in dynamic environments, has emerged as a promising tool for developing adaptive crop management policies. RL models aim to optimize long-term rewards by continuously interacting with the environment, making them well-suited for tackling the uncertainties and variability inherent in crop management. Studies have shown that RL can generate crop management policies that compete with, and even outperform, expert-designed policies within simulation-based crop models. In the gym-DSSAT crop model environment, one of the most widely used simulators for crop management, proximal policy optimization (PPO) and deep Q-networks (DQN) have shown promising results. However, these methods have not yet been systematically evaluated under identical conditions. 
In this study, we evaluated PPO and DQN against static baseline policies across three different RL tasks, fertilization, irrigation, and mixed management, provided by the gym-DSSAT environment. To ensure a fair comparison, we used consistent default parameters, identical reward functions, and the same environment settings. Our results indicate that PPO outperforms DQN in fertilization and irrigation tasks, while DQN excels in the mixed management task. This comparative analysis provides critical insights into the strengths and limitations of each approach, advancing the development of more effective RL-based crop management strategies. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.04106v1-abstract-full').style.display = 'none'; document.getElementById('2411.04106v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 6 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">10 pages</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.03857</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Hardware Architecture">cs.AR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Efficient Message Passing Architecture for GCN Training on HBM-based FPGAs with Orthogonal Topology On-Chip Networks </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wu%2C+Q">Qizhe Wu</a>, <a href="/search/cs?searchtype=author&query=Zhao%2C+L">Letian Zhao</a>, <a href="/search/cs?searchtype=author&query=Gui%2C+Y">Yuchen Gui</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+H+L+X">Huawen Liang Xiaotian Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.03857v1-abstract-short" style="display: inline;"> Graph Convolutional Networks (GCNs) are state-of-the-art deep learning models for representation learning on graphs. However, the efficient training of GCNs is hampered by constraints in memory capacity and bandwidth, compounded by the irregular data flow that results in communication bottlenecks. 
To address these challenges, we propose a message-passing architecture that leverages NUMA-based memo… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.03857v1-abstract-full').style.display = 'inline'; document.getElementById('2411.03857v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.03857v1-abstract-full" style="display: none;"> Graph Convolutional Networks (GCNs) are state-of-the-art deep learning models for representation learning on graphs. However, the efficient training of GCNs is hampered by constraints in memory capacity and bandwidth, compounded by the irregular data flow that results in communication bottlenecks. To address these challenges, we propose a message-passing architecture that leverages NUMA-based memory access properties and employs a parallel multicast routing algorithm based on a 4-D hypercube network within the accelerator for efficient message passing in graphs. Additionally, we have re-engineered the backpropagation algorithm specific to GCNs within our proposed accelerator. This redesign strategically mitigates the memory demands prevalent during the training phase and diminishes the computational overhead associated with the transposition of extensive matrices. Compared to the state-of-the-art HP-GNN architecture we achieved a performance improvement of $1.03\times \sim 1.81\times$. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.03857v1-abstract-full').style.display = 'none'; document.getElementById('2411.03857v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 6 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">This paper has been accepted for 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA'24) as a poster</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.03349</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> RuAG: Learned-rule-augmented Generation for Large Language Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Zhang%2C+Y">Yudi Zhang</a>, <a href="/search/cs?searchtype=author&query=Xiao%2C+P">Pei Xiao</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lu Wang</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+C">Chaoyun Zhang</a>, <a href="/search/cs?searchtype=author&query=Fang%2C+M">Meng Fang</a>, <a href="/search/cs?searchtype=author&query=Du%2C+Y">Yali Du</a>, <a href="/search/cs?searchtype=author&query=Puzyrev%2C+Y">Yevgeniy Puzyrev</a>, <a href="/search/cs?searchtype=author&query=Yao%2C+R">Randolph Yao</a>, <a href="/search/cs?searchtype=author&query=Qin%2C+S">Si Qin</a>, <a href="/search/cs?searchtype=author&query=Lin%2C+Q">Qingwei Lin</a>, <a href="/search/cs?searchtype=author&query=Pechenizkiy%2C+M">Mykola Pechenizkiy</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+D">Dongmei Zhang</a>, <a href="/search/cs?searchtype=author&query=Rajmohan%2C+S">Saravan Rajmohan</a>, <a 
href="/search/cs?searchtype=author&query=Zhang%2C+Q">Qi Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.03349v1-abstract-short" style="display: inline;"> In-context learning (ICL) and Retrieval-Augmented Generation (RAG) have gained attention for their ability to enhance LLMs' reasoning by incorporating external knowledge but suffer from limited contextual window size, leading to insufficient information injection. To this end, we propose a novel framework, RuAG, to automatically distill large volumes of offline data into interpretable first-order… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.03349v1-abstract-full').style.display = 'inline'; document.getElementById('2411.03349v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.03349v1-abstract-full" style="display: none;"> In-context learning (ICL) and Retrieval-Augmented Generation (RAG) have gained attention for their ability to enhance LLMs' reasoning by incorporating external knowledge but suffer from limited contextual window size, leading to insufficient information injection. To this end, we propose a novel framework, RuAG, to automatically distill large volumes of offline data into interpretable first-order logic rules, which are injected into LLMs to boost their reasoning capabilities. Our method begins by formulating the search process relying on LLMs' commonsense, where LLMs automatically define head and body predicates. Then, RuAG applies Monte Carlo Tree Search (MCTS) to address the combinational searching space and efficiently discover logic rules from data. The resulting logic rules are translated into natural language, allowing targeted knowledge injection and seamless integration into LLM prompts for LLM's downstream task reasoning. 
We evaluate our framework on public and private industrial tasks, including natural language processing, time-series, decision-making, and industrial tasks, demonstrating its effectiveness in enhancing LLM's capability over diverse tasks. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.03349v1-abstract-full').style.display = 'none'; document.getElementById('2411.03349v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 3 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.03137</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Human-Computer Interaction">cs.HC</span> </div> </div> <p class="title is-5 mathjax"> From Pen to Prompt: How Creative Writers Integrate AI into their Writing Practice </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Guo%2C+A">Alicia Guo</a>, <a href="/search/cs?searchtype=author&query=Sathyanarayanan%2C+S">Shreya Sathyanarayanan</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Leijie Wang</a>, <a href="/search/cs?searchtype=author&query=Heer%2C+J">Jeffrey Heer</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+A">Amy Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.03137v1-abstract-short" style="display: inline;"> Creative writers have a love for their craft, yet AI systems using large language models (LLMs) offer the automation of significant parts of the writing 
process. So why do some creative writers choose to integrate AI into their workflows? To explore this, we interview and observe a writing session with 18 creative writers who already use AI regularly in their writing practice. Our findings reveal… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.03137v1-abstract-full').style.display = 'inline'; document.getElementById('2411.03137v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.03137v1-abstract-full" style="display: none;"> Creative writers have a love for their craft, yet AI systems using large language models (LLMs) offer the automation of significant parts of the writing process. So why do some creative writers choose to integrate AI into their workflows? To explore this, we interview and observe a writing session with 18 creative writers who already use AI regularly in their writing practice. Our findings reveal that creative writers are intentional about how they incorporate AI, making many deliberate decisions about when and how to engage AI based on the core values they hold about writing. These values, such as authenticity and craftsmanship, alongside writers' relationships with and use of AI influence the parts of writing over which they wish to maintain control. Through our analysis, we contribute a taxonomy of writer values, writer relationships with AI, and integration strategies, and discuss how these three elements interrelate. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.03137v1-abstract-full').style.display = 'none'; document.getElementById('2411.03137v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.02843</a> <span> [<a href="">pdf</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Advances in Photoacoustic Imaging Reconstruction and Quantitative Analysis for Biomedical Applications </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lei Wang</a>, <a href="/search/cs?searchtype=author&query=Zeng%2C+W">Weiming Zeng</a>, <a href="/search/cs?searchtype=author&query=Long%2C+K">Kai Long</a>, <a href="/search/cs?searchtype=author&query=Lan%2C+R">Rongfeng Lan</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+L">Li Liu</a>, <a href="/search/cs?searchtype=author&query=Siok%2C+W+T">Wai Ting Siok</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+N">Nizhuan Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.02843v1-abstract-short" style="display: inline;"> Photoacoustic imaging (PAI) represents an innovative biomedical imaging modality that harnesses the advantages of optical resolution and acoustic penetration depth while ensuring enhanced safety. 
Despite its promising potential across a diverse array of preclinical and clinical applications, the clinical implementation of PAI faces significant challenges, including the trade-off between penetratio… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02843v1-abstract-full').style.display = 'inline'; document.getElementById('2411.02843v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.02843v1-abstract-full" style="display: none;"> Photoacoustic imaging (PAI) represents an innovative biomedical imaging modality that harnesses the advantages of optical resolution and acoustic penetration depth while ensuring enhanced safety. Despite its promising potential across a diverse array of preclinical and clinical applications, the clinical implementation of PAI faces significant challenges, including the trade-off between penetration depth and spatial resolution, as well as the demand for faster imaging speeds. This paper explores the fundamental principles underlying PAI, with a particular emphasis on three primary implementations: photoacoustic computed tomography (PACT), photoacoustic microscopy (PAM), and photoacoustic endoscopy (PAE). We undertake a critical assessment of their respective strengths and practical limitations. Furthermore, recent developments in utilizing conventional or deep learning (DL) methodologies for image reconstruction and artefact mitigation across PACT, PAM, and PAE are outlined, demonstrating considerable potential to enhance image quality and accelerate imaging processes. Furthermore, this paper examines the recent developments in quantitative analysis within PAI, including the quantification of haemoglobin concentration, oxygen saturation, and other physiological parameters within tissues. 
Finally, our discussion encompasses current trends and future directions in PAI research while emphasizing the transformative impact of deep learning on advancing PAI. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02843v1-abstract-full').style.display = 'none'; document.getElementById('2411.02843v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.02818</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> LiVOS: Light Video Object Segmentation with Gated Linear Matching </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Liu%2C+Q">Qin Liu</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+J">Jianfeng Wang</a>, <a href="/search/cs?searchtype=author&query=Yang%2C+Z">Zhengyuan Yang</a>, <a href="/search/cs?searchtype=author&query=Li%2C+L">Linjie Li</a>, <a href="/search/cs?searchtype=author&query=Lin%2C+K">Kevin Lin</a>, <a href="/search/cs?searchtype=author&query=Niethammer%2C+M">Marc Niethammer</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lijuan Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.02818v1-abstract-short" style="display: inline;"> Semi-supervised video object segmentation (VOS) has been largely driven by space-time memory (STM) 
networks, which store past frame features in a spatiotemporal memory to segment the current frame via softmax attention. However, STM networks face memory limitations due to the quadratic complexity of softmax matching, restricting their applicability as video length and resolution increase. To addre… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02818v1-abstract-full').style.display = 'inline'; document.getElementById('2411.02818v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.02818v1-abstract-full" style="display: none;"> Semi-supervised video object segmentation (VOS) has been largely driven by space-time memory (STM) networks, which store past frame features in a spatiotemporal memory to segment the current frame via softmax attention. However, STM networks face memory limitations due to the quadratic complexity of softmax matching, restricting their applicability as video length and resolution increase. To address this, we propose LiVOS, a lightweight memory network that employs linear matching via linear attention, reformulating memory matching into a recurrent process that reduces the quadratic attention matrix to a constant-size, spatiotemporal-agnostic 2D state. To enhance selectivity, we introduce gated linear matching, where a data-dependent gate matrix is multiplied with the state matrix to control what information to retain or discard. Experiments on diverse benchmarks demonstrated the effectiveness of our method. It achieved 64.8 J&F on MOSE and 85.1 J&F on DAVIS, surpassing all non-STM methods and narrowing the gap with STM-based approaches. For longer and higher-resolution videos, it matched STM-based methods with 53% less GPU memory and supports 4096p inference on a 32G consumer-grade GPU--a previously cost-prohibitive capability--opening the door for long and high-resolution video foundation models. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02818v1-abstract-full').style.display = 'none'; document.getElementById('2411.02818v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Code&models:</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.02430</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Generative Emotion Cause Explanation in Multimodal Conversations </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lin Wang</a>, <a href="/search/cs?searchtype=author&query=Yang%2C+X">Xiaocui Yang</a>, <a href="/search/cs?searchtype=author&query=Feng%2C+S">Shi Feng</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+D">Daling Wang</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+Y">Yifei Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark 
mathjax" id="2411.02430v1-abstract-short" style="display: inline;"> Multimodal conversation, a crucial form of human communication, carries rich emotional content, making the exploration of the causes of emotions within it a research endeavor of significant importance. However, existing research on the causes of emotions typically uses clause selection methods to locate the reason utterance, without providing a detailed explanation of the emotional causes. In this… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02430v1-abstract-full').style.display = 'inline'; document.getElementById('2411.02430v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.02430v1-abstract-full" style="display: none;"> Multimodal conversation, a crucial form of human communication, carries rich emotional content, making the exploration of the causes of emotions within it a research endeavor of significant importance. However, existing research on the causes of emotions typically uses clause selection methods to locate the reason utterance, without providing a detailed explanation of the emotional causes. In this paper, we propose a new task, Multimodal Conversation Emotion Cause Explanation (MCECE), aiming to generate a detailed explanation of the emotional cause to the target utterance within a multimodal conversation scenario. Building upon the MELD dataset, we develop a new dataset (ECEM) that integrates video clips with detailed explanations of character emotions, facilitating an in-depth examination of the causal factors behind emotional expressions in multimodal conversations. A novel approach, FAME-Net, is further proposed that harnesses the power of Large Language Models (LLMs) to analyze visual data and accurately interpret the emotions conveyed through facial expressions in videos. 
By exploiting the contagion effect of facial emotions, FAME-Net effectively captures the emotional causes of individuals engaged in conversations. Our experimental results on the newly constructed dataset show that FAME-Net significantly outperforms several excellent large language model baselines. Code and dataset are available at \url{} <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02430v1-abstract-full').style.display = 'none'; document.getElementById('2411.02430v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 1 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.02319</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> GenXD: Generating Any 3D and 4D Scenes </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Zhao%2C+Y">Yuyang Zhao</a>, <a href="/search/cs?searchtype=author&query=Lin%2C+C">Chung-Ching Lin</a>, <a href="/search/cs?searchtype=author&query=Lin%2C+K">Kevin Lin</a>, <a href="/search/cs?searchtype=author&query=Yan%2C+Z">Zhiwen Yan</a>, <a href="/search/cs?searchtype=author&query=Li%2C+L">Linjie Li</a>, <a href="/search/cs?searchtype=author&query=Yang%2C+Z">Zhengyuan Yang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+J">Jianfeng Wang</a>, <a href="/search/cs?searchtype=author&query=Lee%2C+G+H">Gim Hee Lee</a>, <a 
href="/search/cs?searchtype=author&query=Wang%2C+L">Lijuan Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.02319v2-abstract-short" style="display: inline;"> Recent developments in 2D visual generation have been remarkably successful. However, 3D and 4D generation remain challenging in real-world applications due to the lack of large-scale 4D data and effective model design. In this paper, we propose to jointly investigate general 3D and 4D generation by leveraging camera and object movements commonly observed in daily life. Due to the lack of real-wor… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02319v2-abstract-full').style.display = 'inline'; document.getElementById('2411.02319v2-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.02319v2-abstract-full" style="display: none;"> Recent developments in 2D visual generation have been remarkably successful. However, 3D and 4D generation remain challenging in real-world applications due to the lack of large-scale 4D data and effective model design. In this paper, we propose to jointly investigate general 3D and 4D generation by leveraging camera and object movements commonly observed in daily life. Due to the lack of real-world 4D data in the community, we first propose a data curation pipeline to obtain camera poses and object motion strength from videos. Based on this pipeline, we introduce a large-scale real-world 4D scene dataset: CamVid-30K. By leveraging all the 3D and 4D data, we develop our framework, GenXD, which allows us to produce any 3D or 4D scene. We propose multiview-temporal modules, which disentangle camera and object movements, to seamlessly learn from both 3D and 4D data. 
Additionally, GenXD employs masked latent conditions to support a variety of conditioning views. GenXD can generate videos that follow the camera trajectory as well as consistent 3D views that can be lifted into 3D representations. We perform extensive evaluations across various real-world and synthetic datasets, demonstrating GenXD's effectiveness and versatility compared to previous methods in 3D and 4D generation. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02319v2-abstract-full').style.display = 'none'; document.getElementById('2411.02319v2-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 4 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.02310</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> MdEval: Massively Multilingual Code Debugging </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Liu%2C+S">Shukai Liu</a>, <a href="/search/cs?searchtype=author&query=Chai%2C+L">Linzheng Chai</a>, <a href="/search/cs?searchtype=author&query=Yang%2C+J">Jian Yang</a>, <a href="/search/cs?searchtype=author&query=Shi%2C+J">Jiajun Shi</a>, <a href="/search/cs?searchtype=author&query=Zhu%2C+H">He Zhu</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Liran Wang</a>, <a href="/search/cs?searchtype=author&query=Jin%2C+K">Ke Jin</a>, <a 
href="/search/cs?searchtype=author&query=Zhang%2C+W">Wei Zhang</a>, <a href="/search/cs?searchtype=author&query=Zhu%2C+H">Hualei Zhu</a>, <a href="/search/cs?searchtype=author&query=Guo%2C+S">Shuyue Guo</a>, <a href="/search/cs?searchtype=author&query=Sun%2C+T">Tao Sun</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+J">Jiaheng Liu</a>, <a href="/search/cs?searchtype=author&query=Duan%2C+Y">Yunlong Duan</a>, <a href="/search/cs?searchtype=author&query=Hao%2C+Y">Yu Hao</a>, <a href="/search/cs?searchtype=author&query=Yang%2C+L">Liqun Yang</a>, <a href="/search/cs?searchtype=author&query=Niu%2C+G">Guanglin Niu</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+G">Ge Zhang</a>, <a href="/search/cs?searchtype=author&query=Li%2C+Z">Zhoujun Li</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.02310v1-abstract-short" style="display: inline;"> Code large language models (LLMs) have made significant progress in code debugging by directly generating the correct code based on the buggy code snippet. Programming benchmarks, typically consisting of buggy code snippet and their associated test cases, are used to assess the debugging capabilities of LLMs. However, many existing benchmarks primarily focus on Python and are often limited in term… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02310v1-abstract-full').style.display = 'inline'; document.getElementById('2411.02310v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.02310v1-abstract-full" style="display: none;"> Code large language models (LLMs) have made significant progress in code debugging by directly generating the correct code based on the buggy code snippet. 
Programming benchmarks, typically consisting of buggy code snippet and their associated test cases, are used to assess the debugging capabilities of LLMs. However, many existing benchmarks primarily focus on Python and are often limited in terms of language diversity (e.g., DebugBench and DebugEval). To advance the field of multilingual debugging with LLMs, we propose the first massively multilingual debugging benchmark, which includes 3.6K test samples of 18 programming languages and covers the automated program repair (APR) task, the code review (CR) task, and the bug identification (BI) task. Further, we introduce the debugging instruction corpora MDEVAL-INSTRUCT by injecting bugs into the correct multilingual queries and solutions (xDebugGen). Further, a multilingual debugger xDebugCoder trained on MDEVAL-INSTRUCT as a strong baseline specifically to handle the bugs of a wide range of programming languages (e.g. "Missing Mut" in language Rust and "Misused Macro Definition" in language C). Our extensive experiments on MDEVAL reveal a notable performance gap between open-source models and closed-source LLMs (e.g., GPT and Claude series), highlighting huge room for improvement in multilingual code debugging scenarios. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02310v1-abstract-full').style.display = 'none'; document.getElementById('2411.02310v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">15 pages</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.02293</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Yang%2C+X">Xianghui Yang</a>, <a href="/search/cs?searchtype=author&query=Shi%2C+H">Huiwen Shi</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+B">Bowen Zhang</a>, <a href="/search/cs?searchtype=author&query=Yang%2C+F">Fan Yang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+J">Jiacheng Wang</a>, <a href="/search/cs?searchtype=author&query=Zhao%2C+H">Hongxu Zhao</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+X">Xinhai Liu</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+X">Xinzhou Wang</a>, <a href="/search/cs?searchtype=author&query=Lin%2C+Q">Qingxiang Lin</a>, <a href="/search/cs?searchtype=author&query=Yu%2C+J">Jiaao Yu</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lifu Wang</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+Z">Zhuo Chen</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+S">Sicong Liu</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+Y">Yuhong Liu</a>, <a href="/search/cs?searchtype=author&query=Yang%2C+Y">Yong Yang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+D">Di Wang</a>, <a 
href="/search/cs?searchtype=author&query=Jiang%2C+J">Jie Jiang</a>, <a href="/search/cs?searchtype=author&query=Guo%2C+C">Chunchao Guo</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.02293v2-abstract-short" style="display: inline;"> While 3D generative models have greatly improved artists' workflows, the existing diffusion models for 3D generation suffer from slow generation and poor generalization. To address this issue, we propose a two-stage approach named Hunyuan3D-1.0 including a lite version and a standard version, that both support text- and image-conditioned generation. In the first stage, we employ a multi-view diffu… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02293v2-abstract-full').style.display = 'inline'; document.getElementById('2411.02293v2-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.02293v2-abstract-full" style="display: none;"> While 3D generative models have greatly improved artists' workflows, the existing diffusion models for 3D generation suffer from slow generation and poor generalization. To address this issue, we propose a two-stage approach named Hunyuan3D-1.0 including a lite version and a standard version, that both support text- and image-conditioned generation. In the first stage, we employ a multi-view diffusion model that efficiently generates multi-view RGB in approximately 4 seconds. These multi-view images capture rich details of the 3D asset from different viewpoints, relaxing the tasks from single-view to multi-view reconstruction. In the second stage, we introduce a feed-forward reconstruction model that rapidly and faithfully reconstructs the 3D asset given the generated multi-view images in approximately 7 seconds. 
The reconstruction network learns to handle the noise and inconsistency introduced by the multi-view diffusion and leverages the available information from the condition image to efficiently recover the 3D structure. Our framework involves the text-to-image model, i.e., Hunyuan-DiT, making it a unified framework to support both text- and image-conditioned 3D generation. Our standard version has 3x more parameters than our lite version and other existing models. Our Hunyuan3D-1.0 achieves an impressive balance between speed and quality, significantly reducing generation time while maintaining the quality and diversity of the produced assets. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02293v2-abstract-full').style.display = 'none'; document.getElementById('2411.02293v2-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 4 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Technical Report; 3D Generation</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.01988</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> QCS:Feature Refining from Quadruplet Cross Similarity for Facial Expression Recognition </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+C">Chengpeng Wang</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+L">Li Chen</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lili Wang</a>, <a href="/search/cs?searchtype=author&query=Li%2C+Z">Zhaofan Li</a>, <a href="/search/cs?searchtype=author&query=Lv%2C+X">Xuebin Lv</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.01988v1-abstract-short" style="display: inline;"> On facial expression datasets with complex and numerous feature types, where the significance and dominance of labeled features are difficult to predict, facial expression recognition(FER) encounters the challenges of inter-class similarity and intra-class variances, making it difficult to mine effective features. 
We aim to solely leverage the feature similarity among facial samples to address thi… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.01988v1-abstract-full').style.display = 'inline'; document.getElementById('2411.01988v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.01988v1-abstract-full" style="display: none;"> On facial expression datasets with complex and numerous feature types, where the significance and dominance of labeled features are difficult to predict, facial expression recognition(FER) encounters the challenges of inter-class similarity and intra-class variances, making it difficult to mine effective features. We aim to solely leverage the feature similarity among facial samples to address this. We introduce the Cross Similarity Attention (CSA), an input-output position-sensitive attention mechanism that harnesses feature similarity across different images to compute the corresponding global spatial attention. Based on this, we propose a four-branch circular framework, called Quadruplet Cross Similarity (QCS), to extract discriminative features from the same class and eliminate redundant ones from different classes synchronously to refine cleaner features. The symmetry of the network ensures balanced and stable training and reduces the amount of CSA interaction matrix. Contrastive residual distillation is utilized to transfer the information learned in the cross module back to the base network. The cross-attention module exists during training, and only one base branch is retained during inference. our proposed QCS model outperforms state-of-the-art methods on several popular FER datasets, without requiring additional landmark information or other extra training data. 
The code is available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.01988v1-abstract-full').style.display = 'none'; document.getElementById('2411.01988v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.01821</a> <span> [<a href="">pdf</a>, <a href="">ps</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Information Theory">cs.IT</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> IRS-Enhanced Secure Semantic Communication Networks: Cross-Layer and Context-Awared Resource Allocation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lingyi Wang</a>, <a href="/search/cs?searchtype=author&query=Wu%2C+W">Wei Wu</a>, <a href="/search/cs?searchtype=author&query=Zhou%2C+F">Fuhui Zhou</a>, <a href="/search/cs?searchtype=author&query=Qin%2C+Z">Zhijin Qin</a>, <a href="/search/cs?searchtype=author&query=Wu%2C+Q">Qihui Wu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.01821v1-abstract-short" style="display: inline;"> Learning-task oriented semantic communication is pivotal in optimizing transmission efficiency by extracting and conveying essential semantics tailored to specific tasks, such as image reconstruction and classification. 
Nevertheless, the challenge of eavesdropping poses a formidable threat to semantic privacy due to the open nature of wireless communications. In this paper, intelligent reflective… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.01821v1-abstract-full').style.display = 'inline'; document.getElementById('2411.01821v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.01821v1-abstract-full" style="display: none;"> Learning-task oriented semantic communication is pivotal in optimizing transmission efficiency by extracting and conveying essential semantics tailored to specific tasks, such as image reconstruction and classification. Nevertheless, the challenge of eavesdropping poses a formidable threat to semantic privacy due to the open nature of wireless communications. In this paper, intelligent reflective surface (IRS)-enhanced secure semantic communication (IRS-SSC) is proposed to guarantee the physical layer security from a task-oriented semantic perspective. Specifically, a multi-layer codebook is exploited to discretize continuous semantic features and describe semantics with different numbers of bits, thereby meeting the need for hierarchical semantic representation and further enhancing the transmission efficiency. Novel semantic security metrics, i.e., secure semantic rate (S-SR) and secure semantic spectrum efficiency (S-SSE), are defined to map the task-oriented security requirements at the application layer into the physical layer. To achieve artificial intelligence (AI)-native secure communication, we propose a noise disturbance enhanced hybrid deep reinforcement learning (NdeHDRL)-based resource allocation scheme. This scheme dynamically maximizes the S-SSE by jointly optimizing the bits for semantic representations, reflective coefficients of the IRS, and the subchannel assignment. 
Moreover, we propose a novel semantic context awared state space (SCA-SS) to fuse the high-dimensional semantic space and the observable system state space, which enables the agent to perceive semantic context and solves the dimensional catastrophe problem. Simulation results demonstrate the efficiency of our proposed schemes in both enhancing the security performance and the S-SSE compared to several benchmark schemes. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.01821v1-abstract-full').style.display = 'none'; document.getElementById('2411.01821v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.01647</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+Z">Zhenbin Wang</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+L">Lei Zhang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lituan Wang</a>, <a href="/search/cs?searchtype=author&query=Zhu%2C+M">Minjuan Zhu</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+Z">Zhenwei Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis 
has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.01647v1-abstract-short" style="display: inline;"> Medical video generation models are expected to have a profound impact on the healthcare industry, including but not limited to medical education and training, surgical planning, and simulation. Current video diffusion models typically build on image diffusion architecture by incorporating temporal operations (such as 3D convolution and temporal attention). Although this approach is effective, its… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.01647v1-abstract-full').style.display = 'inline'; document.getElementById('2411.01647v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.01647v1-abstract-full" style="display: none;"> Medical video generation models are expected to have a profound impact on the healthcare industry, including but not limited to medical education and training, surgical planning, and simulation. Current video diffusion models typically build on image diffusion architecture by incorporating temporal operations (such as 3D convolution and temporal attention). Although this approach is effective, its oversimplification limits spatio-temporal performance and consumes substantial computational resources. 
To counter this, we propose Medical Simulation Video Generator (MedSora), which incorporates three key elements: i) a video diffusion framework integrates the advantages of attention and Mamba, balancing low computational load with high-quality video generation, ii) an optical flow representation alignment method that implicitly enhances attention to inter-frame pixels, and iii) a video variational autoencoder (VAE) with frequency compensation addresses the information loss of medical features that occurs when transforming pixel space into latent features and then back to pixel frames. Extensive experiments and applications demonstrate that MedSora exhibits superior visual quality in generating medical videos, outperforming the most advanced baseline methods. Further results and code are available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.01647v1-abstract-full').style.display = 'none'; document.getElementById('2411.01647v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 3 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.01488</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Graphics">cs.GR</span> </div> </div> <p class="title is-5 mathjax"> ITS: Implicit Thin Shell for Polygonal Meshes </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wen%2C+H">Huibiao Wen</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lei Wang</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+Y">Yunxiao Zhang</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+S">Shuangmin Chen</a>, <a href="/search/cs?searchtype=author&query=Xin%2C+S">Shiqing Xin</a>, <a href="/search/cs?searchtype=author&query=Deng%2C+C">Chongyang Deng</a>, <a href="/search/cs?searchtype=author&query=He%2C+Y">Ying He</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+W">Wenping Wang</a>, <a href="/search/cs?searchtype=author&query=Tu%2C+C">Changhe Tu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.01488v1-abstract-short" style="display: inline;"> In computer graphics, simplifying a polygonal mesh surface~$\mathcal{M}$ into a geometric proxy that maintains close conformity to~$\mathcal{M}$ is crucial, as it can significantly reduce computational demands in various applications. 
In this paper, we introduce the Implicit Thin Shell~(ITS), a concept designed to implicitly represent the sandwich-walled space surrounding~$\mathcal{M}$, defined as… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.01488v1-abstract-full').style.display = 'inline'; document.getElementById('2411.01488v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.01488v1-abstract-full" style="display: none;"> In computer graphics, simplifying a polygonal mesh surface~$\mathcal{M}$ into a geometric proxy that maintains close conformity to~$\mathcal{M}$ is crucial, as it can significantly reduce computational demands in various applications. In this paper, we introduce the Implicit Thin Shell~(ITS), a concept designed to implicitly represent the sandwich-walled space surrounding~$\mathcal{M}$, defined as~$\{\textbf{x}\in\mathbb{R}^3|ε_1\leq f(\textbf{x}) \leq ε_2, ε_1< 0, ε_2>0\}$. Here, $f$ is an approximation of the signed distance function~(SDF) of~$\mathcal{M}$, and we aim to minimize the thickness~$ε_2-ε_1$. To achieve a balance between mathematical simplicity and expressive capability in~$f$, we employ a tri-variate tensor-product B-spline to represent~$f$. This representation is coupled with adaptive knot grids that adapt to the inherent shape variations of~$\mathcal{M}$, while restricting~$f$'s basis functions to the first degree. In this manner, the analytical form of~$f$ can be rapidly determined by solving a sparse linear system. Moreover, the process of identifying the extreme values of~$f$ among the infinitely many points on~$\mathcal{M}$ can be simplified to seeking extremes among a finite set of candidate points. By exhausting the candidate points, we find the extreme values~$ε_1<0$ and $ε_2>0$ that minimize the thickness. 
The constructed ITS is guaranteed to wrap~$\mathcal{M}$ rigorously, without any intersections between the bounding surfaces and~$\mathcal{M}$. ITS offers numerous potential applications thanks to its rigorousness, tightness, expressiveness, and computational efficiency. We demonstrate the efficacy of ITS in rapid inside-outside tests and in mesh simplification through the control of global error. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.01488v1-abstract-full').style.display = 'none'; document.getElementById('2411.01488v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 3 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.01477</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> DPCL-Diff: The Temporal Knowledge Graph Reasoning based on Graph Node Diffusion Model with Dual-Domain Periodic Contrastive Learning </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Cao%2C+Y">Yukun Cao</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lisheng Wang</a>, <a href="/search/cs?searchtype=author&query=Huang%2C+L">Luobing Huang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.01477v1-abstract-short" style="display: inline;"> Temporal 
knowledge graph (TKG) reasoning that infers future missing facts is an essential and challenging task. Predicting future events typically relies on closely related historical facts, yielding more accurate results for repetitive or periodic events. However, for future events with sparse historical interactions, the effectiveness of this method, which focuses on leveraging high-frequency hi… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.01477v1-abstract-full').style.display = 'inline'; document.getElementById('2411.01477v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.01477v1-abstract-full" style="display: none;"> Temporal knowledge graph (TKG) reasoning that infers future missing facts is an essential and challenging task. Predicting future events typically relies on closely related historical facts, yielding more accurate results for repetitive or periodic events. However, for future events with sparse historical interactions, the effectiveness of this method, which focuses on leveraging high-frequency historical information, diminishes. Recently, the capabilities of diffusion models in image generation have opened new opportunities for TKG reasoning. Therefore, we propose a graph node diffusion model with dual-domain periodic contrastive learning (DPCL-Diff). Graph node diffusion model (GNDiff) introduces noise into sparsely related events to simulate new events, generating high-quality data that better conforms to the actual distribution. This generative mechanism significantly enhances the model's ability to reason about new events. Additionally, the dual-domain periodic contrastive learning (DPCL) maps periodic and non-periodic event entities to Poincaré and Euclidean spaces, leveraging their characteristics to distinguish similar periodic events effectively. 
Experimental results on four public datasets demonstrate that DPCL-Diff significantly outperforms state-of-the-art TKG models in event prediction, demonstrating our approach's effectiveness. This study also investigates the combined effectiveness of GNDiff and DPCL in TKG tasks. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.01477v1-abstract-full').style.display = 'none'; document.getElementById('2411.01477v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 3 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">11 pages, 2 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.01273</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Cryptography and Security">cs.CR</span> </div> </div> <p class="title is-5 mathjax"> PARIS: A Practical, Adaptive Trace-Fetching and Real-Time Malicious Behavior Detection System </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+J">Jian Wang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lingzhi Wang</a>, <a href="/search/cs?searchtype=author&query=Yu%2C+H">Husheng Yu</a>, <a href="/search/cs?searchtype=author&query=Shen%2C+X">Xiangmin Shen</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+Y">Yan Chen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" 
id="2411.01273v1-abstract-short" style="display: inline;"> The escalating sophistication of cyber-attacks and the widespread utilization of stealth tactics have led to significant security threats globally. Nevertheless, the existing static detection methods exhibit limited coverage, and traditional dynamic monitoring approaches encounter challenges in bypassing evasion techniques. Thus, it has become imperative to implement nuanced and dynamic analysis t… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.01273v1-abstract-full').style.display = 'inline'; document.getElementById('2411.01273v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.01273v1-abstract-full" style="display: none;"> The escalating sophistication of cyber-attacks and the widespread utilization of stealth tactics have led to significant security threats globally. Nevertheless, the existing static detection methods exhibit limited coverage, and traditional dynamic monitoring approaches encounter challenges in bypassing evasion techniques. Thus, it has become imperative to implement nuanced and dynamic analysis to achieve precise behavior detection in real time. There are two pressing concerns associated with current dynamic malware behavior detection solutions. Firstly, the collection and processing of data entail a significant amount of overhead, making it challenging to be employed for real-time detection on the end host. Secondly, these approaches tend to treat malware as a singular entity, thereby overlooking varied behaviors within one instance. To fill these gaps, we propose PARIS, an adaptive trace fetching, lightweight, real-time malicious behavior detection system. Specifically, we monitor malicious behavior with Event Tracing for Windows (ETW) and learn to selectively collect maliciousness-related APIs or call stacks, significantly reducing the data collection overhead. 
As a result, we can monitor a wider range of APIs and detect more intricate attack behavior. We implemented a prototype of PARIS and evaluated the system overhead, the accuracy of comparative behavior recognition, and the impact of different models and parameters. The result demonstrates that PARIS can reduce over 98.8% of data compared to the raw ETW trace and hence decreases the overhead on the host in terms of memory, bandwidth, and CPU usage with a similar detection accuracy to the baselines that suffer from the high overhead. Furthermore, a breakdown evaluation shows that 80% of the memory and bandwidth savings and a complete reduction in CPU usage can be attributed to our adaptive trace-fetching collector. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.01273v1-abstract-full').style.display = 'none'; document.getElementById('2411.01273v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 2 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> </ol> <nav class="pagination is-small is-centered breathe-horizontal" role="navigation" aria-label="pagination"> <a href="" class="pagination-previous is-invisible">Previous </a> <a href="/search/?searchtype=author&query=Wang%2C+L&start=50" class="pagination-next" >Next </a> <ul class="pagination-list"> <li> <a 