href="/search/?searchtype=author&amp;query=Ma%2C+Q&amp;start=250" class="pagination-link " aria-label="Page 6" aria-current="page">6 </a> </li> </ul> </nav> <ol class="breathe-horizontal" start="1"> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.14205</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Z">Zeqing Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qingyang Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Wan%2C+W">Wentao Wan</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+H">Haojie Li</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+K">Keze Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Tian%2C+Y">Yonghong Tian</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.14205v1-abstract-short" style="display: inline;"> Recent improvements in visual synthesis have significantly enhanced the depiction of generated human photos, which are pivotal due to their wide applicability and demand. Nonetheless, the existing text-to-image or text-to-video models often generate low-quality human photos that might differ considerably from real-world body structures, referred to as &#34;abnormal human bodies&#34;. 
Such abnormalities, t&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.14205v1-abstract-full').style.display = 'inline'; document.getElementById('2411.14205v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.14205v1-abstract-full" style="display: none;"> Recent improvements in visual synthesis have significantly enhanced the depiction of generated human photos, which are pivotal due to their wide applicability and demand. Nonetheless, the existing text-to-image or text-to-video models often generate low-quality human photos that might differ considerably from real-world body structures, referred to as &#34;abnormal human bodies&#34;. Such abnormalities, typically deemed unacceptable, pose considerable challenges for their detection and repair within human photos. These challenges require precise abnormality recognition capabilities, which entail pinpointing both the location and the abnormality type. Intuitively, Visual Language Models (VLMs) that have obtained remarkable performance on various visual tasks are quite suitable for this task. However, their performance on abnormality detection in human photos is quite poor. Hence, it is quite important to highlight this task for the research community. In this paper, we first introduce a simple yet challenging task, i.e., Fine-grained Human-body Abnormality Detection (FHAD), and construct two high-quality datasets for evaluation. Then, we propose a meticulous framework, named HumanCalibrator, which identifies and repairs abnormalities in human body structures while preserving the other content. Experiments indicate that our HumanCalibrator achieves high accuracy in abnormality detection and accomplishes an increase in visual comparisons while preserving the other visual content. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.14205v1-abstract-full').style.display = 'none'; document.getElementById('2411.14205v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">16 pages, 14 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.13865</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Information Retrieval">cs.IR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> HARec: Hyperbolic Graph-LLM Alignment for Exploration and Exploitation in Recommender Systems </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qiyao Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+M">Menglin Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Ju%2C+M">Mingxuan Ju</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+T">Tong Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Shah%2C+N">Neil Shah</a>, <a href="/search/cs?searchtype=author&amp;query=Ying%2C+R">Rex Ying</a> </p> <p class="abstract 
mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.13865v1-abstract-short" style="display: inline;"> Modern recommendation systems often create information cocoons, limiting users&#39; exposure to diverse content. To enhance user experience, a crucial challenge is developing systems that can balance content exploration and exploitation, allowing users to adjust their recommendation preferences. Intuitively, this balance can be achieved through a tree-structured representation, where depth search faci&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.13865v1-abstract-full').style.display = 'inline'; document.getElementById('2411.13865v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.13865v1-abstract-full" style="display: none;"> Modern recommendation systems often create information cocoons, limiting users&#39; exposure to diverse content. To enhance user experience, a crucial challenge is developing systems that can balance content exploration and exploitation, allowing users to adjust their recommendation preferences. Intuitively, this balance can be achieved through a tree-structured representation, where depth search facilitates exploitation and breadth search enables exploration. However, current works face two challenges to achieve this target: (1) Euclidean methods fail to fully capture hierarchical structures and lack flexibility in balancing exploration-exploitation, while (2) hyperbolic approaches, despite better hierarchical modeling, suffer from insufficient semantic alignment due to their reliance on Euclidean text encoders. To address these challenges, we propose HARec, a hyperbolic representation learning framework that jointly aligns user-item collaborative information with textual descriptions in hyperbolic space. 
Our framework introduces two key technical novelties: (1) a hierarchy-aware graph-LLM alignment mechanism that enables better hierarchical representation, and (2) a hyperbolic hierarchical tree structure that facilitates user-adjustable exploration-exploitation trade-offs. Extensive experiments demonstrate that HARec consistently outperforms both Euclidean and hyperbolic baselines, achieving up to 5.49% improvement in utility metrics and 11.39% increase in diversity metrics. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.13865v1-abstract-full').style.display = 'none'; document.getElementById('2411.13865v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.13503</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Huang%2C+Z">Ziqi Huang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+F">Fan Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Xu%2C+X">Xiaojie Xu</a>, <a href="/search/cs?searchtype=author&amp;query=He%2C+Y">Yinan He</a>, <a href="/search/cs?searchtype=author&amp;query=Yu%2C+J">Jiashuo Yu</a>, <a href="/search/cs?searchtype=author&amp;query=Dong%2C+Z">Ziyue Dong</a>, <a 
href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qianli Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Chanpaisit%2C+N">Nattapol Chanpaisit</a>, <a href="/search/cs?searchtype=author&amp;query=Si%2C+C">Chenyang Si</a>, <a href="/search/cs?searchtype=author&amp;query=Jiang%2C+Y">Yuming Jiang</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yaohui Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+X">Xinyuan Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+Y">Ying-Cong Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+L">Limin Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Lin%2C+D">Dahua Lin</a>, <a href="/search/cs?searchtype=author&amp;query=Qiao%2C+Y">Yu Qiao</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+Z">Ziwei Liu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.13503v1-abstract-short" style="display: inline;"> Video generation has witnessed significant advancements, yet evaluating these models remains a challenge. A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) Existing metrics do not fully align with human perceptions; 2) An ideal evaluation system should provide insights to inform future developments of video generation. To this end, we present VBench, a&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.13503v1-abstract-full').style.display = 'inline'; document.getElementById('2411.13503v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.13503v1-abstract-full" style="display: none;"> Video generation has witnessed significant advancements, yet evaluating these models remains a challenge. 
A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) Existing metrics do not fully align with human perceptions; 2) An ideal evaluation system should provide insights to inform future developments of video generation. To this end, we present VBench, a comprehensive benchmark suite that dissects &#34;video generation quality&#34; into specific, hierarchical, and disentangled dimensions, each with tailored prompts and evaluation methods. VBench has several appealing properties: 1) Comprehensive Dimensions: VBench comprises 16 dimensions in video generation (e.g., subject identity inconsistency, motion smoothness, temporal flickering, and spatial relationship). The evaluation metrics with fine-grained levels reveal individual models&#39; strengths and weaknesses. 2) Human Alignment: We also provide a dataset of human preference annotations to validate our benchmarks&#39; alignment with human perception for each evaluation dimension. 3) Valuable Insights: We look into current models&#39; ability across various evaluation dimensions and various content types. We also investigate the gaps between video and image generation models. 4) Versatile Benchmarking: VBench++ supports evaluating text-to-video and image-to-video. We introduce a high-quality Image Suite with an adaptive aspect ratio to enable fair evaluations across different image-to-video generation settings. Beyond assessing technical quality, VBench++ evaluates the trustworthiness of video generative models, providing a more holistic view of model performance. 5) Full Open-Sourcing: We fully open-source VBench++ and continually add new video generation models to our leaderboard to drive forward the field of video generation. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.13503v1-abstract-full').style.display = 'none'; document.getElementById('2411.13503v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 20 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Leaderboard: Code: Project page: extension of arXiv:2311.17982. arXiv admin note: substantial text overlap with arXiv:2311.17982</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.10321</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Probabilistic Prior Driven Attention Mechanism Based on Diffusion Model for Imaging Through Atmospheric Turbulence </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Sun%2C+G">Guodong Sun</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qixiang Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+L">Liqiang Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+H">Hongwei Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Gao%2C+Z">Zixuan Gao</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+H">Haotian Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" 
id="2411.10321v1-abstract-short" style="display: inline;"> Atmospheric turbulence introduces severe spatial and geometric distortions, challenging traditional image restoration methods. We propose the Probabilistic Prior Turbulence Removal Network (PPTRN), which combines probabilistic diffusion-based prior modeling with Transformer-driven feature extraction to address this issue. PPTRN employs a two-stage approach: first, a latent encoder and Transformer&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10321v1-abstract-full').style.display = 'inline'; document.getElementById('2411.10321v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.10321v1-abstract-full" style="display: none;"> Atmospheric turbulence introduces severe spatial and geometric distortions, challenging traditional image restoration methods. We propose the Probabilistic Prior Turbulence Removal Network (PPTRN), which combines probabilistic diffusion-based prior modeling with Transformer-driven feature extraction to address this issue. PPTRN employs a two-stage approach: first, a latent encoder and Transformer are jointly trained on clear images to establish robust feature representations. Then, a Denoising Diffusion Probabilistic Model (DDPM) models prior distributions over latent vectors, guiding the Transformer in capturing diverse feature variations essential for restoration. A key innovation in PPTRN is the Probabilistic Prior Driven Cross Attention mechanism, which integrates the DDPM-generated prior with feature embeddings to reduce artifacts and enhance spatial coherence. Extensive experiments validate that PPTRN significantly improves restoration quality on turbulence-degraded images, setting a new benchmark in clarity and structural fidelity. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10321v1-abstract-full').style.display = 'none'; document.getElementById('2411.10321v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 15 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.07135</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Graphics">cs.GR</span> </div> </div> <p class="title is-5 mathjax"> Edify 3D: Scalable High-Quality 3D Asset Generation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=NVIDIA"> NVIDIA</a>, <a href="/search/cs?searchtype=author&amp;query=%3A"> :</a>, <a href="/search/cs?searchtype=author&amp;query=Bala%2C+M">Maciej Bala</a>, <a href="/search/cs?searchtype=author&amp;query=Cui%2C+Y">Yin Cui</a>, <a href="/search/cs?searchtype=author&amp;query=Ding%2C+Y">Yifan Ding</a>, <a href="/search/cs?searchtype=author&amp;query=Ge%2C+Y">Yunhao Ge</a>, <a href="/search/cs?searchtype=author&amp;query=Hao%2C+Z">Zekun Hao</a>, <a href="/search/cs?searchtype=author&amp;query=Hasselgren%2C+J">Jon Hasselgren</a>, <a href="/search/cs?searchtype=author&amp;query=Huffman%2C+J">Jacob Huffman</a>, <a href="/search/cs?searchtype=author&amp;query=Jin%2C+J">Jingyi Jin</a>, <a 
href="/search/cs?searchtype=author&amp;query=Lewis%2C+J+P">J. P. Lewis</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+Z">Zhaoshuo Li</a>, <a href="/search/cs?searchtype=author&amp;query=Lin%2C+C">Chen-Hsuan Lin</a>, <a href="/search/cs?searchtype=author&amp;query=Lin%2C+Y">Yen-Chen Lin</a>, <a href="/search/cs?searchtype=author&amp;query=Lin%2C+T">Tsung-Yi Lin</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+M">Ming-Yu Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Luo%2C+A">Alice Luo</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qianli Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Munkberg%2C+J">Jacob Munkberg</a>, <a href="/search/cs?searchtype=author&amp;query=Shi%2C+S">Stella Shi</a>, <a href="/search/cs?searchtype=author&amp;query=Wei%2C+F">Fangyin Wei</a>, <a href="/search/cs?searchtype=author&amp;query=Xiang%2C+D">Donglai Xiang</a>, <a href="/search/cs?searchtype=author&amp;query=Xu%2C+J">Jiashu Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Zeng%2C+X">Xiaohui Zeng</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+Q">Qinsheng Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.07135v1-abstract-short" style="display: inline;"> We introduce Edify 3D, an advanced solution designed for high-quality 3D asset generation. Our method first synthesizes RGB and surface normal images of the described object at multiple viewpoints using a diffusion model. The multi-view observations are then used to reconstruct the shape, texture, and PBR materials of the object. 
Our method can generate high-quality 3D assets with detailed geometr&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.07135v1-abstract-full').style.display = 'inline'; document.getElementById('2411.07135v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.07135v1-abstract-full" style="display: none;"> We introduce Edify 3D, an advanced solution designed for high-quality 3D asset generation. Our method first synthesizes RGB and surface normal images of the described object at multiple viewpoints using a diffusion model. The multi-view observations are then used to reconstruct the shape, texture, and PBR materials of the object. Our method can generate high-quality 3D assets with detailed geometry, clean shape topologies, high-resolution textures, and materials within 2 minutes of runtime. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.07135v1-abstract-full').style.display = 'none'; document.getElementById('2411.07135v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 11 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Project website:</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.07126</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=NVIDIA"> NVIDIA</a>, <a href="/search/cs?searchtype=author&amp;query=%3A"> :</a>, <a href="/search/cs?searchtype=author&amp;query=Atzmon%2C+Y">Yuval Atzmon</a>, <a href="/search/cs?searchtype=author&amp;query=Bala%2C+M">Maciej Bala</a>, <a href="/search/cs?searchtype=author&amp;query=Balaji%2C+Y">Yogesh Balaji</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+T">Tiffany Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Cui%2C+Y">Yin Cui</a>, <a href="/search/cs?searchtype=author&amp;query=Fan%2C+J">Jiaojiao Fan</a>, <a href="/search/cs?searchtype=author&amp;query=Ge%2C+Y">Yunhao Ge</a>, <a href="/search/cs?searchtype=author&amp;query=Gururani%2C+S">Siddharth Gururani</a>, <a href="/search/cs?searchtype=author&amp;query=Huffman%2C+J">Jacob Huffman</a>, <a href="/search/cs?searchtype=author&amp;query=Isaac%2C+R">Ronald Isaac</a>, <a href="/search/cs?searchtype=author&amp;query=Jannaty%2C+P">Pooya Jannaty</a>, <a href="/search/cs?searchtype=author&amp;query=Karras%2C+T">Tero Karras</a>, <a href="/search/cs?searchtype=author&amp;query=Lam%2C+G">Grace Lam</a>, <a 
href="/search/cs?searchtype=author&amp;query=Lewis%2C+J+P">J. P. Lewis</a>, <a href="/search/cs?searchtype=author&amp;query=Licata%2C+A">Aaron Licata</a>, <a href="/search/cs?searchtype=author&amp;query=Lin%2C+Y">Yen-Chen Lin</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+M">Ming-Yu Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qianli Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Mallya%2C+A">Arun Mallya</a>, <a href="/search/cs?searchtype=author&amp;query=Martino-Tarr%2C+A">Ashlee Martino-Tarr</a>, <a href="/search/cs?searchtype=author&amp;query=Mendez%2C+D">Doug Mendez</a>, <a href="/search/cs?searchtype=author&amp;query=Nah%2C+S">Seungjun Nah</a>, <a href="/search/cs?searchtype=author&amp;query=Pruett%2C+C">Chris Pruett</a> , et al. (7 additional authors not shown) </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.07126v1-abstract-short" style="display: inline;"> We introduce Edify Image, a family of diffusion models capable of generating photorealistic image content with pixel-perfect accuracy. Edify Image utilizes cascaded pixel-space diffusion models trained using a novel Laplacian diffusion process, in which image signals at different frequency bands are attenuated at varying rates. Edify Image supports a wide range of applications, including text-to-i&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.07126v1-abstract-full').style.display = 'inline'; document.getElementById('2411.07126v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.07126v1-abstract-full" style="display: none;"> We introduce Edify Image, a family of diffusion models capable of generating photorealistic image content with pixel-perfect accuracy. 
Edify Image utilizes cascaded pixel-space diffusion models trained using a novel Laplacian diffusion process, in which image signals at different frequency bands are attenuated at varying rates. Edify Image supports a wide range of applications, including text-to-image synthesis, 4K upsampling, ControlNets, 360 HDR panorama generation, and finetuning for image customization. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.07126v1-abstract-full').style.display = 'none'; document.getElementById('2411.07126v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 11 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.22981</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> DisenTS: Disentangled Channel Evolving Pattern Modeling for Multivariate Time Series Forecasting </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Liu%2C+Z">Zhiding Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+J">Jiqian Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Mao%2C+Q">Qingyang Mao</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+Y">Yuze Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Cheng%2C+M">Mingyue Cheng</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+Z">Zhi Li</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+Q">Qi Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+E">Enhong Chen</a> </p> <p 
class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.22981v1-abstract-short" style="display: inline;"> Multivariate time series forecasting plays a crucial role in various real-world applications. Significant efforts have been made to integrate advanced network architectures and training strategies that enhance the capture of temporal dependencies, thereby improving forecasting accuracy. On the other hand, mainstream approaches typically utilize a single unified model with simplistic channel-mixing&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.22981v1-abstract-full').style.display = 'inline'; document.getElementById('2410.22981v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.22981v1-abstract-full" style="display: none;"> Multivariate time series forecasting plays a crucial role in various real-world applications. Significant efforts have been made to integrate advanced network architectures and training strategies that enhance the capture of temporal dependencies, thereby improving forecasting accuracy. On the other hand, mainstream approaches typically utilize a single unified model with simplistic channel-mixing embedding or cross-channel attention operations to account for the critical intricate inter-channel dependencies. Moreover, some methods even trade capacity for robust prediction based on the channel-independent assumption. Nonetheless, as time series data may display distinct evolving patterns due to the unique characteristics of each channel (including multiple strong seasonalities and trend changes), the unified modeling methods could yield suboptimal results. 
To this end, we propose DisenTS, a tailored framework for modeling disentangled channel evolving patterns in general multivariate time series forecasting. The central idea of DisenTS is to model the potential diverse patterns within the multivariate time series data in a decoupled manner. Technically, the framework employs multiple distinct forecasting models, each tasked with uncovering a unique evolving pattern. To guide the learning process without supervision of pattern partition, we introduce a novel Forecaster Aware Gate (FAG) module that generates the routing signals adaptively according to both the forecasters&#39; states and input series&#39; characteristics. The forecasters&#39; states are derived from the Linear Weight Approximation (LWA) strategy, which quantizes the complex deep neural networks into compact matrices. Additionally, the Similarity Constraint (SC) is further proposed to guide each model to specialize in an underlying pattern by minimizing the mutual information between the representations. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.22981v1-abstract-full').style.display = 'none'; document.getElementById('2410.22981v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 30 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.18505</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Wang%2C+L">Liangdong Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+B">Bo-Wen Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Wu%2C+C">Chengwei Wu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+H">Hanyu Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Shi%2C+X">Xiaofeng Shi</a>, <a href="/search/cs?searchtype=author&amp;query=Gu%2C+S">Shuhao Gu</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+J">Jijie Li</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Quanyue Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Pan%2C+T">TengFei Pan</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+G">Guang Liu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.18505v2-abstract-short" style="display: inline;"> We present CCI3.0-HQ (, a high-quality 500GB subset of the Chinese Corpora Internet 3.0 (CCI3.0)(, developed using a novel two-stage hybrid filtering pipeline that significantly enhances data quality. 
To evaluate its effectiveness, we trained a 0.5B parameter model from scratch on 100B tokens across various&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.18505v2-abstract-full').style.display = 'inline'; document.getElementById('2410.18505v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.18505v2-abstract-full" style="display: none;"> We present CCI3.0-HQ, a high-quality 500GB subset of the Chinese Corpora Internet 3.0 (CCI3.0), developed using a novel two-stage hybrid filtering pipeline that significantly enhances data quality. To evaluate its effectiveness, we trained a 0.5B parameter model from scratch on 100B tokens across various datasets, achieving superior performance on 10 benchmarks in a zero-shot setting compared to CCI3.0, SkyPile, and WanjuanV1. The high-quality filtering process effectively distills the capabilities of the Qwen2-72B-instruct model into a compact 0.5B model, attaining optimal F1 scores for Chinese web data classification. We believe this open-access dataset will facilitate broader access to high-quality language models. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.18505v2-abstract-full').style.display = 'none'; document.getElementById('2410.18505v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 25 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 24 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.12346</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Towards Flexible and Efficient Diffusion Low Light Enhancer </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Lan%2C+G">Guanzhou Lan</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qianli Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+Y">Yuqi Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Z">Zhigang Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+D">Dong Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Yuan%2C+Y">Yuan Yuan</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+B">Bin Zhao</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.12346v1-abstract-short" style="display: inline;"> Diffusion-based Low-Light Image Enhancement (LLIE) has demonstrated significant success in improving the visibility of low-light images. However, the substantial computational burden introduced by the iterative sampling process remains a major concern. Current acceleration methods, whether training-based or training-free, often lead to significant performance degradation. 
As a result, to achieve a&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.12346v1-abstract-full').style.display = 'inline'; document.getElementById('2410.12346v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.12346v1-abstract-full" style="display: none;"> Diffusion-based Low-Light Image Enhancement (LLIE) has demonstrated significant success in improving the visibility of low-light images. However, the substantial computational burden introduced by the iterative sampling process remains a major concern. Current acceleration methods, whether training-based or training-free, often lead to significant performance degradation. As a result, to achieve an efficient student model with performance comparable to that of an existing multi-step teacher model, it is usually necessary to retrain a more capable teacher model. This approach introduces inflexibility, as it requires additional training to enhance the teacher&#39;s performance. To address these challenges, we propose \textbf{Re}flectance-aware \textbf{D}iffusion with \textbf{Di}stilled \textbf{T}rajectory (\textbf{ReDDiT}), a step distillation framework specifically designed for LLIE. ReDDiT trains a student model to replicate the teacher&#39;s trajectory in fewer steps while also possessing the ability to surpass the teacher&#39;s performance. Specifically, we first introduce a trajectory decoder from the teacher model to provide guidance. Subsequently, a reflectance-aware trajectory refinement module is incorporated into the distillation process to enable more deterministic guidance from the teacher model. Our framework achieves performance comparable to previous diffusion-based methods, which use redundant steps, in just 2 steps, while establishing new state-of-the-art (SOTA) results with 8 or 4 steps. 
Comprehensive experimental evaluations on 10 benchmark datasets validate the effectiveness of our method, consistently outperforming existing SOTA methods. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.12346v1-abstract-full').style.display = 'none'; document.getElementById('2410.12346v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 16 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">7 pages</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.12274</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Liang%2C+P">Pengwei Liang</a>, <a href="/search/cs?searchtype=author&amp;query=Jiang%2C+J">Junjun Jiang</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qing Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+X">Xianming Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+J">Jiayi Ma</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.12274v1-abstract-short" style="display: inline;"> Image fusion is famous as an alternative solution to 
generate one high-quality image from multiple images in addition to image restoration from a single degraded image. The essence of image fusion is to integrate complementary information from source images. Existing fusion methods struggle with generalization across various tasks and often require labor-intensive designs, in which it is difficult&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.12274v1-abstract-full').style.display = 'inline'; document.getElementById('2410.12274v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.12274v1-abstract-full" style="display: none;"> Image fusion is famous as an alternative solution to generate one high-quality image from multiple images in addition to image restoration from a single degraded image. The essence of image fusion is to integrate complementary information from source images. Existing fusion methods struggle with generalization across various tasks and often require labor-intensive designs, in which it is difficult to identify and extract useful information from source images due to the diverse requirements of each fusion task. Additionally, these methods develop highly specialized features for different downstream applications, hindering the adaptation to new and diverse downstream tasks. To address these limitations, we introduce DeFusion++, a novel framework that leverages self-supervised learning (SSL) to enhance the versatility of feature representation for different image fusion tasks. DeFusion++ captures the image fusion task-friendly representations from large-scale data in a self-supervised way, overcoming the constraints of limited fusion datasets. Specifically, we introduce two innovative pretext tasks: common and unique decomposition (CUD) and masked feature modeling (MFM). 
CUD decomposes source images into abstract common and unique components, while MFM refines these components into robust fused features. Joint training of these tasks enables DeFusion++ to produce adaptable representations that can effectively extract useful information from various source images, regardless of the fusion task. The resulting fused representations are also highly adaptable for a wide range of downstream tasks, including image segmentation and object detection. DeFusion++ stands out by producing versatile fused representations that can enhance both the quality of image fusion and the effectiveness of downstream high-level vision tasks, simplifying the process with the elegant fusion framework. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.12274v1-abstract-full').style.display = 'none'; document.getElementById('2410.12274v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 16 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">18 pages</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.08889</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Exploiting Memory-aware Q-distribution Prediction for Nuclear Fusion via Modern Hopfield Network </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qingchuan Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+S">Shiao Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Zheng%2C+T">Tong Zheng</a>, <a href="/search/cs?searchtype=author&amp;query=Dai%2C+X">Xiaodong Dai</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yifeng Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+Q">Qingquan Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+X">Xiao Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.08889v1-abstract-short" style="display: inline;"> This study addresses the critical challenge of predicting the Q-distribution in long-term stable nuclear fusion task, a key component for advancing clean energy solutions. We introduce an innovative deep learning framework that employs Modern Hopfield Networks to incorporate associative memory from historical shots. 
Utilizing a newly compiled dataset, we demonstrate the effectiveness of our approa&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.08889v1-abstract-full').style.display = 'inline'; document.getElementById('2410.08889v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.08889v1-abstract-full" style="display: none;"> This study addresses the critical challenge of predicting the Q-distribution in long-term stable nuclear fusion task, a key component for advancing clean energy solutions. We introduce an innovative deep learning framework that employs Modern Hopfield Networks to incorporate associative memory from historical shots. Utilizing a newly compiled dataset, we demonstrate the effectiveness of our approach in enhancing Q-distribution prediction. The proposed method represents a significant advancement by leveraging historical memory information for the first time in this context, showcasing improved prediction accuracy and contributing to the optimization of nuclear fusion research. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.08889v1-abstract-full').style.display = 'none'; document.getElementById('2410.08889v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 11 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.08879</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Multi-modal Fusion based Q-distribution Prediction for Controlled Nuclear Fusion </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Wang%2C+S">Shiao Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yifeng Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qingchuan Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+X">Xiao Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Yan%2C+N">Ning Yan</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+Q">Qingquan Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Xu%2C+G">Guosheng Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Tang%2C+J">Jin Tang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.08879v1-abstract-short" style="display: inline;"> Q-distribution prediction is a crucial research direction in controlled nuclear fusion, with deep learning emerging as a key approach to solving prediction challenges. In this paper, we leverage deep learning techniques to tackle the complexities of Q-distribution prediction. 
Specifically, we explore multimodal fusion methods in computer vision, integrating 2D line image data with the original 1D&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.08879v1-abstract-full').style.display = 'inline'; document.getElementById('2410.08879v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.08879v1-abstract-full" style="display: none;"> Q-distribution prediction is a crucial research direction in controlled nuclear fusion, with deep learning emerging as a key approach to solving prediction challenges. In this paper, we leverage deep learning techniques to tackle the complexities of Q-distribution prediction. Specifically, we explore multimodal fusion methods in computer vision, integrating 2D line image data with the original 1D data to form a bimodal input. Additionally, we employ the Transformer&#39;s attention mechanism for feature extraction and the interactive fusion of bimodal information. Extensive experiments validate the effectiveness of our approach, significantly reducing prediction errors in Q-distribution. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.08879v1-abstract-full').style.display = 'none'; document.getElementById('2410.08879v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 11 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.06664</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Decouple-Then-Merge: Towards Better Training for Diffusion Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qianli Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Ning%2C+X">Xuefei Ning</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+D">Dongrui Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Niu%2C+L">Li Niu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+L">Linfeng Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.06664v1-abstract-short" style="display: inline;"> Diffusion models are trained by learning a sequence of models that reverse each step of noise corruption. Typically, the model parameters are fully shared across multiple timesteps to enhance training efficiency. 
However, since the denoising tasks differ at each timestep, the gradients computed at different timesteps may conflict, potentially degrading the overall performance of image generation.&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.06664v1-abstract-full').style.display = 'inline'; document.getElementById('2410.06664v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.06664v1-abstract-full" style="display: none;"> Diffusion models are trained by learning a sequence of models that reverse each step of noise corruption. Typically, the model parameters are fully shared across multiple timesteps to enhance training efficiency. However, since the denoising tasks differ at each timestep, the gradients computed at different timesteps may conflict, potentially degrading the overall performance of image generation. To solve this issue, this work proposes a Decouple-then-Merge (DeMe) framework, which begins with a pretrained model and finetunes separate models tailored to specific timesteps. We introduce several improved techniques during the finetuning stage to promote effective knowledge sharing while minimizing training interference across timesteps. Finally, after finetuning, these separate models can be merged into a single model in the parameter space, ensuring efficient and practical inference. Experimental results show significant generation quality improvements upon 6 benchmarks including Stable Diffusion on COCO30K, ImageNet1K, PartiPrompts, and DDPM on LSUN Church, LSUN Bedroom, and CIFAR10. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.06664v1-abstract-full').style.display = 'none'; document.getElementById('2410.06664v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 9 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.06515</a> <span>&nbsp;[<a href="">pdf</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Software Engineering">cs.SE</span> </div> </div> <p class="title is-5 mathjax"> Studying Practitioners&#39; Expectations on Clear Code Review Comments </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Li%2C+Z">Zhenhao Li</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+J">Junkai Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Mao%2C+Q">Qiheng Mao</a>, <a href="/search/cs?searchtype=author&amp;query=Hu%2C+X">Xing Hu</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+K">Kui Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Xia%2C+X">Xin Xia</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.06515v1-abstract-short" style="display: inline;"> The code review comment (CRC) is pivotal in the process of modern code review. It provides reviewers with the opportunity to identify potential bugs, offer constructive feedback, and suggest improvements. 
Clear and concise code review comments (CRCs) facilitate communication between developers and are crucial to the correct understanding of the issues identified and proposed solutions. Despite&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.06515v1-abstract-full').style.display = 'inline'; document.getElementById('2410.06515v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.06515v1-abstract-full" style="display: none;"> The code review comment (CRC) is pivotal in the process of modern code review. It provides reviewers with the opportunity to identify potential bugs, offer constructive feedback, and suggest improvements. Clear and concise code review comments (CRCs) facilitate communication between developers and are crucial to the correct understanding of the issues identified and proposed solutions. Despite the importance of CRCs&#39; clarity, there is still a lack of guidelines on what constitutes good clarity and how to evaluate it. In this paper, we conduct a comprehensive study on understanding and evaluating the clarity of CRCs. We first derive a set of attributes related to the clarity of CRCs, namely RIE attributes (i.e., Relevance, Informativeness, and Expression), as well as their corresponding evaluation criteria based on our literature review and survey with practitioners. We then investigate the clarity of CRCs in open-source projects written in nine programming languages and find that a large portion (i.e., 28.8%) of the CRCs lack clarity in at least one of the attributes. Finally, we propose ClearCRC, an automated framework that evaluates the clarity of CRCs. Experimental results show that ClearCRC can effectively evaluate the clarity of CRCs and outperform the baselines. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.06515v1-abstract-full').style.display = 'none'; document.getElementById('2410.06515v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 8 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.04454</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> CopyLens: Dynamically Flagging Copyrighted Sub-Dataset Contributions to LLM Outputs </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qichao Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Zhu%2C+R">Rui-Jie Zhu</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+P">Peiye Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Yan%2C+R">Renye Yan</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+F">Fahong Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Liang%2C+L">Ling Liang</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+M">Meng Li</a>, <a href="/search/cs?searchtype=author&amp;query=Yu%2C+Z">Zhaofei Yu</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Z">Zongwei Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Cai%2C+Y">Yimao Cai</a>, <a href="/search/cs?searchtype=author&amp;query=Huang%2C+T">Tiejun Huang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" 
id="2410.04454v1-abstract-short" style="display: inline;"> Large Language Models (LLMs) have become pervasive due to their knowledge absorption and text-generation capabilities. Concurrently, the copyright issue for pretraining datasets has been a pressing concern, particularly when generation includes specific styles. Previous methods either focus on the defense of identical copyrighted outputs or find interpretability by individual tokens with computati&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.04454v1-abstract-full').style.display = 'inline'; document.getElementById('2410.04454v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.04454v1-abstract-full" style="display: none;"> Large Language Models (LLMs) have become pervasive due to their knowledge absorption and text-generation capabilities. Concurrently, the copyright issue for pretraining datasets has been a pressing concern, particularly when generation includes specific styles. Previous methods either focus on the defense of identical copyrighted outputs or find interpretability by individual tokens with computational burdens. However, the gap between them exists, where direct assessments of how dataset contributions impact LLM outputs are missing. Once the model providers ensure copyright protection for data holders, a more mature LLM community can be established. To address these limitations, we introduce CopyLens, a new framework to analyze how copyrighted datasets may influence LLM responses. Specifically, a two-stage approach is employed: First, based on the uniqueness of pretraining data in the embedding space, token representations are initially fused for potential copyrighted texts, followed by a lightweight LSTM-based network to analyze dataset contributions. With such a prior, a contrastive-learning-based non-copyright OOD detector is designed. 
Our framework can dynamically face different situations and bridge the gap between current copyright detection methods. Experiments show that CopyLens improves efficiency and accuracy by 15.2% over our proposed baseline, 58.7% over prompt engineering methods, and 0.21 AUC over OOD detection baselines. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.04454v1-abstract-full').style.display = 'none'; document.getElementById('2410.04454v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 6 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.00379</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Wang%2C+X">Xiao Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+F">Fuling Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+Y">Yuehang Li</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qingchuan Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+S">Shiao Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Jiang%2C+B">Bo 
Jiang</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+C">Chuanfu Li</a>, <a href="/search/cs?searchtype=author&amp;query=Tang%2C+J">Jin Tang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.00379v1-abstract-short" style="display: inline;"> X-ray image-based medical report generation (MRG) is a pivotal area in artificial intelligence which can significantly reduce diagnostic burdens and patient wait times. Despite significant progress, we believe that the task has reached a bottleneck due to the limited benchmark datasets and the existing large models&#39; insufficient capability enhancements in this specialized domain. Specifically, the&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.00379v1-abstract-full').style.display = 'inline'; document.getElementById('2410.00379v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.00379v1-abstract-full" style="display: none;"> X-ray image-based medical report generation (MRG) is a pivotal area in artificial intelligence which can significantly reduce diagnostic burdens and patient wait times. Despite significant progress, we believe that the task has reached a bottleneck due to the limited benchmark datasets and the existing large models&#39; insufficient capability enhancements in this specialized domain. Specifically, the recently released CheXpert Plus dataset lacks comparative evaluation algorithms and their results, providing only the dataset itself. This situation makes the training, evaluation, and comparison of subsequent algorithms challenging. Thus, we conduct a comprehensive benchmarking of existing mainstream X-ray report generation models and large language models (LLMs), on the CheXpert Plus dataset. 
We believe that the proposed benchmark can provide a solid comparative basis for subsequent algorithms and serve as a guide for researchers to quickly grasp the state-of-the-art models in this field. More importantly, we propose a large model for the X-ray image report generation using a multi-stage pre-training strategy, including self-supervised autoregressive generation and Xray-report contrastive learning, and supervised fine-tuning. Extensive experimental results indicate that the autoregressive pre-training based on Mamba effectively encodes X-ray images, and the image-text contrastive pre-training further aligns the feature spaces, achieving better experimental results. Source code can be found on \url{}. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.00379v1-abstract-full').style.display = 'none'; document.getElementById('2410.00379v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 1 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">In Peer Review</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.19976</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Numerical Analysis">math.NA</span> </div> </div> <p class="title is-5 mathjax"> Learning Partial Differential Equations with Deep Parallel Neural Operator </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qinglong Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+P">Peizhi Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+S">Sen Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Song%2C+T">Tao Song</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.19976v2-abstract-short" style="display: inline;"> In recent years, solving partial differential equations has shifted the focus of traditional neural network studies from finite-dimensional Euclidean spaces to generalized functional spaces. A novel methodology is to learn an operator as a means of approximating the mapping between outputs. Currently, researchers have proposed a variety of operator architectures. 
Nevertheless, the majo&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.19976v2-abstract-full').style.display = 'inline'; document.getElementById('2409.19976v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.19976v2-abstract-full" style="display: none;"> In recent years, solving partial differential equations has shifted the focus of traditional neural network studies from finite-dimensional Euclidean spaces to generalized functional spaces. A novel methodology is to learn an operator as a means of approximating the mapping between outputs. Currently, researchers have proposed a variety of operator architectures. Nevertheless, the majority of these architectures adopt an iterative update architecture, whereby a single operator is learned from the same function space. In practical physical science problems, the numerical solutions of partial differential equations are complex, and a serial single operator is unable to accurately approximate the intricate mapping between input and output. We therefore propose a deep parallel operator model (DPNO) for efficiently and accurately solving partial differential equations. DPNO employs convolutional neural networks to extract local features and map data into distinct latent spaces. A parallel block of double Fourier neural operators is designed to address the iterative error problem. DPNO approximates complex mappings between inputs and outputs by learning multiple operators in different potential spaces in parallel blocks. On the benchmark datasets evaluated, DPNO achieved the best performance on five, with an average improvement of 10.5\%, and ranked second on one. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.19976v2-abstract-full').style.display = 'none'; document.getElementById('2409.19976v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 7 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 30 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.15259</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> S$^2$AG-Vid: Enhancing Multi-Motion Alignment in Video Diffusion Models via Spatial and Syntactic Attention-Based Guidance </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Li%2C+Y">Yuanhang Li</a>, <a href="/search/cs?searchtype=author&amp;query=Mao%2C+Q">Qi Mao</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+L">Lan Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Fang%2C+Z">Zhen Fang</a>, <a href="/search/cs?searchtype=author&amp;query=Tian%2C+L">Lei Tian</a>, <a href="/search/cs?searchtype=author&amp;query=Xiao%2C+X">Xinyan Xiao</a>, <a href="/search/cs?searchtype=author&amp;query=Jin%2C+L">Libiao Jin</a>, <a href="/search/cs?searchtype=author&amp;query=Wu%2C+H">Hua Wu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short 
has-text-grey-dark mathjax" id="2409.15259v1-abstract-short" style="display: inline;"> Recent advancements in text-to-video (T2V) generation using diffusion models have garnered significant attention. However, existing T2V models primarily focus on simple scenes featuring a single object performing a single motion. Challenges arise in scenarios involving multiple objects with distinct motions, often leading to incorrect video-text alignment between subjects and their corresponding m&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.15259v1-abstract-full').style.display = 'inline'; document.getElementById('2409.15259v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.15259v1-abstract-full" style="display: none;"> Recent advancements in text-to-video (T2V) generation using diffusion models have garnered significant attention. However, existing T2V models primarily focus on simple scenes featuring a single object performing a single motion. Challenges arise in scenarios involving multiple objects with distinct motions, often leading to incorrect video-text alignment between subjects and their corresponding motions. To address this challenge, we propose \textbf{S$^2$AG-Vid}, a training-free inference-stage optimization method that improves the alignment of multiple objects with their corresponding motions in T2V models. S$^2$AG-Vid initially applies a spatial position-based, cross-attention (CA) constraint in the early stages of the denoising process, facilitating multiple nouns distinctly attending to the correct subject regions. 
To enhance the motion-subject binding, we implement a syntax-guided contrastive constraint in the subsequent denoising phase, aimed at improving the correlations between the CA maps of verbs and their corresponding nouns. Both qualitative and quantitative evaluations demonstrate that the proposed framework significantly outperforms baseline approaches, producing higher-quality videos with improved subject-motion consistency. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.15259v1-abstract-full').style.display = 'none'; document.getElementById('2409.15259v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 23 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.12865</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> KnowFormer: Revisiting Transformers for Knowledge Graph Reasoning </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Liu%2C+J">Junnan Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Mao%2C+Q">Qianren Mao</a>, <a href="/search/cs?searchtype=author&amp;query=Jiang%2C+W">Weifeng Jiang</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+J">Jianxin Li</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.12865v1-abstract-short" style="display: inline;"> Knowledge graph reasoning plays a vital role in 
various applications and has garnered considerable attention. Recently, path-based methods have achieved impressive performance. However, they may face limitations stemming from constraints in message-passing neural networks, such as missing paths and information over-squashing. In this paper, we revisit the application of transformers for knowledge&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.12865v1-abstract-full').style.display = 'inline'; document.getElementById('2409.12865v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.12865v1-abstract-full" style="display: none;"> Knowledge graph reasoning plays a vital role in various applications and has garnered considerable attention. Recently, path-based methods have achieved impressive performance. However, they may face limitations stemming from constraints in message-passing neural networks, such as missing paths and information over-squashing. In this paper, we revisit the application of transformers for knowledge graph reasoning to address the constraints faced by path-based methods and propose a novel method, KnowFormer. KnowFormer utilizes a transformer architecture to perform reasoning on knowledge graphs from the message-passing perspective, rather than reasoning by textual information like previous pretrained language model based methods. Specifically, we define the attention computation based on the query prototype of knowledge graph reasoning, facilitating convenient construction and efficient optimization. To incorporate structural information into the self-attention mechanism, we introduce structure-aware modules to calculate query, key, and value respectively. Additionally, we present an efficient attention computation method for better scalability. 
Experimental results demonstrate the superior performance of KnowFormer compared to prominent baseline methods on both transductive and inductive benchmarks. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.12865v1-abstract-full').style.display = 'none'; document.getElementById('2409.12865v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 19 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted by ICML2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.09796</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Universal Topology Refinement for Medical Image Segmentation with Polynomial Feature Synthesis </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Li%2C+L">Liu Li</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+H">Hanchun Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Baugh%2C+M">Matthew Baugh</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qiang Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+W">Weitong Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Ouyang%2C+C">Cheng Ouyang</a>, <a 
href="/search/cs?searchtype=author&amp;query=Rueckert%2C+D">Daniel Rueckert</a>, <a href="/search/cs?searchtype=author&amp;query=Kainz%2C+B">Bernhard Kainz</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.09796v1-abstract-short" style="display: inline;"> Although existing medical image segmentation methods provide impressive pixel-wise accuracy, they often neglect topological correctness, making their segmentations unusable for many downstream tasks. One option is to retrain such models whilst including a topology-driven loss component. However, this is computationally expensive and often impractical. A better solution would be to have a versatile&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.09796v1-abstract-full').style.display = 'inline'; document.getElementById('2409.09796v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.09796v1-abstract-full" style="display: none;"> Although existing medical image segmentation methods provide impressive pixel-wise accuracy, they often neglect topological correctness, making their segmentations unusable for many downstream tasks. One option is to retrain such models whilst including a topology-driven loss component. However, this is computationally expensive and often impractical. A better solution would be to have a versatile plug-and-play topology refinement method that is compatible with any domain-specific segmentation pipeline. Directly training a post-processing model to mitigate topological errors often fails as such models tend to be biased towards the topological errors of a target segmentation network. The diversity of these errors is confined to the information provided by a labelled training set, which is especially problematic for small datasets. 
Our method solves this problem by training a model-agnostic topology refinement network with synthetic segmentations that cover a wide variety of topological errors. Inspired by the Stone-Weierstrass theorem, we synthesize topology-perturbation masks with randomly sampled coefficients of orthogonal polynomial bases, which ensures a complete and unbiased representation. Practically, we verified the efficiency and effectiveness of our methods as being compatible with multiple families of polynomial bases, and show evidence that our universal plug-and-play topology refinement network outperforms both existing topology-driven learning-based and post-processing methods. We also show that combining our method with learning-based models provides an effortless add-on, which can further improve the performance of existing approaches. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.09796v1-abstract-full').style.display = 'none'; document.getElementById('2409.09796v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 15 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted by the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024)</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.08775</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Human-Computer Interaction">cs.HC</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> What You Say = What You Want? Teaching Humans to Articulate Requirements for LLMs </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qianou Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Peng%2C+W">Weirui Peng</a>, <a href="/search/cs?searchtype=author&amp;query=Shen%2C+H">Hua Shen</a>, <a href="/search/cs?searchtype=author&amp;query=Koedinger%2C+K">Kenneth Koedinger</a>, <a href="/search/cs?searchtype=author&amp;query=Wu%2C+T">Tongshuang Wu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.08775v1-abstract-short" style="display: inline;"> Prompting ChatGPT to achieve complex goals (e.g., creating a customer support chatbot) often demands meticulous prompt engineering, including aspects like fluent writing and chain-of-thought techniques. 
While emerging prompt optimizers can automatically refine many of these aspects, we argue that clearly conveying customized requirements (e.g., how to handle diverse inputs) remains a human-centric&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.08775v1-abstract-full').style.display = 'inline'; document.getElementById('2409.08775v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.08775v1-abstract-full" style="display: none;"> Prompting ChatGPT to achieve complex goals (e.g., creating a customer support chatbot) often demands meticulous prompt engineering, including aspects like fluent writing and chain-of-thought techniques. While emerging prompt optimizers can automatically refine many of these aspects, we argue that clearly conveying customized requirements (e.g., how to handle diverse inputs) remains a human-centric challenge. In this work, we introduce Requirement-Oriented Prompt Engineering (ROPE), a paradigm that focuses human attention on generating clear, complete requirements during prompting. We implement ROPE through an assessment and training suite that provides deliberate practice with LLM-generated feedback. In a study with 30 novices, we show that requirement-focused training doubles novices&#39; prompting performance, significantly outperforming conventional prompt engineering training and prompt optimization. We also demonstrate that high-quality LLM outputs are directly tied to the quality of input requirements. Our work paves the way for more effective task delegation in human-LLM collaborative prompting. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.08775v1-abstract-full').style.display = 'none'; document.getElementById('2409.08775v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">15 pages, 5 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.07500</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Cryptography and Security">cs.CR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Information Retrieval">cs.IR</span> </div> </div> <p class="title is-5 mathjax"> DV-FSR: A Dual-View Target Attack Framework for Federated Sequential Recommendation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Qin%2C+Q">Qitao Qin</a>, <a href="/search/cs?searchtype=author&amp;query=Luo%2C+Y">Yucong Luo</a>, <a href="/search/cs?searchtype=author&amp;query=Cheng%2C+M">Mingyue Cheng</a>, <a href="/search/cs?searchtype=author&amp;query=Mao%2C+Q">Qingyang Mao</a>, <a href="/search/cs?searchtype=author&amp;query=Lei%2C+C">Chenyi Lei</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.07500v1-abstract-short" style="display: inline;"> Federated recommendation (FedRec) preserves user privacy by enabling decentralized training of 
personalized models, but this architecture is inherently vulnerable to adversarial attacks. Significant research has been conducted on targeted attacks in FedRec systems, motivated by commercial and social influence considerations. However, much of this work has largely overlooked the differential robust&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.07500v1-abstract-full').style.display = 'inline'; document.getElementById('2409.07500v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.07500v1-abstract-full" style="display: none;"> Federated recommendation (FedRec) preserves user privacy by enabling decentralized training of personalized models, but this architecture is inherently vulnerable to adversarial attacks. Significant research has been conducted on targeted attacks in FedRec systems, motivated by commercial and social influence considerations. However, much of this work has largely overlooked the differential robustness of recommendation models. Moreover, our empirical findings indicate that existing targeted attack methods achieve only limited effectiveness in Federated Sequential Recommendation (FSR) tasks. Driven by these observations, we focus on investigating targeted attacks in FSR and propose a novel dual-view attack framework, named DV-FSR. This attack method uniquely combines a sampling-based explicit strategy with a contrastive learning-based implicit gradient strategy to orchestrate a coordinated attack. Additionally, we introduce a specific defense mechanism tailored for targeted attacks in FSR, aiming to evaluate the mitigation effects of the attack method we proposed. Extensive experiments validate the effectiveness of our proposed approach on representative sequential models. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.07500v1-abstract-full').style.display = 'none'; document.getElementById('2409.07500v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.05847</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ding%2C+H">Henghui Ding</a>, <a href="/search/cs?searchtype=author&amp;query=Hong%2C+L">Lingyi Hong</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+C">Chang Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Xu%2C+N">Ning Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+L">Linjie Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Fan%2C+Y">Yuchen Fan</a>, <a href="/search/cs?searchtype=author&amp;query=Miao%2C+D">Deshui Miao</a>, <a href="/search/cs?searchtype=author&amp;query=Gu%2C+Y">Yameng Gu</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+X">Xin Li</a>, <a href="/search/cs?searchtype=author&amp;query=He%2C+Z">Zhenyu He</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+Y">Yaowei Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+M">Ming-Hsuan Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Chai%2C+J">Jinming Chai</a>, <a 
href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qin Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+J">Junpei Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Jiao%2C+L">Licheng Jiao</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+F">Fang Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+X">Xinyu Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+J">Jing Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+K">Kexin Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+X">Xu Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+L">LingLing Li</a>, <a href="/search/cs?searchtype=author&amp;query=Fang%2C+H">Hao Fang</a>, <a href="/search/cs?searchtype=author&amp;query=Pan%2C+F">Feiyu Pan</a>, <a href="/search/cs?searchtype=author&amp;query=Lu%2C+X">Xiankai Lu</a> , et al. (8 additional authors not shown) </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.05847v1-abstract-short" style="display: inline;"> Despite the promising performance of current video segmentation models on existing benchmarks, these models still struggle with complex scenes. In this paper, we introduce the 6th Large-scale Video Object Segmentation (LSVOS) challenge in conjunction with ECCV 2024 workshop. This year&#39;s challenge includes two tasks: Video Object Segmentation (VOS) and Referring Video Object Segmentation (RVOS). 
In&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.05847v1-abstract-full').style.display = 'inline'; document.getElementById('2409.05847v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.05847v1-abstract-full" style="display: none;"> Despite the promising performance of current video segmentation models on existing benchmarks, these models still struggle with complex scenes. In this paper, we introduce the 6th Large-scale Video Object Segmentation (LSVOS) challenge in conjunction with ECCV 2024 workshop. This year&#39;s challenge includes two tasks: Video Object Segmentation (VOS) and Referring Video Object Segmentation (RVOS). This year, we replace the classic YouTube-VOS and YouTube-RVOS benchmarks with the latest datasets, MOSE, LVOS, and MeViS, to assess VOS under more challenging complex environments. This year&#39;s challenge attracted 129 registered teams from more than 20 institutes across over 8 countries. This report includes the challenge and dataset introduction, and the methods used by the top 7 teams in the two tracks. More details can be found in our homepage <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.05847v1-abstract-full').style.display = 'none'; document.getElementById('2409.05847v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 9 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">ECCV 2024 LSVOS Challenge Report:</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.00956</a> <span>&nbsp;[<a href="">pdf</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Physics-Informed Neural Network Based Digital Image Correlation Method </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Li%2C+B">Boda Li</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+S">Shichao Zhou</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qinwei Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+S">Shaopeng Ma</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.00956v1-abstract-short" style="display: inline;"> Digital Image Correlation (DIC) is a key technique in experimental mechanics for full-field deformation measurement, traditionally relying on subset matching to determine displacement fields. However, selecting optimal parameters like shape functions and subset size can be challenging in non-uniform deformation scenarios. 
Recent deep learning-based DIC approaches, both supervised and unsupervised,&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.00956v1-abstract-full').style.display = 'inline'; document.getElementById('2409.00956v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.00956v1-abstract-full" style="display: none;"> Digital Image Correlation (DIC) is a key technique in experimental mechanics for full-field deformation measurement, traditionally relying on subset matching to determine displacement fields. However, selecting optimal parameters like shape functions and subset size can be challenging in non-uniform deformation scenarios. Recent deep learning-based DIC approaches, both supervised and unsupervised, use neural networks to map speckle images to deformation fields, offering precise measurements without manual tuning. However, these methods require complex network architectures to extract speckle image features, which does not guarantee solution accuracy. This paper introduces PINN-DIC, a novel DIC method based on Physics-Informed Neural Networks (PINNs). Unlike traditional approaches, PINN-DIC uses a simple fully connected neural network that takes the coordinate domain as input and outputs the displacement field. By integrating the DIC governing equation into the loss function, PINN-DIC directly extracts the displacement field from reference and deformed speckle images through iterative optimization. 
Evaluations on simulated and real experiments demonstrate that PINN-DIC maintains the accuracy of deep learning-based DIC in non-uniform fields while offering three distinct advantages: 1) enhanced precision with a simpler network by directly fitting the displacement field from coordinates, 2) effective handling of irregular boundary displacement fields with minimal parameter adjustments, and 3) easy integration with other neural network-based mechanical analysis methods for comprehensive DIC result analysis. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.00956v1-abstract-full').style.display = 'none'; document.getElementById('2409.00956v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 2 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.00130</a> <span>&nbsp;[<a href="">pdf</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Signal Processing">eess.SP</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Mirror contrastive loss based sliding window transformer for subject-independent motor imagery based EEG signal recognition </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Luo%2C+J">Jing Luo</a>, <a href="/search/cs?searchtype=author&amp;query=Mao%2C+Q">Qi Mao</a>, <a href="/search/cs?searchtype=author&amp;query=Shi%2C+W">Weiwei Shi</a>, <a 
href="/search/cs?searchtype=author&amp;query=Shi%2C+Z">Zhenghao Shi</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+X">Xiaofan Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Lu%2C+X">Xiaofeng Lu</a>, <a href="/search/cs?searchtype=author&amp;query=Hei%2C+X">Xinhong Hei</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.00130v1-abstract-short" style="display: inline;"> While deep learning models have been extensively utilized in motor imagery based EEG signal recognition, they often operate as black boxes. Motivated by neurological findings indicating that the mental imagery of left or right-hand movement induces event-related desynchronization (ERD) in the contralateral sensorimotor area of the brain, we propose a Mirror Contrastive Loss based Sliding Window Tr&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.00130v1-abstract-full').style.display = 'inline'; document.getElementById('2409.00130v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.00130v1-abstract-full" style="display: none;"> While deep learning models have been extensively utilized in motor imagery based EEG signal recognition, they often operate as black boxes. Motivated by neurological findings indicating that the mental imagery of left or right-hand movement induces event-related desynchronization (ERD) in the contralateral sensorimotor area of the brain, we propose a Mirror Contrastive Loss based Sliding Window Transformer (MCL-SWT) to enhance subject-independent motor imagery-based EEG signal recognition. 
Specifically, our proposed mirror contrastive loss enhances sensitivity to the spatial location of ERD by contrasting the original EEG signals with their mirror counterparts-mirror EEG signals generated by interchanging the channels of the left and right hemispheres of the EEG signals. Moreover, we introduce a temporal sliding window transformer that computes self-attention scores from high temporal resolution features, thereby improving model performance with manageable computational complexity. We evaluate the performance of MCL-SWT on subject-independent motor imagery EEG signal recognition tasks, and our experimental results demonstrate that MCL-SWT achieved accuracies of 66.48% and 75.62%, surpassing the state-of-the-art (SOTA) model by 2.82% and 2.17%, respectively. Furthermore, ablation experiments confirm the effectiveness of the proposed mirror contrastive loss. A code demo of MCL-SWT is available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.00130v1-abstract-full').style.display = 'none'; document.getElementById('2409.00130v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 29 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">This paper has been accepted by the Fourth International Workshop on Human Brain and Artificial Intelligence, joint workshop of the 33rd International Joint Conference on Artificial Intelligence, Jeju Island, South Korea, from August 3rd to August 9th, 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.15276</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> A Survey of Deep Learning for Group-level Emotion Recognition </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Huang%2C+X">Xiaohua Huang</a>, <a href="/search/cs?searchtype=author&amp;query=Xu%2C+J">Jinke Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Zheng%2C+W">Wenming Zheng</a>, <a href="/search/cs?searchtype=author&amp;query=Mao%2C+Q">Qirong Mao</a>, <a href="/search/cs?searchtype=author&amp;query=Dhall%2C+A">Abhinav Dhall</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.15276v1-abstract-short" style="display: inline;"> With the advancement of artificial intelligence (AI) technology, group-level emotion recognition (GER) has emerged as an important area in analyzing human behavior. 
Early GER methods relied primarily on handcrafted features. However, with the proliferation of Deep Learning (DL) techniques and their remarkable success in diverse tasks, neural networks have garnered increasing interest in GER. U&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.15276v1-abstract-full').style.display = 'inline'; document.getElementById('2408.15276v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.15276v1-abstract-full" style="display: none;"> With the advancement of artificial intelligence (AI) technology, group-level emotion recognition (GER) has emerged as an important area in analyzing human behavior. Early GER methods relied primarily on handcrafted features. However, with the proliferation of Deep Learning (DL) techniques and their remarkable success in diverse tasks, neural networks have garnered increasing interest in GER. Unlike an individual&#39;s emotion, group emotions exhibit diversity and dynamics. Presently, several DL approaches have been proposed to effectively leverage the rich information inherent in group-level images and enhance GER performance significantly. In this survey, we present a comprehensive review of DL techniques applied to GER, proposing a new taxonomy for the field that covers all aspects of GER based on DL. The survey overviews datasets, the deep GER pipeline, and performance comparisons of the state-of-the-art methods of the past decade. Moreover, it summarizes and discusses the fundamental approaches and advanced developments for each aspect. Furthermore, we identify outstanding challenges and suggest potential avenues for the design of robust GER systems. To the best of our knowledge, this survey represents the first comprehensive review of deep GER methods, serving as a pivotal reference for future GER research endeavors. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.15276v1-abstract-full').style.display = 'none'; document.getElementById('2408.15276v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">16 pages, 2 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.14734</a> <span>&nbsp;[<a href="">pdf</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Mathematical Physics">math-ph</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Numerical Analysis">math.NA</span> </div> </div> <p class="title is-5 mathjax"> General-Kindred Physics-Informed Neural Network to the Solutions of Singularly Perturbed Differential Equations </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Wang%2C+S">Sen Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+P">Peizhi Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qinglong Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Song%2C+T">Tao Song</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.14734v1-abstract-short" style="display: inline;"> Physics-Informed Neural Networks (PINNs) have become a promising research 
direction in the field of solving Partial Differential Equations (PDEs). Dealing with singular perturbation problems continues to be a difficult challenge in the field of PINN. The solution of singular perturbation problems often exhibits sharp boundary layers and steep gradients, and traditional PINN cannot achieve approxim&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.14734v1-abstract-full').style.display = 'inline'; document.getElementById('2408.14734v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.14734v1-abstract-full" style="display: none;"> Physics-Informed Neural Networks (PINNs) have become a promising research direction in the field of solving Partial Differential Equations (PDEs). Dealing with singular perturbation problems continues to be a difficult challenge in the field of PINN. The solution of singular perturbation problems often exhibits sharp boundary layers and steep gradients, and traditional PINN cannot achieve approximation of boundary layers. In this manuscript, we propose the General-Kindred Physics-Informed Neural Network (GKPINN) for solving Singular Perturbation Differential Equations (SPDEs). This approach utilizes asymptotic analysis to acquire prior knowledge of the boundary layer from the equation and establishes a novel network to assist PINN in approximating the boundary layer. It is compared with traditional PINN by solving examples of one-dimensional, two-dimensional, and time-varying SPDE equations. The research findings underscore the exceptional performance of our novel approach, GKPINN, which delivers a remarkable enhancement in reducing the $L_2$ error by two to four orders of magnitude compared to the established PINN methodology. 
This significant improvement is accompanied by a substantial acceleration in convergence rates, without compromising the high precision that is critical for our applications. Furthermore, GKPINN still performs well in extreme cases with perturbation parameters of ${1\times10}^{-38}$, demonstrating its excellent generalization ability. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.14734v1-abstract-full').style.display = 'none'; document.getElementById('2408.14734v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 26 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.13582</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> CSS-Segment: 2nd Place Report of LSVOS Challenge VOS Track </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Chai%2C+J">Jinming Chai</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qin Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+J">Junpei Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Jiao%2C+L">Licheng Jiao</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+F">Fang Liu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.13582v1-abstract-short" style="display: inline;"> Video object segmentation is a challenging task that serves as the 
cornerstone of numerous downstream applications, including video editing and autonomous driving. In this technical report, we briefly introduce the solution of our team &#34;yuanjie&#34; for video object segmentation in the 6-th LSVOS Challenge VOS Track at ECCV 2024. We believe that our proposed CSS-Segment will perform better in videos o&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.13582v1-abstract-full').style.display = 'inline'; document.getElementById('2408.13582v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.13582v1-abstract-full" style="display: none;"> Video object segmentation is a challenging task that serves as the cornerstone of numerous downstream applications, including video editing and autonomous driving. In this technical report, we briefly introduce the solution of our team &#34;yuanjie&#34; for video object segmentation in the 6-th LSVOS Challenge VOS Track at ECCV 2024. We believe that our proposed CSS-Segment will perform better in videos of complex object motion and long-term presentation. In this report, we successfully validated the effectiveness of the CSS-Segment in video object segmentation. Finally, our method achieved a J\&amp;F score of 80.84 in and test phases, and ultimately ranked 2nd in the 6-th LSVOS Challenge VOS Track at ECCV 2024. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.13582v1-abstract-full').style.display = 'none'; document.getElementById('2408.13582v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 24 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.12316</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zhu%2C+L">Lingyu Zhu</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+W">Wenhan Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+B">Baoliang Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Zhu%2C+H">Hanwei Zhu</a>, <a href="/search/cs?searchtype=author&amp;query=Ni%2C+Z">Zhangkai Ni</a>, <a href="/search/cs?searchtype=author&amp;query=Mao%2C+Q">Qi Mao</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+S">Shiqi Wang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.12316v1-abstract-short" style="display: inline;"> Obtaining pairs of low/normal-light videos, with motions, is more challenging than still images, which raises technical issues and poses the technical route of unpaired learning as a critical role. This paper makes endeavors in the direction of learning for low-light video enhancement without using paired ground truth. 
Compared to low-light image enhancement, enhancing low-light videos is more dif&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.12316v1-abstract-full').style.display = 'inline'; document.getElementById('2408.12316v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.12316v1-abstract-full" style="display: none;"> Obtaining pairs of low/normal-light videos, with motions, is more challenging than still images, which raises technical issues and poses the technical route of unpaired learning as a critical role. This paper makes endeavors in the direction of learning for low-light video enhancement without using paired ground truth. Compared to low-light image enhancement, enhancing low-light videos is more difficult due to the intertwined effects of noise, exposure, and contrast in the spatial domain, jointly with the need for temporal coherence. To address the above challenge, we propose the Unrolled Decomposed Unpaired Network (UDU-Net) for enhancing low-light videos by unrolling the optimization functions into a deep network to decompose the signal into spatial and temporal-related factors, which are updated iteratively. Firstly, we formulate low-light video enhancement as a Maximum A Posteriori estimation (MAP) problem with carefully designed spatial and temporal visual regularization. Then, via unrolling the problem, the optimization of the spatial and temporal constraints can be decomposed into different steps and updated in a stage-wise manner. From the spatial perspective, the designed Intra subnet leverages unpaired prior information from expert photo-retouching skills to adjust the statistical distribution. Additionally, we introduce a novel mechanism that integrates human perception feedback to guide network optimization, suppressing over/under-exposure conditions. 
Meanwhile, to address the issue from the temporal perspective, the designed Inter subnet fully exploits temporal cues in progressive optimization, which helps achieve improved temporal consistency in enhancement results. Consequently, the proposed method achieves superior performance to state-of-the-art methods in video illumination, noise suppression, and temporal consistency across outdoor and indoor scenes. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.12316v1-abstract-full').style.display = 'none'; document.getElementById('2408.12316v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 22 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.11243</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Do Neural Scaling Laws Exist on Graph Self-Supervised Learning? 
</p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qian Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Mao%2C+H">Haitao Mao</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+J">Jingzhe Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+Z">Zhehua Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Feng%2C+C">Chunlin Feng</a>, <a href="/search/cs?searchtype=author&amp;query=Song%2C+Y">Yu Song</a>, <a href="/search/cs?searchtype=author&amp;query=Shao%2C+Y">Yihan Shao</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Y">Yao Ma</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.11243v2-abstract-short" style="display: inline;"> Self-supervised learning~(SSL) is essential to obtain foundation models in NLP and CV domains via effectively leveraging knowledge in large-scale unlabeled data. The reason for its success is that a suitable SSL design can help the model to follow the neural scaling law, i.e., the performance consistently improves with increasing model and dataset sizes. However, it remains a mystery whether exist&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.11243v2-abstract-full').style.display = 'inline'; document.getElementById('2408.11243v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.11243v2-abstract-full" style="display: none;"> Self-supervised learning~(SSL) is essential to obtain foundation models in NLP and CV domains via effectively leveraging knowledge in large-scale unlabeled data. The reason for its success is that a suitable SSL design can help the model to follow the neural scaling law, i.e., the performance consistently improves with increasing model and dataset sizes. 
However, it remains a mystery whether existing SSL in the graph domain can follow the scaling behavior toward building Graph Foundation Models~(GFMs) with large-scale pre-training. In this study, we examine whether existing graph SSL techniques can follow the neural scaling behavior with the potential to serve as the essential component for GFMs. Our benchmark includes comprehensive SSL technique implementations with analysis conducted on both the conventional SSL setting and many new settings adopted in other domains. Surprisingly, despite the SSL loss continuously decreasing, no existing graph SSL techniques follow the neural scaling behavior on the downstream performance. The model performance merely fluctuates across different data scales and model scales. Instead of the scales, the key factors influencing the performance are the choices of model architecture and pretext task design. This paper examines existing SSL techniques for the feasibility of Graph SSL techniques in developing GFMs and opens a new direction for graph SSL design with the new evaluation prototype. Our code implementation is available online to ease reproducibility on <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.11243v2-abstract-full').style.display = 'none'; document.getElementById('2408.11243v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 26 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 20 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.10906</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qi Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+Y">Yue Li</a>, <a href="/search/cs?searchtype=author&amp;query=Ren%2C+B">Bin Ren</a>, <a href="/search/cs?searchtype=author&amp;query=Sebe%2C+N">Nicu Sebe</a>, <a href="/search/cs?searchtype=author&amp;query=Konukoglu%2C+E">Ender Konukoglu</a>, <a href="/search/cs?searchtype=author&amp;query=Gevers%2C+T">Theo Gevers</a>, <a href="/search/cs?searchtype=author&amp;query=Van+Gool%2C+L">Luc Van Gool</a>, <a href="/search/cs?searchtype=author&amp;query=Paudel%2C+D+P">Danda Pani Paudel</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.10906v1-abstract-short" style="display: inline;"> 3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for the 3D understanding directly in this representation space. To facilitate the research in this direction, we first build a large-scale dataset of 3DGS using the commonly used ShapeNet and ModelNet datasets. 
Our dataset ShapeSplat consists of 65K objects from 87 unique categories, w&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.10906v1-abstract-full').style.display = 'inline'; document.getElementById('2408.10906v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.10906v1-abstract-full" style="display: none;"> 3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for the 3D understanding directly in this representation space. To facilitate the research in this direction, we first build a large-scale dataset of 3DGS using the commonly used ShapeNet and ModelNet datasets. Our dataset ShapeSplat consists of 65K objects from 87 unique categories, whose labels are in accordance with the respective datasets. The creation of this dataset utilized the compute equivalent of 2 GPU years on a TITAN XP GPU. We utilize our dataset for unsupervised pretraining and supervised finetuning for classification and segmentation tasks. To this end, we introduce \textbf{\textit{Gaussian-MAE}}, which highlights the unique benefits of representation learning from Gaussian parameters. Through exhaustive experiments, we provide several valuable insights. In particular, we show that (1) the distribution of the optimized GS centroids significantly differs from the uniformly sampled point cloud (used for initialization) counterpart; (2) this change in distribution results in degradation in classification but improvement in segmentation tasks when using only the centroids; (3) to leverage additional Gaussian parameters, we propose Gaussian feature grouping in a normalized feature space, along with splats pooling layer, offering a tailored solution to effectively group and embed similar Gaussians, which leads to notable improvement in finetuning tasks. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.10906v1-abstract-full').style.display = 'none'; document.getElementById('2408.10906v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 20 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.08709</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Information Retrieval">cs.IR</span> </div> </div> <p class="title is-5 mathjax"> Multimodal Relational Triple Extraction with Query-based Entity Object Transformer </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Hei%2C+L">Lei Hei</a>, <a href="/search/cs?searchtype=author&amp;query=An%2C+N">Ning An</a>, <a href="/search/cs?searchtype=author&amp;query=Liao%2C+T">Tingjing Liao</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qi Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+J">Jiaqi Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Ren%2C+F">Feiliang Ren</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.08709v1-abstract-short" style="display: inline;"> Multimodal Relation Extraction is crucial for constructing flexible and realistic knowledge graphs. Recent studies focus on extracting the relation type with entity pairs present in different modalities, such as one entity in the text and another in the image. 
However, existing approaches require entities and objects given beforehand, which is costly and impractical. To address the limitation, we&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.08709v1-abstract-full').style.display = 'inline'; document.getElementById('2408.08709v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.08709v1-abstract-full" style="display: none;"> Multimodal Relation Extraction is crucial for constructing flexible and realistic knowledge graphs. Recent studies focus on extracting the relation type with entity pairs present in different modalities, such as one entity in the text and another in the image. However, existing approaches require entities and objects given beforehand, which is costly and impractical. To address the limitation, we propose a novel task, Multimodal Entity-Object Relational Triple Extraction, which aims to extract all triples (entity span, relation, object region) from image-text pairs. To facilitate this study, we modified a multimodal relation extraction dataset MORE, which includes 21 relation types, to create a new dataset containing 20,264 triples, averaging 5.75 triples per image-text pair. Moreover, we propose QEOT, a query-based model with a selective attention mechanism, to dynamically explore the interaction and fusion of textual and visual information. In particular, the proposed method can simultaneously accomplish entity extraction, relation classification, and object detection with a set of queries. Our method is suitable for downstream applications and reduces error accumulation due to the pipeline-style approaches. Extensive experimental results demonstrate that our proposed method outperforms the existing baselines by 8.06% and achieves state-of-the-art performance. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.08709v1-abstract-full').style.display = 'none'; document.getElementById('2408.08709v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 16 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">15 pages, 7 figures, preprint</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.06710</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">stat.ML</span> </div> </div> <p class="title is-5 mathjax"> Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Xu%2C+J">Jian Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Du%2C+S">Shian Du</a>, <a href="/search/cs?searchtype=author&amp;query=Yang%2C+J">Junmei Yang</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qianli Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Zeng%2C+D">Delu Zeng</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" 
id="2408.06710v1-abstract-short" style="display: inline;"> Gaussian Process Latent Variable Models (GPLVMs) have become increasingly popular for unsupervised tasks such as dimensionality reduction and missing data recovery due to their flexibility and non-linear nature. An importance-weighted version of the Bayesian GPLVMs has been proposed to obtain a tighter variational bound. However, this version of the approach is primarily limited to analyzing simpl&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.06710v1-abstract-full').style.display = 'inline'; document.getElementById('2408.06710v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.06710v1-abstract-full" style="display: none;"> Gaussian Process Latent Variable Models (GPLVMs) have become increasingly popular for unsupervised tasks such as dimensionality reduction and missing data recovery due to their flexibility and non-linear nature. An importance-weighted version of the Bayesian GPLVMs has been proposed to obtain a tighter variational bound. However, this version of the approach is primarily limited to analyzing simple data structures, as the generation of an effective proposal distribution can become quite challenging in high-dimensional spaces or with complex data sets. In this work, we propose an Annealed Importance Sampling (AIS) approach to address these issues. By transforming the posterior into a sequence of intermediate distributions using annealing, we combine the strengths of Sequential Monte Carlo samplers and VI to explore a wider range of posterior distributions and gradually approach the target distribution. We further propose an efficient algorithm by reparameterizing all variables in the evidence lower bound (ELBO). 
Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.06710v1-abstract-full').style.display = 'none'; document.getElementById('2408.06710v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.06567</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+B">Bo-Wen Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+L">Liangdong Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Yuan%2C+Y">Ye Yuan</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+J">Jijie Li</a>, <a href="/search/cs?searchtype=author&amp;query=Gu%2C+S">Shuhao Gu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+M">Mengdi Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Wu%2C+X">Xinya Wu</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+G">Guang Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Wu%2C+C">Chengwei Wu</a>, <a 
href="/search/cs?searchtype=author&amp;query=Zhao%2C+H">Hanyu Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Du%2C+L">Li Du</a>, <a href="/search/cs?searchtype=author&amp;query=Ju%2C+Y">Yiming Ju</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Quanyue Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Ao%2C+Y">Yulong Ao</a>, <a href="/search/cs?searchtype=author&amp;query=Zhao%2C+Y">Yingli Zhao</a>, <a href="/search/cs?searchtype=author&amp;query=Zhu%2C+S">Songhe Zhu</a>, <a href="/search/cs?searchtype=author&amp;query=Cao%2C+Z">Zhou Cao</a>, <a href="/search/cs?searchtype=author&amp;query=Liang%2C+D">Dong Liang</a>, <a href="/search/cs?searchtype=author&amp;query=Lin%2C+Y">Yonghua Lin</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+M">Ming Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+S">Shunfei Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Zhou%2C+Y">Yanxin Zhou</a>, <a href="/search/cs?searchtype=author&amp;query=Ye%2C+M">Min Ye</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+X">Xuekai Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Yu%2C+X">Xinyang Yu</a> , et al. (2 additional authors not shown) </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.06567v1-abstract-short" style="display: inline;"> In recent years, with the rapid application of large language models across various fields, the scale of these models has gradually increased, and the resources required for their pre-training have grown exponentially. 
Training an LLM from scratch will cost a lot of computation resources while scaling up from a smaller model is a more efficient approach and has thus attracted significant attention&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.06567v1-abstract-full').style.display = 'inline'; document.getElementById('2408.06567v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.06567v1-abstract-full" style="display: none;"> In recent years, with the rapid application of large language models across various fields, the scale of these models has gradually increased, and the resources required for their pre-training have grown exponentially. Training an LLM from scratch will cost a lot of computation resources while scaling up from a smaller model is a more efficient approach and has thus attracted significant attention. In this paper, we present AquilaMoE, a cutting-edge bilingual 8*16B Mixture of Experts (MoE) language model that has 8 experts with 16 billion parameters each and is developed using an innovative training methodology called EfficientScale. This approach optimizes performance while minimizing data requirements through a two-stage process. The first stage, termed Scale-Up, initializes the larger model with weights from a pre-trained smaller model, enabling substantial knowledge transfer and continuous pretraining with significantly less data. The second stage, Scale-Out, uses a pre-trained dense model to initialize the MoE experts, further enhancing knowledge transfer and performance. Extensive validation experiments on 1.8B and 7B models compared various initialization schemes, achieving models that maintain and reduce loss during continuous pretraining. 
Utilizing the optimal scheme, we successfully trained a 16B model and subsequently the 8*16B AquilaMoE model, demonstrating significant improvements in performance and training efficiency. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.06567v1-abstract-full').style.display = 'none'; document.getElementById('2408.06567v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 12 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.00981</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> <div class="is-inline-block" style="margin-left: 0.5rem"> <div class="tags has-addons"> <span class="tag is-dark is-size-7">doi</span> <span class="tag is-light is-size-7"><a class="" href="">10.18653/v1/2022.findings-acl.210 <i class="fa fa-external-link" aria-hidden="true"></i></a></span> </div> </div> </div> <p class="title is-5 mathjax"> Cross-domain Named Entity Recognition via Graph Matching </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zheng%2C+J">Junhao Zheng</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+H">Haibin Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qianli Ma</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.00981v2-abstract-short" style="display: inline;"> Cross-domain NER is a practical yet challenging problem because of data 
scarcity in real-world scenarios. A common practice is first to learn a NER model in a rich-resource general domain and then adapt the model to specific domains. Due to the mismatch problem between entity types across domains, the broad knowledge in the general domain cannot be effectively transferred to the target domain NER mode&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.00981v2-abstract-full').style.display = 'inline'; document.getElementById('2408.00981v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.00981v2-abstract-full" style="display: none;"> Cross-domain NER is a practical yet challenging problem because of data scarcity in real-world scenarios. A common practice is first to learn a NER model in a rich-resource general domain and then adapt the model to specific domains. Due to the mismatch problem between entity types across domains, the broad knowledge in the general domain cannot be effectively transferred to the target domain NER model. To this end, we model the label relationship as a probability distribution and construct label graphs in both source and target label spaces. To enhance the contextual representation with label structures, we fuse the label graph into the word embedding output by BERT. By representing label relationships as graphs, we formulate cross-domain NER as a graph matching problem. Furthermore, the proposed method is readily combined with pre-training methods and is potentially applicable to other cross-domain prediction tasks. Empirical results on four datasets show that our method outperforms a series of transfer learning, multi-task learning, and few-shot learning methods. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.00981v2-abstract-full').style.display = 'none'; document.getElementById('2408.00981v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 7 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 1 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Findings of ACL; available at Findings 2022; Improve presentation</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.21282</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">ps</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Human-Computer Interaction">cs.HC</span> </div> </div> <p class="title is-5 mathjax"> FedBChain: A Blockchain-enabled Federated Learning Framework for Improving DeepConvLSTM with Comparative Strategy Insights </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Li%2C+G">Gaoxuan Li</a>, <a href="/search/cs?searchtype=author&amp;query=Lim%2C+C+H">Chern Hong Lim</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qiyao Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Tang%2C+X">Xinyu Tang</a>, <a href="/search/cs?searchtype=author&amp;query=Tew%2C+H+H">Hwa Hui Tew</a>, <a href="/search/cs?searchtype=author&amp;query=Ding%2C+F">Fan Ding</a>, <a 
href="/search/cs?searchtype=author&amp;query=Luo%2C+X">Xuewen Luo</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.21282v2-abstract-short" style="display: inline;"> Recent research in the field of Human Activity Recognition has shown that an improvement in prediction performance can be achieved by reducing the number of LSTM layers. However, this kind of enhancement is only significant on monolithic architectures, and when it runs on large-scale distributed training, data security and privacy issues will be reconsidered, and its prediction performance is unkn&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.21282v2-abstract-full').style.display = 'inline'; document.getElementById('2407.21282v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.21282v2-abstract-full" style="display: none;"> Recent research in the field of Human Activity Recognition has shown that an improvement in prediction performance can be achieved by reducing the number of LSTM layers. However, this kind of enhancement is only significant on monolithic architectures, and when it runs on large-scale distributed training, data security and privacy issues will be reconsidered, and its prediction performance is unknown. In this paper, we introduce a novel framework: FedBChain, which integrates the federated learning paradigm based on a modified DeepConvLSTM architecture with a single LSTM layer. This framework performs comparative tests of prediction performance on three different real-world datasets based on three different hidden layer units (128, 256, and 512) combined with five different federated learning strategies, respectively. 
The results show that our architecture has significant improvements in Precision, Recall and F1-score compared to the centralized training approach on all datasets with all hidden layer units for all strategies: FedAvg strategy improves on average by 4.54%, FedProx improves on average by 4.57%, FedTrimmedAvg improves on average by 4.35%, Krum improves by 4.18% on average, and FedAvgM improves by 4.46% on average. Based on our results, it can be seen that FedBChain not only improves in performance, but also guarantees the security and privacy of user data compared to centralized training methods during the training process. The code for our experiments is publicly available ( <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.21282v2-abstract-full').style.display = 'none'; document.getElementById('2407.21282v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 7 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 30 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.10374</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> An Empirical Study of Mamba-based Pedestrian Attribute Recognition </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Wang%2C+X">Xiao Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Kong%2C+W">Weizhe Kong</a>, <a href="/search/cs?searchtype=author&amp;query=Jin%2C+J">Jiandong Jin</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+S">Shiao Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Gao%2C+R">Ruichong Gao</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qingchuan Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+C">Chenglong Li</a>, <a href="/search/cs?searchtype=author&amp;query=Tang%2C+J">Jin Tang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.10374v1-abstract-short" style="display: inline;"> Current strong pedestrian attribute recognition models are developed based on Transformer networks, which are computationally heavy. Recently proposed models with linear complexity (e.g., Mamba) have garnered significant attention and have achieved a good balance between accuracy and computational cost across a variety of visual tasks. 
Relevant review articles also suggest that while these models&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.10374v1-abstract-full').style.display = 'inline'; document.getElementById('2407.10374v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.10374v1-abstract-full" style="display: none;"> Current strong pedestrian attribute recognition models are developed based on Transformer networks, which are computationally heavy. Recently proposed models with linear complexity (e.g., Mamba) have garnered significant attention and have achieved a good balance between accuracy and computational cost across a variety of visual tasks. Relevant review articles also suggest that while these models can perform well on some pedestrian attribute recognition datasets, they are generally weaker than the corresponding Transformer models. To further tap into the potential of the novel Mamba architecture for PAR tasks, this paper designs and adapts Mamba into two typical PAR frameworks, i.e., the text-image fusion approach and the pure vision Mamba multi-label recognition framework. It is found that interacting with attribute tags as additional input does not always lead to an improvement: specifically, Vim can be enhanced, but VMamba cannot. This paper further designs various hybrid Mamba-Transformer variants and conducts thorough experimental validations. These experimental results indicate that simply enhancing Mamba with a Transformer does not always lead to performance improvements but yields better results under certain settings. We hope this empirical study can further inspire research in Mamba for PAR, and even extend into the domain of multi-label recognition, through the design of these network structures and comprehensive experimentation. 
The source code of this work will be released at \url{} <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.10374v1-abstract-full').style.display = 'none'; document.getElementById('2407.10374v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 14 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">In Peer Review</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.17438</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qi Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Paudel%2C+D+P">Danda Pani Paudel</a>, <a href="/search/cs?searchtype=author&amp;query=Konukoglu%2C+E">Ender Konukoglu</a>, <a href="/search/cs?searchtype=author&amp;query=Van+Gool%2C+L">Luc Van Gool</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.17438v1-abstract-short" style="display: inline;"> Neural implicit functions have demonstrated significant importance in various areas such as computer vision, graphics. 
Their advantages include the ability to represent complex shapes and scenes with high fidelity, smooth interpolation capabilities, and continuous representations. Despite these benefits, the development and analysis of implicit functions have been limited by the lack of comprehens&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.17438v1-abstract-full').style.display = 'inline'; document.getElementById('2406.17438v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.17438v1-abstract-full" style="display: none;"> Neural implicit functions have demonstrated significant importance in various areas such as computer vision and graphics. Their advantages include the ability to represent complex shapes and scenes with high fidelity, smooth interpolation capabilities, and continuous representations. Despite these benefits, the development and analysis of implicit functions have been limited by the lack of comprehensive datasets and the substantial computational resources required for their implementation and evaluation. To address these challenges, we introduce &#34;Implicit-Zoo&#34;: a large-scale dataset requiring thousands of GPU training days designed to facilitate research and development in this field. Our dataset includes diverse 2D and 3D scenes, such as CIFAR-10, ImageNet-1K, and Cityscapes for 2D image tasks, and the OmniObject3D dataset for 3D vision tasks. We ensure high quality through strict checks, refining or filtering out low-quality data. Using Implicit-Zoo, we showcase two immediate benefits: it enables one to (1) learn token locations for transformer models and (2) directly regress 3D camera poses of 2D images with respect to NeRF models. This in turn leads to improved performance in all three tasks of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.17438v1-abstract-full').style.display = 'none'; document.getElementById('2406.17438v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 25 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.16079</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> EERPD: Leveraging Emotion and Emotion Regulation for Improving Personality Detection </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Li%2C+Z">Zheng Li</a>, <a href="/search/cs?searchtype=author&amp;query=Zhu%2C+D">Dawei Zhu</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qilong Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Xiong%2C+W">Weimin Xiong</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+S">Sujian Li</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.16079v1-abstract-short" style="display: inline;"> Personality is a fundamental construct in psychology, reflecting an individual&#39;s behavior, thinking, and emotional patterns. Previous researches have made some progress in personality detection, primarily by utilizing the whole text to predict personality. 
However, these studies generally tend to overlook psychological knowledge: they rarely apply the well-established correlations between emotion&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.16079v1-abstract-full').style.display = 'inline'; document.getElementById('2406.16079v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.16079v1-abstract-full" style="display: none;"> Personality is a fundamental construct in psychology, reflecting an individual&#39;s behavior, thinking, and emotional patterns. Previous research has made some progress in personality detection, primarily by utilizing the whole text to predict personality. However, these studies generally tend to overlook psychological knowledge: they rarely apply the well-established correlations between emotion regulation and personality. Based on this, we propose a new personality detection method called EERPD. This method introduces the use of emotion regulation, a psychological concept highly correlated with personality, for personality prediction. By combining this feature with emotion features, it retrieves few-shot examples and provides process CoTs for inferring labels from text. This approach enhances the LLM&#39;s understanding of personality in text and improves performance in personality detection. Experimental results demonstrate that EERPD significantly enhances the accuracy and robustness of personality detection, outperforming previous SOTA by 15.05/4.29 in average F1 on the two benchmark datasets. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.16079v1-abstract-full').style.display = 'none'; document.getElementById('2406.16079v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 23 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.15000</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+L">Lichao Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Yu%2C+J">Jia Yu</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+S">Shuai Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+L">Long Li</a>, <a href="/search/cs?searchtype=author&amp;query=Zhong%2C+Y">Yangyang Zhong</a>, <a href="/search/cs?searchtype=author&amp;query=Liang%2C+G">Guanbao Liang</a>, <a href="/search/cs?searchtype=author&amp;query=Yan%2C+Y">Yuming Yan</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qing Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Weng%2C+F">Fangsheng Weng</a>, <a href="/search/cs?searchtype=author&amp;query=Pan%2C+F">Fayu Pan</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+J">Jing Li</a>, <a 
href="/search/cs?searchtype=author&amp;query=Xu%2C+R">Renjun Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Lan%2C+Z">Zhenzhong Lan</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.15000v1-abstract-short" style="display: inline;"> Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We cond&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.15000v1-abstract-full').style.display = 'inline'; document.getElementById('2406.15000v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.15000v1-abstract-full" style="display: none;"> Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We conduct a comprehensive analysis using a diverse set of chatbots and real-user interaction data, employing metrics such as retention rate and conversation length to evaluate user engagement. Our findings reveal a significant enhancement in user engagement with multi-modal interactions compared to text-only dialogues. Notably, the incorporation of a third modality significantly amplifies engagement beyond the benefits observed with just two modalities. 
These results suggest that multi-modal interactions optimize cognitive processing and facilitate richer information comprehension. This study underscores the importance of multi-modality in chatbot design, offering valuable insights for creating more engaging and immersive AI communication experiences and informing the broader AI community about the benefits of multi-modal interactions in enhancing user engagement. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.15000v1-abstract-full').style.display = 'none'; document.getElementById('2406.15000v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.14880</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Logic in Computer Science">cs.LO</span> </div> </div> <p class="title is-5 mathjax"> Pathformer: Recursive Path Query Encoding for Complex Logical Query Answering </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+C">Chongzhi Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Peng%2C+Z">Zhiping Peng</a>, <a href="/search/cs?searchtype=author&amp;query=Zheng%2C+J">Junhao Zheng</a>, <a href="/search/cs?searchtype=author&amp;query=Wang%2C+L">Linghao Wang</a>, <a href="/search/cs?searchtype=author&amp;query=Shi%2C+R">Ruifeng Shi</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qianli Ma</a> </p> <p 
class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.14880v1-abstract-short" style="display: inline;"> Complex Logical Query Answering (CLQA) over incomplete knowledge graphs is a challenging task. Recently, Query Embedding (QE) methods are proposed to solve CLQA by performing multi-hop logical reasoning. However, most of them only consider historical query context information while ignoring future information, which leads to their failure to capture the complex dependencies behind the elements of&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.14880v1-abstract-full').style.display = 'inline'; document.getElementById('2406.14880v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.14880v1-abstract-full" style="display: none;"> Complex Logical Query Answering (CLQA) over incomplete knowledge graphs is a challenging task. Recently, Query Embedding (QE) methods are proposed to solve CLQA by performing multi-hop logical reasoning. However, most of them only consider historical query context information while ignoring future information, which leads to their failure to capture the complex dependencies behind the elements of a query. In recent years, the transformer architecture has shown a strong ability to model long-range dependencies between words. The bidirectional attention mechanism proposed by the transformer can solve the limitation of these QE methods regarding query context. Still, as a sequence model, it is difficult for the transformer to model complex logical queries with branch structure computation graphs directly. To this end, we propose a neural one-point embedding method called Pathformer based on the tree-like computation graph, i.e., query computation tree. 
Specifically, Pathformer decomposes the query computation tree into path query sequences by branches and then uses the transformer encoder to recursively encode these path query sequences to obtain the final query embedding. This allows Pathformer to fully utilize future context information to explicitly model the complex interactions between various parts of the path query. Experimental results show that Pathformer outperforms existing competitive neural QE methods, and we found that Pathformer has the potential to be applied to non-one-point embedding space. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.14880v1-abstract-full').style.display = 'none'; document.getElementById('2406.14880v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">This work has been submitted to the IEEE</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.09701</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Software Engineering">cs.SE</span> </div> </div> <p class="title is-5 mathjax"> Towards Effectively Detecting and Explaining Vulnerabilities Using Large Language Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Mao%2C+Q">Qiheng Mao</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+Z">Zhenhao Li</a>, <a href="/search/cs?searchtype=author&amp;query=Hu%2C+X">Xing Hu</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+K">Kui Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Xia%2C+X">Xin Xia</a>, <a href="/search/cs?searchtype=author&amp;query=Sun%2C+J">Jianling Sun</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.09701v2-abstract-short" style="display: inline;"> Software vulnerabilities pose significant risks to the security and integrity of software systems. Prior studies have proposed various approaches to vulnerability detection using deep learning or pre-trained models. 
However, there is still a lack of detailed explanations for understanding vulnerabilities beyond merely detecting their occurrence, which fails to truly help software developers unders&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.09701v2-abstract-full').style.display = 'inline'; document.getElementById('2406.09701v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.09701v2-abstract-full" style="display: none;"> Software vulnerabilities pose significant risks to the security and integrity of software systems. Prior studies have proposed various approaches to vulnerability detection using deep learning or pre-trained models. However, there is still a lack of detailed explanations for understanding vulnerabilities beyond merely detecting their occurrence, which fails to truly help software developers understand and remediate the issues. Recently, large language models (LLMs) have demonstrated remarkable capabilities in comprehending complex contexts and generating content, presenting new opportunities for both detecting and explaining software vulnerabilities. In this paper, we conduct a comprehensive study to investigate the capabilities of LLMs in both detecting and explaining vulnerabilities, and we propose LLMVulExp, a framework that utilizes LLMs for these tasks. Under specialized fine-tuning for vulnerability explanation, our LLMVulExp not only detects the types of vulnerabilities in the code but also analyzes the code context to generate the cause, location, and repair suggestions for these vulnerabilities. These detailed explanations are crucial for helping developers quickly analyze and locate vulnerability issues, providing essential guidance and reference for effective remediation. 
We find that LLMVulExp can effectively enable the LLMs to perform vulnerability detection (e.g., achieving over a 90\% F1 score on the SeVC dataset) and provide detailed explanations. We also explore the potential of using advanced strategies such as Chain-of-Thought (CoT) to guide the LLMs in concentrating on vulnerability-prone code, achieving promising results. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.09701v2-abstract-full').style.display = 'none'; document.getElementById('2406.09701v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 8 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 14 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.07385</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Science and Game Theory">cs.GT</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computational Complexity">cs.CC</span> </div> </div> <p class="title is-5 mathjax"> Disrupting Bipartite Trading Networks: Matching for Revenue Maximization </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=D%27Amico-Wong%2C+L">Luca D&#39;Amico-Wong</a>, <a href="/search/cs?searchtype=author&amp;query=Gonczarowski%2C+Y+A">Yannai A. Gonczarowski</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+G+Q">Gary Qiurui Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Parkes%2C+D+C">David C. 
Parkes</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.07385v1-abstract-short" style="display: inline;"> We model the role of an online platform disrupting a market with unit-demand buyers and unit-supply sellers. Each seller can transact with a subset of the buyers whom she already knows, as well as with any additional buyers to whom she is introduced by the platform. Given these constraints on trade, prices and transactions are induced by a competitive equilibrium. The platform&#39;s revenue is proport&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.07385v1-abstract-full').style.display = 'inline'; document.getElementById('2406.07385v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.07385v1-abstract-full" style="display: none;"> We model the role of an online platform disrupting a market with unit-demand buyers and unit-supply sellers. Each seller can transact with a subset of the buyers whom she already knows, as well as with any additional buyers to whom she is introduced by the platform. Given these constraints on trade, prices and transactions are induced by a competitive equilibrium. The platform&#39;s revenue is proportional to the total price of all trades between platform-introduced buyers and sellers. In general, we show that the platform&#39;s revenue-maximization problem is computationally intractable. We provide structural results for revenue-optimal matchings and isolate special cases in which the platform can efficiently compute them. Furthermore, in a market where the maximum increase in social welfare that the platform can create is $ΔW$, we prove that the platform can attain revenue $Ω(ΔW/\log(\min\{n,m\}))$, where $n$ and $m$ are the numbers of buyers and sellers, respectively. 
When $ΔW$ is large compared to welfare without the platform, this gives a polynomial-time algorithm that guarantees a logarithmic approximation of the optimal welfare as revenue. We also show that even when the platform optimizes for revenue, the social welfare is at least an $O(\log(\min\{n,m\}))$-approximation to the optimal welfare. Finally, we prove significantly stronger bounds for revenue and social welfare in homogeneous-goods markets. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.07385v1-abstract-full').style.display = 'none'; document.getElementById('2406.07385v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 11 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted at the Twenty-Fifth ACM Conference on Economics and Computation (EC&#39;24), 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.06391</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Towards Lifelong Learning of Large Language Models: A Survey </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zheng%2C+J">Junhao Zheng</a>, <a href="/search/cs?searchtype=author&amp;query=Qiu%2C+S">Shengjie Qiu</a>, <a 
href="/search/cs?searchtype=author&amp;query=Shi%2C+C">Chengming Shi</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qianli Ma</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.06391v1-abstract-short" style="display: inline;"> As the applications of large language models (LLMs) expand across diverse fields, the ability of these models to adapt to ongoing changes in data, tasks, and user preferences becomes crucial. Traditional training methods, relying on static datasets, are increasingly inadequate for coping with the dynamic nature of real-world information. Lifelong learning, also known as continual or incremental le&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.06391v1-abstract-full').style.display = 'inline'; document.getElementById('2406.06391v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.06391v1-abstract-full" style="display: none;"> As the applications of large language models (LLMs) expand across diverse fields, the ability of these models to adapt to ongoing changes in data, tasks, and user preferences becomes crucial. Traditional training methods, relying on static datasets, are increasingly inadequate for coping with the dynamic nature of real-world information. Lifelong learning, also known as continual or incremental learning, addresses this challenge by enabling LLMs to learn continuously and adaptively over their operational lifetime, integrating new knowledge while retaining previously learned information and preventing catastrophic forgetting. This survey delves into the sophisticated landscape of lifelong learning, categorizing strategies into two primary groups: Internal Knowledge and External Knowledge. 
Internal Knowledge includes continual pretraining and continual finetuning, each enhancing the adaptability of LLMs in various scenarios. External Knowledge encompasses retrieval-based and tool-based lifelong learning, leveraging external data sources and computational tools to extend the model&#39;s capabilities without modifying core parameters. The key contributions of our survey are: (1) Introducing a novel taxonomy categorizing the extensive literature of lifelong learning into 12 scenarios; (2) Identifying common techniques across all lifelong learning scenarios and classifying existing literature into various technique groups within each scenario; (3) Highlighting emerging techniques such as model expansion and data selection, which were less explored in the pre-LLM era. Through a detailed examination of these groups and their respective categories, this survey aims to enhance the adaptability, reliability, and overall performance of LLMs in real-world applications. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.06391v1-abstract-full').style.display = 'none'; document.getElementById('2406.06391v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">37 pages</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.03625</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+Y">Yan Zhang</a>, <a href="/search/cs?searchtype=author&amp;query=Prokudin%2C+S">Sergey Prokudin</a>, <a href="/search/cs?searchtype=author&amp;query=Mihajlovic%2C+M">Marko Mihajlovic</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qianli Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Tang%2C+S">Siyu Tang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.03625v1-abstract-short" style="display: inline;"> Understanding the dynamics of generic 3D scenes is fundamentally challenging in computer vision, essential in enhancing applications related to scene reconstruction, motion tracking, and avatar creation. In this work, we address the task as the problem of inferring dense, long-range motion of 3D points. 
By observing a set of point trajectories, we aim to learn an implicit motion field parameterize&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.03625v1-abstract-full').style.display = 'inline'; document.getElementById('2406.03625v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.03625v1-abstract-full" style="display: none;"> Understanding the dynamics of generic 3D scenes is fundamentally challenging in computer vision, essential in enhancing applications related to scene reconstruction, motion tracking, and avatar creation. In this work, we address the task as the problem of inferring dense, long-range motion of 3D points. By observing a set of point trajectories, we aim to learn an implicit motion field parameterized by a neural network to predict the movement of novel points within the same domain, without relying on any data-driven or scene-specific priors. To achieve this, our approach builds upon the recently introduced dynamic point field model that learns smooth deformation fields between the canonical frame and individual observation frames. However, temporal consistency between consecutive frames is neglected, and the number of required parameters increases linearly with the sequence length due to per-frame modeling. To address these shortcomings, we exploit the intrinsic regularization provided by SIREN, and modify the input layer to produce a spatiotemporally smooth motion field. Additionally, we analyze the motion field Jacobian matrix, and discover that the motion degrees of freedom (DOFs) in an infinitesimal area around a point and the network hidden variables have different behaviors to affect the model&#39;s representational power. This enables us to improve the model representation capability while retaining the model compactness. 
Furthermore, to reduce the risk of overfitting, we introduce a regularization term based on the assumption of piece-wise motion smoothness. Our experiments assess the model&#39;s performance in predicting unseen point trajectories and its application in temporal mesh alignment with guidance. The results demonstrate its superiority and effectiveness. The code and data for the project are publicly available: \url{} <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.03625v1-abstract-full').style.display = 'none'; document.getElementById('2406.03625v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">cvpr24 post camera ready</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.02377</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Information Retrieval">cs.IR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> XRec: Large Language Models for Explainable Recommendation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qiyao Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Ren%2C+X">Xubin Ren</a>, <a 
href="/search/cs?searchtype=author&amp;query=Huang%2C+C">Chao Huang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.02377v2-abstract-short" style="display: inline;"> Recommender systems help users navigate information overload by providing personalized recommendations aligned with their preferences. Collaborative Filtering (CF) is a widely adopted approach, but while advanced techniques like graph neural networks (GNNs) and self-supervised learning (SSL) have enhanced CF models for better user representations, they often lack the ability to provide explanation&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.02377v2-abstract-full').style.display = 'inline'; document.getElementById('2406.02377v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.02377v2-abstract-full" style="display: none;"> Recommender systems help users navigate information overload by providing personalized recommendations aligned with their preferences. Collaborative Filtering (CF) is a widely adopted approach, but while advanced techniques like graph neural networks (GNNs) and self-supervised learning (SSL) have enhanced CF models for better user representations, they often lack the ability to provide explanations for the recommended items. Explainable recommendations aim to address this gap by offering transparency and insights into the recommendation decision-making process, enhancing users&#39; understanding. This work leverages the language capabilities of Large Language Models (LLMs) to push the boundaries of explainable recommender systems. We introduce a model-agnostic framework called XRec, which enables LLMs to provide comprehensive explanations for user behaviors in recommender systems. 
By integrating collaborative signals and designing a lightweight collaborative adaptor, the framework empowers LLMs to understand complex patterns in user-item interactions and gain a deeper understanding of user preferences. Our extensive experiments demonstrate the effectiveness of XRec, showcasing its ability to generate comprehensive and meaningful explanations that outperform baseline approaches in explainable recommender systems. We open-source our model implementation at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.02377v2-abstract-full').style.display = 'none'; document.getElementById('2406.02377v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 22 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 4 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted to EMNLP 2024 Findings</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.20710</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Information Retrieval">cs.IR</span> </div> </div> <p class="title is-5 mathjax"> Information Maximization via Variational Autoencoders for Cross-Domain Recommendation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ning%2C+X">Xuying Ning</a>, <a href="/search/cs?searchtype=author&amp;query=Xu%2C+W">Wujiang Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Liu%2C+X">Xiaolei Liu</a>, <a href="/search/cs?searchtype=author&amp;query=Ha%2C+M">Mingming Ha</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qiongxu Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+Y">Youru Li</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+L">Linxun Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Zhang%2C+Y">Yongfeng Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.20710v1-abstract-short" style="display: inline;"> Cross-Domain Sequential Recommendation (CDSR) methods aim to address the data sparsity and cold-start problems present in Single-Domain Sequential Recommendation (SDSR). Existing CDSR methods typically rely on overlapping users, designing complex cross-domain modules to capture users&#39; latent interests that can propagate across different domains. 
However, their propagated informative information is&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.20710v1-abstract-full').style.display = 'inline'; document.getElementById('2405.20710v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.20710v1-abstract-full" style="display: none;"> Cross-Domain Sequential Recommendation (CDSR) methods aim to address the data sparsity and cold-start problems present in Single-Domain Sequential Recommendation (SDSR). Existing CDSR methods typically rely on overlapping users, designing complex cross-domain modules to capture users&#39; latent interests that can propagate across different domains. However, their propagated informative information is limited to the overlapping users and the users who have rich historical behavior records. As a result, these methods often underperform in real-world scenarios, where most users are non-overlapping (cold-start) and long-tailed. In this research, we introduce a new CDSR framework named Information Maximization Variational Autoencoder (\textbf{\texttt{IM-VAE}}). Here, we suggest using a Pseudo-Sequence Generator to enhance the user&#39;s interaction history input for downstream fine-grained CDSR models to alleviate the cold-start issues. We also propose a Generative Recommendation Framework combined with three regularizers inspired by the mutual information maximization (MIM) theory \cite{mcgill1954multivariate} to capture the semantic differences between a user&#39;s interests shared across domains and those specific to certain domains, as well as address the informational gap between a user&#39;s actual interaction sequences and the pseudo-sequences generated. To the best of our knowledge, this paper is the first CDSR work that considers the information disentanglement and denoising of pseudo-sequences in the open-world recommendation scenario. 
Empirical experiments illustrate that \texttt{IM-VAE} outperforms the state-of-the-art approaches on two real-world cross-domain datasets on all sorts of users, including cold-start and tailed users, demonstrating the effectiveness of \texttt{IM-VAE} in open-world recommendation. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.20710v1-abstract-full').style.display = 'none'; document.getElementById('2405.20710v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 31 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.20336</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Sound">cs.SD</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Audio and Speech Processing">eess.AS</span> </div> </div> <p class="title is-5 mathjax"> RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Chen%2C+J">Jiaben Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Yan%2C+X">Xin Yan</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+Y">Yihang Chen</a>, <a href="/search/cs?searchtype=author&amp;query=Cen%2C+S">Siyuan Cen</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qinwei Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Zhen%2C+H">Haoyu Zhen</a>, <a 
href="/search/cs?searchtype=author&amp;query=Qian%2C+K">Kaizhi Qian</a>, <a href="/search/cs?searchtype=author&amp;query=Lu%2C+L">Lie Lu</a>, <a href="/search/cs?searchtype=author&amp;query=Gan%2C+C">Chuang Gan</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.20336v1-abstract-short" style="display: inline;"> In this work, we introduce a challenging task for simultaneously generating 3D holistic body motions and singing vocals directly from textual lyrics inputs, advancing beyond existing works that typically address these two modalities in isolation. To facilitate this, we first collect the RapVerse dataset, a large dataset containing synchronous rapping vocals, lyrics, and high-quality 3D holistic bo&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.20336v1-abstract-full').style.display = 'inline'; document.getElementById('2405.20336v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.20336v1-abstract-full" style="display: none;"> In this work, we introduce a challenging task for simultaneously generating 3D holistic body motions and singing vocals directly from textual lyrics inputs, advancing beyond existing works that typically address these two modalities in isolation. To facilitate this, we first collect the RapVerse dataset, a large dataset containing synchronous rapping vocals, lyrics, and high-quality 3D holistic body meshes. With the RapVerse dataset, we investigate the extent to which scaling autoregressive multimodal transformers across language, audio, and motion can enhance the coherent and realistic generation of vocals and whole-body human motions. 
For modality unification, a vector-quantized variational autoencoder is employed to encode whole-body motion sequences into discrete motion tokens, while a vocal-to-unit model is leveraged to obtain quantized audio tokens preserving content, prosodic information, and singer identity. By jointly performing transformer modeling on these three modalities in a unified way, our framework ensures a seamless and realistic blend of vocals and human motions. Extensive experiments demonstrate that our unified generation framework not only produces coherent and realistic singing vocals alongside human motions directly from textual inputs but also rivals the performance of specialized single-modality generation systems, establishing new benchmarks for joint vocal-motion generation. The project page is available for research purposes at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.20336v1-abstract-full').style.display = 'none'; document.getElementById('2405.20336v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 30 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Project website:</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.15403</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">stat.ML</span> </div> </div> <p class="title is-5 mathjax"> Fine-Grained Dynamic Framework for Bias-Variance Joint Optimization on Data Missing Not at Random </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Ha%2C+M">Mingming Ha</a>, <a href="/search/cs?searchtype=author&amp;query=Tao%2C+X">Xuewen Tao</a>, <a href="/search/cs?searchtype=author&amp;query=Lin%2C+W">Wenfang Lin</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qionxu Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Xu%2C+W">Wujiang Xu</a>, <a href="/search/cs?searchtype=author&amp;query=Chen%2C+L">Linxun Chen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.15403v1-abstract-short" style="display: inline;"> In most practical applications such as recommendation systems, display advertising, and so forth, the collected data often contains missing values and those missing values are generally missing-not-at-random, which deteriorates the prediction performance of models. Some existing estimators and regularizers attempt to achieve unbiased estimation to improve the predictive performance. 
However, varia&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.15403v1-abstract-full').style.display = 'inline'; document.getElementById('2405.15403v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.15403v1-abstract-full" style="display: none;"> In most practical applications such as recommendation systems, display advertising, and so forth, the collected data often contains missing values and those missing values are generally missing-not-at-random, which deteriorates the prediction performance of models. Some existing estimators and regularizers attempt to achieve unbiased estimation to improve the predictive performance. However, the variances and generalization bounds of these methods are generally unbounded when the propensity scores tend to zero, compromising their stability and robustness. In this paper, we first theoretically reveal the limitations of regularization techniques. We further show that, for more general estimators, unbiasedness will inevitably lead to unbounded variance. These general laws suggest that estimator design is not merely about eliminating bias, reducing variance, or simply achieving a bias-variance trade-off. Instead, it involves a quantitative joint optimization of bias and variance. Then, we develop a systematic fine-grained dynamic learning framework to jointly optimize bias and variance, which adaptively selects an appropriate estimator for each user-item pair according to the predefined objective function. With this operation, the generalization bounds and variances of models are reduced and bounded with theoretical guarantees. Extensive experiments are conducted to verify the theoretical results and the effectiveness of the proposed dynamic learning framework. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.15403v1-abstract-full').style.display = 'none'; document.getElementById('2405.15403v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 24 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.15124</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Scaling Law for Time Series Forecasting </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shi%2C+J">Jingzhe Shi</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+Q">Qinwei Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Ma%2C+H">Huan Ma</a>, <a href="/search/cs?searchtype=author&amp;query=Li%2C+L">Lei Li</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.15124v4-abstract-short" style="display: inline;"> Scaling law that rewards large datasets, complex models and enhanced data granularity has been observed in various fields of deep learning. 
Yet, studies on time series forecasting have cast doubt on scaling behaviors of deep learning methods for time series forecasting: while more training data improves performance, more capable models do not always outperform less capable models, and longer input&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.15124v4-abstract-full').style.display = 'inline'; document.getElementById('2405.15124v4-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.15124v4-abstract-full" style="display: none;"> Scaling law that rewards large datasets, complex models and enhanced data granularity has been observed in various fields of deep learning. Yet, studies on time series forecasting have cast doubt on scaling behaviors of deep learning methods for time series forecasting: while more training data improves performance, more capable models do not always outperform less capable models, and longer input horizons may hurt performance for some models. We propose a theory for scaling law for time series forecasting that can explain these seemingly abnormal behaviors. We take into account the impact of dataset size and model complexity, as well as time series data granularity, particularly focusing on the look-back horizon, an aspect that has been unexplored in previous theories. Furthermore, we empirically evaluate various models using a diverse set of time series forecasting datasets, which (1) verifies the validity of scaling law on dataset size and model complexity within the realm of time series forecasting, and (2) validates our theoretical framework, particularly regarding the influence of the look-back horizon. We hope our findings may inspire new models targeting time series forecasting datasets of limited size, as well as large foundational datasets and models for time series forecasting in future work. 
Code for our experiments has been made public at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.15124v4-abstract-full').style.display = 'none'; document.getElementById('2405.15124v4-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 9 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 23 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted by NeurIPS 2024</span> </p> </li> </ol> <nav class="pagination is-small is-centered breathe-horizontal" role="navigation" aria-label="pagination"> <a href="" class="pagination-previous is-invisible">Previous </a> <a 