class="breathe-horizontal" start="1"> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.17077</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Contrastive CFG: Improving CFG in Diffusion Models by Contrasting Positive and Negative Concepts </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jinho Chang</a>, <a href="/search/cs?searchtype=author&query=Chung%2C+H">Hyungjin Chung</a>, <a href="/search/cs?searchtype=author&query=Ye%2C+J+C">Jong Chul Ye</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.17077v1-abstract-short" style="display: inline;"> As Classifier-Free Guidance (CFG) has proven effective in conditional diffusion model sampling for improved condition alignment, many applications use a negated CFG term to filter out unwanted features from samples. However, simply negating CFG guidance creates an inverted probability distribution, often distorting samples away from the marginal distribution. 
Inspired by recent advances in conditi… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.17077v1-abstract-full').style.display = 'inline'; document.getElementById('2411.17077v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.17077v1-abstract-full" style="display: none;"> As Classifier-Free Guidance (CFG) has proven effective in conditional diffusion model sampling for improved condition alignment, many applications use a negated CFG term to filter out unwanted features from samples. However, simply negating CFG guidance creates an inverted probability distribution, often distorting samples away from the marginal distribution. Inspired by recent advances in conditional diffusion models for inverse problems, here we present a novel method to enhance negative CFG guidance using contrastive loss. Specifically, our guidance term aligns or repels the denoising direction based on the given condition through contrastive loss, achieving a nearly identical guiding direction to traditional CFG for positive guidance while overcoming the limitations of existing negative guidance methods. Experimental results demonstrate that our approach effectively removes undesirable concepts while maintaining sample quality across diverse scenarios, from simple class conditions to complex and overlapping text prompts. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.17077v1-abstract-full').style.display = 'none'; document.getElementById('2411.17077v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 25 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">14 pages, 8 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.14733</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Systems and Control">eess.SY</span> </div> </div> <p class="title is-5 mathjax"> FLARE: FP-Less PTQ and Low-ENOB ADC Based AMS-PiM for Error-Resilient, Fast, and Efficient Transformer Acceleration </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Yi%2C+D">Donghyeon Yi</a>, <a href="/search/cs?searchtype=author&query=Lee%2C+S">Seoyoung Lee</a>, <a href="/search/cs?searchtype=author&query=Kim%2C+J">Jongho Kim</a>, <a href="/search/cs?searchtype=author&query=Kim%2C+J">Junyoung Kim</a>, <a href="/search/cs?searchtype=author&query=Ha%2C+S">Sohmyung Ha</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+I+J">Ik Joon Chang</a>, <a href="/search/cs?searchtype=author&query=Je%2C+M">Minkyu Je</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.14733v1-abstract-short" style="display: inline;"> Encoder-based transformers, powered by self-attention layers, have revolutionized machine learning with their context-aware representations. However, their quadratic growth in computational and memory demands presents significant bottlenecks. 
Analog-Mixed-Signal Process-in-Memory (AMS-PiM) architectures address these challenges by enabling efficient on-chip processing. Traditionally, AMS-PiM relie… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.14733v1-abstract-full').style.display = 'inline'; document.getElementById('2411.14733v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.14733v1-abstract-full" style="display: none;"> Encoder-based transformers, powered by self-attention layers, have revolutionized machine learning with their context-aware representations. However, their quadratic growth in computational and memory demands presents significant bottlenecks. Analog-Mixed-Signal Process-in-Memory (AMS-PiM) architectures address these challenges by enabling efficient on-chip processing. Traditionally, AMS-PiM relies on Quantization-Aware Training (QAT), which is hardware-efficient but requires extensive retraining to adapt models to AMS-PiMs, making it increasingly impractical for transformer models. Post-Training Quantization (PTQ) mitigates this training overhead but introduces significant hardware inefficiencies. PTQ relies on dequantization-quantization (DQ-Q) processes, floating-point units (FPUs), and high-ENOB (Effective Number of Bits) analog-to-digital converters (ADCs). In particular, high-ENOB ADCs scale exponentially in area and energy ($2^{ENOB}$), reduce sensing margins, and increase susceptibility to process, voltage, and temperature (PVT) variations, further compounding PTQ's challenges in AMS-PiM systems. To overcome these limitations, we propose RAP, an AMS-PiM architecture that eliminates DQ-Q processes, introduces FPU- and division-free nonlinear processing, and employs a low-ENOB-ADC-based sparse matrix-vector multiplication technique. 
Using the proposed techniques, RAP improves error resiliency, area/energy efficiency, and computational speed while preserving numerical stability. Experimental results demonstrate that RAP outperforms state-of-the-art GPUs and conventional PiM architectures in energy efficiency, latency, and accuracy, making it a scalable solution for the efficient deployment of transformers. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.14733v1-abstract-full').style.display = 'none'; document.getElementById('2411.14733v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 22 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.14199</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Digital Libraries">cs.DL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Information Retrieval">cs.IR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Asai%2C+A">Akari Asai</a>, <a href="/search/cs?searchtype=author&query=He%2C+J">Jacqueline He</a>, <a href="/search/cs?searchtype=author&query=Shao%2C+R">Rulin 
Shao</a>, <a href="/search/cs?searchtype=author&query=Shi%2C+W">Weijia Shi</a>, <a href="/search/cs?searchtype=author&query=Singh%2C+A">Amanpreet Singh</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J+C">Joseph Chee Chang</a>, <a href="/search/cs?searchtype=author&query=Lo%2C+K">Kyle Lo</a>, <a href="/search/cs?searchtype=author&query=Soldaini%2C+L">Luca Soldaini</a>, <a href="/search/cs?searchtype=author&query=Feldman%2C+S">Sergey Feldman</a>, <a href="/search/cs?searchtype=author&query=D%27arcy%2C+M">Mike D'arcy</a>, <a href="/search/cs?searchtype=author&query=Wadden%2C+D">David Wadden</a>, <a href="/search/cs?searchtype=author&query=Latzke%2C+M">Matt Latzke</a>, <a href="/search/cs?searchtype=author&query=Tian%2C+M">Minyang Tian</a>, <a href="/search/cs?searchtype=author&query=Ji%2C+P">Pan Ji</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+S">Shengyan Liu</a>, <a href="/search/cs?searchtype=author&query=Tong%2C+H">Hao Tong</a>, <a href="/search/cs?searchtype=author&query=Wu%2C+B">Bohao Wu</a>, <a href="/search/cs?searchtype=author&query=Xiong%2C+Y">Yanyu Xiong</a>, <a href="/search/cs?searchtype=author&query=Zettlemoyer%2C+L">Luke Zettlemoyer</a>, <a href="/search/cs?searchtype=author&query=Neubig%2C+G">Graham Neubig</a>, <a href="/search/cs?searchtype=author&query=Weld%2C+D">Dan Weld</a>, <a href="/search/cs?searchtype=author&query=Downey%2C+D">Doug Downey</a>, <a href="/search/cs?searchtype=author&query=Yih%2C+W">Wen-tau Yih</a>, <a href="/search/cs?searchtype=author&query=Koh%2C+P+W">Pang Wei Koh</a>, <a href="/search/cs?searchtype=author&query=Hajishirzi%2C+H">Hannaneh Hajishirzi</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.14199v1-abstract-short" style="display: inline;"> Scientific progress depends on researchers' ability to synthesize the growing body of literature. 
Can large language models (LMs) assist scientists in this task? We introduce OpenScholar, a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses. To evaluate OpenScholar, we dev… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.14199v1-abstract-full').style.display = 'inline'; document.getElementById('2411.14199v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.14199v1-abstract-full" style="display: none;"> Scientific progress depends on researchers' ability to synthesize the growing body of literature. Can large language models (LMs) assist scientists in this task? We introduce OpenScholar, a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses. To evaluate OpenScholar, we develop ScholarQABench, the first large-scale multi-domain benchmark for literature search, comprising 2,967 expert-written queries and 208 long-form answers across computer science, physics, neuroscience, and biomedicine. On ScholarQABench, OpenScholar-8B outperforms GPT-4o by 5% and PaperQA2 by 7% in correctness, despite being a smaller, open model. While GPT-4o hallucinates citations 78 to 90% of the time, OpenScholar achieves citation accuracy on par with human experts. OpenScholar's datastore, retriever, and self-feedback inference loop also improve off-the-shelf LMs: for instance, OpenScholar-GPT4o improves GPT-4o's correctness by 12%. In human evaluations, experts preferred OpenScholar-8B and OpenScholar-GPT4o responses over expert-written ones 51% and 70% of the time, respectively, compared to GPT-4o's 32%. We open-source all of our code, models, datastore, data and a public demo. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.14199v1-abstract-full').style.display = 'none'; document.getElementById('2411.14199v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.11195</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Cryptography and Security">cs.CR</span> </div> </div> <p class="title is-5 mathjax"> SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Sun%2C+R">Ruoxi Sun</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jiamin Chang</a>, <a href="/search/cs?searchtype=author&query=Pearce%2C+H">Hammond Pearce</a>, <a href="/search/cs?searchtype=author&query=Xiao%2C+C">Chaowei Xiao</a>, <a href="/search/cs?searchtype=author&query=Li%2C+B">Bo Li</a>, <a href="/search/cs?searchtype=author&query=Wu%2C+Q">Qi Wu</a>, <a href="/search/cs?searchtype=author&query=Nepal%2C+S">Surya Nepal</a>, <a href="/search/cs?searchtype=author&query=Xue%2C+M">Minhui Xue</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.11195v2-abstract-short" style="display: inline;"> Multimodal foundation models (MFMs) represent a significant advancement in artificial intelligence, combining diverse data modalities to enhance learning and understanding across a wide 
range of applications. However, this integration also brings unique safety and security challenges. In this paper, we conceptualize cybersafety and cybersecurity in the context of multimodal learning and present a… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.11195v2-abstract-full').style.display = 'inline'; document.getElementById('2411.11195v2-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.11195v2-abstract-full" style="display: none;"> Multimodal foundation models (MFMs) represent a significant advancement in artificial intelligence, combining diverse data modalities to enhance learning and understanding across a wide range of applications. However, this integration also brings unique safety and security challenges. In this paper, we conceptualize cybersafety and cybersecurity in the context of multimodal learning and present a comprehensive Systematization of Knowledge (SoK) to unify these concepts in MFMs, identifying key threats to these models. We propose a taxonomy framework grounded in information theory, evaluating and categorizing threats through the concepts of channel capacity, signal, noise, and bandwidth. This approach provides a novel framework that unifies model safety and system security in MFMs, offering a more comprehensive and actionable understanding of the risks involved. We used this framework to explore existing defense mechanisms and identified gaps in current research, particularly a lack of protection for alignment between modalities and a need for more systematic defense methods. Our work contributes to a deeper understanding of the security and safety landscape in MFMs, providing researchers and practitioners with valuable insights for improving the robustness and reliability of these models. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.11195v2-abstract-full').style.display = 'none'; document.getElementById('2411.11195v2-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 19 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 17 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.10034</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Cryptography and Security">cs.CR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Multimedia">cs.MM</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Sound">cs.SD</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Audio and Speech Processing">eess.AS</span> </div> </div> <p class="title is-5 mathjax"> EveGuard: Defeating Vibration-based Side-Channel Eavesdropping with Audio Adversarial Perturbations </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jung-Woo Chang</a>, <a href="/search/cs?searchtype=author&query=Sun%2C+K">Ke Sun</a>, <a href="/search/cs?searchtype=author&query=Xia%2C+D">David Xia</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+X">Xinyu Zhang</a>, <a href="/search/cs?searchtype=author&query=Koushanfar%2C+F">Farinaz Koushanfar</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.10034v1-abstract-short" style="display: 
inline;"> Vibrometry-based side channels pose a significant privacy risk, exploiting sensors like mmWave radars, light sensors, and accelerometers to detect vibrations from sound sources or proximate objects, enabling speech eavesdropping. Despite various proposed defenses, these involve costly hardware solutions with inherent physical limitations. This paper presents EveGuard, a software-driven defense fra… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10034v1-abstract-full').style.display = 'inline'; document.getElementById('2411.10034v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.10034v1-abstract-full" style="display: none;"> Vibrometry-based side channels pose a significant privacy risk, exploiting sensors like mmWave radars, light sensors, and accelerometers to detect vibrations from sound sources or proximate objects, enabling speech eavesdropping. Despite various proposed defenses, these involve costly hardware solutions with inherent physical limitations. This paper presents EveGuard, a software-driven defense framework that creates adversarial audio, protecting voice privacy from side channels without compromising human perception. We leverage the distinct sensing capabilities of side channels and traditional microphones where side channels capture vibrations and microphones record changes in air pressure, resulting in different frequency responses. EveGuard first proposes a perturbation generator model (PGM) that effectively suppresses sensor-based eavesdropping while maintaining high audio quality. Second, to enable end-to-end training of PGM, we introduce a new domain translation task called Eve-GAN for inferring an eavesdropped signal from a given audio. We further apply few-shot learning to mitigate the data collection overhead for Eve-GAN training. 
Our extensive experiments show that EveGuard achieves a protection rate of more than 97 percent from audio classifiers and significantly hinders eavesdropped audio reconstruction. We further validate the performance of EveGuard across three adaptive attack mechanisms. We have conducted a user study to verify the perceptual quality of our perturbed audio. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.10034v1-abstract-full').style.display = 'none'; document.getElementById('2411.10034v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 15 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.07237</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Malaviya%2C+C">Chaitanya Malaviya</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J+C">Joseph Chee Chang</a>, <a href="/search/cs?searchtype=author&query=Roth%2C+D">Dan Roth</a>, <a href="/search/cs?searchtype=author&query=Iyyer%2C+M">Mohit Iyyer</a>, <a href="/search/cs?searchtype=author&query=Yatskar%2C+M">Mark Yatskar</a>, <a href="/search/cs?searchtype=author&query=Lo%2C+K">Kyle Lo</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.07237v1-abstract-short" 
style="display: inline;"> Language model users often issue queries that lack specification, where the context under which a query was issued -- such as the user's identity, the query's intent, and the criteria for a response to be useful -- is not explicit. For instance, a good response to a subjective query like "What book should I read next?" would depend on the user's preferences, and a good response to an open-ended qu… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.07237v1-abstract-full').style.display = 'inline'; document.getElementById('2411.07237v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.07237v1-abstract-full" style="display: none;"> Language model users often issue queries that lack specification, where the context under which a query was issued -- such as the user's identity, the query's intent, and the criteria for a response to be useful -- is not explicit. For instance, a good response to a subjective query like "What book should I read next?" would depend on the user's preferences, and a good response to an open-ended query like "How do antibiotics work against bacteria?" would depend on the user's expertise. This makes evaluation of responses to such queries an ill-posed task, as evaluators may make arbitrary judgments about the response quality. To remedy this, we present contextualized evaluations, a protocol that synthetically constructs context surrounding an underspecified query and provides it during evaluation. We find that the presence of context can 1) alter conclusions drawn from evaluation, even flipping win rates between model pairs, 2) nudge evaluators to make fewer judgments based on surface-level criteria, like style, and 3) provide new insights about model behavior across diverse contexts. 
Specifically, our procedure uncovers an implicit bias towards WEIRD contexts in models' "default" responses and we find that models are not equally sensitive to following different contexts, even when they are provided in prompts. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.07237v1-abstract-full').style.display = 'none'; document.getElementById('2411.07237v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 11 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Code & data available at</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.05025</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computers and Society">cs.CY</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Digital Libraries">cs.DL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Human-Computer Interaction">cs.HC</span> </div> </div> <p class="title is-5 mathjax"> LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and Perceptions </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Liao%2C+Z">Zhehui Liao</a>, <a href="/search/cs?searchtype=author&query=Antoniak%2C+M">Maria Antoniak</a>, 
<a href="/search/cs?searchtype=author&query=Cheong%2C+I">Inyoung Cheong</a>, <a href="/search/cs?searchtype=author&query=Cheng%2C+E+Y">Evie Yu-Yen Cheng</a>, <a href="/search/cs?searchtype=author&query=Lee%2C+A">Ai-Heng Lee</a>, <a href="/search/cs?searchtype=author&query=Lo%2C+K">Kyle Lo</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J+C">Joseph Chee Chang</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+A+X">Amy X. Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.05025v1-abstract-short" style="display: inline;"> The rise of large language models (LLMs) has led many researchers to consider their usage for scientific work. Some have found benefits using LLMs to augment or automate aspects of their research pipeline, while others have urged caution due to risks and ethical concerns. Yet little work has sought to quantify and characterize how researchers use LLMs and why. We present the first large-scale surv… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05025v1-abstract-full').style.display = 'inline'; document.getElementById('2411.05025v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.05025v1-abstract-full" style="display: none;"> The rise of large language models (LLMs) has led many researchers to consider their usage for scientific work. Some have found benefits using LLMs to augment or automate aspects of their research pipeline, while others have urged caution due to risks and ethical concerns. Yet little work has sought to quantify and characterize how researchers use LLMs and why. We present the first large-scale survey of 816 verified research article authors to understand how the research community leverages and perceives LLMs as research tools. 
We examine participants' self-reported LLM usage, finding that 81% of researchers have already incorporated LLMs into different aspects of their research workflow. We also find that traditionally disadvantaged groups in academia (non-White, junior, and non-native English speaking researchers) report higher LLM usage and perceived benefits, suggesting potential for improved research equity. However, women, non-binary, and senior researchers have greater ethical concerns, potentially hindering adoption. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.05025v1-abstract-full').style.display = 'none'; document.getElementById('2411.05025v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 30 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">30 pages, 5 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.02353</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Human-Computer Interaction">cs.HC</span> </div> </div> <p class="title is-5 mathjax"> Social-RAG: Retrieving from Group Interactions to Socially Ground Proactive AI Generation to Group Preferences </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+R">Ruotong Wang</a>, <a href="/search/cs?searchtype=author&query=Zhou%2C+X">Xinyi Zhou</a>, <a href="/search/cs?searchtype=author&query=Qiu%2C+L">Lin Qiu</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J+C">Joseph Chee 
Chang</a>, <a href="/search/cs?searchtype=author&query=Bragg%2C+J">Jonathan Bragg</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+A+X">Amy X. Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.02353v1-abstract-short" style="display: inline;"> AI agents are increasingly tasked with making proactive suggestions in online spaces where groups collaborate, but can be unhelpful or even annoying, due to not fitting the group's preferences or behaving in socially inappropriate ways. Fortunately, group spaces have a rich history of prior social interactions and affordances for social feedback to support creating agents that align to a group's i… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02353v1-abstract-full').style.display = 'inline'; document.getElementById('2411.02353v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.02353v1-abstract-full" style="display: none;"> AI agents are increasingly tasked with making proactive suggestions in online spaces where groups collaborate, but can be unhelpful or even annoying, due to not fitting the group's preferences or behaving in socially inappropriate ways. Fortunately, group spaces have a rich history of prior social interactions and affordances for social feedback to support creating agents that align to a group's interests and norms. We present Social-RAG, a workflow for grounding agents to social information about a group, which retrieves from prior group interactions, selects relevant social signals, and then feeds the context into a large language model to generate messages to the group. 
We implement this into PaperPing, our system that posts academic paper recommendations in group chat, leveraging social signals determined from formative studies with 39 researchers. From a three-month deployment in 18 channels, we observed PaperPing posted relevant messages in groups without disrupting their existing social practices, fostering group common ground. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.02353v1-abstract-full').style.display = 'none'; document.getElementById('2411.02353v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.00632</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> PCoTTA: Continual Test-Time Adaptation for Multi-Task Point Cloud Understanding </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Jiang%2C+J">Jincen Jiang</a>, <a href="/search/cs?searchtype=author&query=Zhou%2C+Q">Qianyu Zhou</a>, <a href="/search/cs?searchtype=author&query=Li%2C+Y">Yuhang Li</a>, <a href="/search/cs?searchtype=author&query=Zhao%2C+X">Xinkui Zhao</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+M">Meili Wang</a>, <a href="/search/cs?searchtype=author&query=Ma%2C+L">Lizhuang Ma</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jian Chang</a>, <a 
href="/search/cs?searchtype=author&query=Zhang%2C+J+J">Jian Jun Zhang</a>, <a href="/search/cs?searchtype=author&query=Lu%2C+X">Xuequan Lu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.00632v1-abstract-short" style="display: inline;"> In this paper, we present PCoTTA, an innovative, pioneering framework for Continual Test-Time Adaptation (CoTTA) in multi-task point cloud understanding, enhancing the model's transferability towards the continually changing target domain. We introduce a multi-task setting for PCoTTA, which is practical and realistic, handling multiple tasks within one unified model during the continual adaptation… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.00632v1-abstract-full').style.display = 'inline'; document.getElementById('2411.00632v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.00632v1-abstract-full" style="display: none;"> In this paper, we present PCoTTA, an innovative, pioneering framework for Continual Test-Time Adaptation (CoTTA) in multi-task point cloud understanding, enhancing the model's transferability towards the continually changing target domain. We introduce a multi-task setting for PCoTTA, which is practical and realistic, handling multiple tasks within one unified model during the continual adaptation. Our PCoTTA involves three key components: automatic prototype mixture (APM), Gaussian Splatted feature shifting (GSFS), and contrastive prototype repulsion (CPR). Firstly, APM is designed to automatically mix the source prototypes with the learnable prototypes with a similarity balancing factor, avoiding catastrophic forgetting. Then, GSFS dynamically shifts the testing sample toward the source domain, mitigating error accumulation in an online manner. 
In addition, CPR is proposed to pull the nearest learnable prototype close to the testing feature and push it away from other prototypes, making each prototype distinguishable during the adaptation. Experimental comparisons lead to a new benchmark, demonstrating PCoTTA's superiority in boosting the model's transferability towards the continually changing target domain. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.00632v1-abstract-full').style.display = 'none'; document.getElementById('2411.00632v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 1 November, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted to NeurIPS 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2411.00281</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> </div> <div class="is-inline-block" style="margin-left: 0.5rem"> <div class="tags has-addons"> <span class="tag is-dark is-size-7">doi</span> <span class="tag is-light is-size-7"><a class="" href="">10.1117/12.2015155 <i class="fa fa-external-link" aria-hidden="true"></i></a></span> </div> </div> </div> <p class="title is-5 mathjax"> Detection and tracking of gas plumes in LWIR hyperspectral video sequence data </p> <p class="authors"> <span class="search-hit">Authors:</span> <a 
href="/search/cs?searchtype=author&query=Gerhart%2C+T">Torin Gerhart</a>, <a href="/search/cs?searchtype=author&query=Sunu%2C+J">Justin Sunu</a>, <a href="/search/cs?searchtype=author&query=Merkurjev%2C+E">Ekaterina Merkurjev</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jen-Mei Chang</a>, <a href="/search/cs?searchtype=author&query=Gilles%2C+J">Jerome Gilles</a>, <a href="/search/cs?searchtype=author&query=Bertozzi%2C+A+L">Andrea L. Bertozzi</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2411.00281v1-abstract-short" style="display: inline;"> Automated detection of chemical plumes presents a segmentation challenge. The segmentation problem for gas plumes is difficult due to the diffusive nature of the cloud. The advantage of considering hyperspectral images in the gas plume detection problem over the conventional RGB imagery is the presence of non-visual data, allowing for a richer representation of information. In this paper we presen… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.00281v1-abstract-full').style.display = 'inline'; document.getElementById('2411.00281v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2411.00281v1-abstract-full" style="display: none;"> Automated detection of chemical plumes presents a segmentation challenge. The segmentation problem for gas plumes is difficult due to the diffusive nature of the cloud. The advantage of considering hyperspectral images in the gas plume detection problem over the conventional RGB imagery is the presence of non-visual data, allowing for a richer representation of information. 
In this paper we present an effective method of visualizing hyperspectral video sequences containing chemical plumes and investigate the effectiveness of segmentation techniques on these post-processed videos. Our approach uses a combination of dimension reduction and histogram equalization to prepare the hyperspectral videos for segmentation. First, Principal Components Analysis (PCA) is used to reduce the dimension of the entire video sequence. This is done by projecting each pixel onto the first few Principal Components resulting in a type of spectral filter. Next, a Midway method for histogram equalization is used. These methods redistribute the intensity values in order to reduce flicker between frames. This properly prepares these high-dimensional video sequences for more traditional segmentation techniques. We compare the ability of various clustering techniques to properly segment the chemical plume. These include K-means, spectral clustering, and the Ginzburg-Landau functional. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2411.00281v1-abstract-full').style.display = 'none'; document.getElementById('2411.00281v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 31 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> November 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Journal ref:</span> SPIE Defense, Security, and Sensing, 2013, Baltimore, Proceedings Volume 8743, Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XIX; 87430J (2013) </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.22360</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Newman%2C+B">Benjamin Newman</a>, <a href="/search/cs?searchtype=author&query=Lee%2C+Y">Yoonjoo Lee</a>, <a href="/search/cs?searchtype=author&query=Naik%2C+A">Aakanksha Naik</a>, <a href="/search/cs?searchtype=author&query=Siangliulue%2C+P">Pao Siangliulue</a>, <a href="/search/cs?searchtype=author&query=Fok%2C+R">Raymond Fok</a>, <a href="/search/cs?searchtype=author&query=Kim%2C+J">Juho Kim</a>, <a href="/search/cs?searchtype=author&query=Weld%2C+D+S">Daniel S. Weld</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J+C">Joseph Chee Chang</a>, <a href="/search/cs?searchtype=author&query=Lo%2C+K">Kyle Lo</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.22360v1-abstract-short" style="display: inline;"> When conducting literature reviews, scientists often create literature review tables - tables whose rows are publications and whose columns constitute a schema, a set of aspects used to compare and contrast the papers. 
Can we automatically generate these tables using language models (LMs)? In this work, we introduce a framework that leverages LMs to perform this task by decomposing it into separat… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.22360v1-abstract-full').style.display = 'inline'; document.getElementById('2410.22360v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.22360v1-abstract-full" style="display: none;"> When conducting literature reviews, scientists often create literature review tables - tables whose rows are publications and whose columns constitute a schema, a set of aspects used to compare and contrast the papers. Can we automatically generate these tables using language models (LMs)? In this work, we introduce a framework that leverages LMs to perform this task by decomposing it into separate schema and value generation steps. To enable experimentation, we address two main challenges: First, we overcome a lack of high-quality datasets to benchmark table generation by curating and releasing arxivDIGESTables, a new dataset of 2,228 literature review tables extracted from ArXiv papers that synthesize a total of 7,542 research papers. Second, to support scalable evaluation of model generations against human-authored reference tables, we develop DecontextEval, an automatic evaluation method that aligns elements of tables with the same underlying aspects despite differing surface forms. Given these tools, we evaluate LMs' abilities to reconstruct reference tables, finding this task benefits from additional context to ground the generation (e.g. table captions, in-text references). Finally, through a human evaluation study we find that even when LMs fail to fully reconstruct a reference table, their generated novel aspects can still be useful. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.22360v1-abstract-full').style.display = 'none'; document.getElementById('2410.22360v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 25 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">EMNLP 2024, 21 pages, 8 figures, 10 tables</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.10505</a> <span> [<a href="">pdf</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Comparison of deep learning and conventional methods for disease onset prediction </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=John%2C+L+H">Luis H. John</a>, <a href="/search/cs?searchtype=author&query=Kim%2C+C">Chungsoo Kim</a>, <a href="/search/cs?searchtype=author&query=Kors%2C+J+A">Jan A. Kors</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Junhyuk Chang</a>, <a href="/search/cs?searchtype=author&query=Morgan-Cooper%2C+H">Hannah Morgan-Cooper</a>, <a href="/search/cs?searchtype=author&query=Desai%2C+P">Priya Desai</a>, <a href="/search/cs?searchtype=author&query=Pang%2C+C">Chao Pang</a>, <a href="/search/cs?searchtype=author&query=Rijnbeek%2C+P+R">Peter R. Rijnbeek</a>, <a href="/search/cs?searchtype=author&query=Reps%2C+J+M">Jenna M. Reps</a>, <a href="/search/cs?searchtype=author&query=Fridgeirsson%2C+E+A">Egill A. 
Fridgeirsson</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.10505v1-abstract-short" style="display: inline;"> Background: Conventional prediction methods such as logistic regression and gradient boosting have been widely utilized for disease onset prediction for their reliability and interpretability. Deep learning methods promise enhanced prediction performance by extracting complex patterns from clinical data, but face challenges like data sparsity and high dimensionality. Methods: This study compares… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.10505v1-abstract-full').style.display = 'inline'; document.getElementById('2410.10505v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.10505v1-abstract-full" style="display: none;"> Background: Conventional prediction methods such as logistic regression and gradient boosting have been widely utilized for disease onset prediction for their reliability and interpretability. Deep learning methods promise enhanced prediction performance by extracting complex patterns from clinical data, but face challenges like data sparsity and high dimensionality. Methods: This study compares conventional and deep learning approaches to predict lung cancer, dementia, and bipolar disorder using observational data from eleven databases from North America, Europe, and Asia. Models were developed using logistic regression, gradient boosting, ResNet, and Transformer, and validated both internally and externally across the data sources. Discrimination performance was assessed using AUROC, and calibration was evaluated using Eavg. 
Findings: Across 11 datasets, conventional methods generally outperformed deep learning methods in terms of discrimination performance, particularly during external validation, highlighting their better transportability. Learning curves suggest that deep learning models require substantially larger datasets to reach the same performance levels as conventional methods. Calibration performance was also better for conventional methods, with ResNet showing the poorest calibration. Interpretation: Despite the potential of deep learning models to capture complex patterns in structured observational healthcare data, conventional models remain highly competitive for disease onset prediction, especially in scenarios involving smaller datasets and if lengthy training times need to be avoided. The study underscores the need for future research focused on optimizing deep learning models to handle the sparsity, high dimensionality, and heterogeneity inherent in healthcare datasets, and find new strategies to exploit the full capabilities of deep learning methods. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.10505v1-abstract-full').style.display = 'none'; document.getElementById('2410.10505v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 14 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.09254</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Few Exemplar-Based General Medical Image Segmentation via Domain-Aware Selective Adaptation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Xu%2C+C">Chen Xu</a>, <a href="/search/cs?searchtype=author&query=Huang%2C+Q">Qiming Huang</a>, <a href="/search/cs?searchtype=author&query=Hou%2C+Y">Yuqi Hou</a>, <a href="/search/cs?searchtype=author&query=Wu%2C+J">Jiangxing Wu</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+F">Fan Zhang</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+H+J">Hyung Jin Chang</a>, <a href="/search/cs?searchtype=author&query=Jiao%2C+J">Jianbo Jiao</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.09254v2-abstract-short" style="display: inline;"> Medical image segmentation poses challenges due to domain gaps, data modality variations, and dependency on domain knowledge or experts, especially for low- and middle-income countries (LMICs). Whereas for humans, given a few exemplars (with corresponding labels), we are able to segment different medical images even without extensive domain-specific clinical training.
In addition, current SAM-bas… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.09254v2-abstract-full').style.display = 'inline'; document.getElementById('2410.09254v2-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.09254v2-abstract-full" style="display: none;"> Medical image segmentation poses challenges due to domain gaps, data modality variations, and dependency on domain knowledge or experts, especially for low- and middle-income countries (LMICs). Whereas for humans, given a few exemplars (with corresponding labels), we are able to segment different medical images even without extensive domain-specific clinical training. In addition, current SAM-based medical segmentation models use fine-grained visual prompts, such as the bounding rectangle generated from manually annotated target segmentation mask, as the bounding box (bbox) prompt during the testing phase. However, in actual clinical scenarios, no such precise prior knowledge is available. Our experimental results also reveal that previous models nearly fail to predict when given coarser bbox prompts. Considering these issues, in this paper, we introduce a domain-aware selective adaptation approach to adapt the general knowledge learned from a large model trained with natural images to the corresponding medical domains/modalities, with access to only a few (e.g. less than 5) exemplars. Our method mitigates the aforementioned limitations, providing an efficient and LMICs-friendly solution. Extensive experimental analysis showcases the effectiveness of our approach, offering potential advancements in healthcare diagnostics and clinical applications in LMICs.
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.09254v2-abstract-full').style.display = 'none'; document.getElementById('2410.09254v2-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 25 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 11 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted in ACCV 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.07783</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> CLIP Multi-modal Hashing for Multimedia Retrieval </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Zhu%2C+J">Jian Zhu</a>, <a href="/search/cs?searchtype=author&query=Sheng%2C+M">Mingkai Sheng</a>, <a href="/search/cs?searchtype=author&query=Huang%2C+Z">Zhangmin Huang</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jingfei Chang</a>, <a href="/search/cs?searchtype=author&query=Jiang%2C+J">Jinling Jiang</a>, <a href="/search/cs?searchtype=author&query=Long%2C+J">Jian Long</a>, <a href="/search/cs?searchtype=author&query=Luo%2C+C">Cheng Luo</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+L">Lei Liu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax"
id="2410.07783v1-abstract-short" style="display: inline;"> Multi-modal hashing methods are widely used in multimedia retrieval, which can fuse multi-source data to generate binary hash code. However, the individual backbone networks have limited feature expression capabilities and are not jointly pre-trained on large-scale unsupervised multi-modal data, resulting in low retrieval accuracy. To address this issue, we propose a novel CLIP Multi-modal Hashing… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.07783v1-abstract-full').style.display = 'inline'; document.getElementById('2410.07783v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.07783v1-abstract-full" style="display: none;"> Multi-modal hashing methods are widely used in multimedia retrieval, which can fuse multi-source data to generate binary hash code. However, the individual backbone networks have limited feature expression capabilities and are not jointly pre-trained on large-scale unsupervised multi-modal data, resulting in low retrieval accuracy. To address this issue, we propose a novel CLIP Multi-modal Hashing (CLIPMH) method. Our method employs the CLIP framework to extract both text and vision features and then fuses them to generate hash code. Due to enhancement on each modal feature, our method has great improvement in the retrieval performance of multi-modal hashing methods. Compared with state-of-the-art unsupervised and supervised multi-modal hashing methods, experiments reveal that the proposed CLIPMH can significantly improve performance (a maximum increase of 8.38% in mAP). 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.07783v1-abstract-full').style.display = 'none'; document.getElementById('2410.07783v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted by 31st International Conference on MultiMedia Modeling (MMM2025)</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.05634</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Methodology">stat.ME</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Econometrics">econ.EM</span> </div> </div> <p class="title is-5 mathjax"> Identification and estimation for matrix time series CP-factor models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jinyuan Chang</a>, <a href="/search/cs?searchtype=author&query=Du%2C+Y">Yue Du</a>, <a href="/search/cs?searchtype=author&query=Huang%2C+G">Guanglin Huang</a>, <a href="/search/cs?searchtype=author&query=Yao%2C+Q">Qiwei Yao</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.05634v1-abstract-short" style="display: inline;"> We investigate the identification and the estimation for matrix time series 
CP-factor models. Unlike the generalized eigenanalysis-based method of Chang et al. (2023) which requires the two factor loading matrices to be full-ranked, the newly proposed estimation can handle rank-deficient factor loading matrices. The estimation procedure consists of the spectral decomposition of several matrices an… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.05634v1-abstract-full').style.display = 'inline'; document.getElementById('2410.05634v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.05634v1-abstract-full" style="display: none;"> We investigate the identification and the estimation for matrix time series CP-factor models. Unlike the generalized eigenanalysis-based method of Chang et al. (2023) which requires the two factor loading matrices to be full-ranked, the newly proposed estimation can handle rank-deficient factor loading matrices. The estimation procedure consists of the spectral decomposition of several matrices and a matrix joint diagonalization algorithm, resulting in low computational cost. The theoretical guarantee established without the stationarity assumption shows that the proposed estimation exhibits a faster convergence rate than that of Chang et al. (2023). In fact the new estimator is free from the adverse impact of any eigen-gaps, unlike most eigenanalysis-based methods such as that of Chang et al. (2023). Furthermore, in terms of the error rates of the estimation, the proposed procedure is equivalent to handling a vector time series of dimension $\max(p,q)$ instead of $p \times q$, where $(p, q)$ are the dimensions of the matrix time series concerned. We have achieved this without assuming the "near orthogonality" of the loadings under various incoherence conditions often imposed in the CP-decomposition literature, see Han and Zhang (2022), Han et al. (2024) and the references within. 
Illustration with both simulated and real matrix time series data shows the usefulness of the proposed approach. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.05634v1-abstract-full').style.display = 'none'; document.getElementById('2410.05634v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 7 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.04612</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Gao%2C+Z">Zhaolin Gao</a>, <a href="/search/cs?searchtype=author&query=Zhan%2C+W">Wenhao Zhan</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J+D">Jonathan D. Chang</a>, <a href="/search/cs?searchtype=author&query=Swamy%2C+G">Gokul Swamy</a>, <a href="/search/cs?searchtype=author&query=Brantley%2C+K">Kianté Brantley</a>, <a href="/search/cs?searchtype=author&query=Lee%2C+J+D">Jason D.
Lee</a>, <a href="/search/cs?searchtype=author&query=Sun%2C+W">Wen Sun</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.04612v1-abstract-short" style="display: inline;"> Large Language Models (LLMs) have achieved remarkable success at tasks like summarization that involve a single turn of interaction. However, they can still struggle with multi-turn tasks like dialogue that require long-term planning. Previous works on multi-turn dialogue extend single-turn reinforcement learning from human feedback (RLHF) methods to the multi-turn setting by treating all prior di… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.04612v1-abstract-full').style.display = 'inline'; document.getElementById('2410.04612v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.04612v1-abstract-full" style="display: none;"> Large Language Models (LLMs) have achieved remarkable success at tasks like summarization that involve a single turn of interaction. However, they can still struggle with multi-turn tasks like dialogue that require long-term planning. Previous works on multi-turn dialogue extend single-turn reinforcement learning from human feedback (RLHF) methods to the multi-turn setting by treating all prior dialogue turns as a long context. Such approaches suffer from covariate shift: the conversations in the training set have previous turns generated by some reference policy, which means that low training error may not necessarily correspond to good performance when the learner is actually in the conversation loop. In response, we introduce REgressing the RELative FUture (REFUEL), an efficient policy optimization approach designed to address multi-turn RLHF in LLMs. 
REFUEL employs a single model to estimate $Q$-values and trains on self-generated data, addressing the covariate shift issue. REFUEL frames the multi-turn RLHF problem as a sequence of regression tasks on iteratively collected datasets, enabling ease of implementation. Theoretically, we prove that REFUEL can match the performance of any policy covered by the training set. Empirically, we evaluate our algorithm by using Llama-3.1-70B-it to simulate a user in conversation with our model. REFUEL consistently outperforms state-of-the-art methods such as DPO and REBEL across various settings. Furthermore, despite having only 8 billion parameters, Llama-3-8B-it fine-tuned with REFUEL outperforms Llama-3.1-70B-it on long multi-turn dialogues. Implementation of REFUEL can be found at, and models trained by REFUEL can be found at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.04612v1-abstract-full').style.display = 'none'; document.getElementById('2410.04612v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 6 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.04025</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Human-Computer Interaction">cs.HC</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> IdeaSynth: Iterative Research Idea Development Through Evolving and Composing Idea Facets with Literature-Grounded Feedback </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Pu%2C+K">Kevin Pu</a>, <a href="/search/cs?searchtype=author&query=Feng%2C+K+J+K">K. J. Kevin Feng</a>, <a href="/search/cs?searchtype=author&query=Grossman%2C+T">Tovi Grossman</a>, <a href="/search/cs?searchtype=author&query=Hope%2C+T">Tom Hope</a>, <a href="/search/cs?searchtype=author&query=Mishra%2C+B+D">Bhavana Dalvi Mishra</a>, <a href="/search/cs?searchtype=author&query=Latzke%2C+M">Matt Latzke</a>, <a href="/search/cs?searchtype=author&query=Bragg%2C+J">Jonathan Bragg</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J+C">Joseph Chee Chang</a>, <a href="/search/cs?searchtype=author&query=Siangliulue%2C+P">Pao Siangliulue</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.04025v1-abstract-short" style="display: inline;"> Research ideation involves broadly exploring and deeply refining ideas. Both require deep engagement with literature. Existing tools focus primarily on broad idea generation, yet offer little support for the iterative specification, refinement, and evaluation needed to further develop initial ideas. 
To bridge this gap, we introduce IdeaSynth, a research idea development system that uses LLMs to provide li… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.04025v1-abstract-full').style.display = 'inline'; document.getElementById('2410.04025v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.04025v1-abstract-full" style="display: none;"> Research ideation involves broadly exploring and deeply refining ideas. Both require deep engagement with literature. Existing tools focus primarily on broad idea generation, yet offer little support for the iterative specification, refinement, and evaluation needed to further develop initial ideas. To bridge this gap, we introduce IdeaSynth, a research idea development system that uses LLMs to provide literature-grounded feedback for articulating research problems, solutions, evaluations, and contributions. IdeaSynth represents these idea facets as nodes on a canvas, and allows researchers to iteratively refine them by creating and exploring variations and composing them. Our lab study (N=20) showed that participants, while using IdeaSynth, explored more alternative ideas and expanded initial ideas with more details compared to a strong LLM-based baseline. Our deployment study (N=7) demonstrated that participants effectively used IdeaSynth for real-world research projects at various ideation stages, from developing initial ideas to revising framings of mature manuscripts, highlighting the possibilities to adopt IdeaSynth in researchers' workflows. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.04025v1-abstract-full').style.display = 'none'; document.getElementById('2410.04025v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2410.03688</a> <span> [<a href="">pdf</a>, <a href="">ps</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Networking and Internet Architecture">cs.NI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> LLM Agents as 6G Orchestrator: A Paradigm for Task-Oriented Physical-Layer Automation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Xiao%2C+Z">Zhuoran Xiao</a>, <a href="/search/cs?searchtype=author&query=Ye%2C+C">Chenhui Ye</a>, <a href="/search/cs?searchtype=author&query=Hu%2C+Y">Yunbo Hu</a>, <a href="/search/cs?searchtype=author&query=Yuan%2C+H">Honggang Yuan</a>, <a href="/search/cs?searchtype=author&query=Huang%2C+Y">Yihang Huang</a>, <a href="/search/cs?searchtype=author&query=Feng%2C+Y">Yijia Feng</a>, <a href="/search/cs?searchtype=author&query=Cai%2C+L">Liyu Cai</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jiang Chang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2410.03688v1-abstract-short" style="display: inline;"> The rapid advancement in generative pre-training models is propelling 
a paradigm shift in technological progression from basic applications such as chatbots towards more sophisticated agent-based systems. It is with huge potential and necessity that the 6G system be combined with the copilot of large language model (LLM) agents and digital twins (DT) to manage the highly complicated communication… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.03688v1-abstract-full').style.display = 'inline'; document.getElementById('2410.03688v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2410.03688v1-abstract-full" style="display: none;"> The rapid advancement in generative pre-training models is propelling a paradigm shift in technological progression from basic applications such as chatbots towards more sophisticated agent-based systems. It is with huge potential and necessity that the 6G system be combined with the copilot of large language model (LLM) agents and digital twins (DT) to manage the highly complicated communication system with new emerging features such as native AI service and sensing. With the 6G-oriented agent, the base station could understand the transmission requirements of various dynamic upper-layer tasks and automatically orchestrate the optimal system workflow. By continuously getting feedback from the 6G DT for reinforcement, the agents can ultimately raise the performance of the practical system. Unlike existing LLM agents designed for general applications, the 6G-oriented agent aims to make highly rigorous and precise plans with a vast amount of extra expert knowledge, which inevitably requires a specific system design from model training to implementation. This paper proposes a novel comprehensive approach for building task-oriented 6G LLM agents. 
We first propose a two-stage continual pre-training and fine-tuning scheme to build a basic model for the field and a variety of specialized expert models that meet the requirements of various application scenarios. Further, a novel inference framework based on semantic retrieval for leveraging the existing communication-related functions is proposed. Experimental results on exemplary tasks, such as physical-layer task decomposition, show the proposed paradigm's feasibility and effectiveness. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2410.03688v1-abstract-full').style.display = 'none'; document.getElementById('2410.03688v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.20344</a> <span> [<a href="">pdf</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Robotics">cs.RO</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Systems and Control">eess.SY</span> </div> </div> <p class="title is-5 mathjax"> Design, manufacturing, and inverse dynamic modeling of soft parallel robots actuated by dielectric elastomer actuators </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jung-Che Chang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+X">Xi Wang</a>, <a href="/search/cs?searchtype=author&query=Axinte%2C+D">Dragos Axinte</a>, <a href="/search/cs?searchtype=author&query=Dong%2C+X">Xin Dong</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span 
class="abstract-short has-text-grey-dark mathjax" id="2409.20344v1-abstract-short" style="display: inline;"> Soft parallel robots with their manipulation safety and low commercial cost show a promising future for delicate operations and safe human-robot interactions. However, promoting the use of electroactive polymers (EAPs) is still challenging due to the under-improving quality of the product and the dynamic modelling of the collaborations between multiple actuators. This article presents the design,… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.20344v1-abstract-full').style.display = 'inline'; document.getElementById('2409.20344v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.20344v1-abstract-full" style="display: none;"> Soft parallel robots with their manipulation safety and low commercial cost show a promising future for delicate operations and safe human-robot interactions. However, promoting the use of electroactive polymers (EAPs) is still challenging due to the under-improving quality of the product and the dynamic modelling of the collaborations between multiple actuators. This article presents the design, fabrication, modelling and control of a parallel kinematics Delta robot actuated by dielectric elastomer actuators (DEAs). The trade-off between the actuation force and stroke is retaken by an angular stroke amplification mechanism, and the weight of the robot frame is reduced by utilizing 3D puzzling strip structures. A generic way of constructing a high-stability conductive paint on a silicon-based film has been achieved by laser scanning the DE-film and then sandwiching a conductive particle-based electrode with a paint which is mixed by the particles and photosensitive resin. 
Compared to the widely used carbon grease, the fabricated electrode shows a higher consistency in its dynamic behaviour before and after the on-stand test. Finally, to predict the output force and inverse motion of the robot end effector, we constructed the inverse dynamic model by introducing an expanded Bergstrom-Boyce model to the constitutive behavior of the dielectric film. The experimental results show a prediction of robot output force with an RMSE of 12.4% when the end effector remains stationary, and a well-followed trajectory with an RMSE of less than 2.5%. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.20344v1-abstract-full').style.display = 'none'; document.getElementById('2409.20344v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 30 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">17 pages, 12 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.20261</a> <span> [<a href="">pdf</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Robotics">cs.RO</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Classical Physics">physics.class-ph</span> </div> </div> <p class="title is-5 mathjax"> Bi-stable thin soft robot for in-plane locomotion in narrow space </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+X">Xi Wang</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jung-che Chang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+F">Feiran Wang</a>, <a href="/search/cs?searchtype=author&query=Axinte%2C+D">Dragos Axinte</a>, <a href="/search/cs?searchtype=author&query=Dong%2C+X">Xin Dong</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.20261v1-abstract-short" style="display: inline;"> Dielectric elastomer actuators (DEAs), also recognized as artificial muscle, have been widely developed for the soft locomotion robot. With the compliant skeleton and miniaturized dimension, they are well suited for the narrow space inspection. 
In this work, we propose a novel low profile (1.1mm) and lightweight (1.8g) bi-stable in-plane DEA (Bi-DEA) constructed by supporting a dielectric elastome… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.20261v1-abstract-full').style.display = 'inline'; document.getElementById('2409.20261v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.20261v1-abstract-full" style="display: none;"> Dielectric elastomer actuators (DEAs), also recognized as artificial muscle, have been widely developed for the soft locomotion robot. With the compliant skeleton and miniaturized dimension, they are well suited for the narrow space inspection. In this work, we propose a novel low profile (1.1mm) and lightweight (1.8g) bi-stable in-plane DEA (Bi-DEA) constructed by supporting a dielectric elastomer onto a flat bi-stable mechanism. It has an amplified displacement and output force compared with the in-plane DEA (I-DEA) without the bi-stable mechanism. Then, the Bi-DEA is applied to a thin soft robot, using three electrostatic adhesive pads (EA-Pads) as anchoring elements. This robot is capable of crawling and climbing to access millimetre-scale narrow gaps. Theoretical models of the bi-stable mechanism and the DEA are presented. The enhanced performance of the Bi-DEA induced by the mechanism is experimentally validated. The EA-Pads provide the adhesion between the actuator and the locomotion substrate, allowing crawling and climbing on various surfaces, i.e., paper and acrylic. The thin soft robot has been demonstrated to be capable of crawling through a 4mm narrow gap with a speed up to 3.3mm/s (0.07 body length per second and 2.78 body thickness per second). 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.20261v1-abstract-full').style.display = 'none'; document.getElementById('2409.20261v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 30 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">8 pages, 12 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.15376</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> ControlMath: Controllable Data Generation Promotes Math Generalist Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Chen%2C+N">Nuo Chen</a>, <a href="/search/cs?searchtype=author&query=Wu%2C+N">Ning Wu</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jianhui Chang</a>, <a href="/search/cs?searchtype=author&query=Li%2C+J">Jia Li</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.15376v1-abstract-short" style="display: inline;"> Utilizing large language models (LLMs) for data augmentation has yielded encouraging results in mathematical reasoning. 
However, these approaches face constraints in problem diversity, potentially restricting them to in-domain/distribution data generation. To this end, we propose ControlMath, an iterative method involving an equation-generator module and two LLM-based agents. The module creates di… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.15376v1-abstract-full').style.display = 'inline'; document.getElementById('2409.15376v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.15376v1-abstract-full" style="display: none;"> Utilizing large language models (LLMs) for data augmentation has yielded encouraging results in mathematical reasoning. However, these approaches face constraints in problem diversity, potentially restricting them to in-domain/distribution data generation. To this end, we propose ControlMath, an iterative method involving an equation-generator module and two LLM-based agents. The module creates diverse equations, which the Problem-Crafter agent then transforms into math word problems. The Reverse-Agent filters and selects high-quality data, adhering to the "less is more" principle, achieving better results with fewer data points. This approach enables the generation of diverse math problems, not limited to specific domains or distributions. As a result, we collect ControlMathQA, which involves 190k math word problems. Extensive results prove that combining our dataset with in-domain datasets like GSM8K can help improve the model's mathematical ability to generalize, leading to improved performances both within and beyond specific domains. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.15376v1-abstract-full').style.display = 'none'; document.getElementById('2409.15376v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 19 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">17 pages</span> </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Report number:</span> EMNLP 2024 Main </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.08702</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Audio and Speech Processing">eess.AS</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> DM: Dual-path Magnitude Network for General Speech Restoration </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Yang%2C+D">Da-Hee Yang</a>, <a href="/search/cs?searchtype=author&query=Kim%2C+D">Dail Kim</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Joon-Hyuk Chang</a>, <a href="/search/cs?searchtype=author&query=Choi%2C+J">Jeonghwan Choi</a>, <a href="/search/cs?searchtype=author&query=Moon%2C+H">Han-gil Moon</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.08702v1-abstract-short" style="display: inline;"> In this paper, we 
introduce a novel general speech restoration model: the Dual-path Magnitude (DM) network, designed to address multiple distortions including noise, reverberation, and bandwidth degradation effectively. The DM network employs dual parallel magnitude decoders that share parameters: one uses a masking-based algorithm for distortion removal and the other employs a mapping-based appro… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.08702v1-abstract-full').style.display = 'inline'; document.getElementById('2409.08702v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.08702v1-abstract-full" style="display: none;"> In this paper, we introduce a novel general speech restoration model: the Dual-path Magnitude (DM) network, designed to address multiple distortions including noise, reverberation, and bandwidth degradation effectively. The DM network employs dual parallel magnitude decoders that share parameters: one uses a masking-based algorithm for distortion removal and the other employs a mapping-based approach for speech restoration. A novel aspect of the DM network is the integration of the magnitude spectrogram output from the masking decoder into the mapping decoder through a skip connection, enhancing the overall restoration capability. This integrated approach overcomes the inherent limitations observed in previous models, as detailed in a step-by-step analysis. The experimental results demonstrate that the DM network outperforms other baseline models in the comprehensive aspect of general speech restoration, achieving substantial restoration with fewer parameters. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.08702v1-abstract-full').style.display = 'none'; document.getElementById('2409.08702v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.08512</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Software Engineering">cs.SE</span> </div> </div> <p class="title is-5 mathjax"> Learning Graph-based Patch Representations for Identifying and Assessing Silent Vulnerability Fixes </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Han%2C+M">Mei Han</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+L">Lulu Wang</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jianming Chang</a>, <a href="/search/cs?searchtype=author&query=Li%2C+B">Bixin Li</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+C">Chunguang Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.08512v1-abstract-short" style="display: inline;"> Software projects are dependent on many third-party libraries, therefore high-risk vulnerabilities can propagate through the dependency chain to downstream projects. Owing to the subjective nature of patch management, software vendors commonly fix vulnerabilities silently. 
Silent vulnerability fixes cause downstream software to be unaware of urgent security issues in a timely manner, posing a secu… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.08512v1-abstract-full').style.display = 'inline'; document.getElementById('2409.08512v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.08512v1-abstract-full" style="display: none;"> Software projects are dependent on many third-party libraries, therefore high-risk vulnerabilities can propagate through the dependency chain to downstream projects. Owing to the subjective nature of patch management, software vendors commonly fix vulnerabilities silently. Silent vulnerability fixes cause downstream software to be unaware of urgent security issues in a timely manner, posing a security risk to the software. Presently, most of the existing works for vulnerability fix identification only consider the changed code as a sequential textual sequence, ignoring the structural information of the code. In this paper, we propose GRAPE, a GRAph-based Patch rEpresentation that aims to 1) provide a unified framework for getting vulnerability fix patches representation; and 2) enhance the understanding of the intent and potential impact of patches by extracting structural information of the code. GRAPE employs a novel joint graph structure (MCPG) to represent the syntactic and semantic information of fix patches and embeds both nodes and edges. Subsequently, a carefully designed graph convolutional neural network (NE-GCN) is utilized to fully learn structural features by leveraging the attributes of the nodes and edges. Moreover, we construct a dataset containing 2251 silent fixes. For the experimental section, we evaluated patch representation on three tasks, including vulnerability fix identification, vulnerability types classification, and vulnerability severity classification. 
Experimental results indicate that, in comparison to baseline methods, GRAPE can more effectively reduce false positives and omissions of vulnerability fixes identification and provide accurate vulnerability assessments. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.08512v1-abstract-full').style.display = 'none'; document.getElementById('2409.08512v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 12 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">The paper has been accepted at the 35th IEEE International Symposium on Software Reliability Engineering (ISSRE 2024)</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2409.02771</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Programming Languages">cs.PL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Graphics">cs.GR</span> </div> <div class="is-inline-block" style="margin-left: 0.5rem"> <div class="tags has-addons"> <span class="tag is-dark is-size-7">doi</span> <span class="tag is-light is-size-7"><a class="" href="">10.1145/3689741 <i class="fa fa-external-link" aria-hidden="true"></i></a></span> </div> </div> </div> <p class="title is-5 mathjax"> CoolerSpace: A Language for Physically Correct and Computationally Efficient Color Programming </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Chen%2C+E">Ethan Chen</a>, <a 
href="/search/cs?searchtype=author&query=Chang%2C+J">Jiwon Chang</a>, <a href="/search/cs?searchtype=author&query=Zhu%2C+Y">Yuhao Zhu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2409.02771v1-abstract-short" style="display: inline;"> Color programmers manipulate lights, materials, and the resulting colors from light-material interactions. Existing libraries for color programming provide only a thin layer of abstraction around matrix operations. Color programs are, thus, vulnerable to bugs arising from mathematically permissible but physically meaningless matrix computations. Correct implementations are difficult to write and o… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.02771v1-abstract-full').style.display = 'inline'; document.getElementById('2409.02771v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2409.02771v1-abstract-full" style="display: none;"> Color programmers manipulate lights, materials, and the resulting colors from light-material interactions. Existing libraries for color programming provide only a thin layer of abstraction around matrix operations. Color programs are, thus, vulnerable to bugs arising from mathematically permissible but physically meaningless matrix computations. Correct implementations are difficult to write and optimize. We introduce CoolerSpace to facilitate physically correct and computationally efficient color programming. CoolerSpace raises the level of abstraction of color programming by allowing programmers to focus on describing the logic of color physics. Correctness and efficiency are handled by CoolerSpace. The type system in CoolerSpace assigns physical meaning and dimensions to user-defined objects. 
The typing rules permit only legal computations informed by color physics and perception. Along with type checking, CoolerSpace also generates performance-optimized programs using equality saturation. CoolerSpace is implemented as a Python library and compiles to ONNX, a common intermediate representation for tensor computations. CoolerSpace not only prevents common errors in color programming, but also does so without run-time overhead: even unoptimized CoolerSpace programs out-perform existing Python-based color programming systems by up to 5.7 times; our optimizations provide up to an additional 1.4 times speed-up. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2409.02771v1-abstract-full').style.display = 'none'; document.getElementById('2409.02771v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 4 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> September 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.14009</a> <span> [<a href="">pdf</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Robotics">cs.RO</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Optimizing TD3 for 7-DOF Robotic Arm Grasping: Overcoming Suboptimality with Exploration-Enhanced Contrastive Learning </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Hsieh%2C+W">Wen-Han Hsieh</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jen-Yuan Chang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.14009v1-abstract-short" style="display: inline;"> In actor-critic-based reinforcement learning algorithms such as Twin Delayed Deep Deterministic policy gradient (TD3), insufficient exploration of the spatial space can result in suboptimal policies when controlling 7-DOF robotic arms. 
To address this issue, we propose a novel Exploration-Enhanced Contrastive Learning (EECL) module that improves exploration by providing additional rewards for enco… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.14009v1-abstract-full').style.display = 'inline'; document.getElementById('2408.14009v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.14009v1-abstract-full" style="display: none;"> In actor-critic-based reinforcement learning algorithms such as Twin Delayed Deep Deterministic policy gradient (TD3), insufficient exploration of the spatial space can result in suboptimal policies when controlling 7-DOF robotic arms. To address this issue, we propose a novel Exploration-Enhanced Contrastive Learning (EECL) module that improves exploration by providing additional rewards for encountering novel states. Our module stores previously explored states in a buffer and identifies new states by comparing them with historical data using Euclidean distance within a K-dimensional tree (KDTree) framework. When the agent explores new states, exploration rewards are assigned. These rewards are then integrated into the TD3 algorithm, ensuring that the Q-learning process incorporates these signals, promoting more effective strategy optimization. We evaluate our method on the robosuite panda lift task, demonstrating that it significantly outperforms the baseline TD3 in terms of both efficiency and convergence speed in the tested environment. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.14009v1-abstract-full').style.display = 'none'; document.getElementById('2408.14009v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 26 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">4 pages, 2 figures, IEEE-ICKII-2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.11791</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Critique-out-Loud Reward Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Ankner%2C+Z">Zachary Ankner</a>, <a href="/search/cs?searchtype=author&query=Paul%2C+M">Mansheej Paul</a>, <a href="/search/cs?searchtype=author&query=Cui%2C+B">Brandon Cui</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J+D">Jonathan D. Chang</a>, <a href="/search/cs?searchtype=author&query=Ammanabrolu%2C+P">Prithviraj Ammanabrolu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.11791v1-abstract-short" style="display: inline;"> Traditionally, reward models used for reinforcement learning from human feedback (RLHF) are trained to directly predict preference scores without leveraging the generation capabilities of the underlying large language model (LLM). 
This limits the capabilities of reward models as they must reason implicitly about the quality of a response, i.e., preference modeling must be performed in a single for… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.11791v1-abstract-full').style.display = 'inline'; document.getElementById('2408.11791v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.11791v1-abstract-full" style="display: none;"> Traditionally, reward models used for reinforcement learning from human feedback (RLHF) are trained to directly predict preference scores without leveraging the generation capabilities of the underlying large language model (LLM). This limits the capabilities of reward models as they must reason implicitly about the quality of a response, i.e., preference modeling must be performed in a single forward pass through the model. To enable reward models to reason explicitly about the quality of a response, we introduce Critique-out-Loud (CLoud) reward models. CLoud reward models operate by first generating a natural language critique of the assistant's response that is then used to predict a scalar reward for the quality of the response. We demonstrate the success of CLoud reward models for both Llama-3-8B and 70B base models: compared to classic reward models CLoud reward models improve pairwise preference classification accuracy on RewardBench by 4.65 and 5.84 percentage points for the 8B and 70B base models respectively. Furthermore, CLoud reward models lead to a Pareto improvement for win rate on ArenaHard when used as the scoring model for Best-of-N. Finally, we explore how to exploit the dynamic inference compute capabilities of CLoud reward models by performing self-consistency decoding for reward prediction. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.11791v1-abstract-full').style.display = 'none'; document.getElementById('2408.11791v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.05074</a> <span> [<a href="">pdf</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> RT-Surv: Improving Mortality Prediction After Radiotherapy with Large Language Model Structuring of Large-Scale Unstructured Electronic Health Records </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Park%2C+S">Sangjoon Park</a>, <a href="/search/cs?searchtype=author&query=Wee%2C+C+W">Chan Woo Wee</a>, <a href="/search/cs?searchtype=author&query=Choi%2C+S+H">Seo Hee Choi</a>, <a href="/search/cs?searchtype=author&query=Kim%2C+K+H">Kyung Hwan Kim</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J+S">Jee Suk Chang</a>, <a href="/search/cs?searchtype=author&query=Yoon%2C+H+I">Hong In Yoon</a>, <a href="/search/cs?searchtype=author&query=Lee%2C+I+J">Ik Jae Lee</a>, <a href="/search/cs?searchtype=author&query=Kim%2C+Y+B">Yong Bae Kim</a>, <a href="/search/cs?searchtype=author&query=Cho%2C+J">Jaeho Cho</a>, <a href="/search/cs?searchtype=author&query=Keum%2C+K+C">Ki Chang Keum</a>, <a href="/search/cs?searchtype=author&query=Lee%2C+C+G">Chang Geol Lee</a>, <a 
href="/search/cs?searchtype=author&query=Byun%2C+H+K">Hwa Kyung Byun</a>, <a href="/search/cs?searchtype=author&query=Koom%2C+W+S">Woong Sub Koom</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.05074v4-abstract-short" style="display: inline;"> Accurate patient selection is critical in radiotherapy (RT) to prevent ineffective treatments. Traditional survival prediction models, relying on structured data, often lack precision. This study explores the potential of large language models (LLMs) to structure unstructured electronic health record (EHR) data, thereby improving survival prediction accuracy through comprehensive clinical informat… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.05074v4-abstract-full').style.display = 'inline'; document.getElementById('2408.05074v4-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.05074v4-abstract-full" style="display: none;"> Accurate patient selection is critical in radiotherapy (RT) to prevent ineffective treatments. Traditional survival prediction models, relying on structured data, often lack precision. This study explores the potential of large language models (LLMs) to structure unstructured electronic health record (EHR) data, thereby improving survival prediction accuracy through comprehensive clinical information integration. Data from 34,276 patients treated with RT at Yonsei Cancer Center between 2013 and 2023 were analyzed, encompassing both structured and unstructured data. An open-source LLM was used to structure the unstructured EHR data via single-shot learning, with its performance compared against a domain-specific medical LLM and a smaller variant. 
Survival prediction models were developed using statistical, machine learning, and deep learning approaches, incorporating both structured and LLM-structured data. Clinical experts evaluated the accuracy of the LLM-structured data. The open-source LLM achieved 87.5% accuracy in structuring unstructured EHR data without additional training, significantly outperforming the domain-specific medical LLM, which reached only 35.8% accuracy. Larger LLMs were more effective, particularly in extracting clinically relevant features like general condition and disease extent, which closely correlated with patient survival. Incorporating LLM-structured clinical features into survival prediction models significantly improved accuracy, with the C-index of deep learning models increasing from 0.737 to 0.820. These models also became more interpretable by emphasizing clinically significant factors. This study shows that general-domain LLMs, even without specific medical training, can effectively structure large-scale unstructured EHR data, substantially enhancing the accuracy and interpretability of clinical predictive models. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.05074v4-abstract-full').style.display = 'none'; document.getElementById('2408.05074v4-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 9 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">23 pages, 2 tables, 4 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2408.01933</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+B">Bowen Wang</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jiuyang Chang</a>, <a href="/search/cs?searchtype=author&query=Qian%2C+Y">Yiming Qian</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+G">Guoxin Chen</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+J">Junhao Chen</a>, <a href="/search/cs?searchtype=author&query=Jiang%2C+Z">Zhouqiang Jiang</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+J">Jiahao Zhang</a>, <a href="/search/cs?searchtype=author&query=Nakashima%2C+Y">Yuta Nakashima</a>, <a href="/search/cs?searchtype=author&query=Nagahara%2C+H">Hajime Nagahara</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2408.01933v2-abstract-short" style="display: inline;"> Large language models (LLMs) have recently showcased remarkable capabilities, spanning a wide range of tasks and applications, including those in the medical domain. 
Models like GPT-4 excel in medical question answering but may face challenges in the lack of interpretability when handling complex tasks in real clinical settings. We thus introduce the diagnostic reasoning dataset for clinical notes… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.01933v2-abstract-full').style.display = 'inline'; document.getElementById('2408.01933v2-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2408.01933v2-abstract-full" style="display: none;"> Large language models (LLMs) have recently showcased remarkable capabilities, spanning a wide range of tasks and applications, including those in the medical domain. Models like GPT-4 excel in medical question answering but may face challenges in the lack of interpretability when handling complex tasks in real clinical settings. We thus introduce the diagnostic reasoning dataset for clinical notes (DiReCT), aiming at evaluating the reasoning ability and interpretability of LLMs compared to human doctors. It contains 511 clinical notes, each meticulously annotated by physicians, detailing the diagnostic reasoning process from observations in a clinical note to the final diagnosis. Additionally, a diagnostic knowledge graph is provided to offer essential knowledge for reasoning, which may not be covered in the training data of existing LLMs. Evaluations of leading LLMs on DiReCT bring out a significant gap between their reasoning ability and that of human doctors, highlighting the critical need for models that can reason effectively in real-world clinical scenarios. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2408.01933v2-abstract-full').style.display = 'none'; document.getElementById('2408.01933v2-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 6 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 4 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> August 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">9 pages, 6 figures</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.16984</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Information Retrieval">cs.IR</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Genomics">q-bio.GN</span> </div> </div> <p class="title is-5 mathjax"> scGHSOM: Hierarchical clustering and visualization of single-cell and CRISPR data using growing hierarchical SOM </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wen%2C+S">Shang-Jung Wen</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jia-Ming Chang</a>, <a href="/search/cs?searchtype=author&query=Yu%2C+F">Fang Yu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.16984v1-abstract-short" style="display: inline;"> High-dimensional single-cell data poses significant challenges in 
identifying underlying biological patterns due to the complexity and heterogeneity of cellular states. We propose a comprehensive gene-cell dependency visualization via unsupervised clustering, Growing Hierarchical Self-Organizing Map (GHSOM), specifically designed for analyzing high-dimensional single-cell data like single-cell seq… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.16984v1-abstract-full').style.display = 'inline'; document.getElementById('2407.16984v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.16984v1-abstract-full" style="display: none;"> High-dimensional single-cell data poses significant challenges in identifying underlying biological patterns due to the complexity and heterogeneity of cellular states. We propose a comprehensive gene-cell dependency visualization via unsupervised clustering, Growing Hierarchical Self-Organizing Map (GHSOM), specifically designed for analyzing high-dimensional single-cell data like single-cell sequencing and CRISPR screens. GHSOM is applied to cluster samples in a hierarchical structure such that the self-growth structure of clusters satisfies the required variations between and within. We propose a novel Significant Attributes Identification Algorithm to identify features that distinguish clusters. This algorithm pinpoints attributes with minimal variation within a cluster but substantial variation between clusters. These key attributes can then be used for targeted data retrieval and downstream analysis. Furthermore, we present two innovative visualization tools: Cluster Feature Map and Cluster Distribution Map. The Cluster Feature Map highlights the distribution of specific features across the hierarchical structure of GHSOM clusters. This allows for rapid visual assessment of cluster uniqueness based on chosen features. 
The Cluster Distribution Map depicts leaf clusters as circles on the GHSOM grid, with circle size reflecting cluster data size and color customizable to visualize features like cell type or other attributes. We apply our analysis to three single-cell datasets and one CRISPR dataset (cell-gene database) and evaluate clustering methods with internal and external CH and ARI scores. GHSOM performs well, being the best performer in internal evaluation (CH=4.2). In external evaluation, GHSOM has the third-best performance of all methods. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.16984v1-abstract-full').style.display = 'none'; document.getElementById('2407.16984v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 24 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Abstract presentation at BIOKDD@ACM KDD 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.14136</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> 6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Chun%2C+S">Sungho Chun</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J+Y">Ju Yong Chang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis 
has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.14136v1-abstract-short" style="display: inline;"> This study addresses the nuanced challenge of estimating head translations within the context of six-degrees-of-freedom (6DoF) head pose estimation, placing emphasis on this aspect over the more commonly studied head rotations. Identifying a gap in existing methodologies, we recognized the underutilized potential synergy between facial geometry and head translation. To bridge this gap, we propose… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.14136v1-abstract-full').style.display = 'inline'; document.getElementById('2407.14136v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.14136v1-abstract-full" style="display: none;"> This study addresses the nuanced challenge of estimating head translations within the context of six-degrees-of-freedom (6DoF) head pose estimation, placing emphasis on this aspect over the more commonly studied head rotations. Identifying a gap in existing methodologies, we recognized the underutilized potential synergy between facial geometry and head translation. To bridge this gap, we propose a novel approach called the head Translation, Rotation, and face Geometry network (TRG), which stands out for its explicit bidirectional interaction structure. This structure has been carefully designed to leverage the complementary relationship between face geometry and head translation, marking a significant advancement in the field of head pose estimation. Our contributions also include the development of a strategy for estimating bounding box correction parameters and a technique for aligning landmarks to image. Both of these innovations demonstrate superior performance in 6DoF head pose estimation tasks. 
Extensive experiments conducted on ARKitFace and BIWI datasets confirm that the proposed method outperforms current state-of-the-art techniques. Codes are released at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.14136v1-abstract-full').style.display = 'none'; document.getElementById('2407.14136v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 19 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.12727</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Zhang%2C+Z">Zhongqun Zhang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+H">Hengfei Wang</a>, <a href="/search/cs?searchtype=author&query=Yu%2C+Z">Ziwei Yu</a>, <a href="/search/cs?searchtype=author&query=Cheng%2C+Y">Yihua Cheng</a>, <a href="/search/cs?searchtype=author&query=Yao%2C+A">Angela Yao</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+H+J">Hyung Jin Chang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.12727v1-abstract-short" style="display: inline;"> Modeling the physical contacts between the hand and object is standard for refining inaccurate hand poses and generating novel human grasp in 3D hand-object 
reconstruction. However, existing methods rely on geometric constraints that cannot be specified or controlled. This paper introduces a novel task of controllable 3D hand-object contact modeling with natural language descriptions. Challenges i… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.12727v1-abstract-full').style.display = 'inline'; document.getElementById('2407.12727v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.12727v1-abstract-full" style="display: none;"> Modeling the physical contacts between the hand and object is standard for refining inaccurate hand poses and generating novel human grasp in 3D hand-object reconstruction. However, existing methods rely on geometric constraints that cannot be specified or controlled. This paper introduces a novel task of controllable 3D hand-object contact modeling with natural language descriptions. Challenges include i) the complexity of cross-modal modeling from language to contact, and ii) a lack of descriptive text for contact patterns. To address these issues, we propose NL2Contact, a model that generates controllable contacts by leveraging staged diffusion models. Given a language description of the hand and contact, NL2Contact generates realistic and faithful 3D hand-object contacts. To train the model, we build ContactDescribe, the first dataset with hand-centered contact descriptions. It contains multi-level and diverse descriptions generated by large language models based on carefully designed prompts (e.g., grasp action, grasp type, contact location, free finger status). We show applications of our model to grasp pose optimization and novel human grasp generation, both based on a textual contact description. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.12727v1-abstract-full').style.display = 'none'; document.getElementById('2407.12727v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 17 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted by ECCV2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.08801</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Jiang%2C+J">Jincen Jiang</a>, <a href="/search/cs?searchtype=author&query=Zhou%2C+Q">Qianyu Zhou</a>, <a href="/search/cs?searchtype=author&query=Li%2C+Y">Yuhang Li</a>, <a href="/search/cs?searchtype=author&query=Lu%2C+X">Xuequan Lu</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+M">Meili Wang</a>, <a href="/search/cs?searchtype=author&query=Ma%2C+L">Lizhuang Ma</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jian Chang</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+J+J">Jian Jun Zhang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.08801v1-abstract-short" style="display: inline;"> Recent point cloud 
understanding research suffers from performance drops on unseen data, due to the distribution shifts across different domains. While recent studies use Domain Generalization (DG) techniques to mitigate this by learning domain-invariant features, most are designed for a single task and neglect the potential of testing data. Despite In-Context Learning (ICL) showcasing multi-task… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.08801v1-abstract-full').style.display = 'inline'; document.getElementById('2407.08801v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.08801v1-abstract-full" style="display: none;"> Recent point cloud understanding research suffers from performance drops on unseen data, due to the distribution shifts across different domains. While recent studies use Domain Generalization (DG) techniques to mitigate this by learning domain-invariant features, most are designed for a single task and neglect the potential of testing data. Despite In-Context Learning (ICL) showcasing multi-task learning capability, it usually relies on high-quality context-rich data and considers a single dataset, and has rarely been studied in point cloud understanding. In this paper, we introduce a novel, practical, multi-domain multi-task setting, handling multiple domains and multiple tasks within one unified model for domain generalized point cloud understanding. To this end, we propose Domain Generalized Point-In-Context Learning (DG-PIC) that boosts the generalizability across various tasks and domains at testing time. 
In particular, we develop dual-level source prototype estimation that considers both global-level shape contextual and local-level geometrical structures for representing source domains and a dual-level test-time feature shifting mechanism that leverages both macro-level domain semantic information and micro-level patch positional relationships to pull the target data closer to the source ones during the testing. Our DG-PIC does not require any model updates during the testing and can handle unseen domains and multiple tasks, i.e., point cloud reconstruction, denoising, and registration, within one unified model. We also introduce a benchmark for this new setting. Comprehensive experiments demonstrate that DG-PIC outperforms state-of-the-art techniques significantly. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.08801v1-abstract-full').style.display = 'none'; document.getElementById('2407.08801v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 11 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted to ECCV 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2407.05254</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> GaussReg: Fast 3D Registration with Gaussian Splatting </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jiahao Chang</a>, <a href="/search/cs?searchtype=author&query=Xu%2C+Y">Yinglin Xu</a>, <a href="/search/cs?searchtype=author&query=Li%2C+Y">Yihao Li</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+Y">Yuantao Chen</a>, <a href="/search/cs?searchtype=author&query=Han%2C+X">Xiaoguang Han</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2407.05254v1-abstract-short" style="display: inline;"> Point cloud registration is a fundamental problem for large-scale 3D scene scanning and reconstruction. With the help of deep learning, registration methods have evolved significantly, reaching a nearly-mature stage. Since the introduction of Neural Radiance Fields (NeRF), it has become the most popular 3D scene representation owing to its powerful view synthesis capabilities. 
Regarding NeRF representation… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.05254v1-abstract-full').style.display = 'inline'; document.getElementById('2407.05254v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2407.05254v1-abstract-full" style="display: none;"> Point cloud registration is a fundamental problem for large-scale 3D scene scanning and reconstruction. With the help of deep learning, registration methods have evolved significantly, reaching a nearly-mature stage. Since the introduction of Neural Radiance Fields (NeRF), it has become the most popular 3D scene representation owing to its powerful view synthesis capabilities. Regarding NeRF representation, its registration is also required for large-scale scene reconstruction. However, this topic remains largely unexplored. This is due to the inherent challenge of modeling the geometric relationship between two scenes with implicit representations. The existing methods usually convert the implicit representation to an explicit representation for further registration. Most recently, Gaussian Splatting (GS) was introduced, employing explicit 3D Gaussians. This method significantly enhances rendering speed while maintaining high rendering quality. Given two scenes with explicit GS representations, in this work, we explore the 3D registration task between them. To this end, we propose GaussReg, a novel coarse-to-fine framework, both fast and accurate. The coarse stage follows existing point cloud registration methods and estimates a rough alignment for point clouds from GS. We further present a new image-guided fine registration approach, which renders images from GS to provide more detailed geometric information for precise alignment. 
To support comprehensive evaluation, we carefully build a scene-level dataset called ScanNet-GSReg with 1379 scenes obtained from the ScanNet dataset and collect an in-the-wild dataset called GSReg. Experimental results demonstrate our method achieves state-of-the-art performance on multiple datasets. Our GaussReg is 44 times faster than HLoc (SuperPoint as the feature extractor and SuperGlue as the matcher) with comparable accuracy. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2407.05254v1-abstract-full').style.display = 'none'; document.getElementById('2407.05254v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 7 July, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> July 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">ECCV 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.19560</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Image and Video Processing">eess.IV</span> </div> </div> <p class="title is-5 mathjax"> Cost-efficient Active Illumination Camera For Hyper-spectral Reconstruction </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Zhang%2C+Y">Yuxuan Zhang</a>, <a href="/search/cs?searchtype=author&query=Sazzad%2C+T+M">T. M. 
Sazzad</a>, <a href="/search/cs?searchtype=author&query=Song%2C+Y">Yangyang Song</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+S+J">Spencer J. Chang</a>, <a href="/search/cs?searchtype=author&query=Chowdhry%2C+R">Ritesh Chowdhry</a>, <a href="/search/cs?searchtype=author&query=Mejia%2C+T">Tomas Mejia</a>, <a href="/search/cs?searchtype=author&query=Hampton%2C+A">Anna Hampton</a>, <a href="/search/cs?searchtype=author&query=Kucharski%2C+S">Shelby Kucharski</a>, <a href="/search/cs?searchtype=author&query=Gerber%2C+S">Stefan Gerber</a>, <a href="/search/cs?searchtype=author&query=Tillman%2C+B">Barry Tillman</a>, <a href="/search/cs?searchtype=author&query=Resende%2C+M+F+R">Marcio F. R. Resende</a>, <a href="/search/cs?searchtype=author&query=Hammond%2C+W+M">William M. Hammond</a>, <a href="/search/cs?searchtype=author&query=Wilson%2C+C+H">Chris H. Wilson</a>, <a href="/search/cs?searchtype=author&query=Zare%2C+A">Alina Zare</a>, <a href="/search/cs?searchtype=author&query=Koppal%2C+S+J">Sanjeev J. Koppal</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.19560v1-abstract-short" style="display: inline;"> Hyper-spectral imaging has recently gained increasing attention for use in different applications, including agricultural investigation, ground tracking, remote sensing and many other. However, the high cost, large physical size and complicated operation process stop hyperspectral cameras from being employed for various applications and research fields. 
In this paper, we introduce a cost-efficient… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.19560v1-abstract-full').style.display = 'inline'; document.getElementById('2406.19560v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.19560v1-abstract-full" style="display: none;"> Hyper-spectral imaging has recently gained increasing attention for use in different applications, including agricultural investigation, ground tracking, remote sensing and many other. However, the high cost, large physical size and complicated operation process stop hyperspectral cameras from being employed for various applications and research fields. In this paper, we introduce a cost-efficient, compact and easy-to-use active illumination camera that may benefit many applications. We developed a fully functional prototype of such a camera. With the hope of helping with agricultural research, we tested our camera for plant root imaging. In addition, a U-Net model for spectral reconstruction was trained by using a reference hyperspectral camera's data as ground truth and our camera's data as input. We demonstrated our camera's ability to obtain additional information over a typical RGB camera. In addition, the ability to reconstruct hyperspectral data from multi-spectral input makes our device compatible with models and algorithms developed for hyperspectral applications with no modifications required. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.19560v1-abstract-full').style.display = 'none'; document.getElementById('2406.19560v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 27 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.10370</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Human-Computer Interaction">cs.HC</span> </div> </div> <p class="title is-5 mathjax"> Let's Get to the Point: LLM-Supported Planning, Drafting, and Revising of Research-Paper Blog Posts </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Radensky%2C+M">Marissa Radensky</a>, <a href="/search/cs?searchtype=author&query=Weld%2C+D+S">Daniel S. Weld</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J+C">Joseph Chee Chang</a>, <a href="/search/cs?searchtype=author&query=Siangliulue%2C+P">Pao Siangliulue</a>, <a href="/search/cs?searchtype=author&query=Bragg%2C+J">Jonathan Bragg</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.10370v1-abstract-short" style="display: inline;"> Research-paper blog posts help scientists disseminate their work to a larger audience, but translating papers into this format requires substantial additional effort. Blog post creation is not simply transforming a long-form article into a short output, as studied in most prior work on human-AI summarization. 
In contrast, blog posts are typically full-length articles that require a combination of… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.10370v1-abstract-full').style.display = 'inline'; document.getElementById('2406.10370v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.10370v1-abstract-full" style="display: none;"> Research-paper blog posts help scientists disseminate their work to a larger audience, but translating papers into this format requires substantial additional effort. Blog post creation is not simply transforming a long-form article into a short output, as studied in most prior work on human-AI summarization. In contrast, blog posts are typically full-length articles that require a combination of strategic planning grounded in the source document, well-organized drafting, and thoughtful revisions. Can tools powered by large language models (LLMs) assist scientists in writing research-paper blog posts? To investigate this question, we conducted a formative study (N=6) to understand the main challenges of writing such blog posts with an LLM: high interaction costs for 1) reviewing and utilizing the paper content and 2) recurrent sub-tasks of generating and modifying the long-form output. To address these challenges, we developed Papers-to-Posts, an LLM-powered tool that implements a new Plan-Draft-Revise workflow, which 1) leverages an LLM to generate bullet points from the full paper to help users find and select content to include (Plan) and 2) provides default yet customizable LLM instructions for generating and modifying text (Draft, Revise). 
Through a within-subjects lab study (N=20) and between-subjects deployment study (N=37 blog posts, 26 participants) in which participants wrote blog posts about their papers, we compared Papers-to-Posts to a strong baseline tool that provides an LLM-generated draft and access to free-form LLM prompting. Results show that Papers-to-Posts helped researchers to 1) write significantly more satisfying blog posts and make significantly more changes to their blog posts in a fixed amount of time without a significant change in cognitive load (lab) and 2) make more changes to their blog posts for a fixed number of writing actions (deployment). <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.10370v1-abstract-full').style.display = 'none'; document.getElementById('2406.10370v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 14 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">28 pages, 9 figures in main text (not appendix)</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2406.00490</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Research on the Application of Computer Vision Based on Deep Learning in Autonomous Driving Technology </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Zhang%2C+J">Jingyu Zhang</a>, <a href="/search/cs?searchtype=author&query=Cao%2C+J">Jin Cao</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jinghao Chang</a>, <a href="/search/cs?searchtype=author&query=Li%2C+X">Xinjin Li</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+H">Houze Liu</a>, <a href="/search/cs?searchtype=author&query=Li%2C+Z">Zhenglin Li</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2406.00490v2-abstract-short" style="display: inline;"> This research aims to explore the application of deep learning in autonomous driving computer vision technology and its impact on improving system performance. 
By using advanced technologies such as convolutional neural networks (CNN), multi-task joint learning methods, and deep reinforcement learning, this article analyzes in detail the application of deep learning in image recognition, real-time… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.00490v2-abstract-full').style.display = 'inline'; document.getElementById('2406.00490v2-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2406.00490v2-abstract-full" style="display: none;"> This research aims to explore the application of deep learning in autonomous driving computer vision technology and its impact on improving system performance. By using advanced technologies such as convolutional neural networks (CNN), multi-task joint learning methods, and deep reinforcement learning, this article analyzes in detail the application of deep learning in image recognition, real-time target tracking and classification, environment perception and decision support, and path planning and navigation, covering the application process in each of these key areas. Research results show that the proposed system has an accuracy of over 98% in image recognition, target tracking and classification, and also demonstrates efficient performance and practicality in environmental perception and decision support, path planning and navigation. The conclusion points out that deep learning technology can significantly improve the accuracy and real-time response capabilities of autonomous driving systems. Although there are still challenges in environmental perception and decision support, with the advancement of technology, it is expected to achieve wider applications and greater potential in the future. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2406.00490v2-abstract-full').style.display = 'none'; document.getElementById('2406.00490v2-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 3 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 1 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.17829</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jinho Chang</a>, <a href="/search/cs?searchtype=author&query=Ye%2C+J+C">Jong Chul Ye</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.17829v2-abstract-short" style="display: inline;"> With the emergence of diffusion models as the frontline of generative models, many researchers have proposed molecule generation techniques with conditional diffusion models. However, the unavoidable discreteness of a molecule makes it difficult for a diffusion model to connect raw data with highly complex conditions like natural language. 
To address this, we present a novel latent diffusion model… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.17829v2-abstract-full').style.display = 'inline'; document.getElementById('2405.17829v2-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.17829v2-abstract-full" style="display: none;"> With the emergence of diffusion models as the frontline of generative models, many researchers have proposed molecule generation techniques with conditional diffusion models. However, the unavoidable discreteness of a molecule makes it difficult for a diffusion model to connect raw data with highly complex conditions like natural language. To address this, we present a novel latent diffusion model dubbed LDMol for text-conditioned molecule generation. LDMol comprises a molecule autoencoder that produces a learnable and structurally informative feature space, and a natural language-conditioned latent diffusion model. In particular, recognizing that multiple SMILES notations can represent the same molecule, we employ a contrastive learning strategy to extract a feature space that is aware of the unique characteristics of the molecule structure. LDMol outperforms the existing baselines on the text-to-molecule generation benchmark, suggesting that diffusion models can potentially outperform autoregressive models in text data generation with a better choice of the latent domain. Furthermore, we show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing, demonstrating its versatility as a diffusion model. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.17829v2-abstract-full').style.display = 'none'; document.getElementById('2405.17829v2-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 3 October, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 28 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.13226</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Pouransari%2C+H">Hadi Pouransari</a>, <a href="/search/cs?searchtype=author&query=Li%2C+C">Chun-Liang Li</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J+R">Jen-Hao Rick Chang</a>, <a href="/search/cs?searchtype=author&query=Vasu%2C+P+K+A">Pavan Kumar Anasosalu Vasu</a>, <a href="/search/cs?searchtype=author&query=Koc%2C+C">Cem Koc</a>, <a href="/search/cs?searchtype=author&query=Shankar%2C+V">Vaishaal Shankar</a>, <a href="/search/cs?searchtype=author&query=Tuzel%2C+O">Oncel Tuzel</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.13226v1-abstract-short" style="display: inline;"> Large language models (LLMs) are commonly 
trained on datasets consisting of fixed-length token sequences. These datasets are created by randomly concatenating documents of various lengths and then chunking them into sequences of a predetermined target length. However, this method of concatenation can lead to cross-document attention within a sequence, which is neither a desirable learning signal n… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.13226v1-abstract-full').style.display = 'inline'; document.getElementById('2405.13226v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.13226v1-abstract-full" style="display: none;"> Large language models (LLMs) are commonly trained on datasets consisting of fixed-length token sequences. These datasets are created by randomly concatenating documents of various lengths and then chunking them into sequences of a predetermined target length. However, this method of concatenation can lead to cross-document attention within a sequence, which is neither a desirable learning signal nor computationally efficient. Additionally, training on long sequences becomes computationally prohibitive due to the quadratic cost of attention. In this study, we introduce dataset decomposition, a novel variable sequence length training technique, to tackle these challenges. We decompose a dataset into a union of buckets, each containing sequences of the same size extracted from a unique document. During training, we use variable sequence length and batch size, sampling simultaneously from all buckets with a curriculum. In contrast to the concat-and-chunk baseline, which incurs a fixed attention cost at every step of training, our proposed method incurs a penalty proportional to the actual document lengths at each step, resulting in significant savings in training time. 
We train an 8k context-length 1B model at the same cost as a 2k context-length model trained with the baseline approach. Experiments on a web-scale corpus demonstrate that our approach significantly enhances performance on standard language evaluations and long-context benchmarks, reaching target accuracy 3x faster compared to the baseline. Our method not only enables efficient pretraining on long sequences but also scales effectively with dataset size. Lastly, we shed light on a critical yet less studied aspect of training large language models: the distribution and curriculum of sequence lengths, which results in a non-negligible difference in performance. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.13226v1-abstract-full').style.display = 'none'; document.getElementById('2405.13226v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 21 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.09592</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computational Engineering, Finance, and Science">cs.CE</span> </div> </div> <p class="title is-5 mathjax"> A Survey of Generative Techniques for Spatial-Temporal Data Mining </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Zhang%2C+Q">Qianru Zhang</a>, <a href="/search/cs?searchtype=author&query=Wang%2C+H">Haixin Wang</a>, <a href="/search/cs?searchtype=author&query=Long%2C+C">Cheng Long</a>, <a href="/search/cs?searchtype=author&query=Su%2C+L">Liangcai Su</a>, <a href="/search/cs?searchtype=author&query=He%2C+X">Xingwei He</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jianlong Chang</a>, <a href="/search/cs?searchtype=author&query=Wu%2C+T">Tailin Wu</a>, <a href="/search/cs?searchtype=author&query=Yin%2C+H">Hongzhi Yin</a>, <a href="/search/cs?searchtype=author&query=Yiu%2C+S">Siu-Ming Yiu</a>, <a href="/search/cs?searchtype=author&query=Tian%2C+Q">Qi Tian</a>, <a href="/search/cs?searchtype=author&query=Jensen%2C+C+S">Christian S. Jensen</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.09592v1-abstract-short" style="display: inline;"> This paper focuses on the integration of generative techniques into spatial-temporal data mining, considering the significant growth and diverse nature of spatial-temporal data. 
With the advancements in RNNs, CNNs, and other non-generative techniques, researchers have explored their application in capturing temporal and spatial dependencies within spatial-temporal data. However, the emergence of g… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.09592v1-abstract-full').style.display = 'inline'; document.getElementById('2405.09592v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.09592v1-abstract-full" style="display: none;"> This paper focuses on the integration of generative techniques into spatial-temporal data mining, considering the significant growth and diverse nature of spatial-temporal data. With the advancements in RNNs, CNNs, and other non-generative techniques, researchers have explored their application in capturing temporal and spatial dependencies within spatial-temporal data. However, the emergence of generative techniques such as LLMs, SSL, Seq2Seq and diffusion models has opened up new possibilities for enhancing spatial-temporal data mining further. The paper provides a comprehensive analysis of generative technique-based spatial-temporal methods and introduces a standardized framework specifically designed for the spatial-temporal data mining pipeline. By offering a detailed review and a novel taxonomy of spatial-temporal methodology utilizing generative techniques, the paper enables a deeper understanding of the various techniques employed in this field. Furthermore, the paper highlights promising future research directions, urging researchers to delve deeper into spatial-temporal data mining. It emphasizes the need to explore untapped opportunities and push the boundaries of knowledge to unlock new insights and improve the effectiveness and efficiency of spatial-temporal data mining. 
By integrating generative techniques and providing a standardized framework, the paper contributes to advancing the field and encourages researchers to explore the vast potential of generative techniques in spatial-temporal data mining. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.09592v1-abstract-full').style.display = 'none'; document.getElementById('2405.09592v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 15 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">19 pages</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.08832</a> <span> [<a href="">pdf</a>, <a href="">ps</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Human-Computer Interaction">cs.HC</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computers and Society">cs.CY</span> </div> <div class="is-inline-block" style="margin-left: 0.5rem"> <div class="tags has-addons"> <span class="tag is-dark is-size-7">doi</span> <span class="tag is-light is-size-7"><a class="" href="">10.1145/3613905.3650997 <i class="fa fa-external-link" aria-hidden="true"></i></a></span> </div> </div> </div> <p class="title is-5 mathjax"> Theorizing Deception: A Scoping Review of Theory in Research on Dark Patterns and Deceptive Design </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Chang%2C+W+J">Weichen Joe Chang</a>, <a href="/search/cs?searchtype=author&query=Seaborn%2C+K">Katie Seaborn</a>, <a 
href="/search/cs?searchtype=author&query=Adams%2C+A+A">Andrew A. Adams</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.08832v1-abstract-short" style="display: inline;"> The issue of dark patterns and deceptive designs (DPs) in everyday interfaces and interactions continues to grow. DPs are manipulative and malicious elements within user interfaces that deceive users into making unintended choices. In parallel, research on DPs has significantly increased over the past two decades. As the field has matured, epistemological gaps have also become a salient and pressi… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.08832v1-abstract-full').style.display = 'inline'; document.getElementById('2405.08832v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.08832v1-abstract-full" style="display: none;"> The issue of dark patterns and deceptive designs (DPs) in everyday interfaces and interactions continues to grow. DPs are manipulative and malicious elements within user interfaces that deceive users into making unintended choices. In parallel, research on DPs has significantly increased over the past two decades. As the field has matured, epistemological gaps have also become a salient and pressing concern. In this scoping review, we assessed the academic work so far -- 51 papers between 2014 to 2023 -- to identify the state of theory in DP research. We identified the key theories employed, examined how these theories have been referenced, and call for enhancing the incorporation of theory into DP research. We also propose broad theoretical foundations to establish a comprehensive and solid base for contextualizing and informing future DP research from a variety of theoretical scopes and lenses. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.08832v1-abstract-full').style.display = 'none'; document.getElementById('2405.08832v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 13 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Journal ref:</span> CHI EA '24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (2024), Article No.: 321, 1-7 </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2405.04943</a> <span> [<a href="">pdf</a>, <a href="">ps</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Unsupervised Skin Feature Tracking with Deep Neural Networks </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jose Chang</a>, <a href="/search/cs?searchtype=author&query=Nordling%2C+T+E+M">Torbjörn E. M. Nordling</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2405.04943v1-abstract-short" style="display: inline;"> Facial feature tracking is essential in imaging ballistocardiography for accurate heart rate estimation and enables motor degradation quantification in Parkinson's disease through skin feature tracking. While deep convolutional neural networks have shown remarkable accuracy in tracking tasks, they typically require extensive labeled data for supervised training.
Our proposed pipeline employs a con… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.04943v1-abstract-full').style.display = 'inline'; document.getElementById('2405.04943v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2405.04943v1-abstract-full" style="display: none;"> Facial feature tracking is essential in imaging ballistocardiography for accurate heart rate estimation and enables motor degradation quantification in Parkinson's disease through skin feature tracking. While deep convolutional neural networks have shown remarkable accuracy in tracking tasks, they typically require extensive labeled data for supervised training. Our proposed pipeline employs a convolutional stacked autoencoder to match image crops with a reference crop containing the target feature, learning deep feature encodings specific to the object category in an unsupervised manner, thus reducing data requirements. To overcome edge effects making the performance dependent on crop size, we introduced a Gaussian weight on the residual errors of the pixels when calculating the loss function. Training the autoencoder on facial images and validating its performance on manually labeled face and hand videos, our Deep Feature Encodings (DFE) method demonstrated superior tracking accuracy with a mean error ranging from 0.6 to 3.3 pixels, outperforming traditional methods like SIFT, SURF, Lucas Kanade, and the latest transformers like PIPs++ and CoTracker. Overall, our unsupervised learning approach excels in tracking various skin features under significant motion conditions, providing superior feature descriptors for tracking, matching, and image registration compared to both traditional and state-of-the-art supervised learning methods. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2405.04943v1-abstract-full').style.display = 'none'; document.getElementById('2405.04943v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 8 May, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">arXiv admin note: text overlap with arXiv:2112.14159</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2404.17486</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> TextGaze: Gaze-Controllable Face Generation with Natural Language </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Wang%2C+H">Hengfei Wang</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+Z">Zhongqun Zhang</a>, <a href="/search/cs?searchtype=author&query=Cheng%2C+Y">Yihua Cheng</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+H+J">Hyung Jin Chang</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2404.17486v3-abstract-short" style="display: inline;"> Generating face image with specific gaze information has attracted considerable attention. Existing approaches typically input gaze values directly for face generation, which is unnatural and requires annotated gaze datasets for training, thereby limiting its application. 
In this paper, we present a novel gaze-controllable face generation task. Our approach inputs textual descriptions that describ… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2404.17486v3-abstract-full').style.display = 'inline'; document.getElementById('2404.17486v3-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2404.17486v3-abstract-full" style="display: none;"> Generating face image with specific gaze information has attracted considerable attention. Existing approaches typically input gaze values directly for face generation, which is unnatural and requires annotated gaze datasets for training, thereby limiting its application. In this paper, we present a novel gaze-controllable face generation task. Our approach inputs textual descriptions that describe human gaze and head behavior and generates corresponding face images. Our work first introduces a text-of-gaze dataset containing over 90k text descriptions spanning a dense distribution of gaze and head poses. We further propose a gaze-controllable text-to-face method. Our method contains a sketch-conditioned face diffusion module and a model-based sketch diffusion module. We define a face sketch based on facial landmarks and eye segmentation map. The face diffusion module generates face images from the face sketch, and the sketch diffusion module employs a 3D face model to generate face sketch from text description. Experiments on the FFHQ dataset show the effectiveness of our method. We will release our dataset and code for future research. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2404.17486v3-abstract-full').style.display = 'none'; document.getElementById('2404.17486v3-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 28 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 26 April, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> April 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">ACM MM2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2404.16767</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> REBEL: Reinforcement Learning via Regressing Relative Rewards </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Gao%2C+Z">Zhaolin Gao</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J+D">Jonathan D. 
Chang</a>, <a href="/search/cs?searchtype=author&query=Zhan%2C+W">Wenhao Zhan</a>, <a href="/search/cs?searchtype=author&query=Oertell%2C+O">Owen Oertell</a>, <a href="/search/cs?searchtype=author&query=Swamy%2C+G">Gokul Swamy</a>, <a href="/search/cs?searchtype=author&query=Brantley%2C+K">Kianté Brantley</a>, <a href="/search/cs?searchtype=author&query=Joachims%2C+T">Thorsten Joachims</a>, <a href="/search/cs?searchtype=author&query=Bagnell%2C+J+A">J. Andrew Bagnell</a>, <a href="/search/cs?searchtype=author&query=Lee%2C+J+D">Jason D. Lee</a>, <a href="/search/cs?searchtype=author&query=Sun%2C+W">Wen Sun</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2404.16767v3-abstract-short" style="display: inline;"> While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g. value networks, clipping), and is notorious for its sensitivity to the precise impleme… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2404.16767v3-abstract-full').style.display = 'inline'; document.getElementById('2404.16767v3-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2404.16767v3-abstract-full" style="display: none;"> While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g.
value networks, clipping), and is notorious for its sensitivity to the precise implementation of these components. In response, we take a step back and ask what a minimalist RL algorithm for the era of generative models would look like. We propose REBEL, an algorithm that cleanly reduces the problem of policy optimization to regressing the relative reward between two completions to a prompt in terms of the policy, enabling strikingly lightweight implementation. In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL, which allows us to match the strongest known theoretical guarantees in terms of convergence and sample complexity in the RL literature. REBEL can also cleanly incorporate offline data and be extended to handle the intransitive preferences we frequently see in practice. Empirically, we find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO, all while being simpler to implement and more computationally efficient than PPO. When fine-tuning Llama-3-8B-Instruct, REBEL achieves strong performance in AlpacaEval 2.0, MT-Bench, and Open LLM Leaderboard. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2404.16767v3-abstract-full').style.display = 'none'; document.getElementById('2404.16767v3-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 1 September, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 25 April, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> April 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">New experimental results on general chat</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2404.08513</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> Adversarial Imitation Learning via Boosting </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Chang%2C+J+D">Jonathan D. Chang</a>, <a href="/search/cs?searchtype=author&query=Sreenivas%2C+D">Dhruv Sreenivas</a>, <a href="/search/cs?searchtype=author&query=Huang%2C+Y">Yingbing Huang</a>, <a href="/search/cs?searchtype=author&query=Brantley%2C+K">Kianté Brantley</a>, <a href="/search/cs?searchtype=author&query=Sun%2C+W">Wen Sun</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2404.08513v1-abstract-short" style="display: inline;"> Adversarial imitation learning (AIL) has stood out as a dominant framework across various imitation learning (IL) applications, with Discriminator Actor Critic (DAC) (Kostrikov et al., 2019) demonstrating the effectiveness of off-policy learning algorithms in improving sample efficiency and scalability to higher-dimensional observations.
Despite DAC's empirical success, the original AIL objective… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2404.08513v1-abstract-full').style.display = 'inline'; document.getElementById('2404.08513v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2404.08513v1-abstract-full" style="display: none;"> Adversarial imitation learning (AIL) has stood out as a dominant framework across various imitation learning (IL) applications, with Discriminator Actor Critic (DAC) (Kostrikov et al., 2019) demonstrating the effectiveness of off-policy learning algorithms in improving sample efficiency and scalability to higher-dimensional observations. Despite DAC's empirical success, the original AIL objective is on-policy and DAC's ad-hoc application of off-policy training does not guarantee successful imitation (Kostrikov et al., 2019; 2020). Follow-up work such as ValueDICE (Kostrikov et al., 2020) tackles this issue by deriving a fully off-policy AIL objective. Instead in this work, we develop a novel and principled AIL algorithm via the framework of boosting. Like boosting, our new algorithm, AILBoost, maintains an ensemble of properly weighted weak learners (i.e., policies) and trains a discriminator that witnesses the maximum discrepancy between the distributions of the ensemble and the expert policy. We maintain a weighted replay buffer to represent the state-action distribution induced by the ensemble, allowing us to train discriminators using the entire data collected so far. In the weighted replay buffer, the contribution of the data from older policies is properly discounted with the weight computed based on the boosting framework. Empirically, we evaluate our algorithm on both controller state-based and pixel-based environments from the DeepMind Control Suite.
AILBoost outperforms DAC on both types of environments, demonstrating the benefit of properly weighting replay buffer data for off-policy training. On state-based environments, DAC outperforms ValueDICE and IQ-Learn (Garg et al., 2021), achieving competitive performance with as little as one expert trajectory. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2404.08513v1-abstract-full').style.display = 'none'; document.getElementById('2404.08513v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 12 April, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> April 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">19 pages, 7 figures, 4 tables, 3 algorithms, ICLR 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2404.08495</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Dataset Reset Policy Optimization for RLHF </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Chang%2C+J+D">Jonathan D.
Chang</a>, <a href="/search/cs?searchtype=author&query=Zhan%2C+W">Wenhao Zhan</a>, <a href="/search/cs?searchtype=author&query=Oertell%2C+O">Owen Oertell</a>, <a href="/search/cs?searchtype=author&query=Brantley%2C+K">Kianté Brantley</a>, <a href="/search/cs?searchtype=author&query=Misra%2C+D">Dipendra Misra</a>, <a href="/search/cs?searchtype=author&query=Lee%2C+J+D">Jason D. Lee</a>, <a href="/search/cs?searchtype=author&query=Sun%2C+W">Wen Sun</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2404.08495v3-abstract-short" style="display: inline;"> Reinforcement Learning (RL) from Human Preference-based feedback is a popular paradigm for fine-tuning generative models, which has produced impressive models such as GPT-4 and Claude3 Opus. This framework often consists of two steps: learning a reward model from an offline preference dataset followed by running online RL to optimize the learned reward model. In this work, leveraging the idea of r… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2404.08495v3-abstract-full').style.display = 'inline'; document.getElementById('2404.08495v3-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2404.08495v3-abstract-full" style="display: none;"> Reinforcement Learning (RL) from Human Preference-based feedback is a popular paradigm for fine-tuning generative models, which has produced impressive models such as GPT-4 and Claude3 Opus. This framework often consists of two steps: learning a reward model from an offline preference dataset followed by running online RL to optimize the learned reward model. In this work, leveraging the idea of reset, we propose a new RLHF algorithm with provable guarantees.
Motivated by the fact that the offline preference dataset provides informative states (i.e., data that is preferred by the labelers), our new algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing offline preference dataset into the online policy training procedure via dataset reset: it directly resets the policy optimizer to the states in the offline dataset, instead of always starting from the initial state distribution. In theory, we show that DR-PO learns to perform at least as well as any policy that is covered by the offline dataset under general function approximation with finite sample complexity. In experiments, we demonstrate that on both the TL;DR summarization and the Anthropic Helpful Harmful (HH) dataset, the generation from DR-PO is better than that from Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), under the metric of GPT4 win-rate. Code for this work can be found at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2404.08495v3-abstract-full').style.display = 'none'; document.getElementById('2404.08495v3-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 16 April, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 12 April, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> April 2024.
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">28 pages, 6 tables, 3 Figures, 3 Algorithms</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2404.03673</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> </div> </div> <p class="title is-5 mathjax"> RL for Consistency Models: Faster Reward Guided Text-to-Image Generation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Oertell%2C+O">Owen Oertell</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J+D">Jonathan D. Chang</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+Y">Yiyi Zhang</a>, <a href="/search/cs?searchtype=author&query=Brantley%2C+K">Kianté Brantley</a>, <a href="/search/cs?searchtype=author&query=Sun%2C+W">Wen Sun</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2404.03673v2-abstract-short" style="display: inline;"> Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction following capabilities. However, the resulting generative policies inherit the same iterative sampling process of diffusion models that causes slow generation.
To overcome this limitation, consistency models proposed learning… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2404.03673v2-abstract-full').style.display = 'inline'; document.getElementById('2404.03673v2-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2404.03673v2-abstract-full" style="display: none;"> Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction following capabilities. However, the resulting generative policies inherit the same iterative sampling process of diffusion models that causes slow generation. To overcome this limitation, consistency models proposed learning a new class of generative models that directly map noise to data, resulting in a model that can generate an image in as few as one sampling iteration. In this work, to optimize text-to-image generative models for task specific rewards and enable fast training and inference, we propose a framework for fine-tuning consistency models via RL. Our framework, called Reinforcement Learning for Consistency Model (RLCM), frames the iterative inference process of a consistency model as an RL procedure. Compared to RL fine-tuned diffusion models, RLCM trains significantly faster, improves the quality of the generation measured under the reward objectives, and speeds up the inference procedure by generating high quality images with as few as two inference steps. Experimentally, we show that RLCM can adapt text-to-image consistency models to objectives that are challenging to express with prompting, such as image compressibility, and those derived from human feedback, such as aesthetic quality.
Our code is available at <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2404.03673v2-abstract-full').style.display = 'none'; document.getElementById('2404.03673v2-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 22 June, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 25 March, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> April 2024. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">18 pages, 9 figures, 1 table</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2403.19632</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Ye%2C+C">Chongjie Ye</a>, <a href="/search/cs?searchtype=author&query=Nie%2C+Y">Yinyu Nie</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Jiahao Chang</a>, <a href="/search/cs?searchtype=author&query=Chen%2C+Y">Yuantao Chen</a>, <a href="/search/cs?searchtype=author&query=Zhi%2C+Y">Yihao Zhi</a>, <a href="/search/cs?searchtype=author&query=Han%2C+X">Xiaoguang Han</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2403.19632v1-abstract-short" style="display: inline;"> We present GauStudio, a novel modular framework for 
modeling 3D Gaussian Splatting (3DGS) to provide standardized, plug-and-play components for users to easily customize and implement a 3DGS pipeline. Supported by our framework, we propose a hybrid Gaussian representation with foreground and skyball background models. Experiments demonstrate this representation reduces artifacts in unbounded outdo… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.19632v1-abstract-full').style.display = 'inline'; document.getElementById('2403.19632v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2403.19632v1-abstract-full" style="display: none;"> We present GauStudio, a novel modular framework for modeling 3D Gaussian Splatting (3DGS) to provide standardized, plug-and-play components for users to easily customize and implement a 3DGS pipeline. Supported by our framework, we propose a hybrid Gaussian representation with foreground and skyball background models. Experiments demonstrate this representation reduces artifacts in unbounded outdoor scenes and improves novel view synthesis. Finally, we propose Gaussian Splatting Surface Reconstruction (GauS), a novel render-then-fuse approach for high-fidelity mesh reconstruction from 3DGS inputs without fine-tuning. Overall, our GauStudio framework, hybrid representation, and GauS approach enhance 3DGS modeling and rendering capabilities, enabling higher-quality novel view synthesis and surface reconstruction. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.19632v1-abstract-full').style.display = 'none'; document.getElementById('2403.19632v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 28 March, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> March 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Code:</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2403.17428</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Aligning Large Language Models for Enhancing Psychiatric Interviews through Symptom Delineation and Summarization </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=So%2C+J">Jae-hee So</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">Joonhwan Chang</a>, <a href="/search/cs?searchtype=author&query=Kim%2C+E">Eunji Kim</a>, <a href="/search/cs?searchtype=author&query=Na%2C+J">Junho Na</a>, <a href="/search/cs?searchtype=author&query=Choi%2C+J">JiYeon Choi</a>, <a href="/search/cs?searchtype=author&query=Sohn%2C+J">Jy-yong Sohn</a>, <a href="/search/cs?searchtype=author&query=Kim%2C+B">Byung-Hoon Kim</a>, <a href="/search/cs?searchtype=author&query=Chu%2C+S+H">Sang Hui Chu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2403.17428v1-abstract-short" style="display: inline;"> Recent advancements in Large Language Models (LLMs) have accelerated their usage in various domains. Given the fact that psychiatric interviews are goal-oriented and structured dialogues between the professional interviewer and the interviewee, it is one of the most underexplored areas where LLMs can contribute substantial value. 
Here, we explore the use of LLMs for enhancing psychiatric interview… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.17428v1-abstract-full').style.display = 'inline'; document.getElementById('2403.17428v1-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2403.17428v1-abstract-full" style="display: none;"> Recent advancements in Large Language Models (LLMs) have accelerated their usage in various domains. Given the fact that psychiatric interviews are goal-oriented and structured dialogues between the professional interviewer and the interviewee, it is one of the most underexplored areas where LLMs can contribute substantial value. Here, we explore the use of LLMs for enhancing psychiatric interviews by analyzing counseling data from North Korean defectors with traumatic events and mental health issues. Specifically, we investigate whether LLMs can (1) delineate the part of the conversation that suggests psychiatric symptoms and name the symptoms, and (2) summarize stressors and symptoms, based on the interview dialogue transcript. Here, the transcript data was labeled by mental health experts for training and evaluation of LLMs. Our experimental results show that appropriately prompted LLMs can achieve high performance on both the symptom delineation task and the summarization task. This research contributes to the nascent field of applying LLMs to psychiatric interviews and demonstrates their potential effectiveness in aiding mental health practitioners. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.17428v1-abstract-full').style.display = 'none'; document.getElementById('2403.17428v1-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 26 March, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> March 2024. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2403.16428</a> <span> [<a href="">pdf</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Fan%2C+Z">Zicong Fan</a>, <a href="/search/cs?searchtype=author&query=Ohkawa%2C+T">Takehiko Ohkawa</a>, <a href="/search/cs?searchtype=author&query=Yang%2C+L">Linlin Yang</a>, <a href="/search/cs?searchtype=author&query=Lin%2C+N">Nie Lin</a>, <a href="/search/cs?searchtype=author&query=Zhou%2C+Z">Zhishan Zhou</a>, <a href="/search/cs?searchtype=author&query=Zhou%2C+S">Shihao Zhou</a>, <a href="/search/cs?searchtype=author&query=Liang%2C+J">Jiajun Liang</a>, <a href="/search/cs?searchtype=author&query=Gao%2C+Z">Zhong Gao</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+X">Xuanyang Zhang</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+X">Xue Zhang</a>, <a href="/search/cs?searchtype=author&query=Li%2C+F">Fei Li</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+Z">Zheng Liu</a>, <a href="/search/cs?searchtype=author&query=Lu%2C+F">Feng Lu</a>, <a href="/search/cs?searchtype=author&query=Zeid%2C+K+A">Karim 
Abou Zeid</a>, <a href="/search/cs?searchtype=author&query=Leibe%2C+B">Bastian Leibe</a>, <a href="/search/cs?searchtype=author&query=On%2C+J">Jeongwan On</a>, <a href="/search/cs?searchtype=author&query=Baek%2C+S">Seungryul Baek</a>, <a href="/search/cs?searchtype=author&query=Prakash%2C+A">Aditya Prakash</a>, <a href="/search/cs?searchtype=author&query=Gupta%2C+S">Saurabh Gupta</a>, <a href="/search/cs?searchtype=author&query=He%2C+K">Kun He</a>, <a href="/search/cs?searchtype=author&query=Sato%2C+Y">Yoichi Sato</a>, <a href="/search/cs?searchtype=author&query=Hilliges%2C+O">Otmar Hilliges</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+H+J">Hyung Jin Chang</a>, <a href="/search/cs?searchtype=author&query=Yao%2C+A">Angela Yao</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2403.16428v2-abstract-short" style="display: inline;"> We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the h… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.16428v2-abstract-full').style.display = 'inline'; document.getElementById('2403.16428v2-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2403.16428v2-abstract-full" style="display: none;"> We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. 
Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the head movement. To this end, we designed the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits. Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks. Our analysis demonstrates the effectiveness of addressing distortion specific to egocentric cameras, adopting high-capacity transformers to learn complex hand-object interactions, and fusing predictions from different views. Our study further reveals challenging scenarios intractable with state-of-the-art methods, such as fast hand motion, object reconstruction from narrow egocentric views, and close contact between two hands and objects. Our efforts will enrich the community's knowledge foundation and facilitate future hand studies on egocentric hand-object interactions. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.16428v2-abstract-full').style.display = 'none'; document.getElementById('2403.16428v2-abstract-short').style.display = 'inline';">△ Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 5 August, 2024; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 25 March, 2024; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> March 2024. 
</p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">Accepted to ECCV 2024</span> </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2403.15943</a> <span> [<a href="">pdf</a>, <a href="">ps</a>, <a href="">other</a>] </span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computer Vision and Pattern Recognition">cs.CV</span> </div> </div> <p class="title is-5 mathjax"> Advanced Feature Manipulation for Enhanced Change Detection Leveraging Natural Language Models </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&query=Li%2C+Z">Zhenglin Li</a>, <a href="/search/cs?searchtype=author&query=Huang%2C+Y">Yangchen Huang</a>, <a href="/search/cs?searchtype=author&query=Zhu%2C+M">Mengran Zhu</a>, <a href="/search/cs?searchtype=author&query=Zhang%2C+J">Jingyu Zhang</a>, <a href="/search/cs?searchtype=author&query=Chang%2C+J">JingHao Chang</a>, <a href="/search/cs?searchtype=author&query=Liu%2C+H">Houze Liu</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2403.15943v2-abstract-short" style="display: inline;"> Change detection is a fundamental task in computer vision that processes a bi-temporal image pair to differentiate between semantically altered and unaltered regions. Large language models (LLMs) have been utilized in various domains for their exceptional feature extraction capabilities and have shown promise in numerous downstream applications. 
In this study, we harness the power of a pre-trained… <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2403.15943v2-abstract-full').style.display = 'inline'; document.getElementById('2403.15943v2-abstract-short').style.display = 'none';">▽ More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2403.15943v2-abstract-full" style="display: none;"> Change detection is a fundamental task in computer vision that processes a bi-temporal image pair to differentiate between semantically altered and unaltered regions. Large language models (LLMs) have been utilized in various domains for their exceptional feature extraction capabilities and have shown promise in numerous downstream applications. In this study, we harness the power of a pre-trained LLM, extracting feature maps from extensive datasets, and employ an auxiliary network to detect changes. Unlike existing LLM-based change detection methods that solely focus on deriving high-quality feature maps, our approach emphasizes the manipulation of these feature maps to enhance semantic relevance. 