<p class="list-title is-inline-block"><a href="">arXiv:2306.12317</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Iterated Piecewise Affine (IPA) Approximation for Language Modeling </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shamsi%2C+D">Davood Shamsi</a>, <a href="/search/cs?searchtype=author&amp;query=Hua%2C+W">Wen-yu Hua</a>, <a href="/search/cs?searchtype=author&amp;query=Williams%2C+B">Brian Williams</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2306.12317v3-abstract-short" style="display: inline;"> In this work, we demonstrate the application of a first-order Taylor expansion to approximate a generic function $F: R^{n \times m} \to R^{n \times m}$ and utilize it in language modeling. To enhance the basic Taylor expansion, we introduce iteration and piecewise modeling, leading us to name the algorithm the Iterative Piecewise Affine (IPA) approximation. The final algorithm exhibits interesting&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2306.12317v3-abstract-full').style.display = 'inline'; document.getElementById('2306.12317v3-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2306.12317v3-abstract-full" style="display: none;"> In this work, we demonstrate the application of a first-order Taylor expansion to approximate a generic function $F: R^{n \times m} \to R^{n \times m}$ and utilize it in language modeling. 
To enhance the basic Taylor expansion, we introduce iteration and piecewise modeling, leading us to name the algorithm the Iterative Piecewise Affine (IPA) approximation. The final algorithm exhibits interesting resemblances to the Transformers decoder architecture. By comparing parameter arrangements in IPA and Transformers, we observe a strikingly similar performance, with IPA outperforming Transformers by 1.5\% in the next token prediction task with cross-entropy loss for smaller sequence lengths. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2306.12317v3-abstract-full').style.display = 'none'; document.getElementById('2306.12317v3-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 1 November, 2023; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 21 June, 2023; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> June 2023. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2305.06404</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> </div> </div> <p class="title is-5 mathjax"> LACoS-BLOOM: Low-rank Adaptation with Contrastive objective on 8 bits Siamese-BLOOM </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Hua%2C+W">Wen-Yu Hua</a>, <a href="/search/cs?searchtype=author&amp;query=Williams%2C+B">Brian Williams</a>, <a href="/search/cs?searchtype=author&amp;query=Shamsi%2C+D">Davood Shamsi</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2305.06404v1-abstract-short" style="display: inline;"> Text embeddings are useful features for several NLP applications, such as sentence similarity, text clustering, and semantic search. In this paper, we present a Low-rank Adaptation with a Contrastive objective on top of 8-bit Siamese-BLOOM, a multilingual large language model optimized to produce semantically meaningful word embeddings. The innovation is threefold. 
First, we cast BLOOM weights to&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2305.06404v1-abstract-full').style.display = 'inline'; document.getElementById('2305.06404v1-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2305.06404v1-abstract-full" style="display: none;"> Text embeddings are useful features for several NLP applications, such as sentence similarity, text clustering, and semantic search. In this paper, we present a Low-rank Adaptation with a Contrastive objective on top of 8-bit Siamese-BLOOM, a multilingual large language model optimized to produce semantically meaningful word embeddings. The innovation is threefold. First, we cast BLOOM weights to 8-bit values. Second, we fine-tune BLOOM with a scalable adapter (LoRA) and an 8-bit Adam optimizer for sentence similarity classification. Third, we apply a Siamese architecture to the BLOOM model with a contrastive objective to ease the scarcity of multilingual labeled data. The experimental results show that the quality of the embeddings learned by LACoS-BLOOM is proportional to the number of model parameters and the amount of unlabeled training data. With the parameter-efficient fine-tuning design, we are able to run BLOOM with 7.1 billion parameters end-to-end on a single GPU machine with 32GB of memory. Compared to the previous solution, Sentence-BERT, we achieve significant improvement on both English and multilingual STS tasks. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2305.06404v1-abstract-full').style.display = 'none'; document.getElementById('2305.06404v1-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 10 May, 2023; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> May 2023. 
</p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:2203.01294</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Machine Learning">cs.LG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Artificial Intelligence">cs.AI</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> <div class="is-inline-block" style="margin-left: 0.5rem"> <div class="tags has-addons"> <span class="tag is-dark is-size-7">doi</span> <span class="tag is-light is-size-7"><a class="" href="">10.1007/978-3-031-11644-5 <i class="fa fa-external-link" aria-hidden="true"></i></a></span> </div> </div> </div> <p class="title is-5 mathjax"> Providing Insights for Open-Response Surveys via End-to-End Context-Aware Clustering </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Esmaeilzadeh%2C+S">Soheil Esmaeilzadeh</a>, <a href="/search/cs?searchtype=author&amp;query=Williams%2C+B">Brian Williams</a>, <a href="/search/cs?searchtype=author&amp;query=Shamsi%2C+D">Davood Shamsi</a>, <a href="/search/cs?searchtype=author&amp;query=Vikingstad%2C+O">Onar Vikingstad</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="2203.01294v2-abstract-short" style="display: inline;"> Teachers often conduct surveys in order to collect data from a predefined group of students to gain insights into topics of interest. When analyzing surveys with open-ended textual responses, it is extremely time-consuming, labor-intensive, and difficult to manually process all the responses into an insightful and comprehensive report. 
In the analysis step, traditionally, the teacher has to read e&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2203.01294v2-abstract-full').style.display = 'inline'; document.getElementById('2203.01294v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="2203.01294v2-abstract-full" style="display: none;"> Teachers often conduct surveys in order to collect data from a predefined group of students to gain insights into topics of interest. When analyzing surveys with open-ended textual responses, it is extremely time-consuming, labor-intensive, and difficult to manually process all the responses into an insightful and comprehensive report. In the analysis step, traditionally, the teacher has to read each of the responses and decide on how to group them in order to extract insightful information. Even though it is possible to group the responses only using certain keywords, such an approach would be limited since it not only fails to account for embedded contexts but also cannot detect polysemous words or phrases and semantics that are not expressible in single words. In this work, we present a novel end-to-end context-aware framework that extracts, aggregates, and abbreviates embedded semantic patterns in open-response survey data. Our framework relies on a pre-trained natural language model in order to encode the textual data into semantic vectors. The encoded vectors then get clustered either into an optimally tuned number of groups or into a set of groups with pre-specified titles. In the former case, the clusters are then further analyzed to extract a representative set of keywords or summary sentences that serve as the labels of the clusters. In our framework, for the designated clusters, we finally provide context-aware wordclouds that demonstrate the semantically prominent keywords within each group. 
Honoring user privacy, we have successfully built an on-device implementation of our framework, suitable for real-time analysis on mobile devices, and have tested it on a synthetic dataset. Our framework reduces costs at scale by automating the process of extracting the most insightful pieces of information from survey data. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('2203.01294v2-abstract-full').style.display = 'none'; document.getElementById('2203.01294v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 8 October, 2022; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 2 March, 2022; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> March 2022. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Journal ref:</span> AIED 2022, Springer vol 13355, pp 526-532 </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:1802.05373</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Computation and Language">cs.CL</span> </div> </div> <p class="title is-5 mathjax"> Improving Retrieval Modeling Using Cross Convolution Networks And Multi Frequency Word Embedding </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=An%2C+G">Guozhen An</a>, <a href="/search/cs?searchtype=author&amp;query=Shafiee%2C+M">Mehrnoosh Shafiee</a>, <a href="/search/cs?searchtype=author&amp;query=Shamsi%2C+D">Davood Shamsi</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" 
id="1802.05373v2-abstract-short" style="display: inline;"> To build a satisfying chatbot that has the ability of managing a goal-oriented multi-turn dialogue, accurate modeling of human conversation is crucial. In this paper we concentrate on the task of response selection for multi-turn human-computer conversation with a given context. Previous approaches show weakness in capturing information of rare keywords that appear in either or both context and co&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('1802.05373v2-abstract-full').style.display = 'inline'; document.getElementById('1802.05373v2-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="1802.05373v2-abstract-full" style="display: none;"> To build a satisfying chatbot that has the ability of managing a goal-oriented multi-turn dialogue, accurate modeling of human conversation is crucial. In this paper we concentrate on the task of response selection for multi-turn human-computer conversation with a given context. Previous approaches show weakness in capturing information of rare keywords that appear in either or both context and correct response, and struggle with long input sequences. We propose Cross Convolution Network (CCN) and Multi Frequency word embedding to address both problems. We train several models using the Ubuntu Dialogue dataset which is the largest freely available multi-turn based dialogue corpus. We further build an ensemble model by averaging predictions of multiple models. We achieve a new state-of-the-art on this dataset with considerable improvements compared to previous best results. 
<a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('1802.05373v2-abstract-full').style.display = 'none'; document.getElementById('1802.05373v2-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 16 February, 2018; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 14 February, 2018; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> February 2018. </p> </li> <li class="arxiv-result"> <div class="is-marginless"> <p class="list-title is-inline-block"><a href="">arXiv:1010.2262</a> <span>&nbsp;[<a href="">pdf</a>, <a href="">ps</a>, <a href="">other</a>]&nbsp;</span> </p> <div class="tags is-inline-block"> <span class="tag is-small is-link tooltip is-tooltip-top" data-tooltip="Metric Geometry">math.MG</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Data Structures and Algorithms">cs.DS</span> <span class="tag is-small is-grey tooltip is-tooltip-top" data-tooltip="Optimization and Control">math.OC</span> </div> </div> <p class="title is-5 mathjax"> On Sensor Network Localization Using SDP Relaxation </p> <p class="authors"> <span class="search-hit">Authors:</span> <a href="/search/cs?searchtype=author&amp;query=Shamsi%2C+D">Davood Shamsi</a>, <a href="/search/cs?searchtype=author&amp;query=Taheri%2C+N">Nicole Taheri</a>, <a href="/search/cs?searchtype=author&amp;query=Zhu%2C+Z">Zhisu Zhu</a>, <a href="/search/cs?searchtype=author&amp;query=Ye%2C+Y">Yinyu Ye</a> </p> <p class="abstract mathjax"> <span class="has-text-black-bis has-text-weight-semibold">Abstract</span>: <span class="abstract-short has-text-grey-dark mathjax" id="1010.2262v4-abstract-short" style="display: inline;"> A Semidefinite Programming (SDP) relaxation is an effective computational method to solve a Sensor Network Localization problem, which attempts to 
determine the locations of a group of sensors given the distances between some of them [11]. In this paper, we analyze and determine new sufficient conditions and formulations that guarantee that the SDP relaxation is exact, i.e., gives the correct solu&hellip; <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('1010.2262v4-abstract-full').style.display = 'inline'; document.getElementById('1010.2262v4-abstract-short').style.display = 'none';">&#9661; More</a> </span> <span class="abstract-full has-text-grey-dark mathjax" id="1010.2262v4-abstract-full" style="display: none;"> A Semidefinite Programming (SDP) relaxation is an effective computational method to solve a Sensor Network Localization problem, which attempts to determine the locations of a group of sensors given the distances between some of them [11]. In this paper, we analyze and determine new sufficient conditions and formulations that guarantee that the SDP relaxation is exact, i.e., gives the correct solution. These conditions can be useful for designing sensor networks and managing connectivities in practice. Our main contribution is twofold: We present the first non-asymptotic bound on the connectivity or radio range requirement of the sensors in order to ensure the network is uniquely localizable. Determining this range is a key component in the design of sensor networks, and we provide a result that leads to a correct localization of each sensor, for any number of sensors. Second, we introduce a new class of graphs that can always be correctly localized by an SDP relaxation. Specifically, we show that adding a simple objective function to the SDP relaxation model will ensure that the solution is correct when applied to a triangulation graph. Since triangulation graphs are very sparse, this is informationally efficient, requiring an almost minimal amount of distance information. 
We also analyze a number of objective functions for the SDP relaxation to solve the localization problem for a general graph. <a class="is-size-7" style="white-space: nowrap;" onclick="document.getElementById('1010.2262v4-abstract-full').style.display = 'none'; document.getElementById('1010.2262v4-abstract-short').style.display = 'inline';">&#9651; Less</a> </span> </p> <p class="is-size-7"><span class="has-text-black-bis has-text-weight-semibold">Submitted</span> 15 November, 2012; <span class="has-text-black-bis has-text-weight-semibold">v1</span> submitted 11 October, 2010; <span class="has-text-black-bis has-text-weight-semibold">originally announced</span> October 2010. </p> <p class="comments is-size-7"> <span class="has-text-black-bis has-text-weight-semibold">Comments:</span> <span class="has-text-grey-dark mathjax">20 pages, 4 figures, submitted to the Fields Institute Communications Series on Discrete Geometry and Optimization</span> </p> </li> </ol> <div 