A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval
<!DOCTYPE html> <html lang="en"> <head> <meta content="text/html; charset=utf-8" http-equiv="content-type"/> <title>A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval</title> <!--Generated on Sun Mar 2 19:51:08 2025 by LaTeXML (version 0.8.8) http://dlmf.nist.gov/LaTeXML/.--> <meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/> <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/ar5iv-fonts.0.7.9.min.css" rel="stylesheet" type="text/css"/> <link href="/static/browse/0.3.4/css/latexml_styles.css" rel="stylesheet" type="text/css"/> <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.3.3/html2canvas.min.js"></script> <script src="/static/browse/0.3.4/js/addons_new.js"></script> <script src="/static/browse/0.3.4/js/feedbackOverlay.js"></script> <base href="/html/2503.01003v1/"/></head> <body> <nav class="ltx_page_navbar"> <nav class="ltx_TOC"> <ol class="ltx_toclist"> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S1" title="In A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">1 </span>Introduction</span></a></li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S2" title="In A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">2 </span>Related Works</span></a></li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S3" title="In A Semantic 
Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3 </span>Methods</span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S3.SS1" title="In 3 Methods ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.1 </span>Document Indexing</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S3.SS2" title="In 3 Methods ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.2 </span>Semantic Search Pipeline</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S3.SS3" title="In 3 Methods ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">3.3 </span>Post Query Causal Filtering</span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"> <a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S4" title="In A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4 </span>Experiments</span></a> <ol class="ltx_toclist ltx_toclist_section"> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S4.SS1" title="In 4 Experiments ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.1 </span>Data</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" 
href="https://arxiv.org/html/2503.01003v1#S4.SS2" title="In 4 Experiments ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.2 </span>Setup</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S4.SS3" title="In 4 Experiments ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.3 </span>Baselines</span></a></li> <li class="ltx_tocentry ltx_tocentry_subsection"><a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S4.SS4" title="In 4 Experiments ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">4.4 </span>Results</span></a></li> </ol> </li> <li class="ltx_tocentry ltx_tocentry_section"><a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S5" title="In A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_title"><span class="ltx_tag ltx_tag_ref">5 </span>Conclusion</span></a></li> </ol></nav> </nav> <div class="ltx_page_main"> <div class="ltx_page_content"> <article class="ltx_document ltx_authors_1line"> <div class="ltx_para" id="p1"> <p class="ltx_p" id="p1.2">Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p> </div> <div class="ltx_para" id="p2"> <p class="ltx_p" id="p2.2">Forum for Information Retrieval Evaluation, December 13-17, 2021, India</p> </div> <div class="ltx_para" id="p3"> <p class="ltx_p" id="p3.1">ORCID: 0000-0003-0279-234X; email: d.dalal1@nuigalway.ie</p> </div> <div class="ltx_para" id="p4"> <p class="ltx_p" id="p4.1">ORCID: 0000-0003-0376-2206; email: sharmi.devgupta@cs.ucc.ie</p> </div> <div class="ltx_para" id="p5"> <p class="ltx_p" id="p5.1">Email: b.binaei1@nuigalway.ie</p> </div> <h1 class="ltx_title ltx_title_document">A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval</h1> <div class="ltx_authors"> <span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Dhairya Dalal </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_address">SFI Centre for Research and Training in Artificial Intelligence, Data Science Institute, National University of Ireland Galway </span></span></span> <span class="ltx_author_before"> </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Sharmi Dev Gupta </span><span class="ltx_author_notes"> <span class="ltx_contact ltx_role_address">SFI Centre for Research and Training in Artificial Intelligence, School of Computer Science and Information Technology, University College Cork </span></span></span> <span class="ltx_author_before"> </span><span class="ltx_creator ltx_role_author"> <span class="ltx_personname">Bentolhoda Binaei </span></span> </div> <div class="ltx_dates">(2021)</div> <div class="ltx_abstract"> <h6 class="ltx_title ltx_title_abstract">Abstract</h6> <p class="ltx_p" id="id1.id1">We present an unsupervised semantic search pipeline for the Causality-driven Adhoc Information Retrieval (CAIR-2021) shared task.
The CAIR shared task extends traditional information retrieval to support the retrieval of documents containing the likely causes of a query event. A successful system must be able to distinguish between merely topical documents and documents describing events that are causally related to the query event. Our approach aggregates results from multiple query strategies over a semantic index and a lexical index. The proposed approach led the CAIR-2021 leaderboard and outperformed both traditional IR and purely semantic, embedding-based approaches.</p> </div> <div class="ltx_classification"> <h6 class="ltx_title ltx_title_classification">keywords: </h6> semantic search, causal information retrieval, causality detection, causal search </div> <section class="ltx_section" id="S1"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">1 </span>Introduction</h2> <div class="ltx_para" id="S1.p1"> <p class="ltx_p" id="S1.p1.1">The Causality-driven Adhoc Information Retrieval (CAIR) shared task consists of retrieving documents that describe the likely causes of a query event <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#bib.bib1" title="">1</a>]</cite>. The search system must be able to differentiate between documents that are merely topical and documents that are causally relevant. Traditional information retrieval (IR) systems usually rely on keyword matching and corpus-level n-gram statistics to score how topically relevant each document is to a given query. In contrast, given a query event (e.g., Shashi Tharoor resigned), the goal of a causal search system is to identify documents that contain information about the events that led to the query event.
For example, causally relevant documents for the query in Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S1.F1" title="Figure 1 ‣ 1 Introduction ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_tag">1</span></a> would refer to the IPL controversy and illicit behavior by Shashi Tharoor. General documents that mention Shashi Tharoor, while topically relevant, are not causally relevant unless they contain information about his misbehavior.</p> </div> <div class="ltx_para" id="S1.p2"> <p class="ltx_p" id="S1.p2.1">In this paper, we describe our solution for the CAIR shared task. We designed an unsupervised semantic search pipeline that aggregates results across several query strategies and indices. The pipeline leverages both a lexical index and a semantic index to retrieve causally relevant documents. Our approach outperformed both standard IR baselines and semantic baselines, and it was the top method on the CAIR-2021 leaderboard.</p> </div> <figure class="ltx_figure" id="S1.F1"><img alt="Example CAIR topic with a query title and a narrative" class="ltx_graphics ltx_centering ltx_img_landscape" height="183" id="S1.F1.g1" src="extracted/6246156/example.png" width="393"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 1: </span>Example CAIR topic. Each topic consists of a query (title), which describes an event, and a narrative, which describes the documents that are causally relevant to the event.</figcaption> </figure> </section> <section class="ltx_section" id="S2"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">2 </span>Related Works</h2> <div class="ltx_para" id="S2.p1"> <p class="ltx_p" id="S2.p1.1"><cite class="ltx_cite ltx_citemacro_citet">Datta et al. 
[<a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#bib.bib2" title="">2</a>]</cite> provide a brief survey of the literature on causality in natural language processing and explore the task of causal information retrieval in the context of news articles. They also introduce a recursive causal retrieval model that identifies the causal chain of events leading to a news event. <cite class="ltx_cite ltx_citemacro_citet">Datta et al. [<a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#bib.bib3" title="">3</a>]</cite> propose an unsupervised pseudo-relevance feedback approach that estimates the distribution of infrequent terms that are potentially relevant to the causes of the query event. Recent advances in IR have focused on neural re-ranking and on leveraging latent embeddings to improve the overall recall and semantic relevance of returned results. For example, <cite class="ltx_cite ltx_citemacro_citet">Pang et al. [<a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#bib.bib4" title="">4</a>]</cite> propose SetRank, a permutation-invariant ranking model that jointly learns the embeddings of retrieved documents using self-attention. Many modern IR systems combine lexical and semantic retrieval. For example, <cite class="ltx_cite ltx_citemacro_citet">Gao et al. [<a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#bib.bib5" title="">5</a>]</cite> present CLEAR, in which a residual-based learning framework trains the neural embedding model to complement the lexical retrieval model. Our approach likewise combines a lexical model with semantic embeddings.</p> </div> </section> <section class="ltx_section" id="S3"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">3 </span>Methods</h2> <div class="ltx_para" id="S3.p1"> <p class="ltx_p" id="S3.p1.1">Our approach focused on developing an unsupervised semantic search pipeline.
Documents were stored in two indices: a semantic index and a lexical index (see Section <a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S3.SS1" title="3.1 Document Indexing ‣ 3 Methods ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_tag">3.1</span></a>). Results from multiple queries across the two indices were then aggregated to return the most relevant documents. We also explored a post-query filtering step that aimed to identify documents containing causal language in the context of the query event. This step did not yield useful results and was not pursued further. In this section, we present our methodology in further detail.</p> </div> <section class="ltx_subsection" id="S3.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">3.1 </span>Document Indexing</h3> <div class="ltx_para" id="S3.SS1.p1"> <p class="ltx_p" id="S3.SS1.p1.4">Two document indices were created for our semantic search pipeline. The first was a <span class="ltx_text ltx_font_bold" id="S3.SS1.p1.4.1">lexical index</span> that treated documents as bags of words and was optimized for Okapi BM25 <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#bib.bib6" title="">6</a>]</cite> retrieval. Before indexing, documents were cleaned and tokenized using standard preprocessing steps: lowercasing, stripping out all non-alphanumeric characters, and lemmatization.
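As an illustration only, the preprocessing and the Okapi BM25 scoring function defined in this subsection can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function names preprocess and bm25_scores, the parameter defaults k1 = 1.5 and b = 0.75, and the Lucene-style IDF variant are assumptions, and lemmatization is omitted for brevity. Document length is the token count of D.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lowercase, replace non-alphanumeric characters with spaces, and split.

    Assumption: lemmatization is skipped here; a real pipeline would apply a
    lemmatizer (e.g. from spaCy or NLTK) to each token at this point.
    """
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return text.split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25.

    k1 and b are assumed defaults, not values reported by the authors.
    """
    if not docs:
        return []
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Average document length (avgDL); fall back to 1.0 if every doc is empty.
    avg_dl = sum(len(t) for t in tokenized) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = preprocess(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)  # freq(q_i, D)
        # Length normalisation term k1 * (1 - b + b * |D| / avgDL).
        norm = k1 * (1 - b + b * len(toks) / avg_dl)
        score = 0.0
        for q in query_terms:
            if q not in tf:
                continue
            # Lucene-style IDF; the +1 keeps it non-negative (an assumption).
            idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
            score += idf * tf[q] * (k1 + 1) / (tf[q] + norm)
        scores.append(score)
    return scores


docs = [
    "Shashi Tharoor resigned after the IPL controversy.",
    "Shashi Tharoor spoke at a conference.",
    "The weather in Kerala was pleasant.",
]
print(bm25_scores("Tharoor resigned", docs))
```

On this toy collection, the first document (which contains both query terms) scores highest, the second (only "tharoor") scores lower, and the third scores zero. The IDF variant adds 1 inside the logarithm so that very common terms never contribute negative scores; the authors' BM25 implementation may use a different variant.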
Thus each document <math alttext="D" class="ltx_Math" display="inline" id="S3.SS1.p1.1.m1.1"><semantics id="S3.SS1.p1.1.m1.1a"><mi id="S3.SS1.p1.1.m1.1.1" xref="S3.SS1.p1.1.m1.1.1.cmml">D</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.1.m1.1b"><ci id="S3.SS1.p1.1.m1.1.1.cmml" xref="S3.SS1.p1.1.m1.1.1">𝐷</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.1.m1.1c">D</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.1.m1.1d">italic_D</annotation></semantics></math> was broken into lemmatized unigram tokens <math alttext="t_{1}...t_{n}" class="ltx_Math" display="inline" id="S3.SS1.p1.2.m2.1"><semantics id="S3.SS1.p1.2.m2.1a"><mrow id="S3.SS1.p1.2.m2.1.1" xref="S3.SS1.p1.2.m2.1.1.cmml"><msub id="S3.SS1.p1.2.m2.1.1.2" xref="S3.SS1.p1.2.m2.1.1.2.cmml"><mi id="S3.SS1.p1.2.m2.1.1.2.2" xref="S3.SS1.p1.2.m2.1.1.2.2.cmml">t</mi><mn id="S3.SS1.p1.2.m2.1.1.2.3" xref="S3.SS1.p1.2.m2.1.1.2.3.cmml">1</mn></msub><mo id="S3.SS1.p1.2.m2.1.1.1" xref="S3.SS1.p1.2.m2.1.1.1.cmml"></mo><mi id="S3.SS1.p1.2.m2.1.1.3" mathvariant="normal" xref="S3.SS1.p1.2.m2.1.1.3.cmml">…</mi><mo id="S3.SS1.p1.2.m2.1.1.1a" xref="S3.SS1.p1.2.m2.1.1.1.cmml"></mo><msub id="S3.SS1.p1.2.m2.1.1.4" xref="S3.SS1.p1.2.m2.1.1.4.cmml"><mi id="S3.SS1.p1.2.m2.1.1.4.2" xref="S3.SS1.p1.2.m2.1.1.4.2.cmml">t</mi><mi id="S3.SS1.p1.2.m2.1.1.4.3" xref="S3.SS1.p1.2.m2.1.1.4.3.cmml">n</mi></msub></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.2.m2.1b"><apply id="S3.SS1.p1.2.m2.1.1.cmml" xref="S3.SS1.p1.2.m2.1.1"><times id="S3.SS1.p1.2.m2.1.1.1.cmml" xref="S3.SS1.p1.2.m2.1.1.1"></times><apply id="S3.SS1.p1.2.m2.1.1.2.cmml" xref="S3.SS1.p1.2.m2.1.1.2"><csymbol cd="ambiguous" id="S3.SS1.p1.2.m2.1.1.2.1.cmml" xref="S3.SS1.p1.2.m2.1.1.2">subscript</csymbol><ci id="S3.SS1.p1.2.m2.1.1.2.2.cmml" xref="S3.SS1.p1.2.m2.1.1.2.2">𝑡</ci><cn id="S3.SS1.p1.2.m2.1.1.2.3.cmml" type="integer" xref="S3.SS1.p1.2.m2.1.1.2.3">1</cn></apply><ci id="S3.SS1.p1.2.m2.1.1.3.cmml" 
xref="S3.SS1.p1.2.m2.1.1.3">…</ci><apply id="S3.SS1.p1.2.m2.1.1.4.cmml" xref="S3.SS1.p1.2.m2.1.1.4"><csymbol cd="ambiguous" id="S3.SS1.p1.2.m2.1.1.4.1.cmml" xref="S3.SS1.p1.2.m2.1.1.4">subscript</csymbol><ci id="S3.SS1.p1.2.m2.1.1.4.2.cmml" xref="S3.SS1.p1.2.m2.1.1.4.2">𝑡</ci><ci id="S3.SS1.p1.2.m2.1.1.4.3.cmml" xref="S3.SS1.p1.2.m2.1.1.4.3">𝑛</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.2.m2.1c">t_{1}...t_{n}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.2.m2.1d">italic_t start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT … italic_t start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT</annotation></semantics></math>. Next, the tokenized documents were further processed to support the Okapi BM25 ranking algorithm. Given a query <math alttext="Q" class="ltx_Math" display="inline" id="S3.SS1.p1.3.m3.1"><semantics id="S3.SS1.p1.3.m3.1a"><mi id="S3.SS1.p1.3.m3.1.1" xref="S3.SS1.p1.3.m3.1.1.cmml">Q</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.3.m3.1b"><ci id="S3.SS1.p1.3.m3.1.1.cmml" xref="S3.SS1.p1.3.m3.1.1">𝑄</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.3.m3.1c">Q</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.3.m3.1d">italic_Q</annotation></semantics></math> which consists of query tokens <math alttext="q_{1},...q_{n}" class="ltx_Math" display="inline" id="S3.SS1.p1.4.m4.2"><semantics id="S3.SS1.p1.4.m4.2a"><mrow id="S3.SS1.p1.4.m4.2.2.2" xref="S3.SS1.p1.4.m4.2.2.3.cmml"><msub id="S3.SS1.p1.4.m4.1.1.1.1" xref="S3.SS1.p1.4.m4.1.1.1.1.cmml"><mi id="S3.SS1.p1.4.m4.1.1.1.1.2" xref="S3.SS1.p1.4.m4.1.1.1.1.2.cmml">q</mi><mn id="S3.SS1.p1.4.m4.1.1.1.1.3" xref="S3.SS1.p1.4.m4.1.1.1.1.3.cmml">1</mn></msub><mo id="S3.SS1.p1.4.m4.2.2.2.3" xref="S3.SS1.p1.4.m4.2.2.3.cmml">,</mo><mrow id="S3.SS1.p1.4.m4.2.2.2.2" xref="S3.SS1.p1.4.m4.2.2.2.2.cmml"><mi id="S3.SS1.p1.4.m4.2.2.2.2.2" mathvariant="normal" xref="S3.SS1.p1.4.m4.2.2.2.2.2.cmml">…</mi><mo 
id="S3.SS1.p1.4.m4.2.2.2.2.1" xref="S3.SS1.p1.4.m4.2.2.2.2.1.cmml"></mo><msub id="S3.SS1.p1.4.m4.2.2.2.2.3" xref="S3.SS1.p1.4.m4.2.2.2.2.3.cmml"><mi id="S3.SS1.p1.4.m4.2.2.2.2.3.2" xref="S3.SS1.p1.4.m4.2.2.2.2.3.2.cmml">q</mi><mi id="S3.SS1.p1.4.m4.2.2.2.2.3.3" xref="S3.SS1.p1.4.m4.2.2.2.2.3.3.cmml">n</mi></msub></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p1.4.m4.2b"><list id="S3.SS1.p1.4.m4.2.2.3.cmml" xref="S3.SS1.p1.4.m4.2.2.2"><apply id="S3.SS1.p1.4.m4.1.1.1.1.cmml" xref="S3.SS1.p1.4.m4.1.1.1.1"><csymbol cd="ambiguous" id="S3.SS1.p1.4.m4.1.1.1.1.1.cmml" xref="S3.SS1.p1.4.m4.1.1.1.1">subscript</csymbol><ci id="S3.SS1.p1.4.m4.1.1.1.1.2.cmml" xref="S3.SS1.p1.4.m4.1.1.1.1.2">𝑞</ci><cn id="S3.SS1.p1.4.m4.1.1.1.1.3.cmml" type="integer" xref="S3.SS1.p1.4.m4.1.1.1.1.3">1</cn></apply><apply id="S3.SS1.p1.4.m4.2.2.2.2.cmml" xref="S3.SS1.p1.4.m4.2.2.2.2"><times id="S3.SS1.p1.4.m4.2.2.2.2.1.cmml" xref="S3.SS1.p1.4.m4.2.2.2.2.1"></times><ci id="S3.SS1.p1.4.m4.2.2.2.2.2.cmml" xref="S3.SS1.p1.4.m4.2.2.2.2.2">…</ci><apply id="S3.SS1.p1.4.m4.2.2.2.2.3.cmml" xref="S3.SS1.p1.4.m4.2.2.2.2.3"><csymbol cd="ambiguous" id="S3.SS1.p1.4.m4.2.2.2.2.3.1.cmml" xref="S3.SS1.p1.4.m4.2.2.2.2.3">subscript</csymbol><ci id="S3.SS1.p1.4.m4.2.2.2.2.3.2.cmml" xref="S3.SS1.p1.4.m4.2.2.2.2.3.2">𝑞</ci><ci id="S3.SS1.p1.4.m4.2.2.2.2.3.3.cmml" xref="S3.SS1.p1.4.m4.2.2.2.2.3.3">𝑛</ci></apply></apply></list></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p1.4.m4.2c">q_{1},...q_{n}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p1.4.m4.2d">italic_q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … italic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT</annotation></semantics></math>, we score each document D in our index using the following scoring function:</p> </div> <div class="ltx_para" id="S3.SS1.p2"> <p class="ltx_p" id="S3.SS1.p2.1"><math alttext="Score(D,Q)=\sum_{i=1}^{n}IDF(q_{i})\cdot\frac{freq(q_{i},D)\cdot(k_{1}+1)}{% 
freq(q_{i},D)+k_{1}\cdot(1-b+b\cdot\frac{D_{l}ength}{avgDL})}" class="ltx_Math" display="inline" id="S3.SS1.p2.1.m1.9"><semantics id="S3.SS1.p2.1.m1.9a"><mrow id="S3.SS1.p2.1.m1.9.9" xref="S3.SS1.p2.1.m1.9.9.cmml"><mrow id="S3.SS1.p2.1.m1.9.9.3" xref="S3.SS1.p2.1.m1.9.9.3.cmml"><mi id="S3.SS1.p2.1.m1.9.9.3.2" xref="S3.SS1.p2.1.m1.9.9.3.2.cmml">S</mi><mo id="S3.SS1.p2.1.m1.9.9.3.1" xref="S3.SS1.p2.1.m1.9.9.3.1.cmml"></mo><mi id="S3.SS1.p2.1.m1.9.9.3.3" xref="S3.SS1.p2.1.m1.9.9.3.3.cmml">c</mi><mo id="S3.SS1.p2.1.m1.9.9.3.1a" xref="S3.SS1.p2.1.m1.9.9.3.1.cmml"></mo><mi id="S3.SS1.p2.1.m1.9.9.3.4" xref="S3.SS1.p2.1.m1.9.9.3.4.cmml">o</mi><mo id="S3.SS1.p2.1.m1.9.9.3.1b" xref="S3.SS1.p2.1.m1.9.9.3.1.cmml"></mo><mi id="S3.SS1.p2.1.m1.9.9.3.5" xref="S3.SS1.p2.1.m1.9.9.3.5.cmml">r</mi><mo id="S3.SS1.p2.1.m1.9.9.3.1c" xref="S3.SS1.p2.1.m1.9.9.3.1.cmml"></mo><mi id="S3.SS1.p2.1.m1.9.9.3.6" xref="S3.SS1.p2.1.m1.9.9.3.6.cmml">e</mi><mo id="S3.SS1.p2.1.m1.9.9.3.1d" xref="S3.SS1.p2.1.m1.9.9.3.1.cmml"></mo><mrow id="S3.SS1.p2.1.m1.9.9.3.7.2" xref="S3.SS1.p2.1.m1.9.9.3.7.1.cmml"><mo id="S3.SS1.p2.1.m1.9.9.3.7.2.1" stretchy="false" xref="S3.SS1.p2.1.m1.9.9.3.7.1.cmml">(</mo><mi id="S3.SS1.p2.1.m1.7.7" xref="S3.SS1.p2.1.m1.7.7.cmml">D</mi><mo id="S3.SS1.p2.1.m1.9.9.3.7.2.2" xref="S3.SS1.p2.1.m1.9.9.3.7.1.cmml">,</mo><mi id="S3.SS1.p2.1.m1.8.8" xref="S3.SS1.p2.1.m1.8.8.cmml">Q</mi><mo id="S3.SS1.p2.1.m1.9.9.3.7.2.3" stretchy="false" xref="S3.SS1.p2.1.m1.9.9.3.7.1.cmml">)</mo></mrow></mrow><mo id="S3.SS1.p2.1.m1.9.9.2" rspace="0.111em" xref="S3.SS1.p2.1.m1.9.9.2.cmml">=</mo><mrow id="S3.SS1.p2.1.m1.9.9.1" xref="S3.SS1.p2.1.m1.9.9.1.cmml"><msubsup id="S3.SS1.p2.1.m1.9.9.1.2" xref="S3.SS1.p2.1.m1.9.9.1.2.cmml"><mo id="S3.SS1.p2.1.m1.9.9.1.2.2.2" xref="S3.SS1.p2.1.m1.9.9.1.2.2.2.cmml">∑</mo><mrow id="S3.SS1.p2.1.m1.9.9.1.2.2.3" xref="S3.SS1.p2.1.m1.9.9.1.2.2.3.cmml"><mi id="S3.SS1.p2.1.m1.9.9.1.2.2.3.2" xref="S3.SS1.p2.1.m1.9.9.1.2.2.3.2.cmml">i</mi><mo id="S3.SS1.p2.1.m1.9.9.1.2.2.3.1" 
xref="S3.SS1.p2.1.m1.9.9.1.2.2.3.1.cmml">=</mo><mn id="S3.SS1.p2.1.m1.9.9.1.2.2.3.3" xref="S3.SS1.p2.1.m1.9.9.1.2.2.3.3.cmml">1</mn></mrow><mi id="S3.SS1.p2.1.m1.9.9.1.2.3" xref="S3.SS1.p2.1.m1.9.9.1.2.3.cmml">n</mi></msubsup><mrow id="S3.SS1.p2.1.m1.9.9.1.1" xref="S3.SS1.p2.1.m1.9.9.1.1.cmml"><mrow id="S3.SS1.p2.1.m1.9.9.1.1.1" xref="S3.SS1.p2.1.m1.9.9.1.1.1.cmml"><mi id="S3.SS1.p2.1.m1.9.9.1.1.1.3" xref="S3.SS1.p2.1.m1.9.9.1.1.1.3.cmml">I</mi><mo id="S3.SS1.p2.1.m1.9.9.1.1.1.2" xref="S3.SS1.p2.1.m1.9.9.1.1.1.2.cmml"></mo><mi id="S3.SS1.p2.1.m1.9.9.1.1.1.4" xref="S3.SS1.p2.1.m1.9.9.1.1.1.4.cmml">D</mi><mo id="S3.SS1.p2.1.m1.9.9.1.1.1.2a" xref="S3.SS1.p2.1.m1.9.9.1.1.1.2.cmml"></mo><mi id="S3.SS1.p2.1.m1.9.9.1.1.1.5" xref="S3.SS1.p2.1.m1.9.9.1.1.1.5.cmml">F</mi><mo id="S3.SS1.p2.1.m1.9.9.1.1.1.2b" xref="S3.SS1.p2.1.m1.9.9.1.1.1.2.cmml"></mo><mrow id="S3.SS1.p2.1.m1.9.9.1.1.1.1.1" xref="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.1.cmml"><mo id="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.2" stretchy="false" xref="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.1.cmml">(</mo><msub id="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.1" xref="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.1.cmml"><mi id="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.1.2" xref="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.1.2.cmml">q</mi><mi id="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.1.3" xref="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.1.3.cmml">i</mi></msub><mo id="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.3" rspace="0.055em" stretchy="false" xref="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.1.cmml">)</mo></mrow></mrow><mo id="S3.SS1.p2.1.m1.9.9.1.1.2" rspace="0.222em" xref="S3.SS1.p2.1.m1.9.9.1.1.2.cmml">⋅</mo><mfrac id="S3.SS1.p2.1.m1.6.6" xref="S3.SS1.p2.1.m1.6.6.cmml"><mrow id="S3.SS1.p2.1.m1.3.3.3" xref="S3.SS1.p2.1.m1.3.3.3.cmml"><mrow id="S3.SS1.p2.1.m1.2.2.2.2" xref="S3.SS1.p2.1.m1.2.2.2.2.cmml"><mi id="S3.SS1.p2.1.m1.2.2.2.2.3" xref="S3.SS1.p2.1.m1.2.2.2.2.3.cmml">f</mi><mo id="S3.SS1.p2.1.m1.2.2.2.2.2" xref="S3.SS1.p2.1.m1.2.2.2.2.2.cmml"></mo><mi id="S3.SS1.p2.1.m1.2.2.2.2.4" xref="S3.SS1.p2.1.m1.2.2.2.2.4.cmml">r</mi><mo 
id="S3.SS1.p2.1.m1.2.2.2.2.2a" xref="S3.SS1.p2.1.m1.2.2.2.2.2.cmml"></mo><mi id="S3.SS1.p2.1.m1.2.2.2.2.5" xref="S3.SS1.p2.1.m1.2.2.2.2.5.cmml">e</mi><mo id="S3.SS1.p2.1.m1.2.2.2.2.2b" xref="S3.SS1.p2.1.m1.2.2.2.2.2.cmml"></mo><mi id="S3.SS1.p2.1.m1.2.2.2.2.6" xref="S3.SS1.p2.1.m1.2.2.2.2.6.cmml">q</mi><mo id="S3.SS1.p2.1.m1.2.2.2.2.2c" xref="S3.SS1.p2.1.m1.2.2.2.2.2.cmml"></mo><mrow id="S3.SS1.p2.1.m1.2.2.2.2.1.1" xref="S3.SS1.p2.1.m1.2.2.2.2.1.2.cmml"><mo id="S3.SS1.p2.1.m1.2.2.2.2.1.1.2" stretchy="false" xref="S3.SS1.p2.1.m1.2.2.2.2.1.2.cmml">(</mo><msub id="S3.SS1.p2.1.m1.2.2.2.2.1.1.1" xref="S3.SS1.p2.1.m1.2.2.2.2.1.1.1.cmml"><mi id="S3.SS1.p2.1.m1.2.2.2.2.1.1.1.2" xref="S3.SS1.p2.1.m1.2.2.2.2.1.1.1.2.cmml">q</mi><mi id="S3.SS1.p2.1.m1.2.2.2.2.1.1.1.3" xref="S3.SS1.p2.1.m1.2.2.2.2.1.1.1.3.cmml">i</mi></msub><mo id="S3.SS1.p2.1.m1.2.2.2.2.1.1.3" xref="S3.SS1.p2.1.m1.2.2.2.2.1.2.cmml">,</mo><mi id="S3.SS1.p2.1.m1.1.1.1.1" xref="S3.SS1.p2.1.m1.1.1.1.1.cmml">D</mi><mo id="S3.SS1.p2.1.m1.2.2.2.2.1.1.4" rspace="0.055em" stretchy="false" xref="S3.SS1.p2.1.m1.2.2.2.2.1.2.cmml">)</mo></mrow></mrow><mo id="S3.SS1.p2.1.m1.3.3.3.4" rspace="0.222em" xref="S3.SS1.p2.1.m1.3.3.3.4.cmml">⋅</mo><mrow id="S3.SS1.p2.1.m1.3.3.3.3.1" xref="S3.SS1.p2.1.m1.3.3.3.3.1.1.cmml"><mo id="S3.SS1.p2.1.m1.3.3.3.3.1.2" stretchy="false" xref="S3.SS1.p2.1.m1.3.3.3.3.1.1.cmml">(</mo><mrow id="S3.SS1.p2.1.m1.3.3.3.3.1.1" xref="S3.SS1.p2.1.m1.3.3.3.3.1.1.cmml"><msub id="S3.SS1.p2.1.m1.3.3.3.3.1.1.2" xref="S3.SS1.p2.1.m1.3.3.3.3.1.1.2.cmml"><mi id="S3.SS1.p2.1.m1.3.3.3.3.1.1.2.2" xref="S3.SS1.p2.1.m1.3.3.3.3.1.1.2.2.cmml">k</mi><mn id="S3.SS1.p2.1.m1.3.3.3.3.1.1.2.3" xref="S3.SS1.p2.1.m1.3.3.3.3.1.1.2.3.cmml">1</mn></msub><mo id="S3.SS1.p2.1.m1.3.3.3.3.1.1.1" xref="S3.SS1.p2.1.m1.3.3.3.3.1.1.1.cmml">+</mo><mn id="S3.SS1.p2.1.m1.3.3.3.3.1.1.3" xref="S3.SS1.p2.1.m1.3.3.3.3.1.1.3.cmml">1</mn></mrow><mo id="S3.SS1.p2.1.m1.3.3.3.3.1.3" stretchy="false" 
xref="S3.SS1.p2.1.m1.3.3.3.3.1.1.cmml">)</mo></mrow></mrow><mrow id="S3.SS1.p2.1.m1.6.6.6" xref="S3.SS1.p2.1.m1.6.6.6.cmml"><mrow id="S3.SS1.p2.1.m1.5.5.5.2" xref="S3.SS1.p2.1.m1.5.5.5.2.cmml"><mi id="S3.SS1.p2.1.m1.5.5.5.2.3" xref="S3.SS1.p2.1.m1.5.5.5.2.3.cmml">f</mi><mo id="S3.SS1.p2.1.m1.5.5.5.2.2" xref="S3.SS1.p2.1.m1.5.5.5.2.2.cmml"></mo><mi id="S3.SS1.p2.1.m1.5.5.5.2.4" xref="S3.SS1.p2.1.m1.5.5.5.2.4.cmml">r</mi><mo id="S3.SS1.p2.1.m1.5.5.5.2.2a" xref="S3.SS1.p2.1.m1.5.5.5.2.2.cmml"></mo><mi id="S3.SS1.p2.1.m1.5.5.5.2.5" xref="S3.SS1.p2.1.m1.5.5.5.2.5.cmml">e</mi><mo id="S3.SS1.p2.1.m1.5.5.5.2.2b" xref="S3.SS1.p2.1.m1.5.5.5.2.2.cmml"></mo><mi id="S3.SS1.p2.1.m1.5.5.5.2.6" xref="S3.SS1.p2.1.m1.5.5.5.2.6.cmml">q</mi><mo id="S3.SS1.p2.1.m1.5.5.5.2.2c" xref="S3.SS1.p2.1.m1.5.5.5.2.2.cmml"></mo><mrow id="S3.SS1.p2.1.m1.5.5.5.2.1.1" xref="S3.SS1.p2.1.m1.5.5.5.2.1.2.cmml"><mo id="S3.SS1.p2.1.m1.5.5.5.2.1.1.2" stretchy="false" xref="S3.SS1.p2.1.m1.5.5.5.2.1.2.cmml">(</mo><msub id="S3.SS1.p2.1.m1.5.5.5.2.1.1.1" xref="S3.SS1.p2.1.m1.5.5.5.2.1.1.1.cmml"><mi id="S3.SS1.p2.1.m1.5.5.5.2.1.1.1.2" xref="S3.SS1.p2.1.m1.5.5.5.2.1.1.1.2.cmml">q</mi><mi id="S3.SS1.p2.1.m1.5.5.5.2.1.1.1.3" xref="S3.SS1.p2.1.m1.5.5.5.2.1.1.1.3.cmml">i</mi></msub><mo id="S3.SS1.p2.1.m1.5.5.5.2.1.1.3" xref="S3.SS1.p2.1.m1.5.5.5.2.1.2.cmml">,</mo><mi id="S3.SS1.p2.1.m1.4.4.4.1" xref="S3.SS1.p2.1.m1.4.4.4.1.cmml">D</mi><mo id="S3.SS1.p2.1.m1.5.5.5.2.1.1.4" stretchy="false" xref="S3.SS1.p2.1.m1.5.5.5.2.1.2.cmml">)</mo></mrow></mrow><mo id="S3.SS1.p2.1.m1.6.6.6.4" xref="S3.SS1.p2.1.m1.6.6.6.4.cmml">+</mo><mrow id="S3.SS1.p2.1.m1.6.6.6.3" xref="S3.SS1.p2.1.m1.6.6.6.3.cmml"><msub id="S3.SS1.p2.1.m1.6.6.6.3.3" xref="S3.SS1.p2.1.m1.6.6.6.3.3.cmml"><mi id="S3.SS1.p2.1.m1.6.6.6.3.3.2" xref="S3.SS1.p2.1.m1.6.6.6.3.3.2.cmml">k</mi><mn id="S3.SS1.p2.1.m1.6.6.6.3.3.3" xref="S3.SS1.p2.1.m1.6.6.6.3.3.3.cmml">1</mn></msub><mo id="S3.SS1.p2.1.m1.6.6.6.3.2" lspace="0.222em" rspace="0.222em" 
xref="S3.SS1.p2.1.m1.6.6.6.3.2.cmml">⋅</mo><mrow id="S3.SS1.p2.1.m1.6.6.6.3.1.1" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.cmml"><mo id="S3.SS1.p2.1.m1.6.6.6.3.1.1.2" stretchy="false" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.cmml">(</mo><mrow id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.cmml"><mrow id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2.cmml"><mn id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2.2" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2.2.cmml">1</mn><mo id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2.1" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2.1.cmml">−</mo><mi id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2.3" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2.3.cmml">b</mi></mrow><mo id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.1" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.1.cmml">+</mo><mrow id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.cmml"><mi id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.2" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.2.cmml">b</mi><mo id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.1" lspace="0.222em" rspace="0.222em" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.1.cmml">⋅</mo><mfrac id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.cmml"><mrow id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.cmml"><msub id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.2" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.2.cmml"><mi id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.2.2" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.2.2.cmml">D</mi><mi id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.2.3" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.2.3.cmml">l</mi></msub><mo id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.1" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.1.cmml"></mo><mi id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.3" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.3.cmml">e</mi><mo id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.1a" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.1.cmml"></mo><mi id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.4" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.4.cmml">n</mi><mo 
id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.1b" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.1.cmml"></mo><mi id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.5" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.5.cmml">g</mi><mo id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.1c" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.1.cmml"></mo><mi id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.6" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.6.cmml">t</mi><mo id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.1d" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.1.cmml"></mo><mi id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.7" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.7.cmml">h</mi></mrow><mrow id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.cmml"><mi id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.2" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.2.cmml">a</mi><mo id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.1" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.1.cmml"></mo><mi id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.3" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.3.cmml">v</mi><mo id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.1a" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.1.cmml"></mo><mi id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.4" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.4.cmml">g</mi><mo id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.1b" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.1.cmml"></mo><mi id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.5" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.5.cmml">D</mi><mo id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.1c" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.1.cmml"></mo><mi id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.6" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.6.cmml">L</mi></mrow></mfrac></mrow></mrow><mo id="S3.SS1.p2.1.m1.6.6.6.3.1.1.3" stretchy="false" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.cmml">)</mo></mrow></mrow></mrow></mfrac></mrow></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p2.1.m1.9b"><apply id="S3.SS1.p2.1.m1.9.9.cmml" xref="S3.SS1.p2.1.m1.9.9"><eq id="S3.SS1.p2.1.m1.9.9.2.cmml" xref="S3.SS1.p2.1.m1.9.9.2"></eq><apply 
id="S3.SS1.p2.1.m1.9.9.3.cmml" xref="S3.SS1.p2.1.m1.9.9.3"><times id="S3.SS1.p2.1.m1.9.9.3.1.cmml" xref="S3.SS1.p2.1.m1.9.9.3.1"></times><ci id="S3.SS1.p2.1.m1.9.9.3.2.cmml" xref="S3.SS1.p2.1.m1.9.9.3.2">𝑆</ci><ci id="S3.SS1.p2.1.m1.9.9.3.3.cmml" xref="S3.SS1.p2.1.m1.9.9.3.3">𝑐</ci><ci id="S3.SS1.p2.1.m1.9.9.3.4.cmml" xref="S3.SS1.p2.1.m1.9.9.3.4">𝑜</ci><ci id="S3.SS1.p2.1.m1.9.9.3.5.cmml" xref="S3.SS1.p2.1.m1.9.9.3.5">𝑟</ci><ci id="S3.SS1.p2.1.m1.9.9.3.6.cmml" xref="S3.SS1.p2.1.m1.9.9.3.6">𝑒</ci><interval closure="open" id="S3.SS1.p2.1.m1.9.9.3.7.1.cmml" xref="S3.SS1.p2.1.m1.9.9.3.7.2"><ci id="S3.SS1.p2.1.m1.7.7.cmml" xref="S3.SS1.p2.1.m1.7.7">𝐷</ci><ci id="S3.SS1.p2.1.m1.8.8.cmml" xref="S3.SS1.p2.1.m1.8.8">𝑄</ci></interval></apply><apply id="S3.SS1.p2.1.m1.9.9.1.cmml" xref="S3.SS1.p2.1.m1.9.9.1"><apply id="S3.SS1.p2.1.m1.9.9.1.2.cmml" xref="S3.SS1.p2.1.m1.9.9.1.2"><csymbol cd="ambiguous" id="S3.SS1.p2.1.m1.9.9.1.2.1.cmml" xref="S3.SS1.p2.1.m1.9.9.1.2">superscript</csymbol><apply id="S3.SS1.p2.1.m1.9.9.1.2.2.cmml" xref="S3.SS1.p2.1.m1.9.9.1.2"><csymbol cd="ambiguous" id="S3.SS1.p2.1.m1.9.9.1.2.2.1.cmml" xref="S3.SS1.p2.1.m1.9.9.1.2">subscript</csymbol><sum id="S3.SS1.p2.1.m1.9.9.1.2.2.2.cmml" xref="S3.SS1.p2.1.m1.9.9.1.2.2.2"></sum><apply id="S3.SS1.p2.1.m1.9.9.1.2.2.3.cmml" xref="S3.SS1.p2.1.m1.9.9.1.2.2.3"><eq id="S3.SS1.p2.1.m1.9.9.1.2.2.3.1.cmml" xref="S3.SS1.p2.1.m1.9.9.1.2.2.3.1"></eq><ci id="S3.SS1.p2.1.m1.9.9.1.2.2.3.2.cmml" xref="S3.SS1.p2.1.m1.9.9.1.2.2.3.2">𝑖</ci><cn id="S3.SS1.p2.1.m1.9.9.1.2.2.3.3.cmml" type="integer" xref="S3.SS1.p2.1.m1.9.9.1.2.2.3.3">1</cn></apply></apply><ci id="S3.SS1.p2.1.m1.9.9.1.2.3.cmml" xref="S3.SS1.p2.1.m1.9.9.1.2.3">𝑛</ci></apply><apply id="S3.SS1.p2.1.m1.9.9.1.1.cmml" xref="S3.SS1.p2.1.m1.9.9.1.1"><ci id="S3.SS1.p2.1.m1.9.9.1.1.2.cmml" xref="S3.SS1.p2.1.m1.9.9.1.1.2">⋅</ci><apply id="S3.SS1.p2.1.m1.9.9.1.1.1.cmml" xref="S3.SS1.p2.1.m1.9.9.1.1.1"><times id="S3.SS1.p2.1.m1.9.9.1.1.1.2.cmml" 
xref="S3.SS1.p2.1.m1.9.9.1.1.1.2"></times><ci id="S3.SS1.p2.1.m1.9.9.1.1.1.3.cmml" xref="S3.SS1.p2.1.m1.9.9.1.1.1.3">𝐼</ci><ci id="S3.SS1.p2.1.m1.9.9.1.1.1.4.cmml" xref="S3.SS1.p2.1.m1.9.9.1.1.1.4">𝐷</ci><ci id="S3.SS1.p2.1.m1.9.9.1.1.1.5.cmml" xref="S3.SS1.p2.1.m1.9.9.1.1.1.5">𝐹</ci><apply id="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.1.cmml" xref="S3.SS1.p2.1.m1.9.9.1.1.1.1.1"><csymbol cd="ambiguous" id="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.1.1.cmml" xref="S3.SS1.p2.1.m1.9.9.1.1.1.1.1">subscript</csymbol><ci id="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.1.2.cmml" xref="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.1.2">𝑞</ci><ci id="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.1.3.cmml" xref="S3.SS1.p2.1.m1.9.9.1.1.1.1.1.1.3">𝑖</ci></apply></apply><apply id="S3.SS1.p2.1.m1.6.6.cmml" xref="S3.SS1.p2.1.m1.6.6"><divide id="S3.SS1.p2.1.m1.6.6.7.cmml" xref="S3.SS1.p2.1.m1.6.6"></divide><apply id="S3.SS1.p2.1.m1.3.3.3.cmml" xref="S3.SS1.p2.1.m1.3.3.3"><ci id="S3.SS1.p2.1.m1.3.3.3.4.cmml" xref="S3.SS1.p2.1.m1.3.3.3.4">⋅</ci><apply id="S3.SS1.p2.1.m1.2.2.2.2.cmml" xref="S3.SS1.p2.1.m1.2.2.2.2"><times id="S3.SS1.p2.1.m1.2.2.2.2.2.cmml" xref="S3.SS1.p2.1.m1.2.2.2.2.2"></times><ci id="S3.SS1.p2.1.m1.2.2.2.2.3.cmml" xref="S3.SS1.p2.1.m1.2.2.2.2.3">𝑓</ci><ci id="S3.SS1.p2.1.m1.2.2.2.2.4.cmml" xref="S3.SS1.p2.1.m1.2.2.2.2.4">𝑟</ci><ci id="S3.SS1.p2.1.m1.2.2.2.2.5.cmml" xref="S3.SS1.p2.1.m1.2.2.2.2.5">𝑒</ci><ci id="S3.SS1.p2.1.m1.2.2.2.2.6.cmml" xref="S3.SS1.p2.1.m1.2.2.2.2.6">𝑞</ci><interval closure="open" id="S3.SS1.p2.1.m1.2.2.2.2.1.2.cmml" xref="S3.SS1.p2.1.m1.2.2.2.2.1.1"><apply id="S3.SS1.p2.1.m1.2.2.2.2.1.1.1.cmml" xref="S3.SS1.p2.1.m1.2.2.2.2.1.1.1"><csymbol cd="ambiguous" id="S3.SS1.p2.1.m1.2.2.2.2.1.1.1.1.cmml" xref="S3.SS1.p2.1.m1.2.2.2.2.1.1.1">subscript</csymbol><ci id="S3.SS1.p2.1.m1.2.2.2.2.1.1.1.2.cmml" xref="S3.SS1.p2.1.m1.2.2.2.2.1.1.1.2">𝑞</ci><ci id="S3.SS1.p2.1.m1.2.2.2.2.1.1.1.3.cmml" xref="S3.SS1.p2.1.m1.2.2.2.2.1.1.1.3">𝑖</ci></apply><ci id="S3.SS1.p2.1.m1.1.1.1.1.cmml" 
xref="S3.SS1.p2.1.m1.1.1.1.1">𝐷</ci></interval></apply><apply id="S3.SS1.p2.1.m1.3.3.3.3.1.1.cmml" xref="S3.SS1.p2.1.m1.3.3.3.3.1"><plus id="S3.SS1.p2.1.m1.3.3.3.3.1.1.1.cmml" xref="S3.SS1.p2.1.m1.3.3.3.3.1.1.1"></plus><apply id="S3.SS1.p2.1.m1.3.3.3.3.1.1.2.cmml" xref="S3.SS1.p2.1.m1.3.3.3.3.1.1.2"><csymbol cd="ambiguous" id="S3.SS1.p2.1.m1.3.3.3.3.1.1.2.1.cmml" xref="S3.SS1.p2.1.m1.3.3.3.3.1.1.2">subscript</csymbol><ci id="S3.SS1.p2.1.m1.3.3.3.3.1.1.2.2.cmml" xref="S3.SS1.p2.1.m1.3.3.3.3.1.1.2.2">𝑘</ci><cn id="S3.SS1.p2.1.m1.3.3.3.3.1.1.2.3.cmml" type="integer" xref="S3.SS1.p2.1.m1.3.3.3.3.1.1.2.3">1</cn></apply><cn id="S3.SS1.p2.1.m1.3.3.3.3.1.1.3.cmml" type="integer" xref="S3.SS1.p2.1.m1.3.3.3.3.1.1.3">1</cn></apply></apply><apply id="S3.SS1.p2.1.m1.6.6.6.cmml" xref="S3.SS1.p2.1.m1.6.6.6"><plus id="S3.SS1.p2.1.m1.6.6.6.4.cmml" xref="S3.SS1.p2.1.m1.6.6.6.4"></plus><apply id="S3.SS1.p2.1.m1.5.5.5.2.cmml" xref="S3.SS1.p2.1.m1.5.5.5.2"><times id="S3.SS1.p2.1.m1.5.5.5.2.2.cmml" xref="S3.SS1.p2.1.m1.5.5.5.2.2"></times><ci id="S3.SS1.p2.1.m1.5.5.5.2.3.cmml" xref="S3.SS1.p2.1.m1.5.5.5.2.3">𝑓</ci><ci id="S3.SS1.p2.1.m1.5.5.5.2.4.cmml" xref="S3.SS1.p2.1.m1.5.5.5.2.4">𝑟</ci><ci id="S3.SS1.p2.1.m1.5.5.5.2.5.cmml" xref="S3.SS1.p2.1.m1.5.5.5.2.5">𝑒</ci><ci id="S3.SS1.p2.1.m1.5.5.5.2.6.cmml" xref="S3.SS1.p2.1.m1.5.5.5.2.6">𝑞</ci><interval closure="open" id="S3.SS1.p2.1.m1.5.5.5.2.1.2.cmml" xref="S3.SS1.p2.1.m1.5.5.5.2.1.1"><apply id="S3.SS1.p2.1.m1.5.5.5.2.1.1.1.cmml" xref="S3.SS1.p2.1.m1.5.5.5.2.1.1.1"><csymbol cd="ambiguous" id="S3.SS1.p2.1.m1.5.5.5.2.1.1.1.1.cmml" xref="S3.SS1.p2.1.m1.5.5.5.2.1.1.1">subscript</csymbol><ci id="S3.SS1.p2.1.m1.5.5.5.2.1.1.1.2.cmml" xref="S3.SS1.p2.1.m1.5.5.5.2.1.1.1.2">𝑞</ci><ci id="S3.SS1.p2.1.m1.5.5.5.2.1.1.1.3.cmml" xref="S3.SS1.p2.1.m1.5.5.5.2.1.1.1.3">𝑖</ci></apply><ci id="S3.SS1.p2.1.m1.4.4.4.1.cmml" xref="S3.SS1.p2.1.m1.4.4.4.1">𝐷</ci></interval></apply><apply id="S3.SS1.p2.1.m1.6.6.6.3.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3"><ci 
id="S3.SS1.p2.1.m1.6.6.6.3.2.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.2">⋅</ci><apply id="S3.SS1.p2.1.m1.6.6.6.3.3.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.3"><csymbol cd="ambiguous" id="S3.SS1.p2.1.m1.6.6.6.3.3.1.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.3">subscript</csymbol><ci id="S3.SS1.p2.1.m1.6.6.6.3.3.2.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.3.2">𝑘</ci><cn id="S3.SS1.p2.1.m1.6.6.6.3.3.3.cmml" type="integer" xref="S3.SS1.p2.1.m1.6.6.6.3.3.3">1</cn></apply><apply id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1"><plus id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.1.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.1"></plus><apply id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2"><minus id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2.1.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2.1"></minus><cn id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2.2.cmml" type="integer" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2.2">1</cn><ci id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2.3.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.2.3">𝑏</ci></apply><apply id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3"><ci id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.1.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.1">⋅</ci><ci id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.2.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.2">𝑏</ci><apply id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3"><divide id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.1.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3"></divide><apply id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2"><times id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.1.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.1"></times><apply id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.2.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.2"><csymbol cd="ambiguous" id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.2.1.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.2">subscript</csymbol><ci id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.2.2.cmml" 
xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.2.2">𝐷</ci><ci id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.2.3.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.2.3">𝑙</ci></apply><ci id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.3.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.3">𝑒</ci><ci id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.4.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.4">𝑛</ci><ci id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.5.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.5">𝑔</ci><ci id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.6.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.6">𝑡</ci><ci id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.7.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.2.7">ℎ</ci></apply><apply id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3"><times id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.1.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.1"></times><ci id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.2.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.2">𝑎</ci><ci id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.3.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.3">𝑣</ci><ci id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.4.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.4">𝑔</ci><ci id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.5.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.5">𝐷</ci><ci id="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.6.cmml" xref="S3.SS1.p2.1.m1.6.6.6.3.1.1.1.3.3.3.6">𝐿</ci></apply></apply></apply></apply></apply></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p2.1.m1.9c">Score(D,Q)=\sum_{i=1}^{n}IDF(q_{i})\cdot\frac{freq(q_{i},D)\cdot(k_{1}+1)}{% freq(q_{i},D)+k_{1}\cdot(1-b+b\cdot\frac{D_{l}ength}{avgDL})}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p2.1.m1.9d">italic_S italic_c italic_o italic_r italic_e ( italic_D , italic_Q ) = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_I italic_D italic_F ( italic_q start_POSTSUBSCRIPT 
italic_i end_POSTSUBSCRIPT ) ⋅ divide start_ARG italic_f italic_r italic_e italic_q ( italic_q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_D ) ⋅ ( italic_k start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + 1 ) end_ARG start_ARG italic_f italic_r italic_e italic_q ( italic_q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_D ) + italic_k start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ ( 1 - italic_b + italic_b ⋅ divide start_ARG italic_D start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_e italic_n italic_g italic_t italic_h end_ARG start_ARG italic_a italic_v italic_g italic_D italic_L end_ARG ) end_ARG</annotation></semantics></math></p> </div> <div class="ltx_para" id="S3.SS1.p3"> <p class="ltx_p" id="S3.SS1.p3.4"><math alttext="IDF(q_{i})" class="ltx_Math" display="inline" id="S3.SS1.p3.1.m1.1"><semantics id="S3.SS1.p3.1.m1.1a"><mrow id="S3.SS1.p3.1.m1.1.1" xref="S3.SS1.p3.1.m1.1.1.cmml"><mi id="S3.SS1.p3.1.m1.1.1.3" xref="S3.SS1.p3.1.m1.1.1.3.cmml">I</mi><mo id="S3.SS1.p3.1.m1.1.1.2" xref="S3.SS1.p3.1.m1.1.1.2.cmml"></mo><mi id="S3.SS1.p3.1.m1.1.1.4" xref="S3.SS1.p3.1.m1.1.1.4.cmml">D</mi><mo id="S3.SS1.p3.1.m1.1.1.2a" xref="S3.SS1.p3.1.m1.1.1.2.cmml"></mo><mi id="S3.SS1.p3.1.m1.1.1.5" xref="S3.SS1.p3.1.m1.1.1.5.cmml">F</mi><mo id="S3.SS1.p3.1.m1.1.1.2b" xref="S3.SS1.p3.1.m1.1.1.2.cmml"></mo><mrow id="S3.SS1.p3.1.m1.1.1.1.1" xref="S3.SS1.p3.1.m1.1.1.1.1.1.cmml"><mo id="S3.SS1.p3.1.m1.1.1.1.1.2" stretchy="false" xref="S3.SS1.p3.1.m1.1.1.1.1.1.cmml">(</mo><msub id="S3.SS1.p3.1.m1.1.1.1.1.1" xref="S3.SS1.p3.1.m1.1.1.1.1.1.cmml"><mi id="S3.SS1.p3.1.m1.1.1.1.1.1.2" xref="S3.SS1.p3.1.m1.1.1.1.1.1.2.cmml">q</mi><mi id="S3.SS1.p3.1.m1.1.1.1.1.1.3" xref="S3.SS1.p3.1.m1.1.1.1.1.1.3.cmml">i</mi></msub><mo id="S3.SS1.p3.1.m1.1.1.1.1.3" stretchy="false" xref="S3.SS1.p3.1.m1.1.1.1.1.1.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p3.1.m1.1b"><apply id="S3.SS1.p3.1.m1.1.1.cmml" xref="S3.SS1.p3.1.m1.1.1"><times 
id="S3.SS1.p3.1.m1.1.1.2.cmml" xref="S3.SS1.p3.1.m1.1.1.2"></times><ci id="S3.SS1.p3.1.m1.1.1.3.cmml" xref="S3.SS1.p3.1.m1.1.1.3">𝐼</ci><ci id="S3.SS1.p3.1.m1.1.1.4.cmml" xref="S3.SS1.p3.1.m1.1.1.4">𝐷</ci><ci id="S3.SS1.p3.1.m1.1.1.5.cmml" xref="S3.SS1.p3.1.m1.1.1.5">𝐹</ci><apply id="S3.SS1.p3.1.m1.1.1.1.1.1.cmml" xref="S3.SS1.p3.1.m1.1.1.1.1"><csymbol cd="ambiguous" id="S3.SS1.p3.1.m1.1.1.1.1.1.1.cmml" xref="S3.SS1.p3.1.m1.1.1.1.1">subscript</csymbol><ci id="S3.SS1.p3.1.m1.1.1.1.1.1.2.cmml" xref="S3.SS1.p3.1.m1.1.1.1.1.1.2">𝑞</ci><ci id="S3.SS1.p3.1.m1.1.1.1.1.1.3.cmml" xref="S3.SS1.p3.1.m1.1.1.1.1.1.3">𝑖</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p3.1.m1.1c">IDF(q_{i})</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p3.1.m1.1d">italic_I italic_D italic_F ( italic_q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT )</annotation></semantics></math> is the inverse document frequency of the query token and <math alttext="freq(q_{i},D)" class="ltx_Math" display="inline" id="S3.SS1.p3.2.m2.2"><semantics id="S3.SS1.p3.2.m2.2a"><mrow id="S3.SS1.p3.2.m2.2.2" xref="S3.SS1.p3.2.m2.2.2.cmml"><mi id="S3.SS1.p3.2.m2.2.2.3" xref="S3.SS1.p3.2.m2.2.2.3.cmml">f</mi><mo id="S3.SS1.p3.2.m2.2.2.2" xref="S3.SS1.p3.2.m2.2.2.2.cmml"></mo><mi id="S3.SS1.p3.2.m2.2.2.4" xref="S3.SS1.p3.2.m2.2.2.4.cmml">r</mi><mo id="S3.SS1.p3.2.m2.2.2.2a" xref="S3.SS1.p3.2.m2.2.2.2.cmml"></mo><mi id="S3.SS1.p3.2.m2.2.2.5" xref="S3.SS1.p3.2.m2.2.2.5.cmml">e</mi><mo id="S3.SS1.p3.2.m2.2.2.2b" xref="S3.SS1.p3.2.m2.2.2.2.cmml"></mo><mi id="S3.SS1.p3.2.m2.2.2.6" xref="S3.SS1.p3.2.m2.2.2.6.cmml">q</mi><mo id="S3.SS1.p3.2.m2.2.2.2c" xref="S3.SS1.p3.2.m2.2.2.2.cmml"></mo><mrow id="S3.SS1.p3.2.m2.2.2.1.1" xref="S3.SS1.p3.2.m2.2.2.1.2.cmml"><mo id="S3.SS1.p3.2.m2.2.2.1.1.2" stretchy="false" xref="S3.SS1.p3.2.m2.2.2.1.2.cmml">(</mo><msub id="S3.SS1.p3.2.m2.2.2.1.1.1" xref="S3.SS1.p3.2.m2.2.2.1.1.1.cmml"><mi id="S3.SS1.p3.2.m2.2.2.1.1.1.2" 
xref="S3.SS1.p3.2.m2.2.2.1.1.1.2.cmml">q</mi><mi id="S3.SS1.p3.2.m2.2.2.1.1.1.3" xref="S3.SS1.p3.2.m2.2.2.1.1.1.3.cmml">i</mi></msub><mo id="S3.SS1.p3.2.m2.2.2.1.1.3" xref="S3.SS1.p3.2.m2.2.2.1.2.cmml">,</mo><mi id="S3.SS1.p3.2.m2.1.1" xref="S3.SS1.p3.2.m2.1.1.cmml">D</mi><mo id="S3.SS1.p3.2.m2.2.2.1.1.4" stretchy="false" xref="S3.SS1.p3.2.m2.2.2.1.2.cmml">)</mo></mrow></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p3.2.m2.2b"><apply id="S3.SS1.p3.2.m2.2.2.cmml" xref="S3.SS1.p3.2.m2.2.2"><times id="S3.SS1.p3.2.m2.2.2.2.cmml" xref="S3.SS1.p3.2.m2.2.2.2"></times><ci id="S3.SS1.p3.2.m2.2.2.3.cmml" xref="S3.SS1.p3.2.m2.2.2.3">𝑓</ci><ci id="S3.SS1.p3.2.m2.2.2.4.cmml" xref="S3.SS1.p3.2.m2.2.2.4">𝑟</ci><ci id="S3.SS1.p3.2.m2.2.2.5.cmml" xref="S3.SS1.p3.2.m2.2.2.5">𝑒</ci><ci id="S3.SS1.p3.2.m2.2.2.6.cmml" xref="S3.SS1.p3.2.m2.2.2.6">𝑞</ci><interval closure="open" id="S3.SS1.p3.2.m2.2.2.1.2.cmml" xref="S3.SS1.p3.2.m2.2.2.1.1"><apply id="S3.SS1.p3.2.m2.2.2.1.1.1.cmml" xref="S3.SS1.p3.2.m2.2.2.1.1.1"><csymbol cd="ambiguous" id="S3.SS1.p3.2.m2.2.2.1.1.1.1.cmml" xref="S3.SS1.p3.2.m2.2.2.1.1.1">subscript</csymbol><ci id="S3.SS1.p3.2.m2.2.2.1.1.1.2.cmml" xref="S3.SS1.p3.2.m2.2.2.1.1.1.2">𝑞</ci><ci id="S3.SS1.p3.2.m2.2.2.1.1.1.3.cmml" xref="S3.SS1.p3.2.m2.2.2.1.1.1.3">𝑖</ci></apply><ci id="S3.SS1.p3.2.m2.1.1.cmml" xref="S3.SS1.p3.2.m2.1.1">𝐷</ci></interval></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p3.2.m2.2c">freq(q_{i},D)</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p3.2.m2.2d">italic_f italic_r italic_e italic_q ( italic_q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_D )</annotation></semantics></math> is the number of times the query token occurs in the document. 
Finally, <math alttext="D_{length}" class="ltx_Math" display="inline" id="S3.SS1.p3.3.m3.1"><semantics id="S3.SS1.p3.3.m3.1a"><msub id="S3.SS1.p3.3.m3.1.1" xref="S3.SS1.p3.3.m3.1.1.cmml"><mi id="S3.SS1.p3.3.m3.1.1.2" xref="S3.SS1.p3.3.m3.1.1.2.cmml">D</mi><mrow id="S3.SS1.p3.3.m3.1.1.3" xref="S3.SS1.p3.3.m3.1.1.3.cmml"><mi id="S3.SS1.p3.3.m3.1.1.3.2" xref="S3.SS1.p3.3.m3.1.1.3.2.cmml">l</mi><mo id="S3.SS1.p3.3.m3.1.1.3.1" xref="S3.SS1.p3.3.m3.1.1.3.1.cmml"></mo><mi id="S3.SS1.p3.3.m3.1.1.3.3" xref="S3.SS1.p3.3.m3.1.1.3.3.cmml">e</mi><mo id="S3.SS1.p3.3.m3.1.1.3.1a" xref="S3.SS1.p3.3.m3.1.1.3.1.cmml"></mo><mi id="S3.SS1.p3.3.m3.1.1.3.4" xref="S3.SS1.p3.3.m3.1.1.3.4.cmml">n</mi><mo id="S3.SS1.p3.3.m3.1.1.3.1b" xref="S3.SS1.p3.3.m3.1.1.3.1.cmml"></mo><mi id="S3.SS1.p3.3.m3.1.1.3.5" xref="S3.SS1.p3.3.m3.1.1.3.5.cmml">g</mi><mo id="S3.SS1.p3.3.m3.1.1.3.1c" xref="S3.SS1.p3.3.m3.1.1.3.1.cmml"></mo><mi id="S3.SS1.p3.3.m3.1.1.3.6" xref="S3.SS1.p3.3.m3.1.1.3.6.cmml">t</mi><mo id="S3.SS1.p3.3.m3.1.1.3.1d" xref="S3.SS1.p3.3.m3.1.1.3.1.cmml"></mo><mi id="S3.SS1.p3.3.m3.1.1.3.7" xref="S3.SS1.p3.3.m3.1.1.3.7.cmml">h</mi></mrow></msub><annotation-xml encoding="MathML-Content" id="S3.SS1.p3.3.m3.1b"><apply id="S3.SS1.p3.3.m3.1.1.cmml" xref="S3.SS1.p3.3.m3.1.1"><csymbol cd="ambiguous" id="S3.SS1.p3.3.m3.1.1.1.cmml" xref="S3.SS1.p3.3.m3.1.1">subscript</csymbol><ci id="S3.SS1.p3.3.m3.1.1.2.cmml" xref="S3.SS1.p3.3.m3.1.1.2">𝐷</ci><apply id="S3.SS1.p3.3.m3.1.1.3.cmml" xref="S3.SS1.p3.3.m3.1.1.3"><times id="S3.SS1.p3.3.m3.1.1.3.1.cmml" xref="S3.SS1.p3.3.m3.1.1.3.1"></times><ci id="S3.SS1.p3.3.m3.1.1.3.2.cmml" xref="S3.SS1.p3.3.m3.1.1.3.2">𝑙</ci><ci id="S3.SS1.p3.3.m3.1.1.3.3.cmml" xref="S3.SS1.p3.3.m3.1.1.3.3">𝑒</ci><ci id="S3.SS1.p3.3.m3.1.1.3.4.cmml" xref="S3.SS1.p3.3.m3.1.1.3.4">𝑛</ci><ci id="S3.SS1.p3.3.m3.1.1.3.5.cmml" xref="S3.SS1.p3.3.m3.1.1.3.5">𝑔</ci><ci id="S3.SS1.p3.3.m3.1.1.3.6.cmml" xref="S3.SS1.p3.3.m3.1.1.3.6">𝑡</ci><ci id="S3.SS1.p3.3.m3.1.1.3.7.cmml" 
xref="S3.SS1.p3.3.m3.1.1.3.7">ℎ</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p3.3.m3.1c">D_{length}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p3.3.m3.1d">italic_D start_POSTSUBSCRIPT italic_l italic_e italic_n italic_g italic_t italic_h end_POSTSUBSCRIPT</annotation></semantics></math> is the length of the document (i.e. the total number of tokens) and <math alttext="avgDL" class="ltx_Math" display="inline" id="S3.SS1.p3.4.m4.1"><semantics id="S3.SS1.p3.4.m4.1a"><mrow id="S3.SS1.p3.4.m4.1.1" xref="S3.SS1.p3.4.m4.1.1.cmml"><mi id="S3.SS1.p3.4.m4.1.1.2" xref="S3.SS1.p3.4.m4.1.1.2.cmml">a</mi><mo id="S3.SS1.p3.4.m4.1.1.1" xref="S3.SS1.p3.4.m4.1.1.1.cmml"></mo><mi id="S3.SS1.p3.4.m4.1.1.3" xref="S3.SS1.p3.4.m4.1.1.3.cmml">v</mi><mo id="S3.SS1.p3.4.m4.1.1.1a" xref="S3.SS1.p3.4.m4.1.1.1.cmml"></mo><mi id="S3.SS1.p3.4.m4.1.1.4" xref="S3.SS1.p3.4.m4.1.1.4.cmml">g</mi><mo id="S3.SS1.p3.4.m4.1.1.1b" xref="S3.SS1.p3.4.m4.1.1.1.cmml"></mo><mi id="S3.SS1.p3.4.m4.1.1.5" xref="S3.SS1.p3.4.m4.1.1.5.cmml">D</mi><mo id="S3.SS1.p3.4.m4.1.1.1c" xref="S3.SS1.p3.4.m4.1.1.1.cmml"></mo><mi id="S3.SS1.p3.4.m4.1.1.6" xref="S3.SS1.p3.4.m4.1.1.6.cmml">L</mi></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p3.4.m4.1b"><apply id="S3.SS1.p3.4.m4.1.1.cmml" xref="S3.SS1.p3.4.m4.1.1"><times id="S3.SS1.p3.4.m4.1.1.1.cmml" xref="S3.SS1.p3.4.m4.1.1.1"></times><ci id="S3.SS1.p3.4.m4.1.1.2.cmml" xref="S3.SS1.p3.4.m4.1.1.2">𝑎</ci><ci id="S3.SS1.p3.4.m4.1.1.3.cmml" xref="S3.SS1.p3.4.m4.1.1.3">𝑣</ci><ci id="S3.SS1.p3.4.m4.1.1.4.cmml" xref="S3.SS1.p3.4.m4.1.1.4">𝑔</ci><ci id="S3.SS1.p3.4.m4.1.1.5.cmml" xref="S3.SS1.p3.4.m4.1.1.5">𝐷</ci><ci id="S3.SS1.p3.4.m4.1.1.6.cmml" xref="S3.SS1.p3.4.m4.1.1.6">𝐿</ci></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p3.4.m4.1c">avgDL</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p3.4.m4.1d">italic_a italic_v italic_g italic_D 
italic_L</annotation></semantics></math> is the average document length across the collection. Okapi BM25 scores are not normalized to a fixed range; a higher score indicates that a document is more relevant to the query than a lower-scored one.</p> </div> <div class="ltx_para" id="S3.SS1.p4"> <p class="ltx_p" id="S3.SS1.p4.2">The second index was a <span class="ltx_text ltx_font_bold" id="S3.SS1.p4.2.1">semantic index</span> in which documents were represented as fixed-dimension vector embeddings generated by a sentence embedding model. The goal of the semantic index was to retrieve documents that are semantically similar to the query. Semantic relevance was measured by the cosine similarity between the query embedding <math alttext="U" class="ltx_Math" display="inline" id="S3.SS1.p4.1.m1.1"><semantics id="S3.SS1.p4.1.m1.1a"><mi id="S3.SS1.p4.1.m1.1.1" xref="S3.SS1.p4.1.m1.1.1.cmml">U</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p4.1.m1.1b"><ci id="S3.SS1.p4.1.m1.1.1.cmml" xref="S3.SS1.p4.1.m1.1.1">𝑈</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p4.1.m1.1c">U</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p4.1.m1.1d">italic_U</annotation></semantics></math> and the document embedding <math alttext="V" class="ltx_Math" display="inline" id="S3.SS1.p4.2.m2.1"><semantics id="S3.SS1.p4.2.m2.1a"><mi id="S3.SS1.p4.2.m2.1.1" xref="S3.SS1.p4.2.m2.1.1.cmml">V</mi><annotation-xml encoding="MathML-Content" id="S3.SS1.p4.2.m2.1b"><ci id="S3.SS1.p4.2.m2.1.1.cmml" xref="S3.SS1.p4.2.m2.1.1">𝑉</ci></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p4.2.m2.1c">V</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p4.2.m2.1d">italic_V</annotation></semantics></math>, defined as:</p> </div> <div class="ltx_para" id="S3.SS1.p5"> <p class="ltx_p" id="S3.SS1.p5.1"><math alttext="similarity(U,V)=\frac{U\cdot V}{\left\|U\right\|\cdot\left\|V\right\|}=\frac{% 
\sum_{i=0}^{n}U_{i}V_{i}}{\sqrt{\sum_{i=1}^{n}U^{2}_{i}}\sqrt{\sum_{i=1}^{n}V^% {2}_{i}}}" class="ltx_Math" display="inline" id="S3.SS1.p5.1.m1.4"><semantics id="S3.SS1.p5.1.m1.4a"><mrow id="S3.SS1.p5.1.m1.4.5" xref="S3.SS1.p5.1.m1.4.5.cmml"><mrow id="S3.SS1.p5.1.m1.4.5.2" xref="S3.SS1.p5.1.m1.4.5.2.cmml"><mi id="S3.SS1.p5.1.m1.4.5.2.2" xref="S3.SS1.p5.1.m1.4.5.2.2.cmml">s</mi><mo id="S3.SS1.p5.1.m1.4.5.2.1" xref="S3.SS1.p5.1.m1.4.5.2.1.cmml"></mo><mi id="S3.SS1.p5.1.m1.4.5.2.3" xref="S3.SS1.p5.1.m1.4.5.2.3.cmml">i</mi><mo id="S3.SS1.p5.1.m1.4.5.2.1a" xref="S3.SS1.p5.1.m1.4.5.2.1.cmml"></mo><mi id="S3.SS1.p5.1.m1.4.5.2.4" xref="S3.SS1.p5.1.m1.4.5.2.4.cmml">m</mi><mo id="S3.SS1.p5.1.m1.4.5.2.1b" xref="S3.SS1.p5.1.m1.4.5.2.1.cmml"></mo><mi id="S3.SS1.p5.1.m1.4.5.2.5" xref="S3.SS1.p5.1.m1.4.5.2.5.cmml">i</mi><mo id="S3.SS1.p5.1.m1.4.5.2.1c" xref="S3.SS1.p5.1.m1.4.5.2.1.cmml"></mo><mi id="S3.SS1.p5.1.m1.4.5.2.6" xref="S3.SS1.p5.1.m1.4.5.2.6.cmml">l</mi><mo id="S3.SS1.p5.1.m1.4.5.2.1d" xref="S3.SS1.p5.1.m1.4.5.2.1.cmml"></mo><mi id="S3.SS1.p5.1.m1.4.5.2.7" xref="S3.SS1.p5.1.m1.4.5.2.7.cmml">a</mi><mo id="S3.SS1.p5.1.m1.4.5.2.1e" xref="S3.SS1.p5.1.m1.4.5.2.1.cmml"></mo><mi id="S3.SS1.p5.1.m1.4.5.2.8" xref="S3.SS1.p5.1.m1.4.5.2.8.cmml">r</mi><mo id="S3.SS1.p5.1.m1.4.5.2.1f" xref="S3.SS1.p5.1.m1.4.5.2.1.cmml"></mo><mi id="S3.SS1.p5.1.m1.4.5.2.9" xref="S3.SS1.p5.1.m1.4.5.2.9.cmml">i</mi><mo id="S3.SS1.p5.1.m1.4.5.2.1g" xref="S3.SS1.p5.1.m1.4.5.2.1.cmml"></mo><mi id="S3.SS1.p5.1.m1.4.5.2.10" xref="S3.SS1.p5.1.m1.4.5.2.10.cmml">t</mi><mo id="S3.SS1.p5.1.m1.4.5.2.1h" xref="S3.SS1.p5.1.m1.4.5.2.1.cmml"></mo><mi id="S3.SS1.p5.1.m1.4.5.2.11" xref="S3.SS1.p5.1.m1.4.5.2.11.cmml">y</mi><mo id="S3.SS1.p5.1.m1.4.5.2.1i" xref="S3.SS1.p5.1.m1.4.5.2.1.cmml"></mo><mrow id="S3.SS1.p5.1.m1.4.5.2.12.2" xref="S3.SS1.p5.1.m1.4.5.2.12.1.cmml"><mo id="S3.SS1.p5.1.m1.4.5.2.12.2.1" stretchy="false" xref="S3.SS1.p5.1.m1.4.5.2.12.1.cmml">(</mo><mi id="S3.SS1.p5.1.m1.3.3" 
xref="S3.SS1.p5.1.m1.3.3.cmml">U</mi><mo id="S3.SS1.p5.1.m1.4.5.2.12.2.2" xref="S3.SS1.p5.1.m1.4.5.2.12.1.cmml">,</mo><mi id="S3.SS1.p5.1.m1.4.4" xref="S3.SS1.p5.1.m1.4.4.cmml">V</mi><mo id="S3.SS1.p5.1.m1.4.5.2.12.2.3" stretchy="false" xref="S3.SS1.p5.1.m1.4.5.2.12.1.cmml">)</mo></mrow></mrow><mo id="S3.SS1.p5.1.m1.4.5.3" xref="S3.SS1.p5.1.m1.4.5.3.cmml">=</mo><mfrac id="S3.SS1.p5.1.m1.2.2" xref="S3.SS1.p5.1.m1.2.2.cmml"><mrow id="S3.SS1.p5.1.m1.2.2.4" xref="S3.SS1.p5.1.m1.2.2.4.cmml"><mi id="S3.SS1.p5.1.m1.2.2.4.2" xref="S3.SS1.p5.1.m1.2.2.4.2.cmml">U</mi><mo id="S3.SS1.p5.1.m1.2.2.4.1" lspace="0.222em" rspace="0.222em" xref="S3.SS1.p5.1.m1.2.2.4.1.cmml">⋅</mo><mi id="S3.SS1.p5.1.m1.2.2.4.3" xref="S3.SS1.p5.1.m1.2.2.4.3.cmml">V</mi></mrow><mrow id="S3.SS1.p5.1.m1.2.2.2" xref="S3.SS1.p5.1.m1.2.2.2.cmml"><mrow id="S3.SS1.p5.1.m1.2.2.2.4.2" xref="S3.SS1.p5.1.m1.2.2.2.4.1.cmml"><mo id="S3.SS1.p5.1.m1.2.2.2.4.2.1" xref="S3.SS1.p5.1.m1.2.2.2.4.1.1.cmml">‖</mo><mi id="S3.SS1.p5.1.m1.1.1.1.1" xref="S3.SS1.p5.1.m1.1.1.1.1.cmml">U</mi><mo id="S3.SS1.p5.1.m1.2.2.2.4.2.2" rspace="0.055em" xref="S3.SS1.p5.1.m1.2.2.2.4.1.1.cmml">‖</mo></mrow><mo id="S3.SS1.p5.1.m1.2.2.2.3" rspace="0.222em" xref="S3.SS1.p5.1.m1.2.2.2.3.cmml">⋅</mo><mrow id="S3.SS1.p5.1.m1.2.2.2.5.2" xref="S3.SS1.p5.1.m1.2.2.2.5.1.cmml"><mo id="S3.SS1.p5.1.m1.2.2.2.5.2.1" xref="S3.SS1.p5.1.m1.2.2.2.5.1.1.cmml">‖</mo><mi id="S3.SS1.p5.1.m1.2.2.2.2" xref="S3.SS1.p5.1.m1.2.2.2.2.cmml">V</mi><mo id="S3.SS1.p5.1.m1.2.2.2.5.2.2" xref="S3.SS1.p5.1.m1.2.2.2.5.1.1.cmml">‖</mo></mrow></mrow></mfrac><mo id="S3.SS1.p5.1.m1.4.5.4" xref="S3.SS1.p5.1.m1.4.5.4.cmml">=</mo><mfrac id="S3.SS1.p5.1.m1.4.5.5" xref="S3.SS1.p5.1.m1.4.5.5.cmml"><mrow id="S3.SS1.p5.1.m1.4.5.5.2" xref="S3.SS1.p5.1.m1.4.5.5.2.cmml"><mstyle displaystyle="false" id="S3.SS1.p5.1.m1.4.5.5.2.1" xref="S3.SS1.p5.1.m1.4.5.5.2.1.cmml"><msubsup id="S3.SS1.p5.1.m1.4.5.5.2.1a" xref="S3.SS1.p5.1.m1.4.5.5.2.1.cmml"><mo id="S3.SS1.p5.1.m1.4.5.5.2.1.2.2" 
xref="S3.SS1.p5.1.m1.4.5.5.2.1.2.2.cmml">∑</mo><mrow id="S3.SS1.p5.1.m1.4.5.5.2.1.2.3" xref="S3.SS1.p5.1.m1.4.5.5.2.1.2.3.cmml"><mi id="S3.SS1.p5.1.m1.4.5.5.2.1.2.3.2" xref="S3.SS1.p5.1.m1.4.5.5.2.1.2.3.2.cmml">i</mi><mo id="S3.SS1.p5.1.m1.4.5.5.2.1.2.3.1" xref="S3.SS1.p5.1.m1.4.5.5.2.1.2.3.1.cmml">=</mo><mn id="S3.SS1.p5.1.m1.4.5.5.2.1.2.3.3" xref="S3.SS1.p5.1.m1.4.5.5.2.1.2.3.3.cmml">0</mn></mrow><mi id="S3.SS1.p5.1.m1.4.5.5.2.1.3" xref="S3.SS1.p5.1.m1.4.5.5.2.1.3.cmml">n</mi></msubsup></mstyle><mrow id="S3.SS1.p5.1.m1.4.5.5.2.2" xref="S3.SS1.p5.1.m1.4.5.5.2.2.cmml"><msub id="S3.SS1.p5.1.m1.4.5.5.2.2.2" xref="S3.SS1.p5.1.m1.4.5.5.2.2.2.cmml"><mi id="S3.SS1.p5.1.m1.4.5.5.2.2.2.2" xref="S3.SS1.p5.1.m1.4.5.5.2.2.2.2.cmml">U</mi><mi id="S3.SS1.p5.1.m1.4.5.5.2.2.2.3" xref="S3.SS1.p5.1.m1.4.5.5.2.2.2.3.cmml">i</mi></msub><mo id="S3.SS1.p5.1.m1.4.5.5.2.2.1" xref="S3.SS1.p5.1.m1.4.5.5.2.2.1.cmml"></mo><msub id="S3.SS1.p5.1.m1.4.5.5.2.2.3" xref="S3.SS1.p5.1.m1.4.5.5.2.2.3.cmml"><mi id="S3.SS1.p5.1.m1.4.5.5.2.2.3.2" xref="S3.SS1.p5.1.m1.4.5.5.2.2.3.2.cmml">V</mi><mi id="S3.SS1.p5.1.m1.4.5.5.2.2.3.3" xref="S3.SS1.p5.1.m1.4.5.5.2.2.3.3.cmml">i</mi></msub></mrow></mrow><mrow id="S3.SS1.p5.1.m1.4.5.5.3" xref="S3.SS1.p5.1.m1.4.5.5.3.cmml"><msqrt id="S3.SS1.p5.1.m1.4.5.5.3.2" xref="S3.SS1.p5.1.m1.4.5.5.3.2.cmml"><mrow id="S3.SS1.p5.1.m1.4.5.5.3.2.2" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.cmml"><mstyle displaystyle="false" id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.cmml"><msubsup id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1a" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.cmml"><mo id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.2" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.2.cmml">∑</mo><mrow id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3.cmml"><mi id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3.2" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3.2.cmml">i</mi><mo id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3.1" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3.1.cmml">=</mo><mn 
id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3.3" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3.3.cmml">1</mn></mrow><mi id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.3" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.3.cmml">n</mi></msubsup></mstyle><msubsup id="S3.SS1.p5.1.m1.4.5.5.3.2.2.2" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.cmml"><mi id="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.2.2" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.2.2.cmml">U</mi><mi id="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.3" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.3.cmml">i</mi><mn id="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.2.3" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.2.3.cmml">2</mn></msubsup></mrow></msqrt><mo id="S3.SS1.p5.1.m1.4.5.5.3.1" xref="S3.SS1.p5.1.m1.4.5.5.3.1.cmml"></mo><msqrt id="S3.SS1.p5.1.m1.4.5.5.3.3" xref="S3.SS1.p5.1.m1.4.5.5.3.3.cmml"><mrow id="S3.SS1.p5.1.m1.4.5.5.3.3.2" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.cmml"><mstyle displaystyle="false" id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.cmml"><msubsup id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1a" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.cmml"><mo id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.2" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.2.cmml">∑</mo><mrow id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3.cmml"><mi id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3.2" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3.2.cmml">i</mi><mo id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3.1" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3.1.cmml">=</mo><mn id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3.3" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3.3.cmml">1</mn></mrow><mi id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.3" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.3.cmml">n</mi></msubsup></mstyle><msubsup id="S3.SS1.p5.1.m1.4.5.5.3.3.2.2" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.cmml"><mi id="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.2.2" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.2.2.cmml">V</mi><mi id="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.3" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.3.cmml">i</mi><mn id="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.2.3" 
xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.2.3.cmml">2</mn></msubsup></mrow></msqrt></mrow></mfrac></mrow><annotation-xml encoding="MathML-Content" id="S3.SS1.p5.1.m1.4b"><apply id="S3.SS1.p5.1.m1.4.5.cmml" xref="S3.SS1.p5.1.m1.4.5"><and id="S3.SS1.p5.1.m1.4.5a.cmml" xref="S3.SS1.p5.1.m1.4.5"></and><apply id="S3.SS1.p5.1.m1.4.5b.cmml" xref="S3.SS1.p5.1.m1.4.5"><eq id="S3.SS1.p5.1.m1.4.5.3.cmml" xref="S3.SS1.p5.1.m1.4.5.3"></eq><apply id="S3.SS1.p5.1.m1.4.5.2.cmml" xref="S3.SS1.p5.1.m1.4.5.2"><times id="S3.SS1.p5.1.m1.4.5.2.1.cmml" xref="S3.SS1.p5.1.m1.4.5.2.1"></times><ci id="S3.SS1.p5.1.m1.4.5.2.2.cmml" xref="S3.SS1.p5.1.m1.4.5.2.2">𝑠</ci><ci id="S3.SS1.p5.1.m1.4.5.2.3.cmml" xref="S3.SS1.p5.1.m1.4.5.2.3">𝑖</ci><ci id="S3.SS1.p5.1.m1.4.5.2.4.cmml" xref="S3.SS1.p5.1.m1.4.5.2.4">𝑚</ci><ci id="S3.SS1.p5.1.m1.4.5.2.5.cmml" xref="S3.SS1.p5.1.m1.4.5.2.5">𝑖</ci><ci id="S3.SS1.p5.1.m1.4.5.2.6.cmml" xref="S3.SS1.p5.1.m1.4.5.2.6">𝑙</ci><ci id="S3.SS1.p5.1.m1.4.5.2.7.cmml" xref="S3.SS1.p5.1.m1.4.5.2.7">𝑎</ci><ci id="S3.SS1.p5.1.m1.4.5.2.8.cmml" xref="S3.SS1.p5.1.m1.4.5.2.8">𝑟</ci><ci id="S3.SS1.p5.1.m1.4.5.2.9.cmml" xref="S3.SS1.p5.1.m1.4.5.2.9">𝑖</ci><ci id="S3.SS1.p5.1.m1.4.5.2.10.cmml" xref="S3.SS1.p5.1.m1.4.5.2.10">𝑡</ci><ci id="S3.SS1.p5.1.m1.4.5.2.11.cmml" xref="S3.SS1.p5.1.m1.4.5.2.11">𝑦</ci><interval closure="open" id="S3.SS1.p5.1.m1.4.5.2.12.1.cmml" xref="S3.SS1.p5.1.m1.4.5.2.12.2"><ci id="S3.SS1.p5.1.m1.3.3.cmml" xref="S3.SS1.p5.1.m1.3.3">𝑈</ci><ci id="S3.SS1.p5.1.m1.4.4.cmml" xref="S3.SS1.p5.1.m1.4.4">𝑉</ci></interval></apply><apply id="S3.SS1.p5.1.m1.2.2.cmml" xref="S3.SS1.p5.1.m1.2.2"><divide id="S3.SS1.p5.1.m1.2.2.3.cmml" xref="S3.SS1.p5.1.m1.2.2"></divide><apply id="S3.SS1.p5.1.m1.2.2.4.cmml" xref="S3.SS1.p5.1.m1.2.2.4"><ci id="S3.SS1.p5.1.m1.2.2.4.1.cmml" xref="S3.SS1.p5.1.m1.2.2.4.1">⋅</ci><ci id="S3.SS1.p5.1.m1.2.2.4.2.cmml" xref="S3.SS1.p5.1.m1.2.2.4.2">𝑈</ci><ci id="S3.SS1.p5.1.m1.2.2.4.3.cmml" xref="S3.SS1.p5.1.m1.2.2.4.3">𝑉</ci></apply><apply 
id="S3.SS1.p5.1.m1.2.2.2.cmml" xref="S3.SS1.p5.1.m1.2.2.2"><ci id="S3.SS1.p5.1.m1.2.2.2.3.cmml" xref="S3.SS1.p5.1.m1.2.2.2.3">⋅</ci><apply id="S3.SS1.p5.1.m1.2.2.2.4.1.cmml" xref="S3.SS1.p5.1.m1.2.2.2.4.2"><csymbol cd="latexml" id="S3.SS1.p5.1.m1.2.2.2.4.1.1.cmml" xref="S3.SS1.p5.1.m1.2.2.2.4.2.1">norm</csymbol><ci id="S3.SS1.p5.1.m1.1.1.1.1.cmml" xref="S3.SS1.p5.1.m1.1.1.1.1">𝑈</ci></apply><apply id="S3.SS1.p5.1.m1.2.2.2.5.1.cmml" xref="S3.SS1.p5.1.m1.2.2.2.5.2"><csymbol cd="latexml" id="S3.SS1.p5.1.m1.2.2.2.5.1.1.cmml" xref="S3.SS1.p5.1.m1.2.2.2.5.2.1">norm</csymbol><ci id="S3.SS1.p5.1.m1.2.2.2.2.cmml" xref="S3.SS1.p5.1.m1.2.2.2.2">𝑉</ci></apply></apply></apply></apply><apply id="S3.SS1.p5.1.m1.4.5c.cmml" xref="S3.SS1.p5.1.m1.4.5"><eq id="S3.SS1.p5.1.m1.4.5.4.cmml" xref="S3.SS1.p5.1.m1.4.5.4"></eq><share href="https://arxiv.org/html/2503.01003v1#S3.SS1.p5.1.m1.2.2.cmml" id="S3.SS1.p5.1.m1.4.5d.cmml" xref="S3.SS1.p5.1.m1.4.5"></share><apply id="S3.SS1.p5.1.m1.4.5.5.cmml" xref="S3.SS1.p5.1.m1.4.5.5"><divide id="S3.SS1.p5.1.m1.4.5.5.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5"></divide><apply id="S3.SS1.p5.1.m1.4.5.5.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2"><apply id="S3.SS1.p5.1.m1.4.5.5.2.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.1"><csymbol cd="ambiguous" id="S3.SS1.p5.1.m1.4.5.5.2.1.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.1">superscript</csymbol><apply id="S3.SS1.p5.1.m1.4.5.5.2.1.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.1"><csymbol cd="ambiguous" id="S3.SS1.p5.1.m1.4.5.5.2.1.2.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.1">subscript</csymbol><sum id="S3.SS1.p5.1.m1.4.5.5.2.1.2.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.1.2.2"></sum><apply id="S3.SS1.p5.1.m1.4.5.5.2.1.2.3.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.1.2.3"><eq id="S3.SS1.p5.1.m1.4.5.5.2.1.2.3.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.1.2.3.1"></eq><ci id="S3.SS1.p5.1.m1.4.5.5.2.1.2.3.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.1.2.3.2">𝑖</ci><cn id="S3.SS1.p5.1.m1.4.5.5.2.1.2.3.3.cmml" type="integer" 
xref="S3.SS1.p5.1.m1.4.5.5.2.1.2.3.3">0</cn></apply></apply><ci id="S3.SS1.p5.1.m1.4.5.5.2.1.3.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.1.3">𝑛</ci></apply><apply id="S3.SS1.p5.1.m1.4.5.5.2.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.2"><times id="S3.SS1.p5.1.m1.4.5.5.2.2.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.2.1"></times><apply id="S3.SS1.p5.1.m1.4.5.5.2.2.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.2.2"><csymbol cd="ambiguous" id="S3.SS1.p5.1.m1.4.5.5.2.2.2.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.2.2">subscript</csymbol><ci id="S3.SS1.p5.1.m1.4.5.5.2.2.2.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.2.2.2">𝑈</ci><ci id="S3.SS1.p5.1.m1.4.5.5.2.2.2.3.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.2.2.3">𝑖</ci></apply><apply id="S3.SS1.p5.1.m1.4.5.5.2.2.3.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.2.3"><csymbol cd="ambiguous" id="S3.SS1.p5.1.m1.4.5.5.2.2.3.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.2.3">subscript</csymbol><ci id="S3.SS1.p5.1.m1.4.5.5.2.2.3.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.2.3.2">𝑉</ci><ci id="S3.SS1.p5.1.m1.4.5.5.2.2.3.3.cmml" xref="S3.SS1.p5.1.m1.4.5.5.2.2.3.3">𝑖</ci></apply></apply></apply><apply id="S3.SS1.p5.1.m1.4.5.5.3.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3"><times id="S3.SS1.p5.1.m1.4.5.5.3.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.1"></times><apply id="S3.SS1.p5.1.m1.4.5.5.3.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2"><root id="S3.SS1.p5.1.m1.4.5.5.3.2a.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2"></root><apply id="S3.SS1.p5.1.m1.4.5.5.3.2.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2"><apply id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1"><csymbol cd="ambiguous" id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1">superscript</csymbol><apply id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1"><csymbol cd="ambiguous" id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1">subscript</csymbol><sum id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.2"></sum><apply 
id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3"><eq id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3.1"></eq><ci id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3.2">𝑖</ci><cn id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3.3.cmml" type="integer" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.2.3.3">1</cn></apply></apply><ci id="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.3.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.1.3">𝑛</ci></apply><apply id="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.2"><csymbol cd="ambiguous" id="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.2">subscript</csymbol><apply id="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.2"><csymbol cd="ambiguous" id="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.2.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.2">superscript</csymbol><ci id="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.2.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.2.2">𝑈</ci><cn id="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.2.3.cmml" type="integer" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.2.3">2</cn></apply><ci id="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.3.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.2.2.2.3">𝑖</ci></apply></apply></apply><apply id="S3.SS1.p5.1.m1.4.5.5.3.3.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3"><root id="S3.SS1.p5.1.m1.4.5.5.3.3a.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3"></root><apply id="S3.SS1.p5.1.m1.4.5.5.3.3.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2"><apply id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1"><csymbol cd="ambiguous" id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1">superscript</csymbol><apply id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1"><csymbol cd="ambiguous" id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1">subscript</csymbol><sum id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.2"></sum><apply 
id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3"><eq id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3.1"></eq><ci id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3.2">𝑖</ci><cn id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3.3.cmml" type="integer" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.2.3.3">1</cn></apply></apply><ci id="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.3.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.1.3">𝑛</ci></apply><apply id="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.2"><csymbol cd="ambiguous" id="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.2">subscript</csymbol><apply id="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.2"><csymbol cd="ambiguous" id="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.2.1.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.2">superscript</csymbol><ci id="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.2.2.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.2.2">𝑉</ci><cn id="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.2.3.cmml" type="integer" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.2.3">2</cn></apply><ci id="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.3.cmml" xref="S3.SS1.p5.1.m1.4.5.5.3.3.2.2.3">𝑖</ci></apply></apply></apply></apply></apply></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS1.p5.1.m1.4c">similarity(U,V)=\frac{U\cdot V}{\left\|U\right\|\cdot\left\|V\right\|}=\frac{% \sum_{i=0}^{n}U_{i}V_{i}}{\sqrt{\sum_{i=1}^{n}U^{2}_{i}}\sqrt{\sum_{i=1}^{n}V^% {2}_{i}}}</annotation><annotation encoding="application/x-llamapun" id="S3.SS1.p5.1.m1.4d">italic_s italic_i italic_m italic_i italic_l italic_a italic_r italic_i italic_t italic_y ( italic_U , italic_V ) = divide start_ARG italic_U ⋅ italic_V end_ARG start_ARG ∥ italic_U ∥ ⋅ ∥ italic_V ∥ end_ARG = divide start_ARG ∑ start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_V 
start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG square-root start_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_U start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG square-root start_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_V start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG end_ARG</annotation></semantics></math></p> </div> <figure class="ltx_figure" id="S3.F2"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="243" id="S3.F2.g1" src="extracted/6246156/semantic-model.png" width="589"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 2: </span>Siamese sentence embedding architecture for asymmetric matching.</figcaption> </figure> <div class="ltx_para" id="S3.SS1.p6"> <p class="ltx_p" id="S3.SS1.p6.1">Sentence embedding models generate a fixed-dimensional representation of an input text and are trained on sentence-level inputs for tasks such as semantic textual similarity, paraphrase detection, and textual entailment. Search applications pose several challenges for such models. Applying a sentence embedding model to document-level inputs (e.g., paragraphs or news articles) dilutes the quality of the embedding and is likely to degrade performance in dense passage retrieval and ranking. There is also an input asymmetry: the query is often shorter than the relevant document to be retrieved. Finally, there may be limited lexical overlap between the query text and the relevant document. 
As a result, standard sentence embedding models such as the Universal Sentence Encoder (USE) <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#bib.bib7" title="">7</a>]</cite> are likely to struggle in general semantic search use cases. To address this, we use a Siamese network architecture <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#bib.bib8" title="">8</a>]</cite> pretrained for asymmetric matching (Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S3.F2" title="Figure 2 ‣ 3.1 Document Indexing ‣ 3 Methods ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_tag">2</span></a>). The Siamese architecture takes query and relevant passage pairs as input and fine-tunes a shared sentence embedding model to increase the cosine similarity between relevant pairs and decrease the similarity between negative pairs. The resulting model is better suited to the asymmetric task of measuring semantic similarity between a query embedding and a document embedding. 
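As a minimal sketch of the cosine similarity defined above, and of ranking document embeddings against a query embedding with it, the following plain Python may help. The helper names cosine_similarity and rank_by_similarity and the toy 2-d vectors are hypothetical illustrations standing in for real sentence embeddings; this is not the authors' implementation. Note that all three sums run over the same n components.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """U.V / (||U|| * ||V||), with every sum running over the same n components."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_similarity(
    query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Score every document embedding against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-d "embeddings" (hypothetical); real sentence embeddings have hundreds of dimensions.
query = [1.0, 0.0]
docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [1.0, 1.0]}
ranking = rank_by_similarity(query, docs, k=3)
```

Here d1 points in the same direction as the query (similarity 1), d3 is at 45 degrees (about 0.707), and d2 is orthogonal (0). Scoring every document exhaustively keeps the sketch simple; a real index would typically retrieve candidates with a nearest-neighbour search first.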
Details of the pretrained sentence embedding model are given in Section <a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S4.SS2" title="4.2 Setup ‣ 4 Experiments ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_tag">4.2</span></a>.</p> </div> </section> <section class="ltx_subsection" id="S3.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">3.2 </span>Semantic Search Pipeline</h3> <div class="ltx_para" id="S3.SS2.p1"> <p class="ltx_p" id="S3.SS2.p1.1">Our semantic search pipeline (Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S3.F3" title="Figure 3 ‣ 3.2 Semantic Search Pipeline ‣ 3 Methods ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_tag">3</span></a>) aggregates results from three distinct query strategies to produce the final set of relevant causal documents. Given a topic consisting of a title and a narrative (e.g., Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S1.F1" title="Figure 1 ‣ 1 Introduction ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_tag">1</span></a>), we treat the title as the query text and the narrative as a source of causal keywords.</p> </div> <div class="ltx_para" id="S3.SS2.p2"> <p class="ltx_p" id="S3.SS2.p2.1"><math alttext="Q1" class="ltx_Math" display="inline" id="S3.SS2.p2.1.m1.1"><semantics id="S3.SS2.p2.1.m1.1a"><mrow id="S3.SS2.p2.1.m1.1.1" xref="S3.SS2.p2.1.m1.1.1.cmml"><mi id="S3.SS2.p2.1.m1.1.1.2" xref="S3.SS2.p2.1.m1.1.1.2.cmml">Q</mi><mo id="S3.SS2.p2.1.m1.1.1.1" xref="S3.SS2.p2.1.m1.1.1.1.cmml"></mo><mn id="S3.SS2.p2.1.m1.1.1.3" xref="S3.SS2.p2.1.m1.1.1.3.cmml">1</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.p2.1.m1.1b"><apply id="S3.SS2.p2.1.m1.1.1.cmml" xref="S3.SS2.p2.1.m1.1.1"><times id="S3.SS2.p2.1.m1.1.1.1.cmml" 
xref="S3.SS2.p2.1.m1.1.1.1"></times><ci id="S3.SS2.p2.1.m1.1.1.2.cmml" xref="S3.SS2.p2.1.m1.1.1.2">𝑄</ci><cn id="S3.SS2.p2.1.m1.1.1.3.cmml" type="integer" xref="S3.SS2.p2.1.m1.1.1.3">1</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p2.1.m1.1c">Q1</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p2.1.m1.1d">italic_Q 1</annotation></semantics></math> retrieves the 500 most semantically similar documents from the semantic index. This is done by embedding the query text with the sentence embedding model, retrieving the nearest document embeddings by cosine distance, and ranking those documents by the cosine similarity between the query embedding and each document embedding.</p> </div> <div class="ltx_para" id="S3.SS2.p3"> <p class="ltx_p" id="S3.SS2.p3.1"><math alttext="Q2" class="ltx_Math" display="inline" id="S3.SS2.p3.1.m1.1"><semantics id="S3.SS2.p3.1.m1.1a"><mrow id="S3.SS2.p3.1.m1.1.1" xref="S3.SS2.p3.1.m1.1.1.cmml"><mi id="S3.SS2.p3.1.m1.1.1.2" xref="S3.SS2.p3.1.m1.1.1.2.cmml">Q</mi><mo id="S3.SS2.p3.1.m1.1.1.1" xref="S3.SS2.p3.1.m1.1.1.1.cmml"></mo><mn id="S3.SS2.p3.1.m1.1.1.3" xref="S3.SS2.p3.1.m1.1.1.3.cmml">2</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.p3.1.m1.1b"><apply id="S3.SS2.p3.1.m1.1.1.cmml" xref="S3.SS2.p3.1.m1.1.1"><times id="S3.SS2.p3.1.m1.1.1.1.cmml" xref="S3.SS2.p3.1.m1.1.1.1"></times><ci id="S3.SS2.p3.1.m1.1.1.2.cmml" xref="S3.SS2.p3.1.m1.1.1.2">𝑄</ci><cn id="S3.SS2.p3.1.m1.1.1.3.cmml" type="integer" xref="S3.SS2.p3.1.m1.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p3.1.m1.1c">Q2</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p3.1.m1.1d">italic_Q 2</annotation></semantics></math> retrieves the 500 most relevant documents from the lexical index, where relevance is measured by the Okapi BM25 score between the candidate document tokens and the query tokens.</p> </div> <div class="ltx_para" 
id="S3.SS2.p4"> <p class="ltx_p" id="S3.SS2.p4.1"><math alttext="Q3" class="ltx_Math" display="inline" id="S3.SS2.p4.1.m1.1"><semantics id="S3.SS2.p4.1.m1.1a"><mrow id="S3.SS2.p4.1.m1.1.1" xref="S3.SS2.p4.1.m1.1.1.cmml"><mi id="S3.SS2.p4.1.m1.1.1.2" xref="S3.SS2.p4.1.m1.1.1.2.cmml">Q</mi><mo id="S3.SS2.p4.1.m1.1.1.1" xref="S3.SS2.p4.1.m1.1.1.1.cmml"></mo><mn id="S3.SS2.p4.1.m1.1.1.3" xref="S3.SS2.p4.1.m1.1.1.3.cmml">3</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.p4.1.m1.1b"><apply id="S3.SS2.p4.1.m1.1.1.cmml" xref="S3.SS2.p4.1.m1.1.1"><times id="S3.SS2.p4.1.m1.1.1.1.cmml" xref="S3.SS2.p4.1.m1.1.1.1"></times><ci id="S3.SS2.p4.1.m1.1.1.2.cmml" xref="S3.SS2.p4.1.m1.1.1.2">𝑄</ci><cn id="S3.SS2.p4.1.m1.1.1.3.cmml" type="integer" xref="S3.SS2.p4.1.m1.1.1.3">3</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p4.1.m1.1c">Q3</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p4.1.m1.1d">italic_Q 3</annotation></semantics></math> also retrieves 500 results from the lexical index, but queries it with causal keywords extracted from the narrative description. The narrative text is first passed through a filter that removes any statements describing irrelevant documents. The filter uses a simple keyword-based regular expression (e.g., matching “not relevant”, “not considered”, or “irrelevant”) to identify those statements. Next, the filtered narrative is converted into a set of keywords using TopicRank <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#bib.bib9" title="">9</a>]</cite>. 
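The narrative filtering step just described can be sketched as follows, assuming a naive split on sentence-ending punctuation. The cue pattern IRRELEVANT_CUES is a hypothetical reconstruction from the three example phrases in the text (the exact expression is not given), and the toy narrative is invented. The TopicRank keyword extraction that follows the filter is not reproduced here.

```python
import re

# Hypothetical cue list built from the example phrases in the text; the exact
# regular expression used in the pipeline is not specified.
IRRELEVANT_CUES = re.compile(r"\b(?:not relevant|not considered|irrelevant)\b", re.IGNORECASE)


def filter_narrative(narrative: str) -> str:
    """Drop narrative sentences that describe irrelevant documents."""
    # Naive sentence split: terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", narrative.strip())
    kept = [s for s in sentences if s and not IRRELEVANT_CUES.search(s)]
    return " ".join(kept)


# Toy narrative (hypothetical), loosely shaped like a topic description.
narrative = (
    "Relevant documents describe events that led to the riots. "
    "Reports that only describe the riots are not relevant. "
    "Articles about unrelated protests are irrelevant."
)
filtered = filter_narrative(narrative)
```

Only the first sentence survives; the word boundary in the pattern keeps “Relevant” from matching “irrelevant”. The filtered text would then be handed to a keyphrase extractor such as TopicRank.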
Finally, the causal keywords from the narrative are used to query the lexical index.</p> </div> <div class="ltx_para" id="S3.SS2.p5"> <p class="ltx_p" id="S3.SS2.p5.6"><math alttext="Q1" class="ltx_Math" display="inline" id="S3.SS2.p5.1.m1.1"><semantics id="S3.SS2.p5.1.m1.1a"><mrow id="S3.SS2.p5.1.m1.1.1" xref="S3.SS2.p5.1.m1.1.1.cmml"><mi id="S3.SS2.p5.1.m1.1.1.2" xref="S3.SS2.p5.1.m1.1.1.2.cmml">Q</mi><mo id="S3.SS2.p5.1.m1.1.1.1" xref="S3.SS2.p5.1.m1.1.1.1.cmml"></mo><mn id="S3.SS2.p5.1.m1.1.1.3" xref="S3.SS2.p5.1.m1.1.1.3.cmml">1</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.p5.1.m1.1b"><apply id="S3.SS2.p5.1.m1.1.1.cmml" xref="S3.SS2.p5.1.m1.1.1"><times id="S3.SS2.p5.1.m1.1.1.1.cmml" xref="S3.SS2.p5.1.m1.1.1.1"></times><ci id="S3.SS2.p5.1.m1.1.1.2.cmml" xref="S3.SS2.p5.1.m1.1.1.2">𝑄</ci><cn id="S3.SS2.p5.1.m1.1.1.3.cmml" type="integer" xref="S3.SS2.p5.1.m1.1.1.3">1</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p5.1.m1.1c">Q1</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p5.1.m1.1d">italic_Q 1</annotation></semantics></math>, <math alttext="Q2" class="ltx_Math" display="inline" id="S3.SS2.p5.2.m2.1"><semantics id="S3.SS2.p5.2.m2.1a"><mrow id="S3.SS2.p5.2.m2.1.1" xref="S3.SS2.p5.2.m2.1.1.cmml"><mi id="S3.SS2.p5.2.m2.1.1.2" xref="S3.SS2.p5.2.m2.1.1.2.cmml">Q</mi><mo id="S3.SS2.p5.2.m2.1.1.1" xref="S3.SS2.p5.2.m2.1.1.1.cmml"></mo><mn id="S3.SS2.p5.2.m2.1.1.3" xref="S3.SS2.p5.2.m2.1.1.3.cmml">2</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.p5.2.m2.1b"><apply id="S3.SS2.p5.2.m2.1.1.cmml" xref="S3.SS2.p5.2.m2.1.1"><times id="S3.SS2.p5.2.m2.1.1.1.cmml" xref="S3.SS2.p5.2.m2.1.1.1"></times><ci id="S3.SS2.p5.2.m2.1.1.2.cmml" xref="S3.SS2.p5.2.m2.1.1.2">𝑄</ci><cn id="S3.SS2.p5.2.m2.1.1.3.cmml" type="integer" xref="S3.SS2.p5.2.m2.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p5.2.m2.1c">Q2</annotation><annotation 
encoding="application/x-llamapun" id="S3.SS2.p5.2.m2.1d">italic_Q 2</annotation></semantics></math>, and <math alttext="Q3" class="ltx_Math" display="inline" id="S3.SS2.p5.3.m3.1"><semantics id="S3.SS2.p5.3.m3.1a"><mrow id="S3.SS2.p5.3.m3.1.1" xref="S3.SS2.p5.3.m3.1.1.cmml"><mi id="S3.SS2.p5.3.m3.1.1.2" xref="S3.SS2.p5.3.m3.1.1.2.cmml">Q</mi><mo id="S3.SS2.p5.3.m3.1.1.1" xref="S3.SS2.p5.3.m3.1.1.1.cmml"></mo><mn id="S3.SS2.p5.3.m3.1.1.3" xref="S3.SS2.p5.3.m3.1.1.3.cmml">3</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.p5.3.m3.1b"><apply id="S3.SS2.p5.3.m3.1.1.cmml" xref="S3.SS2.p5.3.m3.1.1"><times id="S3.SS2.p5.3.m3.1.1.1.cmml" xref="S3.SS2.p5.3.m3.1.1.1"></times><ci id="S3.SS2.p5.3.m3.1.1.2.cmml" xref="S3.SS2.p5.3.m3.1.1.2">𝑄</ci><cn id="S3.SS2.p5.3.m3.1.1.3.cmml" type="integer" xref="S3.SS2.p5.3.m3.1.1.3">3</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p5.3.m3.1c">Q3</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p5.3.m3.1d">italic_Q 3</annotation></semantics></math> each produce a set of candidate documents (<math alttext="Q1^{\prime}" class="ltx_Math" display="inline" id="S3.SS2.p5.4.m4.1"><semantics id="S3.SS2.p5.4.m4.1a"><mrow id="S3.SS2.p5.4.m4.1.1" xref="S3.SS2.p5.4.m4.1.1.cmml"><mi id="S3.SS2.p5.4.m4.1.1.2" xref="S3.SS2.p5.4.m4.1.1.2.cmml">Q</mi><mo id="S3.SS2.p5.4.m4.1.1.1" xref="S3.SS2.p5.4.m4.1.1.1.cmml"></mo><msup id="S3.SS2.p5.4.m4.1.1.3" xref="S3.SS2.p5.4.m4.1.1.3.cmml"><mn id="S3.SS2.p5.4.m4.1.1.3.2" xref="S3.SS2.p5.4.m4.1.1.3.2.cmml">1</mn><mo id="S3.SS2.p5.4.m4.1.1.3.3" xref="S3.SS2.p5.4.m4.1.1.3.3.cmml">′</mo></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.p5.4.m4.1b"><apply id="S3.SS2.p5.4.m4.1.1.cmml" xref="S3.SS2.p5.4.m4.1.1"><times id="S3.SS2.p5.4.m4.1.1.1.cmml" xref="S3.SS2.p5.4.m4.1.1.1"></times><ci id="S3.SS2.p5.4.m4.1.1.2.cmml" xref="S3.SS2.p5.4.m4.1.1.2">𝑄</ci><apply id="S3.SS2.p5.4.m4.1.1.3.cmml" xref="S3.SS2.p5.4.m4.1.1.3"><csymbol 
cd="ambiguous" id="S3.SS2.p5.4.m4.1.1.3.1.cmml" xref="S3.SS2.p5.4.m4.1.1.3">superscript</csymbol><cn id="S3.SS2.p5.4.m4.1.1.3.2.cmml" type="integer" xref="S3.SS2.p5.4.m4.1.1.3.2">1</cn><ci id="S3.SS2.p5.4.m4.1.1.3.3.cmml" xref="S3.SS2.p5.4.m4.1.1.3.3">′</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p5.4.m4.1c">Q1^{\prime}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p5.4.m4.1d">italic_Q 1 start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT</annotation></semantics></math>, <math alttext="Q2^{\prime}" class="ltx_Math" display="inline" id="S3.SS2.p5.5.m5.1"><semantics id="S3.SS2.p5.5.m5.1a"><mrow id="S3.SS2.p5.5.m5.1.1" xref="S3.SS2.p5.5.m5.1.1.cmml"><mi id="S3.SS2.p5.5.m5.1.1.2" xref="S3.SS2.p5.5.m5.1.1.2.cmml">Q</mi><mo id="S3.SS2.p5.5.m5.1.1.1" xref="S3.SS2.p5.5.m5.1.1.1.cmml"></mo><msup id="S3.SS2.p5.5.m5.1.1.3" xref="S3.SS2.p5.5.m5.1.1.3.cmml"><mn id="S3.SS2.p5.5.m5.1.1.3.2" xref="S3.SS2.p5.5.m5.1.1.3.2.cmml">2</mn><mo id="S3.SS2.p5.5.m5.1.1.3.3" xref="S3.SS2.p5.5.m5.1.1.3.3.cmml">′</mo></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.p5.5.m5.1b"><apply id="S3.SS2.p5.5.m5.1.1.cmml" xref="S3.SS2.p5.5.m5.1.1"><times id="S3.SS2.p5.5.m5.1.1.1.cmml" xref="S3.SS2.p5.5.m5.1.1.1"></times><ci id="S3.SS2.p5.5.m5.1.1.2.cmml" xref="S3.SS2.p5.5.m5.1.1.2">𝑄</ci><apply id="S3.SS2.p5.5.m5.1.1.3.cmml" xref="S3.SS2.p5.5.m5.1.1.3"><csymbol cd="ambiguous" id="S3.SS2.p5.5.m5.1.1.3.1.cmml" xref="S3.SS2.p5.5.m5.1.1.3">superscript</csymbol><cn id="S3.SS2.p5.5.m5.1.1.3.2.cmml" type="integer" xref="S3.SS2.p5.5.m5.1.1.3.2">2</cn><ci id="S3.SS2.p5.5.m5.1.1.3.3.cmml" xref="S3.SS2.p5.5.m5.1.1.3.3">′</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p5.5.m5.1c">Q2^{\prime}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p5.5.m5.1d">italic_Q 2 start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT</annotation></semantics></math>, and <math alttext="Q3^{\prime}" 
class="ltx_Math" display="inline" id="S3.SS2.p5.6.m6.1"><semantics id="S3.SS2.p5.6.m6.1a"><mrow id="S3.SS2.p5.6.m6.1.1" xref="S3.SS2.p5.6.m6.1.1.cmml"><mi id="S3.SS2.p5.6.m6.1.1.2" xref="S3.SS2.p5.6.m6.1.1.2.cmml">Q</mi><mo id="S3.SS2.p5.6.m6.1.1.1" xref="S3.SS2.p5.6.m6.1.1.1.cmml"></mo><msup id="S3.SS2.p5.6.m6.1.1.3" xref="S3.SS2.p5.6.m6.1.1.3.cmml"><mn id="S3.SS2.p5.6.m6.1.1.3.2" xref="S3.SS2.p5.6.m6.1.1.3.2.cmml">3</mn><mo id="S3.SS2.p5.6.m6.1.1.3.3" xref="S3.SS2.p5.6.m6.1.1.3.3.cmml">′</mo></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.SS2.p5.6.m6.1b"><apply id="S3.SS2.p5.6.m6.1.1.cmml" xref="S3.SS2.p5.6.m6.1.1"><times id="S3.SS2.p5.6.m6.1.1.1.cmml" xref="S3.SS2.p5.6.m6.1.1.1"></times><ci id="S3.SS2.p5.6.m6.1.1.2.cmml" xref="S3.SS2.p5.6.m6.1.1.2">𝑄</ci><apply id="S3.SS2.p5.6.m6.1.1.3.cmml" xref="S3.SS2.p5.6.m6.1.1.3"><csymbol cd="ambiguous" id="S3.SS2.p5.6.m6.1.1.3.1.cmml" xref="S3.SS2.p5.6.m6.1.1.3">superscript</csymbol><cn id="S3.SS2.p5.6.m6.1.1.3.2.cmml" type="integer" xref="S3.SS2.p5.6.m6.1.1.3.2">3</cn><ci id="S3.SS2.p5.6.m6.1.1.3.3.cmml" xref="S3.SS2.p5.6.m6.1.1.3.3">′</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.SS2.p5.6.m6.1c">Q3^{\prime}</annotation><annotation encoding="application/x-llamapun" id="S3.SS2.p5.6.m6.1d">italic_Q 3 start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT</annotation></semantics></math> respectively). These results are passed to an aggregator module that deduplicates and re-ranks all candidate documents. If a document appears in multiple result sets, its scores are summed. 
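The aggregation rule described above (deduplicate, sum the scores of documents found by more than one query, re-rank, keep the top 500) can be sketched as follows. The function name aggregate, the (doc_id, score) list format, the tie-breaking rule, and the toy scores are all hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Ranking = List[Tuple[str, float]]


def aggregate(result_sets: List[Ranking], top_k: int = 500) -> Ranking:
    """Merge (doc_id, score) lists: sum scores per document, then re-rank."""
    totals: DefaultDict[str, float] = defaultdict(float)
    for results in result_sets:
        for doc_id, score in results:
            totals[doc_id] += score  # a document seen by several queries accumulates its scores
    # Highest summed score first; ties broken by doc_id so the output is deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Toy result sets (hypothetical values) standing in for Q1', Q2', and Q3'.
q1 = [("d1", 0.9), ("d2", 0.4)]
q2 = [("d2", 0.8), ("d3", 0.5)]
q3 = [("d3", 0.3)]
final = aggregate([q1, q2, q3])
```

Here d2 (0.4 + 0.8) overtakes d1 because two queries retrieved it. The text does not say whether scores were normalized before summation; since BM25 scores are unbounded while cosine similarities lie in [-1, 1], normalizing each result set first is one design option worth considering.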
The top 500 documents are returned as the final result set.</p> </div> <figure class="ltx_figure" id="S3.F3"><img alt="Refer to caption" class="ltx_graphics ltx_img_landscape" height="322" id="S3.F3.g1" src="extracted/6246156/semantic-pipeline.png" width="589"/> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_figure">Figure 3: </span>The semantic search pipeline aggregates results from three query strategies, <math alttext="Q1" class="ltx_Math" display="inline" id="S3.F3.11.m1.1"><semantics id="S3.F3.11.m1.1b"><mrow id="S3.F3.11.m1.1.1" xref="S3.F3.11.m1.1.1.cmml"><mi id="S3.F3.11.m1.1.1.2" xref="S3.F3.11.m1.1.1.2.cmml">Q</mi><mo id="S3.F3.11.m1.1.1.1" xref="S3.F3.11.m1.1.1.1.cmml"></mo><mn id="S3.F3.11.m1.1.1.3" xref="S3.F3.11.m1.1.1.3.cmml">1</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.F3.11.m1.1c"><apply id="S3.F3.11.m1.1.1.cmml" xref="S3.F3.11.m1.1.1"><times id="S3.F3.11.m1.1.1.1.cmml" xref="S3.F3.11.m1.1.1.1"></times><ci id="S3.F3.11.m1.1.1.2.cmml" xref="S3.F3.11.m1.1.1.2">𝑄</ci><cn id="S3.F3.11.m1.1.1.3.cmml" type="integer" xref="S3.F3.11.m1.1.1.3">1</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.F3.11.m1.1d">Q1</annotation><annotation encoding="application/x-llamapun" id="S3.F3.11.m1.1e">italic_Q 1</annotation></semantics></math>, <math alttext="Q2" class="ltx_Math" display="inline" id="S3.F3.12.m2.1"><semantics id="S3.F3.12.m2.1b"><mrow id="S3.F3.12.m2.1.1" xref="S3.F3.12.m2.1.1.cmml"><mi id="S3.F3.12.m2.1.1.2" xref="S3.F3.12.m2.1.1.2.cmml">Q</mi><mo id="S3.F3.12.m2.1.1.1" xref="S3.F3.12.m2.1.1.1.cmml"></mo><mn id="S3.F3.12.m2.1.1.3" xref="S3.F3.12.m2.1.1.3.cmml">2</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.F3.12.m2.1c"><apply id="S3.F3.12.m2.1.1.cmml" xref="S3.F3.12.m2.1.1"><times id="S3.F3.12.m2.1.1.1.cmml" xref="S3.F3.12.m2.1.1.1"></times><ci id="S3.F3.12.m2.1.1.2.cmml" xref="S3.F3.12.m2.1.1.2">𝑄</ci><cn id="S3.F3.12.m2.1.1.3.cmml" type="integer" 
xref="S3.F3.12.m2.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.F3.12.m2.1d">Q2</annotation><annotation encoding="application/x-llamapun" id="S3.F3.12.m2.1e">italic_Q 2</annotation></semantics></math>, and <math alttext="Q3" class="ltx_Math" display="inline" id="S3.F3.13.m3.1"><semantics id="S3.F3.13.m3.1b"><mrow id="S3.F3.13.m3.1.1" xref="S3.F3.13.m3.1.1.cmml"><mi id="S3.F3.13.m3.1.1.2" xref="S3.F3.13.m3.1.1.2.cmml">Q</mi><mo id="S3.F3.13.m3.1.1.1" xref="S3.F3.13.m3.1.1.1.cmml"></mo><mn id="S3.F3.13.m3.1.1.3" xref="S3.F3.13.m3.1.1.3.cmml">3</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.F3.13.m3.1c"><apply id="S3.F3.13.m3.1.1.cmml" xref="S3.F3.13.m3.1.1"><times id="S3.F3.13.m3.1.1.1.cmml" xref="S3.F3.13.m3.1.1.1"></times><ci id="S3.F3.13.m3.1.1.2.cmml" xref="S3.F3.13.m3.1.1.2">𝑄</ci><cn id="S3.F3.13.m3.1.1.3.cmml" type="integer" xref="S3.F3.13.m3.1.1.3">3</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.F3.13.m3.1d">Q3</annotation><annotation encoding="application/x-llamapun" id="S3.F3.13.m3.1e">italic_Q 3</annotation></semantics></math>. 
<math alttext="Q1" class="ltx_Math" display="inline" id="S3.F3.14.m4.1"><semantics id="S3.F3.14.m4.1b"><mrow id="S3.F3.14.m4.1.1" xref="S3.F3.14.m4.1.1.cmml"><mi id="S3.F3.14.m4.1.1.2" xref="S3.F3.14.m4.1.1.2.cmml">Q</mi><mo id="S3.F3.14.m4.1.1.1" xref="S3.F3.14.m4.1.1.1.cmml"></mo><mn id="S3.F3.14.m4.1.1.3" xref="S3.F3.14.m4.1.1.3.cmml">1</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.F3.14.m4.1c"><apply id="S3.F3.14.m4.1.1.cmml" xref="S3.F3.14.m4.1.1"><times id="S3.F3.14.m4.1.1.1.cmml" xref="S3.F3.14.m4.1.1.1"></times><ci id="S3.F3.14.m4.1.1.2.cmml" xref="S3.F3.14.m4.1.1.2">𝑄</ci><cn id="S3.F3.14.m4.1.1.3.cmml" type="integer" xref="S3.F3.14.m4.1.1.3">1</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.F3.14.m4.1d">Q1</annotation><annotation encoding="application/x-llamapun" id="S3.F3.14.m4.1e">italic_Q 1</annotation></semantics></math> embeds the query using the sentence embedding model and retrieves the most relevant results based on cosine similarity. 
<math alttext="Q2" class="ltx_Math" display="inline" id="S3.F3.15.m5.1"><semantics id="S3.F3.15.m5.1b"><mrow id="S3.F3.15.m5.1.1" xref="S3.F3.15.m5.1.1.cmml"><mi id="S3.F3.15.m5.1.1.2" xref="S3.F3.15.m5.1.1.2.cmml">Q</mi><mo id="S3.F3.15.m5.1.1.1" xref="S3.F3.15.m5.1.1.1.cmml"></mo><mn id="S3.F3.15.m5.1.1.3" xref="S3.F3.15.m5.1.1.3.cmml">2</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.F3.15.m5.1c"><apply id="S3.F3.15.m5.1.1.cmml" xref="S3.F3.15.m5.1.1"><times id="S3.F3.15.m5.1.1.1.cmml" xref="S3.F3.15.m5.1.1.1"></times><ci id="S3.F3.15.m5.1.1.2.cmml" xref="S3.F3.15.m5.1.1.2">𝑄</ci><cn id="S3.F3.15.m5.1.1.3.cmml" type="integer" xref="S3.F3.15.m5.1.1.3">2</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.F3.15.m5.1d">Q2</annotation><annotation encoding="application/x-llamapun" id="S3.F3.15.m5.1e">italic_Q 2</annotation></semantics></math> and <math alttext="Q3" class="ltx_Math" display="inline" id="S3.F3.16.m6.1"><semantics id="S3.F3.16.m6.1b"><mrow id="S3.F3.16.m6.1.1" xref="S3.F3.16.m6.1.1.cmml"><mi id="S3.F3.16.m6.1.1.2" xref="S3.F3.16.m6.1.1.2.cmml">Q</mi><mo id="S3.F3.16.m6.1.1.1" xref="S3.F3.16.m6.1.1.1.cmml"></mo><mn id="S3.F3.16.m6.1.1.3" xref="S3.F3.16.m6.1.1.3.cmml">3</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.F3.16.m6.1c"><apply id="S3.F3.16.m6.1.1.cmml" xref="S3.F3.16.m6.1.1"><times id="S3.F3.16.m6.1.1.1.cmml" xref="S3.F3.16.m6.1.1.1"></times><ci id="S3.F3.16.m6.1.1.2.cmml" xref="S3.F3.16.m6.1.1.2">𝑄</ci><cn id="S3.F3.16.m6.1.1.3.cmml" type="integer" xref="S3.F3.16.m6.1.1.3">3</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.F3.16.m6.1d">Q3</annotation><annotation encoding="application/x-llamapun" id="S3.F3.16.m6.1e">italic_Q 3</annotation></semantics></math> retrieve the most relevant documents from the lexical index. 
<math alttext="Q3" class="ltx_Math" display="inline" id="S3.F3.17.m7.1"><semantics id="S3.F3.17.m7.1b"><mrow id="S3.F3.17.m7.1.1" xref="S3.F3.17.m7.1.1.cmml"><mi id="S3.F3.17.m7.1.1.2" xref="S3.F3.17.m7.1.1.2.cmml">Q</mi><mo id="S3.F3.17.m7.1.1.1" xref="S3.F3.17.m7.1.1.1.cmml"></mo><mn id="S3.F3.17.m7.1.1.3" xref="S3.F3.17.m7.1.1.3.cmml">3</mn></mrow><annotation-xml encoding="MathML-Content" id="S3.F3.17.m7.1c"><apply id="S3.F3.17.m7.1.1.cmml" xref="S3.F3.17.m7.1.1"><times id="S3.F3.17.m7.1.1.1.cmml" xref="S3.F3.17.m7.1.1.1"></times><ci id="S3.F3.17.m7.1.1.2.cmml" xref="S3.F3.17.m7.1.1.2">𝑄</ci><cn id="S3.F3.17.m7.1.1.3.cmml" type="integer" xref="S3.F3.17.m7.1.1.3">3</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.F3.17.m7.1d">Q3</annotation><annotation encoding="application/x-llamapun" id="S3.F3.17.m7.1e">italic_Q 3</annotation></semantics></math> adds filtering and keyword extraction steps to transform the narrative description into causal search terms. Finally, results from all three queries (<math alttext="Q1^{\prime}" class="ltx_Math" display="inline" id="S3.F3.18.m8.1"><semantics id="S3.F3.18.m8.1b"><mrow id="S3.F3.18.m8.1.1" xref="S3.F3.18.m8.1.1.cmml"><mi id="S3.F3.18.m8.1.1.2" xref="S3.F3.18.m8.1.1.2.cmml">Q</mi><mo id="S3.F3.18.m8.1.1.1" xref="S3.F3.18.m8.1.1.1.cmml"></mo><msup id="S3.F3.18.m8.1.1.3" xref="S3.F3.18.m8.1.1.3.cmml"><mn id="S3.F3.18.m8.1.1.3.2" xref="S3.F3.18.m8.1.1.3.2.cmml">1</mn><mo id="S3.F3.18.m8.1.1.3.3" xref="S3.F3.18.m8.1.1.3.3.cmml">′</mo></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.F3.18.m8.1c"><apply id="S3.F3.18.m8.1.1.cmml" xref="S3.F3.18.m8.1.1"><times id="S3.F3.18.m8.1.1.1.cmml" xref="S3.F3.18.m8.1.1.1"></times><ci id="S3.F3.18.m8.1.1.2.cmml" xref="S3.F3.18.m8.1.1.2">𝑄</ci><apply id="S3.F3.18.m8.1.1.3.cmml" xref="S3.F3.18.m8.1.1.3"><csymbol cd="ambiguous" id="S3.F3.18.m8.1.1.3.1.cmml" xref="S3.F3.18.m8.1.1.3">superscript</csymbol><cn id="S3.F3.18.m8.1.1.3.2.cmml" type="integer" 
xref="S3.F3.18.m8.1.1.3.2">1</cn><ci id="S3.F3.18.m8.1.1.3.3.cmml" xref="S3.F3.18.m8.1.1.3.3">′</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.F3.18.m8.1d">Q1^{\prime}</annotation><annotation encoding="application/x-llamapun" id="S3.F3.18.m8.1e">italic_Q 1 start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT</annotation></semantics></math>, <math alttext="Q2^{\prime}" class="ltx_Math" display="inline" id="S3.F3.19.m9.1"><semantics id="S3.F3.19.m9.1b"><mrow id="S3.F3.19.m9.1.1" xref="S3.F3.19.m9.1.1.cmml"><mi id="S3.F3.19.m9.1.1.2" xref="S3.F3.19.m9.1.1.2.cmml">Q</mi><mo id="S3.F3.19.m9.1.1.1" xref="S3.F3.19.m9.1.1.1.cmml"></mo><msup id="S3.F3.19.m9.1.1.3" xref="S3.F3.19.m9.1.1.3.cmml"><mn id="S3.F3.19.m9.1.1.3.2" xref="S3.F3.19.m9.1.1.3.2.cmml">2</mn><mo id="S3.F3.19.m9.1.1.3.3" xref="S3.F3.19.m9.1.1.3.3.cmml">′</mo></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.F3.19.m9.1c"><apply id="S3.F3.19.m9.1.1.cmml" xref="S3.F3.19.m9.1.1"><times id="S3.F3.19.m9.1.1.1.cmml" xref="S3.F3.19.m9.1.1.1"></times><ci id="S3.F3.19.m9.1.1.2.cmml" xref="S3.F3.19.m9.1.1.2">𝑄</ci><apply id="S3.F3.19.m9.1.1.3.cmml" xref="S3.F3.19.m9.1.1.3"><csymbol cd="ambiguous" id="S3.F3.19.m9.1.1.3.1.cmml" xref="S3.F3.19.m9.1.1.3">superscript</csymbol><cn id="S3.F3.19.m9.1.1.3.2.cmml" type="integer" xref="S3.F3.19.m9.1.1.3.2">2</cn><ci id="S3.F3.19.m9.1.1.3.3.cmml" xref="S3.F3.19.m9.1.1.3.3">′</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.F3.19.m9.1d">Q2^{\prime}</annotation><annotation encoding="application/x-llamapun" id="S3.F3.19.m9.1e">italic_Q 2 start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT</annotation></semantics></math>, and <math alttext="Q3^{\prime}" class="ltx_Math" display="inline" id="S3.F3.20.m10.1"><semantics id="S3.F3.20.m10.1b"><mrow id="S3.F3.20.m10.1.1" xref="S3.F3.20.m10.1.1.cmml"><mi id="S3.F3.20.m10.1.1.2" xref="S3.F3.20.m10.1.1.2.cmml">Q</mi><mo id="S3.F3.20.m10.1.1.1" 
xref="S3.F3.20.m10.1.1.1.cmml"></mo><msup id="S3.F3.20.m10.1.1.3" xref="S3.F3.20.m10.1.1.3.cmml"><mn id="S3.F3.20.m10.1.1.3.2" xref="S3.F3.20.m10.1.1.3.2.cmml">3</mn><mo id="S3.F3.20.m10.1.1.3.3" xref="S3.F3.20.m10.1.1.3.3.cmml">′</mo></msup></mrow><annotation-xml encoding="MathML-Content" id="S3.F3.20.m10.1c"><apply id="S3.F3.20.m10.1.1.cmml" xref="S3.F3.20.m10.1.1"><times id="S3.F3.20.m10.1.1.1.cmml" xref="S3.F3.20.m10.1.1.1"></times><ci id="S3.F3.20.m10.1.1.2.cmml" xref="S3.F3.20.m10.1.1.2">𝑄</ci><apply id="S3.F3.20.m10.1.1.3.cmml" xref="S3.F3.20.m10.1.1.3"><csymbol cd="ambiguous" id="S3.F3.20.m10.1.1.3.1.cmml" xref="S3.F3.20.m10.1.1.3">superscript</csymbol><cn id="S3.F3.20.m10.1.1.3.2.cmml" type="integer" xref="S3.F3.20.m10.1.1.3.2">3</cn><ci id="S3.F3.20.m10.1.1.3.3.cmml" xref="S3.F3.20.m10.1.1.3.3">′</ci></apply></apply></annotation-xml><annotation encoding="application/x-tex" id="S3.F3.20.m10.1d">Q3^{\prime}</annotation><annotation encoding="application/x-llamapun" id="S3.F3.20.m10.1e">italic_Q 3 start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT</annotation></semantics></math>) are aggregated and re-ranked by the aggregator module, and the top 500 documents are returned.</figcaption> </figure> </section> <section class="ltx_subsection" id="S3.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">3.3 </span>Post-Query Causal Filtering</h3> <div class="ltx_para" id="S3.SS3.p1"> <p class="ltx_p" id="S3.SS3.p1.1">We additionally explored a post-query filtering step that extracted causal relations (if any) from each candidate document. A candidate document passed the filter if an extracted effect overlapped with the query text (the query event) and the corresponding cause overlapped with the narrative causal keywords. This approach did not yield promising results on the train topics and was not explored further on the test topics. 
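</p> </div>
<div class="ltx_para"><p class="ltx_p">The filtering rule above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the original implementation: the causal relation extractor is not specified, so its output is assumed to be a list of (cause, effect) text pairs, and "overlap" is approximated as sharing at least one non-stopword token, a crude stand-in for the spaCy preprocessing used elsewhere. The helper names <span class="ltx_text ltx_font_typewriter">tokens</span> and <span class="ltx_text ltx_font_typewriter">passes_causal_filter</span>, the stopword list, and the toy inputs (loosely modelled on the Tharoor example in Figure 4) are all hypothetical.</p></div>

```python
import re

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "for", "as", "was"}


def tokens(text):
    """Lowercased alphanumeric tokens minus stopwords (stand-in for spaCy lemmas)."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def passes_causal_filter(relations, query_title, causal_keywords):
    """Return True if some extracted (cause, effect) pair links the topic.

    relations: (cause_text, effect_text) pairs from a hypothetical
        causal relation extractor.
    query_title: the topic title, i.e. the query event (the effect side).
    causal_keywords: causal keywords derived from the topic narrative.
    """
    query_terms = tokens(query_title)
    keyword_terms = set()
    for keyword in causal_keywords:
        keyword_terms.update(tokens(keyword))
    for cause, effect in relations:
        effect_hits_query = not tokens(effect).isdisjoint(query_terms)
        cause_hits_keywords = not tokens(cause).isdisjoint(keyword_terms)
        if effect_hits_query and cause_hits_keywords:
            return True
    return False


keywords = ["IPL Kochi", "Sunanda Pushkar"]
print(passes_causal_filter(
    [("IPL Kochi stake controversy", "Tharoor resigned as minister")],
    "Shashi Tharoor resigns", keywords))  # True
print(passes_causal_filter(
    [("IPL Kochi stake controversy", "BCCI launched a probe")],
    "Shashi Tharoor resigns", keywords))  # False
```

<div class="ltx_para"><p class="ltx_p">The second call illustrates the weakness of this design: a document about an earlier event is rejected whenever its extracted effect does not mention the query event.</p></div>
<div class="ltx_para"><p class="ltx_p">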
Causally relevant documents often did not mention the caused (query) event, because they reported news that occurred before it. The filter would therefore reject reports of earlier events that led to the query event: at the time of reporting, the query event had not yet happened, so the article could not mention it.</p> </div> </section> </section> <section class="ltx_section" id="S4"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">4 </span>Experiments</h2> <div class="ltx_para" id="S4.p1"> <p class="ltx_p" id="S4.p1.1">In this section, we describe our implementation and experimental results.</p> </div> <section class="ltx_subsection" id="S4.SS1"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.1 </span>Data</h3> <div class="ltx_para" id="S4.SS1.p1"> <p class="ltx_p" id="S4.SS1.p1.1">The CAIR dataset contains 303,291 Telegraph India news articles from 2001 to 2010 <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#bib.bib3" title="">3</a>]</cite>. It provides 5 train topics and 20 test topics. Each topic (e.g., 
Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S1.F1" title="Figure 1 ‣ 1 Introduction ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_tag">1</span></a>) consists of a title, which describes the query event, and a narrative that describes the expected relevant and irrelevant documents.</p> </div> </section> <section class="ltx_subsection" id="S4.SS2"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.2 </span>Setup</h3> <div class="ltx_para" id="S4.SS2.p1"> <p class="ltx_p" id="S4.SS2.p1.2">The spaCy library <span class="ltx_note ltx_role_footnote" id="footnote1"><sup class="ltx_note_mark">1</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">1</sup><span class="ltx_tag ltx_tag_note">1</span>https://spacy.io</span></span></span> was used for preprocessing (i.e., tokenization and lemmatization). We used the Python rank_bm25 library <span class="ltx_note ltx_role_footnote" id="footnote2"><sup class="ltx_note_mark">2</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">2</sup><span class="ltx_tag ltx_tag_note">2</span>https://github.com/dorianbrown/rank_bm25</span></span></span> to implement a lexical index with Okapi BM25 scoring. 
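</p> </div>
<div class="ltx_para"><p class="ltx_p">To make the lexical scoring concrete, the sketch below computes Okapi BM25 scores over pre-tokenized documents with the rank_bm25 default parameters <span class="ltx_text ltx_font_typewriter">k1=1.5</span> and <span class="ltx_text ltx_font_typewriter">b=0.75</span>. It is a self-contained illustration, not rank_bm25 itself: the class name <span class="ltx_text ltx_font_typewriter">OkapiBM25</span> and the toy corpus are hypothetical, and it uses the non-negative IDF variant log(1 + (N - n + 0.5)/(n + 0.5)), whereas rank_bm25's BM25Okapi computes IDF without the +1 and replaces negative values with an epsilon-based floor, so absolute scores will differ. Only the <span class="ltx_text ltx_font_typewriter">get_scores</span> method name mirrors rank_bm25's interface.</p></div>

```python
import math
from collections import Counter


class OkapiBM25:
    """Minimal Okapi BM25 scorer over pre-tokenized documents (illustrative sketch)."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.term_freqs = [Counter(doc) for doc in corpus]
        self.doc_lens = [len(doc) for doc in corpus]
        self.avgdl = sum(self.doc_lens) / len(corpus)
        n_docs = len(corpus)
        # Document frequency: number of documents containing each term.
        df = Counter(term for doc in corpus for term in set(doc))
        # Robertson-Sparck Jones IDF with +1 inside the log, keeping it non-negative.
        self.idf = {t: math.log((n_docs - n + 0.5) / (n + 0.5) + 1) for t, n in df.items()}

    def score(self, query, idx):
        """BM25 score of document `idx` for a tokenized query."""
        tf, dl = self.term_freqs[idx], self.doc_lens[idx]
        # Document-length normalisation controlled by b.
        length_norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        total = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f:
                # Saturating term-frequency component controlled by k1.
                total += self.idf[term] * f * (self.k1 + 1) / (f + length_norm)
        return total

    def get_scores(self, query):
        """Scores for every document, in corpus order."""
        return [self.score(query, i) for i in range(len(self.term_freqs))]


corpus = [
    ["tharoor", "resign", "ipl", "kochi"],
    ["ipl", "cricket", "match"],
    ["monsoon", "rain", "flood"],
]
bm25 = OkapiBM25(corpus)
scores = bm25.get_scores(["tharoor", "resign"])
print(scores.index(max(scores)))  # 0
```

<div class="ltx_para"><p class="ltx_p">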
The default values were used for the <math alttext="k1" class="ltx_Math" display="inline" id="S4.SS2.p1.1.m1.1"><semantics id="S4.SS2.p1.1.m1.1a"><mrow id="S4.SS2.p1.1.m1.1.1" xref="S4.SS2.p1.1.m1.1.1.cmml"><mi id="S4.SS2.p1.1.m1.1.1.2" xref="S4.SS2.p1.1.m1.1.1.2.cmml">k</mi><mo id="S4.SS2.p1.1.m1.1.1.1" xref="S4.SS2.p1.1.m1.1.1.1.cmml"></mo><mn id="S4.SS2.p1.1.m1.1.1.3" xref="S4.SS2.p1.1.m1.1.1.3.cmml">1</mn></mrow><annotation-xml encoding="MathML-Content" id="S4.SS2.p1.1.m1.1b"><apply id="S4.SS2.p1.1.m1.1.1.cmml" xref="S4.SS2.p1.1.m1.1.1"><times id="S4.SS2.p1.1.m1.1.1.1.cmml" xref="S4.SS2.p1.1.m1.1.1.1"></times><ci id="S4.SS2.p1.1.m1.1.1.2.cmml" xref="S4.SS2.p1.1.m1.1.1.2">𝑘</ci><cn id="S4.SS2.p1.1.m1.1.1.3.cmml" type="integer" xref="S4.SS2.p1.1.m1.1.1.3">1</cn></apply></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p1.1.m1.1c">k1</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p1.1.m1.1d">italic_k 1</annotation></semantics></math> (1.5) and <math alttext="b" class="ltx_Math" display="inline" id="S4.SS2.p1.2.m2.1"><semantics id="S4.SS2.p1.2.m2.1a"><mi id="S4.SS2.p1.2.m2.1.1" xref="S4.SS2.p1.2.m2.1.1.cmml">b</mi><annotation-xml encoding="MathML-Content" id="S4.SS2.p1.2.m2.1b"><ci id="S4.SS2.p1.2.m2.1.1.cmml" xref="S4.SS2.p1.2.m2.1.1">𝑏</ci></annotation-xml><annotation encoding="application/x-tex" id="S4.SS2.p1.2.m2.1c">b</annotation><annotation encoding="application/x-llamapun" id="S4.SS2.p1.2.m2.1d">italic_b</annotation></semantics></math> (0.75) parameters.</p> </div> <div class="ltx_para" id="S4.SS2.p2"> <p class="ltx_p" id="S4.SS2.p2.1">For the semantic index, we use the pretrained <span class="ltx_text ltx_font_italic" id="S4.SS2.p2.1.1">msmarco-distilbert-base-v4</span> sentence embedding model from the SentenceTransformers library <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#bib.bib8" title="">8</a>]</cite>. 
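</p> </div>
<div class="ltx_para"><p class="ltx_p">Querying the semantic index amounts to embedding the query with the same model and ranking documents by embedding similarity. The sketch below assumes the query and document embeddings have already been computed (toy 3-dimensional vectors stand in for <span class="ltx_text ltx_font_italic">msmarco-distilbert-base-v4</span> outputs) and that cosine similarity is the scoring function; Annoy's angular metric produces the same ordering. Unlike the Annoy index, it performs exact brute-force search. The function names are hypothetical.</p></div>

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_search(query_vec, doc_vecs, top_k=500):
    """Indices of the top_k documents ranked by cosine similarity to the query.

    Exact brute-force search: a stand-in for an approximate index such as
    Annoy, which trades some recall for much faster lookups on large corpora.
    """
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:top_k]


# Toy "embeddings"; real sentence embeddings have hundreds of dimensions.
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
print(semantic_search([1.0, 0.0, 0.0], docs, top_k=2))  # [0, 1]
```

<div class="ltx_para"><p class="ltx_p">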
This model was pretrained on the MS MARCO passage ranking dataset <cite class="ltx_cite ltx_citemacro_cite">[<a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#bib.bib10" title="">10</a>]</cite>, whose inputs are asymmetric: queries are typically much shorter than the relevant passages. The MS MARCO dataset consists of about one million queries from the Bing search engine and 8.8 million passages from search results. In the passage ranking task, the model must find and rank the most relevant passages for a given query. No documents or qrels from the CAIR corpus were used to pretrain the sentence embedding model.</p> </div> <div class="ltx_para" id="S4.SS2.p3"> <p class="ltx_p" id="S4.SS2.p3.1">All documents in the CAIR corpus were embedded with the <span class="ltx_text ltx_font_italic" id="S4.SS2.p3.1.1">msmarco-distilbert-base-v4</span> sentence embedding model and stored in an index optimized for approximate nearest neighbor search. We used the Annoy Python library <span class="ltx_note ltx_role_footnote" id="footnote3"><sup class="ltx_note_mark">3</sup><span class="ltx_note_outer"><span class="ltx_note_content"><sup class="ltx_note_mark">3</sup><span class="ltx_tag ltx_tag_note">3</span>https://github.com/spotify/annoy</span></span></span> to store the document embeddings and built a search index with 1000 trees.</p> </div> </section> <section class="ltx_subsection" id="S4.SS3"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.3 </span>Baselines</h3> <div class="ltx_para" id="S4.SS3.p1"> <p class="ltx_p" id="S4.SS3.p1.1">We evaluated our approach against four lexical and semantic baselines. Each baseline returned its top 500 results, which were evaluated against the gold document relevance set using Mean Average Precision (MAP) and Precision at 5 (P@5). 
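</p> </div>
<div class="ltx_para"><p class="ltx_p">For reference, both evaluation metrics can be computed as in the minimal sketch below, which assumes the standard TREC-style definitions: average precision (AP) sums the precision at each rank that holds a relevant document and divides by the total number of relevant documents for the topic, so relevant documents missing from the top 500 count against a run; MAP averages AP over topics. The function names and toy document IDs are hypothetical.</p></div>

```python
def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Sum of precision at each rank holding a relevant document,
    divided by the total number of relevant documents for the topic."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs, qrels):
    """Average AP over every topic in the relevance judgments (qrels)."""
    return sum(average_precision(runs.get(topic, []), rel)
               for topic, rel in qrels.items()) / len(qrels)


run = ["d1", "d2", "d3", "d4", "d5"]
gold = {"d1", "d3", "d9"}  # d9 is relevant but never retrieved
print(precision_at_k(run, gold))               # 0.4
print(round(average_precision(run, gold), 4))  # 0.5556
```

<div class="ltx_para"><p class="ltx_p">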
The first baseline (Narrative Only Okapi BM25) queried the lexical index with the narrative text. The second baseline (Query Only Okapi BM25) queried the lexical index with the title. The third baseline (Query + Narrative Semantic) combined the title and narrative texts and retrieved the most similar documents from the semantic index. Finally, the fourth baseline (Query Only Semantic) queried the semantic index with the title text alone.</p> </div> </section> <section class="ltx_subsection" id="S4.SS4"> <h3 class="ltx_title ltx_title_subsection"> <span class="ltx_tag ltx_tag_subsection">4.4 </span>Results</h3> <div class="ltx_para" id="S4.SS4.p1"> <p class="ltx_p" id="S4.SS4.p1.1">Experimental results are reported in Table <a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S4.T1" title="Table 1 ‣ 4.4 Results ‣ 4 Experiments ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_tag">1</span></a>. In addition to our baselines, we include the results of the best submission from the NITS team in the CAIR 2021 shared task. The test set contained 20 topics and a gold relevance set identifying the causally relevant documents in the corpus. Our semantic search pipeline outperforms all the baselines and leads the shared task leaderboard. 
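</p> </div>
<div class="ltx_para"><p class="ltx_p">Because Table 1 reports MAP and P@5 on a 0 to 1 scale, a gain over the strongest baseline can be read either as an absolute difference or as a relative change. The short calculation below uses only the Table 1 values for the Semantic Pipeline and the Narrative Only Okapi BM25 baseline; the helper name <span class="ltx_text ltx_font_typewriter">gains</span> is illustrative.</p></div>

```python
# MAP and P@5 values copied from Table 1.
pipeline = {"MAP": 0.5761, "P@5": 0.7800}
narrative_bm25 = {"MAP": 0.3285, "P@5": 0.6399}


def gains(new, old):
    """Absolute and relative improvement of `new` over `old`."""
    absolute = new - old
    return absolute, absolute / old


for metric in pipeline:
    absolute, relative = gains(pipeline[metric], narrative_bm25[metric])
    print(f"{metric}: +{absolute:.4f} absolute, +{relative:.1%} relative")
# MAP: +0.2476 absolute, +75.4% relative
# P@5: +0.1401 absolute, +21.9% relative
```

<div class="ltx_para"><p class="ltx_p">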
Over the strongest baseline (Narrative Only Okapi BM25), the semantic search pipeline improved MAP by 0.2476 (from .3285 to .5761, roughly 25 percentage points) and P@5 by 0.1401 (from .6399 to .7800, roughly 14 percentage points).</p> </div> <figure class="ltx_table" id="S4.T1"> <figcaption class="ltx_caption"><span class="ltx_tag ltx_tag_table">Table 1: </span>Experimental results on the 20 test topics.</figcaption> <table class="ltx_tabular ltx_align_middle" id="S4.T1.1"> <tr class="ltx_tr" id="S4.T1.1.1"> <td class="ltx_td ltx_align_left ltx_border_tt" id="S4.T1.1.1.1"><span class="ltx_text ltx_font_bold" id="S4.T1.1.1.1.1">Method</span></td> <td class="ltx_td ltx_align_left ltx_border_tt" id="S4.T1.1.1.2"><span class="ltx_text ltx_font_bold" id="S4.T1.1.1.2.1">MAP</span></td> <td class="ltx_td ltx_nopad_r ltx_align_left ltx_border_tt" id="S4.T1.1.1.3"><span class="ltx_text ltx_font_bold" id="S4.T1.1.1.3.1">P@5</span></td> </tr> <tr class="ltx_tr" id="S4.T1.1.2"> <td class="ltx_td ltx_align_left ltx_border_t" id="S4.T1.1.2.1">Semantic Pipeline</td> <td class="ltx_td ltx_align_left ltx_border_t" id="S4.T1.1.2.2">.5761</td> <td class="ltx_td ltx_nopad_r ltx_align_left ltx_border_t" id="S4.T1.1.2.3">.7800</td> </tr> <tr class="ltx_tr" id="S4.T1.1.3"> <td class="ltx_td ltx_align_left" id="S4.T1.1.3.1">Narrative Only Okapi BM25 baseline</td> <td class="ltx_td ltx_align_left" id="S4.T1.1.3.2">.3285</td> <td class="ltx_td ltx_nopad_r ltx_align_left" id="S4.T1.1.3.3">.6399</td> </tr> <tr class="ltx_tr" id="S4.T1.1.4"> <td class="ltx_td ltx_align_left" id="S4.T1.1.4.1">Query Only Okapi BM25 baseline</td> <td class="ltx_td ltx_align_left" id="S4.T1.1.4.2">.2561</td> <td class="ltx_td ltx_nopad_r ltx_align_left" id="S4.T1.1.4.3">.4999</td> </tr> <tr class="ltx_tr" id="S4.T1.1.5"> <td class="ltx_td ltx_align_left" id="S4.T1.1.5.1">Query + Narrative Semantic baseline</td> <td class="ltx_td ltx_align_left" id="S4.T1.1.5.2">.2239</td> <td class="ltx_td ltx_nopad_r ltx_align_left" id="S4.T1.1.5.3">.5500</td> </tr> <tr class="ltx_tr" id="S4.T1.1.6"> 
<td class="ltx_td ltx_align_left" id="S4.T1.1.6.1">Query Only Semantic baseline</td> <td class="ltx_td ltx_align_left" id="S4.T1.1.6.2">.1611</td> <td class="ltx_td ltx_nopad_r ltx_align_left" id="S4.T1.1.6.3">.5000</td> </tr> <tr class="ltx_tr" id="S4.T1.1.7"> <td class="ltx_td ltx_align_left ltx_border_bb" id="S4.T1.1.7.1">NITS-Run</td> <td class="ltx_td ltx_align_left ltx_border_bb" id="S4.T1.1.7.2">.1063</td> <td class="ltx_td ltx_nopad_r ltx_align_left ltx_border_bb" id="S4.T1.1.7.3">.4800</td> </tr> </table> </figure> <div class="ltx_para" id="S4.SS4.p2"> <p class="ltx_p" id="S4.SS4.p2.1">Our semantic search pipeline uses the same lexical and semantic indexes as the baselines; its advantage comes from how it combines the lexical and semantic results. The aggregator module functions conceptually as an ensemble model: documents that appear in multiple query result sets are weighted higher. Because each query strategy uses the information in the topic differently, the final result set reflects all of them.</p> </div> <div class="ltx_para" id="S4.SS4.p3"> <p class="ltx_p" id="S4.SS4.p3.1">Among the baselines, Narrative Only Okapi BM25 was the strongest, as expected: the narrative text contains the most useful information about what caused the query event. However, querying a lexical index with the narrative is still prone to returning documents that are topically related but not causally relevant. Figure <a class="ltx_ref" href="https://arxiv.org/html/2503.01003v1#S4.F4" title="Figure 4 ‣ 4.4 Results ‣ 4 Experiments ‣ A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval"><span class="ltx_text ltx_ref_tag">4</span></a> provides a qualitative comparison between the Narrative Only Okapi BM25 baseline and the Semantic Search Pipeline. 
The baseline's result matches terms present in the narrative, but the article focuses on accusing Modi of misconduct in the context of the IPL Kochi scandal. In contrast, the Semantic Search Pipeline correctly identifies a document that describes why Shashi Tharoor resigned in connection with the scandal and his friend Sunanda Pushkar.</p> </div> <figure class="ltx_figure" id="S4.F4"><img alt="Refer to caption" class="ltx_graphics ltx_centering ltx_img_landscape" height="211" id="S4.F4.g1" src="extracted/6246156/analysis.png" width="550"/> <figcaption class="ltx_caption ltx_centering"><span class="ltx_tag ltx_tag_figure">Figure 4: </span>Example results returned by the Semantic Search Pipeline and the Narrative Only Okapi BM25 baseline. The baseline returns a topically relevant result based on keyword matches, but the article does not explain why Shashi Tharoor resigned.</figcaption> </figure> </section> </section> <section class="ltx_section" id="S5"> <h2 class="ltx_title ltx_title_section"> <span class="ltx_tag ltx_tag_section">5 </span>Conclusion</h2> <div class="ltx_para" id="S5.p1"> <p class="ltx_p" id="S5.p1.1">In this paper, we introduced a semantic search pipeline for the CAIR 2021 shared task. Our approach aggregates results from multiple query strategies across a lexical and a semantic index. The semantic search pipeline outperformed the lexical and simple semantic baselines and was the top method on the CAIR 2021 leaderboard. This approach should serve as a stepping stone toward better causal information retrieval. Future work could explore developing a better model of causality and retrieving results using only the query title. The narrative text provides strong clues about the causal terms that are likely to appear in causally relevant documents. 
Without it, a causal search system would need a better way to identify events and link them causally.</p> </div> <div class="ltx_acknowledgements"> <h6 class="ltx_title ltx_title_acknowledgements">Acknowledgements.</h6> This work was supported by Science Foundation Ireland under grant SFI/18/CRT/6223 (Centre for Research Training in Artificial Intelligence). </div> </section> <section class="ltx_bibliography" id="bib"> <h2 class="ltx_title ltx_title_bibliography">References</h2> <ul class="ltx_biblist"> <li class="ltx_bibitem" id="bib.bib1"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Datta et al. [2020a]</span> <span class="ltx_bibblock"> S. Datta, D. Ganguly, D. Roy, D. Greene, C. Jochim, F. Bonin, </span> <span class="ltx_bibblock">Overview of the causality-driven adhoc information retrieval (CAIR) task at FIRE-2020, </span> <span class="ltx_bibblock">in: Forum for Information Retrieval Evaluation, FIRE 2020, Association for Computing Machinery, New York, NY, USA, 2020a, pp. 14–17. URL: <a class="ltx_ref ltx_url ltx_font_typewriter" href="https://doi.org/10.1145/3441501.3441513" title="">https://doi.org/10.1145/3441501.3441513</a>. doi:<a class="ltx_ref" href="https://doi.org/10.1145/3441501.3441513" title="">10.1145/3441501.3441513</a>. </span> </li> <li class="ltx_bibitem" id="bib.bib2"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Datta et al. [2020b]</span> <span class="ltx_bibblock"> S. Datta, D. Greene, D. Ganguly, D. Roy, M. Mitra, </span> <span class="ltx_bibblock">Where’s the why? In search of chains of causes for query events, </span> <span class="ltx_bibblock">in: AICS, 2020b. </span> </li> <li class="ltx_bibitem" id="bib.bib3"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Datta et al. [2020c]</span> <span class="ltx_bibblock"> S. Datta, D. Ganguly, D. Roy, F. Bonin, C. Jochim, M. 
Mitra, </span> <span class="ltx_bibblock">Retrieving potential causes from a query event, </span> <span class="ltx_bibblock">in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020c, pp. 1689–1692. </span> </li> <li class="ltx_bibitem" id="bib.bib4"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Pang et al. [2020]</span> <span class="ltx_bibblock"> L. Pang, J. Xu, Q. Ai, Y. Lan, X. Cheng, J. Wen, </span> <span class="ltx_bibblock">SetRank: Learning a permutation-invariant ranking model for information retrieval, </span> <span class="ltx_bibblock">in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 499–508. </span> </li> <li class="ltx_bibitem" id="bib.bib5"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Gao et al. [2021]</span> <span class="ltx_bibblock"> L. Gao, Z. Dai, T. Chen, Z. Fan, B. Van Durme, J. Callan, </span> <span class="ltx_bibblock">Complement lexical retrieval model with semantic residual embeddings, </span> <span class="ltx_bibblock">in: European Conference on Information Retrieval, Springer, 2021, pp. 146–160. </span> </li> <li class="ltx_bibitem" id="bib.bib6"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Robertson et al. [1994]</span> <span class="ltx_bibblock"> S. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, M. Gatford, </span> <span class="ltx_bibblock">Okapi at TREC-3, </span> <span class="ltx_bibblock">in: TREC, 1994. </span> </li> <li class="ltx_bibitem" id="bib.bib7"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Cer et al. [2018]</span> <span class="ltx_bibblock"> D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. St. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, B. Strope, R. 
Kurzweil, </span> <span class="ltx_bibblock">Universal sentence encoder for English, </span> <span class="ltx_bibblock">in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 169–174. URL: <a class="ltx_ref ltx_url ltx_font_typewriter" href="https://aclanthology.org/D18-2029" title="">https://aclanthology.org/D18-2029</a>. doi:<a class="ltx_ref" href="https://doi.org/10.18653/v1/D18-2029" title="">10.18653/v1/D18-2029</a>. </span> </li> <li class="ltx_bibitem" id="bib.bib8"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Reimers and Gurevych [2019]</span> <span class="ltx_bibblock"> N. Reimers, I. Gurevych, </span> <span class="ltx_bibblock">Sentence-BERT: Sentence embeddings using Siamese BERT-networks, </span> <span class="ltx_bibblock">in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2019. URL: <a class="ltx_ref ltx_url ltx_font_typewriter" href="http://arxiv.org/abs/1908.10084" title="">http://arxiv.org/abs/1908.10084</a>. </span> </li> <li class="ltx_bibitem" id="bib.bib9"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Bougouin et al. [2013]</span> <span class="ltx_bibblock"> A. Bougouin, F. Boudin, B. Daille, </span> <span class="ltx_bibblock">TopicRank: Graph-based topic ranking for keyphrase extraction, </span> <span class="ltx_bibblock">in: Proceedings of the Sixth International Joint Conference on Natural Language Processing, Asian Federation of Natural Language Processing, Nagoya, Japan, 2013, pp. 543–551. URL: <a class="ltx_ref ltx_url ltx_font_typewriter" href="https://aclanthology.org/I13-1062" title="">https://aclanthology.org/I13-1062</a>. </span> </li> <li class="ltx_bibitem" id="bib.bib10"> <span class="ltx_tag ltx_role_refnum ltx_tag_bibitem">Nguyen et al. [2016]</span> <span class="ltx_bibblock"> T. Nguyen, M. 
Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, L. Deng, </span> <span class="ltx_bibblock">MS MARCO: A human generated machine reading comprehension dataset, </span> <span class="ltx_bibblock">CoRR abs/1611.09268 (2016). URL: <a class="ltx_ref ltx_url ltx_font_typewriter" href="http://arxiv.org/abs/1611.09268" title="">http://arxiv.org/abs/1611.09268</a>. <a class="ltx_ref ltx_href ltx_font_typewriter" href="http://arxiv.org/abs/1611.09268" title="">arXiv:1611.09268</a>. </span> </li> </ul> </section> </article> </div> <footer class="ltx_page_footer"> <div class="ltx_page_logo">Generated on Sun Mar 2 19:51:08 2025 by <a class="ltx_LaTeXML_logo" href="http://dlmf.nist.gov/LaTeXML/"><span style="letter-spacing:-0.2em; margin-right:0.1em;">L<span class="ltx_font_smallcaps" style="position:relative; bottom:2.2pt;">a</span>T<span class="ltx_font_smallcaps" style="font-size:120%;position:relative; bottom:-0.2ex;">e</span></span><span style="font-size:90%; position:relative; bottom:-0.2ex;">XML</span><img alt="Mascot Sammy" src=""/></a> </div></footer> </div> </body> </html>