CINXE.COM

Improving the Sensitivity of MinHash Through Hash-Value Analysis

<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8" /> <meta http-equiv="X-UA-Compatible" content="IE=edge" /> <meta name="google" content="notranslate"> <meta http-equiv="Content-Language" content="en"> <meta name="viewport" content="width=device-width, initial-scale=1" /> <meta name="csrf-token" content="Z71zJF4RwHjFhJUwbbN3jA14QHkD9Vh72p72V1Dr" /> <meta name="DC.Publisher" content="Schloss Dagstuhl – Leibniz-Zentrum für Informatik" /> <meta name="DC.Title" xml:lang="en" content="Improving the Sensitivity of MinHash Through Hash-Value Analysis" /> <meta name="DC.Creator.PersonalName" content="Kucherov, Gregory" /> <meta name="DC.Creator.PersonalName" content="Skiena, Steven" /> <meta name="DC.Subject" content="MinHash sketching" /> <meta name="DC.Subject" content="sequence similarity" /> <meta name="DC.Subject" content="hashing" /> <meta name="DC.Date.created" scheme="ISO8601" content="2023-06-21" /> <meta name="DC.Date.issued" scheme="ISO8601" content="2023-06-21" /> <meta name="DC.Date.modified" scheme="ISO8601" content="2023-06-21" /> <meta name="DC.Type" content="InProceedings" /> <meta name="DC.Format" scheme="IMT" content="application/pdf" /> <meta name="DC.Identifier" scheme="urn:nbn:de" content="urn:nbn:de:0030-drops-179740" /> <meta name="DC.Identifier" scheme="DOI" content="10.4230/LIPIcs.CPM.2023.20" /> <meta name="DC.Identifier" scheme="URL" content="https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.20" /> <meta name="DC.Source" content="34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)" /> <meta name="DC.Source.URI" content="https://drops.dagstuhl.de/entities/volume/LIPIcs-volume-259" /> <meta name="DC.Language" scheme="ISO639-1" content="en" /> <meta name="DC.Description" xml:lang="en" content="MinHash sketching is an important algorithm for efficient document retrieval and bioinformatics. We show that the value of the matching MinHash codes convey additional information about the Jaccard similarity of S and T over and above the fact that the MinHash codes agree. This observation holds the potential to increase the sensitivity of minhash-based retrieval systems. We analyze the expected Jaccard similarity of two sets as a function of observing a matching MinHash value a under a reasonable prior distribution on intersection set sizes, and present a practical approach to using MinHash values to improve the sensitivity of traditional Jaccard similarity estimation, based on the Kolmogorov-Smirnov statistical test for sample distributions. Experiments over a wide range of hash function counts and set similarities show a small but consistent improvement over chance at predicting over/under-estimation, yielding an average accuracy of 61% over the range of experiments." /> <meta name="DC.Rights" scheme="DCTERMS.URI" content="https://creativecommons.org/licenses/by/4.0/legalcode" /> <meta name="citation_conference_title" content="34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)" /> <meta name="citation_doi" content="10.4230/LIPIcs.CPM.2023.20" /> <meta name="citation_firstpage" content="20:1" /> <meta name="citation_lastpage" content="20:12" /> <meta name="citation_title" content="Improving the Sensitivity of MinHash Through Hash-Value Analysis" /> <meta name="citation_language" content="en" /> <meta name="citation_keyword" content="MinHash sketching" /> <meta name="citation_keyword" content="sequence similarity" /> <meta name="citation_keyword" content="hashing" /> <meta name="citation_author" content="Kucherov, Gregory" /> <meta name="citation_author_institution" content="LIGM, CNRS/Université Gustave Eiffel, Marne-la-Vallée, France" /> <meta name="citation_author" content="Skiena, Steven" /> <meta name="citation_author_institution" content="Dept. of Computer Science, Stony Brook University, Stony Brook, NY, USA" /> <meta name="citation_date" content="2023" /> <meta name="citation_keyword" xml:lang="en" content="MinHash sketching" /> <meta name="citation_keyword" xml:lang="en" content="sequence similarity" /> <meta name="citation_keyword" xml:lang="en" content="hashing" /> <meta name="citation_pdf_url" content="https://drops.dagstuhl.de/storage/00lipics/lipics-vol259-cpm2023/LIPIcs.CPM.2023.20/LIPIcs.CPM.2023.20.pdf" /> <meta name="citation_reference" content="Daniel N. Baker and Ben Langmead. Dashing: fast and accurate genomic distances with HyperLogLog. Genome Biology, 20:265, 2019. URL: https://doi.org/10.1186/s13059-019-1875-0." /> <meta name="citation_reference" content="Andrei Broder. On the resemblance and containment of documents. In Proceedings of the Compression and Complexity of Sequences 1997, SEQUENCES &#039;97, pages 21-, Washington, DC, USA, 1997. IEEE Computer Society. URL: http://dl.acm.org/citation.cfm?id=829502.830043." /> <meta name="citation_reference" content="Andrei Z. Broder. Identifying and filtering near-duplicate documents. In Combinatorial Pattern Matching, 11th Annual Symposium, CPM 2000, Montreal, Canada, June 21-23, 2000, Proceedings, pages 1-10, 2000. URL: https://doi.org/10.1007/3-540-45123-4_1." /> <meta name="citation_reference" content="Andrei Z Broder, Moses Charikar, Alan M Frieze, and Michael Mitzenmacher. Min-wise independent permutations. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 327-336, 1998." /> <meta name="citation_reference" content="C. Titus Brown and Luiz Irber. Sourmash: a library for minhash sketching of dna. Journal of Open Source Software, 1(5):27, 2016. URL: https://doi.org/10.21105/joss.00027." /> <meta name="citation_reference" content="Moses S Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pages 380-388, 2002." /> <meta name="citation_reference" content="Lianhua Chi, Bin Li, and Xingquan Zhu. Context-preserving hashing for fast text classification. In Proceedings of the 2014 SIAM International Conference on Data Mining, pages 100-108. SIAM, 2014." /> <meta name="citation_reference" content="Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Michael Mitzenmacher, Alessandro Panconesi, and Prabhakar Raghavan. On compressing social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 219-228, 2009." /> <meta name="citation_reference" content="Edith Cohen. Min-hash sketches: A brief survey, 2016. URL: http://www.cohenwang.com/edith/Surveys/minhash.pdf." /> <meta name="citation_reference" content="Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual symposium on Computational geometry, pages 253-262, 2004." /> <meta name="citation_reference" content="Otmar Ertl. Setsketch: Filling the gap between minhash and hyperloglog. Proc. VLDB Endow., 14(11):2244-2257, 2021. URL: https://doi.org/10.14778/3476249.3476276." /> <meta name="citation_reference" content="Dennis Fetterly, Mark Manasse, Marc Najork, and Janet Wiener. A large-scale study of the evolution of web pages. In Proceedings of the 12th international conference on World Wide Web, pages 669-678, 2003." /> <meta name="citation_reference" content="Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. Discrete Mathematics &amp; Theoretical Computer Science, DMTCS Proceedings vol. AH, 2007 Conference on Analysis of Algorithms (AofA 07), January 2007. URL: https://doi.org/10.46298/dmtcs.3545." /> <meta name="citation_reference" content="Aristides Gionis, Piotr Indyk, Rajeev Motwani, et al. Similarity search in high dimensions via hashing. In Vldb, volume 99, pages 518-529, 1999." /> <meta name="citation_reference" content="Monika Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 284-291, 2006." /> <meta name="citation_reference" content="Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 604-613, 1998." /> <meta name="citation_reference" content="Nitin Jindal and Bing Liu. Opinion spam and analysis. In Proceedings of the 2008 international conference on web search and data mining, pages 219-230, 2008." /> <meta name="citation_reference" content="Andrey Kolmogorov. Sulla determinazione empirica di una lgge di distribuzione. Inst. Ital. Attuari, Giorn., 4:83-91, 1933." /> <meta name="citation_reference" content="Ping Li, Anshumali Shrivastava, Joshua Moore, and Arnd König. Hashing algorithms for large-scale learning. Advances in neural information processing systems, 24, 2011." /> <meta name="citation_reference" content="Brian D. Ondov, Gabriel Starrett, Anna Sappington, Aleksandra Kostic, Sergey Koren, Christopher B. Buck, and Adam M. Phillippy. Mash screen: high-throughput sequence containment estimation for genome discovery. Genome Biology, 20:232, 2019. URL: https://doi.org/10.1186/s13059-019-1841-x." /> <meta name="citation_reference" content="Brian D Ondov, Todd J Treangen, Páll Melsted, Adam B Mallonee, Nicholas H Bergman, Sergey Koren, and Adam M Phillippy. Mash: fast genome and metagenome distance estimation using minhash. Genome Biology, 17(1):1-14, 2016." /> <meta name="citation_reference" content="Anshumali Shrivastava and Ping Li. In defense of MinHash over SimHash. In Artificial Intelligence and Statistics, pages 886-894. PMLR, 2014." /> <meta name="citation_reference" content="Nickolay Smirnov. Table for estimating the goodness of fit of empirical distributions. The annals of mathematical statistics, 19(2):279-281, 1948." /> <meta name="citation_reference" content="Jingdong Wang, Heng Tao Shen, Jingkuan Song, and Jianqiu Ji. Hashing for similarity search: A survey. CoRR, abs/1408.2927, 2014. URL: https://arxiv.org/abs/1408.2927." /> <meta name="citation_reference" content="Wei Wu, Bin Li, Ling Chen, Junbin Gao, and Chengqi Zhang. A review for weighted MinHash algorithms. IEEE Transactions on Knowledge and Data Engineering, 34(6):2553-2573, 2020." /> <meta name="citation_reference" content="Wei Wu, Bin Li, Ling Chen, Junbin Gao, and Chengqi Zhang. A review for weighted minhash algorithms. IEEE Trans. Knowl. Data Eng., 34(6):2553-2573, 2022. URL: https://doi.org/10.1109/TKDE.2020.3021067." /> <meta name="citation_reference" content="Yun William Yu and Griffin M. Weber. Hyperminhash: Minhash in loglog space. IEEE Trans. Knowl. Data Eng., 34(1):328-339, 2022. URL: https://doi.org/10.1109/TKDE.2020.2981311." /> <meta name="citation_reference" content="XiaoFei Zhao. Bindash, software for fast genome distance estimation on a typical personal laptop. Bioinformatics, 35(4):671-673, 2019." /> <link rel="canonical" href="https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.20" /> <script type="application/json+ld"> {"@context":"https:\/\/schema.org\/","@type":"WebPage","mainEntity":{"@type":"ScholarlyArticle","@id":"#article18150","name":"Improving the Sensitivity of MinHash Through Hash-Value Analysis","abstract":"MinHash sketching is an important algorithm for efficient document retrieval and bioinformatics. We show that the value of the matching MinHash codes convey additional information about the Jaccard similarity of S and T over and above the fact that the MinHash codes agree. This observation holds the potential to increase the sensitivity of minhash-based retrieval systems. We analyze the expected Jaccard similarity of two sets as a function of observing a matching MinHash value a under a reasonable prior distribution on intersection set sizes, and present a practical approach to using MinHash values to improve the sensitivity of traditional Jaccard similarity estimation, based on the Kolmogorov-Smirnov statistical test for sample distributions. Experiments over a wide range of hash function counts and set similarities show a small but consistent improvement over chance at predicting over\/under-estimation, yielding an average accuracy of 61% over the range of experiments.","keywords":["MinHash sketching","sequence similarity","hashing"],"author":[{"@type":"Person","name":"Kucherov, Gregory","givenName":"Gregory","familyName":"Kucherov","email":"mailto:Gregory.Kucherov@univ-eiffel.fr","sameAs":"https:\/\/orcid.org\/0000-0001-5899-5424","affiliation":"LIGM, CNRS\/Universit\u00e9 Gustave Eiffel, Marne-la-Vall\u00e9e, France"},{"@type":"Person","name":"Skiena, Steven","givenName":"Steven","familyName":"Skiena","email":"mailto:skiena@cs.stonybrook.edu","sameAs":"https:\/\/orcid.org\/0000-0003-0397-7514","affiliation":"Dept. of Computer Science, Stony Brook University, Stony Brook, NY, USA","funding":"Partially supported by a travel grant from Universit\u00e9 Paris-Est, and NSF grant IIS-1927227."}],"position":20,"pageStart":"20:1","pageEnd":"20:12","dateCreated":"2023-06-21","datePublished":"2023-06-21","isAccessibleForFree":true,"license":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/legalcode","copyrightHolder":[{"@type":"Person","name":"Kucherov, Gregory","givenName":"Gregory","familyName":"Kucherov","email":"mailto:Gregory.Kucherov@univ-eiffel.fr","sameAs":"https:\/\/orcid.org\/0000-0001-5899-5424","affiliation":"LIGM, CNRS\/Universit\u00e9 Gustave Eiffel, Marne-la-Vall\u00e9e, France"},{"@type":"Person","name":"Skiena, Steven","givenName":"Steven","familyName":"Skiena","email":"mailto:skiena@cs.stonybrook.edu","sameAs":"https:\/\/orcid.org\/0000-0003-0397-7514","affiliation":"Dept. of Computer Science, Stony Brook University, Stony Brook, NY, USA","funding":"Partially supported by a travel grant from Universit\u00e9 Paris-Est, and NSF grant IIS-1927227."}],"copyrightYear":"2023","accessMode":"textual","accessModeSufficient":"textual","creativeWorkStatus":"Published","inLanguage":"en-US","sameAs":"https:\/\/doi.org\/10.4230\/LIPIcs.CPM.2023.20","publisher":"Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik","citation":["https:\/\/doi.org\/10.1186\/s13059-019-1875-0","http:\/\/dl.acm.org\/citation.cfm?id=829502.830043","https:\/\/doi.org\/10.1007\/3-540-45123-4_1","https:\/\/doi.org\/10.21105\/joss.00027","http:\/\/www.cohenwang.com\/edith\/Surveys\/minhash.pdf","https:\/\/doi.org\/10.14778\/3476249.3476276","https:\/\/doi.org\/10.46298\/dmtcs.3545","https:\/\/doi.org\/10.1186\/s13059-019-1841-x","https:\/\/arxiv.org\/abs\/1408.2927","https:\/\/doi.org\/10.1109\/TKDE.2020.3021067","https:\/\/doi.org\/10.1109\/TKDE.2020.2981311"],"isPartOf":{"@type":"PublicationVolume","@id":"#volume18128","volumeNumber":259,"name":"34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)","dateCreated":"2023-06-21","datePublished":"2023-06-21","editor":[{"@type":"Person","name":"Bulteau, Laurent","givenName":"Laurent","familyName":"Bulteau","email":"mailto:laurent.bulteau@univ-eiffel.fr","sameAs":"https:\/\/orcid.org\/0000-0003-1645-9345","affiliation":"LIGM, CNRS, Universit\u00e9 Gustave Eiffel, Marne-la-vall\u00e9e, France"},{"@type":"Person","name":"Lipt\u00e1k, Zsuzsanna","givenName":"Zsuzsanna","familyName":"Lipt\u00e1k","email":"mailto:zsuzsanna.liptak@univr.it","sameAs":"https:\/\/orcid.org\/0000-0002-3233-0691","affiliation":"University of Verona, Italy"}],"isAccessibleForFree":true,"publisher":"Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik","hasPart":"#article18150","isPartOf":{"@type":"Periodical","@id":"#series116","name":"Leibniz International Proceedings in Informatics","issn":"1868-8969","isAccessibleForFree":true,"publisher":"Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik","hasPart":"#volume18128"}}}} </script> <title>Improving the Sensitivity of MinHash Through Hash-Value Analysis</title> <link rel="icon" href="https://drops.dagstuhl.de/favicon.ico" /> <link rel="stylesheet" href="https://drops.dagstuhl.de/css/app.css?drops-core-2024-10-22" /> </head> <body> <nav class="navbar main fixed-top navbar-expand-lg navbar-light bg-light"> <div class="container-fluid"> <a class="navbar-brand" href="https://www.dagstuhl.de"> <img class="lzi-logo" src="https://drops.dagstuhl.de/images/LZI-Logo.jpg" width="84" height=62" alt="Schloss Dagstuhl - LZI - Logo" /> </a> <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation"> <span class="navbar-toggler-icon"></span> </button> <div class="collapse navbar-collapse" id="navbarSupportedContent"> <ul class="navbar-nav me-auto mb-2 mb-lg-0"> <li class="nav-item" style="white-space: nowrap"> <a class="nav-link" href="https://drops.dagstuhl.de"> <i class="bi bi-house large-icon"></i> DROPS </a> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownSeries" role="button" data-bs-toggle="dropdown" aria-expanded="false"> <i class="bi bi-journals large-icon"></i> Series </a> <ul class="dropdown-menu" aria-labelledby="navbarDropdown"> <li><a class="dropdown-item" href="https://drops.dagstuhl.de/entities/series/LIPIcs"> LIPIcs – Leibniz International Proceedings in Informatics </a> </li> <li><a class="dropdown-item" href="https://drops.dagstuhl.de/entities/series/OASIcs"> OASIcs – Open Access Series in Informatics </a> </li> <li><a class="dropdown-item" href="https://drops.dagstuhl.de/entities/series/DFU"> Dagstuhl Follow-Ups </a> </li> <li><a class="dropdown-item" href="https://drops.dagstuhl.de/entities/series/DagAnnRep"> Schloss Dagstuhl Jahresbericht </a> </li> <li class="dropdown-divider"></li> <li> <a class="dropdown-item" href="/#discontinued-series">Discontinued Series</a> </li> </ul> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownJournals" role="button" data-bs-toggle="dropdown" aria-expanded="false"> <i class="bi bi-journal large-icon"></i> Journals </a> <ul class="dropdown-menu" aria-labelledby="navbarDropdown"> <li><a class="dropdown-item" href="https://drops.dagstuhl.de/entities/journal/DARTS"> DARTS – Dagstuhl Artifacts Series </a> </li> <li><a class="dropdown-item" href="https://drops.dagstuhl.de/entities/journal/DagRep"> Dagstuhl Reports </a> </li> <li><a class="dropdown-item" href="https://drops.dagstuhl.de/entities/journal/DagMan"> Dagstuhl Manifestos </a> </li> <li><a class="dropdown-item" href="https://drops.dagstuhl.de/entities/journal/LITES"> LITES – Leibniz Transactions on Embedded Systems </a> </li> <li><a class="dropdown-item" href="https://drops.dagstuhl.de/entities/journal/TGDK"> TGDK – Transactions on Graph Data and Knowledge </a> </li> </ul> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownConferences" role="button" data-bs-toggle="dropdown" aria-expanded="false"> <i class="bi bi-people large-icon"></i> Conferences </a> <ul class="dropdown-menu conference-dropdown" aria-labelledby="navbarDropdown"> <div class="row"> <div class="col-sm-4 nav-conference-col"> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/AFT"> AFT </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/AIB"> AIB </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/AofA"> AofA </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/APPROX"> APPROX </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/ATMOS"> ATMOS </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/CALCO"> CALCO </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/CCC"> CCC </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/CONCUR"> CONCUR </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/COSIT"> COSIT </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/CP"> CP </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/CPM"> CPM </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/CSL"> CSL </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/DISC"> DISC </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/DITAM"> DITAM </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/DNA"> DNA </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/ECOOP"> ECOOP </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/ECRTS"> ECRTS </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/ESA"> ESA </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/FAB"> FAB </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/FMBC"> FMBC </a> </li> </div> <div class="col-sm-4 nav-conference-col"> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/FORC"> FORC </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/FSCD"> FSCD </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/FSTTCS"> FSTTCS </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/FUN"> FUN </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/GD"> GD </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/GIScience"> GIScience </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/ICALP"> ICALP </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/ICDT"> ICDT </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/ICPEC"> ICPEC </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/IPEC"> IPEC </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/iPMVM"> iPMVM </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/ISAAC"> ISAAC </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/ITC"> ITC </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/ITCS"> ITCS </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/ITP"> ITP </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/LDK"> LDK </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/MFCS"> MFCS </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/Microservices"> Microservices </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/NG-RES"> NG-RES </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/OPODIS"> OPODIS </a> </li> </div> <div class="col-sm-4 nav-conference-col"> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/PARMA"> PARMA </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/RANDOM"> RANDOM </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/SAND"> SAND </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/SAT"> SAT </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/SEA"> SEA </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/SLATE"> SLATE </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/SNAPL"> SNAPL </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/SoCG"> SoCG </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/STACS"> STACS </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/SWAT"> SWAT </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/TIME"> TIME </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/Tokenomics"> Tokenomics </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/TQC"> TQC </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/TYPES"> TYPES </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/WABI"> WABI </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/conference/WCET"> WCET </a> </li> </div> </div> </ul> </li> <li class="nav-item"> <a class="nav-link" href="https://drops.dagstuhl.de/docs/about" id="navbarAbout"> <i class="bi bi-info-circle large-icon"></i><span class="nav-text-about-drops"> About DROPS</span> </a> </li> </ul> <form class="navbar-search d-flex" action="https://drops.dagstuhl.de/search" method="post"> <input type="hidden" name="_token" value="Z71zJF4RwHjFhJUwbbN3jA14QHkD9Vh72p72V1Dr" autocomplete="off"> <div class="input-group"> <input class="form-control" type="search" placeholder="Search" aria-label="Search" name="term" autocomplete="off" maxlength="600"> <button class="btn btn-outline-success" type="submit"> <i class="bi bi-search" style="color: #000"></i> </button> </div> </form> <ul class="navbar-nav nav-metadata"> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" id="navbarDropdownMetadata" role="button" data-bs-toggle="dropdown" aria-expanded="false"> <i class="bi bi-database-down large-icon"></i><span class="nav-text-metadata">Metadata Export</span> </a> <ul class="dropdown-menu dropdown-metadata" aria-labelledby="navbarDropdownMetadata"> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/metadata">Metadata Export</a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/oai?verb=Identify" target="_blank">OAI Interface</a> </li> </ul> </li> </ul> </div> </div> </nav> <div id="app" data-release="drops-core-2024-10-22" class="container "> <div id="_top-of-page"></div> <div class="fixed-search-button"><i class="bi bi-search"></i></div> <div class="document-details"> <section class="mt-3 mb-5 title"> <div class="entity-type document"> <div class="row"> <div class="col-lg-3 entity-type-name"> <i class="bi bi-file-earmark"></i> Document <img src="https://drops.dagstuhl.de/images/open-access-logo.png" width="80px" alt="Open Access Logo" style="transform: translate(1em, -0.2em);"/> </div> <div class="col-lg-9"> <input id="doi" type="text" style="position: fixed; top: -50vh" value="https://doi.org/10.4230/LIPIcs.CPM.2023.20"/> <div class="sharing-section"> <span class="permalink"> <span> <code><span class="hide-mobile">https://doi.org/</span>10.4230/LIPIcs.CPM.2023.20</code> <span class="btn btn-clipboard copy-to-clipboard" title="Copy to clipboard" data-selector="doi"> <i class="bi bi-clipboard" aria-hidden="true"></i> <i class="bi bi-check -hidden" aria-hidden="true"></i> </span> </span> </span> <span class="sharing-buttons"> <a class="btn sharing-btn facebook" href="https://facebook.com/sharer/sharer.php?u=https%3A%2F%2Fdoi.org%2F10.4230%2FLIPIcs.CPM.2023.20" target="_blank" rel="noopener" aria-hidden="true"> <i class="bi bi-facebook"></i> </a> <a class="btn sharing-btn reddit" href="https://www.reddit.com/submit?url=https%3A%2F%2Fdoi.org%2F10.4230%2FLIPIcs.CPM.2023.20&title=Improving+the+Sensitivity+of+MinHash+Through+Hash-Value+Analysis" target="_blank" rel="noopener" aria-hidden="true"> <i class="bi bi-reddit"></i> </a> <a class="btn sharing-btn twitter" href="https://twitter.com/intent/tweet/?text=Improving+the+Sensitivity+of+MinHash+Through+Hash-Value+Analysis&url=https%3A%2F%2Fdoi.org%2F10.4230%2FLIPIcs.CPM.2023.20" target="_blank" rel="noopener" aria-hidden="true"> <svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-twitter-x" viewBox="0 0 16 16"> <path d="M12.6.75h2.454l-5.36 6.142L16 15.25h-4.937l-3.867-5.07-4.425 5.07H.316l5.733-6.57L0 .75h5.063l3.495 4.633L12.601.75Zm-.86 13.028h1.36L4.323 2.145H2.865l8.875 11.633Z"/> </svg> </a> <a class="btn sharing-btn email" href="mailto:?subject=Improving+the+Sensitivity+of+MinHash+Through+Hash-Value+Analysis&amp;body=https%3A%2F%2Fdoi.org%2F10.4230%2FLIPIcs.CPM.2023.20" target="_self" rel="noopener" aria-hidden="true"> <svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 24 24" fill="currentColor"> <path d="M22 4H2C.9 4 0 4.9 0 6v12c0 1.1.9 2 2 2h20c1.1 0 2-.9 2-2V6c0-1.1-.9-2-2-2zM7.25 14.43l-3.5 2c-.08.05-.17.07-.25.07-.17 0-.34-.1-.43-.25-.14-.24-.06-.55.18-.68l3.5-2c.24-.14.55-.06.68.18.14.24.06.55-.18.68zm4.75.07c-.1 0-.2-.03-.27-.08l-8.5-5.5c-.23-.15-.3-.46-.15-.7.15-.22.46-.3.7-.14L12 13.4l8.23-5.32c.23-.15.54-.08.7.15.14.23.07.54-.16.7l-8.5 5.5c-.08.04-.17.07-.27.07zm8.93 1.75c-.1.16-.26.25-.43.25-.08 0-.17-.02-.25-.07l-3.5-2c-.24-.13-.32-.44-.18-.68s.44-.32.68-.18l3.5 2c.24.13.32.44.18.68z"/> </svg> </a> </span> <span class="sharing-buttons"> <span class="dropdown"> <a class="btn sharing-btn metadata dropdown-toggle" href="#" role="button" data-bs-toggle="dropdown" aria-expanded="false"><i class="bi bi-download"></i></a> <ul class="dropdown-menu dropdown-menu-end"> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.20/metadata/xml"> Export XML </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.20/metadata/acm-xml"> Export ACM-XML </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.20/metadata/doaj-xml"> Export DOAJ-XML </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.20/metadata/schema-org"> Export Schema.org </a> </li> <li> <a class="dropdown-item" href="https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.20/metadata/bibtex"> Export BibTeX </a> </li> </ul> </span> </span> </div> </div> </div> </div> <hr/> <h1>Improving the Sensitivity of MinHash Through Hash-Value Analysis</h1> <h3 class="mt-2 authors"> Authors <a href="#author-details" style="font-size: 0.8em; padding-right: 1em"><i class="bi bi-info-circle"></i></a> <span class="author"> <a href="https://drops.dagstuhl.de/search/documents?author=Kucherov, Gregory" class="name">Gregory Kucherov</a> <a href="https://orcid.org/0000-0001-5899-5424"><img class="orcid-logo" src="https://drops.dagstuhl.de/images/orcid.png" alt="ORCID-Logo"></a><!-- --><!---->, </span> <span class="author"> <a href="https://drops.dagstuhl.de/search/documents?author=Skiena, Steven" class="name">Steven Skiena</a> <a href="https://orcid.org/0000-0003-0397-7514"><img class="orcid-logo" src="https://drops.dagstuhl.de/images/orcid.png" alt="ORCID-Logo"></a><!-- --><!----> </span> </h3> <hr/> <ul class="mt-4"> <li> <span style="margin-right: 10px; ">Part of:</span> <span style="white-space: nowrap"> <i class="bi bi-book-half"></i> Volume: </span> <a href="https://drops.dagstuhl.de/entities/volume/LIPIcs-volume-259">34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023) </a> <br> <span style="margin-right: 10px; visibility: hidden;">Part of:</span> <span style="white-space: nowrap"> <i class="bi bi-journals"></i> Series: </span> <a href="https://drops.dagstuhl.de/entities/series/LIPIcs">Leibniz International Proceedings in Informatics (LIPIcs)</a> <br> <span style="margin-right: 10px; visibility: hidden;">Part of:</span> <span style="white-space: nowrap"> <i class="bi bi-people"></i> Conference: </span> <a href="https://drops.dagstuhl.de/entities/conference/CPM">Annual Symposium on Combinatorial Pattern Matching (CPM)</a> </li> <li class="mt-2">License: &nbsp; <a href="https://creativecommons.org/licenses/by/4.0/legalcode"> <img class="license-logo" src="https://drops.dagstuhl.de/images/cc-by.png" alt="CC-BY Logo"> Creative Commons Attribution 4.0 International license </a> </li> <li>Publication Date: 2023-06-21 </li> </ul> <hr/> </section> <a class="fixed-pdf-button" href="https://drops.dagstuhl.de/storage/00lipics/lipics-vol259-cpm2023/LIPIcs.CPM.2023.20/LIPIcs.CPM.2023.20.pdf"><i class="bi bi-file-earmark-pdf-fill" style="color:red"></i> PDF </a> <div class="row mt-2"> <div class="col-lg-4 mt-2"> <section class="thumbnail"> <a href="https://drops.dagstuhl.de/storage/00lipics/lipics-vol259-cpm2023/LIPIcs.CPM.2023.20/LIPIcs.CPM.2023.20.pdf"><img src="https://drops.dagstuhl.de/storage/00lipics/lipics-vol259-cpm2023/thumbnails/LIPIcs.CPM.2023.20/LIPIcs.CPM.2023.20.png" alt="Thumbnail PDF" /></a> </section> <section class="files mt-5"> <h2>File</h2> <div class="content"> <div> <a href="https://drops.dagstuhl.de/storage/00lipics/lipics-vol259-cpm2023/LIPIcs.CPM.2023.20/LIPIcs.CPM.2023.20.pdf"><i class="bi bi-file-earmark-pdf-fill" style="color:red"></i> LIPIcs.CPM.2023.20.pdf</a> <ul> <li>Filesize: 0.78 MB</li> <li>12 pages</li> </ul> </div> </div> </section> <section class="identifiers mt-3"> <h2>Document Identifiers</h2> <div class="content"> <ul> <li><b>DOI:</b> <a href="https://doi.org/10.4230/LIPIcs.CPM.2023.20">10.4230/LIPIcs.CPM.2023.20</a></li> <li><b>URN:</b> <a href="https://nbn-resolving.org/process-urn-form?identifier=urn:nbn:de:0030-drops-179740&verb=FULL">urn:nbn:de:0030-drops-179740</a></li> </ul> </div> </section> <section class="authors mt-3" id="author-details"> <h2>Author Details</h2> <div class="author person-details"> <div> <span class="name"><b>Gregory Kucherov</b></span> <a href="https://orcid.org/0000-0001-5899-5424"><img class="orcid-logo" src="https://drops.dagstuhl.de/images/orcid.png" alt="ORCID-Logo"></a> <a href="http://igm.univ-mlv.fr/~koutcher/"><i class="bi bi-house"></i></a> <a href="mailto:Gregory.Kucherov@univ-eiffel.fr"><i class="bi bi-envelope"></i></a> <a href="https://drops.dagstuhl.de/search/documents?author=Kucherov, Gregory"><small><i class="bi bi-search"></i></small></a> </div> <ul> <li class="affiliation">LIGM, CNRS/Université Gustave Eiffel, Marne-la-Vallée, France</li> </ul> </div> <div class="author person-details"> <div> <span class="name"><b>Steven Skiena</b></span> <a href="https://orcid.org/0000-0003-0397-7514"><img class="orcid-logo" src="https://drops.dagstuhl.de/images/orcid.png" alt="ORCID-Logo"></a> <a href="https://www.cs.stonybrook.edu/~skiena"><i class="bi bi-house"></i></a> <a href="mailto:skiena@cs.stonybrook.edu"><i class="bi bi-envelope"></i></a> <a href="https://drops.dagstuhl.de/search/documents?author=Skiena, Steven"><small><i class="bi bi-search"></i></small></a> </div> <ul> <li class="affiliation">Dept. of Computer Science, Stony Brook University, Stony Brook, NY, USA</li> </ul> </div> </section> <section class="related-version mt-3"> <h2>Funding</h2> <div class="content"> <ul> <li><b>Skiena, Steven</b>: Partially supported by a travel grant from Université Paris-Est, and NSF grant IIS-1927227.</li> </ul> </div> </section> </div> <div class="col-lg-8 mt-2"> <section class="cite-as mt-3"> <h2>Cite As<span class="btn btn-primary btn-xs" style="float: right" data-bs-toggle="modal" data-bs-target="#bibtex-modal">Get BibTex</span></h2> <div class="content"> Gregory Kucherov and Steven Skiena. Improving the Sensitivity of MinHash Through Hash-Value Analysis. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 20:1-20:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)<br> <a href="https://doi.org/10.4230/LIPIcs.CPM.2023.20" style="word-break: break-all">https://doi.org/10.4230/LIPIcs.CPM.2023.20</a> </div> <div class="modal fade" id="bibtex-modal" tabindex="-1" aria-labelledby="bibtexModalLabel" aria-hidden="true"> <div class="modal-dialog modal-dialog-centered"> <div class="modal-content"> <div class="modal-header"> <h5 class="modal-title" id="bibtexModalLabel">BibTex</h5> <button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button> </div> <div class="modal-body"> <pre class="bibtex">@InProceedings{kucherov_et_al:LIPIcs.CPM.2023.20, author = {Kucherov, Gregory and Skiena, Steven}, title = {{Improving the Sensitivity of MinHash Through Hash-Value Analysis}}, booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)}, pages = {20:1--20:12}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-276-1}, ISSN = {1868-8969}, year = {2023}, volume = {259}, editor = {Bulteau, Laurent and Lipt\&#039;{a}k, Zsuzsanna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\&quot;u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.20}, URN = {urn:nbn:de:0030-drops-179740}, doi = {10.4230/LIPIcs.CPM.2023.20}, annote = {Keywords: MinHash sketching, sequence similarity, hashing} }</pre> <div style="overflow: hidden"> <textarea style="position: absolute; top: -400vh" id="bibtex-input">@InProceedings{kucherov_et_al:LIPIcs.CPM.2023.20, author = {Kucherov, Gregory and Skiena, Steven}, title = {{Improving the Sensitivity of MinHash Through Hash-Value Analysis}}, booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)}, pages = {20:1--20:12}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-276-1}, ISSN = {1868-8969}, year = {2023}, volume = {259}, editor = {Bulteau, Laurent and Lipt\&#039;{a}k, Zsuzsanna}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\&quot;u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.20}, URN = {urn:nbn:de:0030-drops-179740}, doi = {10.4230/LIPIcs.CPM.2023.20}, annote = {Keywords: MinHash sketching, sequence similarity, hashing} }</textarea> </div> </div> <div class="modal-footer"> <button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Close</button> <button type="button" class="btn btn-primary copy-to-clipboard" data-selector="bibtex-input"><i class="bi bi-clipboard"></i> Copy BibTex To Clipboard<span class="bi bi-check -hidden" style="padding-left: 1em; font-weight: bold"></span></button> </div> </div> </div> </div> </section> <section class="abstract"> <h2>Abstract</h2> <div class="content"> MinHash sketching is an important algorithm for efficient document retrieval and bioinformatics. We show that the value of the matching MinHash codes convey additional information about the Jaccard similarity of S and T over and above the fact that the MinHash codes agree. This observation holds the potential to increase the sensitivity of minhash-based retrieval systems. We analyze the expected Jaccard similarity of two sets as a function of observing a matching MinHash value a under a reasonable prior distribution on intersection set sizes, and present a practical approach to using MinHash values to improve the sensitivity of traditional Jaccard similarity estimation, based on the Kolmogorov-Smirnov statistical test for sample distributions. Experiments over a wide range of hash function counts and set similarities show a small but consistent improvement over chance at predicting over/under-estimation, yielding an average accuracy of 61% over the range of experiments. </div> </section> <section class="subject-classifications mt-3"> <h2>Subject Classification</h2> <div class="acm-subject-classifications"> <h5>ACM Subject Classification</h5> <div class="content"> <ul> <li>Theory of computation → Bloom filters and hashing</li> </ul> </div> </div> <div class="keywords"> <h5>Keywords</h5> <div class="content"> <ul class="list-style-comma"> <li>MinHash sketching</li> <li>sequence similarity</li> <li>hashing</li> </ul> </div> </div> </section> <section class="metrics mt-3"> <h2>Metrics</h2> <div class="content"> <ul> <li> <a href="#" class="btn-statistics" data-entity="document/10.4230/LIPIcs.CPM.2023.20" data-title="Improving the Sensitivity of MinHash Through Hash-Value Analysis"> <i class="bi bi-graph-up"></i> Access Statistics </a> </li> <li> Total Accesses (updated on a weekly basis) <div class="stats-total"> <div class="stats total-downloads"> <div class="circle"> <div class="number" data-number="0">0</div> </div> <div class="label">PDF Downloads</div> </div> <div class="stats total-metadata-views"> <div class="circle"> <div class="number" data-number="0">0</div> </div> <div class="label">Metadata Views</div> </div> </div> <!-- must be externally available for the series/journal case --> </li> </ul> </div> </section> <div class="offcanvas offcanvas-bottom" tabindex="-1" id="statistics-offcanvas" aria-labelledby="statistics-offcanvas"> <div class="offcanvas-header"> <h5 class="offcanvas-title"></h5> <button type="button" class="btn-close text-reset" data-bs-dismiss="offcanvas" aria-label="Close"></button> </div> <div class="offcanvas-body small" data-context=""> <div style="margin-top: 20vh" class="centered-loader"><div class="loader"></div></div> <iframe class="-hidden"></iframe> </div> </div> <section class="references mt-3"> <h2>References</h2> <div class="content"> <ol> <li> <span> Daniel N. Baker and Ben Langmead. Dashing: fast and accurate genomic distances with HyperLogLog. Genome Biology, 20:265, 2019. URL: <a href="https://doi.org/10.1186/s13059-019-1875-0">https://doi.org/10.1186/s13059-019-1875-0</a>. </span> </li> <li> <span> Andrei Broder. On the resemblance and containment of documents. In Proceedings of the Compression and Complexity of Sequences 1997, SEQUENCES '97, pages 21-, Washington, DC, USA, 1997. IEEE Computer Society. URL: <a href="http://dl.acm.org/citation.cfm?id=829502.830043">http://dl.acm.org/citation.cfm?id=829502.830043</a>. </span> </li> <li> <span> Andrei Z. Broder. Identifying and filtering near-duplicate documents. In Combinatorial Pattern Matching, 11th Annual Symposium, CPM 2000, Montreal, Canada, June 21-23, 2000, Proceedings, pages 1-10, 2000. URL: <a href="https://doi.org/10.1007/3-540-45123-4_1">https://doi.org/10.1007/3-540-45123-4_1</a>. </span> </li> <li> <span> Andrei Z Broder, Moses Charikar, Alan M Frieze, and Michael Mitzenmacher. Min-wise independent permutations. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 327-336, 1998. <a href="https://scholar.google.com/scholar?hl=en&q=Andrei Z Broder, Moses Charikar, Alan M Frieze, and Michael Mitzenmacher. Min-wise independent permutations. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 327-336, 1998." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> C. Titus Brown and Luiz Irber. Sourmash: a library for minhash sketching of dna. Journal of Open Source Software, 1(5):27, 2016. URL: <a href="https://doi.org/10.21105/joss.00027">https://doi.org/10.21105/joss.00027</a>. </span> </li> <li> <span> Moses S Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pages 380-388, 2002. <a href="https://scholar.google.com/scholar?hl=en&q=Moses S Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pages 380-388, 2002." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> Lianhua Chi, Bin Li, and Xingquan Zhu. Context-preserving hashing for fast text classification. In Proceedings of the 2014 SIAM International Conference on Data Mining, pages 100-108. SIAM, 2014. <a href="https://scholar.google.com/scholar?hl=en&q=Lianhua Chi, Bin Li, and Xingquan Zhu. Context-preserving hashing for fast text classification. In Proceedings of the 2014 SIAM International Conference on Data Mining, pages 100-108. SIAM, 2014." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Michael Mitzenmacher, Alessandro Panconesi, and Prabhakar Raghavan. On compressing social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 219-228, 2009. <a href="https://scholar.google.com/scholar?hl=en&q=Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Michael Mitzenmacher, Alessandro Panconesi, and Prabhakar Raghavan. On compressing social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 219-228, 2009." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> Edith Cohen. Min-hash sketches: A brief survey, 2016. URL: <a href="http://www.cohenwang.com/edith/Surveys/minhash.pdf">http://www.cohenwang.com/edith/Surveys/minhash.pdf</a>. </span> </li> <li> <span> Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual symposium on Computational geometry, pages 253-262, 2004. <a href="https://scholar.google.com/scholar?hl=en&q=Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual symposium on Computational geometry, pages 253-262, 2004." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> Otmar Ertl. Setsketch: Filling the gap between minhash and hyperloglog. Proc. VLDB Endow., 14(11):2244-2257, 2021. URL: <a href="https://doi.org/10.14778/3476249.3476276">https://doi.org/10.14778/3476249.3476276</a>. </span> </li> <li> <span> Dennis Fetterly, Mark Manasse, Marc Najork, and Janet Wiener. A large-scale study of the evolution of web pages. In Proceedings of the 12th international conference on World Wide Web, pages 669-678, 2003. <a href="https://scholar.google.com/scholar?hl=en&q=Dennis Fetterly, Mark Manasse, Marc Najork, and Janet Wiener. A large-scale study of the evolution of web pages. In Proceedings of the 12th international conference on World Wide Web, pages 669-678, 2003." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. Discrete Mathematics & Theoretical Computer Science, DMTCS Proceedings vol. AH, 2007 Conference on Analysis of Algorithms (AofA 07), January 2007. URL: <a href="https://doi.org/10.46298/dmtcs.3545">https://doi.org/10.46298/dmtcs.3545</a>. </span> </li> <li> <span> Aristides Gionis, Piotr Indyk, Rajeev Motwani, et al. Similarity search in high dimensions via hashing. In Vldb, volume 99, pages 518-529, 1999. <a href="https://scholar.google.com/scholar?hl=en&q=Aristides Gionis, Piotr Indyk, Rajeev Motwani, et al. Similarity search in high dimensions via hashing. In Vldb, volume 99, pages 518-529, 1999." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> Monika Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 284-291, 2006. <a href="https://scholar.google.com/scholar?hl=en&q=Monika Henzinger. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 284-291, 2006." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 604-613, 1998. <a href="https://scholar.google.com/scholar?hl=en&q=Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 604-613, 1998." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> Nitin Jindal and Bing Liu. Opinion spam and analysis. In Proceedings of the 2008 international conference on web search and data mining, pages 219-230, 2008. <a href="https://scholar.google.com/scholar?hl=en&q=Nitin Jindal and Bing Liu. Opinion spam and analysis. In Proceedings of the 2008 international conference on web search and data mining, pages 219-230, 2008." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> Andrey Kolmogorov. Sulla determinazione empirica di una lgge di distribuzione. Inst. Ital. Attuari, Giorn., 4:83-91, 1933. <a href="https://scholar.google.com/scholar?hl=en&q=Andrey Kolmogorov. Sulla determinazione empirica di una lgge di distribuzione. Inst. Ital. Attuari, Giorn., 4:83-91, 1933." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> Ping Li, Anshumali Shrivastava, Joshua Moore, and Arnd König. Hashing algorithms for large-scale learning. Advances in neural information processing systems, 24, 2011. <a href="https://scholar.google.com/scholar?hl=en&q=Ping Li, Anshumali Shrivastava, Joshua Moore, and Arnd König. Hashing algorithms for large-scale learning. Advances in neural information processing systems, 24, 2011." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> Brian D. Ondov, Gabriel Starrett, Anna Sappington, Aleksandra Kostic, Sergey Koren, Christopher B. Buck, and Adam M. Phillippy. Mash screen: high-throughput sequence containment estimation for genome discovery. Genome Biology, 20:232, 2019. URL: <a href="https://doi.org/10.1186/s13059-019-1841-x">https://doi.org/10.1186/s13059-019-1841-x</a>. </span> </li> <li> <span> Brian D Ondov, Todd J Treangen, Páll Melsted, Adam B Mallonee, Nicholas H Bergman, Sergey Koren, and Adam M Phillippy. Mash: fast genome and metagenome distance estimation using minhash. Genome Biology, 17(1):1-14, 2016. <a href="https://scholar.google.com/scholar?hl=en&q=Brian D Ondov, Todd J Treangen, Páll Melsted, Adam B Mallonee, Nicholas H Bergman, Sergey Koren, and Adam M Phillippy. Mash: fast genome and metagenome distance estimation using minhash. Genome Biology, 17(1):1-14, 2016." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> Anshumali Shrivastava and Ping Li. In defense of MinHash over SimHash. In Artificial Intelligence and Statistics, pages 886-894. PMLR, 2014. <a href="https://scholar.google.com/scholar?hl=en&q=Anshumali Shrivastava and Ping Li. In defense of MinHash over SimHash. In Artificial Intelligence and Statistics, pages 886-894. PMLR, 2014." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> Nickolay Smirnov. Table for estimating the goodness of fit of empirical distributions. The annals of mathematical statistics, 19(2):279-281, 1948. <a href="https://scholar.google.com/scholar?hl=en&q=Nickolay Smirnov. Table for estimating the goodness of fit of empirical distributions. The annals of mathematical statistics, 19(2):279-281, 1948." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> Jingdong Wang, Heng Tao Shen, Jingkuan Song, and Jianqiu Ji. Hashing for similarity search: A survey. CoRR, abs/1408.2927, 2014. URL: <a href="https://arxiv.org/abs/1408.2927">https://arxiv.org/abs/1408.2927</a>. </span> </li> <li> <span> Wei Wu, Bin Li, Ling Chen, Junbin Gao, and Chengqi Zhang. A review for weighted MinHash algorithms. IEEE Transactions on Knowledge and Data Engineering, 34(6):2553-2573, 2020. <a href="https://scholar.google.com/scholar?hl=en&q=Wei Wu, Bin Li, Ling Chen, Junbin Gao, and Chengqi Zhang. A review for weighted MinHash algorithms. IEEE Transactions on Knowledge and Data Engineering, 34(6):2553-2573, 2020." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> <li> <span> Wei Wu, Bin Li, Ling Chen, Junbin Gao, and Chengqi Zhang. A review for weighted minhash algorithms. IEEE Trans. Knowl. Data Eng., 34(6):2553-2573, 2022. URL: <a href="https://doi.org/10.1109/TKDE.2020.3021067">https://doi.org/10.1109/TKDE.2020.3021067</a>. </span> </li> <li> <span> Yun William Yu and Griffin M. Weber. Hyperminhash: Minhash in loglog space. IEEE Trans. Knowl. Data Eng., 34(1):328-339, 2022. URL: <a href="https://doi.org/10.1109/TKDE.2020.2981311">https://doi.org/10.1109/TKDE.2020.2981311</a>. </span> </li> <li> <span> XiaoFei Zhao. Bindash, software for fast genome distance estimation on a typical personal laptop. Bioinformatics, 35(4):671-673, 2019. <a href="https://scholar.google.com/scholar?hl=en&q=XiaoFei Zhao. Bindash, software for fast genome distance estimation on a typical personal laptop. Bioinformatics, 35(4):671-673, 2019." target="_blank" title="Google Scholar"><img style="opacity: 0.5" src="https://drops.dagstuhl.de/images/google-scholar.dark.16x16.png" alt="Google Scholar"/></a> </span> </li> </ol> </div> </section> </div> </div> </div> </div> <span class="_feedback-button"> <i class="bi bi-question-circle"></i> <span class="text">Questions / Remarks / Feedback</span> </span> <div class="_feedback-form -hidden"> <span class="_feedback-close">X</span> <p>Feedback for Dagstuhl Publishing</p> <div> <textarea class="form-control" name="_feedback"></textarea> <input class="form-control" type="text" name="name" autocomplete="off" placeholder="Name (optional)" maxlength="60" /> <input class="form-control" type="email" name="email" autocomplete="off" placeholder="Email (optional)" maxlength="60" /> <br/> <button class="btn btn-sm btn-default">Send</button> </div> </div> <div class="alert alert-success _feedback-success -hidden"> <span class="glyphicon glyphicon-ok"></span> <h3>Thanks for your feedback!</h3> <div>Feedback submitted</div> <button class="btn btn-white _feedback-done">OK</button> </div> <div class="alert alert-danger _feedback-error -hidden"> <span class="glyphicon glyphicon-remove"></span> <h3>Could not send message</h3> <div>Please try again later or send an <a href="mailto:publishing@dagstuhl.de">E-mail</a></div> <button class="btn btn-white _feedback-done">OK</button> </div> <a class="scroll-up-button -hidden" href="#_top-of-page"> <i class="bi bi-arrow-up-circle"></i> </a> <footer class="page-footer dark"> <div class="container"> <h5>About DROPS</h5> <p>Schloss Dagstuhl - Leibniz Center for Informatics has been operating the Dagstuhl Research Online Publication Server (short: DROPS) since 2004. DROPS enables publication of the latest research findings in a fast, uncomplicated manner, in addition to providing unimpeded, open access to them. All the requisite metadata on each publication is administered in accordance with general guidelines pertaining to online publications (cf. Dublin Core). This enables the online publications to be authorized for citation and made accessible to a wide readership on a permanent basis. Access is free of charge for readers following the open access idea which fosters unimpeded access to scientific publications. </p> </div> <div class="container"> <div class="row"> <div class="col-lg-6"> <h5>Instructions for Authors</h5> <div class="row"> <div class="col-sm-6"> <b>Dagstuhl Series</b><br> <ul> <li><a href="https://submission.dagstuhl.de/series/details/lipics#author">LIPIcs</a></li> <li><a href="https://submission.dagstuhl.de/series/details/oasics#author">OASIcs</a></li> <li><a href="https://submission.dagstuhl.de/series/details/dfu#author">Dagstuhl Follow-Ups</a></li> </ul> </div> <div class="col-sm-6"> <b>Dagstuhl Journals</b><br> <ul> <li><a href="https://submission.dagstuhl.de/series/details/darts#author">DARTS – Dagstuhl Artifacts Series</a></li> <li><a href="https://submission.dagstuhl.de/series/details/dagrep#author">Dagstuhl Reports</a></li> <li><a href="https://submission.dagstuhl.de/series/details/dagman#author">Dagstuhl Manifestos</a></li> <li><a href="https://submission.dagstuhl.de/series/details/lites#author">LITES</a></li> <li><a href="https://submission.dagstuhl.de/series/details/tgdk#author">TGDK – Transactions on Graph Data and Knowledge</a></li> </ul> </div> </div> </div> <div class="col-lg-6"> <h5>Instructions for Editors</h5> <div class="row"> <div class="col-sm-6"> <b>Dagstuhl Series</b><br> <ul> <li><a href="https://submission.dagstuhl.de/series/details/lipics#editor">LIPIcs</a></li> <li><a href="https://submission.dagstuhl.de/series/details/oasics#editor">OASIcs</a></li> <li><a href="https://submission.dagstuhl.de/series/details/dfu#editor">Dagstuhl Follow-Ups</a></li> </ul> </div> <div class="col-sm-6"> <b>Dagstuhl Journals</b><br> <ul> <li><a href="https://submission.dagstuhl.de/series/details/darts#editor">DARTS – Dagstuhl Artifacts Series</a></li> <li><a href="https://submission.dagstuhl.de/series/details/dagrep#editor">Dagstuhl Reports</a></li> <li><a href="https://submission.dagstuhl.de/series/details/dagman#editor">Dagstuhl Manifestos</a></li> <li><a href="https://submission.dagstuhl.de/series/details/lites#editor">LITES</a></li> <li><a href="https://submission.dagstuhl.de/series/details/tgdk#editor">TGDK – Transactions on Graph Data and Knowledge</a></li> </ul> </div> </div> </div> </div> </div> </footer> <div class="copyright"> &copy; 2023-2024 <a href="https://www.dagstuhl.de">Schloss Dagstuhl – LZI GmbH</a> <a href="https://drops.dagstuhl.de/docs/imprint">Imprint</a> <a href="https://drops.dagstuhl.de/docs/privacy">Privacy</a> <a href="https://www.dagstuhl.de/en/publishing/team">Contact</a> </div> <script type="text/javascript" src="https://drops.dagstuhl.de/js/jquery-3.6.0.min.js"></script> <script type="text/javascript" src="https://drops.dagstuhl.de/js/app.js?drops-core-2024-10-22"></script> <script type="text/javascript" src="https://drops.dagstuhl.de/js/popper.min.js"></script> <script type="text/javascript" src="https://drops.dagstuhl.de/js/circle-progress.js"></script> <script type="text/javascript"> $(document).ready(function() { const view = { statsServer: 'https://drops-stats.dagstuhl.de', animationStarted: false, isScrolledIntoView: function (elem) { const rect = elem.getBoundingClientRect(); const elemTop = rect.top; //const elemBottom = rect.bottom; // const elemHeight = rect.height; return (elemTop >= 0) && (elemTop <= window.innerHeight); }, progressCircle: function ($el) { $el.find('.circle').circleProgress({ value: 1, size: 80, fill: { color: '#555' }, animation: { duration: 1200 } }); }, countUp: function($el) { $el.find('.number').each(function() { const $this = $(this); const number = parseInt($this.attr('data-number')); let suffix = ''; if (number > 90000) { $this.text(Math.ceil(number/1000)+' k'); suffix = ' k'; } else if (number > 90000000) { $this.text(Math.ceil(number/1000000)+' m'); suffix = ' m'; } else { $this.text(Math.ceil(number)) } $({ Counter: 0 }).animate({ Counter: $this.text().replace(suffix, '').replace(',', '') }, { duration: 1000, easing: 'swing', step: function() { $this.text(Math.ceil(this.Counter).toString().replace(/\B(?=(\d{3})+(?!\d))/g, ",") + suffix); } }); }); }, checkVisibility: function() { const $container = $('.stats-total'); if (view.isScrolledIntoView($container[0])) { if (!view.animationStarted) { view.animationStarted = true; view.progressCircle($container); setTimeout(function() { view.countUp($container) }, 200); } } }, initialize: function() { $.ajax({ type: 'get', url: view.statsServer + '/api/external/drops2/document/10.4230/LIPIcs.CPM.2023.20/-/stats', success: function (result) { $('.total-downloads .number').attr('data-number', result.total.downloads); $('.total-metadata-views .number').attr('data-number', result.total.clicks); window.addEventListener('scroll', view.checkVisibility); view.checkVisibility(); }, error: function () { $('.total-downloads .number').text(0); $('.total-metadata-views .number').text(0); window.addEventListener('scroll', view.checkVisibility); view.checkVisibility(); } }); } }; view.initialize(); }); </script> <script type="text/javascript"> $(document).ready(function() { const statistics = { statsServer: 'https://drops-stats.dagstuhl.de', showStatistics: function(e) { e.preventDefault(); const $offCanvas = $('#statistics-offcanvas'); const $iframe = $offCanvas.find('iframe'); const $loader = $offCanvas.find('.centered-loader'); $iframe.addClass('-hidden'); $loader.removeClass('-hidden'); $iframe[0].onload = function () { $iframe.removeClass('-hidden'); $loader.addClass('-hidden'); }.bind(this); const modeParameter = ''; const $target = $(e.currentTarget); const entityId = $target.attr('data-entity'); const title = $target.attr('data-title'); const embedUrl = statistics.statsServer + '/embed/external/drops2/' + entityId + modeParameter; let context = $offCanvas.find('.offcanvas-body').attr('data-context'); if (context === title) { context = ''; } if (context !== '') { context += '<br>'; } $offCanvas.find('.offcanvas-title').html(context + '<h2 style="font-weight: bold">' + title + '</h2>'); $offCanvas.offcanvas('show'); $offCanvas.find('iframe').attr('src', embedUrl); return false; }, initialize: function() { $('.btn-statistics').on('click', this.showStatistics); } }; statistics.initialize(); }); </script> <script type="text/javascript"> function _enableFeedback() { var $feedbackButton = $('._feedback-button, .fixed-beta-button'); var $feedbackClose = $('._feedback-close'); var $feedbackForm = $('._feedback-form'); var $feedbackSubmit = $('._feedback-form button'); var $textarea = $feedbackForm.find('textarea'); var $feedbackSuccess = $('._feedback-success'); var $feedbackError = $('._feedback-error'); var $feedbackDoneButton = $('._feedback-done'); $feedbackButton.addClass('_show'); $feedbackButton.on('click', function () { $feedbackButton.addClass('-hidden'); $feedbackForm.removeClass('-hidden'); $textarea.focus(); }); $feedbackClose.on('click', function () { $feedbackSuccess.addClass('-hidden'); $feedbackForm.addClass('-hidden'); $feedbackButton.removeClass('-hidden'); }); $feedbackDoneButton.on('click', function () { $feedbackError.addClass('-hidden'); $feedbackSuccess.addClass('-hidden'); $feedbackForm.addClass('-hidden'); $feedbackButton.removeClass('-hidden'); }); $feedbackSubmit.on('click', function () { var message = $textarea.val(); if (message === undefined) { message = ''; } if (message.trim() !== '') { $.ajax({ url: '/api/v1/feedback', type: 'post', data: { content: message, context: window.location.href, name: $('input[name="name"]').val(), email: $('input[name="email"]').val(), }, success: function (result) { if (result === 'success') { $textarea.val(''); $feedbackSuccess.removeClass('-hidden'); } else { $feedbackError.removeClass('-hidden'); } }, error: function () { $feedbackError.removeClass('-hidden'); } }); } }); } var _defer_counter = 0; function _defer(method) { if (window.jQuery) { method(); } else { if (_defer_counter < 20) { setTimeout(function () { _defer(method); console.log(_defer_counter); _defer_counter++; }, 500); } } } setTimeout(function() { _defer(_enableFeedback); }, 1000); </script> <script type="text/javascript"> $(document).ready(function() { const app = { maxScrollPos: window.innerWidth < 500 ? 100 // mobile : 400, // desktop $el : { navbarSearch: $('nav .navbar-search'), fixedSearchButton: $('.fixed-search-button'), copyToClipboard: $('.copy-to-clipboard') }, methods: { hideMenuOnScroll: function(scrollPos) { const $menu = $('nav.navbar.main'); const $stickySearch = $('.search.sticky'); const $banner = $('#_banner'); const $fixedSearchButton = $('.fixed-search-button'); if (scrollPos > app.maxScrollPos) { $menu.addClass('-hide'); $banner.addClass('-hide'); $stickySearch.addClass('-top'); $fixedSearchButton.addClass('-show'); } else { app.methods.showMenu(null, false); } }, showMenu: function(e, focus) { const $menu = $('nav.navbar.main'); const $stickySearch = $('.search.sticky'); const $banner = $('#_banner'); const $fixedSearchButton = $('.fixed-search-button'); $menu.removeClass('-hide'); $banner.removeClass('-hide'); $stickySearch.removeClass('-top'); $fixedSearchButton.removeClass('-show'); if (focus !== false) { $('.navbar-search').find('input[name="term"]').focus(); } }, showUpLinkOnScroll: function(scrollPos) { const $upLink = $('.scroll-up-button'); if (scrollPos > app.maxScrollPos) { $upLink.removeClass('-hidden'); } else { $upLink.addClass('-hidden'); } }, scrollHandler: function() { const scrollPos = $(document).scrollTop(); app.methods.hideMenuOnScroll(scrollPos); app.methods.showUpLinkOnScroll(scrollPos); }, copyToClipboard: function(e) { e.preventDefault(); const $current = $(e.currentTarget); // console.log($current.attr('data-selector')); const element = $('#'+$current.attr('data-selector'))[0]; console.log(element); element.select(); document.execCommand("copy"); const $success = $current.find('.bi-check'); $success.removeClass('-hidden'); setTimeout(function() { $success.addClass('-hidden'); }, 1000); }, initTooltips: function() { const tooltipTriggerList = [].slice.call(document.querySelectorAll('[data-bs-toggle="tooltip"]')) const tooltipList = tooltipTriggerList.map(function (tooltipTriggerEl) { return new bootstrap.Tooltip(tooltipTriggerEl) }); }, expandSearch: function() { app.$el.navbarSearch.addClass('expanded'); }, collapseSearch: function(e) { if (!$(e.currentTarget).is('button.btn-outline-success')) { app.$el.navbarSearch.removeClass('expanded'); } }, initDeepLinksTabs: function() { const innerAnchors = [ 'resource-articles', 'cfp-si-resources', 'cfp-si-autonomous-systems-and-knowledge-graphs' ]; let anchors = []; $('a[role="tab"]').each(function() { anchors.push($(this).attr('aria-controls')); }); innerAnchors.forEach(function(anchor) { if ($('#'+anchor).length > 0) { anchors.push(anchor); } }); if (anchors.length === 0) { return; } let selectedAnchor = window.location.hash.substring(1); if (anchors.indexOf(selectedAnchor) === -1) { $('#publications-tab').tab('show'); window.scrollTo(0,0); return; } if (selectedAnchor !== 'publications') { const innerAnchorIndex = innerAnchors.indexOf(selectedAnchor); // anchor sits inside tab -> open the tab first, then scroll to anchor if (innerAnchorIndex > -1) { const $tab = $('#'+selectedAnchor).closest('.tab-pane'); console.log($tab); try { $('#'+$tab.attr('id')+'-tab').tab('show'); } catch (e) { } setTimeout(function () { $('#' + selectedAnchor)[0].scrollIntoView(); }, 800); } else { try { $('#' + selectedAnchor + '-tab').tab('show'); } catch (e) { } window.scrollTo(0, 0); } } $('[data-tab-link]').on('click', function(e) { const $target = $(e.currentTarget); let href = $target.attr('href'); let scrollLink = null; if (href === '#cfp-si-list') { scrollLink = href; href = '#cfp'; } try { console.log(href); $(href+'-tab').tab('show'); } catch(e) { } console.log(scrollLink); if (scrollLink !== null) { setTimeout(function() { $(scrollLink)[0].scrollIntoView(); }, 200); } else { setTimeout(function() { window.scrollTo(0,0); }, 200); } }); } }, initialize: function() { $(window).on('scroll', app.methods.scrollHandler); $(window).on('scroll-to-top', function() { window.scrollTo(0,0) }); $(document).trigger('scroll'); // set correct status for scroll-related parts (navbar/back-to-top) on page reload app.$el.copyToClipboard.on('click', app.methods.copyToClipboard); app.$el.fixedSearchButton.on('click', app.methods.showMenu); app.$el.navbarSearch.find('input').on('click', app.methods.expandSearch) app.$el.navbarSearch.find('input').on('blur', app.methods.collapseSearch); app.methods.initTooltips(); app.methods.initDeepLinksTabs(); } }; app.initialize(); }); </script> </body> </html>

Pages: 1 2 3 4 5 6 7 8 9 10