<!DOCTYPE html> <html lang="en-US"> <head> <meta charset="UTF-8"> <link type="application/atom+xml" rel="alternate" href="https://europepmc.github.io/techblog/feed.xml" title="Europe PMC Tech Blog" /> <!-- Begin Jekyll SEO tag v2.7.1 --> <title>Algorithm | Europe PMC Tech Blog</title> <meta name="generator" content="Jekyll v3.9.0" /> <meta property="og:title" content="Algorithm" /> <meta property="og:locale" content="en_US" /> <meta name="description" content="Techy stuff from Europe PMC" /> <meta property="og:description" content="Techy stuff from Europe PMC" /> <link rel="canonical" href="https://europepmc.github.io/techblog/categories/algorithm/" /> <meta property="og:url" content="https://europepmc.github.io/techblog/categories/algorithm/" /> <meta property="og:site_name" content="Europe PMC Tech Blog" /> <meta property="og:type" content="article" /> <meta property="article:published_time" content="2022-03-01T15:22:13+00:00" /> <meta name="twitter:card" content="summary" /> <meta property="twitter:title" content="Algorithm" /> <script type="application/ld+json"> {"description":"Techy stuff from Europe PMC","headline":"Algorithm","url":"https://europepmc.github.io/techblog/categories/algorithm/","dateModified":"2022-03-01T15:22:13+00:00","@type":"BlogPosting","datePublished":"2022-03-01T15:22:13+00:00","mainEntityOfPage":{"@type":"WebPage","@id":"https://europepmc.github.io/techblog/categories/algorithm/"},"@context":"https://schema.org"}</script> <!-- End Jekyll SEO tag --> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="theme-color" content="#157878"> <link href='https://fonts.googleapis.com/css?family=Open+Sans:400,700' rel='stylesheet' type='text/css'> <link rel="stylesheet" href="/techblog/assets/css/style.css?v=99af2fde682c7a20c7f4457b97c1b3da781ebef0"> <link rel="shortcut icon" type="image/x-icon" href="/techblog/favicon.ico"> <!-- Twitter cards --> <meta name="twitter:site" content="@EuropePMC_news"> <meta 
name="twitter:creator" content="@"> <meta name="twitter:title" content="Algorithm"> <meta name="twitter:description" content="Techy stuff from Europe PMC"> <meta name="twitter:card" content="summary"> <meta name="twitter:image" content="https://europepmc.github.io/techblog/images/epmc-techblog_larger.png"> <!-- end of Twitter cards --> </head> <body> <section class="page-header"> <h1 class="project-name"><a href="/techblog/"><img src="/techblog/images/epmc-techblog.png" style="border:0">Europe PMC Tech Blog</a></h1> <!--<h2 class="project-tagline">Techy stuff from Europe PMC</h2> <a href="https://github.com/EuropePMC/techblog" class="btn">View on GitHub</a> --> <a href="/techblog/" class="btn">Home</a> <a href="/techblog/about" class="btn">About</a> <a href="/techblog/categories" class="btn">Post Categories</a> <a href="https://europepmc.org" class="btn epmc" target="_blank">Europe PMC</a> </section> <section class="main-content"> <h1>Posts in Category: Algorithm</h1> <section class="post-excerpt"> <p class="post-excerpt-date">04 Jul 2018</p><h2 class="post-excerpt-title"><a href="/techblog/algorithm/2018/07/04/locating-text-html-pages.html">A perfect match: locating plain text in HTML pages</a></h2> <p><a href="https://europepmc.org/Annotations">SciLite</a> is a Europe PMC tool that highlights biological terms or relations, such as diseases, chemicals or protein interactions, for readers in abstracts and full-text articles. These terms are identified as annotations by text-mining algorithms developed by a variety of text-mining groups.</p> <p>The main challenge for the SciLite tool is locating plain-text annotations in HTML pages. The difficulties stem from the nature of HTML itself. 
Below is a list of the major challenges we faced and the solutions adopted to mitigate them.</p> <ol> <li><strong>The pages contain HTML tags, obviously.</strong> For example, <a href="https://europepmc.org/articles/PMC1215513">visit this article</a>, and click on the “Gene Function” checkbox, on the right-hand side of the page, to see the sentence highlighted. <br /><br /> <a href="/techblog/images/posts/locating-text-html-pages/fuzzy_match_PMC1215513.png"><img src="/techblog/images/posts/locating-text-html-pages/fuzzy_match_PMC1215513.png" alt="Annotation containing HTML tags" /></a> <em><strong>Figure 1</strong>: Annotation containing HTML tags</em><br /><br /></li> </ol> <div class="excerpt-bottom"> <p class="continue"><a href="/techblog/algorithm/2018/07/04/locating-text-html-pages.html">Continue reading ›</a></p> <p class="metadata">Author: <a href="https://github.com/ftalo" target ="_blank">Francesco Talo'</a></p> </div> </section> <footer class="site-footer"> <span class="site-footer-owner"><a href="https://github.com/EuropePMC/techblog">techblog</a> is maintained by <a href="https://github.com/EuropePMC">EuropePMC</a>.</span> <span class="site-footer-credits">This page was generated by <a href="https://pages.github.com">GitHub Pages</a>.</span> </footer> </section> <script type="text/javascript"> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-114538088-1', 'auto'); ga('send', 'pageview'); </script> </body> </html>