Tagging and Disambiguating KBpedia Knowledge Graph Concepts

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">    <html lang="en">  <head> <title>Tagging and Disambiguating KBpedia Knowledge Graph Concepts</title> <link rel="alternate" type="application/rss+xml" title="" href="/resources/feeds/news.xml" />  <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="description" content=""> <meta name="author" content="">  <link rel="shortcut icon" href="/favicon.ico">  <link rel="stylesheet" href="/assets/plugins/bootstrap/css/bootstrap.min.css"> <link rel="stylesheet" href="/assets/css/style.css">  <link rel="stylesheet" href="/assets/plugins/line-icons/line-icons.css"> <link rel="stylesheet" href="/assets/plugins/font-awesome/css/font-awesome.min.css">  <link rel="stylesheet" href="/assets/css/theme-colors/blue.css"> <link rel="stylesheet" href="/assets/css/footers/footer-v1.css"> <link rel="stylesheet" href="/assets/css/headers/header-v1.css">  <link rel="stylesheet" href="/css/custom.css"> <meta property="og:site_name" content="KBpedia"/> <meta property="og:title" content="Machine Learning Use Cases: Tagging and Disambiguating KBpedia Knowledge Graph Concepts"> <meta property="og:type" content="article"/> <meta property="og:description" content="This use case describes a fast method for disambiguating concepts, when more than one possible match is provided."> <meta property="og:image" content="/imgs/kbpedia-logo-350.png"> <meta property="og:image:width" content="350"> <meta property="og:image:height" content="390"> </head> <body> <div class="wrapper"> <div class="header-v1">  <div class="topbar"> <div class="container"> <div class="col-md-1"> </div> <a class="navbar-brand" href="/"> <img id="logo-header" src="/imgs/kbpedia-logo-420-horz.png" height="75" alt="KBpedia Knowledge Structure" name="logo-header"> </a> </div> </div>  <div class="navbar navbar-default mega-menu" role="navigation"> <div class="container">  <div class="navbar-header"> <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-responsive-collapse"> Toggle navigation </button> </div> <div style="clear:both; height: 1px;">   </div>  <div class="col-md-1">   </div> <div class="collapse navbar-collapse navbar-responsive-collapse col-md-10"> <ul class="nav navbar-nav pull-left">  <li> <a href="/">Home</a> </li>  <li> <a href="/knowledge-graph/">knowledge Graph</a> </li> <li> <a href="http://sparql.kbpedia.org/">SPARQL</a> </li>  <li class="dropdown"> <a href="/background/">Background</a> <ul class="dropdown-menu"> <li> <a href="/background/overview/">Overview</a> </li> <li> <a href="/background/features-and-benefits/">Features & Benefits</a> </li> <li> <a href="/background/data-and-knowledge-structures/">Data and Knowledge Structures</a> </li> <li> <a href="/background/technology/">Technology</a> </li> <li> <a href="/background/machine-learning/">Machine Learning</a> </li> <li> <a href="/background/uses/">Uses</a> </li> </ul> </li> <li class="dropdown"> <a href="/use-cases/">Use Cases</a> <ul class="dropdown-menu"> <li class="dropdown-submenu"> <a href="/use-cases/knowledge-graph/">Knowledge Graph (KG)</a> <ul class="dropdown-menu"> <li><a href="/use-cases/browse-the-knowledge-graph/">Browse the Knowledge Graph</a></li> <li><a href="/use-cases/search-the-knowledge-graph/">Search the Knowledge Graph</a></li> <li><a href="/use-cases/expand-queries-using-semsets/">Expand Queries Using Semsets</a></li> <li><a href="/use-cases/use-and-control-of-inferencing/">Uses and Control of Inferencing</a></li> </ul> </li> <li class="dropdown-submenu"> <a href="/use-cases/machine-learning-use-case/">Machine Learning (KBAI)</a> <ul class="dropdown-menu"> <li><a href="/use-cases/text-classification-using-esa-and-svm/">Create Supervised Learning Training Sets</a></li> <li><a href="/use-cases/document-specific-word2vec-training-corpuses/">Create Word Embedding Corpuses</a></li> <li><a href="/use-cases/extending-kbpedia-with-kbpedia-categories/">Create Graph Embedding Corpuses</a></li> <li><a href="/use-cases/text-classification-using-esa-and-svm/">Classify Text</a></li> <li><a href="/use-cases/dynamic-machine-learning/">Create 'Gold Standards' for Tuning Learners</a></li> <li><a href="/use-cases/disambiguating-kbpedia-knowledge-graph-concepts/">Disambiguate KG Concepts</a></li> <li><a href="/use-cases/dynamic-machine-learning/">Dynamic Machine Learning Using the KG</a></li> </ul> </li> <li class="dropdown-submenu"> <a href="/use-cases/mapping-use-case/">Mapping</a> <ul class="dropdown-menu"> <li><a href="/use-cases/mapping-external-data-and-schema/">Map Concepts</a></li> <li><a href="/use-cases/extending-kbpedia-with-kbpedia-categories/">Extend KBpedia with Wikipedia</a></li> <li><a href="/use-cases/benefits-from-extending-kbpedia-with-private-datasets/">Extend KBpedia for Domains</a></li> <li><a href="/use-cases/mapping-external-data-and-schema/">General Use of the Mapper</a></li> </ul> </li> </ul> </li> <li class="dropdown"> <a href="/resources/">Resources</a> <ul class="dropdown-menu"> <li><a href="/resources/downloads/">Download KBpedia</a></li> <li><a href="/resources/about/">About KBpedia</a></li> <li><a href="/resources/faq/">KBpedia FAQ</a></li> <li><a href="/resources/news/">News About KBpedia</a></li> <li><a href="/resources/statistics/">KBpedia Statistics</a></li> <li><a href="/resources/documentation/">Additional Documentation</a></li> <li><a href="/resources/support/">Support for KBpedia</a>.</li> </ul> </li> </ul> </div> <div class="col-md-1">   </div> </div> </div> </div>  <div class="breadcrumbs"> <div class="container"> <div class="col-md-1">   </div> <div class="col-md-10"> <h1 class="pull-left"></h1> <ul class="pull-right breadcrumb"> <li>Use Cases</li> <li class="active">Disambiguating Concepts</li> </ul> </div> <div class="col-md-1">   </div> </div> </div>    <div class="container content"> <div class="row"> <div class="col-md-2">   </div> <div class="col-md-8"> <div class="use-cases-header"> <table border="0" cellpadding="4" cellspacing="2"> <tbody> <tr> <td colspan="2" align="center"> <h2> USE CASE </h2> </td> </tr> <tr> <td style="width: 140px;" valign="top"> Title: </td> <td style="padding-left: 25px;" valign="top"> Disambiguating KBpedia Knowledge Graph Concepts </td> </tr> <tr> <td valign="top"> Short Description: </td> <td style="padding-left: 25px;" valign="top"> This use case describes a fast method for disambiguating concepts, when more than one possible match is provided. </td> </tr> <tr> <td valign="top"> Problem: </td> <td style="padding-left: 25px;" valign="top"> When "tagging" or extracting concepts, most methods rely on matching the surface forms of the names or labels associated with the concepts. However, these specific terms or phrases may be, in themselves, ambiguous; that is, they may match to multiple possible concepts (think of words like "bank", "club", "fan", "right", etc.). Thus, the proper meaning of a term or phrase depends on its use and context. Disambiguation methods must make these discriminations. </td> </tr> <tr> <td valign="top"> Approach: </td> <td style="padding-left: 25px;" valign="top"> The KBpedia knowledge graph is based on consistent and coherent relationships. Related concepts are located close to one another; unrelated concepts reside much farther apart in the graph. Graph embeddings for each node and its adjacent neighbors in the graph are created using the <a href="https://arxiv.org/abs/1403.6652">DeepWalk</a> method, which creates a vector for each concept. Surface forms of matching concepts are first obtained via fairly straightforward parsing. Where potential matches have more than one candidate concept, its vector representation is compared to vectors from the surrounding text; most closely aligned vectors are taken as the most accurate match, thus disambiguating from amongst the ambiguous candidates. Simple linear algebra equations are used to make these comparison calculations, a process that is extremely fast. The overall process is captured by a repeatable pipeline with statistical reporting, enabling rapid refinements in parameters and methods to achieve the best-performing model. This general method can be applied to entities as well as concepts. </td> </tr> <tr> <td valign="top"> Key Findings </td> <td valign="top"> <ul> <li>A fast, and acceptably accurate, way to disambiguate concepts is defined</li> <li>The general method is applicable to disambiguating both concepts and entities</li> <li>Accurate graph embeddings to aid disambiguation may be obtained automatically and for "free" given the coherent structure of the KBpedia knowledge graph</li> <li>A disambiguation accuracy of more than 65% is possible with the first-candidate approach with optimized hyperparameters</li> <li>The chosen method can disambiguate more than 1500 terms per minute on commodity hardware </li> <li>Slower methods can be invoked to increase accuracy further</li> <li>The speed of the disambiguation process suggests it may be justified to include as a pre-processing step to other machine learning tasks (classification, entity typing, etc.) </li> <li>Future experiments may include other graph walk strategies like depth-first or breadth-first walks, as well as combinations of different graph relationships. </li> </ul> </td> </tr> </tbody> </table> </div> </div> <div class="col-md-2">   </div> </div> <div class="row"> </div> <div class="row"> </div> <div class="row"> <div class="col-md-2">   </div> <div class="col-md-8"> One of the most important natural language processing tasks is to "tag" concepts in text. Tagging a concept means determining whether words or phrases in a text document match any of the concepts that exist in a knowledge structure (such as a knowledge graph, an ontology, a taxonomy, a vocabulary, etc.). (BTW, a similar definition and process applies to tagging an entity.) What is usually performed when tagging is that the input text is parsed and normalized in some manner. Then all of the surface forms of the concepts within the input knowledge structure (based on their preferred and alternative labels) are matched against the words within the text. "Tagging" is when a match occurs between a concept in the knowledge structure and one of its surface forms in the input text. But here is the problem. Given the ambiguous world we live in, often this surface form, which after all is only a word or phrase, may be associated with multiple different concepts. When we identify the surface form of "bank", does that term refer to a financial institution, the shore of a river, a plane turning, or a pool shot? Identical surface forms may refer to totally different concepts. Further, sometimes a single concept will be identified but it won't be the right concept, possibly because the right one is missing from the knowledge structure or other issues. The purpose of this article is thus to explain how we can "disambiguate" (that is, suggest the proper concept from an ambiguous list) the concepts that have been tagged. What we will do is to show how we can leverage the KBpedia knowledge graph structure as-is to perform this disambiguation. What we will do is to create graph embeddings for each of the KBpedia concepts using the DeepWalk algorithm. Then we will perform simple linear algebra equations on the graph embeddings to determine if the tagged concept(s) is the right one given its context or not. We will test multiple different algorithms and strategies to analyze the impact on the overall disambiguation performance of the system. <div id="outline-container-orgafcce09" class="outline-2"> <h2 id="orgafcce09">Different Situations</h2> <div class="outline-text-2" id="text-orgafcce09"> Before starting to work on the problem, let's discuss the issues we may encounter with any tagging and disambiguation task. The three possibilities we may encounter are: <ol class="org-ol"> <li>A word in the text refers to a concept, but the concept hasn't been identified (is not in the graph)</li> <li>A word in the text has been tagged with a single concept, but that concept is not the right one (the right one is not in the graph)</li> <li>A word in the text has been tagged with multiple concepts, so the right concept needs to be selected amongst the options offered.</li> </ol> What we present in this article is how we can automatically disambiguate the situations 2 and 3. (To fix the first situation, not addressed further herein, we would have to add additional surface forms to the knowledge graph by adding new alternative labels to existing concepts, or by adding new concepts with their own surface forms.) </div> </div> <div id="outline-container-org3214f1c" class="outline-2"> <h2 id="org3214f1c">Disambiguation Process</h2> <div class="outline-text-2" id="text-org3214f1c"> The disambiguation process is composed of two main tasks: the creation of the graph embeddings for each concept in the KBpedia knowledge graph, and then the evaluation of each tagged concept to disambiguate them. </div> <div id="outline-container-org607ec3e" class="outline-3"> <h3 id="org607ec3e">Introducing DeepWalk</h3> <div class="outline-text-3" id="text-org607ec3e"> <a href="https://arxiv.org/abs/1403.6652">DeepWalk</a> was created to learn social representations of a graph's vertices that capture neighborhood similarity and community membership. DeepWalk generalizes neural language models to process a special language composed of a set of randomly-generated walks. We want to use DeepWalk not to learn social representations but to learn the relationship (that is, the similarity) between all of the concepts existing in a knowledge graph given different kinds of relationships such as <code>sub-class-of</code>, <code>super-class-of</code>, <code>equivalent-class</code> or other relationships such as KBpedia's <code>80 aspects relationships</code>. (Note, we also discussed and used DeepWalk in another use case <a href="/use-cases/extending-kbpedia-with-kbpedia-categories/">extending KBpedia</a>.) For this use case we use the DeepWalk algorithm to select concepts from the broader Wikipedia corpus. Other tasks that could be performed using DeepWalk in a similar manner are: <ol class="org-ol"> <li>Content recommendation,</li> <li>Anomaly detection [in the knowledge graph], or</li> <li>Missing link prediction [in the knowledge graph].</li> </ol> Note that we randomly walk the graphs as stated in DeepWalk's original paper <a id="fnr.1" class="footref" href="#fn.1">1</a>. However more experiments could be performed to change the random walk by other graph walk strategies like depth-first or breadth-first walks. </div> </div> <div id="outline-container-org6b9844f" class="outline-3"> <h3 id="org6b9844f">Create Graph Embedding Vectors</h3> <div class="outline-text-3" id="text-org6b9844f"> We first create a <code>graph embedding</code> for each of the Wikipedia categories. What we have to do is to use the Wikipedia category structure along with the linked KBpedia knowledge graph. Then we have to generate the graph embedding for each of the Wikipedia categories that exist in the structure. The graph embeddings are generated using the <a href="https://arxiv.org/abs/1403.6652">DeepWalk</a> algorithm over that linked structure. It randomly walks the linked graph hundred of times to generate the graph embeddings for each category. </div> <div id="outline-container-org2082ce9" class="outline-4"> <h4 id="org2082ce9">Create Deeplearning4j Graph</h4> <div class="outline-text-4" id="text-org2082ce9"> To generate the graph embeddings, we use <a href="https://deeplearning4j.org/">Deeplearning4j's</a> DeepWalk implementation. This step creates a Deeplearning4j <code>graph</code> structure that is used by its DeepWalk implementation to generate the embeddings. The graph we have to create is composed of the latest version of KBpedia knowledge graph version <code>1.20</code>. Since the only concepts we want to disambiguate are the KBpedia reference concepts as tagged by the tagger, then this is the only knowledge structure we have to load to create the graph embeddings. Then we generate the initial Deeplearning4j graph structure composed of the inferred KBpedia knowledge graph. <div class="org-src-container"> <pre class="src src-clojure">(use '[cognonto-deepwalk.core]) (require '[clojure.string :as string]) (require '[clojure.data.csv :as csv]) (require '[clojure.java.io :as io]) (require '[cognonto-owl.query :as query]) (defonce knowledge-graph (load-knowledge-graph "file:/d:/cognonto-git/kbpedia-generator/kbpedia/1.10/target/kbpedia_reference_concepts_linkage_inferrence_extended_2.n3.owl")) </pre> </div> </div> </div> <div id="outline-container-org9b34106" class="outline-4"> <h4 id="org9b34106">Train DeepWalk</h4> <div class="outline-text-4" id="text-org9b34106"> Once the Deeplearning4j graph is created, the next step is to create and train the DeepWalk algorithm. What the <code>(create-deep-walk)</code> function does is to create and initialize a <code>DeepWalk</code> object with the <code>Graph</code> we created above and with some hyperparameters. The <code>:window-size</code> hyperparameter is the size of the window used by the continuous <a href="https://en.wikipedia.org/wiki/N-gram#Skip-gram">Skip-gram</a> algorithm used in DeepWalk. The <code>:vector-size</code> hyperparameter is the size of the embedding vectors we want the DeepWalk to generate (it is the number of dimensions of our model). The <code>:learning-rate</code> is the initial leaning rate of the <a href="https://en.wikipedia.org/wiki/Stochastic_gradient_descent">Stochastic gradient descent</a>. For this task, we initially use a window of <code>15</code> and <code>3</code> dimensions to make visualizations simpler to interpret, and an initial learning rate of <code>2.5%</code>. <div class="org-src-container"> <pre class="src src-clojure">(def graph (create-deepwalk-graph knowledge-graph :directed? true)) (def deep-walk (create-deep-walk graph :window-size 15 :vector-size 3 :learning-rate 0.025)) </pre> </div> Once the DeepWalk object is created and initialized with the graph, the next step is to train that model to generate the embedding vectors for each vertex in the graph. The training is performed using a random walk iterator. The two hyperparameters related to the training process are the <code>walk-length</code> and the <code>walks-per-vertex</code>. The <code>walk-length</code> is the number of vertices we want to visit for each iteration. The <code>walks-per-vertex</code> is the number of timex we want to create random walks for each vertex in the graph. <div class="org-src-container"> <pre class="src src-clojure">(defn train ([deep-walk iterator] (train deep-walk iterator 1)) ([deep-walk iterator walks-per-vertex] (.fit deep-walk iterator) (dotimes [n walks-per-vertex] (.reset iterator) (.fit deep-walk iterator))) ([deep-walk graph walk-length walks-per-vertex] (train deep-walk (new RandomWalkIterator graph walk-length) walks-per-vertex))) </pre> </div> For the initial setup, we want to have a <code>walk-length</code> of <code>15</code> and we want to iterate the process <code>175</code> times per vertex. <div class="org-src-container"> <pre class="src src-clojure">(train deep-walk graph 15 175) </pre> </div> </div> </div> </div> <div id="outline-container-org7ea8f0f" class="outline-3"> <h3 id="org7ea8f0f">Concept Disambiguation</h3> <div class="outline-text-3" id="text-org7ea8f0f"> Now that all of the KBpedia reference concepts are characterized using the graph embeddings, how can we disambiguate each of these concepts in a text? The first thing is to observe what is happening. When we tag a sentence such as "A sandstorm blows over damaged buildings in rebel held area of Douma." the first step that happens is that the surface forms (preferred labels and alternative labels) of the KBpedia concepts are identified within the sentence. As we discussed above, it is possible that a word, or a group of words, may get tagged with multiple KBpedia reference concepts. Because we characterized each of the concepts with its graph embeddings, it means that what we are dealing with is a series of vectors that represent the "meaning" of each concept in the knowledge graph structure. What we end-up with is the following: <div class="figure"> <img src="sentence.png" alt="sentence.png" /> </div> The next step is to use that information to try to disambiguate each of these concepts. To try to disambiguate the concepts, we have to make a few assumptions. First we have to assume that the graph embeddings created by the DeepWalk algorithm when crawling the KBpedia knowledge graph represent the "meaning" of the concept within the knowledge structure. Then we have to assume that the sentence where the concept has been identified creates a context that we can use to disambiguate the identified concepts based on their "meaning". In the example above, the context is the sentence, and four concepts have been identified. The big assumption we are making here is that each identified concept is "tightly" related to others in the knowledge graph. Given this assumption, what we have to do is to calculate the relatedness of each concept, within the context, and keep the ones that are closest to each other. </div> <div id="outline-container-org4c1cb97" class="outline-4"> <h4 id="org4c1cb97">Gold Standard Creation</h4> <div class="outline-text-4" id="text-org4c1cb97"> Before digging into how we will perform the disambiguation of these concepts, another step is to create a gold standard that we will use to evaluate the performance of our model and to check the impact of different hyperparameters on the disambiguation process. We created this gold standard manually by using random sentences found in online news articles. We first tagged each of these sentences using the tagger. Then, we manually disambiguated each of the tagged concepts. This gives us the properly labeled training set for our exercise. The result is the following <a href="disambiguation-gold-standard.csv">gold standard file</a>. Each of the annotations looks like this: <img src="ltximg/kbpedia-concepts-disambiguation_main_7c24ba70dbba722f3a2b8cfcf3afea2446f60105.png" alt="kbpedia-concepts-disambiguation_main_7c24ba70dbba722f3a2b8cfcf3afea2446f60105.png" /> This markup first presents the surface forms for the related concept(s) in the knowledge graph within the double brackets, followed by the concept(s) URI endings between the double parenthesis. The double colon <code>::</code> designator provides the suggested disambiguated concept. If nothing follows the double colon it means that the correct concept does not exist in the knowledge graph). </div> </div> <div id="outline-container-orgb597cc7" class="outline-4"> <h4 id="orgb597cc7">Disambiguation Method</h4> <div class="outline-text-4" id="text-orgb597cc7"> The actual disambiguation method is based on some simple linear algebra formulas. Each tagged word <img src="ltximg/kbpedia-concepts-disambiguation_main_0d1399fad7c1911e6241538c7b4fdfb50329d273.png" alt="kbpedia-concepts-disambiguation_main_0d1399fad7c1911e6241538c7b4fdfb50329d273.png" /> is related to a sliding context <img src="ltximg/kbpedia-concepts-disambiguation_main_7d61ea454e712ce09c7f75d3140788acf53ae5fe.png" alt="kbpedia-concepts-disambiguation_main_7d61ea454e712ce09c7f75d3140788acf53ae5fe.png" /> where <img src="ltximg/kbpedia-concepts-disambiguation_main_9783bbde9fb6923e6ceddef20e7a918b54a2f992.png" alt="kbpedia-concepts-disambiguation_main_9783bbde9fb6923e6ceddef20e7a918b54a2f992.png" />. Each tagged word has <img src="ltximg/kbpedia-concepts-disambiguation_main_ef038ba123ebef4319d569d54ec40dc5f67bc0c5.png" alt="kbpedia-concepts-disambiguation_main_ef038ba123ebef4319d569d54ec40dc5f67bc0c5.png" /> graph embedding vectors (i.e. a "sense" vector) <img src="ltximg/kbpedia-concepts-disambiguation_main_a5f0e1561072b96080a9c732794bd18fe00f1a57.png" alt="kbpedia-concepts-disambiguation_main_a5f0e1561072b96080a9c732794bd18fe00f1a57.png" /> where <img src="ltximg/kbpedia-concepts-disambiguation_main_72cf9483351da664caaca6cbde55432080d6c202.png" alt="kbpedia-concepts-disambiguation_main_72cf9483351da664caaca6cbde55432080d6c202.png" />. What we want the disambiguation method to do is to calculate a score between each embedding vector of each identified concept of each word given its sliding context. The score is calculated by performing the dot product between the graph embedding vector <img src="ltximg/kbpedia-concepts-disambiguation_main_6b526e3a714ac2c7be46661934ef653eac7e7fd8.png" alt="kbpedia-concepts-disambiguation_main_6b526e3a714ac2c7be46661934ef653eac7e7fd8.png" /> and the sliding context vector <img src="ltximg/kbpedia-concepts-disambiguation_main_774d0389f3a45245568f0233190a445a7ad65f53.png" alt="kbpedia-concepts-disambiguation_main_774d0389f3a45245568f0233190a445a7ad65f53.png" />. The sliding context vector is calculated by summing the graph embedding vector <img src="ltximg/kbpedia-concepts-disambiguation_main_6b526e3a714ac2c7be46661934ef653eac7e7fd8.png" alt="kbpedia-concepts-disambiguation_main_6b526e3a714ac2c7be46661934ef653eac7e7fd8.png" /> of each word <img src="ltximg/kbpedia-concepts-disambiguation_main_ccb5e9230532a6990c88b620742d233da163504d.png" alt="kbpedia-concepts-disambiguation_main_ccb5e9230532a6990c88b620742d233da163504d.png" /> within the window: <img src="ltximg/kbpedia-concepts-disambiguation_main_73e561775d087190d01c276973ae7b6941ab72e7.png" alt="kbpedia-concepts-disambiguation_main_73e561775d087190d01c276973ae7b6941ab72e7.png" /> Finally the score is calculated using: <img src="ltximg/kbpedia-concepts-disambiguation_main_15bde81280c85f78a2f16bd40d9ac16d62b8fb41.png" alt="kbpedia-concepts-disambiguation_main_15bde81280c85f78a2f16bd40d9ac16d62b8fb41.png" /> This is the first definition of the score we want to use to begin to disambiguate concepts from the knowledge graph associated with words in a sentence. As you can read from the formula, only the first sense for each word <img src="ltximg/kbpedia-concepts-disambiguation_main_0d1399fad7c1911e6241538c7b4fdfb50329d273.png" alt="kbpedia-concepts-disambiguation_main_0d1399fad7c1911e6241538c7b4fdfb50329d273.png" /> is being selected for each word within the sliding window that creates the context. (Other variants are be explored later in this article.) However as we will see below, we will change that formulas a bit such that the score becomes easier to understand for a human by expressing it in degrees. What is important to understand here is that we use two vectors to calculate the score: the graph embedding of the concept we want to disambiguate and the vector calculated from the windowed context that sums all the graph embedding vectors of each concept within that window. Now let's put this method into code. The first thing we have to do is to create a lookup table that is composed of the graph embeddings for each of the concepts that exist in the KBpedia knowledge graph as we calculated with the DeepWalk algorithm above. <div class="org-src-container"> <pre class="src src-clojure">(def index (->> (query/get-classes knowledge-graph) (map-indexed (fn [i class] (if-not (string? class) {(.toString (.getIRI class)) (inc i)} {class (inc i)}))) (apply merge))) (def ^:dynamic vertex-vectors (.getVertexVectors (.lookupTable deep-walk))) </pre> </div> Getting the vector <img src="ltximg/kbpedia-concepts-disambiguation_main_a5f0e1561072b96080a9c732794bd18fe00f1a57.png" alt="kbpedia-concepts-disambiguation_main_a5f0e1561072b96080a9c732794bd18fe00f1a57.png" /> for the word <img src="ltximg/kbpedia-concepts-disambiguation_main_0d1399fad7c1911e6241538c7b4fdfb50329d273.png" alt="kbpedia-concepts-disambiguation_main_0d1399fad7c1911e6241538c7b4fdfb50329d273.png" /> is as simple as getting it from the lookup table we created above: <div class="org-src-container"> <pre class="src src-clojure">(defn get-vector [uri-ending] (when-let [concept (get index (str "http://kbpedia.org/kko/rc/" uri-ending))] (.getRow vertex-vectors concept))) </pre> </div> To get the sense <img src="ltximg/kbpedia-concepts-disambiguation_main_e58c29910eb8c832fb24c34d0ac56af7b0892c2e.png" alt="kbpedia-concepts-disambiguation_main_e58c29910eb8c832fb24c34d0ac56af7b0892c2e.png" /> of the word <img src="ltximg/kbpedia-concepts-disambiguation_main_f1a1bc341aaedf18366e6daed56662ba6789bae3.png" alt="kbpedia-concepts-disambiguation_main_f1a1bc341aaedf18366e6daed56662ba6789bae3.png" /> (which is <code>SandstormAsObject</code>) in our example sentence above, we only have to: <div class="org-src-container"> <pre class="src src-clojure">(.toString (.data (get-vector "SandstormAsObject"))) </pre> </div> <pre class="example"> [0.08586349,-0.06854561,-0.14704005] </pre> To get the vector <img src="ltximg/kbpedia-concepts-disambiguation_main_774d0389f3a45245568f0233190a445a7ad65f53.png" alt="kbpedia-concepts-disambiguation_main_774d0389f3a45245568f0233190a445a7ad65f53.png" /> of the context of the word <img src="ltximg/kbpedia-concepts-disambiguation_main_0d1399fad7c1911e6241538c7b4fdfb50329d273.png" alt="kbpedia-concepts-disambiguation_main_0d1399fad7c1911e6241538c7b4fdfb50329d273.png" /> we have to do some more work. We have to create a function that will calculate the <code>dot product</code> between two vectors. Then we will have to create another function that will calculate the <code>sum</code> of <code>x</code> number of vectors. <div class="org-src-container"> <pre class="src src-clojure">(defn dot-product-clj [x y] (->> (interleave x y) (partition 2 2) (map #(apply * %)) (reduce +))) (defn angle-clj [a b] (Math/toDegrees (Math/acos (/ (dot-product-clj a b) (* (Math/sqrt (dot-product-clj a a)) (Math/sqrt (dot-product-clj b b))))))) (defn disambiguate [line] (let [tags (re-seq #"\[\[(.*?)\]\]\(\((.*?)\)\)" line)] (clojure.pprint/pprint tags) (println) (loop [i 0 tag (first tags) rtags (rest tags)] (let [word (second tag) concepts (last tag) concept (get-tag-concept concepts)] (println word " --> " concepts) ;; Disambiguate concepts (let [ambiguous-concepts (string/split (first (string/split concepts #"::")) #" ")] (doseq [ambiguous-concept ambiguous-concepts] (println "a-vector: " (get-vector ambiguous-concept)) (println "b-vector: " (get-context i tags)) (println "dot product: " (dot-product (get-vector ambiguous-concept) (get-context i tags))) (println "angle: " (angle (get-vector ambiguous-concept) (get-context i tags))) (println))) (when-not (empty? rtags) (recur (inc i) (first rtags) (rest rtags))))))) </pre> </div> <div class="org-src-container"> <pre class="src src-clojure">(defn dot-product "Calculate the dot product of two vectors (NDArray)" [v1 v2] (first (read-string (.toString (.data (.mmul v1 (.transpose v2))))))) (defn sum-vectors "Sum any number of vectors (NDArray)" [& args] (let [args (remove nil? args)] (loop [result (first args) args (rest args)] (if (empty? args) result (recur (.add result (first args)) (rest args)))))) </pre> </div> The next step is to create the function that calculates the context vector, which is a sliding window of the concept associated with <img src="ltximg/kbpedia-concepts-disambiguation_main_82ab83e730d76d23ce3732d5b1705d391cb72c45.png" alt="kbpedia-concepts-disambiguation_main_82ab83e730d76d23ce3732d5b1705d391cb72c45.png" />, <img src="ltximg/kbpedia-concepts-disambiguation_main_0d1399fad7c1911e6241538c7b4fdfb50329d273.png" alt="kbpedia-concepts-disambiguation_main_0d1399fad7c1911e6241538c7b4fdfb50329d273.png" /> and <img src="ltximg/kbpedia-concepts-disambiguation_main_aa3602038506d03eb84e703eb2db703d74794a9c.png" alt="kbpedia-concepts-disambiguation_main_aa3602038506d03eb84e703eb2db703d74794a9c.png" />. <div class="org-src-container"> <pre class="src src-clojure">(defn get-tag-concept [concepts] (if (> (.indexOf concepts "::") -1) (second (re-find #"::(.*)" concepts)) concepts)) </pre> </div> Let's see how that works. To calculate <img src="ltximg/kbpedia-concepts-disambiguation_main_774d0389f3a45245568f0233190a445a7ad65f53.png" alt="kbpedia-concepts-disambiguation_main_774d0389f3a45245568f0233190a445a7ad65f53.png" /> when <img src="ltximg/kbpedia-concepts-disambiguation_main_99f89040e96fd7145a771f6ea53e64f14a07bfbc.png" alt="kbpedia-concepts-disambiguation_main_99f89040e96fd7145a771f6ea53e64f14a07bfbc.png" /> we have to calculate <img src="ltximg/kbpedia-concepts-disambiguation_main_3e70cb55d508d5cd9276f01daa169dce44fb7bcb.png" alt="kbpedia-concepts-disambiguation_main_3e70cb55d508d5cd9276f01daa169dce44fb7bcb.png" /> which can be done with the following code: <div class="org-src-container"> <pre class="src src-clojure">;; s{1,1} (def sandstormasobject (get-vector "SandstormAsObject")) ;; s{2,1} (def blowingair (get-vector "BlowingAir")) ;; s{3,1} (def building (get-vector "Building")) (def c (sum-vectors sandstormasobject blowingair building)) (println (.toString (.data c))) </pre> </div> <pre class="example"> [0.24941699,-0.0940429,-0.11848262] </pre> Finally we have to calculate the score used to disambiguate the two senses of the word <code>blows</code> by performing the dot product between <img src="ltximg/kbpedia-concepts-disambiguation_main_1776d587df81749698f57cb9f0873c69870fe6ce.png" alt="kbpedia-concepts-disambiguation_main_1776d587df81749698f57cb9f0873c69870fe6ce.png" /> and <img src="ltximg/kbpedia-concepts-disambiguation_main_34ba6b04114515f0170ff209540475e3fcdf7ddb.png" alt="kbpedia-concepts-disambiguation_main_34ba6b04114515f0170ff209540475e3fcdf7ddb.png" /> where <img src="ltximg/kbpedia-concepts-disambiguation_main_eafcc43344ccdb500c279acd0b8a17187992c261.png" alt="kbpedia-concepts-disambiguation_main_eafcc43344ccdb500c279acd0b8a17187992c261.png" /> which can be done using the following code: <div class="org-src-container"> <pre class="src src-clojure">;; s{2,2} (def windprocess (get-vector "WindProcess")) (println "BlowingAir score:" (dot-product blowingair c)) (println "WindProcess score:" (dot-product windprocess c)) </pre> </div> <pre class="example"> BlowingAir score: 0.011248392 WindProcess score: 0.005993685 </pre> What the scores suggest is, given the context, the right concept associated with the word <code>blows</code> is <code>BlowingAir</code> since its score is bigger. This is the right answer. We can see how simple linear algebra manipulations can be used to help us automatically disambiguate such concepts. In fact, the crux of the problem is not to perform these operations but to create a coherent and consistent knowledge graph such as KBpedia and then to create the right graph embeddings for each of its concepts. The coherent graph structure gives us this disambiguation capability for "free". However, these scores, as is, are hard to interpret and understand. What we want to do next is to transform these numbers into a degree between <code>0</code> and <code>360</code>. The degree between the word sense's vector <img src="ltximg/kbpedia-concepts-disambiguation_main_a5f0e1561072b96080a9c732794bd18fe00f1a57.png" alt="kbpedia-concepts-disambiguation_main_a5f0e1561072b96080a9c732794bd18fe00f1a57.png" /> and <img src="ltximg/kbpedia-concepts-disambiguation_main_a62515e81db25785ddb65fc097b8bc1e0c6f84ad.png" alt="kbpedia-concepts-disambiguation_main_a62515e81db25785ddb65fc097b8bc1e0c6f84ad.png" /> represent how close the two vectors are to each other. A degree <code>0</code> would means that both vectors are identical in terms of relationship, and a degree of <code>180</code> would mean they are quite dissimilar. What we will see later is that we can use scores such as this to drop senses that score above some similarity degree threshold. The angle between the two vectors can easily be calculated with the following formula: <img src="ltximg/kbpedia-concepts-disambiguation_main_8fa9e31902fcb39732a13f13ac5105eaa36976ee.png" alt="kbpedia-concepts-disambiguation_main_8fa9e31902fcb39732a13f13ac5105eaa36976ee.png" /> The following code will calculate the angle between two vectors using this formula: <div class="org-src-container"> <pre class="src src-clojure">(defn angle [a b] (Math/toDegrees (Math/acos (/ (dot-product a b) (* (Math/sqrt (dot-product a a)) (Math/sqrt (dot-product b b))))))) </pre> </div> OK, so let's get the angle between the example we created above: <div class="org-src-container"> <pre class="src src-clojure">(println "BlowingAir degree score:" (angle blowingair c)) (println "WindProcess degree score:" (angle windprocess c)) </pre> </div> <pre class="example"> BlowingAir degree score: 76.77807342393672 WindProcess degree score: 82.80469144165336 </pre> Since the degree between the context and <code>BlowingAir</code> is smaller than the degree between the context and <code>WindProcess</code>, we keep the word <code>blows</code> as the disambiguated concept. </div> </div> </div> <div id="outline-container-org85b00ef" class="outline-3"> <h3 id="org85b00ef">Evaluate Model</h3> <div class="outline-text-3" id="text-org85b00ef"> The last step is to evaluate the model we created. What we have to do is to create a function called <code>(evaluate-disambiguation-model)</code> that will read the gold standard file, parse the markup and then perform the disambiguation. Then the <code>precision</code>, <code>recall</code>, <code>acuracy</code> and <code>F1</code> metrics will be calculated to get the performance of this disambiguation system and the models we created for it. <div class="org-src-container"> <pre class="src src-clojure">(defn get-first-sense-vector [concepts] (get-vector (if (> (.indexOf concepts "::") -1) (first (string/split (second (re-find #"(.*)::" concepts)) #" ")) concepts))) (defn get-context [i tags] (if (= (count tags) 1) ;; There is only one tag in the sentence (get-first-sense-vector (last (first tags))) (if (= (count tags) 2) ;; There is only 2 tags in the sentence (sum-vectors (get-first-sense-vector (last (first tags))) (get-first-sense-vector (last (second tags)))) ;; Target concept is the first one of the sentence (if (= i 0) (sum-vectors (get-first-sense-vector (last (first tags))) (get-first-sense-vector (last (second tags))) (get-first-sense-vector (last (nth tags (+ i 2))))) ;; Target concept is the last one of the sentence (if (= i (- (count tags) 1)) (sum-vectors (get-first-sense-vector (last (nth tags (- i 2)))) (get-first-sense-vector (last (nth tags (- i 1)))) (get-first-sense-vector (last (nth tags i)))) ;; Target is in-between (sum-vectors (get-first-sense-vector (last (nth tags (- i 1)))) (get-first-sense-vector (last (nth tags i))) (get-first-sense-vector (last (nth tags (+ i 1)))))))))) (defn predict-label [ambiguous-concepts i tags max-angle] (second (first (->> ambiguous-concepts (map (fn [ambiguous-concept] (if-let [a (get-vector ambiguous-concept)] (if-let [b (get-context i tags)] (if (> (angle a b) max-angle) {} {(angle a b) ambiguous-concept}) {}) {}))) (apply merge) sort)))) (defn predict-label-angle [ambiguous-concepts i tags] (first (->> ambiguous-concepts (map (fn [ambiguous-concept] (if-let [a (get-vector ambiguous-concept)] (if-let [b (get-context i tags)] {(angle a b) ambiguous-concept}) {}))) (apply merge) sort))) (defn evaluate-disambiguation-model [gold-standard-file & {:keys [max-angle] :or {max-angle 90.0}}] (let [sentences (with-open [in-file (io/reader gold-standard-file)] (doall (csv/read-csv in-file :separator \tab))) true-positive (atom 0) false-positive (atom 0) true-negative (atom 0) false-negative (atom 0)] (doseq [[sentence] sentences] (let [tags (re-seq #"\[\[(.*?)\]\]\(\((.*?)\)\)" sentence)] (loop [i 0 tag (first tags) rtags (rest tags)] (when-not (= i (count tags)) ;; Disambiguate concepts (let [word (second tag) concepts (last tag) concept (get-tag-concept concepts) label (second (re-find #"::(.*)" concepts)) ambiguous-concepts (string/split (first (string/split concepts #"::")) #" ") predicted-label (predict-label ambiguous-concepts i tags max-angle)] (when (and (= label predicted-label) (not (empty? label)) (not (empty? predicted-label))) (swap! true-positive inc)) (when (and (not (= label predicted-label)) (not (empty? predicted-label))) (swap! false-positive inc)) (when (and (empty? label) (empty? predicted-label)) (swap! true-negative inc)) (when (and (not (empty? label)) (empty? predicted-label)) (swap! false-negative inc)) (recur (inc i) (first rtags) (rest rtags))))))) (println "True positive: " @true-positive) (println "false positive: " @false-positive) (println "True negative: " @true-negative) (println "False negative: " @false-negative) (println) (if (= 0 @true-positive) (let [precision 0 recall 0 accuracy 0 f1 0] (println "Precision: " precision) (println "Recall: " recall) (println "Accuracy: " accuracy) (println "F1: " f1) {:precision precision :recall recall :accuracy accuracy :f1 f1}) (let [precision (float (/ @true-positive (+ @true-positive @false-positive))) recall (float (/ @true-positive (+ @true-positive @false-negative))) accuracy (float (/ (+ @true-positive @true-negative) (+ @true-positive @false-negative @false-positive @true-negative))) f1 (float (* 2 (/ (* precision recall) (+ precision recall))))] (println "Precision: " precision) (println "Recall: " recall) (println "Accuracy: " accuracy) (println "F1: " f1) {:precision precision :recall recall :accuracy accuracy :f1 f1})))) </pre> </div> </div> <div id="outline-container-org08f46b5" class="outline-4"> <h4 id="org08f46b5">Evaluation Baseline: Random Sense</h4> <div class="outline-text-4" id="text-org08f46b5"> Before evaluating the initial model we described above, the first thing we should be doing is to check the result of each metric we want to evaluate if we take a concept at random. To do this, we will modify the <code>(evaluate-disambiguation-model)</code> function we created above to perform the selection of the "disambiguated" sense at random. This will provide the baseline to evaluate the other algorithms we will test. <div class="org-src-container"> <pre class="src src-clojure">(defn evaluate-disambiguation-model-random [gold-standard-file & {:keys [max-angle] :or {max-angle 90.0}}] (let [sentences (with-open [in-file (io/reader gold-standard-file)] (doall (csv/read-csv in-file :separator \tab))) true-positive (atom 0) false-positive (atom 0) true-negative (atom 0) false-negative (atom 0)] (doseq [[sentence] sentences] (let [tags (re-seq #"\[\[(.*?)\]\]\(\((.*?)\)\)" sentence)] (loop [i 0 tag (first tags) rtags (rest tags)] (when-not (= i (count tags)) ;; Disambiguate concepts (let [word (second tag) concepts (last tag) concept (get-tag-concept concepts) label (second (re-find #"::(.*)" concepts)) ambiguous-concepts (string/split (first (string/split concepts #"::")) #" ") predicted-label (rand-nth (into ambiguous-concepts [nil]))] (when (and (= label predicted-label) (not (empty? label)) (not (empty? predicted-label))) (swap! true-positive inc)) (when (and (not (= label predicted-label)) (not (empty? predicted-label))) (swap! false-positive inc)) (when (and (empty? label) (empty? predicted-label)) (swap! true-negative inc)) (when (and (not (empty? label)) (empty? predicted-label)) (swap! false-negative inc)) (recur (inc i) (first rtags) (rest rtags))))))) (if (= 0 @true-positive) (let [precision 0 recall 0 accuracy 0 f1 0] {:precision precision :recall recall :accuracy accuracy :f1 f1}) (let [precision (float (/ @true-positive (+ @true-positive @false-positive))) recall (float (/ @true-positive (+ @true-positive @false-negative))) accuracy (float (/ (+ @true-positive @true-negative) (+ @true-positive @false-negative @false-positive @true-negative))) f1 (float (* 2 (/ (* precision recall) (+ precision recall))))] {:precision precision :recall recall :accuracy accuracy :f1 f1})))) </pre> </div> What we do is to calculate the average random sense disambiguation score for each metric. We run the <code>(evaluate-disambiguation-model-random)</code> one thousand times and then calculate the average. <div class="org-src-container"> <pre class="src src-clojure">(let [f1 (atom 0) precision (atom 0) accuracy (atom 0) recall (atom 0) nb (atom 0)] (dotimes [i 1000] (let [results (evaluate-disambiguation-model-random "resources/disambiguation-gold.standard.csv")] (swap! nb inc) (swap! f1 + (:f1 results)) (swap! precision + (:precision results)) (swap! recall + (:recall results)) (swap! accuracy + (:accuracy results)))) (println "Average precision: " (/ @precision @nb)) (println "Average recall: " (/ @recall @nb)) (println "Average accuracy: " (/ @accuracy @nb)) (println "Average F1: " (/ @f1 @nb))) </pre> </div> <pre class="example"> Average precision: 0.6080861368775368 Average recall: 0.49945064118504523 Average accuracy: 0.42075299778580666 Average F1: 0.5480979214012622 </pre> This gives us a baseline <code>precision</code> score of <code>0.608</code>, <code>recall</code> score of <code>0.499</code>, <code>accuracy</code> score of <code>0.420</code> and <code>F1</code> score of <code>0.548</code>. </div> </div> <div id="outline-container-orgfd10448" class="outline-4"> <h4 id="orgfd10448">Evaluation: First Sense Context</h4> <div class="outline-text-4" id="text-orgfd10448"> Now that we have our baseline in place, let's see the performance of the initial <code>first-sensse-context</code> disambiguation algorithm as described above compared to a random process. Note that this initial run is using an unoptimized DeepWalk graph with initial general hyperparameters. <div class="org-src-container"> <pre class="src src-clojure">(evaluate-disambiguation-model "resources/disambiguation-gold.standard.csv") </pre> </div> <pre class="example"> True positive: 246 false positive: 126 True negative: 9 False negative: 36 Precision: 0.66129035 Recall: 0.87234044 Accuracy: 0.6115108 F1: 0.7522936 </pre> As you can see, even using unoptimized default hyperparameters, we calculate an <code>F1</code> score of <code>0.752</code>, which is an increase of <code>37.26%</code> over the baseline. Other measures improve, too: <code>Recall</code> (increased by <code>74.67%</code>), <code>accuracy</code> (increased by <code>45.35%</code>) and <code>precision</code> (increased by <code>8.75%</code>). We properly identified <code>246</code> of <code>417</code> examples. <code>126</code> were wrongly identified, <code>9</code> were properly discarded and <code>36</code> have been discarded when they shouldn't. These results are really promising, but can we do better? </div> </div> </div> <div id="outline-container-orgb405bb4" class="outline-3"> <h3 id="orgb405bb4">Hyperparameter Optimization</h3> <div class="outline-text-3" id="text-orgb405bb4"> Now that we have a disambiguation workflow in place with standards by which to compute performance statistics, the final step is to try to optimize the system by testing multiple different values for some of the hyperparameters that can impact the performance of the system. The parameters we want to optimize are the: <ol class="org-ol"> <li>Size of the DeepWalk vectors</li> <li>Window size of the skip-gram</li> <li>Depth of the walks</li> <li>Number of iterations per concept, and the</li> <li>Angle threshold</li> </ol> Then we will perform a <a href="https://en.wikipedia.org/wiki/Hyperparameter_optimization#Grid_search">grid search</a> to find the optimal hyperparameters to try to optimize each of the metric. <div class="org-src-container"> <pre class="src src-clojure">(defn svm-grid-search [graph & {:keys [grid-parameters] :or {grid-parameters {:window-size [10 15] :vector-size [3 16 64 128 256 512] :walk-length [5 10 15] :walks-per-vertex [32 64 128 256] :angle [30 60 90 120] :selection-metrics [:precision :accuracy :recall :f1]}}}] (let [best (atom (->> (:selection-metrics grid-parameters) (map (fn [selection-metric] {selection-metric {:window-size nil :vector-size nil :walk-length nil :walks-per-vertex nil :angle nil :score 0.0}})) (apply merge))) parameters grid-parameters] (doseq [window-size (:window-size parameters)] (doseq [vector-size (:vector-size parameters)] (let [deep-walk (create-deep-walk graph :window-size window-size :vector-size vector-size :learning-rate 0.025)] (doseq [walk-length (:walk-length parameters)] (doseq [walks-per-vertex (:walks-per-vertex parameters)] (train deep-walk graph walk-length walks-per-vertex) (doseq [angle (:angle parameters)] (let [results (binding [vertex-vectors (.getVertexVectors (.lookupTable deep-walk))] (evaluate-disambiguation-model "resources/disambiguation-gold.standard.csv" :max-angle angle))] (doseq [selection-metric (:selection-metrics parameters)] (let [score (get results selection-metric)] (println) (println "Window size: " window-size) (println "Vector size: " vector-size) (println "Walk length: " walk-length) (println "Walks per vertex: " walks-per-vertex) (println "Angle: " angle) (println "Score ("selection-metric"): " score) (println) (println) (when (> score (:score (get @best selection-metric))) (reset! best (assoc-in @best [selection-metric] {:window-size window-size :vector-size vector-size :walk-length walk-length :walks-per-vertex walks-per-vertex :angle angle :score score})))))))))))) @best)) </pre> </div> <div class="org-src-container"> <pre class="src src-clojure">(svm-grid-search graph :grid-parameters {:window-size [10 15 20] :vector-size [64 128 256] :walk-length [5 10 15] :walks-per-vertex [32 64 128] :angle [60 90 120] :selection-metrics [:precision :accuracy :recall :f1]}) </pre> </div> <pre class="example"> {:precision {:window-size 10, :vector-size 64, :walk-length 5, :walks-per-vertex 32, :angle 60, :score 0.6812866}, :accuracy {:window-size 10, :vector-size 64, :walk-length 5, :walks-per-vertex 32, :angle 90, :score 0.676259}, :recall {:window-size 10, :vector-size 64, :walk-length 5, :walks-per-vertex 32, :angle 90, :score 0.99640286}, :f1 {:window-size 10, :vector-size 64, :walk-length 5, :walks-per-vertex 32, :angle 90, :score 0.80406386}} </pre> For the same <code>first-sense-context</code> disambiguation algorithm, once the key hyperparameters of the pipeline are optimized, then we endup with a <code>F1</code> score of <code>0.8040</code>, which is an increase of <code>46.72%</code> over the baseline. <code>Recall</code> has now increased by <code>99.52%</code>, <code>accuracy</code> increased by <code>60.73%</code> and <code>precision</code> increased by <code>12.04%</code>. Still, understand that when performing the grid search in this manner that many of the hyperparameters we are trying to optimize are related to the DeepWalk algorithm. Because of the nature of the algorithm (when the model is trained, the paths are randomly created and the learning starts with random seeds) we can't reproduce exactly the same results every time we re-create a <code>DeepWalk</code> instance. However the results tend to be similar with a differences of <code>+/- 0.02</code> for most of the metrics. Depending on the task at hand we don't necessarly want to keep the hyperparameters of the best score. If model creation and execution speed is also a governing consideration, we may want to sacrifice <code>0.03</code> of the <code>F1</code> score to have the <code>:window-size</code>, <code>:vector-size</code>, <code>:walk-length</code> and <code>:walk-per-vertex</code> as small as possible. But in any case, these numbers indicate the kind of performances we may expect from this disambiguation process. <div class="org-src-container"> <pre class="src src-clojure">(def deep-walk (create-deep-walk graph :window-size 10 :vector-size 128 :learning-rate 0.025)) (train deep-walk graph 10 128) (binding [vertex-vectors (.getVertexVectors (.lookupTable deep-walk))] (evaluate-disambiguation-model "resources/disambiguation-gold.standard.csv" :max-angle 150)) </pre> </div> <pre class="example"> True positive: 229 false positive: 123 True negative: 5 False negative: 1 Precision: 0.6505682 Recall: 0.9956522 Accuracy: 0.65363127 F1: 0.7869416 </pre> </div> </div> <div id="outline-container-org6b90589" class="outline-3"> <h3 id="org6b90589">Different Sliding Window Strategies</h3> <div class="outline-text-3" id="text-org6b90589"> Another part of this disambiguation process that has an impact on results is how the "context" of a word is being created. There are multiple ways to define a context, such as extending the window size or letting window size extend over sentence boundaries. For clarity's sake, we initially defined the context <img src="ltximg/kbpedia-concepts-disambiguation_main_774d0389f3a45245568f0233190a445a7ad65f53.png" alt="kbpedia-concepts-disambiguation_main_774d0389f3a45245568f0233190a445a7ad65f53.png" /> as: <img src="ltximg/kbpedia-concepts-disambiguation_main_73e561775d087190d01c276973ae7b6941ab72e7.png" alt="kbpedia-concepts-disambiguation_main_73e561775d087190d01c276973ae7b6941ab72e7.png" /> One of the issues with this definition of "context" is that, so far, <img src="ltximg/kbpedia-concepts-disambiguation_main_70280f6b9061479517c21b3204e4d4aaaccf740f.png" alt="kbpedia-concepts-disambiguation_main_70280f6b9061479517c21b3204e4d4aaaccf740f.png" /> (the senses for each word) is always <code>1</code>, which means that the context is always defined with the first sense of a word, whatever it is. This is why we called this method the <code>first-sense-context</code> disambiguation. Biasing the context to the first sense ignores the other senses that may be summed into the context vector. However, this initial definition is simple enough and, as we saw above, we still have adequate results with it. Still, what could be other strategies? </div> <div id="outline-container-org3c3312b" class="outline-4"> <h4 id="org3c3312b">Ignore Target Word Senses In Context</h4> <div class="outline-text-4" id="text-org3c3312b"> The first test we may do is to remove the target word's sense (the word we are trying to disambiguate) from the context. That way, we would make sure that we don't bias the context toward that sense. The context would be defined as: <img src="ltximg/kbpedia-concepts-disambiguation_main_727c862fa4ba91e94a62e87da99e9a4131c26a83.png" alt="kbpedia-concepts-disambiguation_main_727c862fa4ba91e94a62e87da99e9a4131c26a83.png" /> We are still picking the first sense of each word that belongs to the context, but we are not considering that sense for the target word when we calculate the context vector. Let's see what is the impact of changing the context with this new definition. To test this new context, we have to change the <code>(get-context)</code> function to reflect the new machanism and to re-test and re-optimize the hyperparameters. <div class="org-src-container"> <pre class="src src-clojure">(defn get-context [i tags] (if (= (count tags) 1) ;; There is only one tag in the sentence (get-first-sense-vector (last (first tags))) (if (= (count tags) 2) ;; There is only 2 tags in the sentence (if (= i 0) (sum-vectors (get-first-sense-vector (last (second tags)))) (sum-vectors (get-first-sense-vector (last (first tags))))) ;; Target concept is the first one of the sentence (if (= i 0) (sum-vectors (get-first-sense-vector (last (second tags))) (get-first-sense-vector (last (nth tags (+ i 2))))) ;; Target concept is the last one of the sentence (if (= i (- (count tags) 1)) (sum-vectors (get-first-sense-vector (last (nth tags (- i 2)))) (get-first-sense-vector (last (nth tags (- i 1))))) ;; Target is in-between (sum-vectors (get-first-sense-vector (last (nth tags (- i 1)))) (get-first-sense-vector (last (nth tags (+ i 1)))))))))) </pre> </div> Now let's re-optimize and re-evaluate the impact of this modification on our gold standard: <div class="org-src-container"> <pre class="src src-clojure">(svm-grid-search graph :grid-parameters {:window-size [10 15 20] :vector-size [64 128 256] :walk-length [5 10 15] :walks-per-vertex [32 64 128] :angle [60 90 120] :selection-metrics [:precision :accuracy :recall :f1]}) </pre> </div> <pre class="example"> {:precision {:window-size 20, :vector-size 128, :walk-length 5, :walks-per-vertex 32, :angle 120, :score 0.6666667}, :accuracy {:window-size 20, :vector-size 128, :walk-length 5, :walks-per-vertex 32, :angle 120, :score 0.66906476}, :recall {:window-size 20, :vector-size 128, :walk-length 5, :walks-per-vertex 32, :angle 120, :score 0.99636364}, :f1 {:window-size 20, :vector-size 128, :walk-length 5, :walks-per-vertex 32, :angle 120, :score 0.79883385}} </pre> With this modification to the context creation algorithm, every metric slightly dropped compared to our previous version of the <code>first-sense</code> disambiguation algorithm. However, this modification of the algorithm has little impact on the final results if we consider the margin of error we have for the <code>F1</code> score incurred by DeepWalk's random walks and training process. <div class="org-src-container"> <pre class="src src-clojure">(def deep-walk (create-deep-walk graph :window-size 15 :vector-size 128 :learning-rate 0.025)) (train deep-walk graph 15 128) (binding [vertex-vectors (.getVertexVectors (.lookupTable deep-walk))] (evaluate-disambiguation-model "resources/disambiguation-gold.standard.csv" :max-angle 120)) </pre> </div> </div> </div> <div id="outline-container-org5b25cfc" class="outline-4"> <h4 id="org5b25cfc">Consider Multiple Contexts</h4> <div class="outline-text-4" id="text-org5b25cfc"> Another modification we will test is to define not a single context but multiple contexts and to keep the most relevant one. So far, we only used the top sense for each word to define the context. The truth is that most of the words have multiple possible senses, which means that we are currently ignoring them. Let's again revisit our initial example: <div class="figure"> <img src="sentence.png" alt="sentence.png" /> </div> If the target word is <code>buildings</code> then the contexts we created so far only considered the senses <img src="ltximg/kbpedia-concepts-disambiguation_main_d65b40ecc3113c0303a621e3360a8e2f51df152f.png" alt="kbpedia-concepts-disambiguation_main_d65b40ecc3113c0303a621e3360a8e2f51df152f.png" />, <img src="ltximg/kbpedia-concepts-disambiguation_main_7986902a6150fca4666ab80489c5e19f57d3c27c.png" alt="kbpedia-concepts-disambiguation_main_7986902a6150fca4666ab80489c5e19f57d3c27c.png" /> and <img src="ltximg/kbpedia-concepts-disambiguation_main_77b1bb8b1889bda6d392cbafb767fa4f6c24fc97.png" alt="kbpedia-concepts-disambiguation_main_77b1bb8b1889bda6d392cbafb767fa4f6c24fc97.png" /> but what about the other two? What we want to do here is to create multiple contexts and then to compare each of the senses of the target word against each of these contexts. Then we will check the angle between each of the senses of the target word against each of the contexts. Then we will keep the sense that is most closely related to one of the context vectors. The contexts are simply created out of all the possible combinations between each of the senses of each words within the window. With the example above, the possible combinations are: <div class="org-src-container"> <pre class="src src-clojure">(for [x [[1 -9 -2] [2 3 -7]] y [[-8 1 1]] z [[-1 -9 0] [0 3 -1]]] (clojure.pprint/pprint (vector x y x))) </pre> </div> <pre class="example"> [[1 -9 -2] [-8 1 1] [1 -9 -2]] [[1 -9 -2] [-8 1 1] [1 -9 -2]] [[2 3 -7] [-8 1 1] [2 3 -7]] [[2 3 -7] [-8 1 1] [2 3 -7]] </pre> To now compare this larger number of combinations, we need to modify the evaluation procedure. We have to modify how the contexts are created and how all possible contexts are generated. Then we have to modify how we predict the label for an example by using the most relevant context. <div class="org-src-container"> <pre class="src src-clojure">(defn disambiguate [line] (let [tags (re-seq #"\[\[(.*?)\]\]\(\((.*?)\)\)" line)] (loop [i 0 tag (first tags) rtags (rest tags)] (let [word (second tag) concepts (last tag) concept (get-tag-concept concepts)] (println word " --> " concepts) ;; Disambiguate concepts (let [ambiguous-concepts (string/split (first (string/split concepts #"::")) #" ")] (doseq [ambiguous-concept ambiguous-concepts] ;; (println "a-vector: " (get-vector ambiguous-concept)) (let [unambiguous (first (->> (get-contexts i tags) (map (fn [context] {(angle (get-vector ambiguous-concept) context) ambiguous-concept})) (apply merge) sort))] ;; (println "b-vector: " (get-context i tags)) (println "Unambiguous concept: " (second unambiguous)) (println "angle: " (first unambiguous)) (println))))) (when-not (empty? rtags) (recur (inc i) (first rtags) (rest rtags)))))) </pre> </div> <div class="org-src-container"> <pre class="src src-clojure">(require '[clojure.math.combinatorics :as combo]) (defn get-word-senses [i tags] (let [senses (last (nth tags i))] (if (> (.indexOf senses "::") -1) (string/split (second (re-find #"(.*)::" senses)) #" ") senses))) (defn get-contexts [i tags] (if (= (count tags) 1) ;; There is only one tag in the sentence (get-first-sense-vector (last (first tags))) (if (= (count tags) 2) ;; There is only 2 tags in the sentence (let [contexts (combo/cartesian-product (get-word-senses 0 tags) (get-word-senses 1 tags))] (->> contexts (mapv (fn [context] (sum-vectors (get-first-sense-vector (first context)) (get-first-sense-vector (second context))))))) ;; Target concept is the first one of the sentence (if (= i 0) (let [contexts (combo/cartesian-product (get-word-senses 0 tags) (get-word-senses 1 tags) (get-word-senses 2 tags))] (->> contexts (mapv (fn [context] (sum-vectors (get-first-sense-vector (first context)) (get-first-sense-vector (second context)) (get-first-sense-vector (nth context 2))))))) ;; Target concept is the last one of the sentence (if (= i (- (count tags) 1)) (let [contexts (combo/cartesian-product (get-word-senses (- i 2) tags) (get-word-senses (- i 1) tags) (get-word-senses i tags))] (->> contexts (mapv (fn [context] (sum-vectors (get-first-sense-vector (first context)) (get-first-sense-vector (second context)) (get-first-sense-vector (nth context 2))))))) ;; Target is in-between (let [contexts (combo/cartesian-product (get-word-senses (- i 1) tags) (get-word-senses i tags) (get-word-senses (+ i 1) tags))] (->> contexts (mapv (fn [context] (sum-vectors (get-first-sense-vector (first context)) (get-first-sense-vector (second context)) (get-first-sense-vector (nth context 2)))))))))))) (defn predict-label [ambiguous-concepts i tags max-angle] (second (first (->> ambiguous-concepts (map (fn [ambiguous-concept] (let [unambiguous (first (->> (get-contexts i tags) (map (fn [context] (if-let [a (get-vector ambiguous-concept)] (if-not (nil? context) (if (> (angle a context) max-angle) {} {(angle a context) ambiguous-concept}) {}) {}))) (apply merge) sort))] {(first unambiguous) (second unambiguous)}))) (apply merge) sort)))) (defn predict-label-angle [ambiguous-concepts i tags] (first (->> ambiguous-concepts (map (fn [ambiguous-concept] (let [unambiguous (first (->> (get-contexts i tags) (map (fn [context] (if-let [a (get-vector ambiguous-concept)] (if-not (nil? context) {(angle a context) ambiguous-concept} {}) {}))) (apply merge) sort))] {(first unambiguous) (second unambiguous)}))) (apply merge) sort))) </pre> </div> Now, using this changed disambiguation method, let's optimize the hyperparameters of the pipeline to see how this method can perform compared to the baseline. <div class="org-src-container"> <pre class="src src-clojure">(svm-grid-search graph :grid-parameters {:window-size [10 15 20] :vector-size [64 128 256] :walk-length [5 10 15] :walks-per-vertex [32 64 128] :angle [60 90 120] :selection-metrics [:precision :accuracy :recall :f1]}) </pre> </div> <pre class="example"> {:precision {:window-size 15, :vector-size 64, :walk-length 5, :walks-per-vertex 32, :angle 60, :score 0.69970846}, :accuracy {:window-size 15, :vector-size 64, :walk-length 5, :walks-per-vertex 32, :angle 90, :score 0.676259}, :recall {:window-size 15, :vector-size 64, :walk-length 5, :walks-per-vertex 32, :angle 90, :score 0.98566306}, :f1 {:window-size 15, :vector-size 64, :walk-length 5, :walks-per-vertex 32, :angle 90, :score 0.8029197}} </pre> These results are nearly the same as the <code>first-sense-context</code> method described above. Our first intuition is that this more complicated method may have little or no impact on the overall performance. Yet, as we will see below, this is actually not the case in all situations. <div class="org-src-container"> <pre class="src src-clojure">(def deep-walk (create-deep-walk graph :window-size 15 :vector-size 128 :learning-rate 0.025)) (train deep-walk graph 15 128) (binding [vertex-vectors (.getVertexVectors (.lookupTable deep-walk))] (evaluate-disambiguation-model "resources/disambiguation-gold.standard.csv" :max-angle 120)) </pre> </div> </div> </div> </div> <div id="outline-container-orgc757a26" class="outline-3"> <h3 id="orgc757a26">Focus Evaluation Of Multiple Senses</h3> <div class="outline-text-3" id="text-orgc757a26"> Now that we evaluated how different strategies perform on the entire gold standard, what we will do next is to evaluate how the different strategies perform when we only focus on the example that has multiple possible senses. To run this evaluation, we will modify the <code>(evaluate-disambiguation-model)</code> function such that we only predict the disambiguated word sense when multiple senses have been tagged for that word. <div class="org-src-container"> <pre class="src src-clojure">(defn evaluate-disambiguation-model [gold-standard-file & {:keys [max-angle] :or {max-angle 90.0}}] (let [sentences (with-open [in-file (io/reader gold-standard-file)] (doall (csv/read-csv in-file :separator \tab))) true-positive (atom 0) false-positive (atom 0) true-negative (atom 0) false-negative (atom 0)] (doseq [[sentence] sentences] (let [tags (re-seq #"\[\[(.*?)\]\]\(\((.*?)\)\)" sentence)] (loop [i 0 tag (first tags) rtags (rest tags)] (when-not (= i (count tags)) ;; Disambiguate concepts (let [word (second tag) concepts (last tag) concept (get-tag-concept concepts) label (second (re-find #"::(.*)" concepts)) ambiguous-concepts (string/split (first (string/split concepts #"::")) #" ")] (when (> (count ambiguous-concepts) 1) (let [predicted-label (predict-label ambiguous-concepts i tags max-angle)] (when (and (= label predicted-label) (not (empty? label)) (not (empty? predicted-label))) (swap! true-positive inc)) (when (and (not (= label predicted-label)) (not (empty? predicted-label))) (swap! false-positive inc)) (when (and (empty? label) (empty? predicted-label)) (swap! true-negative inc)) (when (and (not (empty? label)) (empty? predicted-label)) (swap! false-negative inc)))) (recur (inc i) (first rtags) (rest rtags))))))) (println "True positive: " @true-positive) (println "false positive: " @false-positive) (println "True negative: " @true-negative) (println "False negative: " @false-negative) (println) (if (= 0 @true-positive) (let [precision 0 recall 0 accuracy 0 f1 0] (println "Precision: " precision) (println "Recall: " recall) (println "Accuracy: " accuracy) (println "F1: " f1) {:precision precision :recall recall :accuracy accuracy :f1 f1}) (let [precision (float (/ @true-positive (+ @true-positive @false-positive))) recall (float (/ @true-positive (+ @true-positive @false-negative))) accuracy (float (/ (+ @true-positive @true-negative) (+ @true-positive @false-negative @false-positive @true-negative))) f1 (float (* 2 (/ (* precision recall) (+ precision recall))))] (println "Precision: " precision) (println "Recall: " recall) (println "Accuracy: " accuracy) (println "F1: " f1) {:precision precision :recall recall :accuracy accuracy :f1 f1})))) </pre> </div> </div> <div id="outline-container-org6106185" class="outline-4"> <h4 id="org6106185">Evaluation At Random</h4> <div class="outline-text-4" id="text-org6106185"> To put the next evaluations into context, the first thing we will do is to check what would be the score for each metric if we would pick the concept at random. What we do is to modify the <code>(evaluate-disambiguation-model)</code> function to take the "disambiguated" concept at random. Then we run that evaluation one thousand times and display the average of these runs for each metric. This will give us the foundation to evaluate the performance of each of the strategies below. <div class="org-src-container"> <pre class="src src-clojure">(defn evaluate-disambiguation-model-random [gold-standard-file & {:keys [max-angle] :or {max-angle 90.0}}] (let [sentences (with-open [in-file (io/reader gold-standard-file)] (doall (csv/read-csv in-file :separator \tab))) true-positive (atom 0) false-positive (atom 0) true-negative (atom 0) false-negative (atom 0)] (doseq [[sentence] sentences] (let [tags (re-seq #"\[\[(.*?)\]\]\(\((.*?)\)\)" sentence)] (loop [i 0 tag (first tags) rtags (rest tags)] (when-not (= i (count tags)) ;; Disambiguate concepts (let [word (second tag) concepts (last tag) concept (get-tag-concept concepts) label (second (re-find #"::(.*)" concepts)) ambiguous-concepts (string/split (first (string/split concepts #"::")) #" ")] (when (> (count ambiguous-concepts) 1) (let [predicted-label (rand-nth (into ambiguous-concepts [nil]))] (when (and (= label predicted-label) (not (empty? label)) (not (empty? predicted-label))) (swap! true-positive inc)) (when (and (not (= label predicted-label)) (not (empty? predicted-label))) (swap! false-positive inc)) (when (and (empty? label) (empty? predicted-label)) (swap! true-negative inc)) (when (and (not (empty? label)) (empty? predicted-label)) (swap! false-negative inc)))) (recur (inc i) (first rtags) (rest rtags))))))) (if (= 0 @true-positive) (let [precision 0 recall 0 accuracy 0 f1 0] {:precision precision :recall recall :accuracy accuracy :f1 f1}) (let [precision (float (/ @true-positive (+ @true-positive @false-positive))) recall (float (/ @true-positive (+ @true-positive @false-negative))) accuracy (float (/ (+ @true-positive @true-negative) (+ @true-positive @false-negative @false-positive @true-negative))) f1 (float (* 2 (/ (* precision recall) (+ precision recall))))] {:precision precision :recall recall :accuracy accuracy :f1 f1})))) </pre> </div> <div class="org-src-container"> <pre class="src src-clojure">(let [f1 (atom 0) precision (atom 0) recall (atom 0) accuracy (atom 0) nb (atom 0)] (dotimes [i 1000] (let [results (evaluate-disambiguation-model-random "resources/disambiguation-gold.standard.csv")] (swap! nb inc) (swap! f1 + (:f1 results)) (swap! precision + (:precision results)) (swap! recall + (:recall results)) (swap! accuracy + (:accuracy results)))) (println "Average precision: " (/ @precision @nb)) (println "Average recall: " (/ @recall @nb)) (println "Average accuracy: " (/ @accuracy @nb)) (println "Average F1: " (/ @f1 @nb))) </pre> </div> <pre class="example"> Average precision: 0.3583671303540468 Average recall: 0.4984975547492504 Average accuracy: 0.295875000089407 Average F1: 0.4158918097615242 </pre> This gives us a baseline <code>precision</code> score of <code>0.358</code>, <code>recall</code> score of <code>0.498</code>, <code>accuracy</code> score of <code>0.295</code> and <code>F1</code> score of <code>0.4158</code>. As we can notice if we compare with the random baseline with the full gold standard is that when we focus on the words which have multiple possible senses, the problem becomes much harder. </div> </div> <div id="outline-container-org7160fd8" class="outline-4"> <h4 id="org7160fd8">Evaluation Using First Sense Context</h4> <div class="outline-text-4" id="text-org7160fd8"> <div class="org-src-container"> <pre class="src src-clojure">(svm-grid-search graph :grid-parameters {:window-size [10 15 20] :vector-size [64 128 256] :walk-length [5 10 15] :walks-per-vertex [32 64 128] :angle [60 90 120] :selection-metrics [:precision :accuracy :recall :f1]}) </pre> </div> <pre class="example"> {:precision {:window-size 15, :vector-size 128, :walk-length 5, :walks-per-vertex 32, :angle 60, :score 0.4397163}, :accuracy {:window-size 15, :vector-size 256, :walk-length 5, :walks-per-vertex 32, :angle 60, :score 0.41875}, :recall {:window-size 10, :vector-size 64, :walk-length 5, :walks-per-vertex 32, :angle 90, :score 1.0}, :f1 {:window-size 15, :vector-size 256, :walk-length 5, :walks-per-vertex 32, :angle 60, :score 0.58295965}} </pre> For the same <code>first-sense-context</code> disambiguation algorithm, once the key hyperparameters of the pipeline are optimized, then we endup with a <code>F1</code> score of <code>0.5829</code> which is an increase of <code>40.19%</code> over the baseline. <code>Recall</code> increases by <code>100.64%</code>, <code>accuracy</code> increases by <code>41.55%</code> and <code>precision</code> increases by <code>22.72%</code>. </div> </div> <div id="outline-container-org39b1a83" class="outline-4"> <h4 id="org39b1a83">Evaluation Without Target Word's Senses</h4> <div class="outline-text-4" id="text-org39b1a83"> Now let's re-run the procedure when we don't focus on the target word that we want to disambiguate to create the context vector. <div class="org-src-container"> <pre class="src src-clojure">(svm-grid-search graph :grid-parameters {:window-size [10 15 20] :vector-size [64 128 256] :walk-length [5 10 15] :walks-per-vertex [32 64 128] :angle [60 90 120] :selection-metrics [:precision :accuracy :recall :f1]}) </pre> </div> <pre class="example"> {:precision {:window-size 10, :vector-size 128, :walk-length 5, :walks-per-vertex 32, :angle 90, :score 0.38931298}, :accuracy {:window-size 10, :vector-size 128, :walk-length 5, :walks-per-vertex 32, :angle 120, :score 0.38125}, :recall {:window-size 10, :vector-size 64, :walk-length 5, :walks-per-vertex 32, :angle 120, :score 1.0}, :f1 {:window-size 10, :vector-size 128, :walk-length 5, :walks-per-vertex 32, :angle 120, :score 0.55203617}} </pre> In this case, the <code>F1</code> score increased by <code>32.76%</code> over the baseline. <code>Recall</code> increases by <code>238.07%</code>, <code>accuracy</code> increases by <code>28.87%</code> and <code>precision</code> increases by <code>8.65%</code>. Like we experienced previously, all of the results decreased compared to the version of the algorithm that included the sense of the target word in the calculation of the context vector. This option is not improving matters, and should be dropped from further consideration. </div> </div> <div id="outline-container-org23ea010" class="outline-4"> <h4 id="org23ea010">Evaluation With Multiple Contexts</h4> <div class="outline-text-4" id="text-org23ea010"> Finally let's evaluate the other modification we did that keeps the context vector that best matches one of the senses for the target word. <div class="org-src-container"> <pre class="src src-clojure">(svm-grid-search graph :grid-parameters {:window-size [10 15 20] :vector-size [64 128 256] :walk-length [5 10 15] :walks-per-vertex [32 64 128] :angle [60 90 120] :selection-metrics [:precision :accuracy :recall :f1]}) </pre> </div> <pre class="example"> {:precision {:window-size 15, :vector-size 64, :walk-length 5, :walks-per-vertex 32, :angle 60, :score 0.4566929}, :accuracy {:window-size 15, :vector-size 64, :walk-length 5, :walks-per-vertex 32, :angle 90, :score 0.44375}, :recall {:window-size 15, :vector-size 64, :walk-length 5, :walks-per-vertex 32, :angle 90, :score 0.9583333}, :f1 {:window-size 15, :vector-size 64, :walk-length 5, :walks-per-vertex 32, :angle 90, :score 0.6079295}} </pre> In this case, the <code>F1</code> score increased by <code>46.2%</code> over the baseline. <code>Recall</code> increases by <code>223.97%</code>, <code>accuracy</code> increases by <code>50%</code>, and <code>precision</code> increases by <code>27.44%</code>. Except for the <code>recall</code>, all the other metrics improved compared to the <code>first-sense-context</code> method. Though we previously found that there were no apparent improvements using this method against using the <code>first-sense-context</code> algorithm, when we look at the words that have multiple senses only, we see that there is a clear improvement on all the metrics except for the <code>recall</code>, which is slightly lower. This suggests that in a situation where multiple words have multiple possible senses, then this method should outperform the <code>first-sense-context</code> one. </div> </div> </div> </div> <div id="outline-container-org4a5d4ea" class="outline-2"> <h2 id="org4a5d4ea">Conclusion</h2> <div class="outline-text-2" id="text-org4a5d4ea"> Disambiguating tagged concepts from a knowledge graph such as KBpedia is not an easy task. However, if a coherent and consistent knowledge graph such as KBpedia is used as the starting point, along with existing machine learning algorithms such as DeepWalk, it is possible to obtain good disambiguation performance in a short period of time without the need to create extensive training sets. With the tests we performed in this article, we gained insight into the kinds of things that may have much influence on the disambiguation process. We also learned how different strategies could be tested to get the best results for a specific disambiguation task. Results from these disambiguation processes can easily become the input into other tasks such as a document classification. [Disambiguated], tagged concepts should be a better basis for classifying the documents. If the concepts are not disambiguated, then much noise is added, leading to lower performance of the classification task. There are two main advantages of using such a disambiguation system. The first advantage is that we have no need to create any kind of training set, which would obviously be quite expensive to create considering that hundreds of contexts for each concept within the knowledge graph would need to be disambiguated manually. With a knowledge graph of the size of KBpedia, that would mean a training set of about 3.8 million examples. But since the training takes place on the structure of the knowledge graph that already exists, there is no need to create such a training set. Also, different parts of the knowledge graph could be used to create different kinds of knowledge structures ([i.e.], in this article we used the sub-class-of paths within the knowledge structure, but we could have used different aspects, different relationship paths, etc.). Alternate knowledge structures may result in different graph embeddings that may perform better in different situations. The second advantage is that since we only use simple linear algebra to disambiguate the concepts, performance is really fast. On average, we can disambiguate a single word within <code>0.65</code> milliseconds which means that we can disambiguate about 1,538 words per second with this method on a commodity desktop computer. Many more experiments could be performed to increase the performance of these methods, while retaining the ease of model creation and the disambiguation speed. New context algorithms could be tested, bigger sliding windows, etc. New graph embedding creation algorithms could be tested. New DeepWalk graph walk algorithms could be used such as breadth and depth first searches, etc. </div> </div> <div id="footnotes"> <h2 class="footnotes">Footnotes</h2> <div id="text-footnotes"> <div class="footdef"><a id="fn.1" class="footnum" href="#fnr.1">1</a>Perozzi, B., Al-Rfou, R., & Skiena, S. (2014, August). Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 701-710). ACM. </div> </div> </div></div> <div class="col-md-2">   </div> </div> </div>  <div class="footer-v1"> <div class="footer"> <div class="container"> <div class="row">  <div class="col-md-3 md-margin-bottom-40"> <table> <tbody><tr> <td> <a href="/"> <img id="logo-footer" class="footer-logo" src="/imgs/logo-simple-purple.png" alt="" name="logo-footer"> </a> </td> </tr> <tr> <td> <center> KBpedia </center> </td> </tr> </tbody></table> KBpedia exploits large-scale knowledge bases and semantic technologies for machine learning, data interoperability and mapping, and fact extraction and tagging. </div>   <div class="col-md-3 md-margin-bottom-40"> <div class="posts"> <div class="headline"> <h2> Latest News </h2> </div> <ul class="list-unstyled latest-list">  <li> <a href="https://kbpedia.org/resources/news/kbpedia-adds-ecommerce/">KBpedia Adds Major eCommerce Capabilities</a> 06/15/2020 </li>  <li> <a href="http://kbpedia.org/resources/news/kbpedia-continues-quality-improvements/">KBpedia Continues Quality Improvements</a> 12/04/2019 </li>  <li> <a href="http://kbpedia.org/resources/news/wikidata-coverage-nearly-complete/">Wikidata Coverage Nearly Complete (98%)</a> 04/08/2019 </li> </ul> </div> </div> <div class="col-md-3 md-margin-bottom-40"> <div class="headline"> <h2> Other Resources </h2> </div> <ul class="list-unstyled link-list"> <li> <a href="/resources/about/">About</a> </li> <li> <a href="/resources/faq/">FAQ</a> </li> <li> <a href="/resources/news/">News</a> </li> <li> <a href="/use-cases/">Use Cases</a> </li> <li> <a href="/resources/documentation/">Documentation</a> </li> <li> <a href="/resources/privacy/">Privacy</a> </li> <li> <a href="/resources/terms-of-use/">Terms of Use</a> </li> </ul> </div>  <div class="col-md-3 map-img md-margin-bottom-40"> <div class="headline"> <h2> Contact Us </h2> </div> <address class="md-margin-bottom-40"> c/o <a href="mailto:info@mkbergman.com?subject=KBpedia%20Inquiry">Michael K. Bergman</a> 380 Knowling Drive Coralville, IA 52241 U.S.A. Voice: +1 319 621 5225 </address> </div>  </div> </div> </div> <div class="copyright"> <div class="container"> <div class="row"> <div class="col-md-7"> 2016-2022 漏 <a href="http://kbpedia.org" style="font-size: 10px;">Michael K. Bergman.</a> All Rights Reserved. </div>  <div class="col-md-5"> <ul class="footer-socials list-inline"> <li> <a href="/resources/feeds/news.xml" class="tooltips" data-toggle="tooltip" data-placement="top" title="" data-original-title="RSS feed"> </a> </li> <li> <a href="http://github.com/Cognonto" class="tooltips" data-toggle="tooltip" data-placement="top" title="" data-original-title="Github"> </a> </li> <li> <a href="http://twitter.com/cognonto" class="tooltips" data-toggle="tooltip" data-placement="top" title="" data-original-title="Twitter"> </a> </li> </ul> </div>  </div> </div> </div> </div>   <script type="text/javascript" src="/assets/plugins/jquery/jquery.min.js"></script> <script type="text/javascript" src="/assets/plugins/jquery/jquery-migrate.min.js"></script> <script type="text/javascript" src="/assets/plugins/bootstrap/js/bootstrap.min.js"></script>  <script type="text/javascript" src="/assets/plugins/back-to-top.js"></script>  <script type="text/javascript" src="/assets/js/custom.js"></script>  <script type="text/javascript" src="/assets/js/app.js"></script>  <script type="text/javascript" src="/assets/plugins/smoothScroll.js"></script> <script type="text/javascript" src="/assets/plugins/owl-carousel/owl-carousel/owl.carousel.js"></script> <script type="text/javascript" src="/assets/plugins/layer-slider/layerslider/js/greensock.js"></script> <script type="text/javascript" src="/assets/plugins/layer-slider/layerslider/js/layerslider.transitions.js"></script> <script type="text/javascript" src="/assets/plugins/layer-slider/layerslider/js/layerslider.kreaturamedia.jquery.js"></script>  <script type="text/javascript" src="/assets/js/custom.js"></script>  <script type="text/javascript" src="/assets/js/plugins/layer-slider.js"></script> <script type="text/javascript" src="/assets/js/plugins/style-switcher.js"></script> <script type="text/javascript" src="/assets/js/plugins/owl-carousel.js"></script> <script type="text/javascript" src="/assets/js/plugins/owl-recent-works.js"></script> <script type="text/javascript"> jQuery(document).ready(function() { App.init(); LayerSlider.initLayerSlider(); StyleSwitcher.initStyleSwitcher(); OwlCarousel.initOwlCarousel(); OwlRecentWorks.initOwlRecentWorksV2(); }); </script>   <script> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','https://www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-84405507-1', 'auto'); ga('send', 'pageview'); </script>

CINXE.COM

Tagging and Disambiguating KBpedia Knowledge Graph Concepts