Measuring the Structural Similarity of Web-based Documents: A Novel Approach
<!DOCTYPE html> <html lang="en" dir="ltr"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="google-site-verification" content="5fPGCLllnWrvFxH9QWI0l1TadV7byeEvfPcyK2VkS_s"/> <meta name="google-site-verification" content="Rp5zp04IKW-s1IbpTOGB7Z6XY60oloZD5C3kTM-AiY4"/> <meta name="generator" content="InvenioRDM 13.0"/> <meta name="robots" content="noindex, nofollow"> <meta name="description" content="Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measure the similarity in the framework of the well-known vector space model. In contrast to these, we present a new approach to measuring the structural similarity of web-based documents represented by so-called generalized trees, which are more general than DOM-Trees; the latter represent only directed rooted trees. We design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings, whose components represent structural properties of the graph. The similarity of two graphs is then defined in terms of the optimal alignment of the underlying property strings. In this paper we apply the well-known technique of sequence alignments to solve a novel and challenging problem: measuring the structural similarity of generalized trees. More precisely, we first transform our graphs, considered as high-dimensional objects, into linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem into a string similarity problem. 
We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents." /> <meta name="citation_title" content="Measuring the Structural Similarity of Web-based Documents: A Novel Approach" /> <meta name="citation_doi" content="10.5281/zenodo.1086031" /> <meta name="citation_keywords" content="Graph similarity" /> <meta name="citation_keywords" content="hierarchical and directed graphs" /> <meta name="citation_keywords" content="hypertext" /> <meta name="citation_keywords" content="generalized trees" /> <meta name="citation_keywords" content="web structure mining" /> <meta name="citation_abstract_html_url" content="https://zenodo.org/records/1086031" /> <meta property="og:title" content="Measuring the Structural Similarity of Web-based Documents: A Novel Approach" /> <meta property="og:description" content="Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measure the similarity in the framework of the well-known vector space model. In contrast to these, we present a new approach to measuring the structural similarity of web-based documents represented by so-called generalized trees, which are more general than DOM-Trees; the latter represent only directed rooted trees. We design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings, whose components represent structural properties of the graph. The similarity of two graphs is then defined in terms of the optimal alignment of the underlying property strings. In this paper we apply the well-known technique of sequence alignments to solve a novel and challenging problem: measuring the structural similarity of generalized trees. 
More precisely, we first transform our graphs, considered as high-dimensional objects, into linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem into a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents." /> <meta property="og:url" content="https://zenodo.org/records/1086031" /> <meta property="og:site_name" content="Zenodo" /> <meta name="twitter:card" content="summary" /> <meta name="twitter:site" content="@zenodo_org" /> <meta name="twitter:title" content="Measuring the Structural Similarity of Web-based Documents: A Novel Approach" /> <meta name="twitter:description" content="Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measure the similarity in the framework of the well-known vector space model. In contrast to these, we present a new approach to measuring the structural similarity of web-based documents represented by so-called generalized trees, which are more general than DOM-Trees; the latter represent only directed rooted trees. We design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings, whose components represent structural properties of the graph. The similarity of two graphs is then defined in terms of the optimal alignment of the underlying property strings. In this paper we apply the well-known technique of sequence alignments to solve a novel and challenging problem: measuring the structural similarity of generalized trees. 
More precisely, we first transform our graphs, considered as high-dimensional objects, into linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem into a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents." /> <meta name="citation_pdf_url" content="https://zenodo.org/records/1086031/files/15928.pdf"/> <link rel="alternate" type="application/pdf" href="https://zenodo.org/records/1086031/files/15928.pdf"> <link rel="canonical" href="https://zenodo.org/records/1086031"> <title>Measuring the Structural Similarity of Web-based Documents: A Novel Approach</title> <link rel="shortcut icon" type="image/x-icon" href="/static/favicon.ico"/> <link rel="apple-touch-icon" sizes="120x120" href="/static/apple-touch-icon-120.png"/> <link rel="apple-touch-icon" sizes="152x152" href="/static/apple-touch-icon-152.png"/> <link rel="apple-touch-icon" sizes="167x167" href="/static/apple-touch-icon-167.png"/> <link rel="apple-touch-icon" sizes="180x180" href="/static/apple-touch-icon-180.png"/> <link rel="stylesheet" href="/static/dist/css/3526.0d9b3c8be998e2e93a52.css" /> <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> <!--[if lt IE 9]> <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> <![endif]--> </head> <body data-invenio-config='{"isMathJaxEnabled": "//cdnjs.cloudflare.com/ajax/libs/mathjax/3.2.2/es5/tex-mml-chtml.js?config=TeX-AMS-MML_HTMLorMML"}' itemscope itemtype="http://schema.org/WebPage" data-spy="scroll" data-target=".scrollspy-target"> <a id="skip-to-main" class="ui button primary ml-5 mt-5 skip-link" href="#main">Skip to main</a> 
<!--[if lt IE 8]> <p class="browserupgrade">You are using an <strong>outdated</strong> browser. Please <a href="http://browsehappy.com/">upgrade your browser</a> to improve your experience.</p> <![endif]--> <div> <header class="theme header"> <div class="outer-navbar"> <div class="ui container invenio-header-container"> <nav id="invenio-nav" class="ui inverted menu borderless p-0"> <div class="item logo p-0"> <a class="logo-link" href="/"> <img class="ui image rdm-logo" src="/static/images/invenio-rdm.svg" alt="Zenodo home"/> </a> </div> <div id="rdm-burger-toggle"> <button id="rdm-burger-menu-icon" class="ui button transparent" aria-label="Menu" aria-haspopup="menu" aria-expanded="false" aria-controls="invenio-menu" > <span class="navicon" aria-hidden="true"></span> </button> </div> <nav id="invenio-menu" aria-labelledby="rdm-burger-menu-icon" class="ui fluid menu borderless mobile-hidden" > <button id="rdm-close-burger-menu-icon" class="ui button transparent" aria-label="Close menu" > <span class="navicon" aria-hidden="true"></span> </button> <div class="item p-0 search-bar"> <div id="header-search-bar" data-options='[{"key": "communities", "text": "In this community", "value": "/communities/waset/records"}, {"key": "records", "text": "All Zenodo", "value": "/search"}]'> <div class="ui fluid search"> <div class="ui icon input"> <input autocomplete="off" aria-label="Search records" placeholder="Search records..." 
type="text" tabindex="0" class="prompt" value="" > <i aria-hidden="true" class="search icon"></i> </div> </div> </div> </div> <div class="item"> <a href="/communities">Communities</a> </div> <div class="item"> <a href="/me/uploads">My dashboard</a> </div> <div class="right menu item"> <form> <a href="/login/?next=%2Frecords%2F1086031" class="ui button auth-button" aria-busy="false" aria-live="polite" aria-label="Log in" > <i class="sign-in icon auth-icon" aria-hidden="true"></i> Log in </a> <a href="/signup/" class="ui button signup"> <i class="edit outline icon"></i> Sign up </a> </form> </div> </nav> </nav> </div> </header> </div> <main id="main"> <div class="invenio-page-body"> <section id="banners" class="banners" aria-label="Information banner"> <!-- COMMUNITY HEADER: hide it when displaying the submission request --> <div class="ui fluid container page-subheader-outer with-submenu compact ml-0-mobile mr-0-mobile"> <div class="ui container page-subheader"> <div class="page-subheader-element"> <img class="ui rounded image community-header-logo" src="https://zenodo.org/api/communities/a59dd046-9a86-4a47-97ce-b51f1bb8fc3f/logo" alt="" /> </div> <div class="page-subheader-element"> <div class="ui header"> <a href="/communities/waset/records" class="ui small header"> World Academy of Science, Engineering and Technology </a> <!-- Show the icon for subcommunities --> </div> </div> </div> </div> <!-- /COMMUNITY HEADER --> <!-- PREVIEW HEADER --> <!-- /PREVIEW HEADER --> </section> <div class="ui container"> <div class="ui relaxed grid mt-5"> <div class="two column row top-padded"> <article class="sixteen wide tablet eleven wide computer column main-record-content"> <section id="record-info" aria-label="Publication date and version number"> <div class="ui grid middle aligned"> <div class="two column row"> <div class="left floated left aligned column"> <span class="ui" title="Publication date"> Published October 20, 2007 </span> <span class="label text-muted"> | Version 
15928</span> </div> <div class="right floated right aligned column"> <span role="note" class="ui label horizontal small neutral mb-5" aria-label="Resource type" > Journal article </span> <span role="note" class="ui label horizontal small access-status open mb-5" data-tooltip="The record and files are publicly accessible." data-inverted="" aria-label="Access status" > <i class="icon unlock" aria-hidden="true"></i> <span aria-label="The record and files are publicly accessible."> Open </span> </span> </div> </div> </div> </section> <div class="ui divider hidden"></div><section id="record-title-section" aria-label="Record title and creators"> <h1 id="record-title" class="wrap-overflowing-text">Measuring the Structural Similarity of Web-based Documents: A Novel Approach</h1> <section id="creatibutors" aria-label="Creators and contributors"> <div class="ui grid"> <div class="row ui accordion affiliations"> <div class="sixteen wide mobile twelve wide tablet thirteen wide computer column"> <h3 class="sr-only">Creators</h3> <ul class="creatibutors"> <li class="creatibutor-wrap separated"> <a class="ui creatibutor-link" href="/search?q=metadata.creators.person_or_org.name%3A%22Matthias+Dehmer%22" > <span class="creatibutor-name">Matthias Dehmer</span></a> </li> <li class="creatibutor-wrap separated"> <a class="ui creatibutor-link" href="/search?q=metadata.creators.person_or_org.name%3A%22Frank+Emmert+Streib%22" > <span class="creatibutor-name">Frank Emmert Streib</span></a> </li> <li class="creatibutor-wrap separated"> <a class="ui creatibutor-link" href="/search?q=metadata.creators.person_or_org.name%3A%22Alexander+Mehler%22" > <span class="creatibutor-name">Alexander Mehler</span></a> </li> <li class="creatibutor-wrap separated"> <a class="ui creatibutor-link" href="/search?q=metadata.creators.person_or_org.name%3A%22J%C3%BCrgen+Kilian%22" > <span class="creatibutor-name">Jürgen Kilian</span></a> </li> </ul> </div> </div> </div> </section> </section> <section 
id="description" class="rel-mt-2 rich-input-content" aria-label="Record description"> <h2 id="description-heading" class="sr-only">Description</h2> <div style="word-wrap: break-word;"> <p><p>Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measure the similarity in the framework of the well-known vector space model. In contrast to these, we present a new approach to measuring the structural similarity of web-based documents represented by so-called generalized trees, which are more general than DOM-Trees; the latter represent only directed rooted trees. We design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings, whose components represent structural properties of the graph. The similarity of two graphs is then defined in terms of the optimal alignment of the underlying property strings. In this paper we apply the well-known technique of sequence alignments to solve a novel and challenging problem: measuring the structural similarity of generalized trees. More precisely, we first transform our graphs, considered as high-dimensional objects, into linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem into a string similarity problem. 
We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.</p></p> </div> </section> <section id="record-files" class="rel-mt-2 rel-mb-3" aria-label="Files" ><h2 id="files-heading">Files</h2> <div class="ui accordion panel mb-10 open" href="#files-preview-accordion-panel"> <h3 class="active title panel-heading open m-0"> <div role="button" id="files-preview-accordion-trigger" aria-controls="files-preview-accordion-panel" aria-expanded="true" tabindex="0" class="trigger" aria-label="File preview" > <span id="preview-file-title">15928.pdf</span> <i class="angle right icon" aria-hidden="true"></i> </div> </h3> <div role="region" id="files-preview-accordion-panel" aria-labelledby="files-preview-accordion-trigger" class="active content preview-container pt-0 open" > <div> <iframe title="Preview" class="preview-iframe" id="preview-iframe" name="preview-iframe" src="/records/1086031/preview/15928.pdf?include_deleted=0" > </iframe> </div> </div> </div> <div class="ui accordion panel mb-10 open" href="#files-list-accordion-panel"> <h3 class="active title panel-heading open m-0"> <div role="button" id="files-list-accordion-trigger" aria-controls="files-list-accordion-panel" aria-expanded="true" tabindex="0" class="trigger"> Files <small class="text-muted"> (1.9 MB)</small> <i class="angle right icon" aria-hidden="true"></i> </div> </h3> <div role="region" id="files-list-accordion-panel" aria-labelledby="files-list-accordion-trigger" class="active content pt-0"> <div> <table class="ui striped table files fluid open"> <thead> <tr> <th>Name</th> <th>Size</th> <th class> <a role="button" class="ui compact mini button right floated archive-link" href="https://zenodo.org/api/records/1086031/files-archive"> <i class="file archive icon button" aria-hidden="true"></i> Download all </a> </th> </tr> </thead> <tbody> <tr> <td class="ten wide"> <div> <a 
href="/records/1086031/files/15928.pdf?download=1">15928.pdf</a> </div> <small class="ui text-muted font-tiny">md5:ae4213def0553785a81da08353f92265 <div class="ui icon inline-block" data-tooltip="This is the file fingerprint (checksum), which can be used to verify the file integrity."> <i class="question circle checksum icon"></i> </div> </small> </td> <td>1.9 MB</td> <td class="right aligned"> <span> <a role="button" class="ui compact mini button preview-link" href="/records/1086031/preview/15928.pdf?include_deleted=0" target="preview-iframe" data-file-key="15928.pdf"> <i class="eye icon" aria-hidden="true"></i>Preview </a> <a role="button" class="ui compact mini button" href="/records/1086031/files/15928.pdf?download=1"> <i class="download icon" aria-hidden="true"></i>Download </a> </span> </td> </tr> </tbody> </table> </div> </div> </div> </section> <section id="additional-details" class="rel-mt-2" aria-label="Additional record details"> <h2 id="record-details-heading">Additional details</h2> <div class="ui divider"></div> <div class="ui fluid accordion padded grid rel-mb-1"> <div class="active title sixteen wide mobile four wide tablet three wide computer column"> <h3 class="ui header"> <div id="references-accordion-trigger" role="button" tabindex="0" aria-expanded="true" aria-controls="references-panel" class="trigger" > <i class="caret right icon" aria-hidden="true"></i>References </div> </h3> </div> <div id="references-panel" role="region" aria-labelledby="references-accordion-trigger" class="active content sixteen wide mobile twelve wide tablet thirteen wide computer column" > <ul class="ui bulleted list details-list"> <li class="item">R. Bellman, Dynamic Programming. Princeton University Press, 1957</li> <li class="item">R. A. Botafogo, B. Shneiderman: Structural analysis of hypertexts: Identifying hierarchies and useful metrics, ACM Trans. Inf. Syst. 10 (2), 1992, 142-180</li> <li class="item">S. Chakrabarti: Mining the Web. 
Discovering Knowledge from Hypertext Data, Morgan Kaufmann Publishers, 2003</li> <li class="item">S. Chakrabarti: Integrating the document object model with hyperlinks for enhanced topic distillation and information extraction, Proc. of the 10th International World Wide Web Conference, Hong Kong, 2001, 211-220</li> <li class="item">I. F. Cruz, S. Borisov, M. A. Marks, T. R. Webb: Measuring Structural Similarity Among Web Documents: Preliminary Results, Lecture Notes in Computer Science, Vol. 1375, 1998</li> <li class="item">M. Dehmer: Strukturelle Analyse web-basierter Dokumente, Ph.D. Thesis, Department of Computer Science, Technische Universität Darmstadt, 2005, unpublished</li> <li class="item">M. Dehmer, R. Gleim, A. Mehler: Aspekte der Kategorisierung von Webseiten, GI-Edition - Lecture Notes in Informatics (LNI) - Proceedings, Jahrestagung der Gesellschaft für Informatik, Informatik 2004, Ulm/Germany, 2004, 39-43</li> <li class="item">R. Gleim: HyGraph - Ein Framework zur Extraktion, Repräsentation und Analyse webbasierter Hypertextstrukturen, Beiträge zur GLDV-Tagung 2005, Bonn/Germany, 2005</li> <li class="item">D. Gusfield: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997</li> <li class="item">T. Jiang, L. Wang, K. Zhang: Alignment of trees - An alternative to tree edit, Theoretical Computer Science, Elsevier, Vol. 143, 1995, 137-148</li> <li class="item">S. Joshi, N. Agrawal, R. Krishnapuram, S. Negi: Bag of Paths Model for Measuring Structural Similarity in Web Documents, Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2003, 577-582</li> <li class="item">A. Mehler: Textbedeutung. Zur prozeduralen Analyse und Repräsentation struktureller Ähnlichkeiten von Texten, Peter Lang, Europäischer Verlag der Wissenschaften, 2001</li> <li class="item">A. Mehler, M. Dehmer, R. Gleim: Towards logical hypertext structure. A graph-theoretic perspective, Proc. 
of I2CS-04, Guadalajara/Mexico, Lecture Notes in Computer Science, Berlin-New York: Springer, 2004</li> <li class="item">A. Mehler, R. Gleim, M. Dehmer: Towards structure-sensitive hypertext categorization, to appear in: Proceedings of the 29th Annual Conference of the German Classification Society, 2005</li> <li class="item">S. M. Selkow: The tree-to-tree editing problem, Information Processing Letters, Vol. 6 (6), 1977, 184-186</li> <li class="item">T. F. Smith, M. S. Waterman: Identification of common molecular subsequences, Journal of Molecular Biology, Vol. 147 (1), 1981, 195-197</li> <li class="item">F. Sobik: Graphmetriken und Klassifikation strukturierter Objekte, ZKI-Informationen, Akad. Wiss. DDR, Vol. 2 (82), 1982, 63-122</li> <li class="item">J. R. Ullmann: An algorithm for subgraph isomorphism, J. ACM, Vol. 23 (1), 1976, 31-42</li> <li class="item">P. H. Winne, L. Gupta, J. C. Nesbit: Exploring individual differences in studying strategies using graph theoretic statistics, The Alberta Journal of Educational Research, Vol. 40, 1994, 177-193</li> <li class="item">A. Winter: Exchanging Graphs with GXL, <a href="http://www.gupro.de/GXL" rel="noopener">http://www.gupro.de/GXL</a></li> <li class="item">Y. Yang, S. Slattery, R. Ghani: A study of approaches to hypertext categorization, Journal of Intelligent Information Systems, Vol. 18 (2-3), 2002, 219-241</li> <li class="item">K. Zhang, D. Shasha: Simple fast algorithms for the editing distance between trees and related problems, SIAM Journal on Computing, Vol. 18 (6), 1989, 1245-1262</li> <li class="item">B. Zelinka: On a certain distance between isomorphism classes of graphs, Časopis pro pěst. matematiky, Vol. 
100, 1975, 371-373</li> </ul> </div> </div> <div class="ui divider"></div> </section> <section id="citations-search" data-record-pids='{"doi": {"client": "datacite", "identifier": "10.5281/zenodo.1086031", "provider": "datacite"}, "oai": {"identifier": "oai:zenodo.org:1086031", "provider": "oai"}}' data-record-parent-pids='{"doi": {"client": "datacite", "identifier": "10.5281/zenodo.1086030", "provider": "datacite"}}' data-citations-endpoint="https://zenodo-broker.web.cern.ch/api/relationships" aria-label="Record citations" class="rel-mb-1" > </section> </article> <aside class="sixteen wide tablet five wide computer column sidebar" aria-label="Record details"> <section id="metrics" aria-label="Metrics" class="ui segment rdm-sidebar sidebar-container"> <div class="ui tiny two statistics rel-mt-1"> <div class="ui statistic"> <div class="value">70</div> <div class="label"> <i aria-hidden="true" class="eye icon"></i> Views </div> </div> <div class="ui statistic"> <div class="value">29</div> <div class="label"> <i aria-hidden="true" class="download icon"></i> Downloads </div> </div> </div> <div class="ui accordion rel-mt-1 centered"> <div class="title"> <i class="caret right icon" aria-hidden="true"></i> <span tabindex="0" class="trigger" data-open-text="Show more details" data-close-text="Show less details" > Show more details </span> </div> <div class="content"> <table id="record-statistics" class="ui definition table fluid"> <thead> <tr> <th></th> <th class="right aligned">All versions</th> <th class="right aligned">This version</th> </tr> </thead> <tbody> <tr> <td> Views <i tabindex="0" role="button" style="position:relative" class="popup-trigger question circle small icon" aria-expanded="false" aria-label="More info" data-variation="mini inverted" > </i> <p role="tooltip" class="popup-content ui flowing popup transition hidden"> Total views </p> </td> <td data-label="All versions" class="right aligned"> 70 </td> <td data-label="This version" class="right aligned"> 
70 </td> </tr> <tr> <td> Downloads <i tabindex="0" role="button" style="position:relative" class="popup-trigger question circle small icon" aria-expanded="false" aria-label="More info" data-variation="mini inverted" > </i> <p role="tooltip" class="popup-content ui flowing popup transition hidden"> Total downloads </p> </td> <td data-label="All versions" class="right aligned"> 29 </td> <td data-label="This version" class="right aligned"> 29 </td> </tr> <tr> <td> Data volume <i tabindex="0" role="button" style="position:relative" class="popup-trigger question circle small icon" aria-expanded="false" aria-label="More info" data-variation="mini inverted" > </i> <p role="tooltip" class="popup-content ui flowing popup transition hidden"> Total data volume </p> </td> <td data-label="All versions" class="right aligned">59.5 MB</td> <td data-label="This version" class="right aligned">59.5 MB</td> </tr> </tbody> </table> <p class="text-align-center rel-mt-1"> <small> <a href="/help/statistics">More info on how stats are collected....</a> </small> </p> </div> </div> </section> <div class="sidebar-container"> <h2 class="ui medium top attached header mt-0">Versions</h2> <div id="record-versions" class="ui segment rdm-sidebar bottom attached pl-0 pr-0 pt-0"> <div class="versions"> <div id="recordVersions" data-record='{"access": {"embargo": {"active": false, "reason": null}, "files": "public", "record": "public", "status": "open"}, "created": "2018-01-17T09:24:37.141964+00:00", "custom_fields": {}, "deletion_status": {"is_deleted": false, "status": "P"}, "expanded": {"parent": {"access": {"owned_by": {"active": null, "blocked_at": null, "confirmed_at": null, "email": "", "id": "32148", "is_current_user": false, "links": {"avatar": "https://zenodo.org/api/users/32148/avatar.svg", "records_html": "https://zenodo.org/search/records?q=parent.access.owned_by.user:32148", "self": "https://zenodo.org/api/users/32148"}, "profile": {"affiliations": "", "full_name": ""}, "username": 
"WASET", "verified_at": null}}, "communities": {"default": {"access": {"review_policy": "open", "visibility": "public"}, "id": "a59dd046-9a86-4a47-97ce-b51f1bb8fc3f", "links": {"logo": "https://zenodo.org/api/communities/a59dd046-9a86-4a47-97ce-b51f1bb8fc3f/logo"}, "metadata": {"description": "", "title": "World Academy of Science, Engineering and Technology", "type": null}, "slug": "waset"}}}}, "files": {"count": 1, "enabled": true, "entries": {"15928.pdf": {"access": {"hidden": false}, "checksum": "md5:ae4213def0553785a81da08353f92265", "ext": "pdf", "id": "04964632-d420-40ce-93ae-bb0a79ed3925", "key": "15928.pdf", "links": {"content": "https://zenodo.org/api/records/1086031/files/15928.pdf/content", "iiif_api": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/full/full/0/default.png", "iiif_base": "https://zenodo.org/api/iiif/record:1086031:15928.pdf", "iiif_canvas": "https://zenodo.org/api/iiif/record:1086031/canvas/15928.pdf", "iiif_info": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/info.json", "self": "https://zenodo.org/api/records/1086031/files/15928.pdf"}, "metadata": null, "mimetype": "application/pdf", "size": 1920335, "storage_class": "L"}}, "order": [], "total_bytes": 1920335}, "id": "1086031", "is_draft": false, "is_published": true, "links": {"access": "https://zenodo.org/api/records/1086031/access", "access_grants": "https://zenodo.org/api/records/1086031/access/grants", "access_links": "https://zenodo.org/api/records/1086031/access/links", "access_request": "https://zenodo.org/api/records/1086031/access/request", "access_users": "https://zenodo.org/api/records/1086031/access/users", "archive": "https://zenodo.org/api/records/1086031/files-archive", "archive_media": "https://zenodo.org/api/records/1086031/media-files-archive", "communities": "https://zenodo.org/api/records/1086031/communities", "communities-suggestions": "https://zenodo.org/api/records/1086031/communities-suggestions", "doi": "https://doi.org/10.5281/zenodo.1086031", 
"draft": "https://zenodo.org/api/records/1086031/draft", "files": "https://zenodo.org/api/records/1086031/files", "latest": "https://zenodo.org/api/records/1086031/versions/latest", "latest_html": "https://zenodo.org/records/1086031/latest", "media_files": "https://zenodo.org/api/records/1086031/media-files", "parent": "https://zenodo.org/api/records/1086030", "parent_doi": "https://doi.org/10.5281/zenodo.1086030", "parent_doi_html": "https://zenodo.org/doi/10.5281/zenodo.1086030", "parent_html": "https://zenodo.org/records/1086030", "requests": "https://zenodo.org/api/records/1086031/requests", "reserve_doi": "https://zenodo.org/api/records/1086031/draft/pids/doi", "self": "https://zenodo.org/api/records/1086031", "self_doi": "https://doi.org/10.5281/zenodo.1086031", "self_doi_html": "https://zenodo.org/doi/10.5281/zenodo.1086031", "self_html": "https://zenodo.org/records/1086031", "self_iiif_manifest": "https://zenodo.org/api/iiif/record:1086031/manifest", "self_iiif_sequence": "https://zenodo.org/api/iiif/record:1086031/sequence/default", "thumbnails": {"10": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/full/^10,/0/default.jpg", "100": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/full/^100,/0/default.jpg", "1200": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/full/^1200,/0/default.jpg", "250": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/full/^250,/0/default.jpg", "50": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/full/^50,/0/default.jpg", "750": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/full/^750,/0/default.jpg"}, "versions": "https://zenodo.org/api/records/1086031/versions"}, "media_files": {"count": 1, "enabled": true, "entries": {"15928.pdf.ptif": {"access": {"hidden": true}, "ext": "ptif", "id": "a00751c2-d22c-473f-9607-6cb75a5b1dc4", "key": "15928.pdf.ptif", "links": {"content": "https://zenodo.org/api/records/1086031/files/15928.pdf.ptif/content", "self": 
"https://zenodo.org/api/records/1086031/files/15928.pdf.ptif"}, "metadata": null, "mimetype": "application/octet-stream", "processor": {"source_file_id": "04964632-d420-40ce-93ae-bb0a79ed3925", "status": "finished", "type": "image-tiles"}, "size": 0, "storage_class": "L"}}, "order": [], "total_bytes": 0}, "metadata": {"creators": [{"person_or_org": {"family_name": "Matthias Dehmer", "name": "Matthias Dehmer", "type": "personal"}}, {"person_or_org": {"family_name": "Frank Emmert Streib", "name": "Frank Emmert Streib", "type": "personal"}}, {"person_or_org": {"family_name": "Alexander Mehler", "name": "Alexander Mehler", "type": "personal"}}, {"person_or_org": {"family_name": "J\u00fcrgen Kilian", "name": "J\u00fcrgen Kilian", "type": "personal"}}], "description": "\u003cp\u003eMost known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measures the similarity in the framework of the well known vector space model. In contrast to these we present a new approach to measuring the structural similarity of web-based documents represented by so called generalized trees which are more general than DOM-Trees which represent only directed rooted trees.We will design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as strings of linear integers, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well known technique of sequence alignments to solve a novel and challenging problem: Measuring the structural similarity of generalized trees. More precisely, we first transform our graphs considered as high dimensional objects in linear structures. 
Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.\u003c/p\u003e", "languages": [{"id": "eng", "title": {"en": "English"}}], "publication_date": "2007-10-20", "publisher": "Zenodo", "references": [{"reference": "R. Bellman, Dynamic Programming. Princeton University Press, 1957"}, {"reference": "R. A. Botafogo, B. Shneiderman: Structural analysis of hypertexts:\nIdentifying hierarchies and useful metrics, ACM Trans. Inf. Syst. 10\n(2), 1992, 142-180"}, {"reference": "S. Chakrabarti: Mining the Web. Discovering Knowledge from Hypertext\nData, Morgen and Kaufmann Publishers, 2003"}, {"reference": "S. Chakrabarti: Integrating the document object model with hyperlinks\nfor enhanced topic distillation and information extraction, Proc. of the\n10th International World Wide Web Conference, Hong Kong, 2001, 211-\n220"}, {"reference": "I. F. Cruz, S. Borisov, M. A. Marks, T. R. Webb: Measuring Structural\nSimilarity Among Web Documents: Preliminary Results , Lecture Notes\nIn Computer Science, Vol. 1375, 1998"}, {"reference": "M. Dehmer, Strukturelle Analyse web-basierter Dokumente, Ph.D Thesis,\nDepartment of Computer Science, Technische Universit\u252c\u00bfat Darmstadt,\n2005, unpublished"}, {"reference": "M. Dehmer, R. Gleim, A. Mehler: Aspekte der Kategorisierung von\nWebseiten, GI-Edition - Lecture Notes in Informatics (LNI) - Proceedings,\nJahrestagung der Gesellschaft f\u252c\u00bfur Informatik, Informatik 2004,\nUlm/Germany, 2004, 39-43"}, {"reference": "R. 
Gleim: HyGraph - Ein Framework zur Extraktion, Repr\u252c\u00bfasentation\nund Analyse webbasierter Hypertextstrukturen, Beitr\u252c\u00bfage zur GLDVTagung\n2005, Bonn/Germany, 2005"}, {"reference": "D. Gusfield: Algorithms on Strings, Trees, and Sequences: Computer\nScience and Computational Biology, Cambridge University Press, 1997\n[10] T. Jiang, L. Wang, K. Zhang: Alignment of trees - An alternative to tree\nedit, Theoretical Computer Science, Elsevier, Vol. 143, 1995, 137-148\n[11] S. Joshi, N. Agrawal, R. Krishnapuram, S. Negi,: Bag of Paths Model\nfor Measuring Structural Similarity in Web Documents, Proceedings of\nthe ACM International Conference on Knowledge Discovery and Data\nMining (SIGKDD), 2003, 577-582.\n[12] Mehler A.: Textbedeutung. Zur prozeduralen Analyse und\nRepr\u252c\u00bfasentation struktureller \u252c\u00bfAhnlichkeiten von Texten, Peter Lang,\nEurop\u252c\u00bfaischer Verlag der Wissenschaften, 2001\n[13] A. Mehler, M. Dehmer, R. Gleim: Towards logical hypertext structure.\nA graph-theoretic perspective, Proc. of I2CS-04, Guadalajara/Mexico,\nLecture Notes in Computer Science, Berlin-New York: Springer, 2004\n[14] A. Mehler, R. Gleim, M. Dehmer: Towards structure-sensitive hypertext\ncategorization, to appear in: Proceedings of the 29-th Annual Conference\nof the German Classification Society, 2005\n[15] S. M. Selkow: The tree-to-tree editing problem, Information Processing\nLetters, Vol. 6 (6), 1977, 184-186\n[16] T. F. Smith, M. S. Waterman: Identification of common molecular\nsubsequences, Journal of Molecular Biology, Vol. 147 (1), 1981, 195-\n197\n[17] F. Sobik, Graphmetriken und Klassifikation strukturierter Objekte, ZKIInformationen,\nAkad. Wiss. DDR, Vol. 2 (82), 1982, 63-122\n[18] J. R. Ullman, An algorithm for subgraph isomorphism, J. ACM, Vol. 23\n(1), 1976, 31-42\n[19] P. H. Winne., L. Gupta, J. C. 
Nesbit: Exploring individual differences in\nstudying strategies using graph theoretic statistics, The Alberta Journal\nof Educational Research, Vol. 40, 1994, 177-193\n[20] A. Winter: Exchanching Graphs with GXL, http://www.gupro.\nde/GXL\n[21] Y. Yang, S. Slattery, R. Ghani: A study of approaches to hypertext\ncategorization, Journal of Intelligent Information Systems, Vol. 18 (2-3),\n2002, 219-241\n[22] K. Zhang, D. Shasha: Simple fast algorithms for the editing distance\nbetween trees and related problems, SIAM Journal of Computing, Vol.\n18 (6), 1989, 1245-1262\n[23] B. Zelinka, On a certain distance between isomorphism classes of\ngraphs, \u2566\u00e7 Casopis pro \u2566\u00e7pest. Mathematiky, Vol. 100, 1975, 371-373"}], "resource_type": {"id": "publication-article", "title": {"de": "Zeitschriftenartikel", "en": "Journal article"}}, "rights": [{"description": {"en": "The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited."}, "icon": "cc-by-icon", "id": "cc-by-4.0", "props": {"scheme": "spdx", "url": "https://creativecommons.org/licenses/by/4.0/legalcode"}, "title": {"en": "Creative Commons Attribution 4.0 International"}}], "subjects": [{"subject": "Graph similarity"}, {"subject": "hierarchical and directed graphs"}, {"subject": "hypertext"}, {"subject": "generalized trees"}, {"subject": "web structure mining."}], "title": "Measuring the Structural Similarity of Web-based Documents: A Novel Approach", "version": "15928"}, "parent": {"access": {"owned_by": {"user": "32148"}, "settings": {"accept_conditions_text": null, "allow_guest_requests": false, "allow_user_requests": false, "secret_link_expiration": 0}}, "communities": {"default": "a59dd046-9a86-4a47-97ce-b51f1bb8fc3f", "entries": [{"access": {"member_policy": "open", "members_visibility": "public", "record_submission_policy": "open", "review_policy": "open", "visibility": "public"}, "children": {"allow": 
false}, "created": "2017-05-31T21:24:26.028360+00:00", "custom_fields": {}, "deletion_status": {"is_deleted": false, "status": "P"}, "id": "a59dd046-9a86-4a47-97ce-b51f1bb8fc3f", "links": {}, "metadata": {"curation_policy": "", "description": "", "page": "", "title": "World Academy of Science, Engineering and Technology"}, "revision_id": 0, "slug": "waset", "updated": "2017-11-15T12:37:31.935319+00:00"}], "ids": ["a59dd046-9a86-4a47-97ce-b51f1bb8fc3f"]}, "id": "1086030", "pids": {"doi": {"client": "datacite", "identifier": "10.5281/zenodo.1086030", "provider": "datacite"}}}, "pids": {"doi": {"client": "datacite", "identifier": "10.5281/zenodo.1086031", "provider": "datacite"}, "oai": {"identifier": "oai:zenodo.org:1086031", "provider": "oai"}}, "revision_id": 8, "stats": {"all_versions": {"data_volume": 59530385.0, "downloads": 31, "unique_downloads": 29, "unique_views": 70, "views": 71}, "this_version": {"data_volume": 59530385.0, "downloads": 31, "unique_downloads": 29, "unique_views": 70, "views": 71}}, "status": "published", "ui": {"access_status": {"description_l10n": "The record and files are publicly accessible.", "embargo_date_l10n": null, "icon": "unlock", "id": "open", "message_class": "", "title_l10n": "Open"}, "created_date_l10n_long": "January 17, 2018", "creators": {"affiliations": [], "creators": [{"person_or_org": {"family_name": "Matthias Dehmer", "name": "Matthias Dehmer", "type": "personal"}}, {"person_or_org": {"family_name": "Frank Emmert Streib", "name": "Frank Emmert Streib", "type": "personal"}}, {"person_or_org": {"family_name": "Alexander Mehler", "name": "Alexander Mehler", "type": "personal"}}, {"person_or_org": {"family_name": "J\u00fcrgen Kilian", "name": "J\u00fcrgen Kilian", "type": "personal"}}]}, "custom_fields": {}, "description_stripped": "Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. 
Other methods measures the similarity in the framework of the well known vector space model. In contrast to these we present a new approach to measuring the structural similarity of web-based documents represented by so called generalized trees which are more general than DOM-Trees which represent only directed rooted trees.We will design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as strings of linear integers, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well known technique of sequence alignments to solve a novel and challenging problem: Measuring the structural similarity of generalized trees. More precisely, we first transform our graphs considered as high dimensional objects in linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem. 
We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.", "is_draft": false, "languages": [{"id": "eng", "title_l10n": "English"}], "publication_date_l10n_long": "October 20, 2007", "publication_date_l10n_medium": "Oct 20, 2007", "resource_type": {"id": "publication-article", "title_l10n": "Journal article"}, "rights": [{"description_l10n": "The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.", "icon": "cc-by-icon", "id": "cc-by-4.0", "props": {"scheme": "spdx", "url": "https://creativecommons.org/licenses/by/4.0/legalcode"}, "title_l10n": "Creative Commons Attribution 4.0 International"}], "updated_date_l10n_long": "August 2, 2024", "version": "15928"}, "updated": "2024-08-02T22:55:33.371537+00:00", "versions": {"index": 1, "is_latest": true}}' data-preview='false'> <div class="rel-p-1"></div> <div class="ui fluid placeholder rel-mr-1 rel-ml-1"></div> <div class="header"> <div class="line"></div> <div class="line"></div> <div class="line"></div> </div> </div> </div> </div> </div><div class="sidebar-container"> <h2 class="ui small top attached header">External resources</h2> <div id="external-resource" aria-label="External resources" class="ui bottom attached segment rdm-sidebar external resource"> <h3 class="ui small header">Indexed in</h3> <ul class="ui relaxed list no-bullet"> <li class="item flex align-items-center"> <img class="ui image" src="/static/images/openaire.svg" alt="" width="32"> <div class="content"> <a class="header" href="https://explore.openaire.eu/search/publication?pid=10.5281/zenodo.1086031" target="_blank" rel="noreferrer" >OpenAIRE </a> </div> </li></ul></div> </div><div id="sidebar-communities-manage" data-user-communities-memberships='{}' 
data-record-community-endpoint="https://zenodo.org/api/records/1086031/communities" data-record-community-search-endpoint="https://zenodo.org/api/records/1086031/communities-suggestions" data-record-user-community-search-endpoint="" data-can-manage-record='false' data-pending-communities-search-config='{"aggs": [{"aggName": "type", "field": "type", "title": "Type"}, {"aggName": "status", "field": "status", "title": "Status"}], "appId": "InvenioAppRdm.RecordRequests", "defaultSortingOnEmptyQueryString": [{"sortBy": "newest"}], "initialQueryState": {"filters": [], "hiddenParams": [["expand", "1"], ["is_open", "true"], ["type", "community-inclusion"], ["type", "community-submission"]], "layout": "list", "page": 1, "size": 10, "sortBy": "bestmatch"}, "layoutOptions": {"gridView": false, "listView": true}, "paginationOptions": {"defaultValue": 10, "maxTotalResults": 10000, "resultsPerPage": [{"text": "10", "value": 10}, {"text": "20", "value": 20}, {"text": "50", "value": 50}]}, "searchApi": {"axios": {"headers": {"Accept": "application/json"}, "url": "https://zenodo.org/api/records/1086031/requests", "withCredentials": true}, "invenio": {"requestSerializer": "InvenioRecordsResourcesRequestSerializer"}}, "sortOptions": [{"sortBy": "bestmatch", "text": "Best match"}, {"sortBy": "newest", "text": "Newest"}, {"sortBy": "oldest", "text": "Oldest"}], "sortOrderDisabled": true}' data-record-community-search-config='{"aggs": [{"aggName": "type", "field": "type", "title": "Type"}, {"aggName": "funder", "field": "metadata.funding.funder", "title": "Funders"}, {"aggName": "organization", "field": "metadata.organizations", "title": "Organizations"}], "appId": "InvenioAppRdm.RecordCommunitiesSuggestions", "defaultSortingOnEmptyQueryString": [{"sortBy": "newest"}], "initialQueryState": {"filters": [], "hiddenParams": null, "layout": "list", "page": 1, "size": 10, "sortBy": "bestmatch"}, "layoutOptions": {"gridView": false, "listView": true}, "paginationOptions": {"defaultValue": 10, 
"maxTotalResults": 10000, "resultsPerPage": [{"text": "10", "value": 10}, {"text": "20", "value": 20}]}, "searchApi": {"axios": {"headers": {"Accept": "application/vnd.inveniordm.v1+json"}, "url": "https://zenodo.org/api/records/1086031/communities-suggestions", "withCredentials": true}, "invenio": {"requestSerializer": "InvenioRecordsResourcesRequestSerializer"}}, "sortOptions": [{"sortBy": "bestmatch", "text": "Best match"}, {"sortBy": "newest", "text": "Newest"}, {"sortBy": "oldest", "text": "Oldest"}], "sortOrderDisabled": true}' data-record-user-community-search-config='{"aggs": [{"aggName": "type", "field": "type", "title": "Type"}, {"aggName": "funder", "field": "metadata.funding.funder", "title": "Funders"}, {"aggName": "organization", "field": "metadata.organizations", "title": "Organizations"}], "appId": "InvenioAppRdm.RecordUserCommunitiesSuggestions", "defaultSortingOnEmptyQueryString": [{"sortBy": "newest"}], "initialQueryState": {"filters": [], "hiddenParams": [["membership", "true"]], "layout": "list", "page": 1, "size": 10, "sortBy": "bestmatch"}, "layoutOptions": {"gridView": false, "listView": true}, "paginationOptions": {"defaultValue": 10, "maxTotalResults": 10000, "resultsPerPage": [{"text": "10", "value": 10}, {"text": "20", "value": 20}]}, "searchApi": {"axios": {"headers": {"Accept": "application/vnd.inveniordm.v1+json"}, "url": "https://zenodo.org/api/records/1086031/communities-suggestions", "withCredentials": true}, "invenio": {"requestSerializer": "InvenioRecordsResourcesRequestSerializer"}}, "sortOptions": [{"sortBy": "bestmatch", "text": "Best match"}, {"sortBy": "newest", "text": "Newest"}, {"sortBy": "oldest", "text": "Oldest"}], "sortOrderDisabled": true}' data-permissions='{"can_edit": false, "can_manage": false, "can_media_read_files": true, "can_moderate": false, "can_new_version": false, "can_read_files": true, "can_review": false, "can_update_draft": false, "can_view": false}' class="sidebar-container" > <h2 class="ui medium 
top attached header">Communities</h2> <div class="ui segment bottom attached rdm-sidebar"> <div class="ui fluid placeholder"> <div class="image header"> <div class="line"></div> <div class="line"></div> </div> <div class="image header"> <div class="line"></div> <div class="line"></div> </div> <div class="image header"> <div class="line"></div> <div class="line"></div> </div> </div> </div> </div> <div class="sidebar-container"> <h2 class="ui medium top attached header mt-0">Keywords and subjects</h2> <div id="keywords-and-subjects" aria-label="Keywords and subjects" class="ui segment bottom attached rdm-sidebar"> <h3 class="hidden">Keywords</h3> <ul class="ui horizontal list no-bullets subjects"> <li class="item"> <a href="/search?q=metadata.subjects.subject%3A%22Graph+similarity%22" class="subject" title="Search results for Graph similarity" > Graph similarity </a> </li> <li class="item"> <a href="/search?q=metadata.subjects.subject%3A%22hierarchical+and+directed+graphs%22" class="subject" title="Search results for hierarchical and directed graphs" > hierarchical and directed graphs </a> </li> <li class="item"> <a href="/search?q=metadata.subjects.subject%3A%22hypertext%22" class="subject" title="Search results for hypertext" > hypertext </a> </li> <li class="item"> <a href="/search?q=metadata.subjects.subject%3A%22generalized+trees%22" class="subject" title="Search results for generalized trees" > generalized trees </a> </li> <li class="item"> <a href="/search?q=metadata.subjects.subject%3A%22web+structure+mining.%22" class="subject" title="Search results for web structure mining." > web structure mining. 
</a> </li> </ul> </div> </div> <div class="sidebar-container"> <h2 class="ui medium top attached header mt-0">Details</h2> <div id="record-details" class="ui segment bottom attached rdm-sidebar"> <dl class="details-list"> <dt class="ui tiny header">DOI <dd> <span class="get-badge" data-toggle="tooltip" data-placement="bottom" style="cursor: pointer;" title="Get the DOI badge!"> <img id='record-doi-badge' data-target="[data-modal='10.5281/zenodo.1086031']" src="/badge/DOI/10.5281/zenodo.1086031.svg" alt="10.5281/zenodo.1086031" /> </span> <div id="doi-modal" class="ui modal fade badge-modal" data-modal="10.5281/zenodo.1086031"> <div class="header">DOI Badge</div> <div class="content"> <h4> <small>DOI</small> </h4> <h4> <pre>10.5281/zenodo.1086031</pre> </h4> <h3 class="ui small header"> Markdown </h3> <div class="ui message code"> <pre>[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1086031.svg)](https://doi.org/10.5281/zenodo.1086031)</pre> </div> <h3 class="ui small header"> reStructuredText </h3> <div class="ui message code"> <pre>.. 
image:: https://zenodo.org/badge/DOI/10.5281/zenodo.1086031.svg :target: https://doi.org/10.5281/zenodo.1086031</pre> </div> <h3 class="ui small header"> HTML </h3> <div class="ui message code"> <pre><a href="https://doi.org/10.5281/zenodo.1086031"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.1086031.svg" alt="DOI"></a></pre> </div> <h3 class="ui small header"> Image URL </h3> <div class="ui message code"> <pre>https://zenodo.org/badge/DOI/10.5281/zenodo.1086031.svg</pre> </div> <h3 class="ui small header"> Target URL </h3> <div class="ui message code"> <pre>https://doi.org/10.5281/zenodo.1086031</pre> </div> </div> </div> </dd> <dt class="ui tiny header">Resource type</dt> <dd>Journal article</dd> <dt class="ui tiny header">Publisher</dt> <dd>Zenodo</dd> <dt class="ui tiny header">Languages</dt> <dd> English </dd> </dl> </div> </div> <div class="sidebar-container"> <h2 class="ui medium top attached header mt-0">Rights</h2> <div id="licenses" class="ui segment bottom attached rdm-sidebar"> <ul class="details-list m-0 p-0"> <li id="license-cc-by-4.0-1" class="has-popup"> <div id="title-cc-by-4.0-1" class="license clickable" tabindex="0" aria-haspopup="dialog" aria-expanded="false" role="button" aria-label="Creative Commons Attribution 4.0 International" > <span class="icon-wrap"> <img class="icon" src="/static/icons/licenses/cc-by-icon.svg" alt="cc-by-4.0 icon"/> </span> <span class="title-text"> Creative Commons Attribution 4.0 International </span> </div> <div id="description-cc-by-4.0-1" class="licenses-description ui flowing popup transition hidden" role="dialog" aria-labelledby="title-cc-by-4.0-1" > <i role="button" tabindex="0" class="close icon text-muted" aria-label="Close"></i> <div id="license-description-1" class="description"> <span class="text-muted"> The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited. 
</span> <a class="license-link" href="https://creativecommons.org/licenses/by/4.0/legalcode" target="_blank" title="Opens in new tab">Read more</a> </div> </div> </li> </ul> </div> </div> <div class="sidebar-container"> <h2 class="ui medium top attached header mt-0">Citation</h2> <div id="citation" class="ui segment bottom attached rdm-sidebar"> <div id="recordCitation" data-record='{"access": {"embargo": {"active": false, "reason": null}, "files": "public", "record": "public", "status": "open"}, "created": "2018-01-17T09:24:37.141964+00:00", "custom_fields": {}, "deletion_status": {"is_deleted": false, "status": "P"}, "expanded": {"parent": {"access": {"owned_by": {"active": null, "blocked_at": null, "confirmed_at": null, "email": "", "id": "32148", "is_current_user": false, "links": {"avatar": "https://zenodo.org/api/users/32148/avatar.svg", "records_html": "https://zenodo.org/search/records?q=parent.access.owned_by.user:32148", "self": "https://zenodo.org/api/users/32148"}, "profile": {"affiliations": "", "full_name": ""}, "username": "WASET", "verified_at": null}}, "communities": {"default": {"access": {"review_policy": "open", "visibility": "public"}, "id": "a59dd046-9a86-4a47-97ce-b51f1bb8fc3f", "links": {"logo": "https://zenodo.org/api/communities/a59dd046-9a86-4a47-97ce-b51f1bb8fc3f/logo"}, "metadata": {"description": "", "title": "World Academy of Science, Engineering and Technology", "type": null}, "slug": "waset"}}}}, "files": {"count": 1, "enabled": true, "entries": {"15928.pdf": {"access": {"hidden": false}, "checksum": "md5:ae4213def0553785a81da08353f92265", "ext": "pdf", "id": "04964632-d420-40ce-93ae-bb0a79ed3925", "key": "15928.pdf", "links": {"content": "https://zenodo.org/api/records/1086031/files/15928.pdf/content", "iiif_api": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/full/full/0/default.png", "iiif_base": "https://zenodo.org/api/iiif/record:1086031:15928.pdf", "iiif_canvas": 
"https://zenodo.org/api/iiif/record:1086031/canvas/15928.pdf", "iiif_info": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/info.json", "self": "https://zenodo.org/api/records/1086031/files/15928.pdf"}, "metadata": null, "mimetype": "application/pdf", "size": 1920335, "storage_class": "L"}}, "order": [], "total_bytes": 1920335}, "id": "1086031", "is_draft": false, "is_published": true, "links": {"access": "https://zenodo.org/api/records/1086031/access", "access_grants": "https://zenodo.org/api/records/1086031/access/grants", "access_links": "https://zenodo.org/api/records/1086031/access/links", "access_request": "https://zenodo.org/api/records/1086031/access/request", "access_users": "https://zenodo.org/api/records/1086031/access/users", "archive": "https://zenodo.org/api/records/1086031/files-archive", "archive_media": "https://zenodo.org/api/records/1086031/media-files-archive", "communities": "https://zenodo.org/api/records/1086031/communities", "communities-suggestions": "https://zenodo.org/api/records/1086031/communities-suggestions", "doi": "https://doi.org/10.5281/zenodo.1086031", "draft": "https://zenodo.org/api/records/1086031/draft", "files": "https://zenodo.org/api/records/1086031/files", "latest": "https://zenodo.org/api/records/1086031/versions/latest", "latest_html": "https://zenodo.org/records/1086031/latest", "media_files": "https://zenodo.org/api/records/1086031/media-files", "parent": "https://zenodo.org/api/records/1086030", "parent_doi": "https://doi.org/10.5281/zenodo.1086030", "parent_doi_html": "https://zenodo.org/doi/10.5281/zenodo.1086030", "parent_html": "https://zenodo.org/records/1086030", "requests": "https://zenodo.org/api/records/1086031/requests", "reserve_doi": "https://zenodo.org/api/records/1086031/draft/pids/doi", "self": "https://zenodo.org/api/records/1086031", "self_doi": "https://doi.org/10.5281/zenodo.1086031", "self_doi_html": "https://zenodo.org/doi/10.5281/zenodo.1086031", "self_html": 
"https://zenodo.org/records/1086031", "self_iiif_manifest": "https://zenodo.org/api/iiif/record:1086031/manifest", "self_iiif_sequence": "https://zenodo.org/api/iiif/record:1086031/sequence/default", "thumbnails": {"10": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/full/^10,/0/default.jpg", "100": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/full/^100,/0/default.jpg", "1200": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/full/^1200,/0/default.jpg", "250": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/full/^250,/0/default.jpg", "50": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/full/^50,/0/default.jpg", "750": "https://zenodo.org/api/iiif/record:1086031:15928.pdf/full/^750,/0/default.jpg"}, "versions": "https://zenodo.org/api/records/1086031/versions"}, "media_files": {"count": 1, "enabled": true, "entries": {"15928.pdf.ptif": {"access": {"hidden": true}, "ext": "ptif", "id": "a00751c2-d22c-473f-9607-6cb75a5b1dc4", "key": "15928.pdf.ptif", "links": {"content": "https://zenodo.org/api/records/1086031/files/15928.pdf.ptif/content", "self": "https://zenodo.org/api/records/1086031/files/15928.pdf.ptif"}, "metadata": null, "mimetype": "application/octet-stream", "processor": {"source_file_id": "04964632-d420-40ce-93ae-bb0a79ed3925", "status": "finished", "type": "image-tiles"}, "size": 0, "storage_class": "L"}}, "order": [], "total_bytes": 0}, "metadata": {"creators": [{"person_or_org": {"family_name": "Matthias Dehmer", "name": "Matthias Dehmer", "type": "personal"}}, {"person_or_org": {"family_name": "Frank Emmert Streib", "name": "Frank Emmert Streib", "type": "personal"}}, {"person_or_org": {"family_name": "Alexander Mehler", "name": "Alexander Mehler", "type": "personal"}}, {"person_or_org": {"family_name": "J\u00fcrgen Kilian", "name": "J\u00fcrgen Kilian", "type": "personal"}}], "description": "\u003cp\u003eMost known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, 
path metrics and tree measures in terms of their DOM-Trees. Other methods measures the similarity in the framework of the well known vector space model. In contrast to these we present a new approach to measuring the structural similarity of web-based documents represented by so called generalized trees which are more general than DOM-Trees which represent only directed rooted trees.We will design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as strings of linear integers, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well known technique of sequence alignments to solve a novel and challenging problem: Measuring the structural similarity of generalized trees. More precisely, we first transform our graphs considered as high dimensional objects in linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.\u003c/p\u003e", "languages": [{"id": "eng", "title": {"en": "English"}}], "publication_date": "2007-10-20", "publisher": "Zenodo", "references": [{"reference": "R. Bellman, Dynamic Programming. Princeton University Press, 1957"}, {"reference": "R. A. Botafogo, B. Shneiderman: Structural analysis of hypertexts:\nIdentifying hierarchies and useful metrics, ACM Trans. Inf. Syst. 10\n(2), 1992, 142-180"}, {"reference": "S. Chakrabarti: Mining the Web. Discovering Knowledge from Hypertext\nData, Morgen and Kaufmann Publishers, 2003"}, {"reference": "S. 
Chakrabarti: Integrating the document object model with hyperlinks\nfor enhanced topic distillation and information extraction, Proc. of the\n10th International World Wide Web Conference, Hong Kong, 2001, 211-\n220"}, {"reference": "I. F. Cruz, S. Borisov, M. A. Marks, T. R. Webb: Measuring Structural\nSimilarity Among Web Documents: Preliminary Results , Lecture Notes\nIn Computer Science, Vol. 1375, 1998"}, {"reference": "M. Dehmer, Strukturelle Analyse web-basierter Dokumente, Ph.D Thesis,\nDepartment of Computer Science, Technische Universit\u252c\u00bfat Darmstadt,\n2005, unpublished"}, {"reference": "M. Dehmer, R. Gleim, A. Mehler: Aspekte der Kategorisierung von\nWebseiten, GI-Edition - Lecture Notes in Informatics (LNI) - Proceedings,\nJahrestagung der Gesellschaft f\u252c\u00bfur Informatik, Informatik 2004,\nUlm/Germany, 2004, 39-43"}, {"reference": "R. Gleim: HyGraph - Ein Framework zur Extraktion, Repr\u252c\u00bfasentation\nund Analyse webbasierter Hypertextstrukturen, Beitr\u252c\u00bfage zur GLDVTagung\n2005, Bonn/Germany, 2005"}, {"reference": "D. Gusfield: Algorithms on Strings, Trees, and Sequences: Computer\nScience and Computational Biology, Cambridge University Press, 1997\n[10] T. Jiang, L. Wang, K. Zhang: Alignment of trees - An alternative to tree\nedit, Theoretical Computer Science, Elsevier, Vol. 143, 1995, 137-148\n[11] S. Joshi, N. Agrawal, R. Krishnapuram, S. Negi,: Bag of Paths Model\nfor Measuring Structural Similarity in Web Documents, Proceedings of\nthe ACM International Conference on Knowledge Discovery and Data\nMining (SIGKDD), 2003, 577-582.\n[12] Mehler A.: Textbedeutung. Zur prozeduralen Analyse und\nRepr\u252c\u00bfasentation struktureller \u252c\u00bfAhnlichkeiten von Texten, Peter Lang,\nEurop\u252c\u00bfaischer Verlag der Wissenschaften, 2001\n[13] A. Mehler, M. Dehmer, R. Gleim: Towards logical hypertext structure.\nA graph-theoretic perspective, Proc. 
of I2CS-04, Guadalajara/Mexico,\nLecture Notes in Computer Science, Berlin-New York: Springer, 2004\n[14] A. Mehler, R. Gleim, M. Dehmer: Towards structure-sensitive hypertext\ncategorization, to appear in: Proceedings of the 29-th Annual Conference\nof the German Classification Society, 2005\n[15] S. M. Selkow: The tree-to-tree editing problem, Information Processing\nLetters, Vol. 6 (6), 1977, 184-186\n[16] T. F. Smith, M. S. Waterman: Identification of common molecular\nsubsequences, Journal of Molecular Biology, Vol. 147 (1), 1981, 195-\n197\n[17] F. Sobik, Graphmetriken und Klassifikation strukturierter Objekte, ZKIInformationen,\nAkad. Wiss. DDR, Vol. 2 (82), 1982, 63-122\n[18] J. R. Ullman, An algorithm for subgraph isomorphism, J. ACM, Vol. 23\n(1), 1976, 31-42\n[19] P. H. Winne., L. Gupta, J. C. Nesbit: Exploring individual differences in\nstudying strategies using graph theoretic statistics, The Alberta Journal\nof Educational Research, Vol. 40, 1994, 177-193\n[20] A. Winter: Exchanching Graphs with GXL, http://www.gupro.\nde/GXL\n[21] Y. Yang, S. Slattery, R. Ghani: A study of approaches to hypertext\ncategorization, Journal of Intelligent Information Systems, Vol. 18 (2-3),\n2002, 219-241\n[22] K. Zhang, D. Shasha: Simple fast algorithms for the editing distance\nbetween trees and related problems, SIAM Journal of Computing, Vol.\n18 (6), 1989, 1245-1262\n[23] B. Zelinka, On a certain distance between isomorphism classes of\ngraphs, \u2566\u00e7 Casopis pro \u2566\u00e7pest. Mathematiky, Vol. 
100, 1975, 371-373"}], "resource_type": {"id": "publication-article", "title": {"de": "Zeitschriftenartikel", "en": "Journal article"}}, "rights": [{"description": {"en": "The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited."}, "icon": "cc-by-icon", "id": "cc-by-4.0", "props": {"scheme": "spdx", "url": "https://creativecommons.org/licenses/by/4.0/legalcode"}, "title": {"en": "Creative Commons Attribution 4.0 International"}}], "subjects": [{"subject": "Graph similarity"}, {"subject": "hierarchical and directed graphs"}, {"subject": "hypertext"}, {"subject": "generalized trees"}, {"subject": "web structure mining."}], "title": "Measuring the Structural Similarity of Web-based Documents: A Novel Approach", "version": "15928"}, "parent": {"access": {"owned_by": {"user": "32148"}, "settings": {"accept_conditions_text": null, "allow_guest_requests": false, "allow_user_requests": false, "secret_link_expiration": 0}}, "communities": {"default": "a59dd046-9a86-4a47-97ce-b51f1bb8fc3f", "entries": [{"access": {"member_policy": "open", "members_visibility": "public", "record_submission_policy": "open", "review_policy": "open", "visibility": "public"}, "children": {"allow": false}, "created": "2017-05-31T21:24:26.028360+00:00", "custom_fields": {}, "deletion_status": {"is_deleted": false, "status": "P"}, "id": "a59dd046-9a86-4a47-97ce-b51f1bb8fc3f", "links": {}, "metadata": {"curation_policy": "", "description": "", "page": "", "title": "World Academy of Science, Engineering and Technology"}, "revision_id": 0, "slug": "waset", "updated": "2017-11-15T12:37:31.935319+00:00"}], "ids": ["a59dd046-9a86-4a47-97ce-b51f1bb8fc3f"]}, "id": "1086030", "pids": {"doi": {"client": "datacite", "identifier": "10.5281/zenodo.1086030", "provider": "datacite"}}}, "pids": {"doi": {"client": "datacite", "identifier": "10.5281/zenodo.1086031", "provider": "datacite"}, "oai": {"identifier": 
"oai:zenodo.org:1086031", "provider": "oai"}}, "revision_id": 8, "stats": {"all_versions": {"data_volume": 59530385.0, "downloads": 31, "unique_downloads": 29, "unique_views": 70, "views": 71}, "this_version": {"data_volume": 59530385.0, "downloads": 31, "unique_downloads": 29, "unique_views": 70, "views": 71}}, "status": "published", "ui": {"access_status": {"description_l10n": "The record and files are publicly accessible.", "embargo_date_l10n": null, "icon": "unlock", "id": "open", "message_class": "", "title_l10n": "Open"}, "created_date_l10n_long": "January 17, 2018", "creators": {"affiliations": [], "creators": [{"person_or_org": {"family_name": "Matthias Dehmer", "name": "Matthias Dehmer", "type": "personal"}}, {"person_or_org": {"family_name": "Frank Emmert Streib", "name": "Frank Emmert Streib", "type": "personal"}}, {"person_or_org": {"family_name": "Alexander Mehler", "name": "Alexander Mehler", "type": "personal"}}, {"person_or_org": {"family_name": "J\u00fcrgen Kilian", "name": "J\u00fcrgen Kilian", "type": "personal"}}]}, "custom_fields": {}, "description_stripped": "Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measure the similarity in the framework of the well-known vector space model. In contrast to these, we present a new approach to measuring the structural similarity of web-based documents represented by so-called generalized trees, which are more general than DOM-Trees, which represent only directed rooted trees. We will design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as strings of linear integers, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. 
In this paper we apply the well-known technique of sequence alignments to solve a novel and challenging problem: Measuring the structural similarity of generalized trees. More precisely, we first transform our graphs, considered as high-dimensional objects, into linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.", "is_draft": false, "languages": [{"id": "eng", "title_l10n": "English"}], "publication_date_l10n_long": "October 20, 2007", "publication_date_l10n_medium": "Oct 20, 2007", "resource_type": {"id": "publication-article", "title_l10n": "Journal article"}, "rights": [{"description_l10n": "The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.", "icon": "cc-by-icon", "id": "cc-by-4.0", "props": {"scheme": "spdx", "url": "https://creativecommons.org/licenses/by/4.0/legalcode"}, "title_l10n": "Creative Commons Attribution 4.0 International"}], "updated_date_l10n_long": "August 2, 2024", "version": "15928"}, "updated": "2024-08-02T22:55:33.371537+00:00", "versions": {"index": 1, "is_latest": true}}' data-styles='[["apa", "APA"], ["harvard-cite-them-right", "Harvard"], ["modern-language-association", "MLA"], ["vancouver", "Vancouver"], ["chicago-fullnote-bibliography", "Chicago"], ["ieee", "IEEE"]]' data-defaultstyle='"apa"' data-include-deleted='false'> </div> </div> </div> <div class="sidebar-container"> <h2 class="ui medium top attached header mt-0">Export</h2> <div id="export-record" class="ui segment bottom attached exports rdm-sidebar"> <div id="recordExportDownload" data-formats='[{"export_url": 
"/records/1086031/export/json", "name": "JSON"}, {"export_url": "/records/1086031/export/json-ld", "name": "JSON-LD"}, {"export_url": "/records/1086031/export/csl", "name": "CSL"}, {"export_url": "/records/1086031/export/datacite-json", "name": "DataCite JSON"}, {"export_url": "/records/1086031/export/datacite-xml", "name": "DataCite XML"}, {"export_url": "/records/1086031/export/dublincore", "name": "Dublin Core XML"}, {"export_url": "/records/1086031/export/marcxml", "name": "MARCXML"}, {"export_url": "/records/1086031/export/bibtex", "name": "BibTeX"}, {"export_url": "/records/1086031/export/geojson", "name": "GeoJSON"}, {"export_url": "/records/1086031/export/dcat-ap", "name": "DCAT"}, {"export_url": "/records/1086031/export/codemeta", "name": "Codemeta"}, {"export_url": "/records/1086031/export/cff", "name": "Citation File Format"}]'></div> </div> </div> <section id="upload-info" role="note" aria-label="Upload information" class="sidebar-container ui segment rdm-sidebar text-muted" > <h2 class="ui small header text-muted p-0 mb-5"><small>Technical metadata</small></h2> <dl class="m-0"> <dt class="inline"><small>Created</small></dt> <dd class="inline"> <small>January 17, 2018</small> </dd> <div> <dt class="rel-mt-1 inline"><small>Modified</small></dt> <dd class="inline"> <small>August 2, 2024</small> </dd> </div> </dl> </section> </aside> </div> </div> <div class="ui container"> <div class="ui relaxed grid"> <div class="two column row"> <div class="sixteen wide tablet eleven wide computer column"> <div class="ui grid"> <div class="centered row rel-mt-1"> <button id="jump-btn" class="jump-to-top ui button labeled icon" aria-label="Jump to top of page"> <i class="arrow alternate circle up outline icon"></i> Jump up </button> </div> </div></div> </div> </div> </div> </div> </div> </main> <footer id="rdm-footer-element"> <div class="footer-top"> <div class="ui container app-rdm-footer"> <div class="ui equal width stackable grid zenodo-footer"> <div class="column"> 
<h2 class="ui inverted tiny header">About</h2> <ul class="ui inverted link list"> <li class="item"> <a href="https://about.zenodo.org">About</a> </li> <li class="item"> <a href="https://about.zenodo.org/policies">Policies</a> </li> <li class="item"> <a href="https://about.zenodo.org/infrastructure">Infrastructure</a> </li> <li class="item"> <a href="https://about.zenodo.org/principles">Principles</a> </li> <li class="item"> <a href="https://about.zenodo.org/projects/">Projects</a> </li> <li class="item"> <a href="https://about.zenodo.org/roadmap/">Roadmap</a> </li> <li class="item"> <a href="https://about.zenodo.org/contact">Contact</a> </li> </ul> </div> <div class="column"> <h2 class="ui inverted tiny header">Blog</h2> <ul class="ui inverted link list"> <li class="item"> <a href="https://blog.zenodo.org">Blog</a> </li> </ul> </div> <div class="column"> <h2 class="ui inverted tiny header">Help</h2> <ul class="ui inverted link list"> <li class="item"> <a href="https://help.zenodo.org">FAQ</a> </li> <li class="item"> <a href="https://help.zenodo.org/docs/">Docs</a> </li> <li class="item"> <a href="https://help.zenodo.org/guides/">Guides</a> </li> <li class="item"> <a href="https://zenodo.org/support">Support</a> </li> </ul> </div> <div class="column"> <h2 class="ui inverted tiny header">Developers</h2> <ul class="ui inverted link list"> <li class="item"> <a href="https://developers.zenodo.org">REST API</a> </li> <li class="item"> <a href="https://developers.zenodo.org#oai-pmh">OAI-PMH</a> </li> </ul> </div> <div class="column"> <h2 class="ui inverted tiny header">Contribute</h2> <ul class="ui inverted link list"> <li class="item"> <a href="https://github.com/zenodo/zenodo-rdm"> <i class="icon external" aria-hidden="true"></i> GitHub </a> </li> <li class="item"> <a href="/donate"> <i class="icon external" aria-hidden="true"></i> Donate </a> </li> </ul> </div> <div class="six wide column right aligned"> <h2 class="ui inverted tiny header">Funded by</h2> <ul class="ui 
horizontal link list"> <li class="item"> <a href="https://home.cern" aria-label="CERN"> <img src="/static/images/cern.png" width="60" height="60" alt="" /> </a> </li> <li class="item"> <a href="https://www.openaire.eu" aria-label="OpenAIRE"> <img src="/static/images/openaire.png" width="60" height="60" alt="" /> </a> </li> <li class="item"> <a href="https://commission.europa.eu/index_en" aria-label="European Commission"> <img src="/static/images/eu.png" width="88" height="60" alt="" /> </a> </li> </ul> </div> </div> </div> </div> <div class="footer-bottom"> <div class="ui inverted container"> <div class="ui grid"> <div class="eight wide column left middle aligned"> <p class="m-0"> Powered by <a href="http://information-technology.web.cern.ch/about/computer-centre">CERN Data Centre</a> & <a href="https://inveniordm.docs.cern.ch/">InvenioRDM</a> </p> </div> <div class="eight wide column right aligned"> <ul class="ui inverted horizontal link list"> <li class="item"> <a href="https://stats.uptimerobot.com/vlYOVuWgM/">Status</a> </li> <li class="item"> <a href="https://about.zenodo.org/privacy-policy">Privacy policy</a> </li> <li class="item"> <a href="https://about.zenodo.org/cookie-policy">Cookie policy</a> </li> <li class="item"> <a href="https://about.zenodo.org/terms">Terms of Use</a> </li> <li class="item"> <a href="/support">Support</a> </li> </ul> </div> </div> </div> </div> </footer> <script type="text/javascript"> window.MathJax = { tex: { inlineMath: [['$', '$'], ['\\(', '\\)']], processEscapes: true // Allows escaping $ signs if needed } }; </script> <script type="text/javascript" src="//cdnjs.cloudflare.com/ajax/libs/mathjax/3.2.2/es5/tex-mml-chtml.js?config=TeX-AMS-MML_HTMLorMML"></script> <script src="/static/dist/js/manifest.85914225ed5d447be325.js"></script> <script src="/static/dist/js/73.c39079ca1fc2ae113347.js"></script> <script src="/static/dist/js/3526.e89ca3df1ebb93426a28.js"></script> <script 
src="/static/dist/js/theme.df2465216f8b783a462f.js"></script> <script src="/static/dist/js/9630.668b690274e548f98163.js"></script> <script src="/static/dist/js/1057.7d75f7650dc82016b0de.js"></script> <script src="/static/dist/js/7655.822a63bb4ea3764acdae.js"></script> <script src="/static/dist/js/9621.6535c4e4a93e5683c079.js"></script> <script src="/static/dist/js/5373.4d3c97d870fcbcceede3.js"></script> <script src="/static/dist/js/8871.6edcd2191521fae14176.js"></script> <script src="/static/dist/js/621.a65ad81b760d669506ff.js"></script> <script src="/static/dist/js/9827.77cd5b562048a6591df4.js"></script> <script src="/static/dist/js/742.c9ff6bca3a608a9bce7d.js"></script> <script src="/static/dist/js/base-theme-rdm.9d9bbd310b84172faad3.js"></script> <script src="/static/dist/js/i18n_app.99c66e0dc76dcd873a86.js"></script> <script src="/static/dist/js/4709.939b30788057693c892e.js"></script> <script src="/static/dist/js/5941.416c32bce9b1f6137489.js"></script> <script src="/static/dist/js/9736.60a1726b721ad3d427db.js"></script> <script src="/static/dist/js/5965.a9d2e38c2dcdd42ebc70.js"></script> <script src="/static/dist/js/1677.b755f75684136ec3634e.js"></script> <script src="/static/dist/js/8102.8ca37a7a6650ce75a7fa.js"></script> <script src="/static/dist/js/5368.01a2200ad661130e2972.js"></script> <script src="/static/dist/js/8585.f6b02d1a40609affd664.js"></script> <script src="/static/dist/js/1990.4198c9b3429de6dfebed.js"></script> <script src="/static/dist/js/3532.e8cb76db2b7edf78ccfa.js"></script> <script src="/static/dist/js/overridable-registry.d865c1ea4ce3e4ded8b7.js"></script> <script type='application/ld+json'>{"@context": "http://schema.org", "@id": "https://doi.org/10.5281/zenodo.1086031", "@type": "https://schema.org/ScholarlyArticle", "author": [{"@type": "Person", "familyName": "Matthias Dehmer", "name": "Matthias Dehmer"}, {"@type": "Person", "familyName": "Frank Emmert Streib", "name": "Frank Emmert Streib"}, {"@type": "Person", "familyName": "Alexander 
Mehler", "name": "Alexander Mehler"}, {"@type": "Person", "familyName": "J\u00fcrgen Kilian", "name": "J\u00fcrgen Kilian"}], "contentSize": "1.83 MB", "creator": [{"@type": "Person", "familyName": "Matthias Dehmer", "name": "Matthias Dehmer"}, {"@type": "Person", "familyName": "Frank Emmert Streib", "name": "Frank Emmert Streib"}, {"@type": "Person", "familyName": "Alexander Mehler", "name": "Alexander Mehler"}, {"@type": "Person", "familyName": "J\u00fcrgen Kilian", "name": "J\u00fcrgen Kilian"}], "dateCreated": "2018-01-17T09:24:37.141964+00:00", "dateModified": "2024-08-02T22:55:33.371537+00:00", "datePublished": "2007-10-20", "description": "\u003cp\u003eMost known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measure the similarity in the framework of the well-known vector space model. In contrast to these, we present a new approach to measuring the structural similarity of web-based documents represented by so-called generalized trees, which are more general than DOM-Trees, which represent only directed rooted trees. We will design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as strings of linear integers, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well-known technique of sequence alignments to solve a novel and challenging problem: Measuring the structural similarity of generalized trees. More precisely, we first transform our graphs, considered as high-dimensional objects, into linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. 
Hence, we transform a graph similarity problem to a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.\u003c/p\u003e", "identifier": "https://doi.org/10.5281/zenodo.1086031", "inLanguage": {"@type": "Language", "alternateName": "eng", "name": "English"}, "keywords": "Graph similarity, hierarchical and directed graphs, hypertext, generalized trees, web structure mining.", "license": "https://creativecommons.org/licenses/by/4.0/legalcode", "name": "Measuring the Structural Similarity of Web-based Documents: A Novel Approach", "publisher": {"@type": "Organization", "name": "Zenodo"}, "size": "1.83 MB", "url": "https://zenodo.org/records/1086031", "version": "15928"}</script> <script src="/static/dist/js/invenio-app-rdm-landing-page-theme.9b56690388e335810f04.js"></script> <script src="/static/dist/js/9945.e11a5a6ff50535c72070.js"></script> <script src="/static/dist/js/1357.4e237807ffba81b213b0.js"></script> <script src="/static/dist/js/1644.2b2007bc83e4beeabfaf.js"></script> <script src="/static/dist/js/8962.cfabe841decd009221fd.js"></script> <script src="/static/dist/js/9300.a81535ba51a38f1472fe.js"></script> <script src="/static/dist/js/9693.dac033d778162b60d96f.js"></script> <script src="/static/dist/js/invenio-app-rdm-landing-page.024f3c02bb324ddef007.js"></script> <script src="/static/dist/js/previewer_theme.77f20174699c7786038a.js"></script> <script src="/static/dist/js/zenodo-rdm-citations.f6ca22bc7712ee9b03f7.js"></script> <div class="ui container info message cookie-banner hidden"> <i class="close icon"></i> <div> <i aria-hidden="true" class="info icon"></i> <p class="inline">This site uses cookies. 
Find out more on <a href="https://about.zenodo.org/cookie-policy">how we use cookies</a></p> </div> <div class="buttons"> <button class="ui button small primary" id="cookies-all">Accept all cookies</button> <button class="ui button small" id="cookies-essential">Accept only essential cookies</button> </div> </div> <script> var _paq = window._paq = window._paq || []; _paq.push(['requireCookieConsent']); (function() { var u="https://webanalytics.web.cern.ch/"; _paq.push(['setTrackerUrl', u+'matomo.php']); _paq.push(['setSiteId', '366']); var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); })(); const cookieConsent = document.cookie .split("; ") .find((row) => row.startsWith("cookie_consent=")) ?.split("=")[1]; if (cookieConsent) { if (cookieConsent === "all") { matomo(); } } else { document.querySelector(".cookie-banner").classList.remove("hidden") _paq.push(['forgetConsentGiven']); } $('.cookie-banner .close') .on('click', function () { $(this) .closest('.message') .transition('fade'); setCookie("cookie_consent","essential"); }); $('#cookies-essential') .on('click', function () { $(this) .closest('.message') .transition('fade'); setCookie("cookie_consent","essential"); }); $('#cookies-all') .on('click', function () { $(this) .closest('.message') .transition('fade'); setCookie("cookie_consent","all"); _paq.push(['rememberCookieConsentGiven']); matomo(); }); function matomo() { /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ _paq.push(['trackPageView']); _paq.push(['enableLinkTracking']); } function setCookie(cname, cvalue) { var d = new Date(); d.setTime(d.getTime() + (365 * 24 * 60 * 60 * 1000)); // one year var expires = "expires=" + d.toUTCString(); var cookie = cname + "=" + cvalue + ";" + expires + ";" cookie += "Domain=zenodo.org;Path=/;SameSite=None; Secure"; document.cookie = cookie; } </script> </body> </html>