<!DOCTYPE html> <html lang="en"> <head> <title> CiteSeerX Data | CiteSeerX</title> <link rel="shortcut icon" href="#"> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1"> <!-- Bootstrap --> <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css"> <link href="https://fonts.googleapis.com/css?family=Roboto:400,700" rel="stylesheet"> <link href="https://fonts.googleapis.com/css?family=Open+Sans" rel="stylesheet"> <link type="text/css" href="/resources/css/navigation.css" rel="stylesheet"> <link type="text/css" href="/resources/css/textstyles.css" rel="stylesheet"> <link type="text/css" rel="stylesheet" href="/resources/css/footer-distributed-with-address-and-phones.css"> <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css"> <link type="text/css" href="/resources/css/content.css" rel="stylesheet"> <!-- jQuery (necessary for Bootstrap's JavaScript plugins) --> <script src="/js/jquery-3.2.1.min.js" type="text/javascript"></script> <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js"></script> <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> <!--[if lt IE 9]> <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> <![endif]--> <!-- JQuery --> <link rel="stylesheet" href="https://code.jquery.com/ui/1.12.1/themes/base/jquery-ui.css"> <script src="https://code.jquery.com/jquery-1.12.4.js"></script> <script src="https://code.jquery.com/ui/1.12.1/jquery-ui.js"></script> </head> <body> <div class="DMCA psunav"> <!-- DMCA --> <ul> <li class="li-2"><a href="https://www.psu.edu/copyright-information" target="_blank"> <div class="colorchange">DMCA</div> </a></li> </ul> </div> <div class="container header"> <header> <a 
name="top"></a> <!-- navigation tabs --> <a href="https://citeseerx.ist.psu.edu"> <img src="/resources/img/citeseerx.png" title="CiteSeerX" class="citeseerx_logo"/> </a> <div class="psunav"> <ul> <!-- About --> <li class="li-1"><a href="../index.html"> <div class="colorchange">About</div> </a></li> <!-- People Dropdown --> <li class="li-2"> <div class="colorchange"> <div class="dropbtn" id="dropbtn-people"> People <i class="fa fa-caret-down"></i> <div class="dropmenu-content" id="dropmenu-content-people"> <a href="../people/team.html">Team</a> <a href="../people/collaborators.html">Collaborators</a> </div> </div> </div> </li> <!-- Publications --> <li class="li-2"><a href="http://clgiles.ist.psu.edu/citeseer-related.pdf" target="_blank"> <div class="colorchange">Publications</div> </a></li> <!-- Data Dropdown --> <li class="li-2"> <div class="colorchange"> <div class="dropbtn" id="dropbtn-downloads"> Downloads <i class="fa fa-caret-down"></i> <div class="dropmenu-content" id="dropmenu-content-downloads"> <a href="../downloads/data.html">Data</a> <a href="../downloads/software.html">Software</a> </div> </div> </div> </li> <!-- Contact Form --> <li class="li-2"> <div class="colorchange"> <div class="dropbtn" id="dropbtn-contact"> Contact <i class="fa fa-caret-down"></i> <div class="dropmenu-content" id="dropmenu-content-contact"> <a href="../contact/contact.html">Contact Us</a> <a href="http://csxcrawlweb01.ist.psu.edu/submit_pub/" target="_blank">Crawler</a> </div> </div> </div> </li> <!-- Donate --> <li class="li-2"><a href="http://www.givenow.psu.edu/CiteseerxFund" target="_blank"> <div class="colorchange">Donate</div> </a></li> </ul> </div> </header> </div> <div> <div class="pagetitle"> <div class="container"> <h1 class="titletext">CiteSeerX Data</h1> </div> </div> </div> <div class="pagebody"> <div class="container bodytext"> <div class="row"> <section class="col-sm-12 col-md-12 col-lg-12 col-x1-12"> <div id="content-container" class="left-sidebar"> <div 
id="node-3" class="node clear-block"> <div class="meta"> </div> <div class="content"> <div class="ptext"> <p>CiteSeer<sup>x</sup> data and metadata are available for others to use. The available data include CiteSeer<sup>x</sup> metadata, databases, data sets of PDF files, and the text of those PDF files.</p> <p>Currently, data is available only through shared folders on Google Drive. For more information, please contact us directly. <span xmlns:dc="http://purl.org/dc/elements/1.1/" href="http://purl.org/dc/dcmitype/Text" property="dc:title" rel="dc:type">Data</span> released by <a xmlns:cc="http://creativecommons.org/ns#" href="http://citeseerx.ist.psu.edu" property="cc:attributionName" rel="cc:attributionURL">CiteSeer<sup>x</sup></a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/">Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License</a>. </p> <p><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/"><img alt="Creative Commons License" src="http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png"></a></p> <p>CiteSeer<sup>x</sup> is compliant with the <a href="http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm" title="OAI-PMH">Open Archives Initiative Protocol for Metadata Harvesting</a>, a standard proposed by the <a href="http://www.openarchives.org/" title="The Open Archives Initiative">Open Archives Initiative</a> to facilitate content dissemination. 
For data not mentioned here, please contact us through feedback.</p> <p>To browse or download records programmatically from the CiteSeer<sup>x</sup> OAI collection, use the harvest URL:</p> <p><strong>http://citeseerx.ist.psu.edu/oai2</strong></p> <p>The archive may also be browsed through an <a href="http://re.cs.uct.ac.za/" title="OAI Repository Explorer">OAI Repository Explorer</a>, either by using the CiteSeer<sup>x</sup> archive identifier or by entering the harvest URL directly.</p> <p>The following toolkits can be used for OAI metadata harvesting:</p> <ul> <li><a href="http://search.cpan.org/~thb/OAI-Harvester-1.13/" title="OAI Harvester">OAI-Harvester</a> - Perl </li> <li><a href="http://www.oclc.org/research/software/oai/harvester2.htm" title="OAI Harvester2">OAIHarvester2</a> - Java </li> <li><a href="http://sourceforge.net/projects/netoaihvster" title=".NET OAI Harvester">.NET OAI Harvester</a> - .NET (dll) </li> <li><a href="http://sourceforge.net/projects/uilib-oai/" title="UIUC OAI">UIUC OAI</a> - UIUC OAI Metadata Harvesting Project. </li> </ul> <h2 class="section-header"><a name="Data Sets" class="sie-section-header">Data Sets</a> </h2> <p></p> <p></p> <h3><a name="Citation and Header Datasets" class="sie-section-header">Citation and Header Datasets</a></h3> <p> </p> <ul> <li> <u>UMass Citation Field Extraction Dataset</u> <br> <a href="https://sites.google.com/a/iesl.cs.umass.edu/home/data/umasscitationfield">https://sites.google.com/a/iesl.cs.umass.edu/home/data/umasscitationfield</a> <br> License: N/A <br> <i>Provides labels and segments for citations extracted from articles found on arxiv.org. The citations come from 5,000 papers across four fields. 
Described in "A New Dataset for Fine-Grained Citation Field Extraction."</i> </li> <li> <u>Cora Information Extraction</u> <br> <a href="http://people.cs.umass.edu/~mccallum/data/cora-ie.tar.gz">http://people.cs.umass.edu/~mccallum/data/cora-ie.tar.gz</a> <br> License: N/A <br> <i>Research paper headers and citations, with labeled segments for authors, title, institutions, venue, date, page numbers and several other fields.</i> </li> <li> <u>CiteSeerX Citation Data</u> <br> <a href="http://aye.comp.nus.edu.sg/parsCit/citeseerx.tagged.txt">http://aye.comp.nus.edu.sg/parsCit/citeseerx.tagged.txt</a> <br> License: N/A <br> <i>Tagged citation data from CiteSeerX</i> </li> </ul> </div> </div> </div> </div> </section> </div> </div> </div> <footer class="footer-distributed"> <div class="footer-left"> <img src="/resources/img/footer_logo.png" width="510" height="150"> <p class="footer-links"> <a href="../privacy-policy/privacy-policy.html">Privacy Policy</a> · <a href="../help/help.html">Help</a> · <a href="https://github.com/SeerLabs/CiteSeerX">Source</a> · <a href="../contact/contact.html">Contact Us</a> </p> <p class="footer-company-name">Developed at and hosted by <a href="http://ist.psu.edu/">The College of Information Sciences and Technology</a></p><br/> <p class="footer-company-name"><a href="https://www.psu.edu/">Pennsylvania State University</a> © 2007-2016 </p> </div> <div class="footer-center"> <div> <div> <i class="fa fa-map-marker"></i> <p><span>Westgate Building</span> Pennsylvania State University <br/>University Park, PA 16802 </p> </div> <div> <i class="fa fa-phone"></i> <p>+(814) 865 7884</p> </div> </div> </div> </footer> </body> </html>