<!DOCTYPE html> <html lang="en"> <head> <title>About CiteSeerX | CiteSeerX</title> <link rel="shortcut icon" href="#"> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1"> <!-- Bootstrap --> <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css"> <link href="https://fonts.googleapis.com/css?family=Roboto:400,700" rel="stylesheet"> <link href="https://fonts.googleapis.com/css?family=Open+Sans" rel="stylesheet"> <link type="text/css" href="/resources/css/navigation.css" rel="stylesheet"> <link type="text/css" href="/resources/css/textstyles.css" rel="stylesheet"> <link type="text/css" rel="stylesheet" href="/resources/css/footer-distributed-with-address-and-phones.css"> <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css"> <link type="text/css" href="/resources/css/content.css" rel="stylesheet"> <!-- jQuery (necessary for Bootstrap's JavaScript plugins) --> <script src="/js/jquery-3.2.1.min.js" type="text/javascript"></script> <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js"></script> <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> <!--[if lt IE 9]> <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> <![endif]--> <!-- JQuery --> <link rel="stylesheet" href="https://code.jquery.com/ui/1.12.1/themes/base/jquery-ui.css"> <script src="https://code.jquery.com/jquery-1.12.4.js"></script> <script src="https://code.jquery.com/ui/1.12.1/jquery-ui.js"></script> <script> $(document).ready(function () { $("#features").on("hide.bs.collapse", function () { $("#features-chevron").html('<i class="fa fa-fw fa-chevron-down"></i>'); }); $("#features").on("show.bs.collapse", function () { $("#features-chevron").html('<i 
class="fa fa-fw fa-chevron-up"></i>'); }); $("#history").on("hide.bs.collapse", function () { $("#history-chevron").html('<i class="fa fa-fw fa-chevron-down"></i>'); }); $("#history").on("show.bs.collapse", function () { $("#history-chevron").html('<i class="fa fa-fw fa-chevron-up"></i>'); }); $("#acknowledgements").on("hide.bs.collapse", function () { $("#acknowledgements-chevron").html('<i class="fa fa-fw fa-chevron-down"></i>'); }); $("#acknowledgements").on("show.bs.collapse", function () { $("#acknowledgements-chevron").html('<i class="fa fa-fw fa-chevron-up"></i>'); }); }); </script> </head> <body> <div class="DMCA psunav"> <!-- DMCA --> <ul> <li class="li-2"><a href="https://www.psu.edu/copyright-information" target="_blank"> <div class="colorchange">DMCA</div> </a></li> </ul> </div> <div class="container header"> <header> <a name="top"></a> <!-- navigation tabs --> <a href="https://citeseerx.ist.psu.edu"> <img src="/resources/img/citeseerx.png" title="CiteSeerX" class="citeseerx_logo"/> </a> <div class="psunav"> <ul> <!-- About --> <li class="li-1"><a href=""> <div class="colorchange">About</div> </a></li> <!-- People Dropdown --> <li class="li-2"> <div class="colorchange"> <div class="dropbtn" id="dropbtn-people"> People <i class="fa fa-caret-down"></i> <div class="dropmenu-content" id="dropmenu-content-people"> <a href="/people/team.html">Team</a> <a href="/people/collaborators.html">Collaborators</a> </div> </div> </div> </li> <!-- Publications --> <li class="li-2"><a href="http://clgiles.ist.psu.edu/citeseer-related.pdf" target="_blank"> <div class="colorchange">Publications</div> </a></li> <!-- Data Dropdown --> <li class="li-2"> <div class="colorchange"> <div class="dropbtn" id="dropbtn-downloads"> Downloads <i class="fa fa-caret-down"></i> <div class="dropmenu-content" id="dropmenu-content-downloads"> <a href="/downloads/data.html">Data</a> <a href="/downloads/software.html">Software</a> </div> </div> </div> </li> <!-- Contact Form --> <li 
class="li-2"> <div class="colorchange"> <div class="dropbtn" id="dropbtn-contact"> Contact <i class="fa fa-caret-down"></i> <div class="dropmenu-content" id="dropmenu-content-contact"> <a href="/contact/contact.html">Contact Us</a> <a href="http://csxcrawlweb01.ist.psu.edu/submit_pub/" target="_blank">Crawler</a> </div> </div> </div> </li> <!-- Donate --> <li class="li-2"><a href="http://www.givenow.psu.edu/CiteseerxFund" target="_blank"> <div class="colorchange">Donate</div> </a></li> </ul> </div> </header> </div> <div> <div class="pagetitle"> <div class="container"> <h1 class="titletext">About CiteSeerX</h1> </div> </div> </div> <div class="pagebody"> <div class="container bodytext"> <div class="row"> <section class="col-sm-12 col-md-12 col-lg-12 col-x1-12"> <p class="ptext"> CiteSeer<sup>x</sup> is an evolving scientific literature digital library and search engine that has focused primarily on the literature in computer and information science. CiteSeer<sup>x</sup> aims to improve the dissemination of scientific literature and to provide improvements in functionality, usability, availability, cost, comprehensiveness, efficiency, and timeliness in access to scientific and scholarly knowledge. Rather than creating just another digital library, CiteSeer<sup>x</sup> attempts to provide resources such as algorithms, data, metadata, services, techniques, and software that can be used to promote other digital libraries. CiteSeer<sup>x</sup> has developed new methods and algorithms to index PostScript and PDF research articles on the Web. CiteSeer<sup>x</sup> provides the following features. 
</p> <a data-toggle="collapse" href="#features" class="collapsible"> <div class="collapse-panel"> <h2 class="collapse-text"><strong>Features</strong></h2> <h2 class="collapse-chevron" id="features-chevron"><i class="fa fa-fw fa-chevron-down"></i></h2> </div> </a> <div id="features" class="collapse"> <table class="features ptext" style="text-align: center;"> <tr class="row"> <td> <div class="dropdown"> <div id="features1" class="features-header">Autonomous citation indexing (ACI)</div> <div id="features1-content" class="features-text dropdown-content"> CiteSeer uses ACI to automatically extract citations and create a citation index that can be used for literature search and evaluation. Compared to traditional citation indices, ACI provides improvements in cost, availability, comprehensiveness, efficiency, and timeliness. </div> </div> </td> <td> <div class="dropdown"> <div id="features2" class="features-header">Automatic metadata extraction</div> <div id="features2-content" class="features-text dropdown-content"> CiteSeer automatically extracts author, title and other related metadata for analysis and document search. </div> </div> </td> <td> <div class="dropdown"> <div id="features3" class="features-header">Citation statistics</div> <div id="features3-content" class="features-text dropdown-content"> CiteSeer computes citation statistics and related documents for all articles cited in the database, not just the indexed articles. </div> </div> </td> </tr> <tr class="row"> <td> <div class="dropdown"> <div id="features4" class="features-header">Reference linking</div> <div id="features4-content" class="features-text dropdown-content"> CiteSeer was the first to allow browsing documents using citation links that are automatically generated. 
</div> </div> </td> <td> <div class="dropdown"> <div id="features5" class="features-header">Author disambiguation</div> <div id="features5-content" class="features-text dropdown-content"> Using scalable methods, authors are automatically disambiguated from one another. </div> </div> </td> <td> <div class="dropdown"> <div id="features6" class="features-header">Citation context</div> <div id="features6-content" class="features-text dropdown-content"> CiteSeer can show the context of citations to a given paper, allowing a researcher to quickly and easily see what other researchers have to say about an article of interest (no longer available). </div> </div> </td> </tr> <tr class="row"> <td> <div class="dropdown"> <div id="features7" class="features-header">Awareness and tracking</div> <div id="features7-content" class="features-text dropdown-content"> CiteSeer provides automatic notification of new citations to given papers and of new papers matching a user profile. </div> </div> </td> <td> <div class="dropdown"> <div id="features8" class="features-header">Related documents</div> <div id="features8-content" class="features-text dropdown-content"> CiteSeer locates related documents using citation and word-based measures and displays an active and continuously updated bibliography for each document. </div> </div> </td> <td> <div class="dropdown"> <div id="features9" class="features-header">Full-text indexing</div> <div id="features9-content" class="features-text dropdown-content"> CiteSeer indexes the full text of entire articles and citations. Full Boolean, phrase, and proximity search is supported. </div> </div> </td> </tr> <tr class="row"> <td> <div class="dropdown"> <div id="features10" class="features-header">Query-sensitive summaries</div> <div id="features10-content" class="features-text dropdown-content"> CiteSeer provides the context of how query terms are used in articles instead of a generic summary, improving the efficiency of search. 
</div> </div> </td> <td> <div class="dropdown"> <div id="features11" class="features-header">Up-to-date</div> <div id="features11-content" class="features-text dropdown-content"> CiteSeer is regularly updated based on user submissions and regular crawls. </div> </div> </td> <td> <div class="dropdown"> <div id="features12" class="features-header">Powerful search</div> <div id="features12-content" class="features-text dropdown-content"> CiteSeer uses fielded search to allow complex queries over content, and allows the use of author initials to provide more flexible name search. </div> </div> </td> </tr> <tr class="row"> <td> <div class="dropdown"> <div id="features13" class="features-header">Harvesting of articles</div> <div id="features13-content" class="features-text dropdown-content"> CiteSeer automatically harvests research papers from the public Web and also accepts submissions through a submission system. </div> </div> </td> <td> <div class="dropdown"> <div id="features14" class="features-header">Metadata of articles</div> <div id="features14-content" class="features-text dropdown-content"> CiteSeer automatically extracts and provides metadata from all indexed articles. </div> </div> </td> <td> <div class="dropdown"> <div id="features15" class="features-header">Personal Content Portal</div> <div id="features15-content" class="features-text dropdown-content"> CiteSeer provides certain features such as personal collections, RSS-like notifications, social bookmarking, and social network facilities. Personalized search settings and institutional data tracking are possible. 
Users can submit documents through an easy-to-use document submission system. </div> </div> </td> </tr> </table> </div> <a data-toggle="collapse" href="#history" class="collapsible"> <div class="collapse-panel"> <h2 class="collapse-text"><strong>History</strong></h2> <h2 class="collapse-chevron" id="history-chevron"><i class="fa fa-fw fa-chevron-down"></i></h2> </div> </a> <div id="history" class="collapse"> <div class="ptext"> <p>CiteSeer was the first digital library and search engine to provide automated citation indexing and citation linking by autonomous citation indexing.</p> <p>CiteSeer was developed in 1997 at the NEC Research Institute, Princeton, New Jersey, by Steve Lawrence, Lee Giles and Kurt Bollacker. The service transitioned to the Pennsylvania State University's College of Information Sciences and Technology in 2003. Since then, the project has been led by Professor Lee Giles.</p> <p>After serving as a public search engine for nearly ten years, CiteSeer, originally intended as a prototype only, began to scale beyond the capabilities of its original architecture. Since its inception, the original CiteSeer grew to index over 750,000 documents and served over 1.5 million requests daily, pushing the limits of the system's capabilities. 
Based on an analysis of problems encountered by the original system and the needs of the research community, a new architecture and data model were developed for the "Next Generation CiteSeer," or CiteSeer<sup>x</sup>, in order to continue the CiteSeer legacy into the foreseeable future.</p> </div> </div> <a data-toggle="collapse" href="#acknowledgements" class="collapsible"> <div class="collapse-panel"> <h2 class="collapse-text"><strong>Acknowledgements</strong></h2> <h2 class="collapse-chevron" id="acknowledgements-chevron"><i class="fa fa-fw fa-chevron-down"></i></h2> </div> </a> <div id="acknowledgements" class="collapse"> <div class="ptext"> <div class="content"> <ul> <li>We gratefully acknowledge current and past support from: <ul type="list-style-type:square"> <li>The National Science Foundation award <a href="http://www.nsf.gov/awardsearch/showAward?AWD_ID=0958143">CNS-0958143</a>. </li> <li>Microsoft Research </li> <li>NASA </li> <li>Qatar </li> </ul> </li> <li>The initial header parsing algorithm used by CiteSeer<sup>x</sup> was developed by Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, and Edward A. Fox. The algorithm was further refined by Levent Bolelli and Isaac Councill. </li> <li><a href="http://www.cse.psu.edu/~yasong/" target="_blank">Yang Song</a> developed an initial MyCiteSeer prototype that guided later efforts. </li> <li>Yang Sun contributed the venue analysis code for calculating impact factor statistics. </li> </ul> <h3>Open Source Acknowledgements</h3> <p>CiteSeer<sup>x</sup> is supported by numerous excellent open source applications and libraries. 
Specifically, we would like to thank all who participated in the development of the following projects:</p> <ul> <li><a href="http://www.mysql.com/" target="_blank">The MySQL Database</a> and <a href="http://www.innodb.com/" target="_blank">InnoDB Storage Engine</a></li> <li><a href="http://lucene.apache.org/solr/" target="_blank">Solr</a></li> <li><a href="http://xapian.org/" target="_blank">Xapian</a></li> <li><a href="http://tomcat.apache.org/" target="_blank">Apache Tomcat</a></li> <li><a href="http://www.springframework.org/" target="_blank">The Spring Framework</a> </li> <li><a href="http://static.springsource.org/spring-security/site/" target="_blank">Spring Security</a></li> <li><a href="http://activemq.apache.org/" target="_blank">ActiveMQ</a></li> <li>ActiveBPEL Open Source Engine</li> <li><a href="http://commons.apache.org/" target="_blank">The Apache Commons Libraries</a></li> <li><a href="http://svmlight.joachims.org/" target="_blank">SVM<sup>light</sup> support vector machine package</a></li> <li><a href="http://crfpp.sourceforge.net/" target="_blank">CRF++ conditional random field package</a></li> <li>Icons under LGPL and Creative Commons (Attribution 3.0) Licenses</li> </ul> <h3>We Also Recognize</h3> <ul> <li>Andrew Ng was the first to extract title and author information from the header of PostScript files. </li> <li><a href="http://nzdl.sadl.uleth.ca/cgi-bin/library" target="_blank">The New Zealand Digital Library</a> was the first to index the full text of PostScript research articles. </li> <li>Dr. Eugene Garfield created the idea of citation indexing of the scientific literature. </li> </ul> <h3>Special Thanks</h3> <p>Many have contributed to CiteSeer and its continuing development. 
In a list in which some are surely missing, we would like to thank:</p> <table border="0" cellspacing="5" cellpadding="5"> <tbody class="ptext"> <tr> <td> <ul> <li>Anurag Acharya</li> <li>Joshua Alspector</li> <li>Esam Alwagait</li> <li>Jose Nelson Amaral</li> <li>Anders Ardo</li> <li>Bill Arms</li> <li>Shumeet Baluja</li> <li>Arunava Banerjee</li> <li>Eric Baum</li> <li>Donna Bergmark</li> <li>Levent Bolelli</li> <li>Kurt Bollacker</li> <li>Shannon Bradshaw</li> <li>Vivek Bhatnagar</li> <li>Jay Budzik</li> <li>Robert Cameron</li> <li>Jack Carroll</li> <li>Rich Caruana</li> <li>Ingemar Cox</li> <li>Sandip Debnath</li> <li>Seyda Ertekin</li> <li>Scott Fahlman</li> <li>Umer Farooq</li> <li>Gary Flake</li> <li>Ed Fox</li> <li>Eugene Garfield</li> </ul> </td> <td> <ul> <li>Susan Gauch</li> <li>Bill Gear</li> <li>Paul Ginsparg</li> <li>Eric Glover</li> <li>Abby Goodrum</li> <li>Marco Gori</li> <li>Allan Gottlieb</li> <li>Jim Gray</li> <li>Hui Han</li> <li>Mike Halm</li> <li>Steve Hanson</li> <li>Stevan Harnad</li> <li>Eric Hellman</li> <li>Geoff Hinton</li> <li>Haym Hirsh</li> <li>Steve Hitchcock</li> <li>Jian Huang</li> <li>Kirby Huntsinger</li> <li>Gerd Hoff</li> <li>Ernesto Di Iorio</li> <li>Jim Jansen</li> <li>Shannon Johnson</li> <li>Paul Kantor</li> <li>Madian Khabsa</li> </ul> </td> <td> <ul> <li>Jon Kleinberg</li> <li>Thomas Krichel</li> <li>Bob Krovetz</li> <li>Carl Lagoze</li> <li>Andrea LaPaugh</li> <li>Steve Lawrence</li> <li>Wang-Chien Lee</li> <li>Jay Lepreau</li> <li>Michael Lesk</li> <li>Huajing Li</li> <li>Marco Maggini</li> <li>Eren Manavoglu</li> <li>Andrew McCallum</li> <li>Chris Milito</li> <li>Steve Minton</li> <li>Tom Mitchell</li> <li>Finn Nielsen</li> <li>Michael Nelson</li> <li>Craig Nevill-Manning</li> <li>Andrew Ng</li> <li>Andrew Odlyzko</li> <li>Frank Olken</li> <li>David Pennock</li> <li>Yves Petinot</li> <li>Brian Pinkerton</li> <li>Alexandrin Popescul</li> </ul> </td> <td> <ul> <li>Augusto Pucci</li> <li>Betsy 
Richmond</li> <li>Ben Schafer</li> <li>Bruce Schatz</li> <li>Terrence Sejnowski</li> <li>Anand Sivasubramaniam</li> <li>Warren Smith</li> <li>Yang Song</li> <li>Amanda Spink</li> <li>Yang Sun</li> <li>Harold Stone</li> <li>Pucktada Treeratpituk</li> <li>Kostas Tsioutsiouliklis</li> <li>Valerie Tucci</li> <li>Lyle Ungar</li> <li>Frits Vaandrager</li> <li>Moshe Vardi</li> <li>David Waltz</li> <li>James Ze Wang</li> <li>Simeon Warner</li> <li>Ian Witten</li> <li>John Yen</li> <li>Maria Zemankova</li> <li>Hongyuan Zha</li> <li>Ding Zhou</li> <li>Ziming Zhuang</li> </ul> </td> </tr> </tbody> </table> </div> </div> </div> <h2 class="section-header"><strong>Sponsors</strong></h2> <p class="ptext"> We are very thankful for the generous support that our sponsors have provided. In particular, CiteSeer<sup>x</sup> would not exist without their support. <br/><strong>If there is any interest in sponsoring CiteSeer<sup>x</sup>, <a href="http://clgiles.ist.psu.edu/">please contact Professor Giles.</a></strong> </p> <div class="sponsor_text" style="text-align:center;"> <h3>Current Sponsors</h3> <a href="https://www.nsf.gov/awardsearch/showAward?AWD_ID=0958143"><img src="/resources/img/nsf_logo.gif" title="The National Science Foundation" class="sponsors"/></a> <h3>Previous Sponsors</h3> <a href="https://allenai.org/"><img src="/resources/img/ai2_logo.png" title="Allen Institute for Artificial Intelligence" class="sponsors"/></a> <a href="http://research.microsoft.com/"><img src="/resources/img/logo_msr.png" title="Microsoft Research" class="sponsors" style="width:150px;"/></a> <a href="https://www.nasa.gov/"><img src="/resources/img/logo_nasa.png" title="The National Aeronautics and Space Administration" class="sponsors"/></a> </div> </section> </div> </div> </div> <footer class="footer-distributed"> <div class="footer-left"> <img src="/resources/img/footer_logo.png" width="510" height="150"> <p class="footer-links"> <a href="/privacy-policy/privacy-policy.html">Privacy Policy</a> · <a 
href="/help/help.html">Help</a> · <a href="https://github.com/SeerLabs/CiteSeerX">Source</a> · <a href="/contact/contact.html">Contact Us</a> </p> <p class="footer-company-name">Developed at and hosted by <a href="http://ist.psu.edu/">The College of Information Sciences and Technology</a></p><br/> <p class="footer-company-name"><a href="https://www.psu.edu/">Pennsylvania State University</a> © 2007-2016 </p> </div> <div class="footer-center"> <div> <div> <i class="fa fa-map-marker"></i> <p><span>Westgate Building</span> Pennsylvania State University <br/>University Park, PA 16802 </p> </div> <div> <i class="fa fa-phone"></i> <p>+(814) 865 7884</p> </div> </div> </div> </footer> </body> </html>