Maximum Entropy, Word-Frequency, Chinese Characters, and Multiple Meanings

Research Article | PLOS ONE 10(5): e0125592 | Published May 8, 2015 | DOI: 10.1371/journal.pone.0125592

Xiaoyong Yan (Systems Science Institute, Beijing Jiaotong University, Beijing 100044, China; Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 611731, China)
Petter Minnhagen (IceLab, Department of Physics, Umeå University, 901 87 Umeå, Sweden)

Abstract

The word-frequency distribution of a text written by an author is well accounted for by a maximum entropy distribution, the RGF (random group formation) prediction. The RGF distribution is completely determined by the a priori values of the total number of words in the text (M), the number of distinct words (N), and the number of repetitions of the most common word (kmax). It is shown here that this maximum entropy prediction also describes a text written in Chinese characters. In particular, although the same Chinese text written in words and in Chinese characters has quite differently shaped distributions, both are nevertheless well predicted by their respective three a priori characteristic values. This is analogous to the change in the shape of the distribution when a given text is translated into another language. Another consequence of the RGF prediction is that taking a part of a long text changes the input parameters (M, N, kmax) and consequently also the shape of the frequency distribution. This is explicitly confirmed for texts written in Chinese characters. Since the RGF prediction contains no system-specific information beyond the three a priori values (M, N, kmax), any specific language characteristic has to be sought in systematic deviations between the RGF prediction and the measured frequencies. One such systematic deviation is identified and, through a statistical information-theoretic argument and an extended RGF model, it is proposed that this deviation is caused by the multiple meanings of Chinese characters. The effect is stronger for Chinese characters than for Chinese words. The relation between Zipf's law, the Simon model for texts, and the present results is discussed.

Keywords: computational linguistics, language, semantics, entropy, distribution curves, information entropy, monkeys, probability distribution
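The RGF prediction uses only three quantities that can be read directly off a text: the total number of word tokens (M), the number of distinct words (N), and the count of the most common word (kmax). The sketch below is a minimal illustration, not code from the paper, of how these a priori values and the empirical frequency distribution P(k) could be extracted; it assumes simple whitespace tokenization for an alphabetic text (for a text in Chinese characters one would iterate over individual characters instead), and the input file name is hypothetical. The functional form of the RGF prediction itself is given in the paper and is not reproduced here.

```python
from collections import Counter

def a_priori_values(tokens):
    """Return (M, N, kmax): total tokens, distinct tokens,
    and the number of occurrences of the most frequent token."""
    counts = Counter(tokens)
    M = sum(counts.values())      # total number of words in the text
    N = len(counts)               # number of distinct words
    kmax = max(counts.values())   # repetitions of the most common word
    return M, N, kmax

def frequency_distribution(tokens):
    """Empirical P(k): fraction of distinct words occurring exactly k times."""
    counts = Counter(tokens)
    words_with_count_k = Counter(counts.values())   # k -> number of distinct words seen k times
    N = len(counts)
    return {k: n_k / N for k, n_k in sorted(words_with_count_k.items())}

# Illustrative usage (hypothetical input file).
text = open("novel.txt", encoding="utf-8").read()
tokens = text.split()            # for Chinese characters: tokens = list(text)

print(a_priori_values(tokens))

# Taking only a part of a long text changes (M, N, kmax), and with them the
# predicted shape of the distribution, as the abstract notes.
half = tokens[: len(tokens) // 2]
print(a_priori_values(half))
```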
Anderson;citation_author=RA Bjork;citation_journal_title=J Exp Psychol;citation_volume=102;citation_number=102;citation_first_page=648;citation_last_page=656;citation_publication_date=1974;"/> <meta name="citation_reference" content="citation_title=The variation of Zipf’s law in human language;citation_author=RF i Cancho;citation_journal_title=Eur Phys J B;citation_volume=44;citation_number=44;citation_first_page=249;citation_last_page=257;citation_publication_date=2005;"/> <meta name="citation_reference" content="citation_title=Zipf’s law and avoidance of excessive synonymy;citation_author=DY Manin;citation_journal_title=Cognitive Sci;citation_volume=32;citation_number=32;citation_first_page=1075;citation_last_page=1098;citation_publication_date=2008;"/> <meta name="citation_reference" content="citation_title=Efficient learning strategy of Chinese characters based on network approach;citation_author=X Yan;citation_author=Y Fan;citation_author=Z Di;citation_author=S Havlin;citation_author=J Wu;citation_journal_title=PLoS ONE;citation_volume=8;citation_number=8;citation_first_page=e69745;citation_publication_date=2013;"/> <!-- DoubleClick overall ad setup script --> <script type='text/javascript'> var googletag = googletag || {}; googletag.cmd = googletag.cmd || []; (function() { var gads = document.createElement('script'); gads.async = true; gads.type = 'text/javascript'; var useSSL = 'https:' == document.location.protocol; gads.src = (useSSL ? 'https:' : 'http:') + '//www.googletagservices.com/tag/js/gpt.js'; var node = document.getElementsByTagName('script')[0]; node.parentNode.insertBefore(gads, node); })(); </script> <!-- DoubleClick ad slot setup script --> <script id="doubleClickSetupScript" type='text/javascript'> googletag.cmd.push(function() { googletag.defineSlot('/75507958/PONE_728x90_ATF', [728, 90], 'div-gpt-ad-1458247671871-0').addService(googletag.pubads()); googletag.defineSlot('/75507958/PONE_160x600_BTF', [160, 600], 'div-gpt-ad-1458247671871-1').addService(googletag.pubads()); var personalizedAds = window.plosCookieConsent && window.plosCookieConsent.hasConsented('advertising'); googletag.pubads().setRequestNonPersonalizedAds(personalizedAds ? 
0 : 1); googletag.pubads().enableSingleRequest(); googletag.enableServices(); }); </script> <script type="text/javascript"> var WombatConfig = WombatConfig || {}; WombatConfig.journalKey = "PLoSONE"; WombatConfig.journalName = "PLOS ONE"; WombatConfig.figurePath = "/plosone/article/figure/image"; WombatConfig.figShareInstitutionString = "plos"; WombatConfig.doiResolverPrefix = "https://dx.plos.org/"; </script> <script type="text/javascript"> var WombatConfig = WombatConfig || {}; WombatConfig.metrics = WombatConfig.metrics || {}; WombatConfig.metrics.referenceUrl = "http://lagotto.io/plos"; WombatConfig.metrics.googleScholarUrl = "https://scholar.google.com/scholar"; WombatConfig.metrics.googleScholarCitationUrl = WombatConfig.metrics.googleScholarUrl + "?hl=en&lr=&q="; WombatConfig.metrics.crossrefUrl = "https://www.crossref.org"; </script> <script defer="defer" src="/resource/js/defer.js?13928eb59791c3cc61cf"></script><script src="/resource/js/sync.js?13928eb59791c3cc61cf"></script> <script src="/resource/js/vendor/jquery.min.js" type="text/javascript"></script> <script type="text/javascript" src="https://widgets.figshare.com/static/figshare.js"></script> <script src="/resource/js/vendor/fastclick/lib/fastclick.js" type="text/javascript"></script> <script src="/resource/js/vendor/foundation/foundation.js" type="text/javascript"></script> <script src="/resource/js/vendor/underscore-min.js" type="text/javascript"></script> <script src="/resource/js/vendor/underscore.string.min.js" type="text/javascript"></script> <script src="/resource/js/vendor/moment.js" type="text/javascript"></script> <script src="/resource/js/vendor/jquery-ui-effects.min.js" type="text/javascript"></script> <script src="/resource/js/vendor/foundation/foundation.tooltip.js" type="text/javascript"></script> <script src="/resource/js/vendor/foundation/foundation.dropdown.js" type="text/javascript"></script> <script src="/resource/js/vendor/foundation/foundation.tab.js" type="text/javascript"></script> <script src="/resource/js/vendor/foundation/foundation.reveal.js" type="text/javascript"></script> <script src="/resource/js/vendor/foundation/foundation.slider.js" type="text/javascript"></script> <script src="/resource/js/util/utils.js" type="text/javascript"></script> <script src="/resource/js/components/toggle.js" type="text/javascript"></script> <script src="/resource/js/components/truncate_elem.js" type="text/javascript"></script> <script src="/resource/js/components/tooltip_hover.js" type="text/javascript"></script> <script src="/resource/js/vendor/jquery.dotdotdot.js" type="text/javascript"></script> <!--For Google Tag manager to be able to track site information --> <script> dataLayer = [{ 'mobileSite': 'false', 'desktopSite': 'true' }]; </script> <title>Maximum Entropy, Word-Frequency, Chinese Characters, and Multiple Meanings | PLOS ONE</title> </head> <body class="article plosone"> <!-- Google Tag Manager --> <noscript><iframe src="//www.googletagmanager.com/ns.html?id=GTM-TP26BH" height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript> <script> (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start': new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0], j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src= '//www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f); })(window,document,'script','dataLayer','GTM-TP26BH'); </script> <noscript><iframe src="//www.googletagmanager.com/ns.html?id=GTM-MQQMGF" height="0" width="0" 
style="display:none;visibility:hidden"></iframe></noscript> <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start': new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0], j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src= '//www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f); })(window,document,'script','dataLayer','GTM-MQQMGF');</script> <!-- End Google Tag Manager --> <!-- New Relic --> <script type="text/javascript"> ;window.NREUM||(NREUM={});NREUM.init={distributed_tracing:{enabled:true},privacy:{cookies_enabled:true},ajax:{deny_list:["bam.nr-data.net"]}}; window.NREUM||(NREUM={}),__nr_require=function(t,e,n){function r(n){if(!e[n]){var o=e[n]={exports:{}};t[n][0].call(o.exports,function(e){var o=t[n][1][e];return r(o||e)},o,o.exports)}return e[n].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<n.length;o++)r(n[o]);return r}({1:[function(t,e,n){function r(t){try{s.console&&console.log(t)}catch(e){}}var o,i=t("ee"),a=t(32),s={};try{o=localStorage.getItem("__nr_flags").split(","),console&&"function"==typeof console.log&&(s.console=!0,o.indexOf("dev")!==-1&&(s.dev=!0),o.indexOf("nr_dev")!==-1&&(s.nrDev=!0))}catch(c){}s.nrDev&&i.on("internal-error",function(t){r(t.stack)}),s.dev&&i.on("fn-err",function(t,e,n){r(n.stack)}),s.dev&&(r("NR AGENT IN DEVELOPMENT MODE"),r("flags: "+a(s,function(t,e){return t}).join(", ")))},{}],2:[function(t,e,n){function r(t,e,n,r,s){try{l?l-=1:o(s||new UncaughtException(t,e,n),!0)}catch(f){try{i("ierr",[f,c.now(),!0])}catch(d){}}return"function"==typeof u&&u.apply(this,a(arguments))}function UncaughtException(t,e,n){this.message=t||"Uncaught error with no additional information",this.sourceURL=e,this.line=n}function o(t,e){var n=e?null:c.now();i("err",[t,n])}var i=t("handle"),a=t(33),s=t("ee"),c=t("loader"),f=t("gos"),u=window.onerror,d=!1,p="nr@seenError";if(!c.disabled){var l=0;c.features.err=!0,t(1),window.onerror=r;try{throw new Error}catch(h){"stack"in h&&(t(14),t(13),"addEventListener"in window&&t(7),c.xhrWrappable&&t(15),d=!0)}s.on("fn-start",function(t,e,n){d&&(l+=1)}),s.on("fn-err",function(t,e,n){d&&!n[p]&&(f(n,p,function(){return!0}),this.thrown=!0,o(n))}),s.on("fn-end",function(){d&&!this.thrown&&l>0&&(l-=1)}),s.on("internal-error",function(t){i("ierr",[t,c.now(),!0])})}},{}],3:[function(t,e,n){var r=t("loader");r.disabled||(r.features.ins=!0)},{}],4:[function(t,e,n){function r(){U++,L=g.hash,this[u]=y.now()}function o(){U--,g.hash!==L&&i(0,!0);var t=y.now();this[h]=~~this[h]+t-this[u],this[d]=t}function i(t,e){E.emit("newURL",[""+g,e])}function a(t,e){t.on(e,function(){this[e]=y.now()})}var s="-start",c="-end",f="-body",u="fn"+s,d="fn"+c,p="cb"+s,l="cb"+c,h="jsTime",m="fetch",v="addEventListener",w=window,g=w.location,y=t("loader");if(w[v]&&y.xhrWrappable&&!y.disabled){var x=t(11),b=t(12),E=t(9),R=t(7),O=t(14),T=t(8),P=t(15),S=t(10),M=t("ee"),N=M.get("tracer"),C=t(23);t(17),y.features.spa=!0;var 
L,U=0;M.on(u,r),b.on(p,r),S.on(p,r),M.on(d,o),b.on(l,o),S.on(l,o),M.buffer([u,d,"xhr-resolved"]),R.buffer([u]),O.buffer(["setTimeout"+c,"clearTimeout"+s,u]),P.buffer([u,"new-xhr","send-xhr"+s]),T.buffer([m+s,m+"-done",m+f+s,m+f+c]),E.buffer(["newURL"]),x.buffer([u]),b.buffer(["propagate",p,l,"executor-err","resolve"+s]),N.buffer([u,"no-"+u]),S.buffer(["new-jsonp","cb-start","jsonp-error","jsonp-end"]),a(T,m+s),a(T,m+"-done"),a(S,"new-jsonp"),a(S,"jsonp-end"),a(S,"cb-start"),E.on("pushState-end",i),E.on("replaceState-end",i),w[v]("hashchange",i,C(!0)),w[v]("load",i,C(!0)),w[v]("popstate",function(){i(0,U>1)},C(!0))}},{}],5:[function(t,e,n){function r(){var t=new PerformanceObserver(function(t,e){var n=t.getEntries();s(v,[n])});try{t.observe({entryTypes:["resource"]})}catch(e){}}function o(t){if(s(v,[window.performance.getEntriesByType(w)]),window.performance["c"+p])try{window.performance[h](m,o,!1)}catch(t){}else try{window.performance[h]("webkit"+m,o,!1)}catch(t){}}function i(t){}if(window.performance&&window.performance.timing&&window.performance.getEntriesByType){var a=t("ee"),s=t("handle"),c=t(14),f=t(13),u=t(6),d=t(23),p="learResourceTimings",l="addEventListener",h="removeEventListener",m="resourcetimingbufferfull",v="bstResource",w="resource",g="-start",y="-end",x="fn"+g,b="fn"+y,E="bstTimer",R="pushState",O=t("loader");if(!O.disabled){O.features.stn=!0,t(9),"addEventListener"in window&&t(7);var T=NREUM.o.EV;a.on(x,function(t,e){var n=t[0];n instanceof T&&(this.bstStart=O.now())}),a.on(b,function(t,e){var n=t[0];n instanceof T&&s("bst",[n,e,this.bstStart,O.now()])}),c.on(x,function(t,e,n){this.bstStart=O.now(),this.bstType=n}),c.on(b,function(t,e){s(E,[e,this.bstStart,O.now(),this.bstType])}),f.on(x,function(){this.bstStart=O.now()}),f.on(b,function(t,e){s(E,[e,this.bstStart,O.now(),"requestAnimationFrame"])}),a.on(R+g,function(t){this.time=O.now(),this.startPath=location.pathname+location.hash}),a.on(R+y,function(t){s("bstHist",[location.pathname+location.hash,this.startPath,this.time])}),u()?(s(v,[window.performance.getEntriesByType("resource")]),r()):l in window.performance&&(window.performance["c"+p]?window.performance[l](m,o,d(!1)):window.performance[l]("webkit"+m,o,d(!1))),document[l]("scroll",i,d(!1)),document[l]("keypress",i,d(!1)),document[l]("click",i,d(!1))}}},{}],6:[function(t,e,n){e.exports=function(){return"PerformanceObserver"in window&&"function"==typeof window.PerformanceObserver}},{}],7:[function(t,e,n){function r(t){for(var e=t;e&&!e.hasOwnProperty(u);)e=Object.getPrototypeOf(e);e&&o(e)}function o(t){s.inPlace(t,[u,d],"-",i)}function i(t,e){return t[1]}var a=t("ee").get("events"),s=t("wrap-function")(a,!0),c=t("gos"),f=XMLHttpRequest,u="addEventListener",d="removeEventListener";e.exports=a,"getPrototypeOf"in Object?(r(document),r(window),r(f.prototype)):f.prototype.hasOwnProperty(u)&&(o(window),o(f.prototype)),a.on(u+"-start",function(t,e){var n=t[1];if(null!==n&&("function"==typeof n||"object"==typeof n)){var r=c(n,"nr@wrapped",function(){function t(){if("function"==typeof n.handleEvent)return n.handleEvent.apply(n,arguments)}var e={object:t,"function":n}[typeof n];return e?s(e,"fn-",null,e.name||"anonymous"):n});this.wrapped=t[1]=r}}),a.on(d+"-start",function(t){t[1]=this.wrapped||t[1]})},{}],8:[function(t,e,n){function r(t,e,n){var r=t[e];"function"==typeof r&&(t[e]=function(){var t=i(arguments),e={};o.emit(n+"before-start",[t],e);var a;e[m]&&e[m].dt&&(a=e[m].dt);var s=r.apply(this,t);return o.emit(n+"start",[t,a],s),s.then(function(t){return 
o.emit(n+"end",[null,t],s),t},function(t){throw o.emit(n+"end",[t],s),t})})}var o=t("ee").get("fetch"),i=t(33),a=t(32);e.exports=o;var s=window,c="fetch-",f=c+"body-",u=["arrayBuffer","blob","json","text","formData"],d=s.Request,p=s.Response,l=s.fetch,h="prototype",m="nr@context";d&&p&&l&&(a(u,function(t,e){r(d[h],e,f),r(p[h],e,f)}),r(s,"fetch",c),o.on(c+"end",function(t,e){var n=this;if(e){var r=e.headers.get("content-length");null!==r&&(n.rxSize=r),o.emit(c+"done",[null,e],n)}else o.emit(c+"done",[t],n)}))},{}],9:[function(t,e,n){var r=t("ee").get("history"),o=t("wrap-function")(r);e.exports=r;var i=window.history&&window.history.constructor&&window.history.constructor.prototype,a=window.history;i&&i.pushState&&i.replaceState&&(a=i),o.inPlace(a,["pushState","replaceState"],"-")},{}],10:[function(t,e,n){function r(t){function e(){f.emit("jsonp-end",[],l),t.removeEventListener("load",e,c(!1)),t.removeEventListener("error",n,c(!1))}function n(){f.emit("jsonp-error",[],l),f.emit("jsonp-end",[],l),t.removeEventListener("load",e,c(!1)),t.removeEventListener("error",n,c(!1))}var r=t&&"string"==typeof t.nodeName&&"script"===t.nodeName.toLowerCase();if(r){var o="function"==typeof t.addEventListener;if(o){var a=i(t.src);if(a){var d=s(a),p="function"==typeof d.parent[d.key];if(p){var l={};u.inPlace(d.parent,[d.key],"cb-",l),t.addEventListener("load",e,c(!1)),t.addEventListener("error",n,c(!1)),f.emit("new-jsonp",[t.src],l)}}}}}function o(){return"addEventListener"in window}function i(t){var e=t.match(d);return e?e[1]:null}function a(t,e){var n=t.match(l),r=n[1],o=n[3];return o?a(o,e[r]):e[r]}function s(t){var e=t.match(p);return e&&e.length>=3?{key:e[2],parent:a(e[1],window)}:{key:t,parent:window}}var c=t(23),f=t("ee").get("jsonp"),u=t("wrap-function")(f);if(e.exports=f,o()){var d=/[?&](?:callback|cb)=([^&#]+)/,p=/(.*)\.([^.]+)/,l=/^(\w+)(\.|$)(.*)$/,h=["appendChild","insertBefore","replaceChild"];Node&&Node.prototype&&Node.prototype.appendChild?u.inPlace(Node.prototype,h,"dom-"):(u.inPlace(HTMLElement.prototype,h,"dom-"),u.inPlace(HTMLHeadElement.prototype,h,"dom-"),u.inPlace(HTMLBodyElement.prototype,h,"dom-")),f.on("dom-start",function(t){r(t[0])})}},{}],11:[function(t,e,n){var r=t("ee").get("mutation"),o=t("wrap-function")(r),i=NREUM.o.MO;e.exports=r,i&&(window.MutationObserver=function(t){return this instanceof i?new i(o(t,"fn-")):i.apply(this,arguments)},MutationObserver.prototype=i.prototype)},{}],12:[function(t,e,n){function r(t){var e=i.context(),n=s(t,"executor-",e,null,!1),r=new f(n);return i.context(r).getCtx=function(){return e},r}var o=t("wrap-function"),i=t("ee").get("promise"),a=t("ee").getOrSetContext,s=o(i),c=t(32),f=NREUM.o.PR;e.exports=i,f&&(window.Promise=r,["all","race"].forEach(function(t){var e=f[t];f[t]=function(n){function r(t){return function(){i.emit("propagate",[null,!o],a,!1,!1),o=o||!t}}var o=!1;c(n,function(e,n){Promise.resolve(n).then(r("all"===t),r(!1))});var a=e.apply(f,arguments),s=f.resolve(a);return s}}),["resolve","reject"].forEach(function(t){var e=f[t];f[t]=function(t){var n=e.apply(f,arguments);return t!==n&&i.emit("propagate",[t,!0],n,!1,!1),n}}),f.prototype["catch"]=function(t){return this.then(null,t)},f.prototype=Object.create(f.prototype,{constructor:{value:r}}),c(Object.getOwnPropertyNames(f),function(t,e){try{r[e]=f[e]}catch(n){}}),o.wrapInPlace(f.prototype,"then",function(t){return function(){var e=this,n=o.argsToArray.apply(this,arguments),r=a(e);r.promise=e,n[0]=s(n[0],"cb-",r,null,!1),n[1]=s(n[1],"cb-",r,null,!1);var c=t.apply(this,n);return 
r.nextPromise=c,i.emit("propagate",[e,!0],c,!1,!1),c}}),i.on("executor-start",function(t){t[0]=s(t[0],"resolve-",this,null,!1),t[1]=s(t[1],"resolve-",this,null,!1)}),i.on("executor-err",function(t,e,n){t[1](n)}),i.on("cb-end",function(t,e,n){i.emit("propagate",[n,!0],this.nextPromise,!1,!1)}),i.on("propagate",function(t,e,n){this.getCtx&&!e||(this.getCtx=function(){if(t instanceof Promise)var e=i.context(t);return e&&e.getCtx?e.getCtx():this})}),r.toString=function(){return""+f})},{}],13:[function(t,e,n){var r=t("ee").get("raf"),o=t("wrap-function")(r),i="equestAnimationFrame";e.exports=r,o.inPlace(window,["r"+i,"mozR"+i,"webkitR"+i,"msR"+i],"raf-"),r.on("raf-start",function(t){t[0]=o(t[0],"fn-")})},{}],14:[function(t,e,n){function r(t,e,n){t[0]=a(t[0],"fn-",null,n)}function o(t,e,n){this.method=n,this.timerDuration=isNaN(t[1])?0:+t[1],t[0]=a(t[0],"fn-",this,n)}var i=t("ee").get("timer"),a=t("wrap-function")(i),s="setTimeout",c="setInterval",f="clearTimeout",u="-start",d="-";e.exports=i,a.inPlace(window,[s,"setImmediate"],s+d),a.inPlace(window,[c],c+d),a.inPlace(window,[f,"clearImmediate"],f+d),i.on(c+u,r),i.on(s+u,o)},{}],15:[function(t,e,n){function r(t,e){d.inPlace(e,["onreadystatechange"],"fn-",s)}function o(){var t=this,e=u.context(t);t.readyState>3&&!e.resolved&&(e.resolved=!0,u.emit("xhr-resolved",[],t)),d.inPlace(t,y,"fn-",s)}function i(t){x.push(t),m&&(E?E.then(a):w?w(a):(R=-R,O.data=R))}function a(){for(var t=0;t<x.length;t++)r([],x[t]);x.length&&(x=[])}function s(t,e){return e}function c(t,e){for(var n in t)e[n]=t[n];return e}t(7);var f=t("ee"),u=f.get("xhr"),d=t("wrap-function")(u),p=t(23),l=NREUM.o,h=l.XHR,m=l.MO,v=l.PR,w=l.SI,g="readystatechange",y=["onload","onerror","onabort","onloadstart","onloadend","onprogress","ontimeout"],x=[];e.exports=u;var b=window.XMLHttpRequest=function(t){var e=new h(t);try{u.emit("new-xhr",[e],e),e.addEventListener(g,o,p(!1))}catch(n){try{u.emit("internal-error",[n])}catch(r){}}return e};if(c(h,b),b.prototype=h.prototype,d.inPlace(b.prototype,["open","send"],"-xhr-",s),u.on("send-xhr-start",function(t,e){r(t,e),i(e)}),u.on("open-xhr-start",r),m){var E=v&&v.resolve();if(!w&&!v){var R=1,O=document.createTextNode(R);new m(a).observe(O,{characterData:!0})}}else f.on("fn-end",function(t){t[0]&&t[0].type===g||a()})},{}],16:[function(t,e,n){function r(t){if(!s(t))return null;var e=window.NREUM;if(!e.loader_config)return null;var n=(e.loader_config.accountID||"").toString()||null,r=(e.loader_config.agentID||"").toString()||null,f=(e.loader_config.trustKey||"").toString()||null;if(!n||!r)return null;var h=l.generateSpanId(),m=l.generateTraceId(),v=Date.now(),w={spanId:h,traceId:m,timestamp:v};return(t.sameOrigin||c(t)&&p())&&(w.traceContextParentHeader=o(h,m),w.traceContextStateHeader=i(h,v,n,r,f)),(t.sameOrigin&&!u()||!t.sameOrigin&&c(t)&&d())&&(w.newrelicHeader=a(h,m,v,n,r,f)),w}function o(t,e){return"00-"+e+"-"+t+"-01"}function i(t,e,n,r,o){var i=0,a="",s=1,c="",f="";return o+"@nr="+i+"-"+s+"-"+n+"-"+r+"-"+t+"-"+a+"-"+c+"-"+f+"-"+e}function a(t,e,n,r,o,i){var a="btoa"in window&&"function"==typeof window.btoa;if(!a)return null;var s={v:[0,1],d:{ty:"Browser",ac:r,ap:o,id:t,tr:e,ti:n}};return i&&r!==i&&(s.d.tk=i),btoa(JSON.stringify(s))}function s(t){return f()&&c(t)}function c(t){var e=!1,n={};if("init"in NREUM&&"distributed_tracing"in NREUM.init&&(n=NREUM.init.distributed_tracing),t.sameOrigin)e=!0;else if(n.allowed_origins instanceof Array)for(var r=0;r<n.allowed_origins.length;r++){var 
o=h(n.allowed_origins[r]);if(t.hostname===o.hostname&&t.protocol===o.protocol&&t.port===o.port){e=!0;break}}return e}function f(){return"init"in NREUM&&"distributed_tracing"in NREUM.init&&!!NREUM.init.distributed_tracing.enabled}function u(){return"init"in NREUM&&"distributed_tracing"in NREUM.init&&!!NREUM.init.distributed_tracing.exclude_newrelic_header}function d(){return"init"in NREUM&&"distributed_tracing"in NREUM.init&&NREUM.init.distributed_tracing.cors_use_newrelic_header!==!1}function p(){return"init"in NREUM&&"distributed_tracing"in NREUM.init&&!!NREUM.init.distributed_tracing.cors_use_tracecontext_headers}var l=t(29),h=t(18);e.exports={generateTracePayload:r,shouldGenerateTrace:s}},{}],17:[function(t,e,n){function r(t){var e=this.params,n=this.metrics;if(!this.ended){this.ended=!0;for(var r=0;r<p;r++)t.removeEventListener(d[r],this.listener,!1);e.aborted||(n.duration=a.now()-this.startTime,this.loadCaptureCalled||4!==t.readyState?null==e.status&&(e.status=0):i(this,t),n.cbTime=this.cbTime,s("xhr",[e,n,this.startTime,this.endTime,"xhr"],this))}}function o(t,e){var n=c(e),r=t.params;r.hostname=n.hostname,r.port=n.port,r.protocol=n.protocol,r.host=n.hostname+":"+n.port,r.pathname=n.pathname,t.parsedOrigin=n,t.sameOrigin=n.sameOrigin}function i(t,e){t.params.status=e.status;var n=v(e,t.lastSize);if(n&&(t.metrics.rxSize=n),t.sameOrigin){var r=e.getResponseHeader("X-NewRelic-App-Data");r&&(t.params.cat=r.split(", ").pop())}t.loadCaptureCalled=!0}var a=t("loader");if(a.xhrWrappable&&!a.disabled){var s=t("handle"),c=t(18),f=t(16).generateTracePayload,u=t("ee"),d=["load","error","abort","timeout"],p=d.length,l=t("id"),h=t(24),m=t(22),v=t(19),w=t(23),g=NREUM.o.REQ,y=window.XMLHttpRequest;a.features.xhr=!0,t(15),t(8),u.on("new-xhr",function(t){var e=this;e.totalCbs=0,e.called=0,e.cbTime=0,e.end=r,e.ended=!1,e.xhrGuids={},e.lastSize=null,e.loadCaptureCalled=!1,e.params=this.params||{},e.metrics=this.metrics||{},t.addEventListener("load",function(n){i(e,t)},w(!1)),h&&(h>34||h<10)||t.addEventListener("progress",function(t){e.lastSize=t.loaded},w(!1))}),u.on("open-xhr-start",function(t){this.params={method:t[0]},o(this,t[1]),this.metrics={}}),u.on("open-xhr-end",function(t,e){"loader_config"in NREUM&&"xpid"in NREUM.loader_config&&this.sameOrigin&&e.setRequestHeader("X-NewRelic-ID",NREUM.loader_config.xpid);var n=f(this.parsedOrigin);if(n){var r=!1;n.newrelicHeader&&(e.setRequestHeader("newrelic",n.newrelicHeader),r=!0),n.traceContextParentHeader&&(e.setRequestHeader("traceparent",n.traceContextParentHeader),n.traceContextStateHeader&&e.setRequestHeader("tracestate",n.traceContextStateHeader),r=!0),r&&(this.dt=n)}}),u.on("send-xhr-start",function(t,e){var n=this.metrics,r=t[0],o=this;if(n&&r){var i=m(r);i&&(n.txSize=i)}this.startTime=a.now(),this.listener=function(t){try{"abort"!==t.type||o.loadCaptureCalled||(o.params.aborted=!0),("load"!==t.type||o.called===o.totalCbs&&(o.onloadCalled||"function"!=typeof e.onload))&&o.end(e)}catch(n){try{u.emit("internal-error",[n])}catch(r){}}};for(var s=0;s<p;s++)e.addEventListener(d[s],this.listener,w(!1))}),u.on("xhr-cb-time",function(t,e,n){this.cbTime+=t,e?this.onloadCalled=!0:this.called+=1,this.called!==this.totalCbs||!this.onloadCalled&&"function"==typeof n.onload||this.end(n)}),u.on("xhr-load-added",function(t,e){var n=""+l(t)+!!e;this.xhrGuids&&!this.xhrGuids[n]&&(this.xhrGuids[n]=!0,this.totalCbs+=1)}),u.on("xhr-load-removed",function(t,e){var n=""+l(t)+!!e;this.xhrGuids&&this.xhrGuids[n]&&(delete 
this.xhrGuids[n],this.totalCbs-=1)}),u.on("xhr-resolved",function(){this.endTime=a.now()}),u.on("addEventListener-end",function(t,e){e instanceof y&&"load"===t[0]&&u.emit("xhr-load-added",[t[1],t[2]],e)}),u.on("removeEventListener-end",function(t,e){e instanceof y&&"load"===t[0]&&u.emit("xhr-load-removed",[t[1],t[2]],e)}),u.on("fn-start",function(t,e,n){e instanceof y&&("onload"===n&&(this.onload=!0),("load"===(t[0]&&t[0].type)||this.onload)&&(this.xhrCbStart=a.now()))}),u.on("fn-end",function(t,e){this.xhrCbStart&&u.emit("xhr-cb-time",[a.now()-this.xhrCbStart,this.onload,e],e)}),u.on("fetch-before-start",function(t){function e(t,e){var n=!1;return e.newrelicHeader&&(t.set("newrelic",e.newrelicHeader),n=!0),e.traceContextParentHeader&&(t.set("traceparent",e.traceContextParentHeader),e.traceContextStateHeader&&t.set("tracestate",e.traceContextStateHeader),n=!0),n}var n,r=t[1]||{};"string"==typeof t[0]?n=t[0]:t[0]&&t[0].url?n=t[0].url:window.URL&&t[0]&&t[0]instanceof URL&&(n=t[0].href),n&&(this.parsedOrigin=c(n),this.sameOrigin=this.parsedOrigin.sameOrigin);var o=f(this.parsedOrigin);if(o&&(o.newrelicHeader||o.traceContextParentHeader))if("string"==typeof t[0]||window.URL&&t[0]&&t[0]instanceof URL){var i={};for(var a in r)i[a]=r[a];i.headers=new Headers(r.headers||{}),e(i.headers,o)&&(this.dt=o),t.length>1?t[1]=i:t.push(i)}else t[0]&&t[0].headers&&e(t[0].headers,o)&&(this.dt=o)}),u.on("fetch-start",function(t,e){this.params={},this.metrics={},this.startTime=a.now(),this.dt=e,t.length>=1&&(this.target=t[0]),t.length>=2&&(this.opts=t[1]);var n,r=this.opts||{},i=this.target;"string"==typeof i?n=i:"object"==typeof i&&i instanceof g?n=i.url:window.URL&&"object"==typeof i&&i instanceof URL&&(n=i.href),o(this,n);var s=(""+(i&&i instanceof g&&i.method||r.method||"GET")).toUpperCase();this.params.method=s,this.txSize=m(r.body)||0}),u.on("fetch-done",function(t,e){this.endTime=a.now(),this.params||(this.params={}),this.params.status=e?e.status:0;var n;"string"==typeof this.rxSize&&this.rxSize.length>0&&(n=+this.rxSize);var r={txSize:this.txSize,rxSize:n,duration:a.now()-this.startTime};s("xhr",[this.params,r,this.startTime,this.endTime,"fetch"],this)})}},{}],18:[function(t,e,n){var r={};e.exports=function(t){if(t in r)return r[t];var e=document.createElement("a"),n=window.location,o={};e.href=t,o.port=e.port;var i=e.href.split("://");!o.port&&i[1]&&(o.port=i[1].split("/")[0].split("@").pop().split(":")[1]),o.port&&"0"!==o.port||(o.port="https"===i[0]?"443":"80"),o.hostname=e.hostname||n.hostname,o.pathname=e.pathname,o.protocol=i[0],"/"!==o.pathname.charAt(0)&&(o.pathname="/"+o.pathname);var a=!e.protocol||":"===e.protocol||e.protocol===n.protocol,s=e.hostname===document.domain&&e.port===n.port;return o.sameOrigin=a&&(!e.hostname||s),"/"===o.pathname&&(r[t]=o),o}},{}],19:[function(t,e,n){function r(t,e){var n=t.responseType;return"json"===n&&null!==e?e:"arraybuffer"===n||"blob"===n||"json"===n?o(t.response):"text"===n||""===n||void 0===n?o(t.responseText):void 0}var o=t(22);e.exports=r},{}],20:[function(t,e,n){function r(){}function o(t,e,n,r){return function(){return u.recordSupportability("API/"+e+"/called"),i(t+e,[f.now()].concat(s(arguments)),n?null:this,r),n?void 0:this}}var i=t("handle"),a=t(32),s=t(33),c=t("ee").get("tracer"),f=t("loader"),u=t(25),d=NREUM;"undefined"==typeof window.newrelic&&(newrelic=d);var 
p=["setPageViewName","setCustomAttribute","setErrorHandler","finished","addToTrace","inlineHit","addRelease"],l="api-",h=l+"ixn-";a(p,function(t,e){d[e]=o(l,e,!0,"api")}),d.addPageAction=o(l,"addPageAction",!0),d.setCurrentRouteName=o(l,"routeName",!0),e.exports=newrelic,d.interaction=function(){return(new r).get()};var m=r.prototype={createTracer:function(t,e){var n={},r=this,o="function"==typeof e;return i(h+"tracer",[f.now(),t,n],r),function(){if(c.emit((o?"":"no-")+"fn-start",[f.now(),r,o],n),o)try{return e.apply(this,arguments)}catch(t){throw c.emit("fn-err",[arguments,this,t],n),t}finally{c.emit("fn-end",[f.now()],n)}}}};a("actionText,setName,setAttribute,save,ignore,onEnd,getContext,end,get".split(","),function(t,e){m[e]=o(h,e)}),newrelic.noticeError=function(t,e){"string"==typeof t&&(t=new Error(t)),u.recordSupportability("API/noticeError/called"),i("err",[t,f.now(),!1,e])}},{}],21:[function(t,e,n){function r(t){if(NREUM.init){for(var e=NREUM.init,n=t.split("."),r=0;r<n.length-1;r++)if(e=e[n[r]],"object"!=typeof e)return;return e=e[n[n.length-1]]}}e.exports={getConfiguration:r}},{}],22:[function(t,e,n){e.exports=function(t){if("string"==typeof t&&t.length)return t.length;if("object"==typeof t){if("undefined"!=typeof ArrayBuffer&&t instanceof ArrayBuffer&&t.byteLength)return t.byteLength;if("undefined"!=typeof Blob&&t instanceof Blob&&t.size)return t.size;if(!("undefined"!=typeof FormData&&t instanceof FormData))try{return JSON.stringify(t).length}catch(e){return}}}},{}],23:[function(t,e,n){var r=!1;try{var o=Object.defineProperty({},"passive",{get:function(){r=!0}});window.addEventListener("testPassive",null,o),window.removeEventListener("testPassive",null,o)}catch(i){}e.exports=function(t){return r?{passive:!0,capture:!!t}:!!t}},{}],24:[function(t,e,n){var r=0,o=navigator.userAgent.match(/Firefox[\/\s](\d+\.\d+)/);o&&(r=+o[1]),e.exports=r},{}],25:[function(t,e,n){function r(t,e){var n=[a,t,{name:t},e];return i("storeMetric",n,null,"api"),n}function o(t,e){var n=[s,t,{name:t},e];return i("storeEventMetrics",n,null,"api"),n}var i=t("handle"),a="sm",s="cm";e.exports={constants:{SUPPORTABILITY_METRIC:a,CUSTOM_METRIC:s},recordSupportability:r,recordCustom:o}},{}],26:[function(t,e,n){function r(){return s.exists&&performance.now?Math.round(performance.now()):(i=Math.max((new Date).getTime(),i))-a}function o(){return i}var i=(new Date).getTime(),a=i,s=t(34);e.exports=r,e.exports.offset=a,e.exports.getLastTimestamp=o},{}],27:[function(t,e,n){function r(t){return!(!t||!t.protocol||"file:"===t.protocol)}e.exports=r},{}],28:[function(t,e,n){function r(t,e){var n=t.getEntries();n.forEach(function(t){"first-paint"===t.name?p("timing",["fp",Math.floor(t.startTime)]):"first-contentful-paint"===t.name&&p("timing",["fcp",Math.floor(t.startTime)])})}function o(t,e){var n=t.getEntries();if(n.length>0){var r=n[n.length-1];if(c&&c<r.startTime)return;p("lcp",[r])}}function i(t){t.getEntries().forEach(function(t){t.hadRecentInput||p("cls",[t])})}function a(t){if(t instanceof v&&!g){var e=Math.round(t.timeStamp),n={type:t.type};e<=l.now()?n.fid=l.now()-e:e>l.offset&&e<=Date.now()?(e-=l.offset,n.fid=l.now()-e):e=l.now(),g=!0,p("timing",["fi",e,n])}}function s(t){"hidden"===t&&(c=l.now(),p("pageHide",[c]))}if(!("init"in NREUM&&"page_view_timing"in NREUM.init&&"enabled"in NREUM.init.page_view_timing&&NREUM.init.page_view_timing.enabled===!1)){var c,f,u,d,p=t("handle"),l=t("loader"),h=t(31),m=t(23),v=NREUM.o.EV;if("PerformanceObserver"in window&&"function"==typeof window.PerformanceObserver){f=new 
PerformanceObserver(r);try{f.observe({entryTypes:["paint"]})}catch(w){}u=new PerformanceObserver(o);try{u.observe({entryTypes:["largest-contentful-paint"]})}catch(w){}d=new PerformanceObserver(i);try{d.observe({type:"layout-shift",buffered:!0})}catch(w){}}if("addEventListener"in document){var g=!1,y=["click","keydown","mousedown","pointerdown","touchstart"];y.forEach(function(t){document.addEventListener(t,a,m(!1))})}h(s)}},{}],29:[function(t,e,n){function r(){function t(){return e?15&e[n++]:16*Math.random()|0}var e=null,n=0,r=window.crypto||window.msCrypto;r&&r.getRandomValues&&(e=r.getRandomValues(new Uint8Array(31)));for(var o,i="xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx",a="",s=0;s<i.length;s++)o=i[s],"x"===o?a+=t().toString(16):"y"===o?(o=3&t()|8,a+=o.toString(16)):a+=o;return a}function o(){return a(16)}function i(){return a(32)}function a(t){function e(){return n?15&n[r++]:16*Math.random()|0}var n=null,r=0,o=window.crypto||window.msCrypto;o&&o.getRandomValues&&Uint8Array&&(n=o.getRandomValues(new Uint8Array(31)));for(var i=[],a=0;a<t;a++)i.push(e().toString(16));return i.join("")}e.exports={generateUuid:r,generateSpanId:o,generateTraceId:i}},{}],30:[function(t,e,n){function r(t,e){if(!o)return!1;if(t!==o)return!1;if(!e)return!0;if(!i)return!1;for(var n=i.split("."),r=e.split("."),a=0;a<r.length;a++)if(r[a]!==n[a])return!1;return!0}var o=null,i=null,a=/Version\/(\S+)\s+Safari/;if(navigator.userAgent){var s=navigator.userAgent,c=s.match(a);c&&s.indexOf("Chrome")===-1&&s.indexOf("Chromium")===-1&&(o="Safari",i=c[1])}e.exports={agent:o,version:i,match:r}},{}],31:[function(t,e,n){function r(t){function e(){t(s&&document[s]?document[s]:document[i]?"hidden":"visible")}"addEventListener"in document&&a&&document.addEventListener(a,e,o(!1))}var o=t(23);e.exports=r;var i,a,s;"undefined"!=typeof document.hidden?(i="hidden",a="visibilitychange",s="visibilityState"):"undefined"!=typeof document.msHidden?(i="msHidden",a="msvisibilitychange"):"undefined"!=typeof document.webkitHidden&&(i="webkitHidden",a="webkitvisibilitychange",s="webkitVisibilityState")},{}],32:[function(t,e,n){function r(t,e){var n=[],r="",i=0;for(r in t)o.call(t,r)&&(n[i]=e(r,t[r]),i+=1);return n}var o=Object.prototype.hasOwnProperty;e.exports=r},{}],33:[function(t,e,n){function r(t,e,n){e||(e=0),"undefined"==typeof n&&(n=t?t.length:0);for(var r=-1,o=n-e||0,i=Array(o<0?0:o);++r<o;)i[r]=t[e+r];return i}e.exports=r},{}],34:[function(t,e,n){e.exports={exists:"undefined"!=typeof window.performance&&window.performance.timing&&"undefined"!=typeof window.performance.timing.navigationStart}},{}],ee:[function(t,e,n){function r(){}function o(t){function e(t){return t&&t instanceof r?t:t?f(t,c,a):a()}function n(n,r,o,i,a){if(a!==!1&&(a=!0),!l.aborted||i){t&&a&&t(n,r,o);for(var s=e(o),c=m(n),f=c.length,u=0;u<f;u++)c[u].apply(s,r);var p=d[y[n]];return p&&p.push([x,n,r,s]),s}}function i(t,e){g[t]=m(t).concat(e)}function h(t,e){var n=g[t];if(n)for(var r=0;r<n.length;r++)n[r]===e&&n.splice(r,1)}function m(t){return g[t]||[]}function v(t){return p[t]=p[t]||o(n)}function w(t,e){l.aborted||u(t,function(t,n){e=e||"feature",y[n]=e,e in d||(d[e]=[])})}var g={},y={},x={on:i,addEventListener:i,removeEventListener:h,emit:n,get:v,listeners:m,context:e,buffer:w,abort:s,aborted:!1};return x}function i(t){return f(t,c,a)}function a(){return new r}function s(){(d.api||d.feature)&&(l.aborted=!0,d=l.backlog={})}var c="nr@context",f=t("gos"),u=t(32),d={},p={},l=e.exports=o();e.exports.getOrSetContext=i,l.backlog=d},{}],gos:[function(t,e,n){function 
r(t,e,n){if(o.call(t,e))return t[e];var r=n();if(Object.defineProperty&&Object.keys)try{return Object.defineProperty(t,e,{value:r,writable:!0,enumerable:!1}),r}catch(i){}return t[e]=r,r}var o=Object.prototype.hasOwnProperty;e.exports=r},{}],handle:[function(t,e,n){function r(t,e,n,r){o.buffer([t],r),o.emit(t,e,n)}var o=t("ee").get("handle");e.exports=r,r.ee=o},{}],id:[function(t,e,n){function r(t){var e=typeof t;return!t||"object"!==e&&"function"!==e?-1:t===window?0:a(t,i,function(){return o++})}var o=1,i="nr@id",a=t("gos");e.exports=r},{}],loader:[function(t,e,n){function r(){if(!P++){var t=T.info=NREUM.info,e=v.getElementsByTagName("script")[0];if(setTimeout(f.abort,3e4),!(t&&t.licenseKey&&t.applicationID&&e))return f.abort();c(R,function(e,n){t[e]||(t[e]=n)});var n=a();s("mark",["onload",n+T.offset],null,"api"),s("timing",["load",n]);var r=v.createElement("script");0===t.agent.indexOf("http://")||0===t.agent.indexOf("https://")?r.src=t.agent:r.src=h+"://"+t.agent,e.parentNode.insertBefore(r,e)}}function o(){"complete"===v.readyState&&i()}function i(){s("mark",["domContent",a()+T.offset],null,"api")}var a=t(26),s=t("handle"),c=t(32),f=t("ee"),u=t(30),d=t(27),p=t(21),l=t(23),h=p.getConfiguration("ssl")===!1?"http":"https",m=window,v=m.document,w="addEventListener",g="attachEvent",y=m.XMLHttpRequest,x=y&&y.prototype,b=!d(m.location);NREUM.o={ST:setTimeout,SI:m.setImmediate,CT:clearTimeout,XHR:y,REQ:m.Request,EV:m.Event,PR:m.Promise,MO:m.MutationObserver};var E=""+location,R={beacon:"bam.nr-data.net",errorBeacon:"bam.nr-data.net",agent:"js-agent.newrelic.com/nr-spa-1212.min.js"},O=y&&x&&x[w]&&!/CriOS/.test(navigator.userAgent),T=e.exports={offset:a.getLastTimestamp(),now:a,origin:E,features:{},xhrWrappable:O,userAgent:u,disabled:b};if(!b){t(20),t(28),v[w]?(v[w]("DOMContentLoaded",i,l(!1)),m[w]("load",r,l(!1))):(v[g]("onreadystatechange",o),m[g]("onload",r)),s("mark",["firstbyte",a.getLastTimestamp()],null,"api");var P=0}},{}],"wrap-function":[function(t,e,n){function r(t,e){function n(e,n,r,c,f){function nrWrapper(){var i,a,u,p;try{a=this,i=d(arguments),u="function"==typeof r?r(i,a):r||{}}catch(l){o([l,"",[i,a,c],u],t)}s(n+"start",[i,a,c],u,f);try{return p=e.apply(a,i)}catch(h){throw s(n+"err",[i,a,h],u,f),h}finally{s(n+"end",[i,a,p],u,f)}}return a(e)?e:(n||(n=""),nrWrapper[p]=e,i(e,nrWrapper,t),nrWrapper)}function r(t,e,r,o,i){r||(r="");var s,c,f,u="-"===r.charAt(0);for(f=0;f<e.length;f++)c=e[f],s=t[c],a(s)||(t[c]=n(s,u?c+r:r,o,c,i))}function s(n,r,i,a){if(!h||e){var s=h;h=!0;try{t.emit(n,r,i,e,a)}catch(c){o([c,n,r,i],t)}h=s}}return t||(t=u),n.inPlace=r,n.flag=p,n}function o(t,e){e||(e=u);try{e.emit("internal-error",t)}catch(n){}}function i(t,e,n){if(Object.defineProperty&&Object.keys)try{var r=Object.keys(t);return r.forEach(function(n){Object.defineProperty(e,n,{get:function(){return t[n]},set:function(e){return t[n]=e,e}})}),e}catch(i){o([i],n)}for(var a in t)l.call(t,a)&&(e[a]=t[a]);return e}function a(t){return!(t&&t instanceof Function&&t.apply&&!t[p])}function s(t,e){var n=e(t);return n[p]=t,i(t,n,u),n}function c(t,e,n){var r=t[e];t[e]=s(r,n)}function f(){for(var t=arguments.length,e=new Array(t),n=0;n<t;++n)e[n]=arguments[n];return e}var u=t("ee"),d=t(33),p="nr@original",l=Object.prototype.hasOwnProperty,h=!1;e.exports=r,e.exports.wrapFunction=s,e.exports.wrapInPlace=c,e.exports.argsToArray=f},{}]},{},["loader",2,17,5,3,4]); ;NREUM.loader_config={accountID:"804283",trustKey:"804283",agentID:"402703674",licenseKey:"cf99e8d2a3",applicationID:"402703674"} 
class="taxonomy-header"> Browse Subject Areas <div id="subjInfo">?</div> <div id="subjInfoText"> <p>Click through the PLOS taxonomy to find articles in your field.</p> <p>For more information about PLOS Subject Areas, click <a href="https://github.com/PLOS/plos-thesaurus/blob/master/README.md" target="_blank" title="Link opens in new window">here</a>. </p> </div> </div> <div class="levels"> <div class="levels-container cf"> <div class="levels-position"></div> </div> <a href="#" class="prev"></a> <a href="#" class="next active"></a> </div> </div> <div class="taxonomy-browser-border-bottom"></div> </div> </section> <main id="main-content"> <div class="set-grid"> <header class="title-block"> <script src="/resource/js/components/signposts.js" type="text/javascript"></script> <ul id="almSignposts" class="signposts"> <li id="loadingMetrics"> <p>Loading metrics</p> </li> </ul> <script type="text/template" id="signpostsGeneralErrorTemplate"> <li id="metricsError">Article metrics are unavailable at this time. Please try again later.</li> </script> <script type="text/template" id="signpostsNewArticleErrorTemplate"> <li></li><li></li><li id="tooSoon">Article metrics are unavailable for recently published articles.</li> </script> <script type="text/template" id="signpostsTemplate"> <li id="almSaves"> <%= s.numberFormat(saveCount, 0) %> <div class="tools" data-js-tooltip-hover="trigger"> <a class="metric-term" href="/plosone/article/metrics?id=10.1371/journal.pone.0125592#savedHeader">Save</a> <p class="saves-tip" data-js-tooltip-hover="target"><a href="/plosone/article/metrics?id=10.1371/journal.pone.0125592#savedHeader">Total Mendeley and Citeulike bookmarks.</a></p> </div> </li> <li id="almCitations"> <%= s.numberFormat(citationCount, 0) %> <div class="tools" data-js-tooltip-hover="trigger"> <a class="metric-term" href="/plosone/article/metrics?id=10.1371/journal.pone.0125592#citedHeader">Citation</a> <p class="citations-tip" data-js-tooltip-hover="target"><a href="/plosone/article/metrics?id=10.1371/journal.pone.0125592#citedHeader">Paper's citation count computed by Dimensions.</a></p> </div> </li> <li id="almViews"> <%= s.numberFormat(viewCount, 0) %> <div class="tools" data-js-tooltip-hover="trigger"> <a class="metric-term" href="/plosone/article/metrics?id=10.1371/journal.pone.0125592#viewedHeader">View</a> <p class="views-tip" data-js-tooltip-hover="target"><a href="/plosone/article/metrics?id=10.1371/journal.pone.0125592#viewedHeader">PLOS views and downloads.</a></p> </div> </li> <li id="almShares"> <%= s.numberFormat(shareCount, 0) %> <div class="tools" data-js-tooltip-hover="trigger"> <a class="metric-term" href="/plosone/article/metrics?id=10.1371/journal.pone.0125592#discussedHeader">Share</a> <p class="shares-tip" data-js-tooltip-hover="target"><a href="/plosone/article/metrics?id=10.1371/journal.pone.0125592#discussedHeader">Sum of Facebook, Twitter, Reddit and Wikipedia activity.</a></p> </div> </li> </script> <div class="article-meta"> <div class="classifications"> <p class="license-short" id="licenseShort">Open Access</p> <p class="peer-reviewed" id="peerReviewed">Peer-reviewed</p> <div class="article-type" > <p class="type-article" id="artType">Research Article</p> </div> </div> </div> <div class="article-title-etc"> <div class="title-authors"> <h1 id="artTitle"><?xml version="1.0" encoding="UTF-8"?>Maximum Entropy, Word-Frequency, Chinese Characters, and Multiple Meanings</h1> <ul class="author-list clearfix" data-js-tooltip="tooltip_container" id="author-list"> <li 
data-js-tooltip="tooltip_trigger" > <a data-author-id="0" class="author-name" > Xiaoyong Yan,</a> <div id="author-meta-0" class="author-info" data-js-tooltip="tooltip_target"> <p id="authAffiliations-0"><span class="type">Affiliations</span> Systems Science Institute, Beijing Jiaotong University, Beijing 100044, China, Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 611731, China </p> <a data-js-tooltip="tooltip_close" class="close" id="tooltipClose0"> &#x02A2F; </a> </div> </li> <li data-js-tooltip="tooltip_trigger" > <a data-author-id="1" class="author-name" > Petter Minnhagen <span class="email"> </span></a> <div id="author-meta-1" class="author-info" data-js-tooltip="tooltip_target"> <p id="authCorresponding-1"> <span class="email">* E-mail:</span> <a href="mailto:Petter.Minnhagen@physics.umu.se">Petter.Minnhagen@physics.umu.se</a></p> <p id="authAffiliations-1"><span class="type">Affiliation</span> IceLab, Department of Physics, Umeå University, 901 87 Umeå, Sweden </p> <a data-js-tooltip="tooltip_close" class="close" id="tooltipClose1"> &#x02A2F; </a> </div> </li> </ul> <script src="/resource/js/components/tooltip.js" type="text/javascript"></script> </div> <div id="floatTitleTop" data-js-floater="title_author" class="float-title" role="presentation"> <div class="set-grid"> <div class="float-title-inner"> <h1><?xml version="1.0" encoding="UTF-8"?>Maximum Entropy, Word-Frequency, Chinese Characters, and Multiple Meanings</h1> <ul id="floatAuthorList" data-js-floater="floated_authors"> <li data-float-index="1">Xiaoyong Yan,&nbsp; </li> <li data-float-index="2">Petter Minnhagen </li> </ul> </div> <div class="logo-close" id="titleTopCloser"> <img src="/resource/img/logo-plos.png" style="height: 2em" alt="PLOS" /> <div class="close-floater" title="close">x</div> </div> </div> </div> <ul class="date-doi"> <li id="artPubDate">Published: May 8, 2015</li> <li id="artDoi"> <a href="https://doi.org/10.1371/journal.pone.0125592">https://doi.org/10.1371/journal.pone.0125592</a> </li> <li class="flex-spacer"></li> </ul> </div> <div> </div> </header> <section class="article-body"> <ul class="article-tabs"> <li class="tab-title active" id="tabArticle"> <a href="/plosone/article?id=10.1371/journal.pone.0125592" class="article-tab-1">Article</a> </li> <li class="tab-title " id="tabAuthors"> <a href="/plosone/article/authors?id=10.1371/journal.pone.0125592" class="article-tab-2">Authors</a> </li> <li class="tab-title " id="tabMetrics"> <a href="/plosone/article/metrics?id=10.1371/journal.pone.0125592" class="article-tab-3">Metrics</a> </li> <li class="tab-title " id="tabComments"> <a href="/plosone/article/comments?id=10.1371/journal.pone.0125592" class="article-tab-4">Comments</a> </li> <li class="tab-title" id="tabRelated"> <a class="article-tab-5" id="tabRelated-link">Media Coverage</a> <script>$(document).ready(function() { $.getMediaLink("10.1371/journal.pone.0125592").then(function (url) { $("#tabRelated-link").attr("href", url) } ) })</script> </li> </ul> <div class="article-container"> <div id="nav-article"> <ul class="nav-secondary"> <li class="nav-comments" id="nav-comments"> <a href="article/comments?id=10.1371/journal.pone.0125592">Reader Comments</a> </li> <li id="nav-figures"><a href="#" data-doi="10.1371/journal.pone.0125592">Figures</a></li> </ul> <div id="nav-data-linking" data-data-url=""> </div> </div> <script src="/resource/js/components/scroll.js" type="text/javascript"></script> <script src="/resource/js/components/nav_builder.js" 
type="text/javascript"></script> <script src="/resource/js/components/floating_nav.js" type="text/javascript"></script> <div id="figure-lightbox-container"></div> <script id="figure-lightbox-template" type="text/template"> <div id="figure-lightbox" class="reveal-modal full" data-reveal aria-hidden="true" role="dialog"> <div class="lb-header"> <h1 id="lb-title"><%= articleTitle %></h1> <div id="lb-authors"> <span>Xiaoyong Yan</span> <span>Petter Minnhagen</span> </div> <div class="lb-close" title="close">&nbsp;</div> </div> <div class="img-container"> <div class="loader"> <i class="fa-spinner"></i> </div> <img class="main-lightbox-image" src=""/> <aside id="figures-list"> <% figureList.each(function (ix, figure) { %> <div class="change-img" data-doi="<%= figure.getAttribute('data-doi') %>"> <img class="aside-figure" src="/plosone/article/figure/image?size=inline&id=<%= figure.getAttribute('data-doi') %>" /> </div> <% }) %> <div class="dummy-figure"> </div> </aside> </div> <div id="lightbox-footer"> <div id="btns-container" class="lightbox-row <% if(figureList.length <= 1) { print('one-figure-only') } %>"> <div class="fig-btns-container reset-zoom-wrapper left"> <span class="fig-btn reset-zoom-btn">Reset zoom</span> </div> <div class="zoom-slider-container"> <div class="range-slider-container"> <span id="lb-zoom-min"></span> <div class="range-slider round" data-slider data-options="start: 20; end: 200; initial: 20;"> <span class="range-slider-handle" role="slider" tabindex="0"></span> <span class="range-slider-active-segment"></span> <input type="hidden"> </div> <span id="lb-zoom-max"></span> </div> </div> <% if(figureList.length > 1) { %> <div class="fig-btns-container"> <span class="fig-btn all-fig-btn"><i class="icon icon-all"></i> All Figures</span> <span class="fig-btn next-fig-btn"><i class="icon icon-next"></i> Next</span> <span class="fig-btn prev-fig-btn"><i class="icon icon-prev"></i> Previous</span> </div> <% } %> </div> <div id="image-context"> </div> </div> </div> </script> <script id="image-context-template" type="text/template"> <div class="footer-text"> <div id="figure-description-wrapper"> <div id="view-more-wrapper" style="<% descriptionExpanded? print('display:none;') : '' %>"> <span id="figure-title"><%= title %></span> <p id="figure-description"> <%= description %>&nbsp;&nbsp; </p> <span id="view-more">show more<i class="icon-arrow-right"></i></span> </div> <div id="view-less-wrapper" style="<% descriptionExpanded? 
print('display:inline-block;') : '' %>" > <span id="figure-title"><%= title %></span> <p id="full-figure-description"> <%= description %>&nbsp;&nbsp; <span id="view-less">show less<i class="icon-arrow-left"></i></span> </p> </div> </div> </div> <div id="show-context-container"> <a class="btn show-context" href="<%= showInContext(strippedDoi) %>">Show in Context</a> </div> <div id="download-buttons"> <h3>Download:</h3> <div class="item"> <a href="/plosone/article/figure/image?size=original&download=&id=<%= doi %>" title="original image"> <span class="download-btn">TIFF</span> </a> <span class="file-size"><%= fileSizes.original %></span> </div> <div class="item"> <a href="/plosone/article/figure/image?size=large&download=&id=<%= doi %>" title="large image"> <span class="download-btn">PNG</span> </a> <span class="file-size"><%= fileSizes.large %></span> </div> <div class="item"> <a href="/plosone/article/figure/powerpoint?id=<%= doi %>" title="PowerPoint slide"> <span class="download-btn">PPT</span> </a> </div> </div> </script> <div class="article-content"> <div id="figure-carousel-section"> <h2>Figures</h2> <div id="figure-carousel"> <div class="carousel-wrapper"> <div class="slider"> <div class="carousel-item lightbox-figure" data-doi="10.1371/journal.pone.0125592.g001"> <img src="/plosone/article/figure/image?size=inline&amp;id=10.1371/journal.pone.0125592.g001" loading="lazy" alt="Fig 1" /> </div> <div class="carousel-item lightbox-figure" data-doi="10.1371/journal.pone.0125592.g002"> <img src="/plosone/article/figure/image?size=inline&amp;id=10.1371/journal.pone.0125592.g002" loading="lazy" alt="Fig 2" /> </div> <div class="carousel-item lightbox-figure" data-doi="10.1371/journal.pone.0125592.t001"> <img src="/plosone/article/figure/image?size=inline&amp;id=10.1371/journal.pone.0125592.t001" loading="lazy" alt="Table 1" /> </div> <div class="carousel-item lightbox-figure" data-doi="10.1371/journal.pone.0125592.g003"> <img src="/plosone/article/figure/image?size=inline&amp;id=10.1371/journal.pone.0125592.g003" loading="lazy" alt="Fig 3" /> </div> <div class="carousel-item lightbox-figure" data-doi="10.1371/journal.pone.0125592.g004"> <img src="/plosone/article/figure/image?size=inline&amp;id=10.1371/journal.pone.0125592.g004" loading="lazy" alt="Fig 4" /> </div> <div class="carousel-item lightbox-figure" data-doi="10.1371/journal.pone.0125592.g005"> <img src="/plosone/article/figure/image?size=inline&amp;id=10.1371/journal.pone.0125592.g005" loading="lazy" alt="Fig 5" /> </div> <div class="carousel-item lightbox-figure" data-doi="10.1371/journal.pone.0125592.g006"> <img src="/plosone/article/figure/image?size=inline&amp;id=10.1371/journal.pone.0125592.g006" loading="lazy" alt="Fig 6" /> </div> <div class="carousel-item lightbox-figure" data-doi="10.1371/journal.pone.0125592.g007"> <img src="/plosone/article/figure/image?size=inline&amp;id=10.1371/journal.pone.0125592.g007" loading="lazy" alt="Fig 7" /> </div> <div class="carousel-item lightbox-figure" data-doi="10.1371/journal.pone.0125592.t002"> <img src="/plosone/article/figure/image?size=inline&amp;id=10.1371/journal.pone.0125592.t002" loading="lazy" alt="Table 2" /> </div> <div class="carousel-item lightbox-figure" data-doi="10.1371/journal.pone.0125592.g008"> <img src="/plosone/article/figure/image?size=inline&amp;id=10.1371/journal.pone.0125592.g008" loading="lazy" alt="Fig 8" /> </div> <div class="carousel-item lightbox-figure" data-doi="10.1371/journal.pone.0125592.g009"> <img 
src="/plosone/article/figure/image?size=inline&amp;id=10.1371/journal.pone.0125592.g009" loading="lazy" alt="Fig 9" /> </div> <div class="carousel-item lightbox-figure" data-doi="10.1371/journal.pone.0125592.g010"> <img src="/plosone/article/figure/image?size=inline&amp;id=10.1371/journal.pone.0125592.g010" loading="lazy" alt="Fig 10" /> </div> </div> </div> <div class="carousel-control"> <span class="button previous"></span> <span class="button next"></span> </div> <div class="carousel-page-buttons"> </div> </div> </div> <script src="/resource/js/vendor/jquery.touchswipe.js" type="text/javascript"></script> <script src="/resource/js/components/figure_carousel.js" type="text/javascript"></script> <script src="/resource/js/vendor/jquery.dotdotdot.js" type="text/javascript"></script> <div class="article-text" id="artText"> <div xmlns:plos="http://plos.org" class="abstract toc-section abstract-type-"><a id="abstract0" name="abstract0" data-toc="abstract0" class="link-target" title="Abstract"></a><h2>Abstract</h2><div class="abstract-content"><a id="article1.front1.article-meta1.abstract1.p1" name="article1.front1.article-meta1.abstract1.p1" class="link-target"></a><p>The <em>word</em>-frequency distribution of a text written by an author is well accounted for by a maximum entropy distribution, the RGF (random group formation)-prediction. The RGF-distribution is completely determined by the <em>a priori</em> values of the total number of words in the text (<em>M</em>), the number of distinct words (<em>N</em>) and the number of repetitions of the most common word (<em>k<sub>max</sub></em>). It is here shown that this maximum entropy prediction also describes a text written in <em>Chinese characters</em>. In particular it is shown that although the same Chinese text written in <em>words</em> and <em>Chinese characters</em> have quite differently shaped distributions, they are nevertheless <em>both</em> well predicted by their respective three <em>a priori</em> characteristic values. It is pointed out that this is analogous to the change in the shape of the distribution when translating a given text to another language. Another consequence of the RGF-prediction is that taking a part of a long text will change the input parameters (<em>M, N, k<sub>max</sub></em>) and consequently also the shape of the frequency distribution. This is explicitly confirmed for texts written in Chinese characters. Since the RGF-prediction has no system-specific information beyond the three <em>a priori</em> values (<em>M, N, k<sub>max</sub></em>), any specific language characteristic has to be sought in systematic deviations from the RGF-prediction and the measured frequencies. One such systematic deviation is identified and, through a statistical information theoretical argument and an extended RGF-model, it is proposed that this deviation is caused by multiple meanings of Chinese characters. The effect is stronger for Chinese characters than for Chinese words. The relation between Zipf’s law, the Simon-model for texts and the present results are discussed.</p> </div></div> <div xmlns:plos="http://plos.org" class="articleinfo"><p><strong>Citation: </strong>Yan X, Minnhagen P (2015) Maximum Entropy, Word-Frequency, Chinese Characters, and Multiple Meanings. PLoS ONE 10(5): e0125592. 
Academic Editor: Xuchu Weng, Zhejiang Key Laboratory for Research in Assessment of Cognitive Impairments, CHINA

Received: November 13, 2014; Accepted: March 16, 2015; Published: May 8, 2015

Copyright: © 2015 Yan, Minnhagen. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: Supported by the National Natural Science Foundation of China (http://www.nsfc.gov.cn/) under grant Nos. 61304177 and 11105024. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The scientific interest in the information content hidden in the frequency statistics of words and letters in a text goes at least back to Islamic scholars in the ninth century. The first practical application of these early endeavors seems to have been the use of frequency statistics of letters to decipher cryptic messages [1]. The more specific question of what linguistic information is hidden in the shape of the word-frequency distribution stems from the first part of the twentieth century, when it was discovered that the word-frequency distribution of a text typically has a broad "fat-tailed" shape, which can often be well approximated by a power law over a large range [2–5]. This led to the empirical concept of Zipf's law, which states that the probability that a word occurs k times in a text, P(k), is proportional to 1/k² [3–5]. The question is then what principle or property of a language causes this power-law distribution of word-frequencies, and this is still an active area of research [6–10]. In the middle of the twentieth century, Simon [11] instead suggested that since quite a few completely different systems also seemed to follow Zipf's law in their corresponding frequency distributions, the explanation of the law must be more general and stochastic in nature, and hence independent of any specific information about the language itself. Instead he proposed a random stochastic growth model for a book written one word at a time from beginning to end.
This became a very influential model and has served as a starting point for many later works [12–18]. However, it was recently pointed out that the Simon-model has a fundamental flaw: the rare words in the text are more often found in the later part of the text, whereas a real text is to a very good approximation translationally invariant: the first half of a real text has, provided it is written by the same author, the same word-frequency distribution as the second [23, 24]. So, although the Simon-model is very general and contains a stochastic element, it is still history dependent and, in this sense, it leads to a less random frequency distribution than a real text. An extreme random model was proposed in the middle of the twentieth century by Miller [25]: the resulting text can be described as being produced by a monkey randomly typing away on a typewriter. The monkey book is definitely translationally invariant, but its properties are quite unrealistic and different from a real text [26].

The RGF (random group formation)-model, which is the basis for the present analysis, can be seen as a next step along Simon's suggestion of system-independence [27]. Instead of introducing randomness through a stochastic growth model, RGF introduces randomness directly from the maximum entropy principle [27]. An important point of the RGF-theory is that it is predictive: if the only knowledge of the text is M (the total number of words), N (the number of distinct words), and k_max (the number of repetitions of the most common word), then RGF provides a complete prediction of the probability distribution P(k). This prediction includes the functional form, which embraces Gaussian-like, exponential-like and power-law-like shapes; the form is determined by the sole knowledge of (M, N, k_max). A crucial point is that, if the maximum entropy principle, through RGF, gives a very good description of the data, then this implies that the values (M, N, k_max) incorporate all the information contained in the distribution P(k), which makes the prediction neutral and void of more specific characteristic features. More specific text information is, from this viewpoint, associated with systematic deviations from the RGF-prediction.
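To make the three a priori values concrete, the following minimal Python sketch (illustrative only; the function name, variable names and toy text are ours, not part of the analysis in [27]) extracts the triple (M, N, k_max) from a tokenized text:

    from collections import Counter

    def state_variables(tokens):
        """Return (M, N, k_max): total tokens, distinct tokens,
        and the number of repetitions of the most common token."""
        counts = Counter(tokens)
        M = sum(counts.values())      # total number of words/characters
        N = len(counts)               # number of distinct words/characters
        k_max = max(counts.values())  # occurrences of the most common one
        return M, N, k_max

    # Toy example; for a real novel, `tokens` would be the list of words
    # (or of individual Chinese characters) of the text.
    print(state_variables("the cat saw the dog and the cat ran".split()))

For a Chinese text the tokens would be either the individual characters or the segmented words, as discussed below.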
Texts sometimes deviate significantly from the empirical Zipf's law, and a substantial body of work has been devoted to explaining such deviations. These explanations usually involve text- and language-specific features. However, from the RGF point of view such explanations appear rather redundant and arbitrary whenever the RGF-prediction agrees with the data. This point of view has been further elucidated in [28] for the case of species divided into taxa in biology.

In a recent paper by L. Lü et al. [18] it was pointed out that the character-frequency distribution for a text written in Chinese characters differs significantly from Zipf's law, as had also been noticed earlier [19–22]. Chinese characters carry specific meanings. For example, 'huí' and 'jiā' are two Chinese characters carrying the elementary meanings of "return" and "home", respectively. In general a Chinese character can also carry multiple meanings, where the relevant meaning has to be deduced from the context. A Chinese word corresponds to one, two or more characters, e.g. the two characters 'huí' and 'jiā' can be combined into the Chinese word 'huí jiā', denoting the concept of "returning home". Thus both Chinese characters and Chinese words carry meanings, which can be single or multiple. Roughly, a word in Chinese corresponds to about 1.5 characters on average, and typically more than 90% of the words in a novel are written with one or two characters: about 50% of the words are written with one character and 40% with two. The remaining ones are made up of more than two Chinese characters.

The Chinese character-frequency distribution is illustrated in Fig 1. The straight line in the figure is the Zipf's law expectation. From a Zipf's law perspective one might then be tempted to conclude that the deviations between the data and Zipf's law have something to do specifically with the Chinese language or the representation in terms of Chinese characters, or perhaps a bit of both. However, the dashed curve in the figure is the RGF-prediction.
This prediction is very close to the data, which suggests that beyond the three characteristic numbers (M, N, k_max) [the total number of Chinese characters, the number of distinct characters, and the number of repetitions of the most common character] there is no specifically Chinese feature that can be extracted from the data.

Fig 1. Frequency of Chinese characters for the novel A Q Zheng Zhuan by Xun Lu and comparison with the RGF-prediction and the Zipf's law expectation. (a) Compares the probability, P(k), for a character to appear k times in the text: crosses are raw data, filled dots are the log2-binned data, the straight line is the Zipf's law expectation, and the dashed curve is the RGF-prediction. RGF predicts the dashed curve directly from the three values (M, N, k_max) (see Table 1 for the input values and the corresponding predicted output values from RGF). (b) The same features in terms of the cumulative distribution C(k) = ∑_{k′ ≥ k} P(k′): filled triangles are the data, the straight line the Zipf's law expectation and the dashed curve the RGF-prediction. RGF gives a very good ab initio description of the data, which differs substantially from the Zipf's law expectation. (Note that the RGF-prediction is based solely on the raw data and predicts both the binned data in (a) and the cumulative data in (b).)
https://doi.org/10.1371/journal.pone.0125592.g001
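The quantities plotted in Fig 1 can be obtained from the raw counts in a few lines; the sketch below is illustrative (not the authors' code, and the binning convention is one common choice). It computes P(k) from the number of distinct entities occurring exactly k times, a log2-binned version of P(k), and the cumulative distribution C(k). Note that for a pure Zipf's law, P(k) ∝ 1/k², the cumulative distribution falls off as C(k) ∝ 1/k, which is why the Zipf's law expectation appears as a line of slope −1 in Fig 1(b).

    import math
    from collections import Counter

    def pk_and_ck(counts):
        """counts: dict {entity: number of occurrences}, e.g. a Counter.
        Returns P(k), the fraction of distinct entities occurring exactly k
        times, and C(k) = sum over k' >= k of P(k')."""
        N = len(counts)
        nk = Counter(counts.values())          # N(k): entities occurring k times
        ks = sorted(nk)
        P = {k: nk[k] / N for k in ks}
        C, running = {}, 0.0
        for k in reversed(ks):                 # cumulate from large k downwards
            running += P[k]
            C[k] = running
        return P, C

    def log2_binned(P):
        """Average P(k) over bins [2^j, 2^(j+1)), one common convention for
        the filled dots of Fig 1(a); returns {bin centre: density per unit k}."""
        binned = {}
        for j in range(32):
            lo, hi = 2 ** j, 2 ** (j + 1)
            mass = sum(p for k, p in P.items() if lo <= k < hi)
            if mass > 0:
                binned[math.sqrt(lo * hi)] = mass / (hi - lo)
        return binned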
A crucial point for reaching our conclusions in the present paper is the distinction between a predictive model like RGF and conventional curve-fitting. This can be illustrated by Fig 1(b): if your aim is to fit the lowest k data points in Fig 1(b) (e.g. k = 1 to 10) with an ad hoc two-parameter curve, you can obviously do slightly better than the dashed curve in Fig 1(b). However, the dashed curve is a prediction based solely on the knowledge of the right-most point in Fig 1(b) (k_max = 747) and the average number of times a character is used (M/N = 11.5). RGF predicts where the data points in the interval k = 1–10 in Fig 1(b) should fall without any explicit a priori knowledge of their whereabouts and with very little knowledge of anything else. This is the crucial difference between a prediction from a model and a fitting procedure, and this difference carries over into the different conclusions which can be drawn from the two procedures. Another illustration is the fact that although the data in Fig 1(b) cannot be described by a Zipf's law line with slope −1, such a line can be fitted to the data over a narrow range somewhere in the middle. Such an ad hoc fitting has no predictive value.

Specific information about the system may be reflected in deviations from the RGF-prediction [28]. One such possible deviation is discussed below. It is also suggested that the cause of this deviation is multiple meanings of Chinese characters. A statistical-information-based argument for this conclusion is presented together with an extended RGF-model.

The Methods section starts with a brief recapitulation of the RGF-theory, as well as the Random Book Transformation, which allows for the analysis of sub-parts of the novels. Both these methods are used as starting points when analyzing the frequency distributions of two Chinese novels. The Chinese character-frequency distributions are compared to the corresponding word-frequency distributions for both novels, as well as for parts of the novels. The results from these comparisons lead to an information-theoretic treatment which makes it possible to approximately include the multiple meanings of Chinese characters. It is pointed out that the existence of words with multiple meanings is not a characteristic specific to Chinese, but a general feature of languages. The frequency distribution of the elementary entities of a written language (words or characters) is therefore influenced by the distribution of meanings over these entities in a characteristic way.
Conclusions are discussed in the last section.

Methods

Random Group Formation

Random group formation describes the general situation in which M objects are randomly grouped together into N groups [27]. The simplest case is when the objects are denumerable. Then, if you know M and N, the most likely distribution of group sizes, N(k) (the number of groups containing k objects), can be obtained by minimizing the information average I[N(k)] = N⁻¹ ∑ N(k) ln(kN(k)) with respect to the functional form of N(k), subject to the two constraints N⁻¹ ∑ N(k) k = ⟨k⟩ = M/N and ∑ N(k) = N. Note that the information needed to localize an object in one of the groups of size k is log₂(kN(k)) in bits and ln(kN(k)) in nats. Minimizing the average information I[N(k)] is equivalent to maximizing the entropy [27]. Thus RGF is a way to apply the maximum entropy principle to this particular class of problems. The result in the simplest case is the prediction N(k) = A exp(−bk)/k [27]. However, in more general cases there may be many additional constraints, and in addition the objects may not lend themselves to a simple denumeration. The point is that in many applications you do know that there must be additional constraints relative to the simplest case, but you have no idea what they might be. The RGF-idea is then based on the observation that any deviation from the simplest case will be reflected in a change of the entropy S[N(k)] = −∑_k N(k)/N ln(kN(k)/N). This can be taken into account by incorporating the actual value of the entropy S as an additional constraint when minimizing I[N(k)]. The resulting more general prediction then becomes N(k) = A exp(−bk)/k^γ [27]. Thus RGF transforms the three values (M, N, S) into a complete prediction of the group-size distribution.
This also means that the form of the distribution is determined by the values (M, N, S) and includes a Gaussian limit (when γ = (M/N)b and (M/N)²/γ is small), an exponential (when γ = 0), a power law (when b = 0) and anything in between.

In comparison with earlier work, one may note that the functional form P(k) = A exp(−bk)/k^γ has been used before when parameterizing distributions, as described e.g. by Clauset et al. [29], and that such a functional form can be obtained from a maximum entropy principle, as described e.g. by Visser [30]. The difference in our approach is the connection to minimal information, which opens up the predictive part of RGF. As emphasized in the Introduction, it is this predictive aspect which is crucial in the present approach and which lends itself to the generalization of including multiple meanings of characters.

The RGF-distribution was in [27, 28, 31, 32] shown to apply to a variety of systems, such as words in texts, population in counties, family names, distribution of richness, distribution of species into taxa, node sizes in metabolic networks, etc. In the case of words, N is the number of different words, M is the total number of words, and N(k) is the number of different words which appear k times in the text. In English the largest group consists of the word "the", and its occurrence in a text written by an author is statistically very well defined: it is typically about 4% of the total number of words [24, 27]. As a consequence one may replace the three values (M, N, S) by the three values (M, N, k_max). Both choices completely determine the parameters (A, b, γ) in the RGF-prediction. However, the latter choice has the practical advantage that k_max, the number of repetitions of the most common word, is more directly accessible and statistically very well defined. For example, if k_max is close to the average ⟨k⟩ = M/N, such that (k_max − ⟨k⟩)/⟨k⟩ ≪ 1, then the RGF-prediction approaches a Gaussian, which comes as no surprise because a Gaussian is just the outcome of the maximum entropy principle for such a narrow distribution [27].
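How the triple (M, N, k_max) is converted into the parameters (A, b, γ) is specified in [27]; the sketch below is one plausible numerical closure rather than the exact procedure of that reference. It imposes normalization, the mean ⟨k⟩ = M/N, and the additional assumption N·P(k_max) = 1 (on average one distinct entity at the observed maximum frequency), and solves the resulting two equations for (b, γ):

    import numpy as np
    from scipy.optimize import fsolve

    def rgf_parameters(M, N, k_max):
        """Determine P(k) = A * exp(-b*k) / k**gamma on k = 1..k_max.
        Closure used here (an assumption, see text): sum_k P(k) = 1,
        sum_k k*P(k) = M/N, and N*P(k_max) = 1."""
        k = np.arange(1, k_max + 1, dtype=float)

        def residuals(x):
            b, gamma = x
            w = np.exp(-b * k) / k ** gamma
            A = 1.0 / w.sum()                  # normalization
            mean_k = A * (k * w).sum()
            return [mean_k - M / N,            # reproduce <k> = M/N
                    N * A * w[-1] - 1.0]       # one entity expected at k_max

        b, gamma = fsolve(residuals, x0=[1.0 / k_max, 1.0])
        A = 1.0 / (np.exp(-b * k) / k ** gamma).sum()
        return A, b, gamma

For instance, one could call rgf_parameters(4061, 1721, 231) with the triple quoted for the Russian text in Fig 3(a); for texts with very large k_max the starting guess may need adjustment.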
Random Book Transformation

In general, the distribution for a system which falls into the RGF-class has a shape which depends on M. Since M for a text is the total number of words, this means that the frequency distribution is text-length dependent. The reason is that if you start from a text characterized by (M, N, k_max), then the corresponding half of the text is characterized by (M_1/2, N_1/2, k_max,1/2). Here M_1/2 = M/2 by definition and k_max,1/2 = k_max/2, because the most common word is to good approximation equally distributed within the text, but N_1/2 is non-trivial. In the present investigation we need a method to separate changes in the frequency distribution due to multiple meanings from changes due to the size of the text. For this purpose we use the Random Book Transformation (RBT) discussed in [27], where it was shown that the text-length dependence of the average N, when taking a part of a given text, is to good approximation a neutral feature: it is to good approximation the same as when you randomly delete the corresponding amount of words from the text. The process of changing the length of a text by randomly deleting words is a simple statistical process which transforms the probability distribution P_M(k) = N(k)/N for the full text into P_{M/n}(k) for the n-th part of the text by

P_{M/n}(k′) = B ∑_k A_{k′k} P_M(k),   (1)

where P_{M/n} and P_M can be viewed as column vectors with components P_{M/n}(k′) and P_M(k). The transformation matrix A_{k′k} is given by

A_{k′k} = C_k^{k′} (1/n)^{k′} (1 − 1/n)^{k−k′},   (2)

where C_k^{k′} is the binomial coefficient.
B is given by the normalization condition

∑_{k′ ≥ 1} P_{M/n}(k′) = 1.   (3)

As shown in the next section, this simple random book transformation also applies, to good approximation, to text written in Chinese characters.
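Eqs (1)–(3) translate directly into a few lines of code. The sketch below is ours (the index convention, with position i of the array corresponding to k = i + 1, is an assumption of the illustration): it builds the binomial matrix A_{k′k}, applies it to P_M(k), and fixes B by the normalization of Eq (3). Entities that drop to zero occurrences in the shorter text are excluded automatically because the rows start at k′ = 1.

    import numpy as np
    from scipy.stats import binom

    def random_book_transform(P_M, n):
        """Transform P_M (array with P_M[i] = probability of occurring i+1
        times in the full text) into P_{M/n} for a 1/n-th part of the text:
        each of the k occurrences of an entity survives the random deletion
        with probability 1/n."""
        P_M = np.asarray(P_M, dtype=float)
        k = np.arange(1, len(P_M) + 1)

        # A[k'-1, k-1] = C_k^{k'} (1/n)^{k'} (1 - 1/n)^{k - k'}, Eq (2)
        A = binom.pmf(k[:, None], k[None, :], 1.0 / n)

        unnormalized = A @ P_M            # Eq (1) before fixing B
        B = 1.0 / unnormalized.sum()      # Eq (3): sum over k' >= 1 equals 1
        return B * unnormalized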
Results

RGF and size transformation for Chinese texts

Fig 2 shows that the data for the novel A Q Zheng Zhuan are well described by the neutral-model prediction provided by RGF. This implies that the frequency distribution of both words and characters is to a large extent directly determined by the "state"-variable triple (M, N, k_max). At first sight this might appear surprising, because the development of a spoken language and its written counterpart is a long and intricate process. However, in statistical physics this type of emergent simple property of a complex system is well established. A well-known example is the ideal gas law P = NT/V, which predicts the pressure P that an ideal gas inside a closed container exerts on the walls from the three "state"-variables (N, V, T), where N is the number of gas particles, V is the volume of the container and T is the absolute temperature of the gas. Yet each gas particle follows its own deterministic trajectory, including collisions with other particles and the walls. Since the number of particles is enormous, it is in practice impossible to predict the outcome by deterministically following what happens in time to all the particles. The emergence of the simple ideal gas law stems from the fact that, with an enormous number of possibilities, the actual one is very likely to be close to the most likely outcome, assuming that all possibilities are equally likely. The basis for the maximum entropy principle in the present context is precisely the assumption that all distinct possibilities are equally likely.

Fig 2. Comparison between Chinese texts in characters and words. (a) Comparison between characters and words for the novel A Q Zheng Zhuan by Xun Lu together with the respective RGF-predictions. (b) The same comparison for the novel Ping Fan De Shi Jie by Yao Lu. Filled dots correspond to the binned data for Chinese characters and filled triangles to the data for words. Full and dashed curves correspond to the respective RGF-predictions, and dotted straight lines are the Zipf's law expectations for the word-frequency distribution. The respective "state"-variables (M, N, k_max) and the corresponding RGF-predictions are given in Table 1. Note that the translation between words and characters is a deterministic process. Yet the "state"-variables (M, N, k_max) suffice to predict the change in frequency distribution caused by the translation between words and characters.
https://doi.org/10.1371/journal.pone.0125592.g002

A crucial point is that, provided RGF does give a good description of the data, it is the deviations between the data and the RGF-prediction which may carry interesting system-specific information. From this perspective Zipf's law is just an approximation of the RGF, i.e. the straight line in Fig 1 should be regarded as an approximation of the dashed curve.
It follows that the deviation between Zipf's law and the data does not reflect any characteristic property of the underlying system [27].

Following this line of argument, it is essential to establish just how well the RGF does describe the data. Fig 2(a) gives such a quality test: if all that matters is the "state"-variables (M, N, k_max), then one could equally well translate the same novel from Chinese characters to words. As seen in Fig 2(a), the word-frequency distribution for the novel A Q Zheng Zhuan is completely different from the character-frequency distribution, and the "state"-variables are also totally different (see Table 1 for the "state"-variables and RGF-prediction values). Yet according to RGF the change in shape depends only on the values of the "state"-variables and not on whether they relate to characters or words. As seen from Fig 2(a), RGF does indeed give a very good description in both cases.

Table 1. Data and RGF-predictions. Two Chinese novels are used as empirical data, i.e. A Q Zheng Zhuan (AQ for short) written by Xun Lu and Ping Fan De Shi Jie (PF for short) by Yao Lu. For each book we first remove punctuation marks and numbers from the texts, then count the Chinese characters one by one and finally obtain the character-frequency results. In the Chinese language the words are not separated by spaces, so we use a word segmenter, Jieba (https://github.com/fxsjy/jieba), to extract words from the Chinese texts. The RGF-prediction is given in the form P(k) = A′ exp(−bk)/k^γ. This means that the RGF-theory transforms the data-triple (M, N, k_max) into the prediction triple (γ, b, A′).
https://doi.org/10.1371/journal.pone.0125592.t001
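A minimal sketch of this data preparation is given below. The file name is a placeholder, only jieba's basic segmentation call (jieba.lcut) is used, and the cleaning step simply keeps CJK characters, which is a simplification of removing punctuation marks and numbers:

    import re
    from collections import Counter

    import jieba  # https://github.com/fxsjy/jieba

    def clean(text):
        """Keep only CJK characters, i.e. drop punctuation, digits,
        whitespace and any Latin text (a simplification)."""
        return re.sub(r"[^\u4e00-\u9fff]", "", text)

    def character_counts(text):
        return Counter(clean(text))           # one entry per distinct character

    def word_counts(text):
        return Counter(w for w in jieba.lcut(clean(text)) if w)

    # text = open("aq_zheng_zhuan.txt", encoding="utf-8").read()  # placeholder
    # chars, words = character_counts(text), word_counts(text)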
<p>The translation of <em>A Q Zheng Zhuan</em> from characters to words is in itself an example of a deterministic process. Yet, as illustrated in <a href="#pone-0125592-g002">Fig 2(a)</a>, it is a complicated process in the sense that the resulting word-frequency distribution can, through RGF, be obtained to very good approximation without any knowledge of the actual deterministic translation process! This can again be viewed as a case where complexity results in simplicity.</p> <p><a href="#pone-0125592-g002">Fig 2(b)</a> gives a second example for a longer novel, <em>Ping Fan De Shi Jie</em> by Lu Yao (about 40 times as many characters as <em>A Q Zheng Zhuan</em>, see <a href="#pone-0125592-t001">Table 1</a>). In this case the word-frequency distribution is very well accounted for by RGF. Note that in this particular case the Zipf’s-law prediction agrees very well with both the RGF-prediction and the data (Zipf’s law is a straight line with slope -2 in Fig <a href="#pone-0125592-g002">2(a)</a> and <a href="#pone-0125592-g002">2(b)</a>). RGF also provides a reasonable approximation of the character-frequency distribution, whereas Zipf’s law fails completely in this case. This is consistent with the interpretation that Zipf’s law is just an approximation of RGF; an approximation which sometimes works and sometimes does not. However, as will be argued below, the discernible deviation between RGF and the data may reflect some specific linguistic feature.</p> <p>As shown above, the shape of the frequency curve for a given text changes when translating between characters and words, and this change is well accounted for by the RGF and the corresponding change in “state”-variables. This is quite similar to the change of shape when, more generally, translating a novel into different languages. This analogy is demonstrated on the basis of the Russian short story <em>The Man in a Case</em> by A. Chekhov and its translations into English words and Chinese characters. As shown in <a href="#pone-0125592-g003">Fig 3(a)</a>, the respective RGF-predictions match the corresponding frequency distributions very well. The same is true for the English novel <em>The Old Man and the Sea</em> by E. Hemingway (compare <a href="#pone-0125592-g003">Fig 3(b)</a>). These findings confirm that the information contained in the triple (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>) is sufficient to describe the frequency distribution of the fundamental entities of a written language, independently of whether these are words or Chinese characters and irrespective of the underlying language.</p>
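<p>Table 1 states that the RGF-theory turns the data-triple (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>) into the prediction triple (<em>γ</em>, <em>b</em>, <em>A</em>′), and the caption of <a href="#pone-0125592-g003">Fig 3</a> below quotes such triples explicitly. The fitting procedure itself is given in <strong>Methods</strong> and [<a href="#pone.0125592.ref027" class="ref-tip">27</a>]; the sketch below is only one plausible numerical reading of it, imposing normalisation, the mean constraint ⟨<em>k</em>⟩ = <em>M</em>/<em>N</em>, and, as our assumption for the third condition, that about one word is expected at or above <em>k</em><sub><em>max</em></sub>.</p>
<pre><code>import numpy as np
from scipy.optimize import fsolve

def rgf_prediction(M, N, kmax, k_cap=None):
    """Sketch: map the 'state'-triple (M, N, kmax) onto (gamma, b, A') for
    P(k) = A' * exp(-b*k) / k**gamma, assuming the constraints
    sum_k P(k) = 1,  sum_k k*P(k) = M/N,  N * sum_{k >= kmax} P(k) = 1."""
    k = np.arange(1, (k_cap or 10 * kmax) + 1, dtype=float)

    def pk(gamma, b):
        logw = -b * k - gamma * np.log(k)
        w = np.exp(logw - logw.max())          # stable; normalisation fixes the scale
        return w / w.sum()

    def residuals(params):
        gamma, b = params
        P = pk(gamma, b)
        return [(k * P).sum() - M / N,         # mean occurrences per distinct word
                N * P[k >= kmax].sum() - 1.0]  # roughly one word at or above kmax

    gamma, b = fsolve(residuals, x0=[1.8, 1.0 / kmax])
    A_prime = pk(gamma, b)[0] * np.exp(b)      # since P(1) = A' * exp(-b)
    return gamma, b, A_prime

# e.g. the English-words triple quoted for Fig 3(a):  rgf_prediction(5375, 1317, 256)
</code></pre>
<p>Whether this reproduces the published (<em>γ</em>, <em>b</em>, <em>A</em>′) values depends on how closely the assumed third constraint matches the authors’ implementation in <strong>Methods</strong>.</p>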
<a class="link-target" id="pone-0125592-g003" name="pone-0125592-g003"></a><div class="figure" data-doi="10.1371/journal.pone.0125592.g003"><div class="figcaption"><span>Fig 3. </span>Similarity of translation between words and characters versus words of different languages.</div><p class="caption_target">(a) The Russian short story <em>The Man in a Case</em> by A. Chekhov, and its translations into English words and Chinese characters (triangles, squares, and filled dots, respectively). The RGF-predictions are given by the curves (dash-dotted, dashed, and full, respectively). The RGF-prediction completely characterizes a frequency distribution in terms of the total number of words/characters (<em>M</em>), the number of distinct words/characters (<em>N</em>), and the fraction of the total number of words/characters accounted for by the most common word/character (<em>k</em><sub><em>max</em></sub>/<em>M</em>). Each such triple (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>) gives a unique prediction-curve [(<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>) = (4061, 1721, 231), (5375, 1317, 256), and (8212, 1150, 312), respectively]. The agreement shows that words and characters are entirely analogous with respect to frequency distributions. (b) illustrates the same thing starting from the English novel <em>The Old Man and the Sea</em> and translating into Russian words and Chinese characters.
The triples are this time (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>) = (22414, 5378, 988), (23894, 2388, 2091), and (34220, 1685, 1289), in the order Russian words, English words, and Chinese characters (data points and RGF-curves as in (a)).</p><p class="caption_object"><a href="https://doi.org/10.1371/journal.pone.0125592.g003">https://doi.org/10.1371/journal.pone.0125592.g003</a></p></div><p>In order to gain further insight into what causes the difference between the word-frequency and character-frequency distributions of a text written in Chinese, one can compare text-parts of different lengths from a given novel. As described in [<a href="#pone.0125592.ref024" class="ref-tip">24</a>], text-parts of different length of a novel have different frequency distributions. For example, if you start from <em>A Q Zheng Zhuan</em> and take a 10<sup><em>th</em></sup> part, the shape changes, as shown in <a href="#pone-0125592-g004">Fig 4(a)</a>. According to RGF, this new shape should now, to good approximation, be directly predicted from the new “state” (<em>M</em>/10, <em>N</em>′, <em>k</em>′<sub><em>max</em></sub>) (see <a href="#pone-0125592-t001">Table 1</a> for the precise values). As seen in <a href="#pone-0125592-g004">Fig 4(b)</a>, this is indeed the case to good approximation. As explained in <strong>Methods</strong>, and as can be verified from <a href="#pone-0125592-t001">Table 1</a>, <em>k</em>′<sub><em>max</em></sub> ≈ <em>k</em><sub><em>max</em></sub>/10. One may then ask if the transformation from <em>N</em> to <em>N</em>′ involves some system-specific feature. In order to check this, one can compare the process of taking an <em>n</em><sup><em>th</em></sup> part of a text with the process of randomly deleting characters until only an <em>n</em><sup><em>th</em></sup> part of them remains. This latter process is a trivial statistical transformation described in <strong>Methods</strong> under the name RBT (Random-Book-Transformation). <a href="#pone-0125592-g004">Fig 4(b)</a> also shows the predicted frequency distribution obtained from the “state”-variable triple (<em>M</em>′, <em>N</em>′, <em>k</em>′<sub><em>max</em></sub>) <em>derived</em> from RBT and used as input in RGF. (The actual RBT-derived value for <em>N</em>′ is given in <a href="#pone-0125592-t001">Table 1</a>.) The close agreement signals that the change of shape due to a reduction in text length is, to a large extent, a general, system-independent feature. <a href="#pone-0125592-g004">Fig 4(c)</a> shows the change of the frequency distribution when taking parts of the longer novel <em>Ping Fan De Shi Jie</em> written in characters, and <a href="#pone-0125592-g004">Fig 4(d)</a> compares the parts with the RGF-prediction, as well as with the combined RGF+RBT-prediction. The conclusion is that the change of shape carries very little system-specific information.</p>
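<p>A minimal sketch of the random-deletion comparison just described (the exact RBT formulation lives in <strong>Methods</strong> and is not reproduced here; below it is simply read as “randomly keep a tenth of the tokens”), reusing the <code>state_variables</code> helper from the earlier sketch:</p>
<pre><code>import random
from collections import Counter

def random_part(tokens, fraction=0.1, seed=0):
    """Randomly keep a given fraction of the tokens (characters or words),
    mimicking the random-deletion process that RBT describes statistically."""
    rng = random.Random(seed)
    kept = rng.sample(tokens, int(len(tokens) * fraction))
    return Counter(kept)

def contiguous_part(tokens, fraction=0.1, start=0):
    """A contiguous nth part of the text, as when simply reading a tenth of the novel."""
    n = int(len(tokens) * fraction)
    return Counter(tokens[start:start + n])

# With tokens = the list of Chinese characters of the full novel, comparing
# state_variables(random_part(tokens)) with state_variables(contiguous_part(tokens))
# tests how system-independent the change of (M', N', k'max) under shortening is.
</code></pre>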
<a href="#pone-0125592-g004">Fig 4(c)</a> shows the change of the frequency-distribution, when taking parts of the longer novel <em>Ping Fan De Shi Jie</em> written in characters and <a href="#pone-0125592-g004">Fig 4(d)</a> compares the parts with the RGF-prediction, as well as with the combined RGF+RBT-prediction. The conclusion is that the change of shape carries very little system specific information.</p> <a class="link-target" id="pone-0125592-g004" name="pone-0125592-g004"></a><div class="figure" data-doi="10.1371/journal.pone.0125592.g004"><div class="img-box"><a title="Click for larger image" href="article/figure/image?size=medium&amp;id=10.1371/journal.pone.0125592.g004" data-doi="10.1371/journal.pone.0125592" data-uri="10.1371/journal.pone.0125592.g004"><img src="article/figure/image?size=inline&amp;id=10.1371/journal.pone.0125592.g004" alt="thumbnail" class="thumbnail" loading="lazy"></a><div class="expand"></div></div><div class="figure-inline-download"> Download: <ul><li><a href="article/figure/powerpoint?id=10.1371/journal.pone.0125592.g004"><div class="definition-label">PPT</div><div class="definition-description">PowerPoint slide</div></a></li><li><a href="article/figure/image?download&amp;size=large&amp;id=10.1371/journal.pone.0125592.g004"><div class="definition-label">PNG</div><div class="definition-description">larger image</div></a></li><li><a href="article/figure/image?download&amp;size=original&amp;id=10.1371/journal.pone.0125592.g004"><div class="definition-label">TIFF</div><div class="definition-description">original image</div></a></li></ul></div><div class="figcaption"><span>Fig 4. </span> Size dependence of novels written in Chinese characters.</div><p class="caption_target"><a id="article1.body1.sec3.sec1.fig3.caption1.p1" name="article1.body1.sec3.sec1.fig3.caption1.p1" class="link-target"></a><p>The same two novels as in <a href="#pone-0125592-g002">Fig 2</a> are divided into parts. The frequency distribution of a full novel is compared to the one of a part. (a) <em>P</em>(<em>k</em>) for <em>A Q Zheng Zhuan</em> (filled dots) is compared to the distribution for a typical 10<sup><em>th</em></sup>-part (filled triangles). Here the word <em>typical</em> means an average distribution obtained by taking many different 10<sup><em>th</em></sup> with different starting points. These two functions have quite different shapes. However, the shapes of both are equally well predicted by RGF (curves with dashed and full lines). (b) The distribution of the 10<sup><em>th</em></sup>-part, which can to very good approximation be trivially obtained from the full book by just <em>randomly</em> removing 90% of the words from the full book. This corresponds to the dashed curve which is almost identical to the RGF-prediction and both correspond very well to the data. (c-d) The same features for the novel <em>Ping Fan De Shi Jie</em>. Note that the 10<sup><em>th</em></sup>-part agrees better with RGF than the full novel.</p> </p><p class="caption_object"><a href="https://doi.org/10.1371/journal.pone.0125592.g004"> https://doi.org/10.1371/journal.pone.0125592.g004</a></p></div><a id="article1.body1.sec3.sec1.p8" name="article1.body1.sec3.sec1.p8" class="link-target"></a><p>By comparing Fig <a href="#pone-0125592-g002">2(a)</a> and <a href="#pone-0125592-g002">2(b)</a>, one notices that whereas RGF gives a very good account of the shorter novel <em>A Q Zheng Zhuan</em>, there appears to be some deviation for the longer novel <em>Ping Fan De Shi Jie</em>. 
In <a href="#pone-0125592-g005">Fig 5(a)</a> we compare a 40<sup><em>th</em></sup> part of <em>Ping Fan De Shi Jie</em> with the full length of <em>A Q Zheng Zhuan</em>. As seen from <a href="#pone-0125592-g005">Fig 5(a)</a> the two texts have very closely the same character-frequency distribution. From the point of view of RGF, it would mean that the “state”-variables (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>) are closely the same. This is indeed the case, as seen in <a href="#pone-0125592-t001">Table 1</a> and from the direct comparison with RGF in <a href="#pone-0125592-g005">Fig 5(b)</a>. <em>Ping Fan De Shi Jie</em> and its partitioning suggest a possible specific additional feature for written texts: a deviation from RGF for longer texts, which becomes negligible for shorter. In the following section we suggest what type of feature this might be.</p> <a class="link-target" id="pone-0125592-g005" name="pone-0125592-g005"></a><div class="figure" data-doi="10.1371/journal.pone.0125592.g005"><div class="img-box"><a title="Click for larger image" href="article/figure/image?size=medium&amp;id=10.1371/journal.pone.0125592.g005" data-doi="10.1371/journal.pone.0125592" data-uri="10.1371/journal.pone.0125592.g005"><img src="article/figure/image?size=inline&amp;id=10.1371/journal.pone.0125592.g005" alt="thumbnail" class="thumbnail" loading="lazy"></a><div class="expand"></div></div><div class="figure-inline-download"> Download: <ul><li><a href="article/figure/powerpoint?id=10.1371/journal.pone.0125592.g005"><div class="definition-label">PPT</div><div class="definition-description">PowerPoint slide</div></a></li><li><a href="article/figure/image?download&amp;size=large&amp;id=10.1371/journal.pone.0125592.g005"><div class="definition-label">PNG</div><div class="definition-description">larger image</div></a></li><li><a href="article/figure/image?download&amp;size=original&amp;id=10.1371/journal.pone.0125592.g005"><div class="definition-label">TIFF</div><div class="definition-description">original image</div></a></li></ul></div><div class="figcaption"><span>Fig 5. </span> Comparison between two different texts of approximately equal length written by different authors in Chinese characters.</div><p class="caption_target"><a id="article1.body1.sec3.sec1.fig4.caption1.p1" name="article1.body1.sec3.sec1.fig4.caption1.p1" class="link-target"></a><p>(a) <em>A Q Zheng Zhuan</em> (filled dots) is compared to the 40<sup><em>th</em></sup> part of <em>Ping Fan De Shi Jie</em> (filled triangles). Note that the two data sets almost completely overlap. This means the difference in the frequency distribution between <em>A Q Zheng Zhuan</em> and <em>Ping Fan De Shi Jie</em> is just caused by the difference in length of the two novels. 
Furthermore, (b) illustrates that this length difference is rather trivial, because it is just the frequency distribution you get when randomly removing 97.5% of the words from <em>Ping Fan De Shi Jie</em> (dashed curve).</p><p class="caption_object"><a href="https://doi.org/10.1371/journal.pone.0125592.g005">https://doi.org/10.1371/journal.pone.0125592.g005</a></p></div></div> <div id="section2" class="section toc-section"><a id="sec007" name="sec007" class="link-target" title="Systematic deviations, information loss and multiple meanings of words"></a> <h3>Systematic deviations, information loss and multiple meanings of words</h3> <p>As suggested in the previous section, the clearly discernible deviation in <a href="#pone-0125592-g002">Fig 2(b)</a> between the character-frequency distribution of the data and the RGF-prediction in the case of <em>Ping Fan De Shi Jie</em> could be a systematic difference. The cause of this deviation should then be such that it becomes almost undetectable for a 40<sup><em>th</em></sup> part of the same text, as seen in <a href="#pone-0125592-g005">Fig 5(b)</a>.</p> <p>We here propose that this deviation is caused by the specific linguistic feature that a written word can have more than one meaning. Let us start from an English alphabetic text. A word is then defined as a collection of letters separated by blanks (or other partitioning signs). Such a written word could then, <em>within</em> the text, have more than one meaning. Multiple meanings here means that a word in a dictionary is listed with several meanings, <em>i.e.</em> a written word may consist of a group of words with different meanings. We will call the members of these sub-groups primary words. So in order to pick a distinct primary word, you first have to pick a written word and then one of its meanings within the text. It follows that the longer the text is, the larger the chance that several meanings of a written word appear in the text. Our explanation is based on an earlier proposed specific linguistic feature: a written word occurring more frequently in the text has a tendency to have more meanings [<a href="#pone.0125592.ref033" class="ref-tip">33</a>–<a href="#pone.0125592.ref036" class="ref-tip">36</a>]. This means that a written word which occurs <em>k</em> times in the text consists, on average, of a larger number of primary words than a written word which occurs fewer times. Thus if the text contains <em>N</em>(<em>k</em>) written words which occur <em>k</em> times, then the average number of primary words is <em>N</em><sub><em>P</em></sub>(<em>k</em>) = <em>N</em>(<em>k</em>)<em>f</em>(<em>k</em>), where <em>f</em>(<em>k</em>) describes how the number of multiple meanings depends on the frequency of the written word. In the case of texts written with Chinese characters it is, as explained in the introduction, the characters that are the elementary entities carrying individual meanings, and they hence play the role of words.</p> <p>It is possible to incorporate the concept of multiple meanings into an RGF-type formulation.
The point to note is that the distributed entities are really the primary words/Chinese characters, and the information needed to localize a primary word/Chinese character belonging to a written word/Chinese character which occurs <em>k</em> times in the text is log<sub>2</sub>(<em>kN</em><sub><em>P</em></sub>(<em>k</em>)) = log<sub>2</sub>(<em>kN</em>(<em>k</em>)<em>f</em>(<em>k</em>)). We want to determine the distribution <em>N</em>(<em>k</em>) taking into account the information lost, −log<sub>2</sub>(<em>f</em>(<em>k</em>)), caused by the multiple meanings (on average) of a word which occurs <em>k</em> times in the text. It follows that the information which needs to be minimized in order to obtain the maximum entropy solution is the average of log<sub>2</sub>(<em>kN</em>(<em>k</em>))−log<sub>2</sub>(<em>f</em>(<em>k</em>)), or equivalently <a name="pone.0125592.e008" id="pone.0125592.e008" class="link-target"></a><span class="equation"><img src="article/file?type=thumbnail&amp;id=10.1371/journal.pone.0125592.e008" loading="lazy" class="inline-graphic"><span class="note">(4)</span></span> and, following the same steps as in <strong>Methods</strong> and [<a href="#pone.0125592.ref027" class="ref-tip">27</a>], this predicts the functional form <a name="pone.0125592.e009" id="pone.0125592.e009" class="link-target"></a><span class="equation"><img src="article/file?type=thumbnail&amp;id=10.1371/journal.pone.0125592.e009" loading="lazy" class="inline-graphic"><span class="note">(5)</span></span></p> <p>Basically, the specific linguistic feature is that <em>f</em>(<em>k</em>) is an increasing function with <em>f</em>(<em>k</em> = 1) = 1, because a word which occurs only a single time in the text can have only one meaning within the text. The simplest approximation is then just a linear increase. <a href="#pone-0125592-g006">Fig 6</a> gives some support for this supposition: the average frequency, <em>k̄</em>(<em>f</em><sub><em>D</em></sub>), of the Chinese characters in <em>Ping Fan De Shi Jie</em> which have <em>f</em><sub><em>D</em></sub> dictionary meanings is plotted against <em>f</em><sub><em>D</em></sub>. The plot shows that <em>k̄</em>(<em>f</em><sub><em>D</em></sub>) to fair approximation increases linearly, in the form <em>k̄</em> = <em>f</em><sub><em>D</em></sub>/<em>c</em>′ − 1/<em>c</em>′ + 1, or equivalently <em>f</em><sub><em>D</em></sub> = <em>c</em>′<em>k̄</em> + 1 − <em>c</em>′.</p>
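<p>A sketch of the <a href="#pone-0125592-g006">Fig 6</a> analysis, assuming one has a mapping from each character to its number of dictionary meanings (the paper uses the <em>Xinhua Dictionary</em>, 5th edition; the <code>meanings</code> dictionary and the least-squares fit through the point (<em>k̄</em> = 1, <em>f</em><sub><em>D</em></sub> = 1) are our assumptions):</p>
<pre><code>import numpy as np
from collections import defaultdict

def mean_k_per_meaning_count(char_counts, meanings):
    """char_counts: Counter char -> k;  meanings: dict char -> f_D (dictionary meanings).
    Returns (f_D values, mean occurrence k-bar of the characters with that many meanings)."""
    buckets = defaultdict(list)
    for ch, k in char_counts.items():
        if ch in meanings:
            buckets[meanings[ch]].append(k)
    f_d = sorted(buckets)
    k_bar = np.array([np.mean(buckets[f]) for f in f_d])
    return np.array(f_d, dtype=float), k_bar

def fit_c_prime(f_d, k_bar):
    """Least-squares slope c' of f_D = c' * k_bar + (1 - c'),
    i.e. a straight line constrained to pass through (k_bar, f_D) = (1, 1)."""
    x, y = k_bar - 1.0, f_d - 1.0
    return float((x * y).sum() / (x * x).sum())
</code></pre>
<p>With <code>char_counts</code> taken from the preprocessing sketch above, <code>fit_c_prime</code> plays the role of the slope <em>c</em>′ in the relation quoted above; whether the authors fit in exactly this way is not stated.</p>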
<a href="#pone-0125592-g006">Fig 6(a)</a> corresponds to the full text and <a href="#pone-0125592-g006">Fig 6(b)</a> to a 40<sup><em>th</em></sup> part. Note that the slope <em>c</em>′ changes with text size. This is easily understood: shortening the text is, as explained in the previous section, basically the same as randomly removing characters. This means that a character with a smaller <em>k</em> has a larger chance to be completely removed from the text than one with higher. But since the characters with higher frequency on average have a larger number of multiple meanings, this means that the resulting characters with low <em>k</em> will on average have more multiple meanings. Also note that the <em>dictionary</em> meanings and the meanings <em>within</em> a text is not the same; the former is larger than the latter, but the longer the text the more equal they become. However, it is reasonable to assume that also the number of meanings <em>within</em> a text follows a similar linear relationship. Next we make the further simplification by replacing the average <span class="inline-formula"><math id="M14" display="inline" overflow="scroll"><mrow><mover accent="true"><mi>k</mi><mo>‾</mo></mover></mrow></math></span> with just <em>k</em> <em>i.e.</em> we are ignoring the spread in frequency of characters having a specific number of meanings within the text. However, this approximation still catches the increase in meanings with frequency. We will take this linear increase as our ansatz and include a cut-off <em>k</em><sub><em>c</em></sub> for large <em>k</em>, since the most frequent Chinese characters has few multiple meanings. This is a general linguistic feature, the most frequent English words, “the”, has only one meaning. Thus we use the approximate ansatz <em>f</em>(<em>k</em>) ∝ <em>k</em>/(1+<em>k</em>/<em>k</em><sub><em>c</em></sub>). This approximation reduces the RGF functional form to <a name="pone.0125592.e015" id="pone.0125592.e015" class="link-target"></a><span class="equation"><img src="article/file?type=thumbnail&amp;id=10.1371/journal.pone.0125592.e015" loading="lazy" class="inline-graphic"><span class="note">(6)</span></span> where <em>d</em> = 1/<em>k</em><sub><em>c</em></sub>. In addition to the “state”-variable triple (<em>N</em>, <em>M</em>, <em>k</em><sub><em>max</em></sub>) we should specify an a priori knowledge of <em>f</em>(<em>k</em>). The knowledge of this linguistic constraint is limited and enters through its <em>approximate</em> form <em>f</em>(<em>k</em>) ∝ <em>k</em>/(1+<em>kd</em>). This enables us to determine the value <em>d</em> = 1/<em>k</em><sub><em>c</em></sub> from the RFG-method by including the value of the entropy <em>S</em> as an additional constraint. Thus we use RGF in the form of <a href="#pone.0125592.e015">Eq (6)</a> together with the “state”-variable quadruple (<em>N</em>, <em>M</em>, <em>k</em><sub><em>max</em></sub>, <em>S</em>). This follows since the four constants (<em>A</em>′, <em>b</em>, <em>γ</em>, <em>d</em>) in <a href="#pone.0125592.e015">Eq (6)</a>, through RGF-formulation completely determine the quadruple (<em>N</em>, <em>M</em>, <em>k</em><sub><em>max</em></sub>, <em>S</em>) and vice versa. In <a href="#pone-0125592-g007">Fig 7</a> this form of extended RGF is tested on data from three novels written in Chinese characters. 
<p>The corresponding “state”-quadruples (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>, <em>S</em>) are given in <a href="#pone-0125592-t002">Table 2</a> together with the corresponding predicted output-quadruples (<em>γ</em>, <em>b</em>, <em>d</em>, <em>A</em>′). The agreement with the data is in all cases excellent (dashed curves in <a href="#pone-0125592-g007">Fig 7</a>). The dotted curves are the usual RGF-prediction based on the “state”-triples (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>). Note that for a 100<sup><em>th</em></sup> part of <em>Ping Fan De Shi Jie</em>, the usual RGF and the extended RGF agree equally well with the data. This means that any effect of multiple meanings is in this case already taken care of by the usual RGF. However, as the text size is increased to a 40<sup><em>th</em></sup> part, a 10<sup><em>th</em></sup> part and the full novel, the extended RGF continues to agree equally well, whereas the usual RGF starts to deviate. It is this systematic difference which suggests that there is a specific effect beyond the neutral-model prediction given by the usual RGF.</p> <a class="link-target" id="pone-0125592-g006" name="pone-0125592-g006"></a><div class="figure" data-doi="10.1371/journal.pone.0125592.g006"><div class="figcaption"><span>Fig 6. </span>The average frequency <em>k̄</em> of occurrence of a Chinese character in a given text is plotted against its number of multiple dictionary meanings <em>f</em><sub><em>D</em></sub>.</div><p class="caption_target">The Chinese character dictionary <em>Xinhua Dictionary</em>, 5<em>th</em> Edition, is used for the number of dictionary meanings of Chinese characters. Figure (a) shows the occurrences in the novel <em>Ping Fan De Shi Jie</em> and figure (b) the occurrences for an average 40<sup><em>th</em></sup> part of the same novel. In both cases the trend of the functional dependence can be represented by a straight line.
The linear increase <em>f</em><sub><em>D</em></sub> ∝ <em>c</em>′<em>k̄</em> has slope <em>c</em>′ ≈ 0.0083 for the full novel and <em>c</em>′ ≈ 0.34 for the 40<sup><em>th</em></sup> part. The reason that <em>c</em>′ increases with decreasing size is explained in the text.</p><p class="caption_object"><a href="https://doi.org/10.1371/journal.pone.0125592.g006">https://doi.org/10.1371/journal.pone.0125592.g006</a></p></div><a class="link-target" id="pone-0125592-g007" name="pone-0125592-g007"></a><div class="figure" data-doi="10.1371/journal.pone.0125592.g007"><div class="figcaption"><span>Fig 7. </span>Test of RGF including the multiple-meaning constraint.</div><p class="caption_target">The RGF is in each case predicted from the quadruple of state variables (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>, <em>S</em>). The data are from three novels in Chinese (see <a href="#pone-0125592-t002">Table 2</a>). The RGF-predictions with the multiple-meaning constraint are given by the dashed curves. The RGF <em>without</em> the multiple-meaning constraint is predicted from the state-variable triple (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>) and corresponds to the dotted curves. Only when the multiple-meaning constraint significantly improves the RGF-prediction can some specific interpretation be associated with it.
As seen from the figure, the significance increases with increasing length of the novel.</p><p class="caption_object"><a href="https://doi.org/10.1371/journal.pone.0125592.g007">https://doi.org/10.1371/journal.pone.0125592.g007</a></p></div><a class="link-target" id="pone-0125592-t002" name="pone-0125592-t002"></a><div class="figure" data-doi="10.1371/journal.pone.0125592.t002"><div class="figcaption"><span>Table 2. </span>Data and RGF-predictions including multiple meanings.</div><p class="caption_target">Three Chinese novels are used as the empirical data, <em>i.e.</em> <em>A Q Zheng Zhuan</em> written by Xun Lu, <em>Ping Fan De Shi Jie</em> by Yao Lu, and <em>Harry Potter</em> (HP for short), volumes 1 to 7 (written by J. K. Rowling and translated into Chinese by Ainong Ma <em>et al.</em>). The statistics for the characters are obtained as described in <a href="#pone-0125592-t001">Table 1</a>. In this case the input quadruple (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>, <em>S</em>) is transformed by the RGF-theory into the output prediction (<em>γ</em>, <em>b</em>, <em>d</em>, <em>A</em>′), corresponding to the RGF-form <em>P</em>(<em>k</em>) = <em>A</em>′ exp(−<em>bk</em>)/[<em>k</em><sup><em>γ</em></sup>(1 + 1/(<em>dk</em>))<sup><em>γ</em></sup>].</p><p class="caption_object"><a href="https://doi.org/10.1371/journal.pone.0125592.t002">https://doi.org/10.1371/journal.pone.0125592.t002</a></p></div><p>Is the multiple-meaning explanation sensible?
To investigate this, we estimate the average number of multiple meanings &lt; <em>f</em>(<em>k</em>) &gt; using the ansatz form for <em>f</em>, including the condition that a character occurring a single time can only have a single meaning within the text, <em>f</em>(<em>k</em> = 1) = 1, <em>i.e.</em> <em>f</em>(<em>k</em>) = (1+<em>d</em>)<em>k</em>/(1+<em>kd</em>), together with the obtained values of <em>d</em> (see <a href="#pone-0125592-t002">Table 2</a>) <a name="pone.0125592.e019" id="pone.0125592.e019" class="link-target"></a><span class="equation"><img src="article/file?type=thumbnail&amp;id=10.1371/journal.pone.0125592.e019" loading="lazy" class="inline-graphic"><span class="note">(7)</span></span> These estimated values for &lt; <em>f</em> &gt; are given in <a href="#pone-0125592-t002">Table 2</a>. <a href="#pone-0125592-g008">Fig 8(a)</a> shows that &lt; <em>f</em> &gt; increases with the text length. This is consistent with the fact that the number of uses of a character increases, and hence so does the chance that more of its multiple meanings appear in the text. For the same reason, &lt; <em>f</em> &gt; increases with the average number of uses of a character &lt; <em>k</em> &gt;, as shown in <a href="#pone-0125592-g008">Fig 8(b)</a>. In addition, the chance of a larger number of dictionary meanings is larger for a more frequent character (see <a href="#pone-0125592-g006">Fig 6</a>). Thus it appears that the connection between &lt; <em>f</em> &gt; and multiple meanings makes sense.</p> <a class="link-target" id="pone-0125592-g008" name="pone-0125592-g008"></a><div class="figure" data-doi="10.1371/journal.pone.0125592.g008"><div class="figcaption"><span>Fig 8. </span>Consistency test of the multiple-meaning model.</div><p class="caption_target">According to the multiple-meaning model, the parameter <em>d</em> (see <a href="#pone-0125592-t002">Table 2</a>) should give a sensible approximate estimate of the average number of multiple meanings per character within a text, &lt; <em>f</em> &gt;. The figure shows that &lt; <em>f</em> &gt; increases with the size of the text <em>M</em>. This is consistent with the fact that the number of uses of a character increases, and hence so does the chance that more of its multiple meanings appear in the text. For the same reason, &lt; <em>f</em> &gt; increases with the average number of uses of a character &lt; <em>k</em> &gt;. In addition, the chance of a larger number of dictionary meanings is larger for a more frequent character (see <a href="#pone-0125592-g006">Fig 6</a>). The inset shows how &lt; <em>k</em> &gt; increases with <em>M</em>.</p><p class="caption_object"><a href="https://doi.org/10.1371/journal.pone.0125592.g008">https://doi.org/10.1371/journal.pone.0125592.g008</a></p></div>
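<p>Since Eq (7) appears only as an image above, the precise averaging convention is not restated here. A minimal sketch of one plausible reading, averaging the ansatz <em>f</em>(<em>k</em>) over the distinct characters of the text with a fitted <em>d</em> from <a href="#pone-0125592-t002">Table 2</a>, would be:</p>
<pre><code>def average_multiple_meanings(char_counts, d):
    """One plausible estimate of the average number of meanings per character:
    the ansatz f(k) = (1 + d)k / (1 + d*k), averaged over distinct characters
    (char_counts maps character -> number of occurrences k)."""
    f = lambda k: (1.0 + d) * k / (1.0 + d * k)
    return sum(f(k) for k in char_counts.values()) / len(char_counts)

# Whether this reproduces the tabulated values depends on the exact average
# used in Eq (7), which is not restated in the text; d is taken from Table 2.
</code></pre>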
<p>Multiple meanings are of course not a unique feature of Chinese; they are a common feature of many languages. Therefore, it is unsurprising that systematic deviations from the RGF-prediction can also be observed in other languages, such as English [<a href="#pone.0125592.ref024" class="ref-tip">24</a>] and Russian [<a href="#pone.0125592.ref036" class="ref-tip">36</a>]. However, the average number of meanings of an English word is much smaller than that of a Chinese character: in modern Chinese there are only about 3,500 commonly used characters [<a href="#pone.0125592.ref037" class="ref-tip">37</a>], and even for a novel containing more than one million characters the number of distinct characters involved is less than 4,000 (see <a href="#pone-0125592-t002">Table 2</a>); but for the same novel written in English, the number of distinct words is more than 20,000 (see <a href="#pone-0125592-g009">Fig 9(a)</a>). Therefore, the systematic deviation caused by multiple meanings can be neglected for a short English text, as shown in <a href="#pone-0125592-g009">Fig 9(b)</a>. Even for a rather long text the deviation is still very slight and, as shown in <a href="#pone-0125592-g009">Fig 9(a)</a>, the usual RGF gives a good prediction (RGF with the multiple-meaning constraint incorporates more <em>a priori</em> information and may consequently be expected to give a better prediction, but the difference is very small). Taken together, Chinese uses a small number of characters to express the primary words, resulting in a high degree of multiple meanings, which in turn makes the head of the character-frequency distribution (or the tail of the frequency-rank distribution) deviate somewhat from the RGF-prediction.
But such deviations are not special to Chinese; as we have demonstrated in <a href="#pone-0125592-g009">Fig 9</a>, they are just more pronounced in Chinese than in some other languages.</p> <a class="link-target" id="pone-0125592-g009" name="pone-0125592-g009"></a><div class="figure" data-doi="10.1371/journal.pone.0125592.g009"><div class="figcaption"><span>Fig 9. </span>Test of RGF including the multiple-meaning constraint for English books.</div><p class="caption_target">The RGF is in each case predicted from the quadruple of state variables (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>, <em>S</em>). The data are from two English novels: <em>Tess of the d’Urbervilles</em> written by T. Hardy and <em>Harry Potter</em>, volumes 1 to 7, by J. K. Rowling. The RGF-predictions with the multiple-meaning constraint are given by the dashed curves. The RGF without the multiple-meaning constraint is predicted from the state-variable triple (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>) and corresponds to the dotted curves.</p><p class="caption_object"><a href="https://doi.org/10.1371/journal.pone.0125592.g009">https://doi.org/10.1371/journal.pone.0125592.g009</a></p></div></div> </div> <div id="section4" class="section toc-section"><a id="sec008" name="sec008" data-toc="sec008" class="link-target" title="Discussion"></a><h2>Discussion</h2><p>The view taken in the present paper is somewhat different and heretical compared to a large body of earlier work [<a href="#pone.0125592.ref003" class="ref-tip">3</a>–<a href="#pone.0125592.ref018" class="ref-tip">18</a>]. First of all, we argue that Zipf’s law is not a good starting point when trying to extract information from word/character frequency distributions. Our starting point is instead a neutral model containing minimal <em>a priori</em> information about the system. From this minimal information, the frequency distribution is predicted through a maximum entropy principle. The minimal information consists of the “state”-variable triple (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>), corresponding to the <em>total number of</em>, the <em>number of distinct</em>, and the <em>maximum occurrence of the most frequent</em> word/character, respectively.
The shape of the distribution is entirely determined by the triple (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>). Within this RGF-approach, a Zipf’s-law distribution (or any other power law with an exponent different from the Zipf’s-law exponent) only results for seemingly accidental triples (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>). The first question is then whether these Zipf’s-law triples are really accidental or whether they carry some additional information about the system. According to our findings, there is nothing special about these power-law cases. First, in the examples discussed here, Zipf’s law is in most cases not a good approximation of the data, whereas the RGF-prediction in general gives a very good account of all the data, <em>including</em> the rare cases when the distribution is close to a Zipf’s law. Second, translating a novel between languages, or between words and Chinese characters, or taking parts of the novel, all change the triple (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>). This means that the shape of the distribution changes, such that if it happened to be close to a Zipf’s law before the change, it deviates after. Furthermore, in the case of taking parts of a novel, the change in the triple (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>) is to a large extent trivial, which means that there is no subtle constraint favouring special values of (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>). All of this leads to the conclusion that the distributions found in word/character frequencies are very general and apply to any system which can be similarly described in terms of the triple (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>), as discussed in [<a href="#pone.0125592.ref027" class="ref-tip">27</a>, <a href="#pone.0125592.ref028" class="ref-tip">28</a>]. From this point of view, the word/character frequency carries little specific information about languages.</p> <p>In a wider context, this generality and lack of system-dependence was also expressed in [<a href="#pone.0125592.ref028" class="ref-tip">28</a>] as: <em>…we can safely exclude the possibility that the processes that led to the distribution of avian species over families also wrote the United States’ declaration of independence, yet both are described by RGF</em>, and earlier and more drastically by Herbert Simon in [<a href="#pone.0125592.ref011" class="ref-tip">11</a>]: <em>No one supposes that there is any connection between horse-kicks suffered by soldiers in the German army and blood cells on a microscopic slide other than that the same urn scheme provides a satisfactory abstract model for both phenomena</em>. The urn scheme used in the present paper is the maximum entropy principle in the form of RGF.</p> <p>Herbert Simon’s own urn model is called the Simon model [<a href="#pone.0125592.ref011" class="ref-tip">11</a>]. The problem with the Simon model in the context of written text is that it presumes a specific relation between the parameters of the “state”-triple (<em>M</em>, <em>N</em>, <em>k</em><sub><em>max</em></sub>): for a text with a given <em>M</em> and <em>N</em>, the Simon model <em>predicts</em> a <em>k</em><sub><em>max</em></sub>.
This value of <em>k</em><sub><em>max</em></sub> is quite different from the ones describing the real data analyzed here. For example, in the case of the “state”-triple for <em>A Q Zheng Zhuan</em> in Chinese characters, the values of <em>M</em> and <em>N</em> are 17,915 and 1,552, respectively (see <a href="#pone-0125592-t001">Table 1</a>), and the Simon model predicts <em>k</em><sub><em>max</em></sub> = 9,256 and a <em>P</em>(<em>k</em>) of power-law form ∝ 1/<em>k</em><sup>2.1</sup>. Thus the most common character would account for about 50% of the total text, which does not correspond to any realistic language. <a href="#pone-0125592-g010">Fig 10(a)</a> compares this Simon-model result with the real data, as well as with the corresponding RGF-predictions. One could perhaps imagine modifying the Simon model in each case so as to produce the correct “state”-triple. However, even such a modified Simon model has a serious problem, as discussed in [<a href="#pone.0125592.ref024" class="ref-tip">24</a>]: if you take a novel written by the stochastic Simon model and divide it into two equally sized parts, the first part has a quite different triple (<em>M</em>/2, <em>N</em><sub>1/2</sub>, <em>k</em><sub><em>max</em></sub>/2) than the second. Yet both parts of a real book are described by the same “state”-variable triple. This means that the change in shape of the distribution under partitioning cannot be correctly described within any stochastic Simon-type model.</p> <a class="link-target" id="pone-0125592-g010" name="pone-0125592-g010"></a><div class="figure" data-doi="10.1371/journal.pone.0125592.g010"><div class="figcaption"><span>Fig 10. </span>Test of the Simon model.</div><p class="caption_target">(a) The data (solid triangles) together with the RGF-prediction (dashed curve) for <em>A Q Zheng Zhuan</em> in Chinese characters. The Simon model with the same <em>M</em> and <em>N</em> is given by the solid dots, and the Simon prediction for infinite <em>M</em> by the dotted line. Note that the most common character appears 9,256 times in the Simon model, which is about 50% of the total number of characters. This is completely unrealistic for a sensible language (the most common character in Chinese accounts for about 4%, and the most common word in English, “the”, also for about 4%). Figure (b) shows that the frequency distribution for the Simon model is not translation invariant: for a real novel, the frequency distribution of the first half of the novel is to good approximation the same as that of the second half. The data for the novel <em>A Q Zheng Zhuan</em> in Chinese characters illustrate this (full-drawn and short-dashed curves in the figure). However, for the Simon model the frequency distribution depends on which part you take (long-dashed and dotted curves in the figure).</p><p class="caption_object"><a href="https://doi.org/10.1371/journal.pone.0125592.g010">https://doi.org/10.1371/journal.pone.0125592.g010</a></p></div>
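<p>A minimal simulation sketch of the Simon urn model referred to above, with the new-word probability chosen so that the expected number of distinct words roughly matches <em>N</em> ≈ <em>αM</em>; the partition check at the end mirrors the argument taken from [<a href="#pone.0125592.ref024" class="ref-tip">24</a>]. The parameter choice and the check are illustrative, not the authors’ exact procedure.</p>
<pre><code>import random
from collections import Counter

def simon_model(M, alpha, seed=0):
    """Simon's urn process: with probability alpha append a brand-new word,
    otherwise copy a uniformly chosen earlier token (i.e. reuse an existing
    word with probability proportional to its current frequency)."""
    rng = random.Random(seed)
    tokens = [0]
    new_word = 1
    for _ in range(M - 1):
        if rng.random() < alpha:
            tokens.append(new_word)
            new_word += 1
        else:
            tokens.append(tokens[rng.randrange(len(tokens))])
    return tokens

def triple(tokens):
    counts = Counter(tokens)
    return len(tokens), len(counts), max(counts.values())

# A Q Zheng Zhuan in characters has M = 17915 and N = 1552, so alpha ~ N/M.
tokens = simon_model(17915, alpha=1552 / 17915)
print(triple(tokens))        # kmax typically comes out far larger than in the real text
half = len(tokens) // 2
print(triple(tokens[:half]), triple(tokens[half:]))  # the halves typically differ, unlike a real book
</code></pre>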
<p>From the point of view of the present approach, the fact that the data are very well described by the RGF-model gives a tentative handle for getting one step further: since RGF is a neutral-model prediction, the implication is that any systematic deviations between the data and the RGF-prediction might carry additional specific information about the system. Such a deviation was shown to become more discernible the longer the text written in Chinese characters is. The multiple meanings of Chinese characters were suggested as an explanatory factor for this phenomenon. This is based on the notion that characters/words used with larger frequency have a tendency to have more multiple meanings within a text. Some support for this was gained by comparing with the dictionary meanings of Chinese characters. It was also argued that this tendency towards more multiple meanings could be entered as an additional constraint within the RGF-formulation. Comparison with data suggested that this is indeed a sensible contender for an explanation.</p> <p>Our view is that the neutral model provided by RGF is a useful starting point for extracting information from word/character distributions in texts. It has the advantage, compared to most other approaches, that it actually predicts the real data from a very limited amount of <em>a priori</em> information. It also has the advantage of being a general approach which can be applied to a great variety of different systems.</p> </div> <div id="section5" class="section toc-section"><a id="sec009" name="sec009" data-toc="sec009" class="link-target" title="Supporting Information"></a><h2>Supporting Information</h2><div class="supplementary-material"><a name="pone.0125592.s001" id="pone.0125592.s001" class="link-target"></a><h3 class="siTitle title-small"><a href="article/file?type=supplementary&amp;id=10.1371/journal.pone.0125592.s001">S1 Data Set. 
</a>The book data used in this paper.</h3><p class="siDoi"><a href="https://doi.org/10.1371/journal.pone.0125592.s001">https://doi.org/10.1371/journal.pone.0125592.s001</a></p><p class="postSiDOI">(ZIP)</p> </div></div> <div class="section toc-section"><a id="ack" name="ack" data-toc="ack" title="Acknowledgments" class="link-target"></a><h2>Acknowledgments</h2> <p>Economic support from IceLab is gratefully acknowledged.</p> </div><div class="contributions toc-section"><a id="authcontrib" name="authcontrib" data-toc="authcontrib" title="Author Contributions"></a><h2>Author Contributions</h2><p>Conceived and designed the experiments: PM XY. Performed the experiments: XY. Analyzed the data: XY. Wrote the paper: PM.</p></div><div class="toc-section"><a id="references" name="references" class="link-target" data-toc="references" title="References"></a><h2>References</h2><ol class="references"><li id="ref1"><span class="order">1. </span><a name="pone.0125592.ref001" id="pone.0125592.ref001" class="link-target"></a> Singh S. The code book. New York: Random House; 2002.</li><li id="ref2"><span class="order">2. </span><a name="pone.0125592.ref002" id="pone.0125592.ref002" class="link-target"></a> Estroup JB. Les Gammes Sténographiques. 4th ed. Paris: Institut Stenographique de France; 1916.</li><li id="ref3"><span class="order">3. </span><a name="pone.0125592.ref003" id="pone.0125592.ref003" class="link-target"></a> Zipf GK. Selective studies of the principle of relative frequency in language. Cambridge: Harvard University Press; 1932.</li><li id="ref4"><span class="order">4. </span><a name="pone.0125592.ref004" id="pone.0125592.ref004" class="link-target"></a> Zipf GK. The psycho-biology of language: an introduction to dynamic philology. Boston: Mifflin; 1935.</li><li id="ref5"><span class="order">5. </span><a name="pone.0125592.ref005" id="pone.0125592.ref005" class="link-target"></a> Zipf GK. Human behavior and the principle of least effort. Reading, MA: Addison-Wesley; 1949.</li><li id="ref6"><span class="order">6. </span><a name="pone.0125592.ref006" id="pone.0125592.ref006" class="link-target"></a> Mandelbrot B. An informational theory of the statistical structure of languages. Woburn: Butterworth; 1953.</li><li id="ref7"><span class="order">7. </span><a name="pone.0125592.ref007" id="pone.0125592.ref007" class="link-target"></a> Li W. Random texts exhibit Zipf’s-law-like word frequency distribution. IEEE T Inform Theory. 1992;38: 1842–1845.</li>
