<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="color-scheme" content="light dark"> <title>PEP 3131 – Supporting Non-ASCII Identifiers | peps.python.org</title> <link rel="shortcut icon" href="../_static/py.png"> <link rel="canonical" href="https://peps.python.org/pep-3131/"> <link rel="stylesheet" href="../_static/style.css" type="text/css"> <link rel="stylesheet" href="../_static/mq.css" type="text/css"> <link rel="stylesheet" href="../_static/pygments.css" type="text/css" media="(prefers-color-scheme: light)" id="pyg-light"> <link rel="stylesheet" href="../_static/pygments_dark.css" type="text/css" media="(prefers-color-scheme: dark)" id="pyg-dark"> <link rel="alternate" type="application/rss+xml" title="Latest PEPs" href="https://peps.python.org/peps.rss"> <meta property="og:title" content='PEP 3131 – Supporting Non-ASCII Identifiers | peps.python.org'> <meta property="og:description" content="This PEP suggests to support non-ASCII letters (such as accented characters, Cyrillic, Greek, Kanji, etc.) in Python identifiers."> <meta property="og:type" content="website"> <meta property="og:url" content="https://peps.python.org/pep-3131/"> <meta property="og:site_name" content="Python Enhancement Proposals (PEPs)"> <meta property="og:image" content="https://peps.python.org/_static/og-image.png"> <meta property="og:image:alt" content="Python PEPs"> <meta property="og:image:width" content="200"> <meta property="og:image:height" content="200"> <meta name="description" content="This PEP suggests to support non-ASCII letters (such as accented characters, Cyrillic, Greek, Kanji, etc.) 
in Python identifiers."> <meta name="theme-color" content="#3776ab"> </head> <body> <svg xmlns="http://www.w3.org/2000/svg" style="display: none;"> <symbol id="svg-sun-half" viewBox="0 0 24 24" pointer-events="all"> <title>Following system colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <circle cx="12" cy="12" r="9"></circle> <path d="M12 3v18m0-12l4.65-4.65M12 14.3l7.37-7.37M12 19.6l8.85-8.85"></path> </svg> </symbol> <symbol id="svg-moon" viewBox="0 0 24 24" pointer-events="all"> <title>Selected dark colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <path stroke="none" d="M0 0h24v24H0z" fill="none"></path> <path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z"></path> </svg> </symbol> <symbol id="svg-sun" viewBox="0 0 24 24" pointer-events="all"> <title>Selected light colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <circle cx="12" cy="12" r="5"></circle> <line x1="12" y1="1" x2="12" y2="3"></line> <line x1="12" y1="21" x2="12" y2="23"></line> <line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line> <line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line> <line x1="1" y1="12" x2="3" y2="12"></line> <line x1="21" y1="12" x2="23" y2="12"></line> <line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line> <line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line> </svg> </symbol> </svg> <script> document.documentElement.dataset.colour_scheme = localStorage.getItem("colour_scheme") || "auto" </script> <section id="pep-page-section"> <header> <h1>Python Enhancement Proposals</h1> <ul class="breadcrumbs"> <li><a href="https://www.python.org/" title="The 
Python Programming Language">Python</a> &raquo; </li> <li><a href="../pep-0000/">PEP Index</a> &raquo; </li> <li>PEP 3131</li> </ul> <button id="colour-scheme-cycler" onClick="setColourScheme(nextColourScheme())"> <svg aria-hidden="true" class="colour-scheme-icon-when-auto"><use href="#svg-sun-half"></use></svg> <svg aria-hidden="true" class="colour-scheme-icon-when-dark"><use href="#svg-moon"></use></svg> <svg aria-hidden="true" class="colour-scheme-icon-when-light"><use href="#svg-sun"></use></svg> <span class="visually-hidden">Toggle light / dark / auto colour theme</span> </button> </header> <article> <section id="pep-content"> <h1 class="page-title">PEP 3131 – Supporting Non-ASCII Identifiers</h1> <dl class="rfc2822 field-list simple"> <dt class="field-odd">Author<span class="colon">:</span></dt> <dd class="field-odd">Martin von Löwis &lt;martin&#32;&#97;t&#32;v.loewis.de&gt;</dd> <dt class="field-even">Status<span class="colon">:</span></dt> <dd class="field-even"><abbr title="Accepted and implementation complete, or no longer active">Final</abbr></dd> <dt class="field-odd">Type<span class="colon">:</span></dt> <dd class="field-odd"><abbr title="Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem">Standards Track</abbr></dd> <dt class="field-even">Created<span class="colon">:</span></dt> <dd class="field-even">01-May-2007</dd> <dt class="field-odd">Python-Version<span class="colon">:</span></dt> <dd class="field-odd">3.0</dd> <dt class="field-even">Post-History<span class="colon">:</span></dt> <dd class="field-even"><p></p></dd> </dl> <hr class="docutils" /> <section id="contents"> <details><summary>Table of Contents</summary><ul class="simple"> <li><a class="reference internal" href="#abstract">Abstract</a></li> <li><a class="reference internal" href="#rationale">Rationale</a></li> <li><a class="reference internal" href="#common-objections">Common Objections</a></li> <li><a 
class="reference internal" href="#specification-of-language-changes">Specification of Language Changes</a></li> <li><a class="reference internal" href="#policy-specification">Policy Specification</a></li> <li><a class="reference internal" href="#implementation">Implementation</a></li> <li><a class="reference internal" href="#open-issues">Open Issues</a></li> <li><a class="reference internal" href="#discussion">Discussion</a></li> <li><a class="reference internal" href="#copyright">Copyright</a></li> </ul> </details></section> <section id="abstract"> <h2><a class="toc-backref" href="#abstract" role="doc-backlink">Abstract</a></h2> <p>This PEP proposes supporting non-ASCII letters (such as accented characters, Cyrillic, Greek, Kanji, etc.) in Python identifiers.</p> </section> <section id="rationale"> <h2><a class="toc-backref" href="#rationale" role="doc-backlink">Rationale</a></h2> <p>Python code is written by many people in the world who are not familiar with the English language, or even well-acquainted with the Latin writing system. Such developers often desire to define classes and functions with names in their native languages, rather than having to come up with an (often incorrect) English translation of the concept they want to name. Using identifiers in their native language improves code clarity and maintainability among speakers of that language.</p> <p>For some languages, common transliteration systems exist (in particular, for the Latin-based writing systems). For other languages, users have greater difficulty using Latin to write their native words.</p> </section> <section id="common-objections"> <h2><a class="toc-backref" href="#common-objections" role="doc-backlink">Common Objections</a></h2> <p>Some objections are often raised against proposals similar to this one.</p> <p>People claim that they will not be able to use a library if to do so they have to use characters they cannot type on their keyboards. 
However, it is the choice of the designer of the library to decide on various constraints for using the library: people may not be able to use the library because they cannot get physical access to the source code (because it is not published), or because licensing prohibits usage, or because the documentation is in a language they cannot understand. A developer wishing to make a library widely available needs to make a number of explicit choices (such as publication, licensing, language of documentation, and language of identifiers). It should always be the choice of the author to make these decisions - not the choice of the language designers.</p> <p>In particular, projects wishing to have wide usage may want to establish a policy that all identifiers, comments, and documentation are written in English (see the GNU coding style guide for an example of such a policy). Restricting the language to ASCII-only identifiers does not force comments and documentation to be in English, or the identifiers to actually be English words, so an additional policy is necessary anyway.</p> </section> <section id="specification-of-language-changes"> <h2><a class="toc-backref" href="#specification-of-language-changes" role="doc-backlink">Specification of Language Changes</a></h2> <p>The syntax of identifiers in Python will be based on the <a class="reference external" href="https://www.unicode.org/reports/tr31/">Unicode standard annex UAX-31</a>, with elaboration and changes as defined below.</p> <p>Within the ASCII range (U+0001..U+007F), the valid characters for identifiers are the same as in Python 2.5. This specification only introduces additional characters from outside the ASCII range. 
For other characters, the classification uses the version of the Unicode Character Database as included in the <code class="docutils literal notranslate"><span class="pre">unicodedata</span></code> module.</p> <p>The identifier syntax is <code class="docutils literal notranslate"><span class="pre">&lt;XID_Start&gt;</span> <span class="pre">&lt;XID_Continue&gt;*</span></code>.</p> <p>The exact specification of what characters have the XID_Start or XID_Continue properties can be found in the <a class="reference external" href="https://www.unicode.org/Public/4.1.0/ucd/DerivedCoreProperties.txt">DerivedCoreProperties file</a> of the Unicode data in use by Python (4.1 at the time this PEP was written). For reference, the construction rules for these sets are given below. The XID_* properties are derived from ID_Start/ID_Continue, which are derived themselves.</p> <p><code class="docutils literal notranslate"><span class="pre">ID_Start</span></code> is defined as all characters having one of the general categories uppercase letters (Lu), lowercase letters (Ll), titlecase letters (Lt), modifier letters (Lm), other letters (Lo), letter numbers (Nl), the underscore, and characters carrying the Other_ID_Start property. <code class="docutils literal notranslate"><span class="pre">XID_Start</span></code> then closes this set under normalization, by removing all characters whose NFKC normalization is not of the form ID_Start ID_Continue* anymore.</p> <p><code class="docutils literal notranslate"><span class="pre">ID_Continue</span></code> is defined as all characters in <code class="docutils literal notranslate"><span class="pre">ID_Start</span></code>, plus nonspacing marks (Mn), spacing combining marks (Mc), decimal number (Nd), connector punctuations (Pc), and characters carrying the Other_ID_Continue property. 
Again, <code class="docutils literal notranslate"><span class="pre">XID_Continue</span></code> closes this set under NFKC-normalization; it also adds U+00B7 to support Catalan.</p> <p>All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.</p> <p>A non-normative HTML file listing all valid identifier characters for Unicode 4.1 can be found at <a class="reference external" href="https://web.archive.org/web/20081016132748/http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html">https://web.archive.org/web/20081016132748/http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html</a>.</p> </section> <section id="policy-specification"> <h2><a class="toc-backref" href="#policy-specification" role="doc-backlink">Policy Specification</a></h2> <p>As an addition to the Python Coding style, the following policy is prescribed: All identifiers in the Python standard library MUST use ASCII-only identifiers, and SHOULD use English words wherever feasible (in many cases, abbreviations and technical terms are used which aren’t English). In addition, string literals and comments must also be in ASCII. The only exceptions are (a) test cases testing the non-ASCII features, and (b) names of authors. Authors whose names are not based on the Latin alphabet MUST provide a Latin transliteration of their names.</p> <p>As an option, this specification can be applied to Python 2.x. 
In that case, ASCII-only identifiers would continue to be represented as byte string objects in namespace dictionaries; identifiers with non-ASCII characters would be represented as Unicode strings.</p> </section> <section id="implementation"> <h2><a class="toc-backref" href="#implementation" role="doc-backlink">Implementation</a></h2> <p>The following changes will need to be made to the parser:</p> <ol class="arabic simple"> <li>If a non-ASCII character is found in the UTF-8 representation of the source code, a forward scan is made to find the first ASCII non-identifier character (e.g. a space or punctuation character)</li> <li>The entire UTF-8 string is passed to a function to normalize the string to NFKC, and then verify that it follows the identifier syntax. No such callout is made for pure-ASCII identifiers, which continue to be parsed the way they are today. The Unicode database must start including the Other_ID_{Start|Continue} property.</li> <li>If this specification is implemented for 2.x, reflective libraries (such as pydoc) must be verified to continue to work when Unicode strings appear in <code class="docutils literal notranslate"><span class="pre">__dict__</span></code> slots as keys.</li> </ol> </section> <section id="open-issues"> <h2><a class="toc-backref" href="#open-issues" role="doc-backlink">Open Issues</a></h2> <p>John Nagle suggested consideration of <a class="reference external" href="https://www.unicode.org/reports/tr39/">Unicode Technical Standard #39</a>, which discusses security mechanisms for Unicode identifiers. It’s not clear how that can precisely apply to this PEP; possible consequences are</p> <ul class="simple"> <li>warn about characters listed as “restricted” in xidmodifications.txt</li> <li>warn about identifiers using mixed scripts</li> <li>somehow perform Confusable Detection</li> </ul> <p>In the latter two approaches, it’s not clear how precisely the algorithm should work. 
For mixed scripts, certain kinds of mixing should probably be allowed - are these the “Common” and “Inherited” scripts mentioned in section 5? For Confusable Detection, it seems one needs two identifiers to compare them for confusion - is it possible to somehow apply it to a single identifier only, and warn?</p> <p>In follow-up discussion, it turned out that John Nagle actually meant to suggest <a class="reference external" href="https://www.unicode.org/reports/tr36/">UTR#36</a>, level “Highly Restrictive”.</p> <p>Several people suggested allowing and ignoring formatting control characters (general category Cf), as is done in Java, JavaScript, and C#. It’s not clear whether this would improve things (it might for RTL languages); if there is a need, these can be added later.</p> <p>Some people would like to see an option for selecting support for this PEP at run-time; opinions vary on what precisely that option should be, and what precisely its default value should be. <a class="reference external" href="https://mail.python.org/pipermail/python-3000/2007-May/007925.html">Guido van Rossum commented</a> that a global flag passed to the interpreter is not acceptable, as it would apply to all modules.</p> </section> <section id="discussion"> <h2><a class="toc-backref" href="#discussion" role="doc-backlink">Discussion</a></h2> <p><a class="reference external" href="https://mail.python.org/pipermail/python-3000/2007-June/008161.html">Ka-Ping Yee summarizes the discussion and further objections</a> as follows:</p> <ol class="upperalpha"> <li>Should identifiers be allowed to contain any Unicode letter?<p>Drawbacks of allowing non-ASCII identifiers wholesale:</p> <ol class="arabic simple"> <li>Python will lose the ability to make a reliable round trip to a human-readable display on screen or on paper.</li> <li>Python will become vulnerable to a new class of security exploits; code and submitted patches will be much harder to inspect.</li> <li>Humans will no longer be able to validate 
Python syntax.</li> <li>Unicode is young; its problems are not yet well understood and solved; tool support is weak.</li> <li>Languages with non-ASCII identifiers use different character sets and normalization schemes; <a class="pep reference internal" href="../pep-3131/" title="PEP 3131 – Supporting Non-ASCII Identifiers">PEP 3131</a>’s choices are non-obvious.</li> <li>The Unicode bidi algorithm yields an extremely confusing display order for RTL text when digits or operators are nearby.</li> </ol> </li> <li>Should the default behaviour accept only ASCII identifiers, or should it accept identifiers containing non-ASCII characters?<p>Arguments for ASCII only by default:</p> <ol class="arabic simple"> <li>Non-ASCII identifiers by default makes common practice/assumptions subtly/unknowingly wrong; rarely wrong is worse than obviously wrong.</li> <li>Better to raise a warning than to fail silently when encountering a probably unexpected situation.</li> <li>All of current usage is ASCII-only; the vast majority of future usage will be ASCII-only.</li> </ol> <ol class="arabic simple" start="3"> <li>It is the pockets of Unicode adoption that are parochial, not the ASCII advocates.</li> <li>Python should audit for ASCII-only identifiers for the same reasons that it audits for tab-space consistency</li> <li>Incremental change is safer.</li> <li>An ASCII-only default favors open-source development and sharing of source code.</li> <li>Existing projects won’t have to waste any brainpower worrying about the implications of Unicode identifiers.</li> </ol> </li> <li>Should non-ASCII identifiers be optional?<p>Various voices in support of a flag (although there’s been debate over which should be the default, no one seems to be saying that there shouldn’t be an off switch)</p> </li> <li>Should the identifier character set be configurable?<p>Various voices proposing and supporting a selectable character set, so that users can get all the benefits of using their own language without 
the drawbacks of confusable/unfamiliar characters</p> </li> <li>Which identifier characters should be allowed?<ol class="arabic simple"> <li>What to do about bidi format control characters?</li> <li>What about other ID_Continue characters? What about characters that look like punctuation? What about other recommendations in UTS #39? What about mixed-script identifiers?</li> </ol> </li> <li>Which normalization form should be used, NFC or NFKC?</li> <li>Should source code be required to be in normalized form?</li> </ol> </section> <section id="copyright"> <h2><a class="toc-backref" href="#copyright" role="doc-backlink">Copyright</a></h2> <p>This document has been placed in the public domain.</p> </section> </section> <hr class="docutils" /> <p>Source: <a class="reference external" href="https://github.com/python/peps/blob/main/peps/pep-3131.rst">https://github.com/python/peps/blob/main/peps/pep-3131.rst</a></p> <p>Last modified: <a class="reference external" href="https://github.com/python/peps/commits/main/peps/pep-3131.rst">2025-02-11 05:10:05 GMT</a></p> </article> <nav id="pep-sidebar"> <h2>Contents</h2> <ul> <li><a class="reference internal" href="#abstract">Abstract</a></li> <li><a class="reference internal" href="#rationale">Rationale</a></li> <li><a class="reference internal" href="#common-objections">Common Objections</a></li> <li><a class="reference internal" href="#specification-of-language-changes">Specification of Language Changes</a></li> <li><a class="reference internal" href="#policy-specification">Policy Specification</a></li> <li><a class="reference internal" href="#implementation">Implementation</a></li> <li><a class="reference internal" href="#open-issues">Open Issues</a></li> <li><a class="reference internal" href="#discussion">Discussion</a></li> <li><a class="reference internal" href="#copyright">Copyright</a></li> </ul> <br> <a id="source" href="https://github.com/python/peps/blob/main/peps/pep-3131.rst">Page Source (GitHub)</a> </nav> 
</section> <script src="../_static/colour_scheme.js"></script> <script src="../_static/wrap_tables.js"></script> <script src="../_static/sticky_banner.js"></script> </body> </html>
