<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="color-scheme" content="light dark"> <title>PEP 624 – Remove Py_UNICODE encoder APIs | peps.python.org</title> <link rel="shortcut icon" href="../_static/py.png"> <link rel="canonical" href="https://peps.python.org/pep-0624/"> <link rel="stylesheet" href="../_static/style.css" type="text/css"> <link rel="stylesheet" href="../_static/mq.css" type="text/css"> <link rel="stylesheet" href="../_static/pygments.css" type="text/css" media="(prefers-color-scheme: light)" id="pyg-light"> <link rel="stylesheet" href="../_static/pygments_dark.css" type="text/css" media="(prefers-color-scheme: dark)" id="pyg-dark"> <link rel="alternate" type="application/rss+xml" title="Latest PEPs" href="https://peps.python.org/peps.rss"> <meta property="og:title" content='PEP 624 – Remove Py_UNICODE encoder APIs | peps.python.org'> <meta property="og:description" content="This PEP proposes to remove deprecated Py_UNICODE encoder APIs in Python 3.11:"> <meta property="og:type" content="website"> <meta property="og:url" content="https://peps.python.org/pep-0624/"> <meta property="og:site_name" content="Python Enhancement Proposals (PEPs)"> <meta property="og:image" content="https://peps.python.org/_static/og-image.png"> <meta property="og:image:alt" content="Python PEPs"> <meta property="og:image:width" content="200"> <meta property="og:image:height" content="200"> <meta name="description" content="This PEP proposes to remove deprecated Py_UNICODE encoder APIs in Python 3.11:"> <meta name="theme-color" content="#3776ab"> </head> <body> <svg xmlns="http://www.w3.org/2000/svg" style="display: none;"> <symbol id="svg-sun-half" viewBox="0 0 24 24" pointer-events="all"> <title>Following system colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" 
stroke-linejoin="round"> <circle cx="12" cy="12" r="9"></circle> <path d="M12 3v18m0-12l4.65-4.65M12 14.3l7.37-7.37M12 19.6l8.85-8.85"></path> </svg> </symbol> <symbol id="svg-moon" viewBox="0 0 24 24" pointer-events="all"> <title>Selected dark colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <path stroke="none" d="M0 0h24v24H0z" fill="none"></path> <path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z"></path> </svg> </symbol> <symbol id="svg-sun" viewBox="0 0 24 24" pointer-events="all"> <title>Selected light colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <circle cx="12" cy="12" r="5"></circle> <line x1="12" y1="1" x2="12" y2="3"></line> <line x1="12" y1="21" x2="12" y2="23"></line> <line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line> <line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line> <line x1="1" y1="12" x2="3" y2="12"></line> <line x1="21" y1="12" x2="23" y2="12"></line> <line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line> <line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line> </svg> </symbol> </svg> <script> document.documentElement.dataset.colour_scheme = localStorage.getItem("colour_scheme") || "auto" </script> <section id="pep-page-section"> <header> <h1>Python Enhancement Proposals</h1> <ul class="breadcrumbs"> <li><a href="https://www.python.org/" title="The Python Programming Language">Python</a> » </li> <li><a href="../pep-0000/">PEP Index</a> » </li> <li>PEP 624</li> </ul> <button id="colour-scheme-cycler" onClick="setColourScheme(nextColourScheme())"> <svg aria-hidden="true" class="colour-scheme-icon-when-auto"><use href="#svg-sun-half"></use></svg> <svg aria-hidden="true" class="colour-scheme-icon-when-dark"><use href="#svg-moon"></use></svg> 
<svg aria-hidden="true" class="colour-scheme-icon-when-light"><use href="#svg-sun"></use></svg> <span class="visually-hidden">Toggle light / dark / auto colour theme</span> </button> </header> <article> <section id="pep-content"> <h1 class="page-title">PEP 624 – Remove Py_UNICODE encoder APIs</h1> <dl class="rfc2822 field-list simple"> <dt class="field-odd">Author<span class="colon">:</span></dt> <dd class="field-odd">Inada Naoki &lt;songofacandy at gmail.com&gt;</dd> <dt class="field-even">Status<span class="colon">:</span></dt> <dd class="field-even"><abbr title="Accepted and implementation complete, or no longer active">Final</abbr></dd> <dt class="field-odd">Type<span class="colon">:</span></dt> <dd class="field-odd"><abbr title="Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem">Standards Track</abbr></dd> <dt class="field-even">Created<span class="colon">:</span></dt> <dd class="field-even">06-Jul-2020</dd> <dt class="field-odd">Python-Version<span class="colon">:</span></dt> <dd class="field-odd">3.11</dd> <dt class="field-even">Post-History<span class="colon">:</span></dt> <dd class="field-even">08-Jul-2020</dd> </dl> <hr class="docutils" /> <section id="contents"> <details><summary>Table of Contents</summary><ul class="simple"> <li><a class="reference internal" href="#abstract">Abstract</a></li> <li><a class="reference internal" href="#motivation">Motivation</a></li> <li><a class="reference internal" href="#rationale">Rationale</a><ul> <li><a class="reference internal" href="#deprecated-since-python-3-3">Deprecated since Python 3.3</a></li> <li><a class="reference internal" href="#inefficient">Inefficient</a></li> <li><a class="reference internal" href="#not-used-widely">Not used widely</a></li> </ul> </li> <li><a class="reference internal" href="#alternative-apis">Alternative APIs</a></li> <li><a class="reference internal" href="#plan">Plan</a></li> <li><a class="reference internal" 
href="#alternative-ideas">Alternative Ideas</a><ul> <li><a class="reference internal" href="#replace-py-unicode-with-pyobject">Replace <code class="docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code> with <code class="docutils literal notranslate"><span class="pre">PyObject*</span></code></a></li> <li><a class="reference internal" href="#replace-py-unicode-with-py-ucs4">Replace <code class="docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code> with <code class="docutils literal notranslate"><span class="pre">Py_UCS4*</span></code></a></li> <li><a class="reference internal" href="#replace-py-unicode-with-wchar-t">Replace <code class="docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code> with <code class="docutils literal notranslate"><span class="pre">wchar_t*</span></code></a></li> </ul> </li> <li><a class="reference internal" href="#rejected-ideas">Rejected Ideas</a><ul> <li><a class="reference internal" href="#emit-runtime-warning">Emit runtime warning</a></li> </ul> </li> <li><a class="reference internal" href="#discussions">Discussions</a><ul> <li><a class="reference internal" href="#objections">Objections</a></li> </ul> </li> <li><a class="reference internal" href="#references">References</a></li> <li><a class="reference internal" href="#copyright">Copyright</a></li> </ul> </details></section> <section id="abstract"> <h2><a class="toc-backref" href="#abstract" role="doc-backlink">Abstract</a></h2> <p>This PEP proposes to remove deprecated <code class="docutils literal notranslate"><span class="pre">Py_UNICODE</span></code> encoder APIs in Python 3.11:</p> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_Encode()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeASCII()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeLatin1()</span></code></li> <li><code 
class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF7()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF8()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF16()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF32()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUnicodeEscape()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeRawUnicodeEscape()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeCharmap()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_TranslateCharmap()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeDecimal()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_TransformDecimalToASCII()</span></code></li> </ul> <div class="admonition note"> <p class="admonition-title">Note</p> <p><a class="pep reference internal" href="../pep-0623/" title="PEP 623 – Remove wstr from Unicode">PEP 623</a> proposes to remove Unicode object APIs relating to <code class="docutils literal notranslate"><span class="pre">Py_UNICODE</span></code>. This PEP, on the other hand, does not relate to the Unicode object. 
These PEPs are split because they have different motivations and need different discussions.</p> </div> </section> <section id="motivation"> <h2><a class="toc-backref" href="#motivation" role="doc-backlink">Motivation</a></h2> <p>In general, reducing the number of APIs that have been deprecated for a long time and have few users is a good idea: it not only improves the maintainability of CPython, but also helps API users and other Python implementations.</p> </section> <section id="rationale"> <h2><a class="toc-backref" href="#rationale" role="doc-backlink">Rationale</a></h2> <section id="deprecated-since-python-3-3"> <h3><a class="toc-backref" href="#deprecated-since-python-3-3" role="doc-backlink">Deprecated since Python 3.3</a></h3> <p><code class="docutils literal notranslate"><span class="pre">Py_UNICODE</span></code> and the APIs using it have been deprecated since Python 3.3.</p> </section> <section id="inefficient"> <h3><a class="toc-backref" href="#inefficient" role="doc-backlink">Inefficient</a></h3> <p>All of these APIs are implemented using <code class="docutils literal notranslate"><span class="pre">PyUnicode_FromWideChar</span></code>: each call first builds a temporary Unicode object from the <code class="docutils literal notranslate"><span class="pre">Py_UNICODE</span></code> buffer. They are therefore inefficient when the user already has a Unicode object to encode.</p> </section> <section id="not-used-widely"> <h3><a class="toc-backref" href="#not-used-widely" role="doc-backlink">Not used widely</a></h3> <p>In a search of the top 4000 PyPI packages <a class="footnote-reference brackets" href="#id3" id="id1">[1]</a>, only pyodbc uses these APIs:</p> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF8()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF16()</span></code></li> </ul> <p>pyodbc uses these APIs to encode a Unicode object into a bytes object, so it is easy to fix. 
<a class="footnote-reference brackets" href="#id4" id="id2">[2]</a></p> </section> </section> <section id="alternative-apis"> <h2><a class="toc-backref" href="#alternative-apis" role="doc-backlink">Alternative APIs</a></h2> <p>There are alternative APIs to accept <code class="docutils literal notranslate"><span class="pre">PyObject</span> <span class="pre">*unicode</span></code> instead of <code class="docutils literal notranslate"><span class="pre">Py_UNICODE</span> <span class="pre">*</span></code>. Users can migrate to them.</p> <table class="docutils align-default"> <thead> <tr class="row-odd"><th class="head">Deprecated API</th> <th class="head">Alternative APIs</th> </tr> </thead> <tbody> <tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_Encode()</span></code></td> <td><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsEncodedString()</span></code></td> </tr> <tr class="row-odd"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeASCII()</span></code></td> <td><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsASCIIString()</span></code> (1)</td> </tr> <tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeLatin1()</span></code></td> <td><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsLatin1String()</span></code> (1)</td> </tr> <tr class="row-odd"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF7()</span></code></td> <td>(2)</td> </tr> <tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF8()</span></code></td> <td><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsUTF8String()</span></code> (1)</td> </tr> <tr class="row-odd"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF16()</span></code></td> <td><code class="docutils literal 
notranslate"><span class="pre">PyUnicode_AsUTF16String()</span></code> (3)</td> </tr> <tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF32()</span></code></td> <td><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsUTF32String()</span></code> (3)</td> </tr> <tr class="row-odd"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUnicodeEscape()</span></code></td> <td><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsUnicodeEscapeString()</span></code></td> </tr> <tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeRawUnicodeEscape()</span></code></td> <td><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsRawUnicodeEscapeString()</span></code></td> </tr> <tr class="row-odd"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeCharmap()</span></code></td> <td><code class="docutils literal notranslate"><span class="pre">PyUnicode_AsCharmapString()</span></code> (1)</td> </tr> <tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_TranslateCharmap()</span></code></td> <td><code class="docutils literal notranslate"><span class="pre">PyUnicode_Translate()</span></code></td> </tr> <tr class="row-odd"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeDecimal()</span></code></td> <td>(4)</td> </tr> <tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_TransformDecimalToASCII()</span></code></td> <td>(4)</td> </tr> </tbody> </table> <p>Notes:</p> <ol class="arabic simple"> <li><code class="docutils literal notranslate"><span class="pre">const</span> <span class="pre">char</span> <span class="pre">*errors</span></code> parameter is missing.</li> <li>There is no public alternative API. 
But users can use the generic <code class="docutils literal notranslate"><span class="pre">PyUnicode_AsEncodedString()</span></code> instead.</li> <li><code class="docutils literal notranslate"><span class="pre">const</span> <span class="pre">char</span> <span class="pre">*errors,</span> <span class="pre">int</span> <span class="pre">byteorder</span></code> parameters are missing.</li> <li>There is no direct replacement. But <code class="docutils literal notranslate"><span class="pre">Py_UNICODE_TODECIMAL</span></code> can be used instead. CPython uses <code class="docutils literal notranslate"><span class="pre">_PyUnicode_TransformDecimalAndSpaceToASCII</span></code> for converting from Unicode to numbers instead.</li> </ol> </section> <section id="plan"> <h2><a class="toc-backref" href="#plan" role="doc-backlink">Plan</a></h2> <p>Remove these APIs in Python 3.11. They have been deprecated already.</p> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_Encode()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeASCII()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeLatin1()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF7()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF8()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF16()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF32()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUnicodeEscape()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeRawUnicodeEscape()</span></code></li> <li><code class="docutils literal notranslate"><span 
class="pre">PyUnicode_EncodeCharmap()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_TranslateCharmap()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeDecimal()</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_TransformDecimalToASCII()</span></code></li> </ul> </section> <section id="alternative-ideas"> <h2><a class="toc-backref" href="#alternative-ideas" role="doc-backlink">Alternative Ideas</a></h2> <section id="replace-py-unicode-with-pyobject"> <h3><a class="toc-backref" href="#replace-py-unicode-with-pyobject" role="doc-backlink">Replace <code class="docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code> with <code class="docutils literal notranslate"><span class="pre">PyObject*</span></code></a></h3> <p>As described in the “Alternative APIs” section, some APIs don’t have public alternative APIs accepting <code class="docutils literal notranslate"><span class="pre">PyObject</span> <span class="pre">*unicode</span></code> input. 
And some public alternative APIs have restrictions, such as missing <code class="docutils literal notranslate"><span class="pre">errors</span></code> and <code class="docutils literal notranslate"><span class="pre">byteorder</span></code> parameters.</p> <p>Instead of removing the deprecated APIs, we can reuse their names for alternative public APIs.</p> <p>Since we already have private alternative APIs, this is just a matter of renaming them from their private names to the public, deprecated names.</p> <table class="docutils align-default"> <thead> <tr class="row-odd"><th class="head">Rename to</th> <th class="head">Rename from</th> </tr> </thead> <tbody> <tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeASCII()</span></code></td> <td><code class="docutils literal notranslate"><span class="pre">_PyUnicode_AsASCIIString()</span></code></td> </tr> <tr class="row-odd"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeLatin1()</span></code></td> <td><code class="docutils literal notranslate"><span class="pre">_PyUnicode_AsLatin1String()</span></code></td> </tr> <tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF7()</span></code></td> <td><code class="docutils literal notranslate"><span class="pre">_PyUnicode_EncodeUTF7()</span></code></td> </tr> <tr class="row-odd"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF8()</span></code></td> <td><code class="docutils literal notranslate"><span class="pre">_PyUnicode_AsUTF8String()</span></code></td> </tr> <tr class="row-even"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF16()</span></code></td> <td><code class="docutils literal notranslate"><span class="pre">_PyUnicode_EncodeUTF16()</span></code></td> </tr> <tr class="row-odd"><td><code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF32()</span></code></td> <td><code 
class="docutils literal notranslate"><span class="pre">_PyUnicode_EncodeUTF32()</span></code></td> </tr> </tbody> </table> <p>Pros:</p> <ul class="simple"> <li>We have a more consistent API set.</li> </ul> <p>Cons:</p> <ul class="simple"> <li>Backward incompatible.</li> <li>We have more public APIs to maintain for rare use cases.</li> <li>Existing public APIs are enough for most use cases, and <code class="docutils literal notranslate"><span class="pre">PyUnicode_AsEncodedString()</span></code> can be used in other cases.</li> </ul> </section> <section id="replace-py-unicode-with-py-ucs4"> <h3><a class="toc-backref" href="#replace-py-unicode-with-py-ucs4" role="doc-backlink">Replace <code class="docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code> with <code class="docutils literal notranslate"><span class="pre">Py_UCS4*</span></code></a></h3> <p>We can replace <code class="docutils literal notranslate"><span class="pre">Py_UNICODE</span></code> with <code class="docutils literal notranslate"><span class="pre">Py_UCS4</span></code> and undeprecate these APIs.</p> <p>UTF-8, UTF-16, UTF-32 encoders support <code class="docutils literal notranslate"><span class="pre">Py_UCS4</span></code> internally. 
So <code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF8()</span></code>, <code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF16()</span></code>, and <code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF32()</span></code> can avoid creating a temporary Unicode object.</p> <p>Pros:</p> <ul class="simple"> <li>We can avoid creating a temporary Unicode object when encoding from <code class="docutils literal notranslate"><span class="pre">Py_UCS4*</span></code> into a bytes object with the UTF-8, UTF-16, and UTF-32 codecs.</li> </ul> <p>Cons:</p> <ul class="simple"> <li>Backward incompatible.</li> <li>We have more public APIs to maintain for rare use cases.</li> <li>Other Python implementations that want to support the Python/C API need to support these APIs too.</li> <li>If we change the Unicode internal representation to UTF-8 in the future, we need to keep UCS-4 support only for these APIs.</li> </ul> </section> <section id="replace-py-unicode-with-wchar-t"> <h3><a class="toc-backref" href="#replace-py-unicode-with-wchar-t" role="doc-backlink">Replace <code class="docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code> with <code class="docutils literal notranslate"><span class="pre">wchar_t*</span></code></a></h3> <p>We can replace <code class="docutils literal notranslate"><span class="pre">Py_UNICODE</span></code> with <code class="docutils literal notranslate"><span class="pre">wchar_t</span></code>. 
Since <code class="docutils literal notranslate"><span class="pre">Py_UNICODE</span></code> is already a typedef of <code class="docutils literal notranslate"><span class="pre">wchar_t</span></code>, this is the status quo.</p> <p>On platforms where <code class="docutils literal notranslate"><span class="pre">sizeof(wchar_t)</span> <span class="pre">==</span> <span class="pre">4</span></code>, we can avoid creating a temporary Unicode object when encoding from <code class="docutils literal notranslate"><span class="pre">wchar_t*</span></code> to bytes objects using the UTF-8, UTF-16, and UTF-32 codecs, like the “Replace <code class="docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code> with <code class="docutils literal notranslate"><span class="pre">Py_UCS4*</span></code>” idea.</p> <p>Pros:</p> <ul class="simple"> <li>Backward compatible.</li> <li>We can avoid creating a temporary Unicode object when encoding from <code class="docutils literal notranslate"><span class="pre">wchar_t*</span></code> into a bytes object with the UTF-8, UTF-16, and UTF-32 codecs on platforms where <code class="docutils literal notranslate"><span class="pre">sizeof(wchar_t)</span> <span class="pre">==</span> <span class="pre">4</span></code>.</li> </ul> <p>Cons:</p> <ul class="simple"> <li>Although Windows is the major platform that uses <code class="docutils literal notranslate"><span class="pre">wchar_t</span></code> heavily, these APIs always need to create a temporary Unicode object there because <code class="docutils literal notranslate"><span class="pre">sizeof(wchar_t)</span> <span class="pre">==</span> <span class="pre">2</span></code> on Windows.</li> <li>We have more public APIs to maintain for rare use cases.</li> <li>Other Python implementations that want to support the Python/C API need to support these APIs too.</li> <li>If we change the Unicode internal representation to UTF-8 in the future, we need to keep UCS-4 support only for these APIs.</li> </ul> </section> </section> 
<section id="rejected-ideas"> <h2><a class="toc-backref" href="#rejected-ideas" role="doc-backlink">Rejected Ideas</a></h2> <section id="emit-runtime-warning"> <h3><a class="toc-backref" href="#emit-runtime-warning" role="doc-backlink">Emit runtime warning</a></h3> <p>In addition to the existing compiler warning, emitting a runtime <code class="docutils literal notranslate"><span class="pre">DeprecationWarning</span></code> was suggested.</p> <p>But these APIs don’t release the GIL for now. Emitting a warning from such APIs is not safe. See this example.</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">PyObject</span> <span class="o">*</span><span class="n">u</span> <span class="o">=</span> <span class="n">PyList_GET_ITEM</span><span class="p">(</span><span class="nb">list</span><span class="p">,</span> <span class="n">i</span><span class="p">);</span> <span class="o">//</span> <span class="n">u</span> <span class="ow">is</span> <span class="n">a</span> <span class="n">borrowed</span> <span class="n">reference</span><span class="o">.</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">b</span> <span class="o">=</span> <span class="n">PyUnicode_EncodeUTF8</span><span class="p">(</span><span class="n">PyUnicode_AS_UNICODE</span><span class="p">(</span><span class="n">u</span><span class="p">),</span> <span class="n">PyUnicode_GET_SIZE</span><span class="p">(</span><span class="n">u</span><span class="p">),</span> <span class="n">NULL</span><span class="p">);</span>
<span class="o">//</span> <span class="n">Assumes</span> <span class="n">u</span> <span class="ow">is</span> <span class="n">still</span> <span class="n">a</span> <span class="n">living</span> <span class="n">reference</span><span class="o">.</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">t</span> <span class="o">=</span> <span class="n">PyTuple_Pack</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">u</span><span class="p">,</span> <span class="n">b</span><span class="p">);</span>
<span class="n">Py_DECREF</span><span class="p">(</span><span class="n">b</span><span class="p">);</span>
<span class="k">return</span> <span class="n">t</span><span class="p">;</span>
</pre></div> </div> <p>If we emit a Python warning from <code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF8()</span></code>, warning filters and other threads may change the <code class="docutils literal notranslate"><span class="pre">list</span></code>, and <code class="docutils literal notranslate"><span class="pre">u</span></code> can be a dangling reference after <code class="docutils literal notranslate"><span class="pre">PyUnicode_EncodeUTF8()</span></code> returns.</p> </section> </section> <section id="discussions"> <h2><a class="toc-backref" href="#discussions" role="doc-backlink">Discussions</a></h2> <ul class="simple"> <li><a class="reference external" href="https://mail.python.org/archives/list/python-dev@python.org/thread/S7KW2U6IGXZFBMGS6WSJB26NZIBW4OLE/#S7KW2U6IGXZFBMGS6WSJB26NZIBW4OLE">[python-dev] Plan to remove Py_UNICODE APis except PEP 623</a></li> <li><a class="reference external" href="https://bugs.python.org/issue41123">bpo-41123: Remove Py_UNICODE APIs except PEP 623</a></li> <li><a class="reference external" href="https://mail.python.org/archives/list/python-dev@python.org/thread/THXVM7FZVT56B7CPEDIYKJG6VMAYIEK5/#QUGBVLQNBFVNX25AEIL77WSFOHQES6LJ">[python-dev] PEP 624: Remove Py_UNICODE encoder APIs</a></li> </ul> <section id="objections"> <h3><a class="toc-backref" href="#objections" role="doc-backlink">Objections</a></h3> <ul class="simple"> <li>Removing these APIs removes the ability to use codecs without a temporary Unicode object.<ul> <li>Since Python 3.3, codecs cannot encode a Unicode buffer directly without a temporary Unicode object. All these APIs create a temporary Unicode object for now. 
So removing them doesn’t take away any capability.</li> </ul> </li> <li>Why not remove decoder APIs too?<ul> <li>They are part of the stable ABI.</li> <li><code class="docutils literal notranslate"><span class="pre">PyUnicode_DecodeASCII()</span></code> and <code class="docutils literal notranslate"><span class="pre">PyUnicode_DecodeUTF8()</span></code> are used very widely. Deprecating them is not worthwhile.</li> <li>Decoder APIs can decode from a byte buffer directly, without creating a temporary bytes object. On the other hand, encoder APIs cannot avoid a temporary Unicode object.</li> </ul> </li> </ul> </section> </section> <section id="references"> <h2><a class="toc-backref" href="#references" role="doc-backlink">References</a></h2> <aside class="footnote-list brackets"> <aside class="footnote brackets" id="id3" role="doc-footnote"> <dt class="label" id="id3">[<a href="#id1">1</a>]</dt> <dd>Source package list chosen from top 4000 PyPI packages. (<a class="reference external" href="https://github.com/methane/notes/blob/master/2020/wchar-cache/package_list.txt">https://github.com/methane/notes/blob/master/2020/wchar-cache/package_list.txt</a>)</aside> <aside class="footnote brackets" id="id4" role="doc-footnote"> <dt class="label" id="id4">[<a href="#id2">2</a>]</dt> <dd>pyodbc – Don’t use PyUnicode_Encode API #792 (<a class="reference external" href="https://github.com/mkleehammer/pyodbc/pull/792">https://github.com/mkleehammer/pyodbc/pull/792</a>)</aside> </aside> </section> <section id="copyright"> <h2><a class="toc-backref" href="#copyright" role="doc-backlink">Copyright</a></h2> <p>This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.</p> </section> </section> <hr class="docutils" /> <p>Source: <a class="reference external" href="https://github.com/python/peps/blob/main/peps/pep-0624.rst">https://github.com/python/peps/blob/main/peps/pep-0624.rst</a></p> <p>Last modified: <a class="reference external" 
href="https://github.com/python/peps/commits/main/peps/pep-0624.rst">2025-02-01 08:55:40 GMT</a></p> </article> <nav id="pep-sidebar"> <h2>Contents</h2> <ul> <li><a class="reference internal" href="#abstract">Abstract</a></li> <li><a class="reference internal" href="#motivation">Motivation</a></li> <li><a class="reference internal" href="#rationale">Rationale</a><ul> <li><a class="reference internal" href="#deprecated-since-python-3-3">Deprecated since Python 3.3</a></li> <li><a class="reference internal" href="#inefficient">Inefficient</a></li> <li><a class="reference internal" href="#not-used-widely">Not used widely</a></li> </ul> </li> <li><a class="reference internal" href="#alternative-apis">Alternative APIs</a></li> <li><a class="reference internal" href="#plan">Plan</a></li> <li><a class="reference internal" href="#alternative-ideas">Alternative Ideas</a><ul> <li><a class="reference internal" href="#replace-py-unicode-with-pyobject">Replace <code class="docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code> with <code class="docutils literal notranslate"><span class="pre">PyObject*</span></code></a></li> <li><a class="reference internal" href="#replace-py-unicode-with-py-ucs4">Replace <code class="docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code> with <code class="docutils literal notranslate"><span class="pre">Py_UCS4*</span></code></a></li> <li><a class="reference internal" href="#replace-py-unicode-with-wchar-t">Replace <code class="docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code> with <code class="docutils literal notranslate"><span class="pre">wchar_t*</span></code></a></li> </ul> </li> <li><a class="reference internal" href="#rejected-ideas">Rejected Ideas</a><ul> <li><a class="reference internal" href="#emit-runtime-warning">Emit runtime warning</a></li> </ul> </li> <li><a class="reference internal" href="#discussions">Discussions</a><ul> <li><a class="reference internal" 
href="#objections">Objections</a></li> </ul> </li> <li><a class="reference internal" href="#references">References</a></li> <li><a class="reference internal" href="#copyright">Copyright</a></li> </ul> <br> <a id="source" href="https://github.com/python/peps/blob/main/peps/pep-0624.rst">Page Source (GitHub)</a> </nav> </section> <script src="../_static/colour_scheme.js"></script> <script src="../_static/wrap_tables.js"></script> <script src="../_static/sticky_banner.js"></script> </body> </html>