<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="color-scheme" content="light dark"> <title>PEP 3112 – Bytes literals in Python 3000 | peps.python.org</title> <link rel="shortcut icon" href="../_static/py.png"> <link rel="canonical" href="https://peps.python.org/pep-3112/"> <link rel="stylesheet" href="../_static/style.css" type="text/css"> <link rel="stylesheet" href="../_static/mq.css" type="text/css"> <link rel="stylesheet" href="../_static/pygments.css" type="text/css" media="(prefers-color-scheme: light)" id="pyg-light"> <link rel="stylesheet" href="../_static/pygments_dark.css" type="text/css" media="(prefers-color-scheme: dark)" id="pyg-dark"> <link rel="alternate" type="application/rss+xml" title="Latest PEPs" href="https://peps.python.org/peps.rss"> <meta property="og:title" content='PEP 3112 – Bytes literals in Python 3000 | peps.python.org'> <meta property="og:description" content="This PEP proposes a literal syntax for the bytes objects introduced in PEP 358. The purpose is to provide a convenient way to spell ASCII strings and arbitrary binary data."> <meta property="og:type" content="website"> <meta property="og:url" content="https://peps.python.org/pep-3112/"> <meta property="og:site_name" content="Python Enhancement Proposals (PEPs)"> <meta property="og:image" content="https://peps.python.org/_static/og-image.png"> <meta property="og:image:alt" content="Python PEPs"> <meta property="og:image:width" content="200"> <meta property="og:image:height" content="200"> <meta name="description" content="This PEP proposes a literal syntax for the bytes objects introduced in PEP 358. 
The purpose is to provide a convenient way to spell ASCII strings and arbitrary binary data."> <meta name="theme-color" content="#3776ab"> </head> <body> <svg xmlns="http://www.w3.org/2000/svg" style="display: none;"> <symbol id="svg-sun-half" viewBox="0 0 24 24" pointer-events="all"> <title>Following system colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <circle cx="12" cy="12" r="9"></circle> <path d="M12 3v18m0-12l4.65-4.65M12 14.3l7.37-7.37M12 19.6l8.85-8.85"></path> </svg> </symbol> <symbol id="svg-moon" viewBox="0 0 24 24" pointer-events="all"> <title>Selected dark colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <path stroke="none" d="M0 0h24v24H0z" fill="none"></path> <path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z"></path> </svg> </symbol> <symbol id="svg-sun" viewBox="0 0 24 24" pointer-events="all"> <title>Selected light colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <circle cx="12" cy="12" r="5"></circle> <line x1="12" y1="1" x2="12" y2="3"></line> <line x1="12" y1="21" x2="12" y2="23"></line> <line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line> <line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line> <line x1="1" y1="12" x2="3" y2="12"></line> <line x1="21" y1="12" x2="23" y2="12"></line> <line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line> <line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line> </svg> </symbol> </svg> <script> document.documentElement.dataset.colour_scheme = localStorage.getItem("colour_scheme") || "auto" </script> <section id="pep-page-section"> <header> <h1>Python Enhancement Proposals</h1> <ul 
class="breadcrumbs"> <li><a href="https://www.python.org/" title="The Python Programming Language">Python</a> &raquo; </li> <li><a href="../pep-0000/">PEP Index</a> &raquo; </li> <li>PEP 3112</li> </ul> <button id="colour-scheme-cycler" onClick="setColourScheme(nextColourScheme())"> <svg aria-hidden="true" class="colour-scheme-icon-when-auto"><use href="#svg-sun-half"></use></svg> <svg aria-hidden="true" class="colour-scheme-icon-when-dark"><use href="#svg-moon"></use></svg> <svg aria-hidden="true" class="colour-scheme-icon-when-light"><use href="#svg-sun"></use></svg> <span class="visually-hidden">Toggle light / dark / auto colour theme</span> </button> </header> <article> <section id="pep-content"> <h1 class="page-title">PEP 3112 – Bytes literals in Python 3000</h1> <dl class="rfc2822 field-list simple"> <dt class="field-odd">Author<span class="colon">:</span></dt> <dd class="field-odd">Jason Orendorff &lt;jason.orendorff&#32;&#97;t&#32;gmail.com&gt;</dd> <dt class="field-even">Status<span class="colon">:</span></dt> <dd class="field-even"><abbr title="Accepted and implementation complete, or no longer active">Final</abbr></dd> <dt class="field-odd">Type<span class="colon">:</span></dt> <dd class="field-odd"><abbr title="Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem">Standards Track</abbr></dd> <dt class="field-even">Requires<span class="colon">:</span></dt> <dd class="field-even"><a class="reference external" href="../pep-0358/">358</a></dd> <dt class="field-odd">Created<span class="colon">:</span></dt> <dd class="field-odd">23-Feb-2007</dd> <dt class="field-even">Python-Version<span class="colon">:</span></dt> <dd class="field-even">3.0</dd> <dt class="field-odd">Post-History<span class="colon">:</span></dt> <dd class="field-odd">23-Feb-2007</dd> </dl> <hr class="docutils" /> <section id="contents"> <details><summary>Table of Contents</summary><ul class="simple"> <li><a 
class="reference internal" href="#abstract">Abstract</a></li> <li><a class="reference internal" href="#motivation">Motivation</a></li> <li><a class="reference internal" href="#grammar-changes">Grammar Changes</a></li> <li><a class="reference internal" href="#semantics">Semantics</a></li> <li><a class="reference internal" href="#rationale">Rationale</a></li> <li><a class="reference internal" href="#reference-implementation">Reference Implementation</a></li> <li><a class="reference internal" href="#references">References</a></li> <li><a class="reference internal" href="#copyright">Copyright</a></li> </ul> </details></section> <section id="abstract"> <h2><a class="toc-backref" href="#abstract" role="doc-backlink">Abstract</a></h2> <p>This PEP proposes a literal syntax for the <code class="docutils literal notranslate"><span class="pre">bytes</span></code> objects introduced in <a class="pep reference internal" href="../pep-0358/" title="PEP 358 – The “bytes” Object">PEP 358</a>. The purpose is to provide a convenient way to spell ASCII strings and arbitrary binary data.</p> </section> <section id="motivation"> <h2><a class="toc-backref" href="#motivation" role="doc-backlink">Motivation</a></h2> <p>Existing spellings of an ASCII string in Python 3000 include:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="nb">bytes</span><span class="p">(</span><span class="s1">&#39;Hello world&#39;</span><span class="p">,</span> <span class="s1">&#39;ascii&#39;</span><span class="p">)</span> <span class="s1">&#39;Hello world&#39;</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">&#39;ascii&#39;</span><span class="p">)</span> </pre></div> </div> <p>The proposed syntax is:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="sa">b</span><span class="s1">&#39;Hello world&#39;</span> </pre></div> </div> <p>Existing spellings of an 8-bit 
binary sequence in Python 3000 include:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="nb">bytes</span><span class="p">([</span><span class="mh">0x7f</span><span class="p">,</span> <span class="mh">0x45</span><span class="p">,</span> <span class="mh">0x4c</span><span class="p">,</span> <span class="mh">0x46</span><span class="p">,</span> <span class="mh">0x01</span><span class="p">,</span> <span class="mh">0x01</span><span class="p">,</span> <span class="mh">0x01</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">])</span> <span class="nb">bytes</span><span class="p">(</span><span class="s1">&#39;</span><span class="se">\x7f</span><span class="s1">ELF</span><span class="se">\x01\x01\x01\0</span><span class="s1">&#39;</span><span class="p">,</span> <span class="s1">&#39;latin-1&#39;</span><span class="p">)</span> <span class="s1">&#39;7f454c4601010100&#39;</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">&#39;hex&#39;</span><span class="p">)</span> </pre></div> </div> <p>The proposed syntax is:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="sa">b</span><span class="s1">&#39;</span><span class="se">\x7f\x45\x4c\x46\x01\x01\x01\x00</span><span class="s1">&#39;</span> <span class="sa">b</span><span class="s1">&#39;</span><span class="se">\x7f</span><span class="s1">ELF</span><span class="se">\x01\x01\x01\0</span><span class="s1">&#39;</span> </pre></div> </div> <p>In both cases, the advantages of the new syntax are brevity, some small efficiency gain, and the detection of encoding errors at compile time rather than at runtime. 
The brevity benefit is especially felt when using the string-like methods of bytes objects:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">lines</span> <span class="o">=</span> <span class="n">bdata</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="nb">bytes</span><span class="p">(</span><span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">,</span> <span class="s1">&#39;ascii&#39;</span><span class="p">))</span> <span class="c1"># existing syntax</span> <span class="n">lines</span> <span class="o">=</span> <span class="n">bdata</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="sa">b</span><span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">)</span> <span class="c1"># proposed syntax</span> </pre></div> </div> <p>And when converting code from Python 2.x to Python 3000:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">sok</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="s1">&#39;EXIT</span><span class="se">\r\n</span><span class="s1">&#39;</span><span class="p">)</span> <span class="c1"># Python 2.x</span> <span class="n">sok</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="s1">&#39;EXIT</span><span class="se">\r\n</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">&#39;ascii&#39;</span><span class="p">))</span> <span class="c1"># Python 3000 existing</span> <span class="n">sok</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="sa">b</span><span class="s1">&#39;EXIT</span><span class="se">\r\n</span><span class="s1">&#39;</span><span class="p">)</span> <span 
class="c1"># proposed</span> </pre></div> </div> </section> <section id="grammar-changes"> <h2><a class="toc-backref" href="#grammar-changes" role="doc-backlink">Grammar Changes</a></h2> <p>The proposed syntax is an extension of the existing string syntax <a class="footnote-reference brackets" href="#stringliterals" id="id1">[1]</a>.</p> <p>The new syntax for strings, including the new bytes literal, is:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">stringliteral</span><span class="p">:</span> <span class="p">[</span><span class="n">stringprefix</span><span class="p">]</span> <span class="p">(</span><span class="n">shortstring</span> <span class="o">|</span> <span class="n">longstring</span><span class="p">)</span> <span class="n">stringprefix</span><span class="p">:</span> <span class="s2">&quot;b&quot;</span> <span class="o">|</span> <span class="s2">&quot;r&quot;</span> <span class="o">|</span> <span class="s2">&quot;br&quot;</span> <span class="o">|</span> <span class="s2">&quot;B&quot;</span> <span class="o">|</span> <span class="s2">&quot;R&quot;</span> <span class="o">|</span> <span class="s2">&quot;BR&quot;</span> <span class="o">|</span> <span class="s2">&quot;Br&quot;</span> <span class="o">|</span> <span class="s2">&quot;bR&quot;</span> <span class="n">shortstring</span><span class="p">:</span> <span class="s2">&quot;&#39;&quot;</span> <span class="n">shortstringitem</span><span class="o">*</span> <span class="s2">&quot;&#39;&quot;</span> <span class="o">|</span> <span class="s1">&#39;&quot;&#39;</span> <span class="n">shortstringitem</span><span class="o">*</span> <span class="s1">&#39;&quot;&#39;</span> <span class="n">longstring</span><span class="p">:</span> <span class="s2">&quot;&#39;&#39;&#39;&quot;</span> <span class="n">longstringitem</span><span class="o">*</span> <span class="s2">&quot;&#39;&#39;&#39;&quot;</span> <span class="o">|</span> <span 
class="s1">&#39;&quot;&quot;&quot;&#39;</span> <span class="n">longstringitem</span><span class="o">*</span> <span class="s1">&#39;&quot;&quot;&quot;&#39;</span> <span class="n">shortstringitem</span><span class="p">:</span> <span class="n">shortstringchar</span> <span class="o">|</span> <span class="n">escapeseq</span> <span class="n">longstringitem</span><span class="p">:</span> <span class="n">longstringchar</span> <span class="o">|</span> <span class="n">escapeseq</span> <span class="n">shortstringchar</span><span class="p">:</span> <span class="o">&lt;</span><span class="nb">any</span> <span class="n">source</span> <span class="n">character</span> <span class="k">except</span> <span class="s2">&quot;</span><span class="se">\&quot;</span><span class="s2"> or newline or the quote&gt;</span> <span class="n">longstringchar</span><span class="p">:</span> <span class="o">&lt;</span><span class="nb">any</span> <span class="n">source</span> <span class="n">character</span> <span class="k">except</span> <span class="s2">&quot;</span><span class="se">\&quot;</span><span class="s2">&gt;</span> <span class="n">escapeseq</span><span class="p">:</span> <span class="s2">&quot;</span><span class="se">\&quot;</span><span class="s2"> NL</span> <span class="o">|</span> <span class="s2">&quot;</span><span class="se">\\</span><span class="s2">&quot;</span> <span class="o">|</span> <span class="s2">&quot;</span><span class="se">\&#39;</span><span class="s2">&quot;</span> <span class="o">|</span> <span class="s1">&#39;</span><span class="se">\&quot;</span><span class="s1">&#39;</span> <span class="o">|</span> <span class="s2">&quot;</span><span class="se">\a</span><span class="s2">&quot;</span> <span class="o">|</span> <span class="s2">&quot;</span><span class="se">\b</span><span class="s2">&quot;</span> <span class="o">|</span> <span class="s2">&quot;</span><span class="se">\f</span><span class="s2">&quot;</span> <span class="o">|</span> <span class="s2">&quot;</span><span 
class="se">\n</span><span class="s2">&quot;</span> <span class="o">|</span> <span class="s2">&quot;</span><span class="se">\r</span><span class="s2">&quot;</span> <span class="o">|</span> <span class="s2">&quot;</span><span class="se">\t</span><span class="s2">&quot;</span> <span class="o">|</span> <span class="s2">&quot;</span><span class="se">\v</span><span class="s2">&quot;</span> <span class="o">|</span> <span class="s2">&quot;\ooo&quot;</span> <span class="o">|</span> <span class="s2">&quot;\xhh&quot;</span> <span class="o">|</span> <span class="s2">&quot;\uxxxx&quot;</span> <span class="o">|</span> <span class="s2">&quot;\Uxxxxxxxx&quot;</span> <span class="o">|</span> <span class="s2">&quot;</span><span class="se">\N{name}</span><span class="s2">&quot;</span> </pre></div> </div> <p>The following additional restrictions apply only to bytes literals (<code class="docutils literal notranslate"><span class="pre">stringliteral</span></code> tokens with <code class="docutils literal notranslate"><span class="pre">b</span></code> or <code class="docutils literal notranslate"><span class="pre">B</span></code> in the <code class="docutils literal notranslate"><span class="pre">stringprefix</span></code>):</p> <ul class="simple"> <li>Each <code class="docutils literal notranslate"><span class="pre">shortstringchar</span></code> or <code class="docutils literal notranslate"><span class="pre">longstringchar</span></code> must be a character between 1 and 127 inclusive, regardless of any encoding declaration <a class="footnote-reference brackets" href="#encodings" id="id2">[2]</a> in the source file.</li> <li>The Unicode-specific escape sequences <code class="docutils literal notranslate"><span class="pre">\u</span></code><em>xxxx</em>, <code class="docutils literal notranslate"><span class="pre">\U</span></code><em>xxxxxxxx</em>, and <code class="docutils literal notranslate"><span class="pre">\N{</span></code><em>name</em><code class="docutils literal 
notranslate"><span class="pre">}</span></code> are unrecognized in Python 2.x and forbidden in Python 3000.</li> </ul> <p>Adjacent bytes literals are subject to the same concatenation rules as adjacent string literals <a class="footnote-reference brackets" href="#concat" id="id3">[3]</a>. A bytes literal adjacent to a string literal is an error.</p> </section> <section id="semantics"> <h2><a class="toc-backref" href="#semantics" role="doc-backlink">Semantics</a></h2> <p>Each evaluation of a bytes literal produces a new <code class="docutils literal notranslate"><span class="pre">bytes</span></code> object. The bytes in the new object are the bytes represented by the <code class="docutils literal notranslate"><span class="pre">shortstringitem</span></code> or <code class="docutils literal notranslate"><span class="pre">longstringitem</span></code> parts of the literal, in the same order.</p> </section> <section id="rationale"> <h2><a class="toc-backref" href="#rationale" role="doc-backlink">Rationale</a></h2> <p>The proposed syntax provides a cleaner migration path from Python 2.x to Python 3000 for most code involving 8-bit strings. Preserving the old 8-bit meaning of a string literal is usually as simple as adding a <code class="docutils literal notranslate"><span class="pre">b</span></code> prefix. The one exception is Python 2.x strings containing bytes &gt;127, which must be rewritten using escape sequences. Transcoding a source file from one encoding to another, and fixing up the encoding declaration, should preserve the meaning of the program. 
Python 2.x non-Unicode strings violate this principle; Python 3000 bytes literals shouldn’t.</p> <p>A string literal with a <code class="docutils literal notranslate"><span class="pre">b</span></code> in the prefix is always a syntax error in Python 2.5, so this syntax can be introduced in Python 2.6, along with the <code class="docutils literal notranslate"><span class="pre">bytes</span></code> type.</p> <p>A bytes literal produces a new object each time it is evaluated, like list displays and unlike string literals. This is necessary because bytes objects, like lists and unlike strings, are mutable <a class="footnote-reference brackets" href="#eachnew" id="id4">[4]</a>.</p> </section> <section id="reference-implementation"> <h2><a class="toc-backref" href="#reference-implementation" role="doc-backlink">Reference Implementation</a></h2> <p>Thomas Wouters has checked an implementation into the Py3K branch, r53872.</p> </section> <section id="references"> <h2><a class="toc-backref" href="#references" role="doc-backlink">References</a></h2> <aside class="footnote-list brackets"> <aside class="footnote brackets" id="stringliterals" role="doc-footnote"> <dt class="label" id="stringliterals">[<a href="#id1">1</a>]</dt> <dd><a class="reference external" href="http://docs.python.org/reference/lexical_analysis.html#string-literals">http://docs.python.org/reference/lexical_analysis.html#string-literals</a></aside> <aside class="footnote brackets" id="encodings" role="doc-footnote"> <dt class="label" id="encodings">[<a href="#id2">2</a>]</dt> <dd><a class="reference external" href="http://docs.python.org/reference/lexical_analysis.html#encoding-declarations">http://docs.python.org/reference/lexical_analysis.html#encoding-declarations</a></aside> <aside class="footnote brackets" id="concat" role="doc-footnote"> <dt class="label" id="concat">[<a href="#id3">3</a>]</dt> <dd><a class="reference external" 
href="http://docs.python.org/reference/lexical_analysis.html#string-literal-concatenation">http://docs.python.org/reference/lexical_analysis.html#string-literal-concatenation</a></aside> <aside class="footnote brackets" id="eachnew" role="doc-footnote"> <dt class="label" id="eachnew">[<a href="#id4">4</a>]</dt> <dd><a class="reference external" href="https://mail.python.org/pipermail/python-3000/2007-February/005779.html">https://mail.python.org/pipermail/python-3000/2007-February/005779.html</a></aside> </aside> </section> <section id="copyright"> <h2><a class="toc-backref" href="#copyright" role="doc-backlink">Copyright</a></h2> <p>This document has been placed in the public domain.</p> </section> </section> <hr class="docutils" /> <p>Source: <a class="reference external" href="https://github.com/python/peps/blob/main/peps/pep-3112.rst">https://github.com/python/peps/blob/main/peps/pep-3112.rst</a></p> <p>Last modified: <a class="reference external" href="https://github.com/python/peps/commits/main/peps/pep-3112.rst">2023-09-09 17:39:29 GMT</a></p> </article> <nav id="pep-sidebar"> <h2>Contents</h2> <ul> <li><a class="reference internal" href="#abstract">Abstract</a></li> <li><a class="reference internal" href="#motivation">Motivation</a></li> <li><a class="reference internal" href="#grammar-changes">Grammar Changes</a></li> <li><a class="reference internal" href="#semantics">Semantics</a></li> <li><a class="reference internal" href="#rationale">Rationale</a></li> <li><a class="reference internal" href="#reference-implementation">Reference Implementation</a></li> <li><a class="reference internal" href="#references">References</a></li> <li><a class="reference internal" href="#copyright">Copyright</a></li> </ul> <br> <a id="source" href="https://github.com/python/peps/blob/main/peps/pep-3112.rst">Page Source (GitHub)</a> </nav> </section> <script src="../_static/colour_scheme.js"></script> <script src="../_static/wrap_tables.js"></script> <script 
src="../_static/sticky_banner.js"></script> </body> </html>
