

<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="color-scheme" content="light dark"> <title>PEP 721 – Using tarfile.data_filter for source distribution extraction | peps.python.org</title> <link rel="shortcut icon" href="../_static/py.png"> <link rel="canonical" href="https://peps.python.org/pep-0721/"> <link rel="stylesheet" href="../_static/style.css" type="text/css"> <link rel="stylesheet" href="../_static/mq.css" type="text/css"> <link rel="stylesheet" href="../_static/pygments.css" type="text/css" media="(prefers-color-scheme: light)" id="pyg-light"> <link rel="stylesheet" href="../_static/pygments_dark.css" type="text/css" media="(prefers-color-scheme: dark)" id="pyg-dark"> <link rel="alternate" type="application/rss+xml" title="Latest PEPs" href="https://peps.python.org/peps.rss"> <meta property="og:title" content='PEP 721 – Using tarfile.data_filter for source distribution extraction | peps.python.org'> <meta property="og:description" content="Extracting a source distribution archive should normally use the data filter added in PEP 706. We clarify details, and specify the behaviour for tools that cannot use the filter directly."> <meta property="og:type" content="website"> <meta property="og:url" content="https://peps.python.org/pep-0721/"> <meta property="og:site_name" content="Python Enhancement Proposals (PEPs)"> <meta property="og:image" content="https://peps.python.org/_static/og-image.png"> <meta property="og:image:alt" content="Python PEPs"> <meta property="og:image:width" content="200"> <meta property="og:image:height" content="200"> <meta name="description" content="Extracting a source distribution archive should normally use the data filter added in PEP 706. 
We clarify details, and specify the behaviour for tools that cannot use the filter directly."> <meta name="theme-color" content="#3776ab"> </head> <body> <svg xmlns="http://www.w3.org/2000/svg" style="display: none;"> <symbol id="svg-sun-half" viewBox="0 0 24 24" pointer-events="all"> <title>Following system colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <circle cx="12" cy="12" r="9"></circle> <path d="M12 3v18m0-12l4.65-4.65M12 14.3l7.37-7.37M12 19.6l8.85-8.85"></path> </svg> </symbol> <symbol id="svg-moon" viewBox="0 0 24 24" pointer-events="all"> <title>Selected dark colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <path stroke="none" d="M0 0h24v24H0z" fill="none"></path> <path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z"></path> </svg> </symbol> <symbol id="svg-sun" viewBox="0 0 24 24" pointer-events="all"> <title>Selected light colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <circle cx="12" cy="12" r="5"></circle> <line x1="12" y1="1" x2="12" y2="3"></line> <line x1="12" y1="21" x2="12" y2="23"></line> <line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line> <line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line> <line x1="1" y1="12" x2="3" y2="12"></line> <line x1="21" y1="12" x2="23" y2="12"></line> <line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line> <line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line> </svg> </symbol> </svg> <script> document.documentElement.dataset.colour_scheme = localStorage.getItem("colour_scheme") || "auto" </script> <section id="pep-page-section"> <header> <h1>Python Enhancement Proposals</h1> <ul 
class="breadcrumbs"> <li><a href="https://www.python.org/" title="The Python Programming Language">Python</a> &raquo; </li> <li><a href="../pep-0000/">PEP Index</a> &raquo; </li> <li>PEP 721</li> </ul> <button id="colour-scheme-cycler" onClick="setColourScheme(nextColourScheme())"> <svg aria-hidden="true" class="colour-scheme-icon-when-auto"><use href="#svg-sun-half"></use></svg> <svg aria-hidden="true" class="colour-scheme-icon-when-dark"><use href="#svg-moon"></use></svg> <svg aria-hidden="true" class="colour-scheme-icon-when-light"><use href="#svg-sun"></use></svg> <span class="visually-hidden">Toggle light / dark / auto colour theme</span> </button> </header> <article> <section id="pep-content"> <h1 class="page-title">PEP 721 – Using tarfile.data_filter for source distribution extraction</h1> <dl class="rfc2822 field-list simple"> <dt class="field-odd">Author<span class="colon">:</span></dt> <dd class="field-odd">Petr Viktorin &lt;encukou&#32;&#97;t&#32;gmail.com&gt;</dd> <dt class="field-even">PEP-Delegate<span class="colon">:</span></dt> <dd class="field-even">Paul Moore &lt;p.f.moore&#32;&#97;t&#32;gmail.com&gt;</dd> <dt class="field-odd">Status<span class="colon">:</span></dt> <dd class="field-odd"><abbr title="Accepted and implementation complete, or no longer active">Final</abbr></dd> <dt class="field-even">Type<span class="colon">:</span></dt> <dd class="field-even"><abbr title="Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem">Standards Track</abbr></dd> <dt class="field-odd">Topic<span class="colon">:</span></dt> <dd class="field-odd"><a class="reference external" href="../topic/packaging/">Packaging</a></dd> <dt class="field-even">Requires<span class="colon">:</span></dt> <dd class="field-even"><a class="reference external" href="../pep-0706/">706</a></dd> <dt class="field-odd">Created<span class="colon">:</span></dt> <dd class="field-odd">12-Jul-2023</dd> <dt 
class="field-even">Python-Version<span class="colon">:</span></dt> <dd class="field-even">3.12</dd> <dt class="field-odd">Post-History<span class="colon">:</span></dt> <dd class="field-odd"><a class="reference external" href="https://discuss.python.org/t/28928" title="Discourse thread">04-Jul-2023</a></dd> <dt class="field-even">Resolution<span class="colon">:</span></dt> <dd class="field-even"><a class="reference external" href="https://discuss.python.org/t/28928/13">02-Aug-2023</a></dd> </dl> <hr class="docutils" /> <section id="contents"> <details><summary>Table of Contents</summary><ul class="simple"> <li><a class="reference internal" href="#abstract">Abstract</a></li> <li><a class="reference internal" href="#motivation">Motivation</a></li> <li><a class="reference internal" href="#rationale">Rationale</a><ul> <li><a class="reference internal" href="#unpatched-versions-of-python">Unpatched versions of Python</a></li> <li><a class="reference internal" href="#permissions">Permissions</a></li> </ul> </li> <li><a class="reference internal" href="#specification">Specification</a><ul> <li><a class="reference internal" href="#unpacking-with-the-data-filter">Unpacking with the data filter</a></li> <li><a class="reference internal" href="#unpacking-without-the-data-filter">Unpacking without the data filter</a></li> <li><a class="reference internal" href="#further-hints">Further hints</a></li> </ul> </li> <li><a class="reference internal" href="#backwards-compatibility">Backwards Compatibility</a></li> <li><a class="reference internal" href="#security-implications">Security Implications</a></li> <li><a class="reference internal" href="#how-to-teach-this">How to Teach This</a></li> <li><a class="reference internal" href="#reference-implementation">Reference Implementation</a></li> <li><a class="reference internal" href="#rejected-ideas">Rejected Ideas</a></li> <li><a class="reference internal" href="#open-issues">Open Issues</a></li> <li><a class="reference internal" 
href="#copyright">Copyright</a></li> </ul> </details></section> <div class="pep-banner canonical-pypa-spec sticky-banner admonition important"> <p class="admonition-title">Important</p> <p>This PEP is a historical document. The up-to-date, canonical spec, <a class="reference external" href="https://packaging.python.org/en/latest/specifications/source-distribution-format/#sdist-archive-features" title="(in Python Packaging User Guide)"><span>Source distribution archive features</span></a>, is maintained on the <a class="reference external" href="https://packaging.python.org/en/latest/specifications/">PyPA specs page</a>.</p> <p class="close-button">×</p> <p>See the <a class="reference external" href="https://www.pypa.io/en/latest/specifications/#handling-fixes-and-other-minor-updates">PyPA specification update process</a> for how to propose changes.</p> </div> <section id="abstract"> <h2><a class="toc-backref" href="#abstract" role="doc-backlink">Abstract</a></h2> <p>Extracting a source distribution archive should normally use the <code class="docutils literal notranslate"><span class="pre">data</span></code> filter added in <a class="pep reference internal" href="../pep-0706/" title="PEP 706 – Filter for tarfile.extractall">PEP 706</a>. We clarify details, and specify the behaviour for tools that cannot use the filter directly.</p> </section> <section id="motivation"> <h2><a class="toc-backref" href="#motivation" role="doc-backlink">Motivation</a></h2> <p>The <em>source distribution</em> <code class="docutils literal notranslate"><span class="pre">sdist</span></code> is defined as a tar archive.</p> <p>The <code class="docutils literal notranslate"><span class="pre">tar</span></code> format is designed to capture all metadata of Unix-like files. Some of these are dangerous, unnecessary for source code, and/or platform-dependent. 
As explained in <a class="pep reference internal" href="../pep-0706/" title="PEP 706 – Filter for tarfile.extractall">PEP 706</a>, when extracting a tarball, one should always either limit the allowed features, or explicitly give the tarball total control.</p> </section> <section id="rationale"> <h2><a class="toc-backref" href="#rationale" role="doc-backlink">Rationale</a></h2> <p>For source distributions, the <code class="docutils literal notranslate"><span class="pre">data</span></code> filter introduced in <a class="pep reference internal" href="../pep-0706/" title="PEP 706 – Filter for tarfile.extractall">PEP 706</a> is enough. It allows slightly more features than <code class="docutils literal notranslate"><span class="pre">git</span></code> and <code class="docutils literal notranslate"><span class="pre">zip</span></code> (both commonly used in packaging workflows).</p> <p>However, not all tools can use the <code class="docutils literal notranslate"><span class="pre">data</span></code> filter, so this PEP specifies an explicit set of expectations. The aim is that the current behaviour of <code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">download</span></code> and <code class="docutils literal notranslate"><span class="pre">setuptools.archive_util.unpack_tarfile</span></code> is valid, except cases deemed too dangerous to allow. Another consideration is ease of implementation for non-Python tools.</p> <section id="unpatched-versions-of-python"> <h3><a class="toc-backref" href="#unpatched-versions-of-python" role="doc-backlink">Unpatched versions of Python</a></h3> <p>Tools are allowed to ignore this PEP when running on Python without tarfile filters.</p> <p>The feature has been backported to all versions of Python supported by <code class="docutils literal notranslate"><span class="pre">python.org</span></code>. Vendoring it in third-party libraries is tricky, and we should not force all tools to do so. 
This shifts the responsibility to keep up with security updates from the tools to the users.</p> </section> <section id="permissions"> <h3><a class="toc-backref" href="#permissions" role="doc-backlink">Permissions</a></h3> <p>Common tools (<code class="docutils literal notranslate"><span class="pre">git</span></code>, <code class="docutils literal notranslate"><span class="pre">zip</span></code>) don’t preserve Unix permissions (mode bits). Telling users not to rely on them in <em>sdists</em>, and allowing tools to handle them relatively freely, seems fair.</p> <p>The only exception is the <em>executable</em> permission. We recommend, but do not require, that tools preserve it. Given that scripts are generally platform-specific, it seems fitting to say that keeping them executable is tool-specific behaviour.</p> <p>Note that while <code class="docutils literal notranslate"><span class="pre">git</span></code> preserves executability, <code class="docutils literal notranslate"><span class="pre">zip</span></code> (and thus <code class="docutils literal notranslate"><span class="pre">wheel</span></code>) doesn’t do it natively. 
(It is possible to encode it in “external attributes”, but Python’s <code class="docutils literal notranslate"><span class="pre">ZipFile.extract</span></code> does not honour that.)</p> </section> </section> <section id="specification"> <h2><a class="toc-backref" href="#specification" role="doc-backlink">Specification</a></h2> <p>The following will be added to <a class="reference external" href="https://packaging.python.org/en/latest/specifications/source-distribution-format/">the PyPA source distribution format spec</a> under a new heading, “<em>Source distribution archive features</em>”:</p> <p>Because extracting tar files as-is is dangerous, and the results are platform-specific, archive features of source distributions are limited.</p> <section id="unpacking-with-the-data-filter"> <h3><a class="toc-backref" href="#unpacking-with-the-data-filter" role="doc-backlink">Unpacking with the data filter</a></h3> <p>When extracting a source distribution, tools MUST either use <code class="docutils literal notranslate"><span class="pre">tarfile.data_filter</span></code> (e.g. <code class="docutils literal notranslate"><span class="pre">TarFile.extractall(...,</span> <span class="pre">filter='data')</span></code>), OR follow the <em>Unpacking without the data filter</em> section below.</p> <p>As an exception, on Python interpreters without <code class="docutils literal notranslate"><span class="pre">hasattr(tarfile,</span> <span class="pre">'data_filter')</span></code> (<a class="pep reference internal" href="../pep-0706/" title="PEP 706 – Filter for tarfile.extractall">PEP 706</a>), tools that normally use that filter (directly or indirectly) MAY warn the user and ignore this specification. The trade-off between usability (e.g. fully trusting the archive) and security (e.g. 
refusing to unpack) is left up to the tool in this case.</p> </section> <section id="unpacking-without-the-data-filter"> <h3><a class="toc-backref" href="#unpacking-without-the-data-filter" role="doc-backlink">Unpacking without the data filter</a></h3> <p>Tools that do not use the <code class="docutils literal notranslate"><span class="pre">data</span></code> filter directly (e.g. for backwards compatibility, allowing additional features, or not using Python) MUST follow this section. (At the time of this writing, the <code class="docutils literal notranslate"><span class="pre">data</span></code> filter also follows this section, but it may get out of sync in the future.)</p> <p>The following files are invalid in an <code class="docutils literal notranslate"><span class="pre">sdist</span></code> archive. Upon encountering such an entry, tools SHOULD notify the user, MUST NOT unpack the entry, and MAY abort with a failure:</p> <ul class="simple"> <li>Files that would be placed outside the destination directory.</li> <li>Links (symbolic or hard) pointing outside the destination directory.</li> <li>Device files (including pipes).</li> </ul> <p>The following are also invalid. Tools MAY treat them as above, but are NOT REQUIRED to do so:</p> <ul class="simple"> <li>Files with a <code class="docutils literal notranslate"><span class="pre">..</span></code> component in the filename or link target.</li> <li>Links pointing to a file that is not part of the archive.</li> </ul> <p>Tools MAY unpack links (symbolic or hard) as regular files, using content from the archive.</p> <p>When extracting <code class="docutils literal notranslate"><span class="pre">sdist</span></code> archives:</p> <ul class="simple"> <li>Leading slashes in file names MUST be dropped. 
(This is nowadays standard behaviour for <code class="docutils literal notranslate"><span class="pre">tar</span></code> unpacking.)</li> <li>For each <code class="docutils literal notranslate"><span class="pre">mode</span></code> (Unix permission) bit, tools MUST either:<ul> <li>use the platform’s default for a new file/directory (respectively),</li> <li>set the bit according to the archive, or</li> <li>use the bit from <code class="docutils literal notranslate"><span class="pre">rw-r--r--</span></code> (<code class="docutils literal notranslate"><span class="pre">0o644</span></code>) for non-executable files or <code class="docutils literal notranslate"><span class="pre">rwxr-xr-x</span></code> (<code class="docutils literal notranslate"><span class="pre">0o755</span></code>) for executable files and directories.</li> </ul> </li> <li>High <code class="docutils literal notranslate"><span class="pre">mode</span></code> bits (setuid, setgid, sticky) MUST be cleared.</li> <li>It is RECOMMENDED to preserve the user <em>executable</em> bit.</li> </ul> </section> <section id="further-hints"> <h3><a class="toc-backref" href="#further-hints" role="doc-backlink">Further hints</a></h3> <p>Tool authors are encouraged to consider how the <em>hints for further verification</em> in the <code class="docutils literal notranslate"><span class="pre">tarfile</span></code> documentation apply to their tool.</p> </section> </section> <section id="backwards-compatibility"> <h2><a class="toc-backref" href="#backwards-compatibility" role="doc-backlink">Backwards Compatibility</a></h2> <p>The existing behaviour is unspecified, and treated differently by different tools. This PEP makes the expectations explicit.</p> <p>There is no known case of backwards incompatibility, but some project out there probably does rely on details that aren’t guaranteed. 
This PEP bans the most dangerous of those features, and the rest is made tool-specific.</p> </section> <section id="security-implications"> <h2><a class="toc-backref" href="#security-implications" role="doc-backlink">Security Implications</a></h2> <p>The recommended <code class="docutils literal notranslate"><span class="pre">data</span></code> filter is believed safe against common exploits, and is a single place to amend if flaws are found in the future.</p> <p>The explicit specification includes protections from the <code class="docutils literal notranslate"><span class="pre">data</span></code> filter.</p> </section> <section id="how-to-teach-this"> <h2><a class="toc-backref" href="#how-to-teach-this" role="doc-backlink">How to Teach This</a></h2> <p>The PEP is aimed at authors of packaging tools, who should be fine with a PEP and an updated packaging spec.</p> </section> <section id="reference-implementation"> <h2><a class="toc-backref" href="#reference-implementation" role="doc-backlink">Reference Implementation</a></h2> <p>TBD</p> </section> <section id="rejected-ideas"> <h2><a class="toc-backref" href="#rejected-ideas" role="doc-backlink">Rejected Ideas</a></h2> <p>None yet.</p> </section> <section id="open-issues"> <h2><a class="toc-backref" href="#open-issues" role="doc-backlink">Open Issues</a></h2> <p>None yet.</p> </section> <section id="copyright"> <h2><a class="toc-backref" href="#copyright" role="doc-backlink">Copyright</a></h2> <p>This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.</p> </section> </section> <hr class="docutils" /> <p>Source: <a class="reference external" href="https://github.com/python/peps/blob/main/peps/pep-0721.rst">https://github.com/python/peps/blob/main/peps/pep-0721.rst</a></p> <p>Last modified: <a class="reference external" href="https://github.com/python/peps/commits/main/peps/pep-0721.rst">2025-02-01 08:55:40 GMT</a></p> </article> <nav id="pep-sidebar"> 
<h2>Contents</h2> <ul> <li><a class="reference internal" href="#abstract">Abstract</a></li> <li><a class="reference internal" href="#motivation">Motivation</a></li> <li><a class="reference internal" href="#rationale">Rationale</a><ul> <li><a class="reference internal" href="#unpatched-versions-of-python">Unpatched versions of Python</a></li> <li><a class="reference internal" href="#permissions">Permissions</a></li> </ul> </li> <li><a class="reference internal" href="#specification">Specification</a><ul> <li><a class="reference internal" href="#unpacking-with-the-data-filter">Unpacking with the data filter</a></li> <li><a class="reference internal" href="#unpacking-without-the-data-filter">Unpacking without the data filter</a></li> <li><a class="reference internal" href="#further-hints">Further hints</a></li> </ul> </li> <li><a class="reference internal" href="#backwards-compatibility">Backwards Compatibility</a></li> <li><a class="reference internal" href="#security-implications">Security Implications</a></li> <li><a class="reference internal" href="#how-to-teach-this">How to Teach This</a></li> <li><a class="reference internal" href="#reference-implementation">Reference Implementation</a></li> <li><a class="reference internal" href="#rejected-ideas">Rejected Ideas</a></li> <li><a class="reference internal" href="#open-issues">Open Issues</a></li> <li><a class="reference internal" href="#copyright">Copyright</a></li> </ul> <br> <a id="source" href="https://github.com/python/peps/blob/main/peps/pep-0721.rst">Page Source (GitHub)</a> </nav> </section> <script src="../_static/colour_scheme.js"></script> <script src="../_static/wrap_tables.js"></script> <script src="../_static/sticky_banner.js"></script> </body> </html>
