<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="color-scheme" content="light dark"> <title>PEP 471 – os.scandir() function – a better and faster directory iterator | peps.python.org</title> <link rel="shortcut icon" href="../_static/py.png"> <link rel="canonical" href="https://peps.python.org/pep-0471/"> <link rel="stylesheet" href="../_static/style.css" type="text/css"> <link rel="stylesheet" href="../_static/mq.css" type="text/css"> <link rel="stylesheet" href="../_static/pygments.css" type="text/css" media="(prefers-color-scheme: light)" id="pyg-light"> <link rel="stylesheet" href="../_static/pygments_dark.css" type="text/css" media="(prefers-color-scheme: dark)" id="pyg-dark"> <link rel="alternate" type="application/rss+xml" title="Latest PEPs" href="https://peps.python.org/peps.rss"> <meta property="og:title" content='PEP 471 – os.scandir() function – a better and faster directory iterator | peps.python.org'> <meta property="og:description" content="This PEP proposes including a new directory iteration function, os.scandir(), in the standard library. This new function adds useful functionality and increases the speed of os.walk() by 2-20 times (depending on the platform and file system) by avoiding..."> <meta property="og:type" content="website"> <meta property="og:url" content="https://peps.python.org/pep-0471/"> <meta property="og:site_name" content="Python Enhancement Proposals (PEPs)"> <meta property="og:image" content="https://peps.python.org/_static/og-image.png"> <meta property="og:image:alt" content="Python PEPs"> <meta property="og:image:width" content="200"> <meta property="og:image:height" content="200"> <meta name="description" content="This PEP proposes including a new directory iteration function, os.scandir(), in the standard library. 
This new function adds useful functionality and increases the speed of os.walk() by 2-20 times (depending on the platform and file system) by avoiding..."> <meta name="theme-color" content="#3776ab"> </head> <body> <svg xmlns="http://www.w3.org/2000/svg" style="display: none;"> <symbol id="svg-sun-half" viewBox="0 0 24 24" pointer-events="all"> <title>Following system colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <circle cx="12" cy="12" r="9"></circle> <path d="M12 3v18m0-12l4.65-4.65M12 14.3l7.37-7.37M12 19.6l8.85-8.85"></path> </svg> </symbol> <symbol id="svg-moon" viewBox="0 0 24 24" pointer-events="all"> <title>Selected dark colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <path stroke="none" d="M0 0h24v24H0z" fill="none"></path> <path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z"></path> </svg> </symbol> <symbol id="svg-sun" viewBox="0 0 24 24" pointer-events="all"> <title>Selected light colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <circle cx="12" cy="12" r="5"></circle> <line x1="12" y1="1" x2="12" y2="3"></line> <line x1="12" y1="21" x2="12" y2="23"></line> <line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line> <line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line> <line x1="1" y1="12" x2="3" y2="12"></line> <line x1="21" y1="12" x2="23" y2="12"></line> <line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line> <line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line> </svg> </symbol> </svg> <script> document.documentElement.dataset.colour_scheme = localStorage.getItem("colour_scheme") || "auto" </script> <section 
id="pep-page-section"> <header> <h1>Python Enhancement Proposals</h1> <ul class="breadcrumbs"> <li><a href="https://www.python.org/" title="The Python Programming Language">Python</a> &raquo; </li> <li><a href="../pep-0000/">PEP Index</a> &raquo; </li> <li>PEP 471</li> </ul> <button id="colour-scheme-cycler" onClick="setColourScheme(nextColourScheme())"> <svg aria-hidden="true" class="colour-scheme-icon-when-auto"><use href="#svg-sun-half"></use></svg> <svg aria-hidden="true" class="colour-scheme-icon-when-dark"><use href="#svg-moon"></use></svg> <svg aria-hidden="true" class="colour-scheme-icon-when-light"><use href="#svg-sun"></use></svg> <span class="visually-hidden">Toggle light / dark / auto colour theme</span> </button> </header> <article> <section id="pep-content"> <h1 class="page-title">PEP 471 – os.scandir() function – a better and faster directory iterator</h1> <dl class="rfc2822 field-list simple"> <dt class="field-odd">Author<span class="colon">:</span></dt> <dd class="field-odd">Ben Hoyt &lt;benhoyt&#32;&#97;t&#32;gmail.com&gt;</dd> <dt class="field-even">BDFL-Delegate<span class="colon">:</span></dt> <dd class="field-even">Victor Stinner &lt;vstinner&#32;&#97;t&#32;python.org&gt;</dd> <dt class="field-odd">Status<span class="colon">:</span></dt> <dd class="field-odd"><abbr title="Accepted and implementation complete, or no longer active">Final</abbr></dd> <dt class="field-even">Type<span class="colon">:</span></dt> <dd class="field-even"><abbr title="Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem">Standards Track</abbr></dd> <dt class="field-odd">Created<span class="colon">:</span></dt> <dd class="field-odd">30-May-2014</dd> <dt class="field-even">Python-Version<span class="colon">:</span></dt> <dd class="field-even">3.5</dd> <dt class="field-odd">Post-History<span class="colon">:</span></dt> <dd class="field-odd">27-Jun-2014, 08-Jul-2014, 14-Jul-2014</dd> </dl> <hr 
class="docutils" /> <section id="contents"> <details><summary>Table of Contents</summary><ul class="simple"> <li><a class="reference internal" href="#abstract">Abstract</a></li> <li><a class="reference internal" href="#rationale">Rationale</a></li> <li><a class="reference internal" href="#implementation">Implementation</a></li> <li><a class="reference internal" href="#specifics-of-proposal">Specifics of proposal</a><ul> <li><a class="reference internal" href="#os-scandir">os.scandir()</a></li> <li><a class="reference internal" href="#os-walk">os.walk()</a></li> </ul> </li> <li><a class="reference internal" href="#examples">Examples</a><ul> <li><a class="reference internal" href="#notes-on-caching">Notes on caching</a></li> <li><a class="reference internal" href="#notes-on-exception-handling">Notes on exception handling</a></li> </ul> </li> <li><a class="reference internal" href="#support">Support</a></li> <li><a class="reference internal" href="#use-in-the-wild">Use in the wild</a></li> <li><a class="reference internal" href="#rejected-ideas">Rejected ideas</a><ul> <li><a class="reference internal" href="#naming">Naming</a></li> <li><a class="reference internal" href="#wildcard-support">Wildcard support</a></li> <li><a class="reference internal" href="#methods-not-following-symlinks-by-default">Methods not following symlinks by default</a></li> <li><a class="reference internal" href="#direntry-attributes-being-properties">DirEntry attributes being properties</a></li> <li><a class="reference internal" href="#direntry-fields-being-static-attribute-only-objects">DirEntry fields being “static” attribute-only objects</a></li> <li><a class="reference internal" href="#direntry-fields-being-static-with-an-ensure-lstat-option">DirEntry fields being static with an ensure_lstat option</a></li> <li><a class="reference internal" href="#return-values-being-name-stat-result-two-tuples">Return values being (name, stat_result) two-tuples</a></li> <li><a class="reference internal" 
href="#return-values-being-overloaded-stat-result-objects">Return values being overloaded stat_result objects</a></li> <li><a class="reference internal" href="#return-values-being-pathlib-path-objects">Return values being pathlib.Path objects</a></li> </ul> </li> <li><a class="reference internal" href="#possible-improvements">Possible improvements</a></li> <li><a class="reference internal" href="#previous-discussion">Previous discussion</a></li> <li><a class="reference internal" href="#copyright">Copyright</a></li> </ul> </details></section> <section id="abstract"> <h2><a class="toc-backref" href="#abstract" role="doc-backlink">Abstract</a></h2> <p>This PEP proposes including a new directory iteration function, <code class="docutils literal notranslate"><span class="pre">os.scandir()</span></code>, in the standard library. This new function adds useful functionality and increases the speed of <code class="docutils literal notranslate"><span class="pre">os.walk()</span></code> by 2-20 times (depending on the platform and file system) by avoiding calls to <code class="docutils literal notranslate"><span class="pre">os.stat()</span></code> in most cases.</p> </section> <section id="rationale"> <h2><a class="toc-backref" href="#rationale" role="doc-backlink">Rationale</a></h2> <p>Python’s built-in <code class="docutils literal notranslate"><span class="pre">os.walk()</span></code> is significantly slower than it needs to be, because – in addition to calling <code class="docutils literal notranslate"><span class="pre">os.listdir()</span></code> on each directory – it executes the <code class="docutils literal notranslate"><span class="pre">stat()</span></code> system call or <code class="docutils literal notranslate"><span class="pre">GetFileAttributes()</span></code> on each file to determine whether the entry is a directory or not.</p> <p>But the underlying system calls – <code class="docutils literal notranslate"><span class="pre">FindFirstFile</span></code> / <code 
class="docutils literal notranslate"><span class="pre">FindNextFile</span></code> on Windows and <code class="docutils literal notranslate"><span class="pre">readdir</span></code> on POSIX systems – already tell you whether the files returned are directories or not, so no further system calls are needed. Further, the Windows system calls return all the information for a <code class="docutils literal notranslate"><span class="pre">stat_result</span></code> object on the directory entry, such as file size and last modification time.</p> <p>In short, you can reduce the number of system calls required for a tree function like <code class="docutils literal notranslate"><span class="pre">os.walk()</span></code> from approximately 2N to N, where N is the total number of files and directories in the tree. (And because directory trees are usually wider than they are deep, it’s often much better than this.)</p> <p>In practice, removing all those extra system calls makes <code class="docutils literal notranslate"><span class="pre">os.walk()</span></code> about <strong>8-9 times as fast on Windows</strong>, and about <strong>2-3 times as fast on POSIX systems</strong>. So we’re not talking about micro-optimizations. See more <a class="reference external" href="https://github.com/benhoyt/scandir#benchmarks">benchmarks here</a>.</p> <p>Somewhat relatedly, many people (see Python <a class="reference external" href="http://bugs.python.org/issue11406">Issue 11406</a>) are also keen on a version of <code class="docutils literal notranslate"><span class="pre">os.listdir()</span></code> that yields filenames as it iterates instead of returning them as one big list. 
This improves memory efficiency for iterating very large directories.</p> <p>So, as well as providing a <code class="docutils literal notranslate"><span class="pre">scandir()</span></code> iterator function for calling directly, Python’s existing <code class="docutils literal notranslate"><span class="pre">os.walk()</span></code> function can be sped up a huge amount.</p> </section> <section id="implementation"> <h2><a class="toc-backref" href="#implementation" role="doc-backlink">Implementation</a></h2> <p>The implementation of this proposal was written by Ben Hoyt (initial version) and Tim Golden (who helped a lot with the C extension module). It lives on GitHub at <a class="reference external" href="https://github.com/benhoyt/scandir">benhoyt/scandir</a>. (The implementation may lag behind the updates to this PEP a little.)</p> <p>Note that this module has been used and tested (see “Use in the wild” section in this PEP), so it’s more than a proof-of-concept. However, it is marked as beta software and is not extensively battle-tested. 
It will need some cleanup and more thorough testing before going into the standard library, as well as integration into <code class="docutils literal notranslate"><span class="pre">posixmodule.c</span></code>.</p> </section> <section id="specifics-of-proposal"> <h2><a class="toc-backref" href="#specifics-of-proposal" role="doc-backlink">Specifics of proposal</a></h2> <section id="os-scandir"> <h3><a class="toc-backref" href="#os-scandir" role="doc-backlink">os.scandir()</a></h3> <p>Specifically, this PEP proposes adding a single function to the <code class="docutils literal notranslate"><span class="pre">os</span></code> module in the standard library, <code class="docutils literal notranslate"><span class="pre">scandir</span></code>, that takes a single, optional string as its argument:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">scandir</span><span class="p">(</span><span class="n">path</span><span class="o">=</span><span class="s1">&#39;.&#39;</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">generator</span> <span class="n">of</span> <span class="n">DirEntry</span> <span class="n">objects</span> </pre></div> </div> <p>Like <code class="docutils literal notranslate"><span class="pre">listdir</span></code>, <code class="docutils literal notranslate"><span class="pre">scandir</span></code> calls the operating system’s directory iteration system calls to get the names of the files in the given <code class="docutils literal notranslate"><span class="pre">path</span></code>, but it’s different from <code class="docutils literal notranslate"><span class="pre">listdir</span></code> in two ways:</p> <ul class="simple"> <li>Instead of returning bare filename strings, it returns lightweight <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> objects that hold the filename string and provide simple methods that allow access to the additional data the 
operating system may have returned.</li> <li>It returns a generator instead of a list, so that <code class="docutils literal notranslate"><span class="pre">scandir</span></code> acts as a true iterator instead of returning the full list immediately.</li> </ul> <p><code class="docutils literal notranslate"><span class="pre">scandir()</span></code> yields a <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> object for each file and sub-directory in <code class="docutils literal notranslate"><span class="pre">path</span></code>. Just like <code class="docutils literal notranslate"><span class="pre">listdir</span></code>, the <code class="docutils literal notranslate"><span class="pre">'.'</span></code> and <code class="docutils literal notranslate"><span class="pre">'..'</span></code> pseudo-directories are skipped, and the entries are yielded in system-dependent order. Each <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> object has the following attributes and methods:</p> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">name</span></code>: the entry’s filename, relative to the scandir <code class="docutils literal notranslate"><span class="pre">path</span></code> argument (corresponds to the return values of <code class="docutils literal notranslate"><span class="pre">os.listdir</span></code>)</li> <li><code class="docutils literal notranslate"><span class="pre">path</span></code>: the entry’s full path name (not necessarily an absolute path) – the equivalent of <code class="docutils literal notranslate"><span class="pre">os.path.join(scandir_path,</span> <span class="pre">entry.name)</span></code></li> <li><code class="docutils literal notranslate"><span class="pre">inode()</span></code>: return the inode number of the entry. 
The result is cached on the <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> object; use <code class="docutils literal notranslate"><span class="pre">os.stat(entry.path,</span> <span class="pre">follow_symlinks=False).st_ino</span></code> to fetch up-to-date information. On Unix, no system call is required.</li> <li><code class="docutils literal notranslate"><span class="pre">is_dir(*,</span> <span class="pre">follow_symlinks=True)</span></code>: similar to <code class="docutils literal notranslate"><span class="pre">pathlib.Path.is_dir()</span></code>, but the return value is cached on the <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> object; doesn’t require a system call in most cases; don’t follow symbolic links if <code class="docutils literal notranslate"><span class="pre">follow_symlinks</span></code> is False</li> <li><code class="docutils literal notranslate"><span class="pre">is_file(*,</span> <span class="pre">follow_symlinks=True)</span></code>: similar to <code class="docutils literal notranslate"><span class="pre">pathlib.Path.is_file()</span></code>, but the return value is cached on the <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> object; doesn’t require a system call in most cases; don’t follow symbolic links if <code class="docutils literal notranslate"><span class="pre">follow_symlinks</span></code> is False</li> <li><code class="docutils literal notranslate"><span class="pre">is_symlink()</span></code>: similar to <code class="docutils literal notranslate"><span class="pre">pathlib.Path.is_symlink()</span></code>, but the return value is cached on the <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> object; doesn’t require a system call in most cases</li> <li><code class="docutils literal notranslate"><span class="pre">stat(*,</span> <span class="pre">follow_symlinks=True)</span></code>: like <code
class="docutils literal notranslate"><span class="pre">os.stat()</span></code>, but the return value is cached on the <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> object; does not require a system call on Windows (except for symlinks); don’t follow symbolic links (like <code class="docutils literal notranslate"><span class="pre">os.lstat()</span></code>) if <code class="docutils literal notranslate"><span class="pre">follow_symlinks</span></code> is False</li> </ul> <p>All <em>methods</em> may perform system calls in some cases and therefore possibly raise <code class="docutils literal notranslate"><span class="pre">OSError</span></code> – see the “Notes on exception handling” section for more details.</p> <p>The <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> attribute and method names were chosen to be the same as those in the new <code class="docutils literal notranslate"><span class="pre">pathlib</span></code> module where possible, for consistency. The only difference in functionality is that the <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> methods cache their values on the entry object after the first call.</p> <p>Like the other functions in the <code class="docutils literal notranslate"><span class="pre">os</span></code> module, <code class="docutils literal notranslate"><span class="pre">scandir()</span></code> accepts either a bytes or str object for the <code class="docutils literal notranslate"><span class="pre">path</span></code> parameter, and returns the <code class="docutils literal notranslate"><span class="pre">DirEntry.name</span></code> and <code class="docutils literal notranslate"><span class="pre">DirEntry.path</span></code> attributes with the same type as <code class="docutils literal notranslate"><span class="pre">path</span></code>. 
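</p> <p>A quick way to see this type rule in action is to scan the same directory once with a str path and once with the equivalent bytes path. The following is a minimal sketch, assuming a POSIX system and Python 3.5 or later; the helper <code>entry_types()</code> is hypothetical and exists only for this illustration.</p>

```python
import os
import tempfile


def entry_types(path):
    """Hypothetical helper: collect the types of DirEntry.name and DirEntry.path."""
    types = set()
    for entry in os.scandir(path):
        types.add(type(entry.name))
        types.add(type(entry.path))
    return types


with tempfile.TemporaryDirectory() as root:
    # Create one file so that scandir() has an entry to yield.
    open(os.path.join(root, 'example.txt'), 'w').close()
    # A str path yields str names and paths...
    print(entry_types(root) == {str})                 # True
    # ...and the same directory given as bytes yields bytes.
    print(entry_types(os.fsencode(root)) == {bytes})  # True
```

<p>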
However, it is <em>strongly recommended</em> to use the str type, as this ensures cross-platform support for Unicode filenames. (On Windows, bytes filenames have been deprecated since Python 3.3).</p> </section> <section id="os-walk"> <h3><a class="toc-backref" href="#os-walk" role="doc-backlink">os.walk()</a></h3> <p>As part of this proposal, <code class="docutils literal notranslate"><span class="pre">os.walk()</span></code> will also be modified to use <code class="docutils literal notranslate"><span class="pre">scandir()</span></code> rather than <code class="docutils literal notranslate"><span class="pre">listdir()</span></code> and <code class="docutils literal notranslate"><span class="pre">os.path.isdir()</span></code>. This will increase the speed of <code class="docutils literal notranslate"><span class="pre">os.walk()</span></code> very significantly (as mentioned above, by 2-20 times, depending on the system).</p> </section> </section> <section id="examples"> <h2><a class="toc-backref" href="#examples" role="doc-backlink">Examples</a></h2> <p>First, a very simple example of <code class="docutils literal notranslate"><span class="pre">scandir()</span></code> showing use of the <code class="docutils literal notranslate"><span class="pre">DirEntry.name</span></code> attribute and the <code class="docutils literal notranslate"><span class="pre">DirEntry.is_dir()</span></code> method:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">os</span>

<span class="k">def</span> <span class="nf">subdirs</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
<span class="w">    </span><span class="sd">&quot;&quot;&quot;Yield directory names not starting with &#39;.&#39; under given path.&quot;&quot;&quot;</span>
    <span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">os</span><span class="o">.</span><span class="n">scandir</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">entry</span><span class="o">.</span><span class="n">name</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">&#39;.&#39;</span><span class="p">)</span> <span class="ow">and</span> <span class="n">entry</span><span class="o">.</span><span class="n">is_dir</span><span class="p">():</span>
            <span class="k">yield</span> <span class="n">entry</span><span class="o">.</span><span class="n">name</span>
</pre></div> </div> <p>This <code class="docutils literal notranslate"><span class="pre">subdirs()</span></code> function will be significantly faster with <code class="docutils literal notranslate"><span class="pre">scandir()</span></code> than with <code class="docutils literal notranslate"><span class="pre">os.listdir()</span></code> and <code class="docutils literal notranslate"><span class="pre">os.path.isdir()</span></code> on both Windows and POSIX systems, especially on medium-sized or large directories.</p> <p>Or, for getting the total size of files in a directory tree, showing use of the <code class="docutils literal notranslate"><span class="pre">DirEntry.stat()</span></code> method and <code class="docutils literal notranslate"><span class="pre">DirEntry.path</span></code> attribute:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">os</span>

<span class="k">def</span> <span class="nf">get_tree_size</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
<span class="w">    </span><span class="sd">&quot;&quot;&quot;Return total size of files in given path and subdirs.&quot;&quot;&quot;</span>
    <span class="n">total</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">os</span><span class="o">.</span><span class="n">scandir</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">entry</span><span class="o">.</span><span class="n">is_dir</span><span class="p">(</span><span class="n">follow_symlinks</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
            <span class="n">total</span> <span class="o">+=</span> <span class="n">get_tree_size</span><span class="p">(</span><span class="n">entry</span><span class="o">.</span><span class="n">path</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">total</span> <span class="o">+=</span> <span class="n">entry</span><span class="o">.</span><span class="n">stat</span><span class="p">(</span><span class="n">follow_symlinks</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span><span class="o">.</span><span class="n">st_size</span>
    <span class="k">return</span> <span class="n">total</span>
</pre></div> </div> <p>This also shows the use of the <code class="docutils literal notranslate"><span class="pre">follow_symlinks</span></code> parameter to <code class="docutils literal notranslate"><span class="pre">is_dir()</span></code> – in a recursive function like this, we probably don’t want to follow links.
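</p> <p>The effect of <code>follow_symlinks</code> can be observed with a directory that contains a symlink pointing back at its own parent. The following is a minimal sketch, assuming a POSIX system where unprivileged users may call <code>os.symlink()</code>; the helper <code>describe_entries()</code> is hypothetical and exists only for this illustration.</p>

```python
import os
import tempfile


def describe_entries(path):
    """Hypothetical helper: map each name to
    (is_dir(), is_dir(follow_symlinks=False), is_symlink())."""
    return {
        entry.name: (
            entry.is_dir(),
            entry.is_dir(follow_symlinks=False),
            entry.is_symlink(),
        )
        for entry in os.scandir(path)
    }


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, 'sub'))
    # 'loop' points at its own parent directory, so a recursive function
    # that followed it would keep descending into the same tree.
    os.symlink(root, os.path.join(root, 'loop'))
    for name, flags in sorted(describe_entries(root).items()):
        print(name, flags)
# prints:
# loop (True, False, True)
# sub (True, True, False)
```

<p>Because <code>get_tree_size()</code> passes <code>follow_symlinks=False</code>, it treats <code>loop</code> as a non-directory and adds the size reported by <code>stat(follow_symlinks=False)</code>, which is the size of the link itself rather than of its target.</p> <p>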
(To properly follow links in a recursive function like this, we’d want special handling for the case where following a symlink leads to a recursive loop.)</p> <p>Note that <code class="docutils literal notranslate"><span class="pre">get_tree_size()</span></code> will get a huge speed boost on Windows, because no extra stat calls are needed, but on POSIX systems the size information is not returned by the directory iteration functions, so this function won’t gain anything there.</p> <section id="notes-on-caching"> <h3><a class="toc-backref" href="#notes-on-caching" role="doc-backlink">Notes on caching</a></h3> <p>The <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> objects are relatively dumb – the <code class="docutils literal notranslate"><span class="pre">name</span></code> and <code class="docutils literal notranslate"><span class="pre">path</span></code> attributes are obviously always cached, and the <code class="docutils literal notranslate"><span class="pre">is_X</span></code> and <code class="docutils literal notranslate"><span class="pre">stat</span></code> methods cache their values (immediately on Windows via <code class="docutils literal notranslate"><span class="pre">FindNextFile</span></code>, and on first use on POSIX systems via a <code class="docutils literal notranslate"><span class="pre">stat</span></code> system call) and never refetch from the system.</p> <p>For this reason, <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> objects are intended to be used and thrown away after iteration, not stored in long-lived data structures with their methods called again and again.</p> <p>If developers want “refresh” behaviour (for example, for watching a file’s size change), they can simply use <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code> objects, or call the regular <code class="docutils literal notranslate"><span class="pre">os.stat()</span></code>
or <code class="docutils literal notranslate"><span class="pre">os.path.getsize()</span></code> functions which get fresh data from the operating system every call.</p> </section> <section id="notes-on-exception-handling"> <h3><a class="toc-backref" href="#notes-on-exception-handling" role="doc-backlink">Notes on exception handling</a></h3> <p><code class="docutils literal notranslate"><span class="pre">DirEntry.is_X()</span></code> and <code class="docutils literal notranslate"><span class="pre">DirEntry.stat()</span></code> are explicitly methods rather than attributes or properties, to make it clear that they may not be cheap operations (although they often are), and they may do a system call. As a result, these methods may raise <code class="docutils literal notranslate"><span class="pre">OSError</span></code>.</p> <p>For example, <code class="docutils literal notranslate"><span class="pre">DirEntry.stat()</span></code> will always make a system call on POSIX-based systems, and the <code class="docutils literal notranslate"><span class="pre">DirEntry.is_X()</span></code> methods will make a <code class="docutils literal notranslate"><span class="pre">stat()</span></code> system call on such systems if <code class="docutils literal notranslate"><span class="pre">readdir()</span></code> does not support <code class="docutils literal notranslate"><span class="pre">d_type</span></code> or returns a <code class="docutils literal notranslate"><span class="pre">d_type</span></code> with a value of <code class="docutils literal notranslate"><span class="pre">DT_UNKNOWN</span></code>, which can occur under certain conditions or on certain file systems.</p> <p>Often this does not matter – for example, <code class="docutils literal notranslate"><span class="pre">os.walk()</span></code> as defined in the standard library only catches errors around the <code class="docutils literal notranslate"><span class="pre">listdir()</span></code> calls.</p> <p>Also, because the 
exception-raising behaviour of the <code class="docutils literal notranslate"><span class="pre">DirEntry.is_X</span></code> methods matches that of <code class="docutils literal notranslate"><span class="pre">pathlib</span></code> – which only raises <code class="docutils literal notranslate"><span class="pre">OSError</span></code> in the case of permissions or other fatal errors, but returns False if the path doesn’t exist or is a broken symlink – it’s often not necessary to catch errors around the <code class="docutils literal notranslate"><span class="pre">is_X()</span></code> calls.</p> <p>However, when a user requires fine-grained error handling, it may be desirable to catch <code class="docutils literal notranslate"><span class="pre">OSError</span></code> around all method calls and handle each error as appropriate.</p> <p>For example, below is a version of the <code class="docutils literal notranslate"><span class="pre">get_tree_size()</span></code> example shown above, but with fine-grained error handling added:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">sys</span>

<span class="k">def</span> <span class="nf">get_tree_size</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
<span class="w">    </span><span class="sd">&quot;&quot;&quot;Return total size of files in path and subdirs. If</span>
<span class="sd">    is_dir() or stat() fails, print an error message to stderr</span>
<span class="sd">    and assume zero size (for example, file has been deleted).</span>
<span class="sd">    &quot;&quot;&quot;</span>
    <span class="n">total</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">os</span><span class="o">.</span><span class="n">scandir</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">is_dir</span> <span class="o">=</span> <span class="n">entry</span><span class="o">.</span><span class="n">is_dir</span><span class="p">(</span><span class="n">follow_symlinks</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
        <span class="k">except</span> <span class="ne">OSError</span> <span class="k">as</span> <span class="n">error</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="s1">&#39;Error calling is_dir():&#39;</span><span class="p">,</span> <span class="n">error</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
            <span class="k">continue</span>
        <span class="k">if</span> <span class="n">is_dir</span><span class="p">:</span>
            <span class="n">total</span> <span class="o">+=</span> <span class="n">get_tree_size</span><span class="p">(</span><span class="n">entry</span><span class="o">.</span><span class="n">path</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="n">total</span> <span class="o">+=</span> <span class="n">entry</span><span class="o">.</span><span class="n">stat</span><span class="p">(</span><span class="n">follow_symlinks</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span><span class="o">.</span><span class="n">st_size</span>
            <span class="k">except</span> <span class="ne">OSError</span> <span class="k">as</span> <span class="n">error</span><span class="p">:</span>
                <span class="nb">print</span><span class="p">(</span><span class="s1">&#39;Error calling stat():&#39;</span><span class="p">,</span> <span class="n">error</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">total</span>
</pre></div> </div> </section> </section> <section id="support"> <h2><a class="toc-backref" href="#support" role="doc-backlink">Support</a></h2> <p>The scandir module on GitHub has been forked and used quite a bit (see “Use in the wild” in this PEP), but there’s also been a fair bit of direct support for a scandir-like function from core developers and others on the python-dev and python-ideas mailing lists.
A sampling:</p> <ul class="simple"> <li><strong>python-dev</strong>: a good number of +1’s and very few negatives for scandir and <a class="pep reference internal" href="../pep-0471/" title="PEP 471 – os.scandir() function – a better and faster directory iterator">PEP 471</a> on <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2014-June/135217.html">this June 2014 python-dev thread</a></li> <li><strong>Alyssa Coghlan</strong>, a core Python developer: “I’ve had the local Red Hat release engineering team express their displeasure at having to stat every file in a network mounted directory tree for info that is present in the dirent structure, so a definite +1 to os.scandir from me, so long as it makes that info available.” [<a class="reference external" href="http://bugs.python.org/issue11406">source1</a>]</li> <li><strong>Tim Golden</strong>, a core Python developer, supports scandir enough to have spent time refactoring and significantly improving scandir’s C extension module. [<a class="reference external" href="https://github.com/tjguk/scandir">source2</a>]</li> <li><strong>Christian Heimes</strong>, a core Python developer: “+1 for something like yielddir()” [<a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2012-November/017772.html">source3</a>] and “Indeed! I’d like to see the feature in 3.4 so I can remove my own hack from our code base.” [<a class="reference external" href="http://bugs.python.org/issue11406">source4</a>]</li> <li><strong>Gregory P. Smith</strong>, a core Python developer: “As 3.4beta1 happens tonight, this isn’t going to make 3.4 so i’m bumping this to 3.5. 
I really like the proposed design outlined above.” [<a class="reference external" href="http://bugs.python.org/issue11406">source5</a>]</li> <li><strong>Guido van Rossum</strong> on the possibility of adding scandir to Python 3.5 (as it was too late for 3.4): “The ship has likewise sailed for adding scandir() (whether to os or pathlib). By all means experiment and get it ready for consideration for 3.5, but I don’t want to add it to 3.4.” [<a class="reference external" href="https://mail.python.org/pipermail/python-dev/2013-November/130583.html">source6</a>]</li> </ul> <p>Support for this PEP itself (meta-support?) was given by Alyssa (Nick) Coghlan on python-dev: “A PEP reviewing all this for 3.5 and proposing a specific os.scandir API would be a good thing.” [<a class="reference external" href="https://mail.python.org/pipermail/python-dev/2013-November/130588.html">source7</a>]</p> </section> <section id="use-in-the-wild"> <h2><a class="toc-backref" href="#use-in-the-wild" role="doc-backlink">Use in the wild</a></h2> <p>To date, the <code class="docutils literal notranslate"><span class="pre">scandir</span></code> implementation is definitely useful, but has been clearly marked “beta”, so it’s uncertain how much use of it there is in the wild. Ben Hoyt has had several reports from people using it. For example:</p> <ul class="simple"> <li>Chris F: “I am processing some pretty large directories and was half expecting to have to modify getdents. So thanks for saving me the effort.” [via personal email]</li> <li>bschollnick: “I wanted to let you know about this, since I am using Scandir as a building block for this code. Here’s a good example of scandir making a radical performance improvement over os.listdir.” [<a class="reference external" href="https://github.com/benhoyt/scandir/issues/19">source8</a>]</li> <li>Avram L: “I’m testing our scandir for a project I’m working on. 
Seems pretty solid, so first thing, just want to say nice work!” [via personal email]</li> <li>Matt Z: “I used scandir to dump the contents of a network dir in under 15 seconds. 13 root dirs, 60,000 files in the structure. This will replace some old VBA code embedded in a spreadsheet that was taking 15-20 minutes to do the exact same thing.” [via personal email]</li> </ul> <p>Others have <a class="reference external" href="https://github.com/benhoyt/scandir/issues/12">requested a PyPI package</a> for it, which has been created. See <a class="reference external" href="https://pypi.python.org/pypi/scandir">PyPI package</a>.</p> <p>GitHub stats don’t mean much, but scandir does have several watchers, issues, forks, etc. Here’s the rundown as of July 7, 2014:</p> <ul class="simple"> <li>Watchers: 17</li> <li>Stars: 57</li> <li>Forks: 20</li> <li>Issues: 4 open, 26 closed</li> </ul> <p>Also, because this PEP will increase the speed of <code class="docutils literal notranslate"><span class="pre">os.walk()</span></code> significantly, there are thousands of developers and scripts, and a lot of production code, that would benefit from it. For example, on GitHub, there are almost as many uses of <code class="docutils literal notranslate"><span class="pre">os.walk</span></code> (194,000) as there are of <code class="docutils literal notranslate"><span class="pre">os.mkdir</span></code> (230,000).</p> </section> <section id="rejected-ideas"> <h2><a class="toc-backref" href="#rejected-ideas" role="doc-backlink">Rejected ideas</a></h2> <section id="naming"> <h3><a class="toc-backref" href="#naming" role="doc-backlink">Naming</a></h3> <p>The only other real contender for this function’s name was <code class="docutils literal notranslate"><span class="pre">iterdir()</span></code>.
However, <code class="docutils literal notranslate"><span class="pre">iterX()</span></code> functions in Python (mostly found in Python 2) tend to be simple iterator equivalents of their non-iterator counterparts. For example, <code class="docutils literal notranslate"><span class="pre">dict.iterkeys()</span></code> is just an iterator version of <code class="docutils literal notranslate"><span class="pre">dict.keys()</span></code>, and the objects it yields are identical. In <code class="docutils literal notranslate"><span class="pre">scandir()</span></code>’s case, however, the return values are quite different objects (<code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> objects vs. filename strings), so this should probably be reflected by a difference in name – hence <code class="docutils literal notranslate"><span class="pre">scandir()</span></code>.</p> <p>See some <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2014-June/135228.html">relevant discussion on python-dev</a>.</p> </section> <section id="wildcard-support"> <h3><a class="toc-backref" href="#wildcard-support" role="doc-backlink">Wildcard support</a></h3> <p><code class="docutils literal notranslate"><span class="pre">FindFirstFile</span></code>/<code class="docutils literal notranslate"><span class="pre">FindNextFile</span></code> on Windows support passing a “wildcard” like <code class="docutils literal notranslate"><span class="pre">*.jpg</span></code>, so at first folks (this PEP’s author included) felt it would be a good idea to include a <code class="docutils literal notranslate"><span class="pre">windows_wildcard</span></code> keyword argument to the <code class="docutils literal notranslate"><span class="pre">scandir</span></code> function so users could pass this in.</p> <p>However, on further thought and discussion it was decided that this would be a bad idea, <em>unless it could be made cross-platform</em> (a <code
class="docutils literal notranslate"><span class="pre">pattern</span></code> keyword argument or similar). This seems easy enough at first – just use the OS wildcard support on Windows, and something like <code class="docutils literal notranslate"><span class="pre">fnmatch</span></code> or <code class="docutils literal notranslate"><span class="pre">re</span></code> afterwards on POSIX-based systems.</p> <p>Unfortunately, the exact Windows wildcard matching rules aren’t really documented anywhere by Microsoft, and they’re quite quirky (see this <a class="reference external" href="http://blogs.msdn.com/b/oldnewthing/archive/2007/12/17/6785519.aspx">blog post</a>), meaning it’s very problematic to emulate using <code class="docutils literal notranslate"><span class="pre">fnmatch</span></code> or regexes.</p> <p>So the consensus was that Windows wildcard support was a bad idea. It could be added at a later date if there’s a cross-platform way to achieve it, but not for the initial version.</p> <p>Read more in <a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2012-November/017770.html">this Nov 2012 python-ideas thread</a> and this <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2014-June/135217.html">June 2014 python-dev thread on PEP 471</a>.</p> </section> <section id="methods-not-following-symlinks-by-default"> <h3><a class="toc-backref" href="#methods-not-following-symlinks-by-default" role="doc-backlink">Methods not following symlinks by default</a></h3> <p>There was much debate on python-dev (see messages in <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2014-July/135485.html">this thread</a>) over whether the <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> methods should follow symbolic links or not (when the <code class="docutils literal notranslate"><span class="pre">is_X()</span></code> methods had no <code
class="docutils literal notranslate"><span class="pre">follow_symlinks</span></code> parameter).</p> <p>Initially they did not (see previous versions of this PEP and the scandir.py module), but Victor Stinner made a pretty compelling case on python-dev that following symlinks by default is a better idea, because:</p> <ul class="simple"> <li>following links is usually what you want (in 92% of cases in the standard library, functions using <code class="docutils literal notranslate"><span class="pre">os.listdir()</span></code> and <code class="docutils literal notranslate"><span class="pre">os.path.isdir()</span></code> do follow symlinks)</li> <li>that’s the precedent set by the similar functions <code class="docutils literal notranslate"><span class="pre">os.path.isdir()</span></code> and <code class="docutils literal notranslate"><span class="pre">pathlib.Path.is_dir()</span></code>, so to do otherwise would be confusing</li> <li>with the non-link-following approach, if you wanted to follow links you’d have to say something like <code class="docutils literal notranslate"><span class="pre">if</span> <span class="pre">(entry.is_symlink()</span> <span class="pre">and</span> <span class="pre">os.path.isdir(entry.path))</span> <span class="pre">or</span> <span class="pre">entry.is_dir()</span></code>, which is clumsy</li> </ul> <p>As a case in point that shows the non-symlink-following version is error-prone, this PEP’s author had a bug caused by getting this exact test wrong in his initial implementation of <code class="docutils literal notranslate"><span class="pre">scandir.walk()</span></code> in scandir.py (see <a class="reference external" href="https://github.com/benhoyt/scandir/issues/4">Issue #4 here</a>).</p> <p>In the end, there was not complete agreement that the methods should follow symlinks, but there was basic consensus among the most involved participants, and this PEP’s author believes that the above case is strong enough to warrant following symlinks by
default.</p> <p>In addition, it’s straightforward to call the relevant methods with <code class="docutils literal notranslate"><span class="pre">follow_symlinks=False</span></code> if the other behaviour is desired.</p> </section> <section id="direntry-attributes-being-properties"> <h3><a class="toc-backref" href="#direntry-attributes-being-properties" role="doc-backlink">DirEntry attributes being properties</a></h3> <p>In some ways it would be nicer for the <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> <code class="docutils literal notranslate"><span class="pre">is_X()</span></code> and <code class="docutils literal notranslate"><span class="pre">stat()</span></code> to be properties instead of methods, to indicate they’re very cheap or free. However, this isn’t quite the case, as <code class="docutils literal notranslate"><span class="pre">stat()</span></code> will require an OS call on POSIX-based systems but not on Windows. Even <code class="docutils literal notranslate"><span class="pre">is_dir()</span></code> and friends may perform an OS call on POSIX-based systems if the <code class="docutils literal notranslate"><span class="pre">dirent.d_type</span></code> value is <code class="docutils literal notranslate"><span class="pre">DT_UNKNOWN</span></code> (on certain file systems).</p> <p>Also, people would expect the attribute access <code class="docutils literal notranslate"><span class="pre">entry.is_dir</span></code> to only ever raise <code class="docutils literal notranslate"><span class="pre">AttributeError</span></code>, not <code class="docutils literal notranslate"><span class="pre">OSError</span></code> in the case it makes a system call under the covers. 
Calling code would have to have a <code class="docutils literal notranslate"><span class="pre">try</span></code>/<code class="docutils literal notranslate"><span class="pre">except</span></code> around what looks like a simple attribute access, and so it’s much better to make them <em>methods</em>.</p> <p>See <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2013-May/126184.html">this May 2013 python-dev thread</a> where this PEP’s author makes this case and there’s agreement from core developers.</p> </section> <section id="direntry-fields-being-static-attribute-only-objects"> <h3><a class="toc-backref" href="#direntry-fields-being-static-attribute-only-objects" role="doc-backlink">DirEntry fields being “static” attribute-only objects</a></h3> <p>In <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2014-July/135303.html">this July 2014 python-dev message</a>, Paul Moore suggested a solution that was a “thin wrapper round the OS feature”, where the <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> object had only static attributes: <code class="docutils literal notranslate"><span class="pre">name</span></code>, <code class="docutils literal notranslate"><span class="pre">path</span></code>, and <code class="docutils literal notranslate"><span class="pre">is_X</span></code>, with the <code class="docutils literal notranslate"><span class="pre">st_X</span></code> attributes only present on Windows. The idea was to use this simpler, lower-level function as a building block for higher-level functions.</p> <p>At first there was general agreement that simplifying in this way was a good thing. However, there were two problems with this approach.
First, it assumes that the <code class="docutils literal notranslate"><span class="pre">is_dir</span></code> and similar attributes are always present on POSIX, which isn’t the case (if <code class="docutils literal notranslate"><span class="pre">d_type</span></code> is not present or is <code class="docutils literal notranslate"><span class="pre">DT_UNKNOWN</span></code>). Second, it’s a much harder-to-use API in practice, as even the <code class="docutils literal notranslate"><span class="pre">is_dir</span></code> attributes aren’t always present on POSIX, and would need to be tested with <code class="docutils literal notranslate"><span class="pre">hasattr()</span></code> and then <code class="docutils literal notranslate"><span class="pre">os.stat()</span></code> called if they weren’t present.</p> <p>See <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2014-July/135312.html">this July 2014 python-dev response</a> from this PEP’s author detailing why this option is a non-ideal solution, and the subsequent reply from Paul Moore voicing agreement.</p> </section> <section id="direntry-fields-being-static-with-an-ensure-lstat-option"> <h3><a class="toc-backref" href="#direntry-fields-being-static-with-an-ensure-lstat-option" role="doc-backlink">DirEntry fields being static with an ensure_lstat option</a></h3> <p>Another seemingly simple and attractive option was suggested by Alyssa Coghlan in this <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2014-June/135261.html">June 2014 python-dev message</a>: make <code class="docutils literal notranslate"><span class="pre">DirEntry.is_X</span></code> and <code class="docutils literal notranslate"><span class="pre">DirEntry.lstat_result</span></code> properties, and populate <code class="docutils literal notranslate"><span class="pre">DirEntry.lstat_result</span></code> at iteration time, but only if the new argument <code class="docutils literal
notranslate"><span class="pre">ensure_lstat=True</span></code> was specified on the <code class="docutils literal notranslate"><span class="pre">scandir()</span></code> call.</p> <p>This has the advantage over the previous approach that you can easily get the stat result from <code class="docutils literal notranslate"><span class="pre">scandir()</span></code> if you need it. However, it has the serious disadvantage that fine-grained error handling is messy, because <code class="docutils literal notranslate"><span class="pre">stat()</span></code> will be called (and hence potentially raise <code class="docutils literal notranslate"><span class="pre">OSError</span></code>) during iteration, leading to a rather ugly, hand-made iteration loop:</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">it</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">scandir</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">entry</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
    <span class="k">except</span> <span class="ne">OSError</span> <span class="k">as</span> <span class="n">error</span><span class="p">:</span>
        <span class="n">handle_error</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">error</span><span class="p">)</span>
    <span class="k">except</span> <span class="ne">StopIteration</span><span class="p">:</span>
        <span class="k">break</span>
</pre></div> </div> <p>Or it means that <code class="docutils literal notranslate"><span class="pre">scandir()</span></code> would have to accept an <code class="docutils literal notranslate"><span class="pre">onerror</span></code> argument –
a function to call when <code class="docutils literal notranslate"><span class="pre">stat()</span></code> errors occur during iteration. This seems to this PEP’s author neither as direct nor as Pythonic as <code class="docutils literal notranslate"><span class="pre">try</span></code>/<code class="docutils literal notranslate"><span class="pre">except</span></code> around a <code class="docutils literal notranslate"><span class="pre">DirEntry.stat()</span></code> call.</p> <p>Another drawback is that <code class="docutils literal notranslate"><span class="pre">os.scandir()</span></code> is written to make code faster. Always calling <code class="docutils literal notranslate"><span class="pre">os.lstat()</span></code> on POSIX would not bring any speedup. In most cases, you don’t need the full <code class="docutils literal notranslate"><span class="pre">stat_result</span></code> object – the <code class="docutils literal notranslate"><span class="pre">is_X()</span></code> methods are enough and this information is already known.</p> <p>See <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2014-July/135312.html">Ben Hoyt’s July 2014 reply</a> to the discussion summarizing this and detailing why he thinks the original <a class="pep reference internal" href="../pep-0471/" title="PEP 471 – os.scandir() function – a better and faster directory iterator">PEP 471</a> proposal is “the right one” after all.</p> </section> <section id="return-values-being-name-stat-result-two-tuples"> <h3><a class="toc-backref" href="#return-values-being-name-stat-result-two-tuples" role="doc-backlink">Return values being (name, stat_result) two-tuples</a></h3> <p>Initially this PEP’s author proposed this concept as a function called <code class="docutils literal notranslate"><span class="pre">iterdir_stat()</span></code> which yielded two-tuples of (name, stat_result). This does have the advantage that there are no new types introduced. 
However, the <code class="docutils literal notranslate"><span class="pre">stat_result</span></code> objects are only partially filled on POSIX-based systems (most fields set to <code class="docutils literal notranslate"><span class="pre">None</span></code> and other quirks), so they’re not really <code class="docutils literal notranslate"><span class="pre">stat_result</span></code> objects at all, and this would have to be thoroughly documented as different from <code class="docutils literal notranslate"><span class="pre">os.stat()</span></code>.</p> <p>Also, Python has good support for proper objects with attributes and methods, which makes for a saner and simpler API than two-tuples. It also makes the <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> objects more extensible and future-proof as operating systems add functionality and we want to include this in <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code>.</p> <p>See also some previous discussion:</p> <ul class="simple"> <li><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2013-May/126148.html">May 2013 python-dev thread</a> where Alyssa Coghlan makes the original case for a <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code>-style object.</li> <li><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2014-June/135244.html">June 2014 python-dev thread</a> where Alyssa Coghlan makes (another) good case against the two-tuple approach.</li> </ul> </section> <section id="return-values-being-overloaded-stat-result-objects"> <h3><a class="toc-backref" href="#return-values-being-overloaded-stat-result-objects" role="doc-backlink">Return values being overloaded stat_result objects</a></h3> <p>Another alternative discussed was making the return values overloaded <code class="docutils literal notranslate"><span class="pre">stat_result</span></code> objects with <code
class="docutils literal notranslate"><span class="pre">name</span></code> and <code class="docutils literal notranslate"><span class="pre">path</span></code> attributes. However, apart from this being a strange (and strained!) kind of overloading, this has the same problems mentioned above – most of the <code class="docutils literal notranslate"><span class="pre">stat_result</span></code> information is not fetched by <code class="docutils literal notranslate"><span class="pre">readdir()</span></code> on POSIX systems, only (part of) the <code class="docutils literal notranslate"><span class="pre">st_mode</span></code> value.</p> </section> <section id="return-values-being-pathlib-path-objects"> <h3><a class="toc-backref" href="#return-values-being-pathlib-path-objects" role="doc-backlink">Return values being pathlib.Path objects</a></h3> <p>With Antoine Pitrou’s new standard library <code class="docutils literal notranslate"><span class="pre">pathlib</span></code> module, it at first seems like a great idea for <code class="docutils literal notranslate"><span class="pre">scandir()</span></code> to return instances of <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code>. 
However, <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code>’s <code class="docutils literal notranslate"><span class="pre">is_X()</span></code> and <code class="docutils literal notranslate"><span class="pre">stat()</span></code> methods are explicitly not cached, whereas <code class="docutils literal notranslate"><span class="pre">scandir</span></code> has to cache them by design, because it’s (often) returning values from the original directory iteration system call.</p> <p>And if the <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code> instances returned by <code class="docutils literal notranslate"><span class="pre">scandir</span></code> cached stat values, but the ordinary <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code> objects explicitly don’t, that would be more than a little confusing.</p> <p>Guido van Rossum explicitly rejected <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code> caching stat in the context of scandir <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2013-November/130583.html">here</a>, making <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code> objects a bad choice for scandir return values.</p> </section> </section> <section id="possible-improvements"> <h2><a class="toc-backref" href="#possible-improvements" role="doc-backlink">Possible improvements</a></h2> <p>There are many possible improvements one could make to scandir, but here is a short list of some that this PEP’s author has in mind:</p> <ul class="simple"> <li>scandir could potentially be further sped up by calling <code class="docutils literal notranslate"><span class="pre">readdir</span></code> / <code class="docutils literal notranslate"><span class="pre">FindNextFile</span></code>, say, 50 times per <code class="docutils literal notranslate"><span
class="pre">Py_BEGIN_ALLOW_THREADS</span></code> block, so that it stays in the C extension module for longer and may be somewhat faster as a result. This approach hasn’t been tested, but was suggested on Issue 11406 by Antoine Pitrou. [<a class="reference external" href="http://bugs.python.org/msg130125">source9</a>]</li> <li>scandir could use a free list to avoid the cost of memory allocation for each iteration – a short free list of 10 or maybe even 1 may help. Suggested by Victor Stinner on a <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2014-June/135232.html">python-dev thread on June 27</a>.</li> </ul> </section> <section id="previous-discussion"> <h2><a class="toc-backref" href="#previous-discussion" role="doc-backlink">Previous discussion</a></h2> <ul class="simple"> <li><a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2012-November/017770.html">Original November 2012 thread Ben Hoyt started on python-ideas</a> about speeding up <code class="docutils literal notranslate"><span class="pre">os.walk()</span></code></li> <li>Python <a class="reference external" href="http://bugs.python.org/issue11406">Issue 11406</a>, which includes the original proposal for a scandir-like function</li> <li><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2013-May/126119.html">Further May 2013 thread Ben Hoyt started on python-dev</a> that refined the <code class="docutils literal notranslate"><span class="pre">scandir()</span></code> API, including Alyssa Coghlan’s suggestion of scandir yielding <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code>-like objects</li> <li><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2013-November/130572.html">November 2013 thread Ben Hoyt started on python-dev</a> to discuss the interaction between scandir and the new <code class="docutils literal notranslate"><span
class="pre">pathlib</span></code> module</li> <li><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2014-June/135215.html">June 2014 thread Ben Hoyt started on python-dev</a> to discuss the first version of this PEP, with extensive discussion about the API</li> <li><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2014-July/135377.html">First July 2014 thread Ben Hoyt started on python-dev</a> to discuss his updates to <a class="pep reference internal" href="../pep-0471/" title="PEP 471 – os.scandir() function – a better and faster directory iterator">PEP 471</a></li> <li><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2014-July/135485.html">Second July 2014 thread Ben Hoyt started on python-dev</a> to discuss the remaining decisions needed to finalize <a class="pep reference internal" href="../pep-0471/" title="PEP 471 – os.scandir() function – a better and faster directory iterator">PEP 471</a>, specifically whether the <code class="docutils literal notranslate"><span class="pre">DirEntry</span></code> methods should follow symlinks by default</li> <li><a class="reference external" href="http://stackoverflow.com/questions/2485719/very-quickly-getting-total-size-of-folder">Question on StackOverflow</a> about why <code class="docutils literal notranslate"><span class="pre">os.walk()</span></code> is slow and pointers on how to fix it (this inspired the author of this PEP early on)</li> <li><a class="reference external" href="https://github.com/benhoyt/betterwalk">BetterWalk</a>, this PEP’s author’s previous attempt at this, on which the scandir code is based</li> </ul> </section> <section id="copyright"> <h2><a class="toc-backref" href="#copyright" role="doc-backlink">Copyright</a></h2> <p>This document has been placed in the public domain.</p> </section> </section> <hr class="docutils" /> <p>Source: <a class="reference external" 
href="https://github.com/python/peps/blob/main/peps/pep-0471.rst">https://github.com/python/peps/blob/main/peps/pep-0471.rst</a></p> <p>Last modified: <a class="reference external" href="https://github.com/python/peps/commits/main/peps/pep-0471.rst">2023-10-11 12:05:51 GMT</a></p> </article> <nav id="pep-sidebar"> <h2>Contents</h2> <ul> <li><a class="reference internal" href="#abstract">Abstract</a></li> <li><a class="reference internal" href="#rationale">Rationale</a></li> <li><a class="reference internal" href="#implementation">Implementation</a></li> <li><a class="reference internal" href="#specifics-of-proposal">Specifics of proposal</a><ul> <li><a class="reference internal" href="#os-scandir">os.scandir()</a></li> <li><a class="reference internal" href="#os-walk">os.walk()</a></li> </ul> </li> <li><a class="reference internal" href="#examples">Examples</a><ul> <li><a class="reference internal" href="#notes-on-caching">Notes on caching</a></li> <li><a class="reference internal" href="#notes-on-exception-handling">Notes on exception handling</a></li> </ul> </li> <li><a class="reference internal" href="#support">Support</a></li> <li><a class="reference internal" href="#use-in-the-wild">Use in the wild</a></li> <li><a class="reference internal" href="#rejected-ideas">Rejected ideas</a><ul> <li><a class="reference internal" href="#naming">Naming</a></li> <li><a class="reference internal" href="#wildcard-support">Wildcard support</a></li> <li><a class="reference internal" href="#methods-not-following-symlinks-by-default">Methods not following symlinks by default</a></li> <li><a class="reference internal" href="#direntry-attributes-being-properties">DirEntry attributes being properties</a></li> <li><a class="reference internal" href="#direntry-fields-being-static-attribute-only-objects">DirEntry fields being “static” attribute-only objects</a></li> <li><a class="reference internal" href="#direntry-fields-being-static-with-an-ensure-lstat-option">DirEntry fields being 
static with an ensure_lstat option</a></li> <li><a class="reference internal" href="#return-values-being-name-stat-result-two-tuples">Return values being (name, stat_result) two-tuples</a></li> <li><a class="reference internal" href="#return-values-being-overloaded-stat-result-objects">Return values being overloaded stat_result objects</a></li> <li><a class="reference internal" href="#return-values-being-pathlib-path-objects">Return values being pathlib.Path objects</a></li> </ul> </li> <li><a class="reference internal" href="#possible-improvements">Possible improvements</a></li> <li><a class="reference internal" href="#previous-discussion">Previous discussion</a></li> <li><a class="reference internal" href="#copyright">Copyright</a></li> </ul> <br> <a id="source" href="https://github.com/python/peps/blob/main/peps/pep-0471.rst">Page Source (GitHub)</a> </nav> </section> <script src="../_static/colour_scheme.js"></script> <script src="../_static/wrap_tables.js"></script> <script src="../_static/sticky_banner.js"></script> </body> </html>
