Why isn't source distribution metadata trustworthy? Can we make it so? - Packaging - Discussions on Python.org
<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <title>Why isn't source distribution metadata trustworthy? Can we make it so? - Packaging - Discussions on Python.org</title> <meta name="description" content="PEP 517 backends and setuptools (as used in setup.py) generate source distributions containing a PKG-INFO file, which should contain the metadata associated with the package. Currently this information is not used in pi&hellip;"> <meta name="generator" content="Discourse 3.4.0.beta3-dev - https://github.com/discourse/discourse version 5bf5d1335680f28a8eb65c488353be9585eed08e"> <link rel="icon" type="image/png" href="https://global.discourse-cdn.com/flex016/uploads/python1/optimized/1X/9997f0605d56c4bfecd63594f52f42cdafd6b06a_2_32x32.png"> <link rel="apple-touch-icon" type="image/png" href="https://global.discourse-cdn.com/flex016/uploads/python1/optimized/1X/4c06143de7870c35963b818b15b395092a434991_2_180x180.png"> <meta name="theme-color" media="(prefers-color-scheme: light)" content="#ffffff"> <meta name="theme-color" media="(prefers-color-scheme: dark)" content="#111111"> <meta name="viewport" content="width=device-width, initial-scale=1.0, minimum-scale=1.0, user-scalable=yes, viewport-fit=cover"> <link rel="canonical" href="https://discuss.python.org/t/why-isnt-source-distribution-metadata-trustworthy-can-we-make-it-so/2620" /> <link rel="search" type="application/opensearchdescription+xml" href="https://discuss.python.org/opensearch.xml" title="Discussions on Python.org Search"> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/color_definitions_base__2_89bb3b818d8860c6da830055d8964bc4f806db7f.css?__ws=discuss.python.org" media="all" rel="stylesheet" class="light-scheme"/><link href="https://sea2.discourse-cdn.com/flex016/stylesheets/color_definitions_dark_1_2_0ef303cf72b1435fdf680c3db5c67cce60860418.css?__ws=discuss.python.org" media="(prefers-color-scheme: dark)" rel="stylesheet" class="dark-scheme"/> <link 
href="https://sea2.discourse-cdn.com/flex016/stylesheets/desktop_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="desktop" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/checklist_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="checklist" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-adplugin_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-adplugin" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-akismet_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-akismet" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-cakeday_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-cakeday" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-chat-integration_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-chat-integration" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-details_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-details" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-lazy-videos_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-lazy-videos" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-local-dates_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-local-dates" /> <link 
href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-math_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-math" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-narrative-bot_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-narrative-bot" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-policy_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-policy" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-presence_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-presence" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-solved_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-solved" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-templates_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-templates" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-topic-voting_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-topic-voting" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-user-notes_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-user-notes" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-yearly-review_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-yearly-review" /> <link 
href="https://sea2.discourse-cdn.com/flex016/stylesheets/footnote_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="footnote" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/hosted-site_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="hosted-site" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/poll_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="poll" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/spoiler-alert_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="spoiler-alert" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-topic-voting_desktop_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-topic-voting_desktop" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/poll_desktop_ae2c05eb022973cfd87e169e0799a6bd34290cdc.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="poll_desktop" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/desktop_theme_4_93427783b0a70199f2e06d65a8eb2ca5c365dbcd.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="desktop_theme" data-theme-id="4" data-theme-name="unformatted code detector"/> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/desktop_theme_2_5372499f761ae877fe1ed2c103226d632862786d.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="desktop_theme" data-theme-id="2" data-theme-name="light"/> <link rel="alternate nofollow" type="application/rss+xml" title="RSS feed of 'Why isn't source distribution metadata trustworthy? 
Can we make it so?'" href="https://discuss.python.org/t/why-isnt-source-distribution-metadata-trustworthy-can-we-make-it-so/2620.rss" /> <meta property="og:site_name" content="Discussions on Python.org" /> <meta property="og:type" content="website" /> <meta name="twitter:card" content="summary" /> <meta name="twitter:image" content="https://global.discourse-cdn.com/flex016/uploads/python1/original/1X/f93ff97c4f381b5e8add5a0c163b4ded29f20ed7.png" /> <meta property="og:image" content="https://global.discourse-cdn.com/flex016/uploads/python1/original/1X/f93ff97c4f381b5e8add5a0c163b4ded29f20ed7.png" /> <meta property="og:url" content="https://discuss.python.org/t/why-isnt-source-distribution-metadata-trustworthy-can-we-make-it-so/2620" /> <meta name="twitter:url" content="https://discuss.python.org/t/why-isnt-source-distribution-metadata-trustworthy-can-we-make-it-so/2620" /> <meta property="og:title" content="Why isn't source distribution metadata trustworthy? Can we make it so?" /> <meta name="twitter:title" content="Why isn't source distribution metadata trustworthy? Can we make it so?" /> <meta property="og:description" content="PEP 517 backends and setuptools (as used in setup.py) generate source distributions containing a PKG-INFO file, which should contain the metadata associated with the package. Currently this information is not used in pip, which opts to get the metadata from the build system. This involves either: For PEP 517: setup the backend execution environment (possibly installing multiple packages in the process) and executing prepare_metadata_for_build_wheel in a subprocess - which may involve building..." /> <meta name="twitter:description" content="PEP 517 backends and setuptools (as used in setup.py) generate source distributions containing a PKG-INFO file, which should contain the metadata associated with the package. Currently this information is not used in pip, which opts to get the metadata from the build system. 
This involves either: For PEP 517: setup the backend execution environment (possibly installing multiple packages in the process) and executing prepare_metadata_for_build_wheel in a subprocess - which may involve building..." /> <meta property="og:article:section" content="Packaging" /> <meta property="og:article:section:color" content="ED207B" /> <meta name="twitter:label1" value="Reading time" /> <meta name="twitter:data1" value="11 mins 🕑" /> <meta name="twitter:label2" value="Likes" /> <meta name="twitter:data2" value="14 ❤" /> <meta property="article:published_time" content="2019-11-08T01:56:47+00:00" /> <meta property="og:ignore_canonical" content="true" /> <link rel="next" href="/t/why-isnt-source-distribution-metadata-trustworthy-can-we-make-it-so/2620?page=2"> </head> <body class="crawler browser-update"> <header> <a href="/"> Discussions on Python.org </a> </header> <div id="main-outlet" class="wrap" role="main"> <div id="topic-title"> <h1> <a href="/t/why-isnt-source-distribution-metadata-trustworthy-can-we-make-it-so/2620">Why isn't source distribution metadata trustworthy? Can we make it so?</a> </h1> <div class="topic-category" itemscope itemtype="http://schema.org/BreadcrumbList"> <span itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem"> <a href="/c/packaging/14" class="badge-wrapper bullet" itemprop="item"> <span class='badge-category-bg' style='background-color: #ED207B'></span> <span class='badge-category clear-badge'> <span class='category-name' itemprop='name'>Packaging</span> </span> </a> <meta itemprop="position" content="1" /> </span> </div> </div> <div itemscope itemtype='http://schema.org/DiscussionForumPosting'> <meta itemprop='headline' content='Why isn't source distribution metadata trustworthy? 
Can we make it so?'> <link itemprop='url' href='https://discuss.python.org/t/why-isnt-source-distribution-metadata-trustworthy-can-we-make-it-so/2620'> <meta itemprop='datePublished' content='2019-11-08T01:56:47Z'> <meta itemprop='articleSection' content='Packaging'> <meta itemprop='keywords' content=''> <div itemprop='publisher' itemscope itemtype="http://schema.org/Organization"> <meta itemprop='name' content='Python Software Foundation'> <div itemprop='logo' itemscope itemtype="http://schema.org/ImageObject"> <meta itemprop='url' content='https://global.discourse-cdn.com/flex016/uploads/python1/original/1X/c7591c98caf3b31d4d9c6f322f41ed9d80a50800.png'> </div> </div> <div id='post_1' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/chrahunt'><span itemprop='name'>chrahunt</span></a> (Chris Hunt) </span> <link itemprop="mainEntityOfPage" href="https://discuss.python.org/t/why-isnt-source-distribution-metadata-trustworthy-can-we-make-it-so/2620"> <span class="crawler-post-infos"> <time datetime='2019-11-08T01:56:47Z' class='post-time'> November 8, 2019, 1:56am </time> <meta itemprop='dateModified' content='2019-11-08T01:56:47Z'> <span itemprop='position'>1</span> </span> </div> <div class='post' itemprop='text'> <p>PEP 517 backends and setuptools (as used in <code>setup.py</code>) generate source distributions containing a <code>PKG-INFO</code> file, which should contain the metadata associated with the package. Currently this information is not used in pip, which opts to get the metadata from the build system. 
This involves either:</p> <ol> <li>For PEP 517: set up the backend execution environment (possibly installing multiple packages in the process) and execute <a href="https://www.python.org/dev/peps/pep-0517/#prepare-metadata-for-build-wheel" rel="nofollow noopener"><code>prepare_metadata_for_build_wheel</code></a> in a subprocess - which may involve building a wheel if the backend does not implement the hook</li> <li>For legacy <code>setup.py</code> packages: run <code>setup.py egg_info</code> in a subprocess</li> </ol> <p>The reason we do this is because currently there is no guarantee that <code>PKG-INFO</code> is complete. This is trivially confirmed by inspecting e.g. <code>requests-2.22.0/PKG-INFO</code> from <a href="https://files.pythonhosted.org/packages/01/62/ddcf76d1d19885e8579acb1b1df26a852b03472c0e46d2b959a714c90608/requests-2.22.0.tar.gz" rel="nofollow noopener"><code>requests-2.22.0.tar.gz</code></a> vs <code>requests-2.22.0.dist-info/METADATA</code> in <a href="https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl" rel="nofollow noopener"><code>requests-2.22.0-py2.py3-none-any.whl</code></a>. The former is missing several important fields, like <code>Requires-Dist</code>, which are in the latter.</p> <p>I would like to add a field to the allowed package metadata of source distributions that would signal to metadata processors that the backend does not need to be consulted. Example spelling: <code>Metadata-Covers: all</code> or <code>Metadata-Covers: Name, Version, Requires-Dist, Requires-Python</code>.</p> <p>This would enable tools like pip to avoid the overhead of creating and tearing down build environments and doing subprocess invocations. 
The possible benefits increase when considering the upcoming dependency resolver, which may need to download and query metadata for multiple versions of each project.</p> <p>I checked <a href="https://www.python.org/dev/peps/pep-0566/" rel="nofollow noopener">PEP 566</a> and the <a href="https://packaging.python.org/specifications/core-metadata/" rel="nofollow noopener">Core metadata specification</a> but didn’t see any such field listed. I’m sure this would have been discussed as part of the development of PEP 517/518, but searching here and distutils-sig didn’t turn up anything specifically mentioning this case (of threads <a href="https://mail.python.org/archives/list/distutils-sig@python.org/thread/B6EFY34QU7X62GDUGTFW76AZ5ZBTCZBK/" rel="nofollow noopener">1</a>, <a href="https://mail.python.org/archives/list/distutils-sig@python.org/thread/BDAOTQGPNS7MH4MUISK3FNX675ODHEZP/" rel="nofollow noopener">2</a>, <a href="https://mail.python.org/archives/list/distutils-sig@python.org/thread/VQF7XPY36PEQJXMQ4TAPQ6KIQFOB6EC7/" rel="nofollow noopener">3</a>, <a href="https://mail.python.org/archives/list/distutils-sig@python.org/thread/5R2DFBGZQCDR2ZBLCLQU5RRLZNZL365R/" rel="nofollow noopener">4</a>).</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="4" /> <span class='post-likes'>4 Likes</span> </div> <div class='crawler-linkback-list' itemscope itemtype='http://schema.org/ItemList'> <div itemprop='itemListElement' itemscope itemtype='http://schema.org/ListItem'> <a itemprop='url' href="https://discuss.python.org/t/standardized-way-for-receiving-dependencies/3821/9">Standardized way for receiving dependencies</a> <meta itemprop='position' content='10'> </div> </div> </div> <div id='post_2' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div 
class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/sbidoul'><span itemprop='name'>sbidoul</span></a> (Stéphane Bidoul) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2019-11-08T10:05:50Z' class='post-time'> November 8, 2019, 10:05am </time> <meta itemprop='dateModified' content='2019-11-08T10:05:50Z'> <span itemprop='position'>2</span> </span> </div> <div class='post' itemprop='text'> <p>Is obtaining metadata from requirements expressed as VCS references (git+https://…) considered a related question?</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_3' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/pf_moore'><span itemprop='name'>pf_moore</span></a> (Paul Moore) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2019-11-08T11:19:38Z' class='post-time'> November 8, 2019, 11:19am </time> <meta itemprop='dateModified' content='2019-11-08T11:19:38Z'> <span itemprop='position'>3</span> </span> </div> <div class='post' itemprop='text'> <p>I would assume not, as there’s no already-completed “build” process for such requirements. So we <em>have</em> to assume such data is untrusted, as there’s been no chance to validate it. 
(Yes, we can say that projects should create a file xxx that contains metadata in this format, but without knowing that the file has been validated, tools can’t rely on it).</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_4' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/FRidh'><span itemprop='name'>FRidh</span></a> (Frederik Rietdijk) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2019-11-08T14:44:48Z' class='post-time'> November 8, 2019, 2:44pm </time> <meta itemprop='dateModified' content='2019-11-08T14:44:48Z'> <span itemprop='position'>4</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote no-group" data-username="chrahunt" data-post="1" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/chrahunt/48/947_2.png" class="avatar"> chrahunt:</div> <blockquote> <p>The reason we do this is because currently there is no guarantee that <code>PKG-INFO</code> is complete. This is trivially confirmed by inspecting e.g. 
<code>requests-2.22.0/PKG-INFO</code> from <a href="https://files.pythonhosted.org/packages/01/62/ddcf76d1d19885e8579acb1b1df26a852b03472c0e46d2b959a714c90608/requests-2.22.0.tar.gz" rel="noopener nofollow ugc"> <code>requests-2.22.0.tar.gz</code> </a> vs <code>requests-2.22.0.dist-info/METADATA</code> in <a href="https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl" rel="noopener nofollow ugc"> <code>requests-2.22.0-py2.py3-none-any.whl</code> </a>. The former is missing several important fields, like <code>Requires-Dist</code> , which are in the latter.</p> </blockquote> </aside> <p>So the issue is actually that the generated metadata for wheels and sdists are not the same in case of <code>setuptools</code>. Without having looked into it in much depth I would argue this then to be a bug in the build system <code>setuptools</code>. Indeed, it is a known issue <a href="https://github.com/pypa/setuptools/issues/1716" class="inline-onebox" rel="noopener nofollow ugc">SDist PKG-INFO file should include Requires-Dist entries. 
· Issue #1716 · pypa/setuptools · GitHub</a>.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="1" /> <span class='post-likes'>1 Like</span> </div> </div> <div id='post_5' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/pf_moore'><span itemprop='name'>pf_moore</span></a> (Paul Moore) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2019-11-08T18:03:13Z' class='post-time'> November 8, 2019, 6:03pm </time> <meta itemprop='dateModified' content='2019-11-08T18:03:13Z'> <span itemprop='position'>5</span> </span> </div> <div class='post' itemprop='text'> <p>+1 on treating this as “just” a backend bug. The problem is that once lost, trust is hard to regain - how can pip (or any other front end) detect that the backend is trustworthy in this respect?</p> <p>Requiring backends to add a metadata field that (in effect) says “I don’t have a bug” seems a bit silly (and worse still, it would have to be added to the metadata standard as a required field, to be of any use!) 
but I can’t think of a better alternative.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_6' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/njs'><span itemprop='name'>njs</span></a> (Nathaniel J. Smith) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2019-11-08T19:19:40Z' class='post-time'> November 8, 2019, 7:19pm </time> <meta itemprop='dateModified' content='2019-11-08T19:19:40Z'> <span itemprop='position'>6</span> </span> </div> <div class='post' itemprop='text'> <p>There will also be cases where the sdist simply doesn’t know all the metadata for the final wheel, because it varies depending on what happens during the build. So we could think of this proposed field as “I can promise that my wheel metadata is not dynamic and will match the sdist”, rather than just “I don’t have a bug”.</p> <p>Also if we did this, I think the trick would have to be that if you set this flag, then pip and other build tools need to actually enforce it, by comparing the sdist and wheel metadata and erroring out if they don’t match. That’s the only way to make it actually trustworthy.</p> <p>But setuptools will never be able to set this flag automatically, because setuptools has no idea whether any given <code>setup.py</code> has tricky dynamicity in it. Which means that this flag would have to be something that individual projects have to opt-in to. Which is fine for projects that have active and diligent maintainers. 
… But those projects mostly distribute wheels already, so this flag is unnecessary. The projects that need it are the ones that only distribute sdists. Some of those projects do have active maintainers that could potentially be convinced to add this flag. But I think to make a real dent in the missing-metadata problem, you’ll need to find something that works for the inactive-but-still-used projects, and an opt-in flag won’t help with those.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="2" /> <span class='post-likes'>2 Likes</span> </div> </div> <div id='post_7' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/chrahunt'><span itemprop='name'>chrahunt</span></a> (Chris Hunt) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2019-11-08T20:36:31Z' class='post-time'> November 8, 2019, 8:36pm </time> <meta itemprop='dateModified' content='2019-11-08T20:36:31Z'> <span itemprop='position'>7</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote group-committers" data-username="njs" data-post="6" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/njs/48/204_2.png" class="avatar"> njs:</div> <blockquote> <p>“I can promise that my wheel metadata is not dynamic and will match the sdist”, rather than just “I don’t have a bug”.</p> </blockquote> </aside> <p>Yes, exactly.</p> <p>I’m also on board for enforcing metadata consistency. 
That is similar to an issue I filed <a href="https://github.com/pypa/pip/issues/7179" rel="noopener nofollow ugc">here</a>.</p> <aside class="quote group-committers" data-username="njs" data-post="6" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/njs/48/204_2.png" class="avatar"> njs:</div> <blockquote> <p>those projects mostly distribute wheels already, so this flag is unnecessary.</p> </blockquote> </aside> <p>There are other use cases that would make this worthwhile, specifically:</p> <ul> <li>Users that pass <code>--no-binary :all:</code>, where we won’t consider remote wheels</li> <li>Users on platforms that don’t have pre-built wheels</li> </ul> <aside class="quote group-committers" data-username="njs" data-post="6" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/njs/48/204_2.png" class="avatar"> njs:</div> <blockquote> <p>to make a real dent in the missing-metadata problem</p> </blockquote> </aside> <p>The primary focus for me is coming to a conclusion on whether we can make anything about this <code>PKG-INFO</code> useful, or rule it out entirely. An opt-in flag is the only way I see forward on that front. 
I agree it will not help inactive-but-used projects.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_8' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/pganssle'><span itemprop='name'>pganssle</span></a> (Paul Ganssle) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2019-11-12T21:33:27Z' class='post-time'> November 12, 2019, 9:33pm </time> <meta itemprop='dateModified' content='2019-11-12T21:33:27Z'> <span itemprop='position'>8</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote group-committers" data-username="njs" data-post="6" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/njs/48/204_2.png" class="avatar"> njs:</div> <blockquote> <p>But setuptools will never be able to set this flag automatically, because setuptools has no idea whether any given <code>setup.py</code> has tricky dynamicity in it.</p> </blockquote> </aside> <p>This is not <em>entirely</em> true. There is <a href="https://github.com/pypa/setuptools/issues/1805">an open issue</a> to populate <code>Requires-Dist</code> for sdists if-and-only-if <code>install_requires</code> is specified in <code>setup.cfg</code> and not in <code>setup.py</code>.</p> <p>Of course, one can make the argument that a <code>setup.py</code> could do this:</p> <pre><code class="lang-auto">... 
if some_condition:
    kwargs['install_requires'] = ["something"]
setup(**kwargs)
</code></pre> <p>Even if <code>install_requires</code> is specified in <code>setup.cfg</code>. Even if we’re forced to consider this a possibility and not auto-set the flag, we have other options of decreasing value, e.g. only set the flag if the <code>setup.py</code> is generated by <code>setuptools</code> itself, or use the <code>ast</code> module to parse <code>setup.py</code> and only set the flag if <code>setup()</code> is called with enumerated options and without <code>install_requires</code>.</p> <p>In any case, it’s definitely true that it’s somewhat tricky, but if we combine some zero-false-positive heuristics with education and documentation about the use of declarative metadata, we may get to a world where for the most part, even source distributions have reliable dependency metadata in setuptools.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_9' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/pf_moore'><span itemprop='name'>pf_moore</span></a> (Paul Moore) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2019-11-12T22:56:22Z' class='post-time'> November 12, 2019, 10:56pm </time> <meta itemprop='dateModified' content='2019-11-12T22:56:22Z'> <span itemprop='position'>9</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote group-committers" data-username="njs" data-post="6" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img 
loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/njs/48/204_2.png" class="avatar"> njs:</div> <blockquote> <p>But I think to make a real dent in the missing-metadata problem, you’ll need to find something that works for the inactive-but-still-used projects, and an opt-in flag won’t help with those.</p> </blockquote> </aside> <p>One thing that I have no feel at all for, is how significant a proportion of projects fall into this category. Are we talking about 50% of downloads from PyPI? Or 10%? Or 1%? Are download counts important, or would some other measure better capture “importance” here?</p> <p>For me, this is a classic 80-20 style of problem, if we could benefit 80% of the cases, I’d be happy with that. The added wrinkle, though, is that we have no real idea where to draw the line between the 80 and the 20. So we too often end up paralysed, unable to make progress because we can’t judge the importance of the use cases we’re considering.</p> <aside class="quote group-committers" data-username="njs" data-post="6" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/njs/48/204_2.png" class="avatar"> njs:</div> <blockquote> <p>The projects that need it are the ones that only distribute sdists.</p> </blockquote> </aside> <p>Not entirely true, there are people who install with <code>--no-binary</code>, for example. As well as people on platforms where wheels aren’t available (am I right that Docker images that use musl don’t have wheels, for example?). 
Again, a better understanding of use cases would help here.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="1" /> <span class='post-likes'>1 Like</span> </div> </div> <div id='post_10' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/pradyunsg'><span itemprop='name'>pradyunsg</span></a> (Pradyun Gedam) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2019-11-13T04:16:19Z' class='post-time'> November 13, 2019, 4:16am </time> <meta itemprop='dateModified' content='2019-11-13T04:16:19Z'> <span itemprop='position'>10</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote group-committers" data-username="pganssle" data-post="8" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/pganssle/48/245_2.png" class="avatar"> pganssle:</div> <blockquote> <p>use the <code>ast</code> module to parse <code>setup.py</code> and only set the flag</p> </blockquote> </aside> <p>If we’re going this far, might as well start doing “static evaluation” of setup.py to check if there’s anything dynamic happening in setup.py – <a class="mention" href="/u/techalchemy">@techalchemy</a> had something for this if I remember correctly.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> 
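<div class='post'> <p>For readers following along, here is a minimal sketch of what such a "static evaluation" of <code>setup.py</code> could look like with the standard-library <code>ast</code> module. It is an illustration only, not the code mentioned in this thread and not anything setuptools does today. The function name <code>static_install_requires</code> is hypothetical, and the sketch deliberately gives up (returns <code>None</code>) on aliased imports of <code>setup</code>, on <code>setup(**kwargs)</code>, and on any non-literal value.</p>
<pre><code class="lang-python">import ast


def static_install_requires(source):
    """Return install_requires as a list if it is a plain literal, else None.

    Hypothetical helper: an empty list means that setup() was called with
    enumerated keyword arguments and no install_requires at all.
    """
    tree = ast.parse(source)

    # Collect every call to something named `setup` (setup(...) or x.setup(...)).
    # Aliased imports such as `setup as do_stuff` are not recognised.
    calls = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            name = getattr(func, "id", None) or getattr(func, "attr", None)
            if name == "setup":
                calls.append(node)
    if len(calls) != 1:
        return None  # no call, or several calls: do not guess

    call = calls[0]
    for kw in call.keywords:
        if kw.arg is None:
            return None  # setup(**kwargs): the options are not enumerated

    for kw in call.keywords:
        if kw.arg == "install_requires":
            try:
                value = ast.literal_eval(kw.value)
            except (ValueError, TypeError):
                return None  # computed at run time, not a literal
            if isinstance(value, (list, tuple)) and all(
                isinstance(item, str) for item in value
            ):
                return list(value)
            return None
    return []
</code></pre>
<p>A tool could treat a non-<code>None</code> result as a candidate for "static" dependency metadata and fall back to running the build backend otherwise. Code elsewhere in <code>setup.py</code> can still change behaviour (for example by patching <code>setup</code> itself), so this alone is not a zero-false-positive check.</p> </div>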
</div> <div id='post_11' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/techalchemy'><span itemprop='name'>techalchemy</span></a> (Dan Ryan) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2019-11-13T07:28:35Z' class='post-time'> November 13, 2019, 7:28am </time> <meta itemprop='dateModified' content='2019-11-13T07:28:35Z'> <span itemprop='position'>11</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote group-committers" data-username="pradyunsg" data-post="10" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/pradyunsg/48/206_2.png" class="avatar"> pradyunsg:</div> <blockquote> <p><a class="mention" href="/u/techalchemy">@techalchemy</a> had something for this if I remember correctly</p> </blockquote> </aside> <p>I do indeed. It’s very poorly implemented though. It’s probably imperfect but it is definitely possible to traverse the AST for this information. It gets tricky because sometimes people import <code>setup</code> under an alias, e.g. <code>from setuptools import setup as do_stuff</code> (i’ve seen approximations of this) and I’ve even seen people rely on directory-local imports of their own code which imports <code>setuptools.setup</code> on one occasion (<code>from .mymodule import my_version_of_setup</code> which in turn called <code>setuptools.setup</code>). 
I do not believe my code handles that case <img src="https://emoji.discourse-cdn.com/apple/slight_smile.png?v=12" title=":slight_smile:" class="emoji" alt=":slight_smile:" loading="lazy" width="20" height="20"></p> <p>This is an interesting conversation and one bit of information I would like to add that may be relevant is that I was at a packaging summit hosted by Microsoft recently with folks from npm, go, Java (maven/gradle), OCI, NuGet and a few others and the overarching theme seemed to be enforcement – putting tools in front of the upload process, whether they are strict enforcement or simply encourage the desired behaviors (someone from github suggested that they could fail a check if their wheels lacked metadata after a build). Rather than trusting that the user supplied good metadata, there was a lot of interest in actually validating or if possible generating the metadata at the index.</p> <p>This is obviously super nuanced and I’m hand waving tons of complexity but I think we are relatively smart and we can probably get a basic solution working. It’s in keeping with what we discussed at PyCon last year and it ultimately all comes down to metadata.</p> <p>As of last month I accepted a partly sponsored role with Canonical and I’ll be spending a chunk of my time on packaging related work so I’ll be glad to catch up on these. I believe <a class="mention" href="/u/pradyunsg">@pradyunsg</a> and I were supposed to draft a PEP related to extras based on some of the work <a class="mention" href="/u/njs">@njs</a> had done as an outcome of PyCon last year, but due to a lot of factors I hadn’t had any time to do anything open source related. 
Now that I have time I’d be glad to pick that back up (I’m sure it’s discussed on discourse somewhere).</p> <p>To the original question, I’d suggest caution around making adjustments to metadata PEPs ahead of the resolver work in pip unless we are prepared to tackle the full extent of the issue surrounding our current metadata representations (see the previous paragraph about extras). If that’s something we are willing to tackle head on, I think it does make sense to do that first, however.</p> <p>Sorry for the many words but hopefully that was mostly on-topic and clear.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_12' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/pf_moore'><span itemprop='name'>pf_moore</span></a> (Paul Moore) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2019-11-13T11:11:00Z' class='post-time'> November 13, 2019, 11:11am </time> <meta itemprop='dateModified' content='2019-11-13T11:11:00Z'> <span itemprop='position'>12</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote no-group" data-username="techalchemy" data-post="11" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/techalchemy/48/238_2.png" class="avatar"> techalchemy:</div> <blockquote> <p>It gets tricky because sometimes people import <code>setup</code> under an alias, e.g. 
<code>from setuptools import setup as do_stuff</code> (i’ve seen approximations of this) and I’ve even seen people rely on directory-local imports of their own code which imports <code>setuptools.setup</code> on one occasion ( <code>from .mymodule import my_version_of_setup</code> which in turn called <code>setuptools.setup</code> ).</p> </blockquote> </aside> <p>I’m going to reiterate a point I’ve made elsewhere on this, though. How far should we go to support such usages? What requirements drive the use of such unusual approaches for the projects using them, and are those requirements sufficiently compelling to justify the significant amount of extra work required of the packaging tool community to cater for those usages?</p> <p>I strongly believe we should avoid getting trapped in a mindset that says that we have to support absolutely every usage of setuptools imaginable, across all packaging tools. If the requirement for a particular project is strong enough, “use an older version of pip” is an option - and if the cost to the project of doing that is too high, then maybe the cost of supporting that usage in the packaging tools should <em>also</em> be considered too high.</p> <p>We have to be cautious here - breaking backward compatibility should never be something we do lightly - but we should have the option available when it’s needed.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="1" /> <span class='post-likes'>1 Like</span> </div> </div> <div id='post_13' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/pradyunsg'><span itemprop='name'>pradyunsg</span></a> 
(Pradyun Gedam) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2019-11-14T04:42:16Z' class='post-time'> November 14, 2019, 4:42am </time> <meta itemprop='dateModified' content='2019-11-14T04:42:16Z'> <span itemprop='position'>13</span> </span> </div> <div class='post' itemprop='text'> <p>Yea. I’m going to say that a system that covers the basic case – a literal defined in setup.py should be considered canonical.</p> <p>Anything else, we can tackle that if we see the need to.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_14' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/techalchemy'><span itemprop='name'>techalchemy</span></a> (Dan Ryan) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2019-11-14T05:40:44Z' class='post-time'> November 14, 2019, 5:40am </time> <meta itemprop='dateModified' content='2019-11-14T05:40:44Z'> <span itemprop='position'>14</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote group-committers" data-username="pf_moore" data-post="12" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/pf_moore/48/35_2.png" class="avatar"> pf_moore:</div> <blockquote> <p>I strongly believe we should avoid getting trapped in a mindset that says that we have to support absolutely every usage of setuptools imaginable, across all packaging tools.</p> 
</blockquote> </aside> <p>Completely agreed</p> <aside class="quote group-committers" data-username="pradyunsg" data-post="13" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/pradyunsg/48/206_2.png" class="avatar"> pradyunsg:</div> <blockquote> <p>a literal defined in setup.py should be considered canonical.</p> </blockquote> </aside> <p>Without getting too much in the weeds anymore we are basically on the same page. Ultimately (and I realize this position may still be controversial) I think we need to move as far away from executable package manifests as possible. I.e. define metadata in one place, and, if needed, build extensions in another. As long as we are stuck asking the question “do we need to write an AST parser for reading <code>install_requires</code> information or should I run <code>python setup.py egg_info</code> and parse the resultant metadata?” we are going to be building these overly complicated workarounds just to get basic metadata.</p> <p>So what would we need to do in order to make that happen? 
What are the major barriers?</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_15' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/pradyunsg'><span itemprop='name'>pradyunsg</span></a> (Pradyun Gedam) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2019-11-14T07:33:45Z' class='post-time'> November 14, 2019, 7:33am </time> <meta itemprop='dateModified' content='2019-11-14T07:33:45Z'> <span itemprop='position'>15</span> </span> </div> <div class='post' itemprop='text'> <p>I’m saying, setuptools should do these shenanigans, to determine if the metadata from setup.py is “stable”. 
A field added to the metadata specification for declaring how sdists can have “stable” dependency data would be good to have too.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="1" /> <span class='post-likes'>1 Like</span> </div> <div class='crawler-linkback-list' itemscope itemtype='http://schema.org/ItemList'> <div itemprop='itemListElement' itemscope itemtype='http://schema.org/ListItem'> <a itemprop='url' href="https://discuss.python.org/t/what-do-we-want-in-standardized-sdists/3049/10">What do we want in standardized sdists?</a> <meta itemprop='position' content='1'> </div> </div> </div> <div id='post_16' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/bernatgabor'><span itemprop='name'>bernatgabor</span></a> (Bernát Gábor) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2020-01-23T11:35:29Z' class='post-time'> January 23, 2020, 11:35am </time> <meta itemprop='dateModified' content='2020-01-23T11:35:29Z'> <span itemprop='position'>16</span> </span> </div> <div class='post' itemprop='text'> <p>I personally would not ban out dynamic metadata… but introduce a field into our metadata where tools/people can define if some metadata is dynamic or not. When dynamic metadata is needed we can do <code>prepare_metadata_for_build_wheel</code>. 
I would impose though that the <code>prepare_metadata_for_build_wheel</code> must be stable on subsequent call… that is calling it on the same machine twice, one after another, should give the same metadata.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_17' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/pf_moore'><span itemprop='name'>pf_moore</span></a> (Paul Moore) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2020-01-23T12:15:47Z' class='post-time'> January 23, 2020, 12:15pm </time> <meta itemprop='dateModified' content='2020-01-23T12:15:47Z'> <span itemprop='position'>17</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote no-group" data-username="bernatgabor" data-post="16" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/bernatgabor/48/3003_2.png" class="avatar"> bernatgabor:</div> <blockquote> <p>I personally would not ban out dynamic metadata</p> </blockquote> </aside> <p>Agreed, it’s sometimes necessary. But probably only rarely. So, coming round full circle on this, what’s wrong with the following proposal:</p> <ol> <li>Tools that create sdists (setuptools, flit, etc) work out how they can tell if a given metadata item is “static”. That doesn’t need any standardisation, it’s just a question of their UI. 
For example, flit can probably say “everything is”, and setuptools can maybe say “everything from <code>setup.cfg</code> is as long as it’s not then modified via <code>setup.py</code>”. Worst case, tools could ask the user to say.</li> <li>We add a way for that information to be recorded in the sdist metadata. <a class="mention" href="/u/chrahunt">@chrahunt</a>’s original suggestion of <code>Metadata-Covers: X, Y, Z</code> seems reasonable, and we can bikeshed as much as we want (or can endure <img src="https://emoji.discourse-cdn.com/apple/wink.png?v=12" title=":wink:" class="emoji" alt=":wink:" loading="lazy" width="20" height="20">).</li> <li>Tools like pip start to rely on the data that’s marked as reliable.</li> </ol> <p>Step (2) is the only one that needs standardising. But all the benefits come from steps 1 and 3, so if the only blocker on achieving this is to agree on (2), then let’s focus on that. Is there anything <em>wrong</em> with the suggestion of <code>Metadata-Covers: &lt;list of items, or "all"&gt;</code>? Does anyone have any better name than <code>Metadata-Covers</code>?</p> <p>There is of course the other matter that the sdist format and the existence of <code>PKG-INFO</code> are not yet standardised. So there’s no standard for the decision in (2) to update at the moment. We could either make standardising sdists a pre-requisite for this discussion, or we could get something agreed, put it in place as an implementation-defined behaviour for now, and tackle standardising sdists separately. I’m inclined to do the latter (says the interop standards BDFL-delegate <img src="https://emoji.discourse-cdn.com/apple/slightly_smiling_face.png?v=12" title=":slightly_smiling_face:" class="emoji" alt=":slightly_smiling_face:" loading="lazy" width="20" height="20">), because it allows us to make faster progress.</p> <p>Thoughts? 
Is this too simplistic?</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="1" /> <span class='post-likes'>1 Like</span> </div> </div> <div id='post_18' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/bernatgabor'><span itemprop='name'>bernatgabor</span></a> (Bernát Gábor) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2020-01-23T12:50:24Z' class='post-time'> January 23, 2020, 12:50pm </time> <meta itemprop='dateModified' content='2020-01-23T12:50:24Z'> <span itemprop='position'>18</span> </span> </div> <div class='post' itemprop='text'> <p>I’m definitely on the side of someone first implementing a POC before making it a proposal. 
Then we just iterate on that POC to cover edge cases, and standardize that.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_19' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/pganssle'><span itemprop='name'>pganssle</span></a> (Paul Ganssle) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2020-01-23T15:29:42Z' class='post-time'> January 23, 2020, 3:29pm </time> <meta itemprop='dateModified' content='2020-01-23T15:29:42Z'> <span itemprop='position'>19</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote no-group" data-username="bernatgabor" data-post="16" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/bernatgabor/48/3003_2.png" class="avatar"> bernatgabor:</div> <blockquote> <p>I personally would not ban out dynamic metadata…</p> </blockquote> </aside> <aside class="quote group-committers" data-username="pf_moore" data-post="17" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/pf_moore/48/35_2.png" class="avatar"> pf_moore:</div> <blockquote> <p>Agreed, it’s sometimes necessary.</p> </blockquote> </aside> <p>Can you clarify in what circumstances it might be necessary? 
Are there certain fields where it’s going to be necessary?</p> <p>I suspect that <em>most</em> cases of dynamic metadata are people who don’t know better - they have environment-specific dependencies or something like that and don’t know about environment markers.</p> <p>The only situation I can think of where someone might legitimately want dynamic metadata would be if there’s a very specific environment that we don’t have any environment marker for and someone needs to do a workaround. Even in that case, we could almost certainly use a “conditional dependencies” mechanism that says, “Here are all the dependencies that definitely will be installed, here are some that depend on install-time conditions.” If that is necessary, it seems better than “metadata can be anything”.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_20' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/pf_moore'><span itemprop='name'>pf_moore</span></a> (Paul Moore) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2020-01-23T16:01:22Z' class='post-time'> January 23, 2020, 4:01pm </time> <meta itemprop='dateModified' content='2020-01-23T16:01:22Z'> <span itemprop='position'>20</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote group-committers" data-username="pganssle" data-post="19" data-topic="2620"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" 
src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/pganssle/48/245_2.png" class="avatar"> pganssle:</div> <blockquote> <p>Can you clarify in what circumstances it might be necessary?</p> </blockquote> </aside> <p>Sorry, terminology may be getting confused here. And I shouldn’t have casually made my comment on the back of a comment about “dynamic metadata”, I should have been clearer what <em>I</em> meant.</p> <p>I’m referring specifically to “if you take a sdist and look at its metadata, and then build a wheel from that sdist, you get different metadata”. That’s not (necessarily) because the metadata is calculated dynamically, but it’s effectively the same to pip - we can’t rely on the sdist metadata. This apparently does happen - see the initial post from <a class="mention" href="/u/chrahunt">@chrahunt</a>, which mentions that it’s a problem with <code>requests</code>.</p> <p>I don’t actually <em>care</em> whether we support dynamic metadata (even assuming there are use cases that need it - I don’t know if <a class="mention" href="/u/bernatgabor">@bernatgabor</a> had anything specific in mind). What I care about is sdist metadata being reliably<sup>1</sup> the same as what we’d get by building a wheel, so that when we have a sdist in pip, we can <em>use</em> that data in the early (resolver) stages, and skip a call to the build backend just to get metadata that in theory we already have.</p> <p>My impression was that the reason we couldn’t have that was because the metadata is calculated dynamically, and we can’t be sure it will still be the same at wheel-build time. 
But now that I check, I don’t see anything in the requests code that explains why the sdist doesn’t include <code>Requires-Dist</code> - is that actually just a bug (in setuptools or distutils or somewhere) that could be fixed?</p> <p>Maybe step 1 needs to be for someone to understand and clarify <em>why</em> the metadata in sdists is incomplete/unreliable?</p> <p><sup>1</sup> And by “reliably” I mean “we have a means to verify which values it’s OK to rely on”, not that it always has to be 100% accurate.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> </div> <div role='navigation' itemscope itemtype='http://schema.org/SiteNavigationElement' class="topic-body crawler-post"> <span itemprop='name'><b><a rel="next" itemprop="url" href="/t/why-isnt-source-distribution-metadata-trustworthy-can-we-make-it-so/2620?page=2">next page →</a></b></span> </div> </div> <footer class="container wrap"> <nav class='crawler-nav'> <ul> <li itemscope itemtype='http://schema.org/SiteNavigationElement'> <span itemprop='name'> <a href='/' itemprop="url">Home </a> </span> </li> <li itemscope itemtype='http://schema.org/SiteNavigationElement'> <span itemprop='name'> <a href='/categories' itemprop="url">Categories </a> </span> </li> <li itemscope itemtype='http://schema.org/SiteNavigationElement'> <span itemprop='name'> <a href='/guidelines' itemprop="url">Guidelines </a> </span> </li> <li itemscope itemtype='http://schema.org/SiteNavigationElement'> <span itemprop='name'> <a href='/tos' itemprop="url">Terms of Service </a> </span> </li> <li itemscope itemtype='http://schema.org/SiteNavigationElement'> <span itemprop='name'> <a href='/privacy' itemprop="url">Privacy Policy </a> </span> </li> </ul> </nav> <p class='powered-by-link'>Powered by <a 
href="https://www.discourse.org">Discourse</a>, best viewed with JavaScript enabled</p> </footer> </body> </html>