
<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <title>PEP 708 - Extending the Repository API to Mitigate Dependency Confusion Attacks - Standards - Discussions on Python.org</title> <meta name="description" content="Following on from the mapping file proposal, which spun off into a thread on pip’s issue tracker, I’ve written up a PEP that I think will allow us to ultimately solve dependency confusion attacks. It takes the approach o&hellip;"> <meta name="generator" content="Discourse 3.4.0.beta3-dev - https://github.com/discourse/discourse version 1f538a81a833a804e609df624d56e92919bc62f3"> <link rel="icon" type="image/png" href="https://global.discourse-cdn.com/flex016/uploads/python1/optimized/1X/9997f0605d56c4bfecd63594f52f42cdafd6b06a_2_32x32.png"> <link rel="apple-touch-icon" type="image/png" href="https://global.discourse-cdn.com/flex016/uploads/python1/optimized/1X/4c06143de7870c35963b818b15b395092a434991_2_180x180.png"> <meta name="theme-color" media="(prefers-color-scheme: light)" content="#ffffff"> <meta name="theme-color" media="(prefers-color-scheme: dark)" content="#111111"> <meta name="viewport" content="width=device-width, initial-scale=1.0, minimum-scale=1.0, user-scalable=yes, viewport-fit=cover"> <link rel="canonical" href="https://discuss.python.org/t/pep-708-extending-the-repository-api-to-mitigate-dependency-confusion-attacks/24179" /> <link rel="search" type="application/opensearchdescription+xml" href="https://discuss.python.org/opensearch.xml" title="Discussions on Python.org Search"> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/color_definitions_base__2_25fa1ee92854c4156785d82f8e4b273a11fbebb8.css?__ws=discuss.python.org" media="all" rel="stylesheet" class="light-scheme"/><link href="https://sea2.discourse-cdn.com/flex016/stylesheets/color_definitions_dark_1_2_c3e06cc91a0b7fd3c21e7c678ac0ebd6a501c657.css?__ws=discuss.python.org" media="(prefers-color-scheme: dark)" rel="stylesheet" class="dark-scheme"/> 
<link href="https://sea2.discourse-cdn.com/flex016/stylesheets/desktop_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="desktop" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/checklist_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="checklist" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-adplugin_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-adplugin" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-akismet_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-akismet" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-cakeday_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-cakeday" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-chat-integration_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-chat-integration" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-details_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-details" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-lazy-videos_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-lazy-videos" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-local-dates_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-local-dates" /> <link 
href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-math_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-math" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-narrative-bot_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-narrative-bot" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-policy_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-policy" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-presence_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-presence" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-solved_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-solved" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-templates_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-templates" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-topic-voting_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-topic-voting" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-user-notes_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-user-notes" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-yearly-review_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-yearly-review" /> <link 
href="https://sea2.discourse-cdn.com/flex016/stylesheets/footnote_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="footnote" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/hosted-site_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="hosted-site" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/poll_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="poll" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/spoiler-alert_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="spoiler-alert" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/discourse-topic-voting_desktop_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="discourse-topic-voting_desktop" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/poll_desktop_ba98439999d40b1cf90c6668baa0bb9a1bf3d3c7.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="poll_desktop" /> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/desktop_theme_4_03355c6ca7c9305adaaed722dba278b6ca0e3488.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="desktop_theme" data-theme-id="4" data-theme-name="unformatted code detector"/> <link href="https://sea2.discourse-cdn.com/flex016/stylesheets/desktop_theme_2_916f4a49a8bbba8cff9cf348e566f30de752b56b.css?__ws=discuss.python.org" media="all" rel="stylesheet" data-target="desktop_theme" data-theme-id="2" data-theme-name="light"/> <link rel="alternate nofollow" type="application/rss+xml" title="RSS feed of 'PEP 708 - Extending the Repository API to Mitigate Dependency Confusion Attacks'" 
href="https://discuss.python.org/t/pep-708-extending-the-repository-api-to-mitigate-dependency-confusion-attacks/24179.rss" /> <meta property="og:site_name" content="Discussions on Python.org" /> <meta property="og:type" content="website" /> <meta name="twitter:card" content="summary" /> <meta name="twitter:image" content="https://global.discourse-cdn.com/flex016/uploads/python1/original/1X/f93ff97c4f381b5e8add5a0c163b4ded29f20ed7.png" /> <meta property="og:image" content="https://global.discourse-cdn.com/flex016/uploads/python1/original/1X/f93ff97c4f381b5e8add5a0c163b4ded29f20ed7.png" /> <meta property="og:url" content="https://discuss.python.org/t/pep-708-extending-the-repository-api-to-mitigate-dependency-confusion-attacks/24179" /> <meta name="twitter:url" content="https://discuss.python.org/t/pep-708-extending-the-repository-api-to-mitigate-dependency-confusion-attacks/24179" /> <meta property="og:title" content="PEP 708 - Extending the Repository API to Mitigate Dependency Confusion Attacks" /> <meta name="twitter:title" content="PEP 708 - Extending the Repository API to Mitigate Dependency Confusion Attacks" /> <meta property="og:description" content="Following on from the mapping file proposal, which spun off into a thread on pip’s issue tracker, I’ve written up a PEP that I think will allow us to ultimately solve dependency confusion attacks. It takes the approach of extending the simple repository API and assumes that if we do that, then installers like pip will follow the recommendations which is what actually will prevent dependency confusion attacks. That’s now available as PEP 708, which you can view online shortly or I’ve included th..." /> <meta name="twitter:description" content="Following on from the mapping file proposal, which spun off into a thread on pip’s issue tracker, I’ve written up a PEP that I think will allow us to ultimately solve dependency confusion attacks. 
It takes the approach of extending the simple repository API and assumes that if we do that, then installers like pip will follow the recommendations which is what actually will prevent dependency confusion attacks. That’s now available as PEP 708, which you can view online shortly or I’ve included th..." /> <meta property="og:article:section" content="Packaging" /> <meta property="og:article:section:color" content="ED207B" /> <meta property="og:article:section" content="Standards" /> <meta property="og:article:section:color" content="ed76ab" /> <meta name="twitter:label1" value="Reading time" /> <meta name="twitter:data1" value="49 mins 🕑" /> <meta name="twitter:label2" value="Likes" /> <meta name="twitter:data2" value="113 ❤" /> <meta property="article:published_time" content="2023-02-23T15:51:02+00:00" /> <meta property="og:ignore_canonical" content="true" /> <link rel="next" href="/t/pep-708-extending-the-repository-api-to-mitigate-dependency-confusion-attacks/24179?page=2"> </head> <body class="crawler browser-update"> <header> <a href="/"> Discussions on Python.org </a> </header> <div id="main-outlet" class="wrap" role="main"> <div id="topic-title"> <h1> <a href="/t/pep-708-extending-the-repository-api-to-mitigate-dependency-confusion-attacks/24179">PEP 708 - Extending the Repository API to Mitigate Dependency Confusion Attacks</a> </h1> <div class="topic-category" itemscope itemtype="http://schema.org/BreadcrumbList"> <span itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem"> <a href="/c/packaging/std/35" class="badge-wrapper bullet" itemprop="item"> <span class='badge-category-bg' style='background-color: #ED207B'></span> <span class='badge-category clear-badge'> <span class='category-name' itemprop='name'>Packaging</span> </span> </a> <meta itemprop="position" content="1" /> </span> <span itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem"> <a href="/c/packaging/std/35" class="badge-wrapper bullet" 
itemprop="item"> <span class='badge-category-bg' style='background-color: #ed76ab'></span> <span class='badge-category clear-badge'> <span class='category-name' itemprop='name'>Standards</span> </span> </a> <meta itemprop="position" content="2" /> </span> </div> </div> <div itemscope itemtype='http://schema.org/DiscussionForumPosting'> <meta itemprop='headline' content='PEP 708 - Extending the Repository API to Mitigate Dependency Confusion Attacks'> <link itemprop='url' href='https://discuss.python.org/t/pep-708-extending-the-repository-api-to-mitigate-dependency-confusion-attacks/24179'> <meta itemprop='datePublished' content='2023-02-23T15:51:01Z'> <meta itemprop='articleSection' content='Standards'> <meta itemprop='keywords' content=''> <div itemprop='publisher' itemscope itemtype="http://schema.org/Organization"> <meta itemprop='name' content='Python Software Foundation'> <div itemprop='logo' itemscope itemtype="http://schema.org/ImageObject"> <meta itemprop='url' content='https://global.discourse-cdn.com/flex016/uploads/python1/original/1X/c7591c98caf3b31d4d9c6f322f41ed9d80a50800.png'> </div> </div> <div id='post_1' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/dstufft'><span itemprop='name'>dstufft</span></a> (Donald Stufft) </span> <link itemprop="mainEntityOfPage" href="https://discuss.python.org/t/pep-708-extending-the-repository-api-to-mitigate-dependency-confusion-attacks/24179"> <span class="crawler-post-infos"> <time datetime='2023-02-23T15:51:02Z' class='post-time'> February 23, 2023, 3:51pm </time> <meta itemprop='dateModified' content='2024-01-04T04:27:52Z'> <span itemprop='position'>1</span> </span> </div> <div class='post' itemprop='text'> <p>Following on from the <a href="https://discuss.python.org/t/proposal-preventing-dependency-confusion-attacks-with-the-map-file/23414/">mapping file 
proposal</a>, which spun off into a thread on <a href="https://github.com/pypa/pip/issues/11784">pip’s issue tracker</a>, I’ve written up a PEP that I think will allow us to ultimately solve dependency confusion attacks. It takes the approach of extending the simple repository API, and assumes that if we do that, installers like pip will follow its recommendations, which is what will actually prevent dependency confusion attacks.</p> <p>That’s now available as PEP 708, which you can <a href="https://peps.python.org/pep-0708/">view online</a> shortly; I’ve also included the text below.</p> <p>Since I’m the standing PEP Delegate for PyPI and I can’t delegate my own PEP, I’ve asked <a class="mention" href="/u/pf_moore">@pf_moore</a> to be the delegate for this in my stead, and he has graciously accepted. Having Paul ultimately decide on this PEP should also act as a good check that pip would be willing to implement the non-normative recommendations in this PEP.</p> <p>Currently there’s one major open question. In the <a href="https://docs.google.com/document/d/184fQkb6NggVQfYmjTDA7p_U3iWDKk6grc2DigT1X3Es/">original draft</a> proposal that ultimately became this PEP, I specified a “repository file”, a rough idea for how pip might allow users to explicitly control which repositories a particular package comes from. For now I have omitted that from the PEP, and instead just handwaved that installers should provide some method of doing this.</p> <p>The question is: do we think that handwaving it and expecting installers to come up with their own independent mechanisms is the right answer?
Should we specify a file format that installers can/should support to allow better interoperability between different installers, and if so, does the repository file approach from my original proposal look like a good starting point?</p> <p>Thanks to everyone who has already participated in this discussion, and I look forward to pushing this further!</p> <hr> <details> <summary>PEP 708</summary> <pre><code class="lang-auto">PEP: 708
Title: Extending the Repository API to Mitigate Dependency Confusion Attacks
Author: Donald Stufft &lt;donald@stufft.io&gt;
PEP-Delegate: Paul Moore &lt;p.f.moore@gmail.com&gt;
Discussions-To: https://discuss.python.org/t/24179
Status: Draft
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 20-Feb-2023
Post-History: `01-Feb-2023 &lt;https://discuss.python.org/t/23414/&gt;`__,
              `23-Feb-2023 &lt;https://discuss.python.org/t/24179&gt;`__


Abstract
========

Dependency confusion attacks, in which a malicious package is installed
instead of the one the user expected, are an `increasingly common supply chain
threat &lt;https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610&gt;`__.
Most such attacks against Python dependencies, including the `recent PyTorch
incident &lt;https://pytorch.org/blog/compromised-nightly-dependency/&gt;`_, occur
with multiple package repositories, where a dependency expected to come from
one repository (e.g. a custom index) is installed from another (e.g. PyPI).

To help address this problem, this PEP proposes extending the
:ref:`Simple Repository API &lt;packaging:simple-repository-api&gt;` to allow
repository operators to indicate that a project found on their repository
"tracks" a project on a different repository, and to allow projects to extend
their namespaces across multiple repositories.

These features will allow installers to determine when a project being made
available from a particular mix of repositories is expected and should be
allowed, and when it is not and should halt the install with an error to
protect the user.


Motivation
==========

There is a long-standing class of attacks called "dependency confusion"
attacks, which roughly boil down to a user expecting to get package ``A`` but
getting ``B`` instead. In Python, this almost always happens due to the
configuration of multiple repositories (possibly including the default of
PyPI), where the user expected package ``A`` to come from repository ``X``,
but someone was able to publish package ``B`` to repository ``Y`` under the
same name.

Dependency confusion attacks have long been possible, but they've recently
gained press with `public examples of cases where these attacks were
successfully executed
&lt;https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610&gt;`__.

A specific example of this is the recent case where the PyTorch project had an
internal package named ``torchtriton`` which was only ever intended to be
installed from their repositories located at ``https://download.pytorch.org/``.
That repository was designed to be used in conjunction with PyPI, and the name
``torchtriton`` was not claimed on PyPI, which allowed the attacker to use that
name and publish a malicious version.

There are a number of ways to mitigate these attacks today, but they all
require that the end user go out of their way to protect themselves, rather
than being protected by default. This means that the vast majority of users
are likely to remain vulnerable, even if they are ultimately aware of these
types of attacks.

Ultimately, the underlying cause of these attacks comes from the fact that
there is no globally unique namespace that all Python package names come from.
Instead, each repository is its own distinct namespace, and when given an
"abstract" name such as ``spam`` to install, an installer has to implicitly
turn that into a "concrete" name such as ``pypi.org:spam`` or
``example.com:spam``. Currently the standard behavior in Python installation
tools is to implicitly flatten these multiple namespaces into one that contains
the files from all namespaces. This assumption that collapsing the namespaces
is what was expected means that when packages with the same name in different
repositories are authored by different parties (such as in the ``torchtriton``
case), dependency confusion attacks become possible.

This is made particularly tricky by the fact that there is no "right" answer;
there are valid use cases both for wanting two repositories merged into one
namespace *and* for wanting two repositories to be treated as distinct
namespaces. This means that an installer needs some mechanism by which to
determine when it should merge the namespaces of multiple repositories and when
it should not, rather than a blanket "always merge" or "never merge" rule.

This functionality could be pushed directly to the end user, since ultimately
the end user is the person whose expectations of what gets installed from which
repository actually matter. However, by extending the repository specification
to allow a repository to indicate when merging is safe, we can enable
individual projects and repositories to "work by default", even when a project
naturally spans multiple distinct namespaces, while maintaining the ability for
an installer to be secure by default.

On its own, this PEP does not solve dependency confusion attacks, but it does
provide enough information for installers to prevent them without causing too
much collateral damage to otherwise valid and safe use cases.


Rationale
=========

There are two broad use cases for merging names across repositories that this
PEP seeks to enable.
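
Before turning to those use cases, the implicit flattening described in the
Motivation can be shown in miniature. The following is a hypothetical toy
model, not code from pip or any other installer: the repository URLs, version
numbers, and helper names are invented for illustration, and versions are
simplified to tuples. Once the two namespaces are merged, a "pick the best
version" rule has nothing left that records who published each candidate.

```python
# Hypothetical toy model of implicit namespace flattening; not any
# installer's real code. URLs, versions, and names are illustrative only.
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    repository: str  # repository the file was discovered on
    name: str        # the "abstract" project name
    version: tuple   # simplified version, e.g. (1, 0)


def flatten(repositories):
    """Merge every repository's listing into one namespace keyed by name."""
    merged = {}
    for repo_url, listing in repositories.items():
        for name, versions in listing.items():
            for version in versions:
                merged.setdefault(name, []).append(Candidate(repo_url, name, version))
    return merged


def pick_best(candidates):
    """Pick the highest version, ignoring which repository it came from."""
    return max(candidates, key=lambda c: c.version)


repositories = {
    # The name was only ever meant to come from the internal index...
    "https://example.com/simple/": {"spam": [(1, 0), (1, 1)]},
    # ...but another party registered it on a public index with a higher version.
    "https://pypi.org/simple/": {"spam": [(99, 0)]},
}

best = pick_best(flatten(repositories)["spam"])
print(best.repository, best.version)  # https://pypi.org/simple/ (99, 0)
```

Nothing in the merged list distinguishes the owners of the two ``spam``
projects; that missing signal is what the metadata in this PEP is meant to
supply.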

The first use case is when one repository is not defining its own names, but
rather is extending names defined in another repository. This commonly happens
when a project is being mirrored from one repository to another (see
`Bandersnatch &lt;https://pypi.org/project/bandersnatch/&gt;`__) or when a
repository is providing supplementary artifacts for a specific platform (see
`Piwheels &lt;https://www.piwheels.org/&gt;`__). In this case, neither the
repository nor the projects that are being extended may have any knowledge that
they are being extended or by whom, so this cannot rely on any information that
isn't present in the "extending" repository itself.

The second use case is when the project wants to publish to one "main"
repository, but then have additional repositories that provide binaries for
additional platforms, GPUs, CPUs, etc. Currently, wheel tags cannot express
these types of binary compatibility well enough, so projects that need these
distinctions are forced to set up multiple repositories and have their users
manually configure them to get the correct binaries for their platform, GPU,
CPU, etc.

This use case is similar to the first, but the important difference that makes
it a distinct use case on its own is who is providing the information and what
their level of trust is.

When a user configures a specific repository (or relies on the default), there
is no ambiguity as to which repository they mean. A repository is identified by
a URL, and through the domain system, URLs are globally unique identifiers.
This lack of ambiguity means that an installer can assume that the repository
operator is trustworthy and can trust metadata that they provide without
needing to validate it.

On the flip side, given an installer finds a name in multiple repositories, it
is ambiguous which of them the installer should trust.
This ambiguity means that an installer cannot assume that the project owner on
either repository is trustworthy; it needs to validate that they are indeed the
same project and that one isn't a dependency confusion attack.

Without some way for the installer to validate the metadata between multiple
repositories, projects would be forced to become repository operators to
safely support this use case. That wouldn't be a particularly wrong choice to
make; however, there is a danger that if we don't provide a way for
repositories to let project owners express this relationship safely,
repositories will be incentivized to let project owners use the repository
operator's metadata instead, which would reintroduce the original insecurity.


Specification
=============

This specification defines the changes in version 1.2 of the simple repository
API, adding two new metadata items: Repository "Tracks" and "Alternate
Locations".


Repository "Tracks" Metadata
----------------------------

To enable one repository to extend another, this PEP allows the extending
repository to declare that it "tracks" another repository by adding the URL of
the corresponding project on the repository that it is extending. This is
exposed in JSON as the key ``meta.tracks`` and in HTML as a meta element named
``pypi:tracks``.

There are a few key properties that **MUST** be preserved when using this
metadata:

- It **MUST** be under the control of the repository operators themselves, not
  any individual publisher using that repository.

- It **MUST** represent the same "project" as the project at the referenced
  URL.

  - This does not mean that it needs to serve the same files. It is valid for
    it to include binaries built on different platforms, copies with local
    patches being applied, etc. This is purposefully left vague, as what
    exactly constitutes the "same" project is ultimately up to the
    expectations that users have of the repository and its operators.

- It **MUST** point to the repository that "owns" the namespace, not another
  repository that is also tracking that namespace.

- It **MUST** point to a project with the exact same name (after
  normalization).

- It **MUST** point to the actual URL for that project, not the base URL for
  the extended repository.

It is **NOT** required that every name in a repository tracks the same
repository, or that they all track a repository at all. Mixed-use repositories
where some names track a repository and some names do not are explicitly
allowed.


JSON
~~~~

.. code-block:: JSON

    {
      "meta": {
        "api-version": "1.2",
        "tracks": "https://pypi.org/simple/holygrail/"
      },
      "name": "holygrail",
      "files": [
        {
          "filename": "holygrail-1.0.tar.gz",
          "url": "https://example.com/files/holygrail-1.0.tar.gz",
          "hashes": {"sha256": "...", "blake2b": "..."},
          "requires-python": ">=3.7",
          "yanked": "Had a vulnerability"
        },
        {
          "filename": "holygrail-1.0-py3-none-any.whl",
          "url": "https://example.com/files/holygrail-1.0-py3-none-any.whl",
          "hashes": {"sha256": "...", "blake2b": "..."},
          "requires-python": ">=3.7",
          "dist-info-metadata": true
        }
      ]
    }


HTML
~~~~

.. code-block:: HTML

    &lt;!DOCTYPE html&gt;
    &lt;html&gt;
      &lt;head&gt;
        &lt;meta name="pypi:repository-version" content="1.2"&gt;
        &lt;meta name="pypi:tracks" content="https://pypi.org/simple/holygrail/"&gt;
      &lt;/head&gt;
      &lt;body&gt;
        &lt;a href="https://example.com/files/holygrail-1.0.tar.gz#sha256=..."&gt;holygrail-1.0.tar.gz&lt;/a&gt;
        &lt;a href="https://example.com/files/holygrail-1.0-py3-none-any.whl#sha256=..."&gt;holygrail-1.0-py3-none-any.whl&lt;/a&gt;
      &lt;/body&gt;
    &lt;/html&gt;


"Alternate Locations" Metadata
------------------------------

To enable a project to extend its namespace across multiple repositories, this
PEP allows a project owner to declare a list of "alternate locations" for their
project. This is exposed in JSON as the key ``alternate-locations`` and in HTML
as a meta element named ``pypi:alternate-locations``, which may be used multiple
times.
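
Both keys are ordinary JSON values, so reading them needs only the standard
library. The sketch below is illustrative rather than taken from any
installer: the helper names are hypothetical, and it assumes a PyPI-style URL
layout in which the project name is the last path segment of the project URL.
Its name check covers two of the ``tracks`` requirements listed earlier (the
same name after normalization, and the project URL rather than the
repository's base URL), using the name normalization rule from PEP 503.

```python
import json
import re
from urllib.parse import urlsplit


def normalize(name):
    # Project name normalization rule from the simple repository API (PEP 503).
    return re.sub(r"[-_.]+", "-", name).lower()


def read_pep708_metadata(body):
    """Return (tracks, alternate_locations) from a JSON project page.

    Hypothetical helper. Both keys are optional, so a page may use neither,
    one, or both of them.
    """
    page = json.loads(body)
    tracks = page.get("meta", {}).get("tracks")
    alternates = list(page.get("alternate-locations", []))
    return tracks, alternates


def tracks_name_matches(page_name, tracks_url):
    """Check that ``tracks`` names a project URL with the same normalized name.

    Assumption: the last non-empty path segment is the project name, as in
    https://pypi.org/simple/holygrail/. A base URL such as
    https://pypi.org/simple/ therefore fails the check.
    """
    segments = [s for s in urlsplit(tracks_url).path.split("/") if s]
    return bool(segments) and normalize(segments[-1]) == normalize(page_name)


example = (
    '{"meta": {"api-version": "1.2", "tracks": "https://pypi.org/simple/holygrail/"},'
    ' "name": "holygrail", "files": []}'
)
tracks, alternates = read_pep708_metadata(example)
print(tracks)                                                        # https://pypi.org/simple/holygrail/
print(alternates)                                                    # []
print(tracks_name_matches("HolyGrail", tracks))                      # True
print(tracks_name_matches("holygrail", "https://pypi.org/simple/"))  # False
```

A real client would also need the HTML form of these keys and the
alternate-locations rules that follow; this sketch only covers reading the
JSON values and the name check.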

There are a few key properties that **MUST** be observed when using this
metadata:

- In order for this metadata to be trusted, there **MUST** be agreement between
  all locations where that project is found as to what the alternate locations
  are.

- When using alternate locations, clients **MUST** implicitly assume that the
  URL the response was fetched from was included in the list. This means that
  if you fetch from ``https://pypi.org/simple/foo/`` and it has an
  ``alternate-locations`` metadata that has the value
  ``["https://example.com/simple/foo/"]``, then you **MUST** treat it as if it
  had the value
  ``["https://example.com/simple/foo/", "https://pypi.org/simple/foo/"]``.

- The order of the elements within the array does not have any particular
  meaning.

When an installer encounters a project that is using the alternate locations
metadata, it **SHOULD** consider that all repositories named are extending the
same namespace across multiple repositories.


JSON
~~~~

.. code-block:: JSON

    {
      "meta": {
        "api-version": "1.2"
      },
      "name": "holygrail",
      "alternate-locations": ["https://pypi.org/simple/holygrail/", "https://test.pypi.org/simple/holygrail/"],
      "files": [
        {
          "filename": "holygrail-1.0.tar.gz",
          "url": "https://example.com/files/holygrail-1.0.tar.gz",
          "hashes": {"sha256": "...", "blake2b": "..."},
          "requires-python": ">=3.7",
          "yanked": "Had a vulnerability"
        },
        {
          "filename": "holygrail-1.0-py3-none-any.whl",
          "url": "https://example.com/files/holygrail-1.0-py3-none-any.whl",
          "hashes": {"sha256": "...", "blake2b": "..."},
          "requires-python": ">=3.7",
          "dist-info-metadata": true
        }
      ]
    }


HTML
~~~~

.. code-block:: HTML

    &lt;!DOCTYPE html&gt;
    &lt;html&gt;
      &lt;head&gt;
        &lt;meta name="pypi:repository-version" content="1.2"&gt;
        &lt;meta name="pypi:alternate-locations" content="https://pypi.org/simple/holygrail/"&gt;
        &lt;meta name="pypi:alternate-locations" content="https://test.pypi.org/simple/holygrail/"&gt;
      &lt;/head&gt;
      &lt;body&gt;
        &lt;a href="https://example.com/files/holygrail-1.0.tar.gz#sha256=..."&gt;holygrail-1.0.tar.gz&lt;/a&gt;
        &lt;a href="https://example.com/files/holygrail-1.0-py3-none-any.whl#sha256=..."&gt;holygrail-1.0-py3-none-any.whl&lt;/a&gt;
      &lt;/body&gt;
    &lt;/html&gt;


Recommendations
===============

This section is non-normative; it provides recommendations to installers on how
to interpret this metadata that this PEP feels provide the best tradeoff between
protecting users by default and minimizing breakage of existing workflows.
These recommendations are not binding, and installers are free to ignore them,
or apply them selectively as they make sense in their specific situations.


File Discovery Algorithm
------------------------

.. note::

    This algorithm is written based on how pip currently discovers files;
    other installers may adapt this based on their own discovery procedures.

Currently the "standard" file discovery algorithm looks something like this:

1. Generate a list of all files across all configured repositories.
2. Filter out any files that do not match known hashes from a lockfile or
   requirements file.
3. Filter out any files that do not match the current platform, Python
   version, etc.
4. Pass that list of files into the resolver where it will attempt to resolve
   the "best" match out of those files, irrespective of which repository it
   came from.

It is recommended that installers change their file discovery algorithm to take
into account the new metadata, and instead do:

1. Generate a list of all files across all configured repositories.
2. Filter out any files that do not match known hashes from a lockfile or
   requirements file.
3. If the end user has explicitly told the installer to fetch the project from
   specific repositories, filter out all other repositories and skip to 5.
4. Look to see if the discovered files span multiple repositories; if they do,
   then determine if either "Tracks" or "Alternate Locations" metadata allows
   safely merging *ALL* of the repositories where files were discovered
   together. If that metadata does **NOT** allow that, then generate an error;
   otherwise, continue.

   - **Note:** This only applies to *remote* repositories; repositories that
     exist on the local filesystem **SHOULD** always be implicitly allowed to
     be merged to any remote repository.

5. Filter out any files that do not match the current platform, Python version,
   etc.
6. Pass that list of files into the resolver where it will attempt to resolve
   the "best" match out of those files, irrespective of which repository it
   came from.

This is somewhat subtle, but the key things in the recommendation are:

- Users who are using lock files or requirements files that include specific
  hashes of artifacts that are "valid" are assumed to be protected by nature of
  those hashes, since the rest of these recommendations would apply during hash
  generation. Thus, we filter out unknown hashes up front.
- If the user has explicitly told the installer that it wants to fetch a
  project from a certain set of repositories, then there is no reason to
  question that, and we assume that they've made sure it is safe to merge those
  namespaces.
- If the project in question only comes from a single repository, then there is
  no chance of dependency confusion, so there's no reason to do anything but
  allow it.
- We check for the metadata in this PEP before filtering out based on platform,
  Python version, etc., because we don't want errors that only show up on
  certain platforms, Python versions, etc.
- If nothing tells us merging the namespaces is safe, we refuse to implicitly
  assume it is, and generate an error instead.
- Otherwise, we merge the namespaces and continue on.

This algorithm ensures that an installer never assumes that two disparate namespaces can be flattened into one, which for all practical purposes eliminates the possibility of any kind of dependency confusion attack, while still giving people throughout the stack a safe way to explicitly declare when those disparate namespaces are actually one logical namespace that can be safely merged.

The above algorithm is mostly a conceptual model. In reality the algorithm may end up being slightly different in order to be more privacy preserving and faster, or simply adapted to fit a specific installer better.

Explicit Configuration for End Users
------------------------------------

This PEP avoids dictating or recommending a specific mechanism by which an installer allows an end user to configure exactly which repositories they want a specific package to be installed from. However, it does recommend that installers provide *some* mechanism for end users to supply that configuration, as without it users can end up in a DoS situation in cases like ``torchtriton``, where they're completely broken unless they resolve the namespace collision externally (get the name taken down on one repository, stand up a personal repository that handles the merging, etc.).

This configuration also allows end users to pre-emptively secure themselves during what is likely to be a long transition until the default behavior is safe.

How to Communicate This
=======================

.. note::

    This example is pip specific and assumes specifics about how pip will choose to implement this PEP; it's included as an example of how we can communicate this change, and is not intended to constrain pip or any other installer in how they implement this. This may ultimately be the actual basis for communication, and if so will need to be edited for accuracy and clarity.
    This section should be read as if it were an entire "post" announcing this change, suitable for a blog post, email, or Discourse post.

There's a long-standing class of attacks called "dependency confusion" attacks, which roughly boil down to an individual expecting to get package ``A`` but getting package ``B`` instead. In Python, this almost always happens because the end user has configured multiple repositories, where they expect package ``A`` to come from repository ``X``, but someone is able to publish package ``B`` with the same name as package ``A`` in repository ``Y``.

There are a number of ways to mitigate these attacks today, but they all require the end user to explicitly go out of their way to protect themselves, rather than the default behavior being inherently safe.

In an effort to secure pip's users and protect them from these types of attacks, we will be changing how pip discovers packages to install.

What is Changing?
-----------------

When pip discovers that the same project is available from multiple remote repositories, by default it will generate an error and refuse to proceed rather than guess which repository is the correct one to install from.

Projects that natively publish to multiple repositories will be given the ability to safely "link" their repositories together so that pip does not error when those repositories are used together.

End users of pip will be given the ability to explicitly define one or more repositories that are valid for a specific project, causing pip to only consider those repositories for that project and avoiding the error altogether.

See TBD for more information.

Who is Affected?
----------------

Users who are installing from multiple remote (i.e. not present on the local filesystem) repositories may be affected by having pip error instead of successfully installing if:

- They install a project where the same "name" is being served by multiple remote repositories.
- The project name that is available from multiple remote repositories has not used one of the defined mechanisms to link those repositories together.
- The user invoking pip has not used the defined mechanism to explicitly control which repositories are valid for a particular project.

Users who are not using multiple remote repositories will not be affected at all; this includes users who are only using a single remote repository plus a local filesystem "wheel house".

What do I need to do?
---------------------

As a pip User?
~~~~~~~~~~~~~~

If you're using only a single remote repository, you do not have to do anything.

If you're using multiple remote repositories, you can opt into the new behavior by adding ``--use-feature=TBD`` to your pip invocation to see if any of your dependencies are being served from multiple remote repositories. If they are, you should audit them to determine why, and what the best remediation step will be for you.

Once this behavior becomes the default, you can temporarily opt out of it by adding ``--use-deprecated=TBD`` to your pip invocation.

If you're using projects that are not hosted on a public repository, but you still have the public repository as a fallback, consider configuring pip with a repository file that states explicitly where that dependency is meant to come from, so that someone registering that name on the public repository cannot cause pip to error for you.

As a Project Owner?
~~~~~~~~~~~~~~~~~~~

If you only publish your project to a single repository, then you do not have to do anything.

If you publish your project to multiple repositories that are intended to be used together at the same time, configure all repositories to serve the alternate repository metadata to prevent breakages for your end users.
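
To illustrate why every repository should serve this metadata, here is a minimal sketch of how an installer might implement step 4 of the recommended file discovery algorithm using only the alternate locations metadata. It is not pip's implementation: ``is_local`` and ``can_merge`` are hypothetical names, "Tracks" metadata and explicit end-user configuration are left out, URLs are compared verbatim without normalization, and it *assumes* that a merge is allowed only when each remote repository where files were found lists every other one.

.. code-block:: python

    from __future__ import annotations

    from urllib.parse import urlparse


    def is_local(url: str) -> bool:
        # Repositories on the local filesystem are always mergeable
        # (see the note on step 4 of the recommended algorithm).
        return urlparse(url).scheme == "file"


    def can_merge(alternates: dict[str, set[str]]) -> bool:
        """Hypothetical step-4 check for a single project.

        ``alternates`` maps each repository page URL where files were found
        to the set of URLs listed in that page's ``pypi:alternate-locations``
        tags. ASSUMPTION: merging is allowed only if every remote repository
        lists every other remote repository.
        """
        remote = {url for url in alternates if not is_local(url)}
        if len(remote) <= 1:
            # At most one remote repository: nothing to confuse.
            return True
        return all(remote - {url} <= alternates[url] for url in remote)


    PYPI = "https://pypi.org/simple/holygrail/"
    TEST = "https://test.pypi.org/simple/holygrail/"
    WHEELHOUSE = "file:///srv/wheels/holygrail/"  # hypothetical local path

    print(can_merge({PYPI: {PYPI, TEST}, TEST: {PYPI, TEST}}))  # True
    print(can_merge({PYPI: {PYPI, TEST}, TEST: set()}))  # False: error
    print(can_merge({PYPI: set(), WHEELHOUSE: set()}))  # True

Under this assumption, a single repository that omits the metadata is enough to turn a working install into an error, while a local "wheel house" never blocks the merge.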
If you publish your project to a single repository, but it is commonly used in conjunction with other repositories, consider preemptively registering your names with those repositories to prevent a third party from being able to make your users' ``pip install`` invocations start failing. This may not be possible if your project name is too generic or if the repositories have policies that prevent defensive name squatting.

As a Repository Operator?
~~~~~~~~~~~~~~~~~~~~~~~~~

You'll need to decide how you intend your repository to be used by your end users.

For private repositories that host private projects, it is recommended that you mirror the public projects that your users depend on into your own repository, taking care not to let a public project merge with a private project, and tell your users to use the ``--index-url`` option to use only your repository.

For public repositories that host public projects, you should implement the alternate repository mechanism and enable the owners of those projects to configure the list of repositories that their project is available from, if they make it available from more than one repository.

For public repositories that "track" another repository, but provide supplemental artifacts such as wheels built for a specific platform, you should implement the "tracks" metadata for your repository. However, this information **MUST NOT** be settable by end users who are publishing projects to your repository.

See TBD for more information.

Rejected Ideas
==============

*Note: Some of these are somewhat specific to pip, but any solution that doesn't work for pip isn't a particularly useful solution.*

Implicitly allow mirrors when the list of files is the same
-----------------------------------------------------------

If every repository returns the exact same list of files, then it is safe to consider those repositories to be the same namespace and implicitly merge them.
This would possibly mean that mirrors would be automatically allowed without any work on any user's or repository operator's part.

Unfortunately, this has two failings that make it undesirable:

- It only solves the case of mirrors that are exact copies of each other, not repositories that "track" another one; the "tracks" metadata ends up being the more general solution.
- Even in the case of exact mirrors, multiple repositories mirroring each other form a distributed system that will not always be fully consistent, effectively an eventually consistent system. This means that repositories that relied on this implicit heuristic to work would have sporadic failures due to drift between the source repository and the mirror repositories.

Provide a mechanism to order the repositories
---------------------------------------------

Providing some mechanism to give the repositories an order, and then short-circuiting the discovery algorithm when it finds the first repository that provides files for that project, is another workable solution that is safe if the order is specified correctly.

However, this has been rejected for a number of reasons:

- We've spent 15+ years educating users that the order in which repositories are specified is not meaningful, and that they effectively have an undefined order. It would be difficult to backpedal on that and start saying that order now matters.
- Users can easily rearrange the order in which they specify their repositories within a single location, but when loading repositories from multiple locations (environment variables, configuration files, requirements files, CLI arguments) the order is hard-coded into pip. While it would be a deterministic and documented order, there's no reason to assume it's the order in which the user wants their repositories to be defined, forcing them to contort how they configure pip so that the implicit ordering ends up being the correct one.
- The above can be mitigated by providing a way to explicitly declare the order, rather than implicitly using the order in which repositories were defined; however, that means that the protections are not provided unless the user does some explicit configuration.
- Ordering assumes that one repository is *always* preferred over another repository, without any way to decide on a project-by-project basis.
- Relying on ordering is subtle; if I look at an ordering of repositories, I have no way of knowing or ensuring in advance which names are going to come from which repositories. I can only know, at that moment, which names are provided by which repositories.
- Relying on ordering is fragile. There's no reason to assume that two disparate repositories are not going to have random naming collisions; what happens if I'm using a library from a lower-priority repository and then a higher-priority repository happens to start serving a colliding name?
- In cases where ordering does the wrong thing, it does so silently, with no feedback given to the user. This is by design: it doesn't actually know what the right or wrong thing is; it's just hoping that order will give the right thing, and if it does, users are protected without any breakage. However, when it does the wrong thing, users are left with very confusing behavior from pip, where it silently installs the wrong thing.

There is a variant of this idea which effectively says that the real problems are caused by PyPI's open registration of names, so if we treat all repositories except the "default" one as equal priority, and treat the default one as lower priority, then we'll fix things. That is true in that it does improve things, but it has many of the same problems as the general ordering idea (though not all of them). It also assumes that PyPI, or whatever repository is configured as the "default", is the only repository with open registration of names.
However, projects like `Piwheels <https://www.piwheels.org/>`_, which users are expected to use in addition to PyPI, also effectively have open registration of names, since Piwheels tracks whatever names are registered on PyPI.

Rely on repository proxies
--------------------------

One possible solution is, instead of having the installer solve this, to depend on repository proxies that can intelligently and safely merge multiple repositories. This could provide a better experience for people with complex needs, because they can have configuration and features that are dedicated to the problem space.

However, that has been rejected because:

- It requires users to opt into using them, unless we also remove the facilities in installers for configuring more than one repository, forcing users to use a repository proxy when they need multiple repositories.

  - Removing facilities to have more than one repository configured has been rejected because it would be too disruptive to end users.

- A user may need different outcomes of merging multiple repositories in different contexts, or may need to merge different, mutually exclusive repositories. This means they'll need to set up a separate repository proxy for each unique set of options.
- It requires users to maintain infrastructure, or it requires adding features to installers to automatically spin up a repository for each invocation.
- It doesn't remove the need for a solution to these problems; it just shifts the responsibility of implementation from installers to some repository proxy, and in either case we still need something that figures out how to merge these disparate namespaces.
- Ultimately, most users do not want to have to stand up a repository proxy just to safely interact with multiple repositories.
Rely only on hash checking
--------------------------

Another possible solution is to rely on hash checking, since with hash checking enabled, users cannot get an artifact that they didn't expect; it doesn't matter whether the namespaces are incorrectly merged or not.

This is certainly a solution; unfortunately, it also suffers from problems that make it unworkable:

- It requires users to opt in to it, so users are still unprotected by default.
- It requires users to do a lot of work to manage their hashes, which most users are unlikely to be willing to do.
- It is difficult and verbose to get the protection when users are not using a ``requirements.txt`` file as the source of their dependencies (this affects build-time dependencies and dependencies provided at the command line).
- It only partially solves the problem; in a way, it just shifts responsibility for the problem to whatever system generates the hashes that the installer would use. If that system isn't a human manually validating hashes, which is unlikely, then we've just shifted the question of how to merge these namespaces to whatever tool maintains the hashes.

Require all projects to exist in the "default" repository
---------------------------------------------------------

Another idea is that we can narrow the scope of ``--extra-index-url`` such that its only supported use is to refer to repositories that supplement the default repository, effectively saying that the default repository defines the namespace, and every additional repository just extends it with extra packages.

The implementation of this would roughly be to require that the project **MUST** be registered with the default repository in order for any additional repositories to work.
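
A minimal sketch of one reading of this rule may make its trade-offs easier to see. It assumes a hypothetical ``served`` mapping from each repository URL to the project names it serves, and a hypothetical ``repositories_for`` helper; neither corresponds to an API in pip or any other installer.

.. code-block:: python

    from __future__ import annotations


    def repositories_for(
        project: str, default: str, served: dict[str, set[str]]
    ) -> list[str]:
        """Sketch of the REJECTED rule: extra repositories only count for
        projects whose name is registered on the default repository."""
        if project not in served.get(default, set()):
            # Not registered on the default repository: extra repositories
            # are ignored, so the project cannot be installed at all.
            return []
        # Registered: every repository serving the name is flattened into one
        # namespace on the strength of name equality alone.
        return [url for url, names in served.items() if project in names]


    DEFAULT = "https://pypi.org/simple/"
    INTERNAL = "https://pypi.internal.example/simple/"  # hypothetical
    served = {
        DEFAULT: {"acme-utils"},  # an unrelated public project
        INTERNAL: {"acme-utils", "acme-billing"},  # private projects
    }

    print(repositories_for("acme-billing", DEFAULT, served))  # []
    print(repositories_for("acme-utils", DEFAULT, served))
    # ['https://pypi.org/simple/', 'https://pypi.internal.example/simple/']

The first call shows an internal-only project that cannot be installed until its name is defensively registered on the default repository; the second shows an accidental name collision being silently merged into one namespace.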
This sort of works if you successfully narrow the scope in that way, but ultimately it has been rejected because:

- Users are unlikely to understand or accept this reduced scope, and thus are likely to continue using it in the now-unsupported fashion.

  - This is complicated by the fact that, with the scope now narrowed, users who have the excluded workflow no longer have any alternative besides setting up a repository proxy, which takes infrastructure and effort that they previously didn't need.

- It assumes that just because a name in an "extra" repository is the same as in the default repository, they are the same project. If we were starting from scratch in a brand-new ecosystem then maybe we could make this assumption from the start and make it stick, but it's going to be incredibly difficult to get the ecosystem to adjust to that change.

  - This is a fundamental issue with this approach; the underlying problem that drives dependency confusion is that we're taking disparate namespaces and flattening them into one. This approach essentially just declares that to be OK, and attempts to mitigate it by requiring everyone to register their names.

- Because of the above assumption, in cases where a name in an extra repository collides by accident with the default repository, it's going to appear to work for those users, but they are going to be silently in a state of dependency confusion.

  - This is made worse by the fact that the person who owns the name that is allowing this to work is going to be completely unaware of the role that they're playing for that user, and might delete their project or hand it off to someone else, potentially allowing a malicious user to take it over.

- Users are likely to attempt to get back to a working state by registering their names in their default repository as a defensive name squat. Their ability to do this will depend on the specific policies of their default repository, whether someone already has that name, whether it's too generic, etc. In the best case, it will create needless placeholder projects that serve no purpose other than to secure some internal use of a name.

Move to Globally Unique Names
-----------------------------

The main reason this problem exists is that we don't have globally unique names; we have locally unique names that exist under multiple namespaces that we are attempting to merge into a single flat namespace. If we could instead come up with a way to have globally unique names, we could sidestep the entire issue.

This idea has been rejected because:

- Generating globally unique but secure names that are also meaningful to humans is a nearly impossible feat without piggybacking off of some kind of centralized database. To my knowledge, the only systems that have managed to do this end up piggybacking off of the domain name system and refer to packages by URLs with domains, etc.
- Even if we come up with a mechanism to get globally unique names, our ability to retrofit that into our decades-old system is practically zero without burning it all to the ground and starting over. The best we could probably do is declare that all non-globally-unique names are implicitly names on the PyPI domain, and force everyone with a non-PyPI package to rename their package.
- This would upend so many core assumptions and fundamental parts of our current system that it's hard to even know where to start listing them.

Only recommend that installers offer explicit configuration
-----------------------------------------------------------

One idea that has come up is to essentially just implement the explicit configuration and not make any other changes. The specific proposal for a mapping policy is what actually inspired the explicit configuration option, and it proposed a file that looked something like:

.. code-block:: JSON

    {
      "repositories": {
        "PyTorch": ["https://download.pytorch.org/whl/nightly"],
        "PyPI": ["https://pypi.org/simple"]
      },
      "mapping": [
        {
          "paths": ["torch*"],
          "repositories": ["PyTorch"],
          "terminating": true
        },
        {
          "paths": ["*"],
          "repositories": ["PyPI"]
        }
      ]
    }

The recommendation to have explicit configuration pushes the decision on how to implement that onto each installer, allowing them to choose what works best for their users.

Ultimately, implementing only some kind of explicit configuration was rejected because by its nature it's opt-in, so it doesn't protect average users, who are the least capable of solving the problem with the existing tools; by adding additional protections alongside the explicit configuration, we are able to protect all users by default.

Additionally, relying only on explicit configuration means that every end user has to solve the same problem over and over again, even in cases like mirrors of PyPI, Piwheels, PyTorch, etc. In each and every case they have to sit there and make decisions (or find some example to cargo cult) in order to be secure. Adding extra features into the mix allows us to centralize those protections where we can, while still giving advanced end users the ability to completely control their own destiny.

Scopes à la npm
---------------

There's been some suggestion that `scopes similar to how npm has implemented them <https://docs.npmjs.com/cli/v9/using-npm/scope>`__ may ultimately solve this.

Ultimately, scopes do not change anything about this problem. As far as I know, scopes in npm are not globally unique; they're tied to a specific registry just like unscoped names are.
However, what scopes do enable is an obvious mechanism for grouping related projects, and the ability for a user or organization on the npm registry to claim an entire scope. This makes explicit configuration significantly easier to handle, because you can be assured that there's a whole slice of the namespace that wholly belongs to you, and you can easily write a rule that assigns an entire scope to a specific non-public registry.

Unfortunately, it basically ends up being an easier version of the idea to only use explicit configuration, which works OK in npm because it's not particularly common for people to use their own registries, but in Python we encourage you to do just that.

Open Questions
==============

* The `original proposal document <https://docs.google.com/document/d/184fQkb6NggVQfYmjTDA7p_U3iWDKk6grc2DigT1X3Es/>`__ was targeted more specifically at a change to pip, and went into more specific details as to what we expected from pip. Since dictating UX to installers isn't something that we do in PEPs, I've rewritten those parts to be more generic; however, that means that we lose the information on repository files. Is that fine? Or should we standardize what a repository file looks like so the same file can be given to multiple installers, instead of hand-waving around the specific mechanism installers would use for explicit configuration?

Acknowledgements
================

Thanks to Trishank Kuppusamy for kick-starting the discussion that led to this PEP with his `proposal <https://discuss.python.org/t/proposal-preventing-dependency-confusion-attacks-with-the-map-file/23414>`__.

Thanks to Paul Moore, Pradyun Gedam, Steve Dower, and Trishank Kuppusamy for providing early feedback and discussion on the ideas in this PEP.

Thanks to Jelle Zijlstra, C.A.M. Gerlach, Hugo van Kemenade, and Stefano Rivera for copy editing and improving the structure and quality of this PEP.
Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive. </code></pre> </details> </div> </div> <div id='post_2' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/steve.dower'><span itemprop='name'>steve.dower</span></a> (Steve Dower) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-02-23T19:52:39Z' class='post-time'> February 23, 2023, 7:52pm </time> <meta itemprop='dateModified' content='2023-02-23T19:52:39Z'> <span itemprop='position'>2</span> </span> </div> <div class='post' itemprop='text'> <p>Will give the PEP a thorough read when I have a bit more time, but I’m in favour of everything I’ve seen previously on this proposal.</p> <aside class="quote group-committers" data-username="dstufft" data-post="1" data-topic="24179"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/dstufft/48/23_2.png" class="avatar"> Donald Stufft:</div> <blockquote> <p>The question is do we think that handwaving it and expecting installers come up with their own independent mechanisms is the right answer?</p> </blockquote> </aside> <p>I think handwaving it is best right now, because it’s the best chance to get this proposal accepted.
There are a number of reasonable approaches, and while having a common configuration file would <em>also</em>, <em>independently</em>, be a benefit, it’s not the core reason for this PEP and would just be a distraction.</p> </div> </div> <div id='post_3' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/trishankatdatadog'><span itemprop='name'>trishankatdatadog</span></a> (Trishank Karthik Kuppusamy) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-02-23T20:06:22Z' class='post-time'> February 23, 2023, 8:06pm </time> <meta itemprop='dateModified' content='2023-02-23T20:06:22Z'> <span itemprop='position'>3</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote group-committers" data-username="dstufft" data-post="1" data-topic="24179"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/dstufft/48/23_2.png" class="avatar"> Donald Stufft:</div> <blockquote> <p>The question is do we think that handwaving it and expecting installers come up with their own independent mechanisms is the right answer?
Should we specify a file format that installers can/should support to allow better interoperability between different installers, and if so does the repository file approach from my original proposal look like a good starting point?</p> </blockquote> </aside> <p>My opinion is that we should specify the “repository file” format which installers can <em>choose</em> to follow. Perhaps pip can implement it by default.</p> </div> </div> <div id='post_4' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/pf_moore'><span itemprop='name'>pf_moore</span></a> (Paul Moore) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-02-23T21:48:34Z' class='post-time'> February 23, 2023, 9:48pm </time> <meta itemprop='dateModified' content='2023-02-23T21:48:34Z'> <span itemprop='position'>4</span> </span> </div> <div class='post' itemprop='text'> <p>With my pip maintainer hat on, I’d sort of rather someone else did the design work so we don’t have to.
But as <a class="mention" href="/u/dstufft">@dstufft</a> is also a pip maintainer, I have no problem if he handwaves the format in the PEP, and then implements something specific in pip <img src="https://emoji.discourse-cdn.com/apple/slightly_smiling_face.png?v=12" title=":slightly_smiling_face:" class="emoji" alt=":slightly_smiling_face:" loading="lazy" width="20" height="20"> I do think that having an independently specified format would be worthwhile (we don’t really want another de-facto standard like <code>requirements.txt</code>), but I agree it’s fine to make it a separate PEP.</p> <p>There’s a mild dilemma in that pip has a policy of only implementing things that are standards-backed, so we really ought to get the format in a PEP before implementing it in pip. But I don’t want to over-think it, I think it’s a distraction from the main point of the PEP.</p> </div> </div> <div id='post_5' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/dstufft'><span itemprop='name'>dstufft</span></a> (Donald Stufft) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-02-23T22:08:55Z' class='post-time'> February 23, 2023, 10:08pm </time> <meta itemprop='dateModified' content='2023-02-23T22:08:55Z'> <span itemprop='position'>5</span> </span> </div> <div class='post' itemprop='text'> <p>Yea, regardless I plan on designing what pip does here.
I’m honestly not sure if any of the other installers want there to be a codified answer to that or if they want to figure things out themselves.</p> <p>If nobody feels strongly either way I will probably leave it out of this PEP unless we want to say pip would require that mechanism to be standards defined.</p> <p>From my POV I don’t think it <em>has</em> to be standards defined, but it could be.</p> <p>One way to think about it, is it’s effectively part of the UX that an installer provides for configuring repositories, so it should be up to each individual installer to define their own UX, just like something like the pip config file and how it includes <code>--index-url</code> etc are part of the UX for configuring repositories. Though even this is mitigated by the fact that we could standardize it but explicitly grant installers permission not to support it if it doesn’t work for them.</p> <p>Another way to think about it is that it’s likely that there are going to be cases where people want to share these between installers (I can imagine a security team at a company writing one of these files and wanting to use it for both pip and poetry), so there’s decent value in there being an agreed upon standard for it.</p> <p>At the end of the day, I’m pretty neutral on whether it should be a standard or not, and if it should be a standard I’m fine adding it to this PEP or spinning up a companion or follow up PEP for it.</p> </div> </div> <div id='post_6' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url"
href='https://discuss.python.org/u/steve.dower'><span itemprop='name'>steve.dower</span></a> (Steve Dower) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-02-23T23:28:43Z' class='post-time'> February 23, 2023, 11:28pm </time> <meta itemprop='dateModified' content='2023-02-23T23:28:43Z'> <span itemprop='position'>6</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote group-committers" data-username="dstufft" data-post="5" data-topic="24179"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/dstufft/48/23_2.png" class="avatar"> Donald Stufft:</div> <blockquote> <p>I can imagine a security team at a company writing one of these files and wanting to use it for both pip and poetry</p> </blockquote> </aside> <p>A security team at a company is likely to want a single wildcard to require everything to come from their internal server <img src="https://emoji.discourse-cdn.com/apple/wink.png?v=12" title=":wink:" class="emoji" alt=":wink:" loading="lazy" width="20" height="20"> It’s the people working <em>around</em> the security team who will use this, and provided they can do it safely, the security teams will probably let them. But it’s still the end user doing it, so a centralised configuration file is unlikely.</p> <p>But if it came to it, we’d write a script to translate into whichever formats we wanted to properly support. 
So it’s not a big deal.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_7' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/trishankatdatadog'><span itemprop='name'>trishankatdatadog</span></a> (Trishank Karthik Kuppusamy) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-02-24T00:24:00Z' class='post-time'> February 24, 2023, 12:24am </time> <meta itemprop='dateModified' content='2023-02-24T00:24:00Z'> <span itemprop='position'>7</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote group-committers" data-username="dstufft" data-post="5" data-topic="24179"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/dstufft/48/23_2.png" class="avatar"> Donald Stufft:</div> <blockquote> <p>From my POV I don’t think it <em>has</em> to be standards defined, but it could be.</p> </blockquote> </aside> <p>Right, sorry, let me clarify: I don’t think the format of the repository file needs to be standard. However, I do think that the <em>idea</em> should be discussed in the PEP. Right now, for example, there are orphaned references<sup class="footnote-ref"><a href="#footnote-86921-1" id="footnote-ref-86921-1">[1]</a></sup> to it. 
The more we at least illustrate the idea with an example or two, the better IMHO.</p> <hr class="footnotes-sep"> <ol class="footnotes-list"> <li id="footnote-86921-1" class="footnote-item"><p>For example, see <a href="https://peps.python.org/pep-0708/#as-a-pip-user" rel="noopener nofollow ugc">here</a>. <a href="#footnote-ref-86921-1" class="footnote-backref">↩︎</a></p> </li> </ol> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="1" /> <span class='post-likes'>1 Like</span> </div> </div> <div id='post_8' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/ncoghlan'><span itemprop='name'>ncoghlan</span></a> (Alyssa Coghlan) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-02-24T16:58:30Z' class='post-time'> February 24, 2023, 4:58pm </time> <meta itemprop='dateModified' content='2023-02-24T16:58:30Z'> <span itemprop='position'>8</span> </span> </div> <div class='post' itemprop='text'> <p>The overall proposal looks solid to me, as I think the PEP is correct that tracking an upstream repo, and providing deployment-target-dependent binary repos with or without a common source repo, cover the major cases where multiple repositories are needed.</p> <aside class="quote group-committers" data-username="dstufft" data-post="1" data-topic="24179"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/dstufft/48/23_2.png" class="avatar"> Donald Stufft:</div> <blockquote> <p>The question is do we think that handwaving it and expecting 
installers come up with their own independent mechanisms is the right answer? Should we specify a file format that installers can/should support to allow better interoperability between different installers, and if so does the repository file approach from my original proposal look like a good starting point?</p> </blockquote> </aside> <p>On that front, I think it makes sense to at least draft that proposal as a PEP. Even if it ends up sitting in “deferred” indefinitely, it will still help clarify aspects of the PEP 708 review.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="1" /> <span class='post-likes'>1 Like</span> </div> </div> <div id='post_9' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/steve.dower'><span itemprop='name'>steve.dower</span></a> (Steve Dower) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-02-24T17:02:25Z' class='post-time'> February 24, 2023, 5:02pm </time> <meta itemprop='dateModified' content='2023-02-24T17:02:25Z'> <span itemprop='position'>9</span> </span> </div> <div class='post' itemprop='text'> <p>There’s almost a draft PEP that came out of <a href="https://discuss.python.org/t/adding-a-global-config-to-specify-package-indexes/8599">this discussion</a>.</p> <p>It would need a slight update to cover this case, but that ought to be simpler than defining a brand new file (and I can’t imagine you’d get away with proposing a new, standard file <em>solely</em> for the overrides needed here).</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta 
itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_10' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/brettcannon'><span itemprop='name'>brettcannon</span></a> (Brett Cannon) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-02-25T01:46:22Z' class='post-time'> February 25, 2023, 1:46am </time> <meta itemprop='dateModified' content='2023-02-25T01:46:22Z'> <span itemprop='position'>10</span> </span> </div> <div class='post' itemprop='text'> <p>Do we need to update both the HTML and JSON formats? This will cause an odd schism between the HTML and JSON formats, since there will be a 1.1 hole in the HTML format.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_11' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/pf_moore'><span itemprop='name'>pf_moore</span></a> (Paul Moore) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-02-25T08:14:30Z' class='post-time'> February 25, 2023, 8:14am </time> <meta itemprop='dateModified' content='2023-02-25T08:14:30Z'> <span itemprop='position'>11</span> </span> </div> <div class='post' itemprop='text'> <p>How aggressively do we want to deprecate the HTML 
format? Without this change to the HTML format, it’s not possible for tools using the HTML format to prevent dependency confusion. That feels like something that would pretty much kill that format for serious use.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_12' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/CAM-Gerlach'><span itemprop='name'>CAM-Gerlach</span></a> (C.A.M. Gerlach) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-02-25T10:17:45Z' class='post-time'> February 25, 2023, 10:17am </time> <meta itemprop='dateModified' content='2023-02-25T10:17:45Z'> <span itemprop='position'>12</span> </span> </div> <div class='post' itemprop='text'> <p>Or, alternatively, it could substantially slow adoption and use of the safer dependency confusion mitigation that this PEP enables.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="1" /> <span class='post-likes'>1 Like</span> </div> </div> <div id='post_13' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/dstufft'><span itemprop='name'>dstufft</span></a> (Donald Stufft) </span> <span class="crawler-post-infos"> <time 
itemprop='datePublished' datetime='2023-02-25T16:56:04Z' class='post-time'> February 25, 2023, 4:56pm </time> <meta itemprop='dateModified' content='2023-02-25T16:56:04Z'> <span itemprop='position'>13</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote group-committers" data-username="brettcannon" data-post="10" data-topic="24179" data-full="true"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/brettcannon/48/21723_2.png" class="avatar"> Brett Cannon:</div> <blockquote> <p>Do we need to update both the HTML and JSON formats? This will cause an odd schism between the HTML and JSON formats, since there will be a 1.1 hole in the HTML format.</p> </blockquote> </aside> <p>The way PEP 691 was written the version numbers for HTML and JSON are kept in sync, so v1.1 of HTML already exists since the acceptance of PEP 700, it just happens to look exactly like v1.0.</p> <blockquote> <p>Future versions of the API may add things that can only be represented in a subset of the available serializations of that version. 
All serializations version numbers, within a major version, <strong>SHOULD</strong> be kept in sync, but the specifics of how a feature serializes into each format may differ, including whether or not that feature is present at all.</p> <p>It is the intent of this PEP that the API should be thought of as URL endpoints that return data, whose interpretation is defined by the version of that data, and then serialized into the target serialization format.</p> </blockquote> <p>So it’s not actually “HTML v1.1” and “JSON v1.1” it’s “v1.1 of the endpoint, which has been serialized into HTML or JSON”.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="1" /> <span class='post-likes'>1 Like</span> </div> </div> <div id='post_14' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/brettcannon'><span itemprop='name'>brettcannon</span></a> (Brett Cannon) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-03-01T01:24:24Z' class='post-time'> March 1, 2023, 1:24am </time> <meta itemprop='dateModified' content='2023-03-01T01:24:24Z'> <span itemprop='position'>14</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote group-committers" data-username="pf_moore" data-post="11" data-topic="24179"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/pf_moore/48/35_2.png" class="avatar"> Paul Moore:</div> <blockquote> <p>How aggressively do we want to deprecate the HTML format?</p> </blockquote> </aside> <p>Very 
aggressively? <img src="https://emoji.discourse-cdn.com/apple/wink.png?v=12" title=":wink:" class="emoji" alt=":wink:" loading="lazy" width="20" height="20"> But I’m also biased, as I don’t want to have to update mousebender to parse HTML responses more than I already have (i.e., this PEP would require me to implement a warning when the HTML response is an unsupported version, since up to now there hasn’t been any version other than 1.0).</p> <aside class="quote group-committers" data-username="dstufft" data-post="13" data-topic="24179"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/dstufft/48/23_2.png" class="avatar"> Donald Stufft:</div> <blockquote> <p>The way PEP 691 was written the version numbers for HTML and JSON are kept in sync, so v1.1 of HTML already exists since the acceptance of PEP 700, it just happens to look exactly like v1.0.</p> </blockquote> </aside> <p>I didn’t see that little line in PEP 700 about the HTML response for 1.1 being just like 1.0. Opened <a href="https://github.com/brettcannon/mousebender/issues/98" class="inline-onebox">Support PEP 629 and 700 for HTML responses · Issue #98 · brettcannon/mousebender · GitHub</a>.</p> <aside class="quote group-committers" data-username="dstufft" data-post="13" data-topic="24179"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/dstufft/48/23_2.png" class="avatar"> Donald Stufft:</div> <blockquote> <p>So it’s not actually “HTML v1.1” and “JSON v1.1” it’s “v1.1 of the endpoint, which has been serialized into HTML or JSON”.</p> </blockquote> </aside> <p>Yeah, try telling that to my brain, which has to figure out how to parse all of this. 
<img src="https://emoji.discourse-cdn.com/apple/wink.png?v=12" title=":wink:" class="emoji" alt=":wink:" loading="lazy" width="20" height="20"></p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="1" /> <span class='post-likes'>1 Like</span> </div> </div> <div id='post_15' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/pf_moore'><span itemprop='name'>pf_moore</span></a> (Paul Moore) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-03-01T15:14:11Z' class='post-time'> March 1, 2023, 3:14pm </time> <meta itemprop='dateModified' content='2023-03-01T15:14:11Z'> <span itemprop='position'>15</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote group-committers" data-username="brettcannon" data-post="14" data-topic="24179"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/brettcannon/48/21723_2.png" class="avatar"> Brett Cannon:</div> <blockquote> <p>Very aggressively?</p> </blockquote> </aside> <p>I can appreciate that <img src="https://emoji.discourse-cdn.com/apple/slightly_smiling_face.png?v=12" title=":slightly_smiling_face:" class="emoji" alt=":slightly_smiling_face:" loading="lazy" width="20" height="20"></p> <p>My approach in PEP 700 was specifically to avoid confronting <a href="https://peps.python.org/pep-0691/#does-this-mean-pypi-is-planning-to-drop-support-for-html-pep-503">the promises made in PEP 691</a> that the HTML form wasn’t being desupported. 
But it was only really viable because that PEP was a pretty minor enhancement. I think that with PEP 708 we need to confront the question: is the HTML API intended to be purely legacy, or will we add new features to it? Not including a feature that’s security-related sends a very clear signal that the HTML form is no longer recommended.</p> <p>I don’t know whether other index providers like Artifactory, devpi or Azure have adopted the JSON API yet. That will matter to the viability of PEP 708 if they haven’t and the new APIs are only available in the JSON form.</p> <p>Personally, I’m fine with deprecating the HTML form in favour of the JSON form. I just think we should be open about it if that’s what we want to do.</p> <p>Edit: Just to give some context, would people be OK if <em>pip</em> dropped support for the HTML form of the simple index? We’re not intending to, but if the thought made you uncomfortable<sup class="footnote-ref"><a href="#footnote-87508-1" id="footnote-ref-87508-1">[1]</a></sup>, you should probably ask yourself why.</p> <hr class="footnotes-sep"> <ol class="footnotes-list"> <li id="footnote-87508-1" class="footnote-item"><p>It made me uncomfortable, certainly… <a href="#footnote-ref-87508-1" class="footnote-backref">↩︎</a></p> </li> </ol> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_16' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/steve.dower'><span itemprop='name'>steve.dower</span></a> (Steve Dower) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' 
datetime='2023-03-01T16:06:08Z' class='post-time'> March 1, 2023, 4:06pm </time> <meta itemprop='dateModified' content='2023-03-01T16:06:08Z'> <span itemprop='position'>16</span> </span> </div> <div class='post' itemprop='text'> <p>On a slightly different topic, where is the <code>pypi:alternate-locations</code> metadata supposed to come from? I’d been assuming it would be metadata in the package itself, like <code>Requires-Python</code>, but it’s not actually stated.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_17' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/dstufft'><span itemprop='name'>dstufft</span></a> (Donald Stufft) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-03-01T16:08:09Z' class='post-time'> March 1, 2023, 4:08pm </time> <meta itemprop='dateModified' content='2023-03-01T16:08:09Z'> <span itemprop='position'>17</span> </span> </div> <div class='post' itemprop='text'> <p>A few things of note:</p> <p>The way PEP 691 is written, individual projects are free to deprecate or remove support for HTML as they feel it is warranted. 
We don’t ever require any project to maintain support for HTML if they don’t want to, but of course that may mean that certain repository and client combinations cease to work together.</p> <p>Nothing we put in a PEP can change the fact that projects can continue to support HTML if they so desire, and if they decide not to support it, then some combinations will break.</p> <p>What would be accomplished by more aggressively deprecating HTML is:</p> <ul> <li>An up-front decision that we’re not going to add new features to HTML, so future PEPs don’t even have to consider it.</li> <li>A signal to project implementers that they need to prioritize deploying a JSON-supporting version of their project.</li> </ul> <p>So while we can do some signaling to tell people to shift off HTML, we can’t actually force anyone to drop it any sooner than they want to.</p> <p>In terms of whether we should drop HTML or not, I don’t have a strong preference. I think either option is fine, and right now we’re leaving it up to each individual PEP, since they typically have the most context.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_18' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/dstufft'><span itemprop='name'>dstufft</span></a> (Donald Stufft) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-03-01T16:09:18Z' class='post-time'> March 1, 2023, 4:09pm </time> <meta itemprop='dateModified' content='2023-03-01T16:09:18Z'> <span itemprop='position'>18</span> </span> </div> <div 
class='post' itemprop='text'> <aside class="quote group-committers" data-username="steve.dower" data-post="16" data-topic="24179" data-full="true"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" src="https://sea2.discourse-cdn.com/flex016/user_avatar/discuss.python.org/steve.dower/48/56_2.png" class="avatar"> Steve Dower:</div> <blockquote> <p>On a slightly different topic, where is the <code>pypi:alternate-locations</code> metadata supposed to come from? I’d been assuming it would be metadata in the package itself, like <code>Requires-Python</code>, but it’s not actually stated.</p> </blockquote> </aside> <p>I don’t think it would live in the package itself; it’s not per-file or even per-version metadata. It’s project-wide metadata that may change over time, so we also don’t want it to be immutable, as it would be if it were stored inside a released package.</p> <p>My expectation is that repositories will provide some configuration option for it.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="1" /> <span class='post-likes'>1 Like</span> </div> </div> <div id='post_19' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/steve.dower'><span itemprop='name'>steve.dower</span></a> (Steve Dower) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-03-01T16:52:53Z' class='post-time'> March 1, 2023, 4:52pm </time> <meta itemprop='dateModified' content='2023-03-01T16:52:53Z'> <span itemprop='position'>19</span> </span> </div> <div class='post' itemprop='text'> <p>So in chatting with the Azure Artifacts team, we’re leaning towards not really 
having to do anything at all here (except add JSON support, but that’s a different PEP <img src="https://emoji.discourse-cdn.com/apple/wink.png?v=12" title=":wink:" class="emoji" alt=":wink:" loading="lazy" width="20" height="20">)</p> <p>Our “usual” mirroring scenario is automatic. A feed is assigned one or more “upstreams”; the feed then appears to have all the packages from all upstreams and handles priority resolution itself. This means that if you reference the feed <em>and also</em> one of its upstreams, you are guaranteed to get conflicts, and the best way to fix it is to stop referencing the upstream explicitly.<sup class="footnote-ref"><a href="#footnote-87517-1" id="footnote-ref-87517-1">[1]</a></sup> The best way to inform users that they should do this is to let pip fail with its new error message <img src="https://emoji.discourse-cdn.com/apple/slight_smile.png?v=12" title=":slight_smile:" class="emoji" alt=":slight_smile:" loading="lazy" width="20" height="20"> So overall the PEP is good here, and we don’t have to implement anything on our side.</p> <p>Similarly for manual mirroring: we can only make that work if the person doing the mirroring can modify the upstream package, which in <em>most</em> cases they can’t, so users of the Artifacts feed will just have to not reference the original feed at all (or provide explicit configuration). Again, the PEP does its job without Artifacts needing to change.</p> <p>A mirror feed controlled by the package owner (e.g., pytorch) is the easy case. If we allow setting <code>alternate-locations</code> as a package property, and PyPI does too, then it will be fine. If not, users will have to override it manually (and presumably they’ll choose the specialised feed rather than PyPI). 
I’ll be pushing to get at least this much supported.</p> <p>The interesting case is for “additional files” feeds, for example, a feed providing only wheels for a more specific platform than PyPI allows, but expecting sdists/etc. to come directly from PyPI. While this would work today, I’m pretty sure it won’t be resolvable at all with this PEP implemented (except when provided by the publisher who’s set up <code>alternate-locations</code> on all feeds). The only option will be to provide the full set of files from the special feed, and users will choose to use that one.</p> <p>So basically all of that is information rather than requests. Our main scenarios are basically unchanged with no effort required from us, and users will get more strongly worded warnings (i.e. errors) if they’re not following our existing guidance. The dual publishing scenario isn’t common for our users right now, but we’d like it to be, so will likely implement <code>alternate-locations</code> for it.</p> <p>And a gentle warning that anyone currently relying on augmenting available files won’t have any option but to mirror more files and use one of the explicit overrides (but to be clear, we aren’t aware of anyone doing this, it’s just a scenario that we came up with while brainstorming).</p> <hr class="footnotes-sep"> <ol class="footnotes-list"> <li id="footnote-87517-1" class="footnote-item"> <p>pip going direct to the upstream would bypass any policies you have set up around it, such as usage tracking, banned versions, etc. And because the user has opted into these, we assume that the best fix involves not bypassing it. 
<a href="#footnote-ref-87517-1" class="footnote-backref">↩︎</a></p> </li> </ol> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="2" /> <span class='post-likes'>2 Likes</span> </div> </div> <div id='post_20' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://discuss.python.org/u/dstufft'><span itemprop='name'>dstufft</span></a> (Donald Stufft) </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2023-03-02T15:41:45Z' class='post-time'> March 2, 2023, 3:41pm </time> <meta itemprop='dateModified' content='2023-03-02T15:41:45Z'> <span itemprop='position'>20</span> </span> </div> <div class='post' itemprop='text'> <p>Cool, thanks for discussing it with them; it’s good to get validation of the ideas from another repository implementation.</p> <p>Azure Artifacts is, I think, a somewhat odd middle ground between repository operators and project operators. I think it would be fine for AA to give whoever owns the overall AA instance for that account the ability to set the <code>tracks</code> metadata: even though they don’t physically operate the actual servers, they are still essentially trusted the same way, since they’re the ones who get to decide who owns what name in that instance <sup class="footnote-ref"><a href="#footnote-87649-1" id="footnote-ref-87649-1">[1]</a></sup> of the repository. 
Individual project owners on AA wouldn’t though <sup class="footnote-ref"><a href="#footnote-87649-2" id="footnote-ref-87649-2">[2]</a></sup>.</p> <hr class="footnotes-sep"> <ol class="footnotes-list"> <li id="footnote-87649-1" class="footnote-item"> <p>Sorry, don’t know offhand what Azure Artifacts calls this. <a href="#footnote-ref-87649-1" class="footnote-backref">↩︎</a></p> </li> <li id="footnote-87649-2" class="footnote-item"> <p>I don’t know the permission model of Azure Artifacts, so this may not actually be a distinction at all in there, but I’m guessing there’s a different permission for people who can upload things, and who can manage the overall namespace. <a href="#footnote-ref-87649-2" class="footnote-backref">↩︎</a></p> </li> </ol> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="1" /> <span class='post-likes'>1 Like</span> </div> </div> </div> </div> </body> </html>