<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="color-scheme" content="light dark"> <title>PEP 590 – Vectorcall: a fast calling protocol for CPython | peps.python.org</title> <link rel="shortcut icon" href="../_static/py.png"> <link rel="canonical" href="https://peps.python.org/pep-0590/"> <link rel="stylesheet" href="../_static/style.css" type="text/css"> <link rel="stylesheet" href="../_static/mq.css" type="text/css"> <link rel="stylesheet" href="../_static/pygments.css" type="text/css" media="(prefers-color-scheme: light)" id="pyg-light"> <link rel="stylesheet" href="../_static/pygments_dark.css" type="text/css" media="(prefers-color-scheme: dark)" id="pyg-dark"> <link rel="alternate" type="application/rss+xml" title="Latest PEPs" href="https://peps.python.org/peps.rss"> <meta property="og:title" content='PEP 590 – Vectorcall: a fast calling protocol for CPython | peps.python.org'> <meta property="og:description" content="This PEP introduces a new C API to optimize calls of objects. It introduces a new “vectorcall” protocol and calling convention. This is based on the “fastcall” convention, which is already used internally by CPython. The new features can be used by any ..."> <meta property="og:type" content="website"> <meta property="og:url" content="https://peps.python.org/pep-0590/"> <meta property="og:site_name" content="Python Enhancement Proposals (PEPs)"> <meta property="og:image" content="https://peps.python.org/_static/og-image.png"> <meta property="og:image:alt" content="Python PEPs"> <meta property="og:image:width" content="200"> <meta property="og:image:height" content="200"> <meta name="description" content="This PEP introduces a new C API to optimize calls of objects. It introduces a new “vectorcall” protocol and calling convention. This is based on the “fastcall” convention, which is already used internally by CPython. 
The new features can be used by any ..."> <meta name="theme-color" content="#3776ab"> </head> <body> <svg xmlns="http://www.w3.org/2000/svg" style="display: none;"> <symbol id="svg-sun-half" viewBox="0 0 24 24" pointer-events="all"> <title>Following system colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <circle cx="12" cy="12" r="9"></circle> <path d="M12 3v18m0-12l4.65-4.65M12 14.3l7.37-7.37M12 19.6l8.85-8.85"></path> </svg> </symbol> <symbol id="svg-moon" viewBox="0 0 24 24" pointer-events="all"> <title>Selected dark colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <path stroke="none" d="M0 0h24v24H0z" fill="none"></path> <path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z"></path> </svg> </symbol> <symbol id="svg-sun" viewBox="0 0 24 24" pointer-events="all"> <title>Selected light colour scheme</title> <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"> <circle cx="12" cy="12" r="5"></circle> <line x1="12" y1="1" x2="12" y2="3"></line> <line x1="12" y1="21" x2="12" y2="23"></line> <line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line> <line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line> <line x1="1" y1="12" x2="3" y2="12"></line> <line x1="21" y1="12" x2="23" y2="12"></line> <line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line> <line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line> </svg> </symbol> </svg> <script> document.documentElement.dataset.colour_scheme = localStorage.getItem("colour_scheme") || "auto" </script> <section id="pep-page-section"> <header> <h1>Python Enhancement Proposals</h1> <ul class="breadcrumbs"> <li><a 
href="https://www.python.org/" title="The Python Programming Language">Python</a> » </li> <li><a href="../pep-0000/">PEP Index</a> » </li> <li>PEP 590</li> </ul> <button id="colour-scheme-cycler" onClick="setColourScheme(nextColourScheme())"> <svg aria-hidden="true" class="colour-scheme-icon-when-auto"><use href="#svg-sun-half"></use></svg> <svg aria-hidden="true" class="colour-scheme-icon-when-dark"><use href="#svg-moon"></use></svg> <svg aria-hidden="true" class="colour-scheme-icon-when-light"><use href="#svg-sun"></use></svg> <span class="visually-hidden">Toggle light / dark / auto colour theme</span> </button> </header> <article> <section id="pep-content"> <h1 class="page-title">PEP 590 – Vectorcall: a fast calling protocol for CPython</h1> <dl class="rfc2822 field-list simple"> <dt class="field-odd">Author<span class="colon">:</span></dt> <dd class="field-odd">Mark Shannon &lt;mark at hotpy.org&gt;, Jeroen Demeyer &lt;J.Demeyer at UGent.be&gt;</dd> <dt class="field-even">BDFL-Delegate<span class="colon">:</span></dt> <dd class="field-even">Petr Viktorin &lt;encukou at gmail.com&gt;</dd> <dt class="field-odd">Status<span class="colon">:</span></dt> <dd class="field-odd"><abbr title="Normative proposal accepted for implementation">Accepted</abbr></dd> <dt class="field-even">Type<span class="colon">:</span></dt> <dd class="field-even"><abbr title="Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem">Standards Track</abbr></dd> <dt class="field-odd">Created<span class="colon">:</span></dt> <dd class="field-odd">29-Mar-2019</dd> <dt class="field-even">Python-Version<span class="colon">:</span></dt> <dd class="field-even">3.8</dd> <dt class="field-odd">Post-History<span class="colon">:</span></dt> <dd class="field-odd"><p></p></dd> </dl> <hr class="docutils" /> <section id="contents"> <details><summary>Table of Contents</summary><ul class="simple"> <li><a class="reference internal" 
href="#abstract">Abstract</a></li> <li><a class="reference internal" href="#motivation">Motivation</a></li> <li><a class="reference internal" href="#specification">Specification</a><ul> <li><a class="reference internal" href="#the-function-pointer-type">The function pointer type</a></li> <li><a class="reference internal" href="#changes-to-the-pytypeobject-struct">Changes to the <code class="docutils literal notranslate"><span class="pre">PyTypeObject</span></code> struct</a></li> <li><a class="reference internal" href="#descriptor-behavior">Descriptor behavior</a></li> <li><a class="reference internal" href="#the-call">The call</a></li> <li><a class="reference internal" href="#py-vectorcall-arguments-offset">PY_VECTORCALL_ARGUMENTS_OFFSET</a></li> </ul> </li> <li><a class="reference internal" href="#new-c-api-and-changes-to-cpython">New C API and changes to CPython</a><ul> <li><a class="reference internal" href="#subclassing">Subclassing</a></li> </ul> </li> <li><a class="reference internal" href="#finalizing-the-api">Finalizing the API</a></li> <li><a class="reference internal" href="#internal-cpython-changes">Internal CPython changes</a><ul> <li><a class="reference internal" href="#changes-to-existing-classes">Changes to existing classes</a></li> <li><a class="reference internal" href="#using-the-vectorcall-protocol-for-classes">Using the vectorcall protocol for classes</a></li> <li><a class="reference internal" href="#the-pymethoddef-protocol-and-argument-clinic">The <code class="docutils literal notranslate"><span class="pre">PyMethodDef</span></code> protocol and Argument Clinic</a></li> </ul> </li> <li><a class="reference internal" href="#third-party-extension-classes-using-vectorcall">Third-party extension classes using vectorcall</a></li> <li><a class="reference internal" href="#performance-implications-of-these-changes">Performance implications of these changes</a></li> <li><a class="reference internal" href="#stable-abi">Stable ABI</a></li> <li><a 
class="reference internal" href="#alternative-suggestions">Alternative Suggestions</a><ul> <li><a class="reference internal" href="#bpo-29259">bpo-29259</a></li> <li><a class="reference internal" href="#pep-576-and-pep-580">PEP 576 and PEP 580</a></li> <li><a class="reference internal" href="#other-rejected-approaches">Other rejected approaches</a></li> </ul> </li> <li><a class="reference internal" href="#acknowledgements">Acknowledgements</a></li> <li><a class="reference internal" href="#references">References</a></li> <li><a class="reference internal" href="#reference-implementation">Reference implementation</a></li> <li><a class="reference internal" href="#copyright">Copyright</a></li> </ul> </details></section> <div class="pep-banner canonical-doc sticky-banner admonition important"> <p class="admonition-title">Important</p> <p>This PEP is a historical document. The up-to-date, canonical documentation can now be found at <a class="reference external" href="https://docs.python.org/3/c-api/call.html#vectorcall" title="(in Python v3.13)"><span>The Vectorcall Protocol</span></a>.</p> <p class="close-button">×</p> <p>See <a class="pep reference internal" href="../pep-0001/" title="PEP 1 – PEP Purpose and Guidelines">PEP 1</a> for how to propose changes.</p> </div> <section id="abstract"> <h2><a class="toc-backref" href="#abstract" role="doc-backlink">Abstract</a></h2> <p>This PEP introduces a new C API to optimize calls of objects. It introduces a new “vectorcall” protocol and calling convention. This is based on the “fastcall” convention, which is already used internally by CPython. The new features can be used by any user-defined extension class.</p> <p>Most of the new API is private in CPython 3.8. 
The plan is to finalize semantics and make it public in Python 3.9.</p> <p><strong>NOTE</strong>: This PEP deals only with the Python/C API; it does not affect the Python language or standard library.</p> </section> <section id="motivation"> <h2><a class="toc-backref" href="#motivation" role="doc-backlink">Motivation</a></h2> <p>The choice of a calling convention impacts the performance and flexibility of code on either side of the call. Often there is tension between performance and flexibility.</p> <p>The current <code class="docutils literal notranslate"><span class="pre">tp_call</span></code> <a class="footnote-reference brackets" href="#id5" id="id1">[2]</a> calling convention is sufficiently flexible to cover all cases, but its performance is poor. The poor performance is largely a result of having to create intermediate tuples, and possibly intermediate dicts, during the call. This is mitigated in CPython by including special-case code to speed up calls to Python and builtin functions. Unfortunately, this means that other callables such as classes and third party extension objects are called using the slower, more general <code class="docutils literal notranslate"><span class="pre">tp_call</span></code> calling convention.</p> <p>This PEP proposes that the calling convention used internally for Python and builtin functions is generalized and published so that all calls can benefit from better performance. The new proposed calling convention is not fully general, but covers the large majority of calls. It is designed to remove the overhead of temporary object creation and multiple indirections.</p> <p>Another source of inefficiency in the <code class="docutils literal notranslate"><span class="pre">tp_call</span></code> convention is that it has one function pointer per class, rather than per object. This is inefficient for calls to classes as several intermediate objects need to be created. 
For a class <code class="docutils literal notranslate"><span class="pre">cls</span></code>, at least one intermediate object is created for each call in the sequence <code class="docutils literal notranslate"><span class="pre">type.__call__</span></code>, <code class="docutils literal notranslate"><span class="pre">cls.__new__</span></code>, <code class="docutils literal notranslate"><span class="pre">cls.__init__</span></code>.</p> <p>This PEP proposes an interface for use by extension modules. Such interfaces cannot effectively be tested, or designed, without having the consumers in the loop. For that reason, we provide private (underscore-prefixed) names. The API may change (based on consumer feedback) in Python 3.9, where we expect it to be finalized, and the underscores removed.</p> </section> <section id="specification"> <h2><a class="toc-backref" href="#specification" role="doc-backlink">Specification</a></h2> <section id="the-function-pointer-type"> <h3><a class="toc-backref" href="#the-function-pointer-type" role="doc-backlink">The function pointer type</a></h3> <p>Calls are made through a function pointer taking the following parameters:</p> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">PyObject</span> <span class="pre">*callable</span></code>: The called object</li> <li><code class="docutils literal notranslate"><span class="pre">PyObject</span> <span class="pre">*const</span> <span class="pre">*args</span></code>: A vector of arguments</li> <li><code class="docutils literal notranslate"><span class="pre">size_t</span> <span class="pre">nargs</span></code>: The number of arguments plus the optional flag <code class="docutils literal notranslate"><span class="pre">PY_VECTORCALL_ARGUMENTS_OFFSET</span></code> (see below)</li> <li><code class="docutils literal notranslate"><span class="pre">PyObject</span> <span class="pre">*kwnames</span></code>: Either <code class="docutils literal notranslate"><span 
class="pre">NULL</span></code> or a tuple with the names of the keyword arguments</li> </ul> <p>This is implemented by the function pointer type: <code class="docutils literal notranslate"><span class="pre">typedef</span> <span class="pre">PyObject</span> <span class="pre">*(*vectorcallfunc)(PyObject</span> <span class="pre">*callable,</span> <span class="pre">PyObject</span> <span class="pre">*const</span> <span class="pre">*args,</span> <span class="pre">size_t</span> <span class="pre">nargs,</span> <span class="pre">PyObject</span> <span class="pre">*kwnames);</span></code></p> </section> <section id="changes-to-the-pytypeobject-struct"> <h3><a class="toc-backref" href="#changes-to-the-pytypeobject-struct" role="doc-backlink">Changes to the <code class="docutils literal notranslate"><span class="pre">PyTypeObject</span></code> struct</a></h3> <p>The unused slot <code class="docutils literal notranslate"><span class="pre">printfunc</span> <span class="pre">tp_print</span></code> is replaced with <code class="docutils literal notranslate"><span class="pre">tp_vectorcall_offset</span></code>. It has the type <code class="docutils literal notranslate"><span class="pre">Py_ssize_t</span></code>. A new <code class="docutils literal notranslate"><span class="pre">tp_flags</span></code> flag is added, <code class="docutils literal notranslate"><span class="pre">_Py_TPFLAGS_HAVE_VECTORCALL</span></code>, which must be set for any class that uses the vectorcall protocol.</p> <p>If <code class="docutils literal notranslate"><span class="pre">_Py_TPFLAGS_HAVE_VECTORCALL</span></code> is set, then <code class="docutils literal notranslate"><span class="pre">tp_vectorcall_offset</span></code> must be a positive integer. It is the offset into the object of the vectorcall function pointer of type <code class="docutils literal notranslate"><span class="pre">vectorcallfunc</span></code>. 
This pointer may be <code class="docutils literal notranslate"><span class="pre">NULL</span></code>, in which case the behavior is the same as if <code class="docutils literal notranslate"><span class="pre">_Py_TPFLAGS_HAVE_VECTORCALL</span></code> was not set.</p> <p>The <code class="docutils literal notranslate"><span class="pre">tp_print</span></code> slot is reused as the <code class="docutils literal notranslate"><span class="pre">tp_vectorcall_offset</span></code> slot to make it easier for external projects to backport the vectorcall protocol to earlier Python versions. In particular, the Cython project has shown interest in doing that (see <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2018-June/153927.html">https://mail.python.org/pipermail/python-dev/2018-June/153927.html</a>).</p> </section> <section id="descriptor-behavior"> <h3><a class="toc-backref" href="#descriptor-behavior" role="doc-backlink">Descriptor behavior</a></h3> <p>One additional type flag is specified: <code class="docutils literal notranslate"><span class="pre">Py_TPFLAGS_METHOD_DESCRIPTOR</span></code>.</p> <p><code class="docutils literal notranslate"><span class="pre">Py_TPFLAGS_METHOD_DESCRIPTOR</span></code> should be set if the callable uses the descriptor protocol to create a bound method-like object. 
This is used by the interpreter to avoid creating temporary objects when calling methods (see <code class="docutils literal notranslate"><span class="pre">_PyObject_GetMethod</span></code> and the <code class="docutils literal notranslate"><span class="pre">LOAD_METHOD</span></code>/<code class="docutils literal notranslate"><span class="pre">CALL_METHOD</span></code> opcodes).</p> <p>Concretely, if <code class="docutils literal notranslate"><span class="pre">Py_TPFLAGS_METHOD_DESCRIPTOR</span></code> is set for <code class="docutils literal notranslate"><span class="pre">type(func)</span></code>, then:</p> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">func.__get__(obj,</span> <span class="pre">cls)(*args,</span> <span class="pre">**kwds)</span></code> (with <code class="docutils literal notranslate"><span class="pre">obj</span></code> not None) must be equivalent to <code class="docutils literal notranslate"><span class="pre">func(obj,</span> <span class="pre">*args,</span> <span class="pre">**kwds)</span></code>.</li> <li><code class="docutils literal notranslate"><span class="pre">func.__get__(None,</span> <span class="pre">cls)(*args,</span> <span class="pre">**kwds)</span></code> must be equivalent to <code class="docutils literal notranslate"><span class="pre">func(*args,</span> <span class="pre">**kwds)</span></code>.</li> </ul> <p>There are no restrictions on the object <code class="docutils literal notranslate"><span class="pre">func.__get__(obj,</span> <span class="pre">cls)</span></code>. 
The latter is not required to implement the vectorcall protocol.</p> </section> <section id="the-call"> <h3><a class="toc-backref" href="#the-call" role="doc-backlink">The call</a></h3> <p>The call takes the form <code class="docutils literal notranslate"><span class="pre">(*(vectorcallfunc</span> <span class="pre">*)(((char</span> <span class="pre">*)o)+offset))(o,</span> <span class="pre">args,</span> <span class="pre">n,</span> <span class="pre">kwnames)</span></code> where <code class="docutils literal notranslate"><span class="pre">offset</span></code> is <code class="docutils literal notranslate"><span class="pre">Py_TYPE(o)->tp_vectorcall_offset</span></code>. The caller is responsible for creating the <code class="docutils literal notranslate"><span class="pre">kwnames</span></code> tuple and ensuring that there are no duplicates in it.</p> <p><code class="docutils literal notranslate"><span class="pre">n</span></code> is the number of positional arguments plus possibly the <code class="docutils literal notranslate"><span class="pre">PY_VECTORCALL_ARGUMENTS_OFFSET</span></code> flag.</p> </section> <section id="py-vectorcall-arguments-offset"> <h3><a class="toc-backref" href="#py-vectorcall-arguments-offset" role="doc-backlink">PY_VECTORCALL_ARGUMENTS_OFFSET</a></h3> <p>The flag <code class="docutils literal notranslate"><span class="pre">PY_VECTORCALL_ARGUMENTS_OFFSET</span></code> should be added to <code class="docutils literal notranslate"><span class="pre">n</span></code> if the callee is allowed to temporarily change <code class="docutils literal notranslate"><span class="pre">args[-1]</span></code>. In other words, this can be used if <code class="docutils literal notranslate"><span class="pre">args</span></code> points to argument 1 in the allocated vector. 
The callee must restore the value of <code class="docutils literal notranslate"><span class="pre">args[-1]</span></code> before returning.</p> <p>Whenever they can do so cheaply (without allocation), callers are encouraged to use <code class="docutils literal notranslate"><span class="pre">PY_VECTORCALL_ARGUMENTS_OFFSET</span></code>. Doing so will allow callables such as bound methods to make their onward calls cheaply. The bytecode interpreter already allocates space on the stack for the callable, so it can use this trick at no additional cost.</p> <p>See <a class="footnote-reference brackets" href="#id6" id="id2">[3]</a> for an example of how <code class="docutils literal notranslate"><span class="pre">PY_VECTORCALL_ARGUMENTS_OFFSET</span></code> is used by a callee to avoid allocation.</p> <p>For getting the actual number of arguments from the parameter <code class="docutils literal notranslate"><span class="pre">n</span></code>, the macro <code class="docutils literal notranslate"><span class="pre">PyVectorcall_NARGS(n)</span></code> must be used. This allows for future changes or extensions.</p> </section> </section> <section id="new-c-api-and-changes-to-cpython"> <h2><a class="toc-backref" href="#new-c-api-and-changes-to-cpython" role="doc-backlink">New C API and changes to CPython</a></h2> <p>The following functions or macros are added to the C API:</p> <ul class="simple"> <li><code class="docutils literal notranslate"><span class="pre">PyObject</span> <span class="pre">*_PyObject_Vectorcall(PyObject</span> <span class="pre">*obj,</span> <span class="pre">PyObject</span> <span class="pre">*const</span> <span class="pre">*args,</span> <span class="pre">size_t</span> <span class="pre">nargs,</span> <span class="pre">PyObject</span> <span class="pre">*keywords)</span></code>: Calls <code class="docutils literal notranslate"><span class="pre">obj</span></code> with the given arguments. 
Note that <code class="docutils literal notranslate"><span class="pre">nargs</span></code> may include the flag <code class="docutils literal notranslate"><span class="pre">PY_VECTORCALL_ARGUMENTS_OFFSET</span></code>. The actual number of positional arguments is given by <code class="docutils literal notranslate"><span class="pre">PyVectorcall_NARGS(nargs)</span></code>. The argument <code class="docutils literal notranslate"><span class="pre">keywords</span></code> is a tuple of keyword names or <code class="docutils literal notranslate"><span class="pre">NULL</span></code>. An empty tuple has the same effect as passing <code class="docutils literal notranslate"><span class="pre">NULL</span></code>. This uses either the vectorcall protocol or <code class="docutils literal notranslate"><span class="pre">tp_call</span></code> internally; if neither is supported, an exception is raised.</li> <li><code class="docutils literal notranslate"><span class="pre">PyObject</span> <span class="pre">*PyVectorcall_Call(PyObject</span> <span class="pre">*obj,</span> <span class="pre">PyObject</span> <span class="pre">*tuple,</span> <span class="pre">PyObject</span> <span class="pre">*dict)</span></code>: Call the object (which must support vectorcall) with the old <code class="docutils literal notranslate"><span class="pre">*args</span></code> and <code class="docutils literal notranslate"><span class="pre">**kwargs</span></code> calling convention. This is mostly meant to be put in the <code class="docutils literal notranslate"><span class="pre">tp_call</span></code> slot.</li> <li><code class="docutils literal notranslate"><span class="pre">Py_ssize_t</span> <span class="pre">PyVectorcall_NARGS(size_t</span> <span class="pre">nargs)</span></code>: Given a vectorcall <code class="docutils literal notranslate"><span class="pre">nargs</span></code> argument, return the actual number of arguments. 
Currently equivalent to <code class="docutils literal notranslate"><span class="pre">nargs</span> <span class="pre">&</span> <span class="pre">~PY_VECTORCALL_ARGUMENTS_OFFSET</span></code>.</li> </ul> <section id="subclassing"> <h3><a class="toc-backref" href="#subclassing" role="doc-backlink">Subclassing</a></h3> <p>Extension types inherit the type flag <code class="docutils literal notranslate"><span class="pre">_Py_TPFLAGS_HAVE_VECTORCALL</span></code> and the value <code class="docutils literal notranslate"><span class="pre">tp_vectorcall_offset</span></code> from the base class, provided that they implement <code class="docutils literal notranslate"><span class="pre">tp_call</span></code> the same way as the base class. Additionally, the flag <code class="docutils literal notranslate"><span class="pre">Py_TPFLAGS_METHOD_DESCRIPTOR</span></code> is inherited if <code class="docutils literal notranslate"><span class="pre">tp_descr_get</span></code> is implemented the same way as the base class.</p> <p>Heap types never inherit the vectorcall protocol because that would not be safe (heap types can be changed dynamically). This restriction may be lifted in the future, but that would require special-casing <code class="docutils literal notranslate"><span class="pre">__call__</span></code> in <code class="docutils literal notranslate"><span class="pre">type.__setattr__</span></code>.</p> </section> </section> <section id="finalizing-the-api"> <h2><a class="toc-backref" href="#finalizing-the-api" role="doc-backlink">Finalizing the API</a></h2> <p>The underscore in the names <code class="docutils literal notranslate"><span class="pre">_PyObject_Vectorcall</span></code> and <code class="docutils literal notranslate"><span class="pre">_Py_TPFLAGS_HAVE_VECTORCALL</span></code> indicates that this API may change in minor Python versions. 
When finalized (which is planned for Python 3.9), they will be renamed to <code class="docutils literal notranslate"><span class="pre">PyObject_Vectorcall</span></code> and <code class="docutils literal notranslate"><span class="pre">Py_TPFLAGS_HAVE_VECTORCALL</span></code>. The old underscore-prefixed names will remain available as aliases.</p> <p>The new API will be documented as normal, but will warn of the above.</p> <p>Semantics for the other names introduced in this PEP (<code class="docutils literal notranslate"><span class="pre">PyVectorcall_NARGS</span></code>, <code class="docutils literal notranslate"><span class="pre">PyVectorcall_Call</span></code>, <code class="docutils literal notranslate"><span class="pre">Py_TPFLAGS_METHOD_DESCRIPTOR</span></code>, <code class="docutils literal notranslate"><span class="pre">PY_VECTORCALL_ARGUMENTS_OFFSET</span></code>) are final.</p> </section> <section id="internal-cpython-changes"> <h2><a class="toc-backref" href="#internal-cpython-changes" role="doc-backlink">Internal CPython changes</a></h2> <section id="changes-to-existing-classes"> <h3><a class="toc-backref" href="#changes-to-existing-classes" role="doc-backlink">Changes to existing classes</a></h3> <p>The <code class="docutils literal notranslate"><span class="pre">function</span></code>, <code class="docutils literal notranslate"><span class="pre">builtin_function_or_method</span></code>, <code class="docutils literal notranslate"><span class="pre">method_descriptor</span></code>, <code class="docutils literal notranslate"><span class="pre">method</span></code>, <code class="docutils literal notranslate"><span class="pre">wrapper_descriptor</span></code>, <code class="docutils literal notranslate"><span class="pre">method-wrapper</span></code> classes will use the vectorcall protocol (not all of these will be changed in the initial implementation).</p> <p>For <code class="docutils literal notranslate"><span 
class="pre">builtin_function_or_method</span></code> and <code class="docutils literal notranslate"><span class="pre">method_descriptor</span></code> (which use the <code class="docutils literal notranslate"><span class="pre">PyMethodDef</span></code> data structure), one could implement a specific vectorcall wrapper for every existing calling convention. Whether or not it is worth doing that remains to be seen.</p> </section> <section id="using-the-vectorcall-protocol-for-classes"> <h3><a class="toc-backref" href="#using-the-vectorcall-protocol-for-classes" role="doc-backlink">Using the vectorcall protocol for classes</a></h3> <p>For a class <code class="docutils literal notranslate"><span class="pre">cls</span></code>, creating a new instance using <code class="docutils literal notranslate"><span class="pre">cls(xxx)</span></code> requires multiple calls. At least one intermediate object is created for each call in the sequence <code class="docutils literal notranslate"><span class="pre">type.__call__</span></code>, <code class="docutils literal notranslate"><span class="pre">cls.__new__</span></code>, <code class="docutils literal notranslate"><span class="pre">cls.__init__</span></code>. So it makes a lot of sense to use vectorcall for calling classes. This really means implementing the vectorcall protocol for <code class="docutils literal notranslate"><span class="pre">type</span></code>. 
Some of the most commonly used classes will use this protocol, probably <code class="docutils literal notranslate"><span class="pre">range</span></code>, <code class="docutils literal notranslate"><span class="pre">list</span></code>, <code class="docutils literal notranslate"><span class="pre">str</span></code>, and <code class="docutils literal notranslate"><span class="pre">type</span></code>.</p> </section> <section id="the-pymethoddef-protocol-and-argument-clinic"> <h3><a class="toc-backref" href="#the-pymethoddef-protocol-and-argument-clinic" role="doc-backlink">The <code class="docutils literal notranslate"><span class="pre">PyMethodDef</span></code> protocol and Argument Clinic</a></h3> <p>Argument Clinic <a class="footnote-reference brackets" href="#id7" id="id3">[4]</a> automatically generates wrapper functions around lower-level callables, providing safe unboxing of primitive types and other safety checks. Argument Clinic could be extended to generate wrapper objects conforming to the new <code class="docutils literal notranslate"><span class="pre">vectorcall</span></code> protocol. This will allow execution to flow from the caller to the Argument Clinic generated wrapper and thence to the hand-written code with only a single indirection.</p> </section> </section> <section id="third-party-extension-classes-using-vectorcall"> <h2><a class="toc-backref" href="#third-party-extension-classes-using-vectorcall" role="doc-backlink">Third-party extension classes using vectorcall</a></h2> <p>To enable call performance on a par with Python functions and built-in functions, third-party callables should include a <code class="docutils literal notranslate"><span class="pre">vectorcallfunc</span></code> function pointer, set <code class="docutils literal notranslate"><span class="pre">tp_vectorcall_offset</span></code> to the correct value and add the <code class="docutils literal notranslate"><span class="pre">_Py_TPFLAGS_HAVE_VECTORCALL</span></code> flag. 
Any class that does this must implement the <code class="docutils literal notranslate"><span class="pre">tp_call</span></code> function and make sure its behaviour is consistent with the <code class="docutils literal notranslate"><span class="pre">vectorcallfunc</span></code> function. Setting <code class="docutils literal notranslate"><span class="pre">tp_call</span></code> to <code class="docutils literal notranslate"><span class="pre">PyVectorcall_Call</span></code> is sufficient.</p> </section> <section id="performance-implications-of-these-changes"> <h2><a class="toc-backref" href="#performance-implications-of-these-changes" role="doc-backlink">Performance implications of these changes</a></h2> <p>This PEP should not have much impact on the performance of existing code (neither in the positive nor the negative sense). It is mainly meant to allow efficient new code to be written, not to make existing code faster.</p> <p>Nevertheless, this PEP optimizes for <code class="docutils literal notranslate"><span class="pre">METH_FASTCALL</span></code> functions. 
Performance of functions using <code class="docutils literal notranslate"><span class="pre">METH_VARARGS</span></code> will become slightly worse.</p> </section> <section id="stable-abi"> <h2><a class="toc-backref" href="#stable-abi" role="doc-backlink">Stable ABI</a></h2> <p>Nothing from this PEP is added to the stable ABI (<a class="pep reference internal" href="../pep-0384/" title="PEP 384 – Defining a Stable ABI">PEP 384</a>).</p> </section> <section id="alternative-suggestions"> <h2><a class="toc-backref" href="#alternative-suggestions" role="doc-backlink">Alternative Suggestions</a></h2> <section id="bpo-29259"> <h3><a class="toc-backref" href="#bpo-29259" role="doc-backlink">bpo-29259</a></h3> <p><a class="pep reference internal" href="../pep-0590/" title="PEP 590 – Vectorcall: a fast calling protocol for CPython">PEP 590</a> is close to what was proposed in bpo-29259 <a class="footnote-reference brackets" href="#bpo29259" id="id4">[1]</a>. The main difference is that this PEP stores the function pointer in the instance rather than in the class. This makes more sense for implementing functions in C, where every instance corresponds to a different C function. It also allows optimizing <code class="docutils literal notranslate"><span class="pre">type.__call__</span></code>, which is not possible with bpo-29259.</p> </section> <section id="pep-576-and-pep-580"> <h3><a class="toc-backref" href="#pep-576-and-pep-580" role="doc-backlink">PEP 576 and PEP 580</a></h3> <p>Both <a class="pep reference internal" href="../pep-0576/" title="PEP 576 – Rationalize Built-in function classes">PEP 576</a> and <a class="pep reference internal" href="../pep-0580/" title="PEP 580 – The C call protocol">PEP 580</a> are designed to enable third-party objects to be both expressive and performant (on a par with CPython objects). 
The purpose of this PEP is to provide a uniform way to call objects in the CPython ecosystem that is both expressive and as performant as possible.</p> <p>This PEP is broader in scope than <a class="pep reference internal" href="../pep-0576/" title="PEP 576 – Rationalize Built-in function classes">PEP 576</a> and uses variable rather than fixed offset function-pointers. The underlying calling convention is similar. Because <a class="pep reference internal" href="../pep-0576/" title="PEP 576 – Rationalize Built-in function classes">PEP 576</a> only allows a fixed offset for the function pointer, it would not allow these improvements for objects with constraints on their layout.</p> <p><a class="pep reference internal" href="../pep-0580/" title="PEP 580 – The C call protocol">PEP 580</a> proposes a major change to the <code class="docutils literal notranslate"><span class="pre">PyMethodDef</span></code> protocol used to define builtin functions. This PEP provides a more general and simpler mechanism in the form of a new calling convention. This PEP also extends the <code class="docutils literal notranslate"><span class="pre">PyMethodDef</span></code> protocol, but merely to formalise existing conventions.</p> </section> <section id="other-rejected-approaches"> <h3><a class="toc-backref" href="#other-rejected-approaches" role="doc-backlink">Other rejected approaches</a></h3> <p>A longer, 6-argument form combining both the vector and optional tuple and dictionary arguments was considered. However, it was found that the code to convert between it and the old <code class="docutils literal notranslate"><span class="pre">tp_call</span></code> form was overly cumbersome and inefficient. 
Also, since only 4 arguments are passed in registers on x64 Windows, the two extra arguments would have non-negligible costs.</p> <p>Removing any special cases and making all calls use the <code class="docutils literal notranslate"><span class="pre">tp_call</span></code> form was also considered. However, unless a much more efficient way were found to create and destroy tuples, and to a lesser extent dictionaries, this would be too slow.</p> </section> </section> <section id="acknowledgements"> <h2><a class="toc-backref" href="#acknowledgements" role="doc-backlink">Acknowledgements</a></h2> <p>Thanks to Victor Stinner for developing the original “fastcall” calling convention internally to CPython. This PEP codifies and extends his work.</p> </section> <section id="references"> <h2><a class="toc-backref" href="#references" role="doc-backlink">References</a></h2> <aside class="footnote-list brackets"> <aside class="footnote brackets" id="bpo29259" role="doc-footnote"> <dt class="label" id="bpo29259">[<a href="#id4">1</a>]</dt> <dd>Add tp_fastcall to PyTypeObject: support FASTCALL calling convention for all callable objects, <a class="reference external" href="https://bugs.python.org/issue29259">https://bugs.python.org/issue29259</a></aside> <aside class="footnote brackets" id="id5" role="doc-footnote"> <dt class="label" id="id5">[<a href="#id1">2</a>]</dt> <dd>tp_call/PyObject_Call calling convention <a class="reference external" href="https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_call">https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_call</a></aside> <aside class="footnote brackets" id="id6" role="doc-footnote"> <dt class="label" id="id6">[<a href="#id2">3</a>]</dt> <dd>Using PY_VECTORCALL_ARGUMENTS_OFFSET in callee <a class="reference external" 
href="https://github.com/markshannon/cpython/blob/815cc1a30d85cdf2e3d77d21224db7055a1f07cb/Objects/classobject.c#L53">https://github.com/markshannon/cpython/blob/815cc1a30d85cdf2e3d77d21224db7055a1f07cb/Objects/classobject.c#L53</a></aside> <aside class="footnote brackets" id="id7" role="doc-footnote"> <dt class="label" id="id7">[<a href="#id3">4</a>]</dt> <dd>Argument Clinic <a class="reference external" href="https://docs.python.org/3/howto/clinic.html">https://docs.python.org/3/howto/clinic.html</a></aside> </aside> </section> <section id="reference-implementation"> <h2><a class="toc-backref" href="#reference-implementation" role="doc-backlink">Reference implementation</a></h2> <p>A minimal implementation can be found at <a class="reference external" href="https://github.com/markshannon/cpython/tree/vectorcall-minimal">https://github.com/markshannon/cpython/tree/vectorcall-minimal</a></p> </section> <section id="copyright"> <h2><a class="toc-backref" href="#copyright" role="doc-backlink">Copyright</a></h2> <p>This document has been placed in the public domain.</p> </section> </section> <hr class="docutils" /> <p>Source: <a class="reference external" href="https://github.com/python/peps/blob/main/peps/pep-0590.rst">https://github.com/python/peps/blob/main/peps/pep-0590.rst</a></p> <p>Last modified: <a class="reference external" href="https://github.com/python/peps/commits/main/peps/pep-0590.rst">2024-06-01 20:09:32 GMT</a></p> </article> <nav id="pep-sidebar"> <h2>Contents</h2> <ul> <li><a class="reference internal" href="#abstract">Abstract</a></li> <li><a class="reference internal" href="#motivation">Motivation</a></li> <li><a class="reference internal" href="#specification">Specification</a><ul> <li><a class="reference internal" href="#the-function-pointer-type">The function pointer type</a></li> <li><a class="reference internal" href="#changes-to-the-pytypeobject-struct">Changes to the <code class="docutils literal notranslate"><span 
class="pre">PyTypeObject</span></code> struct</a></li> <li><a class="reference internal" href="#descriptor-behavior">Descriptor behavior</a></li> <li><a class="reference internal" href="#the-call">The call</a></li> <li><a class="reference internal" href="#py-vectorcall-arguments-offset">PY_VECTORCALL_ARGUMENTS_OFFSET</a></li> </ul> </li> <li><a class="reference internal" href="#new-c-api-and-changes-to-cpython">New C API and changes to CPython</a><ul> <li><a class="reference internal" href="#subclassing">Subclassing</a></li> </ul> </li> <li><a class="reference internal" href="#finalizing-the-api">Finalizing the API</a></li> <li><a class="reference internal" href="#internal-cpython-changes">Internal CPython changes</a><ul> <li><a class="reference internal" href="#changes-to-existing-classes">Changes to existing classes</a></li> <li><a class="reference internal" href="#using-the-vectorcall-protocol-for-classes">Using the vectorcall protocol for classes</a></li> <li><a class="reference internal" href="#the-pymethoddef-protocol-and-argument-clinic">The <code class="docutils literal notranslate"><span class="pre">PyMethodDef</span></code> protocol and Argument Clinic</a></li> </ul> </li> <li><a class="reference internal" href="#third-party-extension-classes-using-vectorcall">Third-party extension classes using vectorcall</a></li> <li><a class="reference internal" href="#performance-implications-of-these-changes">Performance implications of these changes</a></li> <li><a class="reference internal" href="#stable-abi">Stable ABI</a></li> <li><a class="reference internal" href="#alternative-suggestions">Alternative Suggestions</a><ul> <li><a class="reference internal" href="#bpo-29259">bpo-29259</a></li> <li><a class="reference internal" href="#pep-576-and-pep-580">PEP 576 and PEP 580</a></li> <li><a class="reference internal" href="#other-rejected-approaches">Other rejected approaches</a></li> </ul> </li> <li><a class="reference internal" 
href="#acknowledgements">Acknowledgements</a></li> <li><a class="reference internal" href="#references">References</a></li> <li><a class="reference internal" href="#reference-implementation">Reference implementation</a></li> <li><a class="reference internal" href="#copyright">Copyright</a></li> </ul> <br> <a id="source" href="https://github.com/python/peps/blob/main/peps/pep-0590.rst">Page Source (GitHub)</a> </nav> </section> <script src="../_static/colour_scheme.js"></script> <script src="../_static/wrap_tables.js"></script> <script src="../_static/sticky_banner.js"></script> </body> </html>