CINXE.COM

PyPy

<?xml version="1.0" encoding="utf-8"?> <?xml-stylesheet type="text/xsl" href="assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>PyPy</title><link>https://www.pypy.org/</link><description>A Faster Python</description><atom:link href="https://www.pypy.org/rss.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2025 &lt;a href="mailto:pypy-dev@pypy.org"&gt;The PyPy Team&lt;/a&gt; </copyright><lastBuildDate>Wed, 26 Feb 2025 12:28:22 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>PyPy v7.3.19 release</title><link>https://www.pypy.org/posts/2025/02/pypy-v7319-release.html</link><dc:creator>mattip</dc:creator><description>&lt;section id="pypy-v7-3-19-release-of-python-2-7-3-10-and-3-11-beta"&gt; &lt;h2&gt;PyPy v7.3.19: release of python 2.7, 3.10 and 3.11 beta&lt;/h2&gt; &lt;p&gt;The PyPy team is proud to release version 7.3.19 of PyPy. This is primarily a bug-fix release fixing JIT-related problems and follows quickly on the heels of the previous release on Feb 6, 2025.&lt;/p&gt; &lt;p&gt;This release includes a python 3.11 interpreter. There were bugs in the first beta that could prevent its wider use, so we are continuing to call this release "beta". In the next release we will drop 3.10 and remove the "beta" label.&lt;/p&gt; &lt;p&gt;The release includes three different interpreters:&lt;/p&gt; &lt;ul class="simple"&gt; &lt;li&gt;&lt;p&gt;PyPy2.7, which is an interpreter supporting the syntax and the features of Python 2.7 including the stdlib for CPython 2.7.18+ (the &lt;code class="docutils literal"&gt;+&lt;/code&gt; is for backported security updates)&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;PyPy3.10, which is an interpreter supporting the syntax and the features of Python 3.10, including the stdlib for CPython 3.10.16.&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;PyPy3.11, which is an interpreter supporting the syntax and the features of Python 3.11, including the stdlib for CPython 3.11.11.&lt;/p&gt;&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;The interpreters are based on much the same codebase, thus the triple release. This is a micro release, all APIs are compatible with the other 7.3 releases. It follows after 7.3.17 release on August 28, 2024.&lt;/p&gt; &lt;p&gt;We recommend updating. You can find links to download the releases here:&lt;/p&gt; &lt;blockquote&gt; &lt;p&gt;&lt;a class="reference external" href="https://pypy.org/download.html"&gt;https://pypy.org/download.html&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt; &lt;p&gt;We would like to thank our donors for the continued support of the PyPy project. If PyPy is not quite good enough for your needs, we are available for &lt;a class="reference external" href="https://www.pypy.org/pypy-sponsors.html"&gt;direct consulting&lt;/a&gt; work. If PyPy is helping you out, we would love to hear about it and encourage submissions to our &lt;a class="reference external" href="https://pypy.org/blog"&gt;blog&lt;/a&gt; via a pull request to &lt;a class="reference external" href="https://github.com/pypy/pypy.org"&gt;https://github.com/pypy/pypy.org&lt;/a&gt;&lt;/p&gt; &lt;p&gt;We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: bug fixes, &lt;a class="reference external" href="https://doc.pypy.org/"&gt;PyPy&lt;/a&gt; and &lt;a class="reference external" href="https://rpython.readthedocs.org"&gt;RPython&lt;/a&gt; documentation improvements, or general &lt;a class="reference external" href="https://doc.pypy.org/en/latest/project-ideas.html"&gt;help&lt;/a&gt; with making RPython's JIT even better.&lt;/p&gt; &lt;p&gt;If you are a python library maintainer and use C-extensions, please consider making a &lt;a class="reference external" href="https://hpyproject.org/"&gt;HPy&lt;/a&gt; / &lt;a class="reference external" href="https://cffi.readthedocs.io"&gt;CFFI&lt;/a&gt; / &lt;a class="reference external" href="https://cppyy.readthedocs.io"&gt;cppyy&lt;/a&gt; version of your library that would be performant on PyPy. In any case, both &lt;a class="reference external" href="https://github.com/joerick/cibuildwheel"&gt;cibuildwheel&lt;/a&gt; and the &lt;a class="reference external" href="https://github.com/matthew-brett/multibuild"&gt;multibuild system&lt;/a&gt; support building wheels for PyPy.&lt;/p&gt; &lt;section id="what-is-pypy"&gt; &lt;h3&gt;What is PyPy?&lt;/h3&gt; &lt;p&gt;PyPy is a Python interpreter, a drop-in replacement for CPython It's fast (&lt;a class="reference external" href="https://speed.pypy.org"&gt;PyPy and CPython&lt;/a&gt; performance comparison) due to its integrated tracing JIT compiler.&lt;/p&gt; &lt;p&gt;We also welcome developers of other &lt;a class="reference external" href="https://rpython.readthedocs.io/en/latest/examples.html"&gt;dynamic languages&lt;/a&gt; to see what RPython can do for them.&lt;/p&gt; &lt;p&gt;We provide binary builds for:&lt;/p&gt; &lt;ul class="simple"&gt; &lt;li&gt;&lt;p&gt;&lt;strong&gt;x86&lt;/strong&gt; machines on most common operating systems (Linux 32/64 bits, Mac OS 64 bits, Windows 64 bits)&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;64-bit &lt;strong&gt;ARM&lt;/strong&gt; machines running Linux (&lt;code class="docutils literal"&gt;aarch64&lt;/code&gt;) and macos (&lt;code class="docutils literal"&gt;macos_arm64&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;PyPy supports Windows 32-bit, Linux PPC64 big- and little-endian, Linux ARM 32 bit, RISC-V RV64IMAFD Linux, and s390x Linux but does not release binaries. Please reach out to us if you wish to sponsor binary releases for those platforms. Downstream packagers provide binary builds for debian, Fedora, conda, OpenBSD, FreeBSD, Gentoo, and more.&lt;/p&gt; &lt;/section&gt; &lt;section id="what-else-is-new"&gt; &lt;h3&gt;What else is new?&lt;/h3&gt; &lt;p&gt;For more information about the 7.3.19 release, see the &lt;a class="reference external" href="https://doc.pypy.org/en/latest/release-v7.3.19.html#changelog"&gt;full changelog&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Please update, and continue to help us make pypy better.&lt;/p&gt; &lt;p&gt;Cheers, The PyPy Team&lt;/p&gt; &lt;/section&gt; &lt;/section&gt;</description><category>release</category><guid>https://www.pypy.org/posts/2025/02/pypy-v7319-release.html</guid><pubDate>Wed, 26 Feb 2025 12:00:00 GMT</pubDate></item><item><title>Low Overhead Allocation Sampling with VMProf in PyPy's GC</title><link>https://www.pypy.org/posts/2025/02/pypy-gc-sampling.html</link><dc:creator>Christoph Jung</dc:creator><description>&lt;h3 id="introduction"&gt;Introduction&lt;/h3&gt; &lt;p&gt;There are many time-based statistical profilers around (like VMProf or py-spy just to name a few). They allow the user to pick a trade-off between profiling precision and runtime overhead.&lt;/p&gt; &lt;p&gt;On the other hand there are memory profilers such as &lt;a href="https://github.com/bloomberg/memray"&gt;memray&lt;/a&gt;. They can be handy for finding leaks or for discovering functions that allocate a lot of memory. Memory profilers typlically save every single allocation a program does. This results in precise profiling, but larger overhead.&lt;/p&gt; &lt;p&gt;In this post we describe our experimental approach to low overhead statistical memory profiling. Instead of saving every single allocation a program does, it only saves every nth allocated byte. We have tightly integrated VMProf and the PyPy Garbage Collector to achieve this. The main technical insight is that the check whether an allocation should be sampled can be made free. This is done by folding it into the bump pointer allocator check that the PyPy’s GC uses to find out if it should start a minor collection. In this way the fast path with and without memory sampling are exactly the same.&lt;/p&gt; &lt;h3 id="background"&gt;Background&lt;/h3&gt; &lt;p&gt;To get an insight how the profiler and GC interact, lets take a brief look at both of them first.&lt;/p&gt; &lt;h4 id="vmprof"&gt;VMProf&lt;/h4&gt; &lt;p&gt;&lt;a href="https://github.com/vmprof/vmprof-python"&gt;VMProf&lt;/a&gt; is a statistical time-based profiler for PyPy. VMProf samples the stack of currently running Python functions a certain user-configured number of times per second. By adjusting this number, the overhead of profiling can be modified to pick the correct trade-off between overhead and precision of the profile. In the resulting profile, functions with huge runtime stand out the most, functions with shorter runtime less so. If you want to get a little more introduction to VMProf and how to use it with PyPy, you may look at &lt;a href="https://pypy.org/posts/2024/05/vmprof-firefox-converter.html"&gt;this blog post&lt;/a&gt;&lt;/p&gt; &lt;h4 id="pypys-gc"&gt;PyPy’s GC&lt;/h4&gt; &lt;p&gt;PyPy uses a generational incremental copying collector. That means there are two spaces for allocated objects, the nursery and the old-space. Freshly allocated objects will be allocated into the nursery. When the nursery is full at some point, it will be collected and all objects that survive will be tenured i.e. moved into the old-space. The old-space is much larger than the nursery and is collected less frequently and &lt;a href="https://www.pypy.org/posts/2024/03/fixing-bug-incremental-gc.html"&gt;incrementally&lt;/a&gt; (not completely collected in one go, but step-by-step). The old space collection is not relevant for the rest of the post though. We will now take a look at nursery allocations and how the nursery is collected.&lt;/p&gt; &lt;h4 id="bump-pointer-allocation-in-the-nursery"&gt;Bump Pointer Allocation in the Nursery&lt;/h4&gt; &lt;p&gt;The nursery (a small continuous memory area) utilizes two pointers to keep track from where on the nursery is free and where it ends. They are called &lt;code&gt;nursery_free&lt;/code&gt; and &lt;code&gt;nursery_top&lt;/code&gt;. When memory is allocated, the GC checks if there is enough space in the nursery left. If there is enough space, the &lt;code&gt;nursery_free&lt;/code&gt; pointer will be returned as the start address for the newly allocated memory, and &lt;code&gt;nursery_free&lt;/code&gt; will be moved forward by the amount of allocated memory.&lt;/p&gt; &lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/nursery_allocation.svg"&gt;&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;allocate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;totalsize&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;# Save position, where the object will be allocated to as result&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt; &lt;span class="c1"&gt;# Move nursery_free pointer forward by totalsize&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;totalsize&lt;/span&gt; &lt;span class="c1"&gt;# Check if this allocation would exceed the nursery&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_top&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# If it does =&amp;gt; collect the nursery and allocate afterwards&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collect_and_reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;totalsize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# result is a pointer into the nursery, obj will be allocated there&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;collect_and_reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size_of_allocation&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;# do a minor collection and return the start of the nursery afterwards&lt;/span&gt; &lt;span class="n"&gt;minor_collection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;Understanding this is crucial for our allocation sampling approach, so let us go through this step-by-step.&lt;/p&gt; &lt;p&gt;We already saw an example on how an allocation into a non-full nursery will look like. But what happens, if the nursery is (too) full?&lt;/p&gt; &lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/nursery_full.svg"&gt;&lt;/p&gt; &lt;p&gt;As soon as an object doesn't fit into the nursery anymore, it will be collected. A nursery collection will move all surviving objects into the old-space, so that the nursery is free afterwards, and the requested allocation can be made.&lt;/p&gt; &lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/nursery_collected.svg"&gt;&lt;/p&gt; &lt;p&gt;(Note that this is still a bit of a simplification.)&lt;/p&gt; &lt;h3 id="sampling-approach"&gt;Sampling Approach&lt;/h3&gt; &lt;p&gt;The last section described how the nursery allocation works normally. Now we'll talk how we integrate the new allocation sampling approach into it.&lt;/p&gt; &lt;p&gt;To decide whether the GC should trigger a sample, the sampling logic is integrated into the bump pointer allocation logic. Usually, when there is not enough space in the nursery left to fulfill an allocation request, the nursery will be collected and the allocation will be done afterwards. We reuse that mechanism for sampling, by introducing a new pointer called &lt;code&gt;sample_point&lt;/code&gt; that is calculated by &lt;code&gt;sample_point = nursery_free + sample_n_bytes&lt;/code&gt; where &lt;code&gt;sample_n_bytes&lt;/code&gt; is the number of bytes allocated before a sample is made (i.e. our sampling rate).&lt;/p&gt; &lt;p&gt;Imagine we'd have a nursery of 2MB and want to sample every 512KB allocated, then you could imagine our nursery looking like that:&lt;/p&gt; &lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/nursery_sampling.svg"&gt;&lt;/p&gt; &lt;p&gt;We use the sample point as &lt;code&gt;nursery_top&lt;/code&gt;, so that allocating a chunk of 512KB would exceed the nursery top and start a nursery collection. But of course we don't want to do a minor collection just then, so before starting a collection, we need to check if the nursery is actually full or if that is just an exceeded sample point. The latter will then trigger a VMprof stack sample. Afterwards we don't actually do a minor collection, but change &lt;code&gt;nursery_top&lt;/code&gt; and immediately return to the caller.&lt;/p&gt; &lt;p&gt;The last picture is a conceptual simplification. Only one sampling point exists at any given time. After we created the sampling point, it will be used as nursery top, if exceeded at some point, we will just add &lt;code&gt;sample_n_bytes&lt;/code&gt; to that sampling point, i.e. move it forward.&lt;/p&gt; &lt;p&gt;Here's how the updated &lt;code&gt;collect_and_reserve&lt;/code&gt; function looks like:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;collect_and_reserve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size_of_allocation&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;# Check if we exceeded a sample point or if we need to do a minor collection&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_top&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sample_point&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# One allocation could exceed multiple sample points&lt;/span&gt; &lt;span class="c1"&gt;# Sample, move sample_point forward&lt;/span&gt; &lt;span class="n"&gt;vmprof&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sample_now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sample_point&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;sample_n_bytes&lt;/span&gt; &lt;span class="c1"&gt;# Set sample point as new nursery_top if it fits into the nursery&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sample_point&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;real_nursery_top&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_top&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sample_point&lt;/span&gt; &lt;span class="c1"&gt;# Or use the real nursery top if it does not fit&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_top&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;real_nursery_top&lt;/span&gt; &lt;span class="c1"&gt;# Is there enough memory left inside the nursery&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;size_of_allocation&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_top&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# Yes =&amp;gt; move nursery_free forward&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;size_of_allocation&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt; &lt;span class="c1"&gt;# We did not exceed a sampling point and must do a minor collection, or&lt;/span&gt; &lt;span class="c1"&gt;# we exceeded a sample point but we needed to do a minor collection anyway&lt;/span&gt; &lt;span class="n"&gt;minor_collection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nursery_free&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;h3 id="why-is-the-overhead-low"&gt;Why is the Overhead ‘low’&lt;/h3&gt; &lt;p&gt;The most important property of our approach is that the bump-pointer fast path is not changed at all. If sampling is turned off, the slow path in &lt;code&gt;collect_and_reserve&lt;/code&gt; has three extra instructions for the if at the beginning, but are only a very small amount of overhead, compared to doing a minor collection.&lt;/p&gt; &lt;p&gt;When sampling is on, the extra logic in &lt;code&gt;collect_and_reserve&lt;/code&gt; gets executed. Every time an allocation exceeds the &lt;code&gt;sample_point&lt;/code&gt;, &lt;code&gt;collect_and_reserve&lt;/code&gt; will sample the Python functions currently executing. The resulting overhead is directly controlled by &lt;code&gt;sample_n_bytes&lt;/code&gt;. After sampling, the &lt;code&gt;sample_point&lt;/code&gt; and &lt;code&gt;nursery_top&lt;/code&gt; must be set accordingly. This will be done once after sampling in &lt;code&gt;collect_and_reserve&lt;/code&gt;. At some point a nursery collection will free the nursery and set the new &lt;code&gt;sample_point&lt;/code&gt; afterwards.&lt;/p&gt; &lt;p&gt;That means that the overhead mostly depends on the sampling rate and the rate at which the user program allocates memory, as the combination of those two factors determines the amount of samples.&lt;/p&gt; &lt;p&gt;Since the sampling rate can be adjusted from as low as 64 Byte to a theoretical maximum of ~4 GB (at the moment), the tradeoff between number of samples (i.e. profiling precision) and overhead can be completely adjusted.&lt;/p&gt; &lt;p&gt;We also suspect linkage between user program stack depth and overhead (a deeper stack takes longer to walk, leading to higher overhead), especially when walking the C call stack to.&lt;/p&gt; &lt;h3 id="sampling-rates-bigger-than-the-nursery-size"&gt;Sampling rates bigger than the nursery size&lt;/h3&gt; &lt;p&gt;The nursery usually has a size of a few megabytes, but profiling long-runningor larger applications with tons of allocations could result in very high number of samples per second (and thus overhead). To combat that it is possible to use sampling rates higher than the nursery size.&lt;/p&gt; &lt;p&gt;The sampling point is not limited by the nursery size, but if it is 'outside' the nursery (e.g. because &lt;code&gt;sample_n_bytes&lt;/code&gt; is set to twice the nursery size) it won't be used as &lt;code&gt;nursery_top&lt;/code&gt; until it 'fits' into the nursery.&lt;/p&gt; &lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/nursery_sampling_larger_than_nursery.svg"&gt;&lt;/p&gt; &lt;p&gt;After every nursery collection, we'd usually set the &lt;code&gt;sample_point&lt;/code&gt; to &lt;code&gt;nursery_free + sample_n_bytes&lt;/code&gt;, but if it is larger than the nursery, then the amount of collected memory during the last nursery collection is subtracted from &lt;code&gt;sample_point&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/nursery_sampling_larger_than_nursery_post_minor.svg"&gt;&lt;/p&gt; &lt;p&gt;At some point the &lt;code&gt;sample_point&lt;/code&gt; will be smaller than the nursery size, then it will be used as &lt;code&gt;nursery_top&lt;/code&gt; again to trigger a sample when exceeded.&lt;/p&gt; &lt;h3 id="differences-to-time-based-sampling"&gt;Differences to Time-Based Sampling&lt;/h3&gt; &lt;p&gt;As mentioned in the introduction, time-based sampling ‘hits’ functions with high runtime, and allocation-sampling ‘hits’ functions allocating much memory. But are those always different functions? The answer is: sometimes. There can be functions allocating lots of memory, that do not have a (relative) high runtime.&lt;/p&gt; &lt;p&gt;Another difference to time-based sampling is that the profiling overhead does not solely depend on the sampling rate (if we exclude a potential stack-depth - overhead correlation for now) but also on the amount of memory the user code allocates.&lt;/p&gt; &lt;p&gt;Let us look at an example:&lt;/p&gt; &lt;p&gt;If we’d sample every 1024 Byte and some program A allocates 3 MB and runs for 5 seconds, and program B allocates 6 MB but also runs for 5 seconds, there will be ~3000 samples when profiling A, but ~6000 samples when profiling B. That means we cannot give a ‘standard’ sampling rate like time-based profilers use to do (e.g. vmprof uses ~1000 samples/s for time sampling), as the number of resulting samples, and thus overhead, depends on sampling rate and amount of memory allocated by the program.&lt;/p&gt; &lt;p&gt;For testing and benchmarking, we usually started with a sampling rate of 128Kb and then halved or doubled that (multiple times) depending on sample counts, our need for precision (and size of the profile).&lt;/p&gt; &lt;h3 id="evaluation"&gt;Evaluation&lt;/h3&gt; &lt;h4 id="overhead"&gt;Overhead&lt;/h4&gt; &lt;p&gt;Now let us take a look at the allocation sampling overhead, by profiling some benchmarks. &lt;/p&gt; &lt;p&gt;The x-axis shows the sampling rate, while the y-axis shows the overhead, which is computed as &lt;code&gt;runtime_with_sampling / runtime_without_sampling&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;All benchmarks were executed five times on a PyPy with JIT and native profiling enabled, so that every dot in the plot is one run of a benchmark.&lt;/p&gt; &lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/as_overhead.png"&gt;&lt;/p&gt; &lt;p&gt;As you probably expected, the Overhead drops with higher allocation sampling rates. Reaching from as high as ~390% for 32kb allocation sampling to as low as &amp;lt; 10% for 32mb.&lt;/p&gt; &lt;p&gt;Let me give one concrete example: One run of the microbenchmark at 32kb sampling took 15.596 seconds and triggered 822050 samples. That makes a ridiculous amount of &lt;code&gt;822050 / 15.596 = ~52709&lt;/code&gt; samples per second. &lt;/p&gt; &lt;p&gt;There is probably no need for that amount of samples per second, so that for 'real' application profiling a much higher sampling rate would be sufficient.&lt;/p&gt; &lt;p&gt;Let us compare that to time sampling.&lt;/p&gt; &lt;p&gt;This time we ran those benchmarks with 100, 1000 and 2000 samples per second.&lt;/p&gt; &lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/ts_overhead.png"&gt;&lt;/p&gt; &lt;p&gt;The overhead varies with the sampling rate. Both with allocation and time sampling, you can reach any amount of overhead and any level of profiling precision you want. The best approach probably is to just try out a sampling rate and choose what gives you the right tradeoff between precision and overhead (and disk usage).&lt;/p&gt; &lt;p&gt;The benchmarks used are:&lt;/p&gt; &lt;p&gt;microbenchmark &lt;/p&gt; &lt;ul&gt; &lt;li&gt;&lt;a href="https://github.com/Cskorpion/microbenchmark"&gt;https://github.com/Cskorpion/microbenchmark&lt;/a&gt;&lt;/li&gt; &lt;li&gt;&lt;code&gt;pypy microbench.py 65536&lt;/code&gt;&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;gcbench &lt;/p&gt; &lt;ul&gt; &lt;li&gt;&lt;a href="https://github.com/pypy/pypy/blob/main/rpython/translator/goal/gcbench.py"&gt;https://github.com/pypy/pypy/blob/main/rpython/translator/goal/gcbench.py&lt;/a&gt;&lt;/li&gt; &lt;li&gt;print statements removed&lt;/li&gt; &lt;li&gt;&lt;code&gt;pypy gcbench.py 1&lt;/code&gt;&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;pypy translate step&lt;/p&gt; &lt;ul&gt; &lt;li&gt;first step of the pypy translation (annotation step)&lt;/li&gt; &lt;li&gt;&lt;code&gt;pypy path/to/rpython --opt=0 --cc=gcc --dont-write-c-files --gc=incminimark --annotate path/to/pypy/goal/targetpypystandalone.py&lt;/code&gt;&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;interpreter pystone&lt;/p&gt; &lt;ul&gt; &lt;li&gt;pystone benchmark on top of an interpreted pypy on top of a translated pypy&lt;/li&gt; &lt;li&gt;&lt;code&gt;pypy path/to/pypy/bin/pyinteractive.py -c "import test.pystone; test.pystone.main(1)"&lt;/code&gt;&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;All benchmarks executed on:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;Kubuntu 24.04&lt;/li&gt; &lt;li&gt;AMD Ryzen 7 5700U&lt;/li&gt; &lt;li&gt;24gb DDR4 3200MHz (dual channel)&lt;/li&gt; &lt;li&gt; &lt;p&gt;SSD benchmarking at read: 1965 MB/s, write: 227 MB/s&lt;/p&gt; &lt;ul&gt; &lt;li&gt;Sequential 1MB 1 Thread 8 Queues&lt;/li&gt; &lt;/ul&gt; &lt;/li&gt; &lt;li&gt; &lt;p&gt;Self built PyPy with allocation sampling features&lt;/p&gt; &lt;ul&gt; &lt;li&gt;&lt;a href="https://github.com/Cskorpion/pypy/tree/gc_allocation_sampling_u_2.7"&gt;https://github.com/Cskorpion/pypy/tree/gc_allocation_sampling_u_2.7&lt;/a&gt;&lt;/li&gt; &lt;/ul&gt; &lt;/li&gt; &lt;li&gt; &lt;p&gt;Modified VMProf with allocation sampling support&lt;/p&gt; &lt;ul&gt; &lt;li&gt;&lt;a href="https://github.com/Cskorpion/vmprof-python/tree/pypy_gc_allocation_sampling"&gt;https://github.com/Cskorpion/vmprof-python/tree/pypy_gc_allocation_sampling&lt;/a&gt;&lt;/li&gt; &lt;/ul&gt; &lt;/li&gt; &lt;/ul&gt; &lt;h4 id="example"&gt;Example&lt;/h4&gt; &lt;p&gt;We have also modified &lt;a href="https://github.com/Cskorpion/vmprof-firefox-converter/tree/allocation_sampling"&gt;vmprof-firefox-converter&lt;/a&gt; to show the allocation samples in the Firefor Profiler UI. With the techniques from this post, the output looks like this:&lt;/p&gt; &lt;p&gt;&lt;img src="https://www.pypy.org/images/2025_02_allocation_sampling_images/allocation_sampling_call_tree.png"&gt;&lt;/p&gt; &lt;p&gt;While this view is interesting, it would be even better if we could also see what types of objects are being allocated in these functions. We will take about how to do this in a future blog post.&lt;/p&gt; &lt;h3 id="conclusion"&gt;Conclusion&lt;/h3&gt; &lt;p&gt;In this blog post we introduced allocation sampling for PyPy by going through the technical aspects and the corresponding overhead. In a future blog post, we are going to dive into the actual usage of allocation sampling with VMProf, and show an example case study. That will be accompanied by some new improvements and additional features, like extracting the type of an object that triggered a sample.&lt;/p&gt; &lt;p&gt;So far all this work is still experimental and happening on PyPy branches but we hope to get the technique stable enough to merge it to main and ship it with PyPy eventually.&lt;/p&gt; &lt;p&gt;-- Christoph Jung and CF Bolz-Tereick&lt;/p&gt;</description><category>gc</category><category>profiling</category><category>vmprof</category><guid>https://www.pypy.org/posts/2025/02/pypy-gc-sampling.html</guid><pubDate>Tue, 25 Feb 2025 10:16:00 GMT</pubDate></item><item><title>PyPy v7.3.18 release</title><link>https://www.pypy.org/posts/2025/02/pypy-v7318-release.html</link><dc:creator>mattip</dc:creator><description>&lt;section id="pypy-v7-3-18-release-of-python-2-7-3-10-and-3-11-beta"&gt; &lt;h2&gt;PyPy v7.3.18: release of python 2.7, 3.10 and 3.11 beta&lt;/h2&gt; &lt;p&gt;The PyPy team is proud to release version 7.3.18 of PyPy.&lt;/p&gt; &lt;p&gt;This release includes a python 3.11 interpreter. We are labelling it "beta" because it is the first one. In the next release we will drop 3.10 and remove the "beta" label. There are a particularly large set of bugfixes in this release thanks to @devdanzin using fusil on the 3.10 builds, originally written by Victor Stinner. Other significant changes:&lt;/p&gt; &lt;ul class="simple"&gt; &lt;li&gt;&lt;p&gt;We have updated libffi shipped in our portable builds. We also now statically link to libffi where possible which reduces the number of shared object dependencies.&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;We have added code to be able to show the native function names when profiling with VMProf. So far only Linux supports this feature.&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;We have added a &lt;a class="reference external" href="https://peps.python.org/pep-0768/"&gt;PEP 768&lt;/a&gt;-inspired remote debugging facility.&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;The HPy backend has been updated to latest HPy HEAD&lt;/p&gt;&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;The release includes three different interpreters:&lt;/p&gt; &lt;ul class="simple"&gt; &lt;li&gt;&lt;p&gt;PyPy2.7, which is an interpreter supporting the syntax and the features of Python 2.7 including the stdlib for CPython 2.7.18+ (the &lt;code class="docutils literal"&gt;+&lt;/code&gt; is for backported security updates)&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;PyPy3.10, which is an interpreter supporting the syntax and the features of Python 3.10, including the stdlib for CPython 3.10.16.&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;PyPy3.11, which is an interpreter supporting the syntax and the features of Python 3.11, including the stdlib for CPython 3.11.11.&lt;/p&gt;&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;The interpreters are based on much the same codebase, thus the triple release. This is a micro release, all APIs are compatible with the other 7.3 releases. It follows after 7.3.17 release on August 28, 2024.&lt;/p&gt; &lt;p&gt;We recommend updating. You can find links to download the releases here:&lt;/p&gt; &lt;blockquote&gt; &lt;p&gt;&lt;a class="reference external" href="https://pypy.org/download.html"&gt;https://pypy.org/download.html&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt; &lt;p&gt;We would like to thank our donors for the continued support of the PyPy project. If PyPy is not quite good enough for your needs, we are available for &lt;a class="reference external" href="https://www.pypy.org/pypy-sponsors.html"&gt;direct consulting&lt;/a&gt; work. If PyPy is helping you out, we would love to hear about it and encourage submissions to our &lt;a class="reference external" href="https://pypy.org/blog"&gt;blog&lt;/a&gt; via a pull request to &lt;a class="reference external" href="https://github.com/pypy/pypy.org"&gt;https://github.com/pypy/pypy.org&lt;/a&gt;&lt;/p&gt; &lt;p&gt;We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: bug fixes, &lt;a class="reference external" href="https://doc.pypy.org/"&gt;PyPy&lt;/a&gt; and &lt;a class="reference external" href="https://rpython.readthedocs.org"&gt;RPython&lt;/a&gt; documentation improvements, or general &lt;a class="reference external" href="https://doc.pypy.org/en/latest/project-ideas.html"&gt;help&lt;/a&gt; with making RPython's JIT even better.&lt;/p&gt; &lt;p&gt;If you are a python library maintainer and use C-extensions, please consider making a &lt;a class="reference external" href="https://hpyproject.org/"&gt;HPy&lt;/a&gt; / &lt;a class="reference external" href="https://cffi.readthedocs.io"&gt;CFFI&lt;/a&gt; / &lt;a class="reference external" href="https://cppyy.readthedocs.io"&gt;cppyy&lt;/a&gt; version of your library that would be performant on PyPy. In any case, both &lt;a class="reference external" href="https://github.com/joerick/cibuildwheel"&gt;cibuildwheel&lt;/a&gt; and the &lt;a class="reference external" href="https://github.com/matthew-brett/multibuild"&gt;multibuild system&lt;/a&gt; support building wheels for PyPy.&lt;/p&gt; &lt;section id="vmprof-native-symbol-names"&gt; &lt;h3&gt;VMProf Native Symbol Names&lt;/h3&gt; &lt;p&gt;When running VMProf profiling with native profiling enabled, PyPy did so far not produce function names for C functions. The output looked like this:&lt;/p&gt; &lt;pre class="literal-block"&gt;pypy -m vmprof ~/projects/gitpypy/lib-python/2.7/test/pystone.py Pystone(1.1) time for 50000 passes = 0.0109887 This machine benchmarks at 4.55011e+06 pystones/second vmprof output: %: name: location: 100.0% entry_point &amp;lt;builtin&amp;gt;/app_main.py:874 100.0% run_command_line &amp;lt;builtin&amp;gt;/app_main.py:601 100.0% run_toplevel &amp;lt;builtin&amp;gt;/app_main.py:93 100.0% _run_module_as_main /home/user/bin/pypy-c-jit-170203-99a72243b541-linux64/lib-python/2.7/runpy.py:150 100.0% _run_code /home/user/bin/pypy-c-jit-170203-99a72243b541-linux64/lib-python/2.7/runpy.py:62 100.0% &amp;lt;module&amp;gt; /home/user/bin/pypy-c-jit-170203-99a72243b541-linux64/site-packages/vmprof/__main__.py:1 100.0% main /home/user/bin/pypy-c-jit-170203-99a72243b541-linux64/site-packages/vmprof/__main__.py:30 100.0% run_path /home/user/bin/pypy-c-jit-170203-99a72243b541-linux64/lib-python/2.7/runpy.py:238 100.0% _run_module_code /home/user/bin/pypy-c-jit-170203-99a72243b541-linux64/lib-python/2.7/runpy.py:75 100.0% &amp;lt;module&amp;gt; /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:3 100.0% main /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:60 100.0% pystones /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:67 100.0% Proc0 /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:79 76.9% &amp;lt;unknown code&amp;gt; 69.2% &amp;lt;unknown code&amp;gt; 53.8% &amp;lt;unknown code&amp;gt; 53.8% &amp;lt;unknown code&amp;gt; 46.2% &amp;lt;unknown code&amp;gt; 46.2% &amp;lt;unknown code&amp;gt; 38.5% &amp;lt;unknown code&amp;gt; 38.5% Proc8 /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:212 30.8% &amp;lt;unknown code&amp;gt; ...&lt;/pre&gt; &lt;p&gt;We can now symbolify these C functions and give function names and which shared library they come from, at least on Linux:&lt;/p&gt; &lt;pre class="literal-block"&gt;Pystone(1.1) time for 50000 passes = 0.218967 This machine benchmarks at 228345 pystones/second vmprof output: %: name: location: 100.0% entry_point &amp;lt;builtin&amp;gt;/app_main.py:889 100.0% run_command_line &amp;lt;builtin&amp;gt;/app_main.py:616 100.0% run_toplevel &amp;lt;builtin&amp;gt;/app_main.py:95 100.0% _run_module_as_main /home/user/projects/gitpypy/lib-python/2.7/runpy.py:150 100.0% _run_code /home/user/projects/gitpypy/lib-python/2.7/runpy.py:62 100.0% &amp;lt;module&amp;gt; /home/user/projects/gitpypy/site-packages/vmprof/__main__.py:1 100.0% main /home/user/projects/gitpypy/site-packages/vmprof/__main__.py:30 100.0% run_module /home/user/projects/gitpypy/lib-python/2.7/runpy.py:179 100.0% _run_module_code /home/user/projects/gitpypy/lib-python/2.7/runpy.py:75 100.0% &amp;lt;module&amp;gt; /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:3 100.0% main /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:60 100.0% pystones /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:67 100.0% Proc0 /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:79 95.5% n:pypy_g_execute_frame:0:pypy-c 91.4% n:pypy_g_PyFrame_dispatch:0:pypy-c 63.8% n:pypy_g_PyFrame_dispatch_bytecode:0:pypy-c 49.8% Proc1 /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:137 17.6% copy /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:53 13.6% n:pypy_g_PyFrame_CALL_FUNCTION:0:pypy-c 10.4% Proc8 /home/user/projects/gitpypy/lib-python/2.7/test/pystone.py:212 8.6% n:pypy_g_STORE_ATTR_slowpath:0:pypy-c&lt;/pre&gt; &lt;p&gt;This becomes even more useful when using the &lt;a class="reference external" href="https://github.com/Cskorpion/vmprof-firefox-converter/"&gt;VMProf Firefox converter&lt;/a&gt;, which uses the Firefox Profiler Web UI to visualize profiling output:&lt;/p&gt; &lt;img alt="/images/2025-vmprof-firefox.png" src="https://www.pypy.org/images/2025-vmprof-firefox.png"&gt; &lt;/section&gt; &lt;section id="what-is-pypy"&gt; &lt;h3&gt;What is PyPy?&lt;/h3&gt; &lt;p&gt;PyPy is a Python interpreter, a drop-in replacement for CPython It's fast (&lt;a class="reference external" href="https://speed.pypy.org"&gt;PyPy and CPython&lt;/a&gt; performance comparison) due to its integrated tracing JIT compiler.&lt;/p&gt; &lt;p&gt;We also welcome developers of other &lt;a class="reference external" href="https://rpython.readthedocs.io/en/latest/examples.html"&gt;dynamic languages&lt;/a&gt; to see what RPython can do for them.&lt;/p&gt; &lt;p&gt;We provide binary builds for:&lt;/p&gt; &lt;ul class="simple"&gt; &lt;li&gt;&lt;p&gt;&lt;strong&gt;x86&lt;/strong&gt; machines on most common operating systems (Linux 32/64 bits, Mac OS 64 bits, Windows 64 bits)&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;64-bit &lt;strong&gt;ARM&lt;/strong&gt; machines running Linux (&lt;code class="docutils literal"&gt;aarch64&lt;/code&gt;) and macos (&lt;code class="docutils literal"&gt;macos_arm64&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;PyPy supports Windows 32-bit, Linux PPC64 big- and little-endian, Linux ARM 32 bit, RISC-V RV64IMAFD Linux, and s390x Linux but does not release binaries. Please reach out to us if you wish to sponsor binary releases for those platforms. Downstream packagers provide binary builds for debian, Fedora, conda, OpenBSD, FreeBSD, Gentoo, and more.&lt;/p&gt; &lt;/section&gt; &lt;section id="what-else-is-new"&gt; &lt;h3&gt;What else is new?&lt;/h3&gt; &lt;p&gt;For more information about the 7.3.18 release, see the &lt;a class="reference external" href="https://doc.pypy.org/en/latest/release-v7.3.18.html#changelog"&gt;full changelog&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Please update, and continue to help us make pypy better.&lt;/p&gt; &lt;p&gt;Cheers, The PyPy Team&lt;/p&gt; &lt;/section&gt; &lt;/section&gt;</description><category>release</category><guid>https://www.pypy.org/posts/2025/02/pypy-v7318-release.html</guid><pubDate>Thu, 06 Feb 2025 12:00:00 GMT</pubDate></item><item><title>Musings on Tracing in PyPy</title><link>https://www.pypy.org/posts/2025/01/musings-tracing.html</link><dc:creator>CF Bolz-Tereick</dc:creator><description>&lt;p&gt;Last summer, &lt;a href="https://cs.brown.edu/~sk/"&gt;Shriram Krishnamurthi&lt;/a&gt; &lt;a href="https://twitter.com/ShriramKMurthi/status/1818009884484583459"&gt;asked on Twitter&lt;/a&gt;:&lt;/p&gt; &lt;blockquote&gt; &lt;p&gt;"I'm curious what the current state of tracing JITs is. They used to be all the rage for a while, then I though I heard they weren't so effective, then I haven't heard of them at all. Is the latter because they are ubiquitous, or because they proved to not work so well?"&lt;/p&gt; &lt;/blockquote&gt; &lt;p&gt;I replied with my personal (pretty subjective) opinions about the question in a lengthy Twitter thread (which also spawned an even lengthier discussion). I wanted to turn what I wrote there into a blog post to make it more widely available (Twitter is no longer easily consumable without an account), and also because I'm mostly not using Twitter anymore. The blog post i still somewhat terse, I've written a small background section and tried to at least add links to further information. Please ask in the comments if something is particularly unclear.&lt;/p&gt; &lt;h3 id="background"&gt;Background&lt;/h3&gt; &lt;p&gt;I'll explain a few of the central terms of the rest of the post. &lt;em&gt;JIT compilers&lt;/em&gt; are compilers that do their work at runtime, interleaved (or concurrent with) the execution of the program. There are (at least) two common general styles of JIT compiler architectures. The most common one is that of a method-based JIT, which will compile one method or function at a time. Then there are tracing JIT compilers, which generate code by tracing the execution of the user's program. They often focus on loops as their main unit of compilation.&lt;/p&gt; &lt;p&gt;Then there is the distinction between a "regular" JIT compiler and that of a &lt;em&gt;meta-JIT&lt;/em&gt;. A regular JIT is built to compile one specific source language to machine code. A meta-JIT is a framework for building JIT compilers for a variety of different languages, reusing as much machinery as possible between the different implementation.&lt;/p&gt; &lt;h3 id="personal-and-project-context"&gt;Personal and Project Context&lt;/h3&gt; &lt;p&gt;Some personal context: my perspective is informed by nearly &lt;a href="https://mail.python.org/archives/list/pypy-dev@python.org/thread/TZM37YJ733G445R6JGTV26333RQEPLRX/"&gt;two decades&lt;/a&gt; of work on PyPy. PyPy's implementation language, &lt;a href="https://rpython.readthedocs.io/"&gt;RPython&lt;/a&gt;, has support for a meta-JIT, which allows it to reuse its JIT infrastructure for the various Python versions that we support (currently we do releases of PyPy2.7 and PyPy3.10 together). Our meta-JIT infrastructure has been used for some experimental different languages like:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;PyPy's &lt;a href="https://pypy.org/posts/2010/11/pypy-14-ouroboros-in-practice-5437628000869417542.html#more-highlights"&gt;regular expression engine&lt;/a&gt;&lt;/li&gt; &lt;li&gt;&lt;a href="https://github.com/SOM-st/PySOM"&gt;RPySom&lt;/a&gt;, a tiny Smalltalk&lt;/li&gt; &lt;li&gt;&lt;a href="https://github.com/topazproject/topaz"&gt;Ruby&lt;/a&gt;&lt;/li&gt; &lt;li&gt;&lt;a href="https://github.com/hippyvm/hippyvm"&gt;PHP&lt;/a&gt;&lt;/li&gt; &lt;li&gt;&lt;a href="https://dl.acm.org/doi/10.1145/1836089.1836102"&gt;Prolog&lt;/a&gt;,&lt;/li&gt; &lt;li&gt;&lt;a href="https://dl.acm.org/doi/10.1145/2784731.2784740"&gt;Racket&lt;/a&gt;,&lt;/li&gt; &lt;li&gt;a &lt;a href="https://drops.dagstuhl.de/storage/00lipics/lipics-vol056-ecoop2016/LIPIcs.ECOOP.2016.4/LIPIcs.ECOOP.2016.4.pdf"&gt;database (SQLite)&lt;/a&gt;&lt;/li&gt; &lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=fZj3uljJl_k"&gt;Lox&lt;/a&gt;, the language of &lt;a href="https://craftinginterpreters.com/"&gt;Crafting Interpreters&lt;/a&gt;&lt;/li&gt; &lt;li&gt;an &lt;a href="https://docs.pydrofoil.org/en/latest/"&gt;ARM and RISC-V emulator&lt;/a&gt;&lt;/li&gt; &lt;li&gt;and many more&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;Those implementations had various degrees of maturity and many of them are research software and aren't maintained any more.&lt;/p&gt; &lt;p&gt;PyPy gives itself the goal to try to be extremely compatible with all the quirks of the Python language. We don't change the Python language to make things easier to compile and we support the introspection and debugging features of Python. We try very hard to have no opinions on language design. The CPython core developers come up with the semantics, we somehow deal with them.&lt;/p&gt; &lt;h3 id="meta-tracing"&gt;Meta-tracing&lt;/h3&gt; &lt;p&gt;PyPy started using a &lt;a href="https://en.wikipedia.org/wiki/Tracing_just-in-time_compilation"&gt;tracing JIT&lt;/a&gt; approach &lt;em&gt;not&lt;/em&gt; because we thought method-based just-in-time compilers are bad. Historically we &lt;a href="https://foss.heptapod.net/pypy/extradoc/-/blob/branch/extradoc/eu-report/D08.2_JIT_Compiler_Architecture-2007-05-01.pdf"&gt;had tried&lt;/a&gt; to implement a method-based meta-JIT that was using partial evaluation (we wrote three or four method-based prototypes that all weren't as good as we hoped). After all those &lt;a href="https://pypy.org/posts/2008/10/sprint-discussions-jit-generator-3301578822967655604.html"&gt;experiments failed&lt;/a&gt; we switched to the &lt;a href="https://dl.acm.org/doi/10.1145/1565824.1565827"&gt;tracing approach&lt;/a&gt;, and only at this point did our meta-JIT start producing interesting performance.&lt;/p&gt; &lt;p&gt;In the meta-JIT context tracing has good properties, because tracing has relatively understandable behavior and its easy(ish) to tweak how things work &lt;a href="https://dl.acm.org/doi/10.1145/2069172.2069181"&gt;with extra annotations in the interpreter source&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Another reason why meta-tracing often works well for PyPy is that it can often slice through the complicated layers of Python quite effectively and remove a lot of overhead. Python is often described as simple, but I think that's actually a misconception. On the implementation level it's a very big and complicated language, and it is also continuously getting new features every year (the language is quite a bit more complicated than Javascript, for example&lt;sup id="fnref:help"&gt;&lt;a class="footnote-ref" href="https://www.pypy.org/posts/2025/01/musings-tracing.html#fn:help"&gt;1&lt;/a&gt;&lt;/sup&gt;).&lt;/p&gt; &lt;h4 id="truffle"&gt;Truffle&lt;/h4&gt; &lt;p&gt;Later &lt;a href="https://dl.acm.org/doi/abs/10.1145/2509578.2509581"&gt;Truffle&lt;/a&gt; came along and made a method-based meta-JIT using partial evaluation work. However Truffle (and &lt;a href="https://www.oracle.com/java/graalvm/"&gt;Graal&lt;/a&gt;) has had significantly more people working on it and much more money invested. In addition, it at first required a quite specific style of &lt;a href="https://dl.acm.org/doi/10.1145/2384577.2384587"&gt;AST-based interpreters&lt;/a&gt; (in the last few years they have also added support for bytecode-based interpreters).&lt;/p&gt; &lt;p&gt;It's still my impression that getting similar results with Truffle is &lt;a href="https://stefan-marr.de/downloads/oopsla15-marr-ducasse-meta-tracing-vs-partial-evaluation.pdf"&gt;more work for language implementers&lt;/a&gt; than with RPython, and the &lt;a href="https://arxiv.org/pdf/1602.00602"&gt;warmup&lt;/a&gt; of Truffle can often pretty bad. But Truffle is definitely an existence proof that meta-JITs don't &lt;em&gt;have&lt;/em&gt; to be based on tracing.&lt;/p&gt; &lt;h3 id="tracing-the-good"&gt;Tracing, the good&lt;/h3&gt; &lt;p&gt;Let's now actually get to he heart of Shriram's question and discuss some of the advantages of tracing that go beyond the ease of using tracing for a meta-JIT.&lt;/p&gt; &lt;p&gt;Tracing allows for doing very aggressive &lt;a href="https://www.cs.fsu.edu/~xyuan/INTERACT-15/papers/paper11.pdf"&gt;partial inlining&lt;/a&gt;, Following just the hot path through lots of layers of abstraction is obviously often really useful for generating fast code.&lt;/p&gt; &lt;p&gt;It's definitely possible to achieve the same effect in a method-based context with &lt;a href="https://dl.acm.org/doi/pdf/10.1145/117954.117955"&gt;path splitting&lt;/a&gt;. But it requires a lot more implementation work and is not trivial, because the path &lt;a href="https://dl.acm.org/doi/10.1145/504282.504295"&gt;execution counts&lt;/a&gt; of inlined functions can often be very call-site dependent. Tracing, on the other hand, gives you call-site dependent path splitting "for free".&lt;/p&gt; &lt;p&gt;(The aggressive partial inlining and path splitting is even more important in the meta-tracing context of PyPy, where some of inlined layers are a part of the language runtime, and where rare corner cases that are never executed in practice are everywhere.)&lt;/p&gt; &lt;p&gt;Another advantage of tracing is that it makes a number of optimizations really easy to implement, because there are (to first approximation) no control flow merges. This makes all the optimizations that we do (super-)&lt;a href="https://en.wikipedia.org/wiki/Optimizing_compiler#Local_vs._global_scope"&gt;local optimizations&lt;/a&gt;, that operate on a single (very long) basic block. This allows the JIT to do the optimizations in exactly one forwards and one backwards pass. An example is our &lt;a href="https://dl.acm.org/doi/10.1145/1929501.1929508"&gt;allocation removal&lt;/a&gt;/partial escape analysis pass, which is &lt;a href="https://pypy.org/posts/2022/10/toy-optimizer-allocation-removal.html"&gt;quite simple&lt;/a&gt;, whereas the &lt;a href="https://ssw.jku.at/Teaching/PhDTheses/Stadler/Thesis_Stadler_14.pdf"&gt;version for general control flow&lt;/a&gt; has a lot more complexity, particularly in its handling of loops.&lt;/p&gt; &lt;p&gt;This ease of implementation of optimizations allowed us to implement some pretty decent optimizations. Our allocation removal, the way PyPy's JIT can reason about the heap, about dictionary accesses, about properties of functions of the runtime, about the range and &lt;a href="https://pypy.org/posts/2024/08/toy-knownbits.html"&gt;known bits of integer variables&lt;/a&gt; is all quite solid.&lt;/p&gt; &lt;h3 id="tracing-the-bad"&gt;Tracing, the bad&lt;/h3&gt; &lt;p&gt;Tracing also comes with a significant number of downsides. Probably the biggest one is that it tends to have big performance cliffs (PyPy certainly has them, and other tracing JITs such as TraceMonkey had them too). In my experience the 'good' cases of tracing are really good, but if something goes wrong you are annoyed and performance can become a lot slower. With a simple method JIT the performance is often much more "even".&lt;/p&gt; &lt;p&gt;Another set of downsides is that tracing has a number of corner cases and "weird" behaviour in certain situations. Questions such as: - When do you stop inlining? - What happens when you &lt;a href="https://mail.python.org/archives/list/pypy-dev@python.org/thread/GQQ7ABUFHGEAHWN7RQZM6D54CDROQINR/"&gt;trace recursion&lt;/a&gt;? - What happens if your traces are &lt;a href="https://pypy.org/posts/2021/09/jit-auto-generated-code.html"&gt;consistently too long, even without inlining&lt;/a&gt;? - and so on...&lt;/p&gt; &lt;p&gt;Some of these problems can be solved by adding heuristics to the tracing JIT, but doing so loses a lot of the simplicity of tracing again.&lt;/p&gt; &lt;p&gt;There are also some classes of programs that tend to generally perform quite poorly when they are executed by a tracing JIT, bytecode interpreters in particularly, and other extremely unpredictably branchy code. This is because the core assumption of the tracing jit "loops take similar control flow paths" is just really wrong in the case of interpreters.&lt;/p&gt; &lt;h3 id="discussion"&gt;Discussion&lt;/h3&gt; &lt;p&gt;The Twitter thread spawned quite a bit of discussion, please look at the original thread for all of the comments. Here are three that I wanted to highlight:&lt;/p&gt; &lt;blockquote&gt; &lt;p&gt;"This is a really great summary. Meta-tracing is probably the one biggest success story. I think it has to do with how big and branchy the bytecode implementations are for typical dynamic languages; the trace captures latent type feedback naturally.&lt;/p&gt; &lt;p&gt;There is an upper limit, tho."&lt;/p&gt; &lt;/blockquote&gt; &lt;p&gt;&lt;a href="https://twitter.com/TitzerBL/status/1818385622203298265"&gt;Ben Titzer&lt;/a&gt;&lt;/p&gt; &lt;p&gt;I agree with this completely. The complexity of Python bytecodes is a big factor for why meta tracing works well for us. But also in Python there are many builtin types (collection types, types that form the &lt;a href="https://en.wikipedia.org/wiki/Metaobject#Metaobject_protocol"&gt;meta-object protocol&lt;/a&gt; of Python, standard library modules implemented in C/RPython) and tracing operations on them is very important too, for good performance.&lt;/p&gt; &lt;hr&gt; &lt;blockquote&gt; &lt;p&gt;"I think Mozilla had a blog post talking more about the difficulty with TraceMonkey, could only find this one: https://blog.mozilla.org/nnethercote/category/jagermonkey/"&lt;/p&gt; &lt;/blockquote&gt; &lt;p&gt;&lt;a href="https://twitter.com/smarr/status/1818600052752797990"&gt;Stefan Marr&lt;/a&gt;&lt;/p&gt; &lt;blockquote&gt; &lt;p&gt;"imo doing tracing for JS is really hard mode, because the browser is so incredibly warmup-sensitive. IIRC tracemonkey used a really low loop trip count (single-digit?) to decide when to start tracing (pypy uses &amp;gt;1000). the JS interpreters of the time were also quite slow."&lt;/p&gt; &lt;/blockquote&gt; &lt;p&gt;&lt;a href="https://twitter.com/cfbolz/status/1818609594219811245"&gt;me&lt;/a&gt;&lt;/p&gt; &lt;p&gt;In the meantime there were some more reminiscences about tracing in Javascript by &lt;a href="https://www.youtube.com/live/_VF3pISRYRc?t=24797s"&gt;Shu-Yu Guo in a panel discussion&lt;/a&gt; and by &lt;a href="https://kfogel.org/notice/AngH0uqyJl231yLLOa"&gt;Jason Orendorff on Mastodon&lt;/a&gt;.&lt;/p&gt; &lt;hr&gt; &lt;blockquote&gt; &lt;p&gt;"There are a number of corner cases you have to deal with in a tracing JIT. It's unfortunately not as simple and easy as the initial papers would have you believe. One example is how would you deal with a loop inside a loop? Is your tracing now recursive?&lt;/p&gt; &lt;p&gt;There's been some research work on trace stitching to deal with trace explosion but it does add complexity. With the increase in complexity, I think most industrial VM developers would rather pick tried-and-true method-based JITs that are well understood."&lt;/p&gt; &lt;/blockquote&gt; &lt;p&gt;&lt;a href="https://twitter.com/Love2Code/status/1818292516753383644"&gt;Maxime Chevalier&lt;/a&gt;&lt;/p&gt; &lt;h3 id="conclusion"&gt;Conclusion&lt;/h3&gt; &lt;p&gt;Given access to enough developers and in the context of "normal" jitting (ie not meta-jitting) it's very unclear to me that you should use tracing. It makes more sense to rather spend effort on a solid control-flow-graph-based baseline and then try to get some of the good properties of tracing on top (path splitting, partial inlining, partial escape analysis, etc).&lt;/p&gt; &lt;p&gt;For PyPy with its meta-JIT (and the fact that we don't have particularly much funding nor people) I still think tracing was/is a relatively pragmatic choice. When I talked with &lt;a href="https://samth.github.io/"&gt;Sam Tobin-Hochstadt&lt;/a&gt; about this topic recently he characterized it like this: "tracing is a labor-saving device for compiler authors".&lt;/p&gt; &lt;p&gt;Performance-wise PyPy is still quite hard to beat in the cases where it works well (i.e. pure Python code that doesn't use too many C modules, which are &lt;a href="https://pypy.org/posts/2018/09/inside-cpyext-why-emulating-cpython-c-8083064623681286567.html"&gt;supported but slow in PyPy&lt;/a&gt;). In general, there are very few JITs for Python (particularly with the constraint of not being "allowed" to change the language), the most competitive other ones are &lt;a href="https://www.graalvm.org/python/"&gt;GraalPy&lt;/a&gt;, also based on a meta-JIT approach. Instagram is running on &lt;a href="https://github.com/facebookincubator/cinder/"&gt;Cinder&lt;/a&gt; and also CPython has &lt;a href="https://tonybaloney.github.io/posts/python-gets-a-jit.html"&gt;grown a JIT recently&lt;/a&gt; which was part of the recent &lt;a href="https://docs.python.org/3.13/whatsnew/3.13.html#an-experimental-just-in-time-jit-compiler"&gt;3.13 release, but only as an off-by-default build option&lt;/a&gt;, so I'm very excited about how Python's performance will develop in the next years!&lt;/p&gt; &lt;div class="footnote"&gt; &lt;hr&gt; &lt;ol&gt; &lt;li id="fn:help"&gt; &lt;p&gt;(A side point: people who haven't worked on Python tend to underestimate its complexity and pace of development. A pet peeve of mine is C++ compiler devs/static analysis/Javascript people/other well-meaning communities coming with statements like "why don't you just..." 🤷‍♀️) &lt;a class="footnote-backref" href="https://www.pypy.org/posts/2025/01/musings-tracing.html#fnref:help" title="Jump back to footnote 1 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt; &lt;/li&gt; &lt;/ol&gt; &lt;/div&gt;</description><guid>https://www.pypy.org/posts/2025/01/musings-tracing.html</guid><pubDate>Sun, 05 Jan 2025 17:01:09 GMT</pubDate></item><item><title>Towards PyPy3.11 - an update</title><link>https://www.pypy.org/posts/2025/01/towards-pypy311-an-update.html</link><dc:creator>mattip</dc:creator><description>&lt;p&gt;We&lt;sup id="fnref:0"&gt;&lt;a class="footnote-ref" href="https://www.pypy.org/posts/2025/01/towards-pypy311-an-update.html#fn:0"&gt;1&lt;/a&gt;&lt;/sup&gt; are steadily working towards a Python 3.11 interpreter, which will be part of the upcoming PyPy 7.3.18 release. Along with that, we also recently updated &lt;a href="https://speed.pypy.org"&gt;speed.pypy.org&lt;/a&gt; to compare PyPy's performance to CPython 3.11 (it used to be CPython 3.7). &lt;/p&gt; &lt;!-- TEASER_END --&gt; &lt;h2 id="why-is-there-no-pypy-for-python-311"&gt;Why is there no PyPy for Python 3.11?&lt;/h2&gt; &lt;p&gt;TL;DR: we are working on it and hopefully will have a beta version soon&lt;/p&gt; &lt;p&gt;We started by merging the &lt;a href="https://peps.python.org/pep-0654/"&gt;exception groups&lt;/a&gt; work by Nico Rittinghaus, merging the CPython stdlib for Python 3.11.9, and updating the regex engine to handle atomic groupings. I think these were the largest changes needed to support Python3.11, maybe I missed something else?&lt;/p&gt; &lt;p&gt;We then created a &lt;a href="https://github.com/pypy/pypy/milestone/15"&gt;milestone&lt;/a&gt; with many of the changes that might not be caught in testing. As of today that milestone stands at 24/40 issues closed, 16 open. This is in addition to the &lt;a href="https://github.com/pypy/pypy/milestone/22"&gt;v7.3.18 milestone&lt;/a&gt;, which has 11/23 issues closed, 12 open.&lt;/p&gt; &lt;p&gt;We also updated our infrastructure to run nightly &lt;a href="https://buildbot.pypy.org/summary?branch=py3.11"&gt;buildbot test of the py3.11&lt;/a&gt; branch, including adding py3.11 to our benchmarking site &lt;a href="https://speed.pypy.org/"&gt;speed.pypy.org&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Then the real work started: fixing these milestone issues and failing stdlib tests. Some of the changes were cosmetic changes to error messages, some were more involved changes to the interpreter to behave more like CPython. For instance, &lt;code&gt;hex(x)&lt;/code&gt; where &lt;code&gt;x&lt;/code&gt; is an int calls &lt;code&gt;long.format(x, "#x")&lt;/code&gt; on CPython where in PyPy we used &lt;code&gt;x.__format__("#x")&lt;/code&gt;. This subtle difference caused a failure in the &lt;code&gt;repr&lt;/code&gt; of &lt;code&gt;IntEnum&lt;/code&gt;. Tracking down problems like these takes time. We are now down to about 250 failing stdlib tests, with thousands passing. For comparison, PyPy3.10, first released in June 2023, still has around 100 failing stdlib tests.&lt;/p&gt; &lt;h3 id="c-extension-support"&gt;C-extension support&lt;/h3&gt; &lt;p&gt;PyPy supports the Python C-API via a cpyext compatibility layer. We "mangle" the CPython symbols to add an extra &lt;code&gt;Py&lt;/code&gt; to prevent loading CPython c-extension modules into PyPy, since the ABI is different. So a function like &lt;code&gt;PyLong_FromLong&lt;/code&gt; will be exported from the c shared object as &lt;code&gt;PyPyLong_FromLong&lt;/code&gt;. One of my long-standing goals is to remove this mangling, but it then requires that our c declarations, inlined functions, and macros in the headers match 1:1 the CPython headers. We can get by with not declaring and implementing parts of the interfaces, but what is declared must be identical. This is a long-term project, with each release the headers get closer to the CPython versions. I hope to concentrate on the PyUnicode interfaces for this release.&lt;/p&gt; &lt;p&gt;Note that the time to do this is before a new version release. Once the version is released, we cannot change the headers significantly.&lt;/p&gt; &lt;h3 id="so-what-is-left"&gt;So what is left?&lt;/h3&gt; &lt;p&gt;Summarizing the milestones and other things to do:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;Update to the recently released 3.11.11 stdlib&lt;/li&gt; &lt;li&gt;Make sure &lt;a href="https://vmprof.readthedocs.io/en/latest/"&gt;vmprof&lt;/a&gt; works&lt;/li&gt; &lt;li&gt;Update the &lt;code&gt;time&lt;/code&gt; module to use more &lt;code&gt;MONOTONIC_CLOCK&lt;/code&gt;, implement &lt;code&gt;time.sleep&lt;/code&gt; differently, and clean up the many duplicate &lt;code&gt;time&lt;/code&gt; like interfaces we have across the codebase. We have the &lt;code&gt;time&lt;/code&gt; module, some time routines in the &lt;code&gt;_threading&lt;/code&gt; module&lt;code&gt;and RPython threading support in the&lt;/code&gt;rpython/rlib` code. We should also make sure we are using 64-bit time interfaces.&lt;/li&gt; &lt;li&gt;Decide whether zero-cost exceptions gain us in performance and whether we should implement them even if they do not improve performance.&lt;/li&gt; &lt;li&gt;Update our &lt;a href="http://hpyproject.org/"&gt;hpy&lt;/a&gt; backend to the latest HEAD, which would allow running the &lt;a href="https://github.com/hpyproject/numpy-hpy/tree/graal-team/hpy#readme"&gt;hpy numpy fork&lt;/a&gt;&lt;/li&gt; &lt;li&gt;Reintegrate the pure-python pyrepl libbrary from CPython 3.13.&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;What else did I forget?&lt;/p&gt; &lt;h2 id="why-did-the-benchmark-results-get-worse-on-speedpypyorg"&gt;Why did the benchmark results get worse on speed.pypy.org?&lt;/h2&gt; &lt;p&gt;TL;DR: running a benchmark site is hard. Something changed in the benchmark runner, and suddenly benchmarks got 10-15% slower.&lt;/p&gt; &lt;p&gt;PyPy run an instance of &lt;a href="https://github.com/python/codespeed/tree/speed.pypy.org"&gt;codepseed&lt;/a&gt; with a very old &lt;a href="https://foss.heptapod.net/pypy/benchmarks"&gt;benchmarking suite&lt;/a&gt; that can still run Python2 (remember: that is the language of &lt;a href="https://rpython.readthedocs.io/en/latest/"&gt;RPython&lt;/a&gt; underlying PyPy). The site runs on PSA infrastructure and the benchmarking machine is generously sponsored by &lt;a href="https://baroquesoftware.com/"&gt;Baroque Software&lt;/a&gt;. On Nov 9, there was a sudden jump in benchmarking times. For instance &lt;a href="https://speed.pypy.org/timeline/#/?exe=21,8&amp;amp;base=10+2622&amp;amp;ben=float&amp;amp;revs=50&amp;amp;equid=off&amp;amp;quarts=on&amp;amp;extr=on&amp;amp;env=3"&gt;here&lt;/a&gt; is the result for the &lt;code&gt;float&lt;/code&gt; benchmark. This happened across various benchmark runs: both PyPy2.7 and PyPy3.11alpha, with and without the JIT. After spending some time rerunning various benchmarks, I could only conclude the machine itself had gotten slower, maybe due to some security update in the linux kernel, maybe some change in the hosting platform. This "broke" the front-page comparsison: suddenly "latest" is much slower than the historic benchmarks run previous to the changes in the machine.&lt;/p&gt; &lt;p&gt;That page also recently (as of last week) uses CPython 3.11 as a baseline for comparison, where previously it used CPython3.7. It is common knowledge that the newer CPython versions are faster, and we see this now quite clearly. Diving into individual benchmarks, we can see that ones where PyPy-with-a-jit was comparable to CPython3.7 seem to be the ones that CPython3.11 improved greatly. Looking at a &lt;a href="https://speed.pypy.org/comparison/?exe=22%2BL%2Bpy3.11%2C2%2B2360%2C2%2B3893&amp;amp;ben=1%2C34%2C58%2C63%2C60%2C27%2C61%2C2%2C25%2C57%2C3%2C46%2C4%2C5%2C41%2C42%2C22%2C44%2C6%2C59%2C39%2C7%2C8%2C65%2C45%2C23%2C62%2C66%2C24%2C9%2C10%2C47%2C48%2C49%2C50%2C51%2C11%2C12%2C13%2C40%2C14%2C69%2C15%2C70%2C67%2C68%2C64%2C35%2C36%2C37%2C38%2C16%2C52%2C54%2C55%2C53%2C56%2C28%2C30%2C32%2C29%2C33%2C17%2C18%2C19%2C20%2C43&amp;amp;env=3&amp;amp;hor=true&amp;amp;bas=2%2B2360&amp;amp;chart=normal+bars"&gt;comparison of the runs&lt;/a&gt; this can be seen in benchmarks like deltablue and the sqlalchemy family. So the new graph has more lines that extend past 1.5 than the old graph.&lt;/p&gt; &lt;p style="text-align: center;"&gt; New graph with PyPy3.11 &lt;/p&gt; &lt;p&gt;&lt;img alt="new graph" src="https://www.pypy.org/images/2024-12-new-graph.png"&gt;&lt;/p&gt; &lt;p style="text-align: center;"&gt; Older graph with PyPy3.10 &lt;/p&gt; &lt;p&gt;&lt;img alt="old graph" src="https://www.pypy.org/images/2024-12-old-graph.png"&gt;.&lt;/p&gt; &lt;div class="footnote"&gt; &lt;hr&gt; &lt;ol&gt; &lt;li id="fn:0"&gt; &lt;p&gt;These days most of the work is done by &lt;a href="https://cfbolz.de/"&gt;CF Bolz-Tereick&lt;/a&gt; and &lt;a href="https://github.com/mattip"&gt;me&lt;/a&gt; on a volunteer basis. Want to get involved? Reach out, we would love to expand the team. Have an idea for funding the work? Fantastic, let's talk. &lt;a class="footnote-backref" href="https://www.pypy.org/posts/2025/01/towards-pypy311-an-update.html#fnref:0" title="Jump back to footnote 1 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt; &lt;/li&gt; &lt;/ol&gt; &lt;/div&gt;</description><category>release</category><guid>https://www.pypy.org/posts/2025/01/towards-pypy311-an-update.html</guid><pubDate>Sat, 04 Jan 2025 13:29:11 GMT</pubDate></item><item><title>Guest Post: Final Encoding in RPython Interpreters</title><link>https://www.pypy.org/posts/2024/11/guest-post-final-encoding-in-rpython.html</link><dc:creator>Corbin</dc:creator><description>&lt;h3 id="introduction"&gt;Introduction&lt;/h3&gt; &lt;p&gt;This post started as a quick note summarizing a recent experiment I carried out upon a small RPython interpreter by rewriting it in an uncommon style. It is written for folks who have already written some RPython and want to take a deeper look at interpreter architecture.&lt;/p&gt; &lt;p&gt;Some experiments are about finding solutions to problems. This experiment is about taking a solution which is already well-understood and applying it in the context of RPython to find a new approach. As we will see, there is no real change in functionality or the number of clauses in the interpreter; it's more like a comparison between endo- and exoskeletons, a different arrangement of equivalent bones and plates.&lt;/p&gt; &lt;h3 id="overview"&gt;Overview&lt;/h3&gt; &lt;p&gt;An RPython interpreter for a programming language generally does three or four things, in order:&lt;/p&gt; &lt;ol&gt; &lt;li&gt;Read and parse input programs&lt;/li&gt; &lt;li&gt;Encode concrete syntax as abstract syntax&lt;/li&gt; &lt;li&gt;&lt;em&gt;Optionally&lt;/em&gt;, optimize or reduce the abstract syntax&lt;/li&gt; &lt;li&gt;Evaluate the abstract syntax: read input data, compute, print output data, etc.&lt;/li&gt; &lt;/ol&gt; &lt;p&gt;Today we'll look at abstract syntax. Most programming languages admit a &lt;a href="https://en.wikipedia.org/wiki/Parse_tree"&gt;concrete parse tree&lt;/a&gt; which is readily abstracted to provide an &lt;a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree"&gt;abstract syntax tree&lt;/a&gt; (AST). The AST is usually encoded with the &lt;em&gt;initial&lt;/em&gt; style of encoding. An initial encoding can be transformed into any other encoding for the same AST, looks like a hierarchy of classes, and is implemented as a static structure on the heap.&lt;/p&gt; &lt;p&gt;In contrast, there is also a &lt;em&gt;final&lt;/em&gt; encoding. A final encoding can be transformed into by any other encoding, looks like an interface for the actions of the interpreter, and is implemented as an unwinding structure on the stack. From the RPython perspective, Python builtin modules like &lt;code&gt;os&lt;/code&gt; or &lt;code&gt;sys&lt;/code&gt; are final encodings for features of the operating system; the underlying implementation is different when translated or untranslated, but the interface used to access those features does not change.&lt;/p&gt; &lt;p&gt;In RPython, an initial encoding is built from a hierarchy of classes. Each class represents a type of tree nodes, corresponding to a parser production in the concrete parse tree. Each class instance therefore represents an individual tree node. The fields of a class, particularly those filled during &lt;code&gt;.__init__()&lt;/code&gt;, store pre-computed properties of each node; methods can be used to compute node properties on demand. This seems like an obvious and simple approach; what other approaches could there be? We need an example.&lt;/p&gt; &lt;h3 id="final-encoding-of-brainfuck"&gt;Final Encoding of Brainfuck&lt;/h3&gt; &lt;p&gt;We will consider &lt;a href="https://esolangs.org/wiki/Brainfuck"&gt;Brainfuck&lt;/a&gt;, a simple Turing-complete programming language. An example Brainfuck program might be:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;-&lt;/span&gt;&lt;span class="k"&gt;]&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;This program is built from a loop and a decrement, and sets a cell to zero. In an initial encoding which follows the &lt;a href="https://esolangs.org/wiki/Algebraic_Brainfuck"&gt;algebraic semantics of Brainfuck&lt;/a&gt;, the program could be expressed by applying class constructors to build a structure on the heap:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;Loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;A final encoding is similar, except that class constructors are replaced by methods, the structure is built on the stack, and we are parameterized over the choice of class:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="bp"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;cls&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;cls&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;In ordinary Python, transforming between these would be trivial, and mostly is a matter of passing around the appropriate class. Indeed, initial and final encodings are equivalent; we'll return to that fact later. However, in RPython, all of the types must line up, and classes must be determined before translation. We'll need to monomorphize our final encodings, using some RPython tricks later on. Before that, let's see what an actual Brainfuck interface looks like, so that we can cover all of the difficulties with final encoding.&lt;/p&gt; &lt;p&gt;Before we embark, please keep in mind that local code doesn't know what &lt;code&gt;cls&lt;/code&gt; is. There's no type-safe way to inspect an arbitrary semantic domain. In the initial-encoded version, we can ask &lt;code&gt;isinstance(bf, Loop)&lt;/code&gt; to see whether an AST node is a loop, but there simply isn't an equivalent for final-encoded ASTs. So, there is an implicit challenge to think about: how do we evaluate a program in an arbitrary semantic domain? For bonus points, how do we optimize a program without inspecting the types of its AST nodes?&lt;/p&gt; &lt;p&gt;What follows is a dissection of &lt;a href="https://github.com/rpypkgs/rpypkgs/blob/d439d225b79ac82e009a5f1cd1c670f00356464c/bf/bf.py"&gt;this&lt;/a&gt; module at the given revision. Readers may find it satisfying to read the entire interpreter top to bottom first; it is less than 300 lines.&lt;/p&gt; &lt;h4 id="core-functionality"&gt;Core Functionality&lt;/h4&gt; &lt;p&gt;Final encoding is given as methods on an interface. These five methods correspond precisely to the summands of the algebra of Brainfuck.&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;BF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;# Other methods elided&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;pass&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;right&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;pass&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;pass&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;pass&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;pass&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;Note that the &lt;code&gt;.loop()&lt;/code&gt; method takes another program as an argument. Initial-encoded ASTs have other initial-encoded ASTs as fields on class instances; final-encoded ASTs have other final-encoded ASTs as parameters to interface methods. RPython infers all of the types, so the reader has to know that &lt;code&gt;i&lt;/code&gt; is usually an integer while &lt;code&gt;bfs&lt;/code&gt; is a sequence of Brainfuck operations.&lt;/p&gt; &lt;p&gt;We're using a class to implement this functionality. Later, we'll treat it as a mixin, rather than a superclass, to avoid typing problems.&lt;/p&gt; &lt;h4 id="monoid"&gt;Monoid&lt;/h4&gt; &lt;p&gt;In order to optimize input programs, we'll need to represent the underlying &lt;a href="https://en.wikipedia.org/wiki/Monoid"&gt;monoid&lt;/a&gt; of Brainfuck programs. To do this, we add the signature for a monoid:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;BF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;# Other methods elided&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;unit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;pass&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;pass&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;This is technically a &lt;a href="https://en.wikipedia.org/wiki/Magma_(algebra)"&gt;unital magma&lt;/a&gt;, since RPython doesn't support algebraic laws, but we will enforce the algebraic laws later on during optimization. We also want to make use of the folklore that &lt;a href="https://en.wikipedia.org/wiki/Free_monoid"&gt;free monoids&lt;/a&gt; are lists, allowing callers to pass a list of actions which we'll reduce with recursion:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;BF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;# Other methods elided&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;joinList&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;joinList&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;joinList&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:]))&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;&lt;code&gt;.joinList()&lt;/code&gt; is a little bulky to implement, but Wirth's principle applies: the interpreter is shorter with it than without it.&lt;/p&gt; &lt;h4 id="idioms"&gt;Idioms&lt;/h4&gt; &lt;p&gt;Finally, our interface includes a few high-level idioms, like the zero program shown earlier, which are defined in terms of low-level behaviors. In an initial encoding, these could be defined as module-level functions; here, we define them on the mixin class &lt;code&gt;BF&lt;/code&gt;.&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;BF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;# Other methods elided&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;zero&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scalemove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;move2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scalemove2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;scalemove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;joinList&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)]))&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;scalemove2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;joinList&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;)]))&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;h3 id="interface-oriented-architecture"&gt;Interface-oriented Architecture&lt;/h3&gt; &lt;h4 id="applying-interfaces"&gt;Applying Interfaces&lt;/h4&gt; &lt;p&gt;Now, we hack at RPython's object model until everything translates. First, consider the task of pretty-printing. For Brainfuck, we'll simply regurgitate the input program as a Python string:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;AsStr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;import_from_mixin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;unit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s1"&gt;'+'&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="s1"&gt;'-'&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;right&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s1"&gt;'&amp;gt;'&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;'&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s1"&gt;'['&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s1"&gt;']'&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s1"&gt;','&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s1"&gt;'.'&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;Via &lt;code&gt;rlib.objectmodel.import_from_mixin&lt;/code&gt;, no stressing with covariance of return types is required. Instead, we shift from a Java-esque view of classes and objects, to an OCaml-ish view of prebuilt classes and constructors. &lt;code&gt;AsStr&lt;/code&gt; is monomorphic, and any caller of it will have to create their own covariance somehow. For example, here are the first few lines of the parsing function:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nd"&gt;@specialize&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argtype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt; &lt;span class="c1"&gt;# Parser elided to preserve the reader's attention&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;By invoking &lt;code&gt;rlib.objectmodel.specialize.argtype&lt;/code&gt;, we make copies of the parsing function, up to one per call site, based on our choice of semantic domain. &lt;a href="https://okmij.org/ftp/tagless-final/"&gt;Oleg&lt;/a&gt; calls these "symantics" but I prefer "domain" in code. Also, note how the parsing stack starts with the unit of the monoid, which corresponds to the empty input string; the parser will repeatedly use the monoidal join to build up a parsed expression without inspecting it. Here's a small taste of that:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;char&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;char&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;'+'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;char&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;'-'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;# and so on&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;The reader may feel justifiably mystified; what breaks if we don't add these magic annotations? Well, the translator will throw &lt;code&gt;UnionError&lt;/code&gt; because the low-level types don't match. RPython only wants to make one copy of functions like &lt;code&gt;parse()&lt;/code&gt; in its low-level representation, and each copy of &lt;code&gt;parse()&lt;/code&gt; will be compiled to monomorphic machine code. In this interpreter, in order to support parsing to an optimized string and also parsing to an evaluator, we need two copies of &lt;code&gt;parse()&lt;/code&gt;. &lt;strong&gt;It is okay to not fully understand this at first.&lt;/strong&gt;&lt;/p&gt; &lt;h4 id="composing-interfaces"&gt;Composing Interfaces&lt;/h4&gt; &lt;p&gt;Earlier, we noted that an interpreter can optionally optimize input programs after parsing. To support this, we'll precompose a &lt;a href="https://en.wikipedia.org/wiki/Peephole_optimization"&gt;peephole optimizer&lt;/a&gt; onto an arbitrary domain. We could also postcompose with a parser instead, but that sounds more difficult. Here are the relevant parts:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;makePeephole&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;stripDomain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;joinList&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Peephole&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;import_from_mixin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;unit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="c1"&gt;# Actual definition elided... for now...&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Peephole&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stripDomain&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;Don't worry about the actual optimization yet. What's important here is the pattern of initialization of semantic domains. &lt;code&gt;makePeephole&lt;/code&gt; is an &lt;a href="https://en.wikipedia.org/wiki/Standard_ML"&gt;SML&lt;/a&gt;-style functor on semantic domains: given a final encoding of Brainfuck, it produces another final encoding of Brainfuck which incorporates optimizations. The helper &lt;code&gt;stripDomain&lt;/code&gt; is a finalizer which performs the extraction from the optimizer's domain to the underlying &lt;code&gt;cls&lt;/code&gt; that was passed in at translation time. For example, let's optimize pretty-printing:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;AsStr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;finishStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;makePeephole&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AsStr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;Now, it only takes one line to parse and print an optimized AST without ever building it on the heap. To be pedantic, fragments of the output string will be heap-allocated, but the AST's node structure will only ever be stack-allocated. Further, to be shallow, the parser is written to prevent malicious input from causing a stack overflow, and this forces it to maintain a heap-allocated RPython list of intermediate operations inside loops.&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="n"&gt;finishStr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AsStr&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;h3 id="performance"&gt;Performance&lt;/h3&gt; &lt;p&gt;But is it fast? Yes. It's faster than the prior version, which was initial-encoded, and also faster than Andrew Brown's classic version (&lt;a href="https://pypy.org/posts/2011/04/tutorial-writing-interpreter-with-pypy-3785910476193156295.html"&gt;part 1&lt;/a&gt;, &lt;a href="https://pypy.org/posts/2011/04/tutorial-part-2-adding-jit-8121732841568309472.html"&gt;part 2&lt;/a&gt;). Since Brown's interpreter does not perform much optimization, we will focus on how final encoding can outperform initial encoding.&lt;/p&gt; &lt;h4 id="jit"&gt;JIT&lt;/h4&gt; &lt;p&gt;First, why is it faster than the same interpreter with initial encoding? Well, it still has initial encoding from the JIT's perspective! There is an &lt;code&gt;Op&lt;/code&gt; class with a hierarchy of subclasses implementing individual behaviors. A sincere tagless-final student, or those who remember &lt;a href="https://pyvideo.org/pycon-us-2012/stop-writing-classes.html"&gt;Stop Writing Classes (2012, Pycon US)&lt;/a&gt;, will recognize that the following classes could be plain functions, and should think of the classes as a concession to RPython's lack of support for lambdas with closures rather than an initial encoding. We aren't ever going to directly typecheck any &lt;code&gt;Op&lt;/code&gt;, but the JIT will generate typechecking guards anyway, so we effectively get a fully-promoted AST inlined into each JIT trace. First, some simple behaviors:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Op&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;_immutable_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;_Input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Op&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;_immutable_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;runOn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt; &lt;span class="n"&gt;Input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_Input&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;_Output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Op&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;_immutable_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;runOn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt; &lt;span class="n"&gt;Output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_Output&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Op&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;_immutable_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt; &lt;span class="n"&gt;_immutable_fields_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"imm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imm&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;imm&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;runOn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;imm&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;The JIT does technically have less information than before; it no longer knows that a sequence of immutable operations is immutable enough to be worth unrolling, but a bit of &lt;code&gt;rlib.jit.unroll_safe&lt;/code&gt; fixes that:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Seq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Op&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;_immutable_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt; &lt;span class="n"&gt;_immutable_fields_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ops[*]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ops&lt;/span&gt; &lt;span class="nd"&gt;@unroll_safe&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;runOn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runOn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;Finally, the JIT entry point is at the head of each loop, just like with prior interpreters. Since Brainfuck doesn't support mid-loop jumps, there's no penalty for only allowing merge points at the head of the loop.&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;Loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Op&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;_immutable_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt; &lt;span class="n"&gt;_immutable_fields_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"op"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;runOn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="n"&gt;jitdriver&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jit_merge_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runOn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;That's the end of the implicit challenge. There's no secret to it; just evaluate the AST. Here's part of the semantic domain for evaluation, as well as the "functor" to optimize it. In &lt;code&gt;AsOps.join()&lt;/code&gt; are the &lt;em&gt;only&lt;/em&gt; &lt;code&gt;isinstance()&lt;/code&gt; calls in the entire interpreter! This is acceptable because &lt;code&gt;Seq&lt;/code&gt; is effectively a type wrapper for an RPython list, so that a list of operations is also an operation; its list is initial-encoded and available for inspection.&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;AsOps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;import_from_mixin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;unit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Shift&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Seq&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Seq&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Seq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Seq&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Seq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Seq&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Seq&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Seq&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="c1"&gt;# Other methods elided!&lt;/span&gt; &lt;span class="n"&gt;AsOps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;finishOps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;makePeephole&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AsOps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;And finally here is the actual top-level code to evaluate the input program. As before, once everything is composed, the actual invocation only takes one line.&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;tape&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;bytearray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\x00&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;cells&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;finishOps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AsOps&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runOn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;h4 id="peephole-optimization"&gt;Peephole Optimization&lt;/h4&gt; &lt;p&gt;Our peephole optimizer is an &lt;a href="https://en.wikipedia.org/wiki/Abstract_interpretation"&gt;abstract interpreter&lt;/a&gt; with one instruction of lookahead/rewrite buffer. It implements the aforementioned algebraic laws of the Brainfuck monoid. It also implements idiom recognition for loops. First, the abstract interpreter. The abstract domain has six elements:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;AbstractDomain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;pass&lt;/span&gt; &lt;span class="n"&gt;meh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;aLoop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;aZero&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;theIdentity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;anAdd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;aRight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;AbstractDomain&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;We'll also tag everything with an integer, so that &lt;code&gt;anAdd&lt;/code&gt; or &lt;code&gt;aRight&lt;/code&gt; can be exact annotations. &lt;em&gt;This&lt;/em&gt; is the actual &lt;code&gt;Peephole.join()&lt;/code&gt; method:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="n"&gt;rv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;[:]&lt;/span&gt; &lt;span class="n"&gt;bfHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;adHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;immHead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ad&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imm&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ad&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;theIdentity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;adHead&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;aLoop&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;ad&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;aLoop&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;adHead&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;theIdentity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bfHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;adHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;immHead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ad&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imm&lt;/span&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;adHead&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;anAdd&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;ad&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;aZero&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bfHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;adHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;immHead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ad&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imm&lt;/span&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;adHead&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;anAdd&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;ad&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;anAdd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;immHead&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;imm&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;immHead&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bfHead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;immHead&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;rv&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bfHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;adHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;immHead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bfHead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;adHead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;theIdentity&lt;/span&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;adHead&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;aRight&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;ad&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;aRight&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;immHead&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;imm&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;immHead&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bfHead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;immHead&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;rv&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bfHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;adHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;immHead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bfHead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;adHead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;theIdentity&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;bfHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;adHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;immHead&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;bfHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;adHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;immHead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ad&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imm&lt;/span&gt; &lt;span class="n"&gt;rv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;bfHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;adHead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;immHead&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rv&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;If this were to get much longer, then &lt;a href="https://pypy.org/posts/2024/10/jit-peephole-dsl.html"&gt;implementing a DSL&lt;/a&gt; would be worth it, but this is a short-enough method to inline. The abstract interpretation is assumed by induction for the left-hand side of the join, save for the final instruction, which is loaded into a rewrite register. Each instruction on the right-hand side is inspected exactly once. The logic for &lt;code&gt;anAdd&lt;/code&gt; followed by &lt;code&gt;anAdd&lt;/code&gt; is exactly the same as for &lt;code&gt;aRight&lt;/code&gt; followed by &lt;code&gt;aRight&lt;/code&gt; because they both have underlying &lt;a href="https://en.wikipedia.org/wiki/Abelian_group"&gt;Abelian groups&lt;/a&gt; given by the integers. The rewrite register is carefully pushed onto and popped off from the left-hand side in order to cancel out &lt;code&gt;theIdentity&lt;/code&gt;, which itself is merely a unifier for &lt;code&gt;anAdd&lt;/code&gt; or &lt;code&gt;aRight&lt;/code&gt; of 0.&lt;/p&gt; &lt;p&gt;Note that we generate a lot of garbage. For example, parsing a string of &lt;em&gt;n&lt;/em&gt; '+' characters will cause the peephole optimizer to allocate &lt;em&gt;n&lt;/em&gt; instances of the underlying &lt;code&gt;domain.plus()&lt;/code&gt; action, from &lt;code&gt;domain.plus(1)&lt;/code&gt; up to &lt;code&gt;domain.plus(n)&lt;/code&gt;. An older initial-encoded version of this interpreter used &lt;a href="https://en.wikipedia.org/wiki/Hash_consing"&gt;hash consing&lt;/a&gt; to avoid ever building an op more than once, even loops. It appears more efficient to generate lots of immutable garbage than to repeatedly hash inputs and search mutable hash tables, at least for optimizing Brainfuck incrementally during parsing.&lt;/p&gt; &lt;p&gt;Finally, let's look at idiom recognition. RPython lists are initial-coded, so we can dispatch based on the length of the list, and then inspect the abstract domains of each action.&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;isConstAdd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;anAdd&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;oppositeShifts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bf1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bf2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;bf1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;bf2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;aRight&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;bf1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bf2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;oppositeShifts2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bf1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bf2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bf3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bf1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;bf2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;bf3&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;aRight&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;bf1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;bf2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;bf3&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ad&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ad&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;anAdd&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;imm&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zero&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;aZero&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;isConstAdd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;anAdd&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;oppositeShifts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;])):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scalemove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="n"&gt;aLoop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;isConstAdd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;anAdd&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;oppositeShifts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scalemove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="n"&gt;aLoop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;isConstAdd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;anAdd&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;oppositeShifts2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;])):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scalemove2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="n"&gt;aLoop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;isConstAdd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;anAdd&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;oppositeShifts2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;])):&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scalemove2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="n"&gt;aLoop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stripDomain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bfs&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;aLoop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;This ends the bonus question. How do we optimize an unknown semantic domain? We must maintain an abstract context which describes elements of the domain. In initial encoding, we ask an AST about itself. In final encoding, we already know everything relevant about the AST.&lt;/p&gt; &lt;p&gt;The careful reader will see that I didn't really answer that opening question in the JIT section. Because the JIT still ranges over the same operations as before, it can't really be slower; but why is it now faster? Because the optimizer is now slightly better in a few edge cases. It performs the same optimizations as before, but the rigor of abstract interpretation causes it to emit slightly better operations to the JIT backend.&lt;/p&gt; &lt;p&gt;Concretely, improving the optimizer can shorten pretty-printed programs. The &lt;a href="https://bbgauge.info/"&gt;Busy Beaver Gauge&lt;/a&gt; measures the length of programs which search for solutions to mathematical problems. After implementing and debugging the final-encoded interpreter, I found that two of my entries on the &lt;a href="https://bbgauge.info/brainfuck.html"&gt;Busy Beaver Gauge for Brainfuck&lt;/a&gt; had become shorter by about 2%. (Most other entries are already hand-optimized according to the standard algebra and have no optimization opportunities.)&lt;/p&gt; &lt;h3 id="discussion"&gt;Discussion&lt;/h3&gt; &lt;p&gt;Given that initial and final encodings are equivalent, and noting that RPython's toolchain is written to prefer initial encodings, what did we actually gain? Did we gain anything?&lt;/p&gt; &lt;p&gt;One obvious downside to final encoding in RPython is interpreter size. The example interpreter shown here is a rewrite of an initial-encoded interpreter which can be seen &lt;a href="https://github.com/rpypkgs/rpypkgs/blob/659c8a26d428a1e04fdff614b28e464a50d4647b/bf/bf.py"&gt;here&lt;/a&gt; for comparison. Final encoding adds about 20% more code in this case.&lt;/p&gt; &lt;p&gt;Final encoding is not necessarily more code than initial encoding, though. All AST encodings in interpreters are subject to the &lt;a href="https://en.wikipedia.org/wiki/Expression_problem"&gt;Expression Problem&lt;/a&gt;, which states that there is generally a quadratic amount of code required to implement multiple behaviors for an AST with multiple types of nodes; specifically, &lt;em&gt;n&lt;/em&gt; behaviors for &lt;em&gt;m&lt;/em&gt; types of nodes require &lt;em&gt;n&lt;/em&gt; × &lt;em&gt;m&lt;/em&gt; methods. Initial encodings improve the cost of adding new types of nodes; final encodings improve the cost of adding new behaviors. Final encoding may tend to win in large codebases for mature languages, where the language does not change often but new behaviors are added frequently and maintained for long periods.&lt;/p&gt; &lt;p&gt;Optimizations in final encoding require a bit of planning. The abstract-interpretation approach is solid but relies upon the monoid and its algebraic laws. In the worst case, an entire class hierarchy could be required to encode the abstraction.&lt;/p&gt; &lt;p&gt;It is remarkable to find &lt;strong&gt;a 2% improvement in residual program size&lt;/strong&gt; merely by reimplementing an optimizer as an abstract interpreter respecting the algebraic laws. This could be the most important lesson for compiler engineers, if it happens to generalize.&lt;/p&gt; &lt;p&gt;Final encoding was popularized via the tagless-final movement in OCaml and Scala, including famously in a series of tutorials by &lt;a href="https://okmij.org/ftp/tagless-final/"&gt;Kiselyov et al&lt;/a&gt;. A "tag", in this jargon, is a runtime identifier for an object's type or class; a tagless encoding effectively doesn't allow &lt;code&gt;isinstance()&lt;/code&gt; at all. In the above presentation, tags could be hacked in, but were not materially relevant to most steps. Tags were required for the final evaluation step, though, and the tagless-final insight is that certain type systems can express type-safe evaluation without those tags. We won't go further in this direction because tags also communicate valuable information to the JIT.&lt;/p&gt; &lt;h4 id="summarizing-table"&gt;Summarizing Table&lt;/h4&gt; &lt;table&gt; &lt;thead&gt; &lt;tr&gt; &lt;th&gt;Initial Encoding&lt;/th&gt; &lt;th&gt;Final Encoding&lt;/th&gt; &lt;/tr&gt; &lt;/thead&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td&gt;hierarchy of classes&lt;/td&gt; &lt;td&gt;signature of interfaces&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt; &lt;td&gt;class constructors&lt;/td&gt; &lt;td&gt;method calls&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt; &lt;td&gt;built on the heap&lt;/td&gt; &lt;td&gt;built on the stack&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt; &lt;td&gt;traversals allocate stack&lt;/td&gt; &lt;td&gt;traversals allocate heap&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt; &lt;td&gt;tags are available with &lt;code&gt;isinstance()&lt;/code&gt;&lt;/td&gt; &lt;td&gt;tags are only available through hacks&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt; &lt;td&gt;cost of adding a new AST node: one class&lt;/td&gt; &lt;td&gt;cost of adding a new AST node: one method on every other class&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt; &lt;td&gt;cost of adding a new behavior: one method on every other class&lt;/td&gt; &lt;td&gt;cost of adding a new behavior: one class&lt;/td&gt; &lt;/tr&gt; &lt;/tbody&gt; &lt;/table&gt; &lt;h3 id="credits"&gt;Credits&lt;/h3&gt; &lt;p&gt;Thanks to folks in &lt;code&gt;#pypy&lt;/code&gt; on Libera Chat: arigato for the idea, larstiq for pushing me to write it up, and cfbolz and mattip for reviewing and finding mistakes. The original IRC discussion leading to this blog post is available &lt;a href="https://gist.github.com/MostAwesomeDude/fd86ad2d2e38af7aa67b6e548aabe008"&gt;here&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;This interpreter is part of the &lt;a href="https://github.com/rpypkgs/rpypkgs"&gt;rpypkgs&lt;/a&gt; suite, a Nix flake for RPython interpreters. Readers with Nix installed can run this interpreter directly from the flake:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;$&lt;span class="w"&gt; &lt;/span&gt;nix-prefetch-url&lt;span class="w"&gt; &lt;/span&gt;https://github.com/MG-K/pypy-tutorial-ko/raw/refs/heads/master/mandel.b $&lt;span class="w"&gt; &lt;/span&gt;nix&lt;span class="w"&gt; &lt;/span&gt;run&lt;span class="w"&gt; &lt;/span&gt;github:rpypkgs/rpypkgs#bf&lt;span class="w"&gt; &lt;/span&gt;--&lt;span class="w"&gt; &lt;/span&gt;/nix/store/ngnphbap9ncvz41d0fkvdh61n7j2bg21-mandel.b &lt;/pre&gt;&lt;/div&gt;</description><category>guestpost</category><guid>https://www.pypy.org/posts/2024/11/guest-post-final-encoding-in-rpython.html</guid><pubDate>Thu, 14 Nov 2024 08:42:36 GMT</pubDate></item><item><title>A DSL for Peephole Transformation Rules of Integer Operations in the PyPy JIT</title><link>https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html</link><dc:creator>CF Bolz-Tereick</dc:creator><description>&lt;p&gt;As is probably apparent from the sequence of blog posts about the topic in the last year, I have been thinking about and working on integer optimizations in the JIT compiler a lot. This work was mainly motivated by &lt;a class="reference external" href="https://docs.pydrofoil.org/en/latest/"&gt;Pydrofoil&lt;/a&gt;, where integer operations matter a lot more than for your typical Python program.&lt;/p&gt; &lt;p&gt;In this post I'll describe my most recent change, which is a new small domain specific language that I implemented to specify peephole optimizations on integer operations in the JIT. It uses pattern matching to specify how (sequences of) integer operations should be simplified and optimized. The rules are then compiled to RPython code that then becomes part of the JIT's optimization passes.&lt;/p&gt; &lt;p&gt;To make it less likely to introduce incorrect optimizations into the JIT, the rules are automatically proven correct with Z3 as part of the build process (for a more hands-on intro to how that works you can look at the &lt;a class="reference external" href="https://pypy.org/posts/2024/08/toy-knownbits.html#proving-correctness-of-the-transfer-functions-with-z3"&gt;knownbits&lt;/a&gt; post). In this blog post I want to motivate why I introduced the DSL and give an introduction to how it works.&lt;/p&gt; &lt;section id="motivation"&gt; &lt;h2&gt;Motivation&lt;/h2&gt; &lt;p&gt;This summer, after I wrote my &lt;a class="reference external" href="https://www.pypy.org/posts/2024/07/mining-jit-traces-missing-optimizations-z3.html"&gt;scripts to mine JIT traces for missed optimization&lt;/a&gt; opportunities, I started implementing a few of the integer peephole rewrite that the script identified. Unfortunately, doing so led to the problem that the way we express these rewrites up to now is very imperative and verbose. Here's a snippet of RPython code that shows some rewrites for integer multiplication (look at the comments to see what the different parts actually do). You don't need to understand the code in detail, but basically it's in very imperative style and there's quite a lot of boilerplate.&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_eee775dbffd04b1385c373f787204896-1" name="rest_code_eee775dbffd04b1385c373f787204896-1" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-1"&gt;&lt;/a&gt;&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;optimize_INT_MUL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-2" name="rest_code_eee775dbffd04b1385c373f787204896-2" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-2"&gt;&lt;/a&gt; &lt;span class="n"&gt;arg0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_box_replacement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-3" name="rest_code_eee775dbffd04b1385c373f787204896-3" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-3"&gt;&lt;/a&gt; &lt;span class="n"&gt;b0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getintbound&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-4" name="rest_code_eee775dbffd04b1385c373f787204896-4" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-4"&gt;&lt;/a&gt; &lt;span class="n"&gt;arg1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_box_replacement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-5" name="rest_code_eee775dbffd04b1385c373f787204896-5" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-5"&gt;&lt;/a&gt; &lt;span class="n"&gt;b1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getintbound&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-6" name="rest_code_eee775dbffd04b1385c373f787204896-6" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-6"&gt;&lt;/a&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-7" name="rest_code_eee775dbffd04b1385c373f787204896-7" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-7"&gt;&lt;/a&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;b0&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;known_eq_const&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-8" name="rest_code_eee775dbffd04b1385c373f787204896-8" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-8"&gt;&lt;/a&gt; &lt;span class="c1"&gt;# 1 * x == x&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-9" name="rest_code_eee775dbffd04b1385c373f787204896-9" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-9"&gt;&lt;/a&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;make_equal_to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-10" name="rest_code_eee775dbffd04b1385c373f787204896-10" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-10"&gt;&lt;/a&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;b1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;known_eq_const&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-11" name="rest_code_eee775dbffd04b1385c373f787204896-11" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-11"&gt;&lt;/a&gt; &lt;span class="c1"&gt;# x * 1 == x&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-12" name="rest_code_eee775dbffd04b1385c373f787204896-12" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-12"&gt;&lt;/a&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;make_equal_to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-13" name="rest_code_eee775dbffd04b1385c373f787204896-13" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-13"&gt;&lt;/a&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;b0&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;known_eq_const&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;b1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;known_eq_const&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-14" name="rest_code_eee775dbffd04b1385c373f787204896-14" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-14"&gt;&lt;/a&gt; &lt;span class="c1"&gt;# 0 * x == x * 0 == 0&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-15" name="rest_code_eee775dbffd04b1385c373f787204896-15" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-15"&gt;&lt;/a&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;make_constant_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-16" name="rest_code_eee775dbffd04b1385c373f787204896-16" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-16"&gt;&lt;/a&gt; &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-17" name="rest_code_eee775dbffd04b1385c373f787204896-17" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-17"&gt;&lt;/a&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;lhs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rhs&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg0&lt;/span&gt;&lt;span class="p"&gt;)]:&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-18" name="rest_code_eee775dbffd04b1385c373f787204896-18" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-18"&gt;&lt;/a&gt; &lt;span class="n"&gt;lh_info&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getintbound&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lhs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-19" name="rest_code_eee775dbffd04b1385c373f787204896-19" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-19"&gt;&lt;/a&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;lh_info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_constant&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-20" name="rest_code_eee775dbffd04b1385c373f787204896-20" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-20"&gt;&lt;/a&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lh_info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_constant_int&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-21" name="rest_code_eee775dbffd04b1385c373f787204896-21" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-21"&gt;&lt;/a&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-22" name="rest_code_eee775dbffd04b1385c373f787204896-22" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-22"&gt;&lt;/a&gt; &lt;span class="c1"&gt;# x * (2 ** c) == x &amp;lt;&amp;lt; c&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-23" name="rest_code_eee775dbffd04b1385c373f787204896-23" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-23"&gt;&lt;/a&gt; &lt;span class="n"&gt;new_rhs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ConstInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;highest_bit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lh_info&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_constant_int&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-24" name="rest_code_eee775dbffd04b1385c373f787204896-24" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-24"&gt;&lt;/a&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace_op_with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rop&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INT_LSHIFT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rhs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_rhs&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-25" name="rest_code_eee775dbffd04b1385c373f787204896-25" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-25"&gt;&lt;/a&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;send_extra_operation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-26" name="rest_code_eee775dbffd04b1385c373f787204896-26" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-26"&gt;&lt;/a&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-27" name="rest_code_eee775dbffd04b1385c373f787204896-27" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-27"&gt;&lt;/a&gt; &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-28" name="rest_code_eee775dbffd04b1385c373f787204896-28" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-28"&gt;&lt;/a&gt; &lt;span class="c1"&gt;# x * -1 == -x&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-29" name="rest_code_eee775dbffd04b1385c373f787204896-29" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-29"&gt;&lt;/a&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace_op_with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rop&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INT_NEG&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rhs&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-30" name="rest_code_eee775dbffd04b1385c373f787204896-30" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-30"&gt;&lt;/a&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;send_extra_operation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-31" name="rest_code_eee775dbffd04b1385c373f787204896-31" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-31"&gt;&lt;/a&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-32" name="rest_code_eee775dbffd04b1385c373f787204896-32" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-32"&gt;&lt;/a&gt; &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-33" name="rest_code_eee775dbffd04b1385c373f787204896-33" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-33"&gt;&lt;/a&gt; &lt;span class="c1"&gt;# x * (1 &amp;lt;&amp;lt; y) == x &amp;lt;&amp;lt; y&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-34" name="rest_code_eee775dbffd04b1385c373f787204896-34" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-34"&gt;&lt;/a&gt; &lt;span class="n"&gt;shiftop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;as_operation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;get_box_replacement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lhs&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;rop&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INT_LSHIFT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-35" name="rest_code_eee775dbffd04b1385c373f787204896-35" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-35"&gt;&lt;/a&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;shiftop&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-36" name="rest_code_eee775dbffd04b1385c373f787204896-36" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-36"&gt;&lt;/a&gt; &lt;span class="k"&gt;continue&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-37" name="rest_code_eee775dbffd04b1385c373f787204896-37" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-37"&gt;&lt;/a&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;shiftop&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_constant&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;shiftop&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getint&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-38" name="rest_code_eee775dbffd04b1385c373f787204896-38" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-38"&gt;&lt;/a&gt; &lt;span class="k"&gt;continue&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-39" name="rest_code_eee775dbffd04b1385c373f787204896-39" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-39"&gt;&lt;/a&gt; &lt;span class="n"&gt;shiftvar&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_box_replacement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shiftop&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getarg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-40" name="rest_code_eee775dbffd04b1385c373f787204896-40" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-40"&gt;&lt;/a&gt; &lt;span class="n"&gt;shiftbound&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getintbound&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shiftvar&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-41" name="rest_code_eee775dbffd04b1385c373f787204896-41" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-41"&gt;&lt;/a&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;shiftbound&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;known_nonnegative&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;shiftbound&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;known_lt_const&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LONG_BIT&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-42" name="rest_code_eee775dbffd04b1385c373f787204896-42" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-42"&gt;&lt;/a&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace_op_with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-43" name="rest_code_eee775dbffd04b1385c373f787204896-43" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-43"&gt;&lt;/a&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rop&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INT_LSHIFT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rhs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shiftvar&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-44" name="rest_code_eee775dbffd04b1385c373f787204896-44" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-44"&gt;&lt;/a&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;send_extra_operation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-45" name="rest_code_eee775dbffd04b1385c373f787204896-45" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-45"&gt;&lt;/a&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;a id="rest_code_eee775dbffd04b1385c373f787204896-46" name="rest_code_eee775dbffd04b1385c373f787204896-46" href="https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html#rest_code_eee775dbffd04b1385c373f787204896-46"&gt;&lt;/a&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;Adding more rules to these functions is very tedious and gets super confusing when the functions get bigger. In addition I am always worried about making mistakes when writing this kind of code, and there is no feedback at all about which of these rules are actually applied a lot in real programs.&lt;/p&gt; &lt;p&gt;Therefore I decided to write a small domain specific language with the goal of expressing these rules in a more declarative way. In the rest of the post I'll describe the DSL (most of that description is adapted from the &lt;a class="reference external" href="https://rpython.readthedocs.io/en/latest/jit/ruleopt.html"&gt;documentation&lt;/a&gt; about it that I wrote).&lt;/p&gt; &lt;/section&gt; &lt;section id="the-peephole-rule-dsl"&gt; &lt;h2&gt;The Peephole Rule DSL&lt;/h2&gt; &lt;section id="simple-transformation-rules"&gt; &lt;h3&gt;Simple transformation rules&lt;/h3&gt; &lt;p&gt;The rules in the DSL specify how integer operation can be transformed into cheaper other integer operations. A rule always consists of a name, a pattern, and a target. Here's a simple rule:&lt;/p&gt; &lt;pre class="literal-block"&gt;add_zero: int_add(x, 0) =&amp;gt; x&lt;/pre&gt; &lt;p&gt;The name of the rule is &lt;code class="docutils literal"&gt;add_zero&lt;/code&gt;. It matches operations in the trace of the form &lt;code class="docutils literal"&gt;int_add(x, 0)&lt;/code&gt;, where &lt;code class="docutils literal"&gt;x&lt;/code&gt; will match anything and &lt;code class="docutils literal"&gt;0&lt;/code&gt; will match only the constant zero. After the &lt;code class="docutils literal"&gt;=&amp;gt;&lt;/code&gt; arrow is the target of the rewrite, i.e. what the operation is rewritten to, in this case &lt;code class="docutils literal"&gt;x&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;The rule language has a list of which of the operations are commutative, so &lt;code class="docutils literal"&gt;add_zero&lt;/code&gt; will also optimize &lt;code class="docutils literal"&gt;int_add(0, x)&lt;/code&gt; to &lt;code class="docutils literal"&gt;x&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;Variables in the pattern can repeat:&lt;/p&gt; &lt;pre class="literal-block"&gt;sub_x_x: int_sub(x, x) =&amp;gt; 0&lt;/pre&gt; &lt;p&gt;This rule matches against &lt;code class="docutils literal"&gt;int_sub&lt;/code&gt; operations where the two arguments are the same (either the same box, or the same constant).&lt;/p&gt; &lt;p&gt;Here's a rule with a more complicated pattern:&lt;/p&gt; &lt;pre class="literal-block"&gt;sub_add: int_sub(int_add(x, y), y) =&amp;gt; x&lt;/pre&gt; &lt;p&gt;This pattern matches &lt;code class="docutils literal"&gt;int_sub&lt;/code&gt; operations, where the first argument was produced by an &lt;code class="docutils literal"&gt;int_add&lt;/code&gt; operation. In addition, one of the arguments of the addition has to be the same as the second argument of the subtraction.&lt;/p&gt; &lt;p&gt;The constants &lt;code class="docutils literal"&gt;MININT&lt;/code&gt;, &lt;code class="docutils literal"&gt;MAXINT&lt;/code&gt; and &lt;code class="docutils literal"&gt;LONG_BIT&lt;/code&gt; (which is either 32 or 64, depending on which platform the JIT is built for) can be used in rules, they behave like writing numbers but allow bit-width-independent formulations:&lt;/p&gt; &lt;pre class="literal-block"&gt;is_true_and_minint: int_is_true(int_and(x, MININT)) =&amp;gt; int_lt(x, 0)&lt;/pre&gt; &lt;p&gt;It is also possible to have a pattern where some arguments needs to be a constant, without specifying which constant. Those patterns look like this:&lt;/p&gt; &lt;pre class="literal-block"&gt;sub_add_consts: int_sub(int_add(x, C1), C2) # incomplete # more goes here =&amp;gt; int_sub(x, C)&lt;/pre&gt; &lt;p&gt;Variables in the pattern that start with a &lt;code class="docutils literal"&gt;C&lt;/code&gt; match against constants only. However, in this current form the rule is incomplete, because the variable &lt;code class="docutils literal"&gt;C&lt;/code&gt; that is being used in the target operation is not defined anywhere. We will see how to compute it in the next section.&lt;/p&gt; &lt;/section&gt; &lt;section id="computing-constants-and-other-intermediate-results"&gt; &lt;h3&gt;Computing constants and other intermediate results&lt;/h3&gt; &lt;p&gt;Sometimes it is necessary to compute intermediate results that are used in the target operation. To do that, there can be extra assignments between the rule head and the rule target.:&lt;/p&gt; &lt;pre class="literal-block"&gt;sub_add_consts: int_sub(int_add(x, C1), C2) # incomplete C = C1 + C2 =&amp;gt; int_sub(x, C)&lt;/pre&gt; &lt;p&gt;The right hand side of such an assignment is a subset of Python syntax, supporting arithmetic using &lt;code class="docutils literal"&gt;+&lt;/code&gt;, &lt;code class="docutils literal"&gt;-&lt;/code&gt;, &lt;code class="docutils literal"&gt;*&lt;/code&gt;, and certain helper functions. However, the syntax allows you to be explicit about unsignedness for some operations. E.g. &lt;code class="docutils literal"&gt;&amp;gt;&amp;gt;u&lt;/code&gt; exists for unsigned right shifts (and I plan to add &lt;code class="docutils literal"&gt;&amp;gt;u&lt;/code&gt;, &lt;code class="docutils literal"&gt;&amp;gt;=u&lt;/code&gt;, &lt;code class="docutils literal"&gt;&amp;lt;u&lt;/code&gt;, &lt;code class="docutils literal"&gt;&amp;lt;=u&lt;/code&gt; for comparisons).&lt;/p&gt; &lt;p&gt;Here's an example of a rule that uses &lt;code class="docutils literal"&gt;&amp;gt;&amp;gt;u&lt;/code&gt;:&lt;/p&gt; &lt;pre class="literal-block"&gt;urshift_lshift_x_c_c: uint_rshift(int_lshift(x, C), C) mask = (-1 &amp;lt;&amp;lt; C) &amp;gt;&amp;gt;u C =&amp;gt; int_and(x, mask)&lt;/pre&gt; &lt;/section&gt; &lt;section id="checks"&gt; &lt;h3&gt;Checks&lt;/h3&gt; &lt;p&gt;Some rewrites are only true under certain conditions. For example, &lt;code class="docutils literal"&gt;int_eq(x, 1)&lt;/code&gt; can be rewritten to &lt;code class="docutils literal"&gt;x&lt;/code&gt;, if &lt;code class="docutils literal"&gt;x&lt;/code&gt; is known to store a boolean value. This can be expressed with &lt;em&gt;checks&lt;/em&gt;:&lt;/p&gt; &lt;pre class="literal-block"&gt;eq_one: int_eq(x, 1) check x.is_bool() =&amp;gt; x&lt;/pre&gt; &lt;p&gt;A check is followed by a boolean expression. The variables from the pattern can be used as &lt;code class="docutils literal"&gt;IntBound&lt;/code&gt; instances in checks (and also in assignments) to find out what the abstract interpretation of the JIT knows about the value of a trace variable (&lt;code class="docutils literal"&gt;IntBound&lt;/code&gt; is the name of the abstract domain that the JIT uses for integers, despite the fact that it also stores &lt;a class="reference external" href="https://www.pypy.org/posts/2024/08/toy-knownbits.html"&gt;knownbits&lt;/a&gt; information nowadays).&lt;/p&gt; &lt;p&gt;Here's another example:&lt;/p&gt; &lt;pre class="literal-block"&gt;mul_lshift: int_mul(x, int_lshift(1, y)) check y.known_ge_const(0) and y.known_le_const(LONG_BIT) =&amp;gt; int_lshift(x, y)&lt;/pre&gt; &lt;p&gt;It expresses that &lt;code class="docutils literal"&gt;x * (1 &amp;lt;&amp;lt; y)&lt;/code&gt; can be rewritten to &lt;code class="docutils literal"&gt;x &amp;lt;&amp;lt; y&lt;/code&gt; but checks that &lt;code class="docutils literal"&gt;y&lt;/code&gt; is known to be between &lt;code class="docutils literal"&gt;0&lt;/code&gt; and &lt;code class="docutils literal"&gt;LONG_BIT&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;Checks and assignments can be repeated and combined with each other:&lt;/p&gt; &lt;pre class="literal-block"&gt;mul_pow2_const: int_mul(x, C) check C &amp;gt; 0 and C &amp;amp; (C - 1) == 0 shift = highest_bit(C) =&amp;gt; int_lshift(x, shift)&lt;/pre&gt; &lt;p&gt;In addition to calling methods on &lt;code class="docutils literal"&gt;IntBound&lt;/code&gt; instances, it's also possible to access their attributes, like in this rule:&lt;/p&gt; &lt;pre class="literal-block"&gt;and_x_c_in_range: int_and(x, C) check x.lower &amp;gt;= 0 and x.upper &amp;lt;= C &amp;amp; ~(C + 1) =&amp;gt; x&lt;/pre&gt; &lt;/section&gt; &lt;section id="rule-ordering-and-liveness"&gt; &lt;h3&gt;Rule Ordering and Liveness&lt;/h3&gt; &lt;p&gt;The generated optimizer code will give preference to applying rules that produce a constant or a variable as a rewrite result. Only if none of those match do rules that produce new result operations get applied. For example, the rules &lt;code class="docutils literal"&gt;sub_x_x&lt;/code&gt; and &lt;code class="docutils literal"&gt;sub_add&lt;/code&gt; are tried before trying &lt;code class="docutils literal"&gt;sub_add_consts&lt;/code&gt;, because the former two rules optimize to a constant and a variable respectively, while the latter produces a new operation as the result.&lt;/p&gt; &lt;p&gt;The rule &lt;code class="docutils literal"&gt;sub_add_consts&lt;/code&gt; has a possible problem, which is that if the intermediate result of the &lt;code class="docutils literal"&gt;int_add&lt;/code&gt; operation in the rule head is used by some other operations, then the &lt;code class="docutils literal"&gt;sub_add_consts&lt;/code&gt; rule does not actually reduce the number of operations (and might actually make things slightly worse due to increased register pressure). However, currently it would be extremely hard to take that kind of information into account in the optimization pass of the JIT, so we optimistically apply the rules anyway.&lt;/p&gt; &lt;/section&gt; &lt;section id="checking-rule-coverage"&gt; &lt;h3&gt;Checking rule coverage&lt;/h3&gt; &lt;p&gt;Every rewrite rule should have at least one unit test where it triggers. To ensure this, the &lt;a class="reference external" href="https://github.com/pypy/pypy/blob/d92d0bfd38318ede1cbaadadafd77da69d431fad/rpython/jit/metainterp/optimizeopt/test/test_optimizeintbound.py"&gt;unit test file that mainly checks integer optimizations&lt;/a&gt; in the JIT has an assert at the end of a test run, that every rule fired at least once.&lt;/p&gt; &lt;/section&gt; &lt;section id="printing-rule-statistics"&gt; &lt;h3&gt;Printing rule statistics&lt;/h3&gt; &lt;p&gt;The JIT can print statistics about which rule fired how often in the &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;jit-intbounds-stats&lt;/span&gt;&lt;/code&gt; logging category, using the &lt;a class="reference external" href="https://rpython.readthedocs.io/en/latest/logging.html"&gt;PYPYLOG&lt;/a&gt; mechanism. For example, to print the category to stdout at the end of program execution, run PyPy like this:&lt;/p&gt; &lt;pre class="literal-block"&gt;PYPYLOG=jit-intbounds-stats:- pypy ...&lt;/pre&gt; &lt;p&gt;The output of that will look something like this:&lt;/p&gt; &lt;pre class="literal-block"&gt;int_add add_reassoc_consts 2514 add_zero 107008 int_sub sub_zero 31519 sub_from_zero 523 sub_x_x 3153 sub_add_consts 159 sub_add 55 sub_sub_x_c_c 1752 sub_sub_c_x_c 0 sub_xor_x_y_y 0 sub_or_x_y_y 0 int_mul mul_zero 0 mul_one 110 mul_minus_one 0 mul_pow2_const 1456 mul_lshift 0 ...&lt;/pre&gt; &lt;/section&gt; &lt;section id="termination-and-confluence"&gt; &lt;h3&gt;Termination and Confluence&lt;/h3&gt; &lt;p&gt;Right now there are unfortunately no checks that the rules actually rewrite operations towards "simpler" forms. There is no cost model, and also nothing that prevents you from writing a rule like this:&lt;/p&gt; &lt;pre class="literal-block"&gt;neg_complication: int_neg(x) # leads to infinite rewrites =&amp;gt; int_mul(-1, x)&lt;/pre&gt; &lt;p&gt;Doing this would lead to endless rewrites if there is also another rule that turns multiplication with -1 into negation.&lt;/p&gt; &lt;p&gt;There is also no checking for &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Confluence_(abstract_rewriting)"&gt;confluence&lt;/a&gt; (yet?), i.e. the property that all rewrites starting from the same input trace always lead to the same output trace, no matter in which order the rules are applied.&lt;/p&gt; &lt;/section&gt; &lt;section id="proofs"&gt; &lt;h3&gt;Proofs&lt;/h3&gt; &lt;p&gt;It is very easy to write a peephole rule that is not correct in all corner cases. Therefore all the rules are proven correct with Z3 before compiled into actual JIT code, by default. When the proof fails, a (hopefully minimal) counterexample is printed. The counterexample consists of values for all the inputs that fulfil the checks, values for the intermediate expressions, and then two &lt;em&gt;different&lt;/em&gt; values for the source and the target operations.&lt;/p&gt; &lt;p&gt;E.g. if we try to add the incorrect rule:&lt;/p&gt; &lt;pre class="literal-block"&gt;mul_is_add: int_mul(a, b) =&amp;gt; int_add(a, b)&lt;/pre&gt; &lt;p&gt;We get the following counterexample as output:&lt;/p&gt; &lt;pre class="literal-block"&gt;Could not prove correctness of rule 'mul_is_add' in line 1 counterexample given by Z3: counterexample values: a: 0 b: 1 operation int_mul(a, b) with Z3 formula a*b has counterexample result vale: 0 BUT target expression: int_add(a, b) with Z3 formula a + b has counterexample value: 1&lt;/pre&gt; &lt;p&gt;If we add conditions, they are taken into account and the counterexample will fulfil the conditions:&lt;/p&gt; &lt;pre class="literal-block"&gt;mul_is_add: int_mul(a, b) check a.known_gt_const(1) and b.known_gt_const(2) =&amp;gt; int_add(a, b)&lt;/pre&gt; &lt;p&gt;This leads to the following counterexample:&lt;/p&gt; &lt;pre class="literal-block"&gt;Could not prove correctness of rule 'mul_is_add' in line 46 counterexample given by Z3: counterexample values: a: 2 b: 3 operation int_mul(a, b) with Z3 formula a*b has counterexample result vale: 6 BUT target expression: int_add(a, b) with Z3 formula a + b has counterexample value: 5&lt;/pre&gt; &lt;p&gt;Some &lt;code class="docutils literal"&gt;IntBound&lt;/code&gt; methods cannot be used in Z3 proofs because their &lt;a class="reference external" href="https://www.pypy.org/posts/2024/08/toy-knownbits.html#cases-where-this-style-of-z3-proof-doesnt-work)."&gt;control flow is too complex&lt;/a&gt;. If that is the case, they can have Z3-equivalent formulations defined (in every case this is done, it's a potential proof hole if the Z3 friendly reformulation and the real implementation differ from each other, therefore extra care is required to make very sure they are equivalent).&lt;/p&gt; &lt;p&gt;It's possible to skip the proof of individual rules entirely by adding &lt;code class="docutils literal"&gt;SORRY_Z3&lt;/code&gt; to its body (but we should try not to do that too often):&lt;/p&gt; &lt;pre class="literal-block"&gt;eq_different_knownbits: int_eq(x, y) SORRY_Z3 check x.known_ne(y) =&amp;gt; 0&lt;/pre&gt; &lt;/section&gt; &lt;section id="checking-for-satisfiability"&gt; &lt;h3&gt;Checking for satisfiability&lt;/h3&gt; &lt;p&gt;In addition to checking whether the rule yields a correct optimization, we also check whether the rule can ever apply. This ensures that there are &lt;em&gt;some&lt;/em&gt; runtime values that would fulfil all the checks in a rule. Here's an example of a rule violating this:&lt;/p&gt; &lt;pre class="literal-block"&gt;never_applies: int_is_true(x) check x.known_lt_const(0) and x.known_gt_const(0) # impossible condition, always False =&amp;gt; x&lt;/pre&gt; &lt;p&gt;Right now the error messages if this goes wrong are not completely easy to understand. I hope to be able to improve this later:&lt;/p&gt; &lt;pre class="literal-block"&gt;Rule 'never_applies' cannot ever apply in line 1 Z3 did not manage to find values for variables x such that the following condition becomes True: And(x &amp;lt;= x_upper, x_lower &amp;lt;= x, If(x_upper &amp;lt; 0, x_lower &amp;gt; 0, x_upper &amp;lt; 0))&lt;/pre&gt; &lt;/section&gt; &lt;section id="implementation-notes"&gt; &lt;h3&gt;Implementation Notes&lt;/h3&gt; &lt;p&gt;The implementation of the DSL is done in a relatively ad-hoc manner. It is parsed using &lt;a class="reference external" href="https://rply.readthedocs.io/"&gt;rply&lt;/a&gt;, there's a small type checker that tries to find common problems in how the rules are written. Z3 is used via the Python API, like in the previous blog posts that are using it. The pattern matching RPython code is generated using an approach inspired by Luc Maranget's paper &lt;a class="reference external" href="http://moscova.inria.fr/~maranget/papers/ml05e-maranget.pdf"&gt;Compiling Pattern Matching to Good Decision Trees&lt;/a&gt;. See &lt;a class="reference external" href="https://compiler.club/compiling-pattern-matching/"&gt;this blog post&lt;/a&gt; for an approachable introduction.&lt;/p&gt; &lt;/section&gt; &lt;/section&gt; &lt;section id="conclusion"&gt; &lt;h2&gt;Conclusion&lt;/h2&gt; &lt;p&gt;Now that I've described the DSL, here are the rules that are equivalent to the imperative code in the motivation section:&lt;/p&gt; &lt;pre class="literal-block"&gt;mul_zero: int_mul(x, 0) =&amp;gt; 0 mul_one: int_mul(x, 1) =&amp;gt; x mul_minus_one: int_mul(x, -1) =&amp;gt; int_neg(x) mul_pow2_const: int_mul(x, C) check C &amp;gt; 0 and C &amp;amp; (C - 1) == 0 shift = highest_bit(C) =&amp;gt; int_lshift(x, shift) mul_lshift: int_mul(x, int_lshift(1, y)) check y.known_ge_const(0) and y.known_le_const(LONG_BIT) =&amp;gt; int_lshift(x, y)&lt;/pre&gt; &lt;p&gt;The current status of the DSL is that it got merged to PyPy's main branch. I rewrote a part of the integer rewrites &lt;a class="reference external" href="https://github.com/pypy/pypy/blob/d92d0bfd38318ede1cbaadadafd77da69d431fad/rpython/jit/metainterp/ruleopt/real.rules"&gt;into the DSL&lt;/a&gt;, but some are still in the old imperative style (mostly for complicated reasons, the easily ported ones are all done). Since I've only been porting optimizations that had existed prior to the existence of the DSL, performance numbers of benchmarks didn't change.&lt;/p&gt; &lt;p&gt;There are a number of features that are still missing and some possible extensions that I plan to work on in the future:&lt;/p&gt; &lt;ul class="simple"&gt; &lt;li&gt;&lt;p&gt;All the integer operations that the DSL handles so far are the variants that do not check for overflow (or where overflow was proven to be impossible to happen). In regular Python code the overflow-checking variants &lt;cite&gt;int_add_ovf&lt;/cite&gt; etc are much more common, but the DSL doesn't support them yet. I plan to fix this, but don't completely understand how the correctness proofs for them should be done correctly.&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;A related problem is that I don't understand what it means for a rewrite to be correct if some of the operations are only defined for a subset of the input values. E.g. division isn't defined if the divisor is zero. In theory, a division operation in the trace should always be preceded by a check that the divisor isn't zero. But sometimes other optimization move the check around and the connection to the division gets lost or muddled. What optimizations can we still safely perform on the division? There's lots of prior work on this question, but I still don't understand what the correct approach in our context would be.&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;Ordering comparisons like &lt;code class="docutils literal"&gt;int_lt&lt;/code&gt;, &lt;code class="docutils literal"&gt;int_le&lt;/code&gt; and their unsigned variants are not ported to the DSL yet. Comparisons are an area where the JIT is not super good yet at optimizing away operations. This is a pretty big topic and I've started a project with Nico Rittinghaus to try to improve the situation a bit more generally.&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;A more advanced direction of work would be to implement a simplified form of &lt;a class="reference external" href="https://egraphs-good.github.io/"&gt;e-graphs&lt;/a&gt; (or &lt;a class="reference external" href="https://vimeo.com/843540328"&gt;ae-graphs&lt;/a&gt;). The JIT has like half of an e-graph data structure already, and we probably can't afford a full one in terms of compile time costs, but maybe we can have two thirds or something?&lt;/p&gt;&lt;/li&gt; &lt;/ul&gt; &lt;/section&gt; &lt;section id="acknowledgements"&gt; &lt;h2&gt;Acknowledgements&lt;/h2&gt; &lt;p&gt;Thank you to &lt;a class="reference external" href="https://bernsteinbear.com/"&gt;Max Bernstein&lt;/a&gt; and &lt;a class="reference external" href="https://martinfriedrichberger.net/"&gt;Martin Berger&lt;/a&gt; for super helpful feedback on drafts of the post!&lt;/p&gt; &lt;/section&gt;</description><category>jit</category><category>z3</category><guid>https://www.pypy.org/posts/2024/10/jit-peephole-dsl.html</guid><pubDate>Wed, 23 Oct 2024 15:00:00 GMT</pubDate></item><item><title>Guest Post: How PortaOne uses PyPy for high-performance processing, connecting over 1B of phone calls every month</title><link>https://www.pypy.org/posts/2024/08/portaone.html</link><dc:creator>The PyPy Team</dc:creator><description>&lt;p&gt;The PyPy project is always happy to hear about industrial use and deployments of PyPy. For the &lt;a href="https://www.pypy.org/posts/2024/03/fixing-bug-incremental-gc.html"&gt;GC bug finding&lt;/a&gt; task earlier this year, we collaborated with PortaOne and we're super happy that Serhii Titov, head of the QA department at PortaOne, was up to writing this guest post to describe their use and experience with the project.&lt;/p&gt; &lt;hr&gt; &lt;h3 id="what-does-portaone-do"&gt;What does PortaOne do?&lt;/h3&gt; &lt;p&gt;We at &lt;a href="https://www.portaone.com/"&gt;PortaOne Inc.&lt;/a&gt; allow telecom operators to launch new services (or provide existing services more efficiently) using our VoIP platform (PortaSIP) and our real-time charging system (PortaBilling), which provides additional features for cloud PBX, such as call transfer, queues, interactive voice response (IVR) and more. At this moment our support team manages several thousand servers with our software installed in 100 countries, through which over 500 telecommunication service providers connect millions of end users every day. The unique thing about PortaOne is that we supply the source code of our product to our customers - something unheard of in the telecom world! Thus we attract "telco innovators", who use our APIs to build around the system and the source code to create unique tweaks of functionality, which produces amazing products.&lt;/p&gt; &lt;p&gt;At the core of PortaSIP is the middle-ware component (the proper name for it is "B2BUA", but that probably does not say much to anyone outside of experts in VoIP), which implements the actual handling of SIP calls, messages, etc. and all added features (for instance, trying to send a call via telco operators through which the cost per minute is lower). It has to be fast (since even a small delay in establishing a call is noticed by a customer), reliable (everyone hates when a call drops or cannot be completed) and yet easily expandable with new functionality. This is why we decided to use Python as opposed to C/C++ or similar programming languages, which are often used in telecom equipment.&lt;/p&gt; &lt;p&gt;The B2BUA component is a batch of similar Python processes that are looped inside a &lt;a href="https://docs.python.org/3.10/library/asyncore.html"&gt;&lt;code&gt;asyncore.dispatcher&lt;/code&gt;&lt;/a&gt; wrapper. The load balancing between these Python processes is done by our stateless SIP proxy server written in C++. All our sockets are served by this B2BUA. We have our custom client-wrappers around &lt;code&gt;pymysql&lt;/code&gt;, &lt;code&gt;redis&lt;/code&gt;, &lt;code&gt;cassandra-driver&lt;/code&gt; and &lt;code&gt;requests&lt;/code&gt; to communicate with external services. Some of the Python processes use &lt;a href="https://cffi.readthedocs.io/en/stable/"&gt;&lt;code&gt;cffi&lt;/code&gt;&lt;/a&gt; wrappers around C-code to improve their performance (examples: an Oracle DB driver, a client to a radius server, a custom C logger).&lt;/p&gt; &lt;p&gt;The I/O operations that block the main thread of the Python processes are processed in sub-threads. We have custom wrappers around &lt;code&gt;threading.Thread&lt;/code&gt; and also &lt;code&gt;asyncore.dispatcher&lt;/code&gt;. The results of such operations are returned to the main thread.&lt;/p&gt; &lt;h3 id="improving-our-performance-with-pypy"&gt;Improving our performance with PyPy&lt;/h3&gt; &lt;p&gt;We started with CPython and then in 2014 switched to PyPy because it was faster. Here's an exact quote from our first testing notes: "PyPy gives significant performance boost, ~50%". Nowadays, after years of changes in all the software involved, PyPy still gives us +50% boost compared to CPython.&lt;/p&gt; &lt;p&gt;Taking care of real time traffic for so many people around the globe is something we're really proud of. I hope the PyPy team can be proud of it as well, as the PyPy product is a part of this solution.&lt;/p&gt; &lt;h3 id="finding-a-garbage-collector-bug-stage-1-the-gc-hooks"&gt;Finding a garbage collector bug: stage 1, the GC hooks&lt;/h3&gt; &lt;p&gt;However our path with PyPy wasn't perfectly smooth. There were very rare cases of crashes on PyPy that we weren't able to catch. That's because to make coredump useful we needed to switch to PyPy with debug, but we cannot let it run in that mode on a production system for an extended period of time, and we did not have any STR (steps-to-reproduce) to make PyPy crash again in our lab. That's why we kept (and still keep) both interpreters installed just in case, and we would switch to CPython if we noticed it happening.&lt;/p&gt; &lt;p&gt;At the time of updating PyPy from 3.5 to 3.6 our QA started noticing those crashes more often, but we still had no luck with STR or collecting proper coredumps with debug symbols. Then it became even worse after our development played with the &lt;a href="https://doc.pypy.org/en/latest/gc_info.html"&gt;Garbage Collector's options&lt;/a&gt; to increase performance of our middleware component. The crashes started to affect our regular performance testing (controlled by QA manager Yevhenii Bovda). At that point it was decided that we can no longer live like that and so we started an intense investigation.&lt;/p&gt; &lt;p&gt;During the first stage of our investigation (following the best practice of troubleshooting) we narrowed down the issue as much as we could. So, it was not our code, it was definitely somewhere in PyPy. Eventually our SIP software engineer &lt;a href="https://github.com/Yevhenii-Yatchenko"&gt;Yevhenii Yatchenko&lt;/a&gt; found out that this bug is connected with the use of our &lt;a href="https://doc.pypy.org/en/latest/gc_info.html#gc-hooks"&gt;custom hooks in the GC&lt;/a&gt;. Yevhenii created ticket &lt;a href="https://github.com/pypy/pypy/issues/4899"&gt;#4899&lt;/a&gt; and within 2-3 days we got a fix from a &lt;a href="https://github.com/cfbolz"&gt;member of the PyPy team&lt;/a&gt;, in true open-source fashion.&lt;/p&gt; &lt;h3 id="finding-a-garbage-collector-bug-stage-2-the-real-bug"&gt;Finding a garbage collector bug: stage 2, the real bug&lt;/h3&gt; &lt;p&gt;Then came stage 2. In parallel with the previous ticket, Yevhenii created &lt;a href="https://github.com/pypy/pypy/issues/4900"&gt;#4900&lt;/a&gt; that we still see failing with coredumps quite often, and they are not connected to GC custom hooks. In a nutshell, it took us dozens of back and forward emails, three Zoom sessions and four versions of a patch to solve the issue. During the last iteration we got a new set of options to try and a new version of the patch. Surprisingly, that helped! What a relief! So, the next logical step was to remove all debug options and run PyPy only with the patch. Unfortunately, it started to fail again and we came to the obvious conclusion that what will help us is not a patch, but one of options we were testing out. At that point we found out that &lt;a href="https://doc.pypy.org/en/latest/gc_info.html#environment-variables"&gt;&lt;code&gt;PYPY_GC_MAX_PINNED=0&lt;/code&gt;&lt;/a&gt; is a necessary and sufficient condition to solve our issue. This points to another bug in the garbage collector, somehow related to object pinning.&lt;/p&gt; &lt;p&gt;Here's our current state: we have to add &lt;code&gt;PYPY_GC_MAX_PINNED=0&lt;/code&gt;, but we do not face the crashes anymore.&lt;/p&gt; &lt;h3 id="conclusion-and-next-steps"&gt;Conclusion and next steps&lt;/h3&gt; &lt;p&gt;Gratitude is extended to Carl for his invaluable assistance in resolving the nasty bugss, because it seems we're the only ones who suffered from the last one and we really did not want to fall back to CPython due to its performance disadvantage.&lt;/p&gt; &lt;p&gt;Serhii Titov, head of the QA department at PortaOne Inc.&lt;/p&gt; &lt;p&gt;P.S. If you are a perfectionist and at this point you have mixed feelings and you are still bothered by the question "But there might still be a bug in the GC, what about that?" - Carl has some ideas about it and he will sort it out (we will help with the testing/verification part).&lt;/p&gt;</description><category>casestudy</category><category>guestpost</category><guid>https://www.pypy.org/posts/2024/08/portaone.html</guid><pubDate>Thu, 29 Aug 2024 09:00:00 GMT</pubDate></item><item><title>PyPy v7.3.17 release</title><link>https://www.pypy.org/posts/2024/08/pypy-v7317-release.html</link><dc:creator>mattip</dc:creator><description>&lt;section id="pypy-v7-3-17-release-of-python-2-7-and-3-10"&gt; &lt;h2&gt;PyPy v7.3.17: release of python 2.7 and 3.10&lt;/h2&gt; &lt;p&gt;The PyPy team is proud to release version 7.3.17 of PyPy.&lt;/p&gt; &lt;p&gt;This release includes a new &lt;a class="reference internal" href="https://www.pypy.org/posts/2024/08/pypy-v7317-release.html#risc-v-jit-backend"&gt;RISC-V JIT backend&lt;/a&gt;, an &lt;a class="reference internal" href="https://www.pypy.org/posts/2024/08/pypy-v7317-release.html#improved-repl"&gt;improved REPL&lt;/a&gt; based on work by the CPython team, and &lt;a class="reference internal" href="https://www.pypy.org/posts/2024/08/pypy-v7317-release.html#better-jit-optimizations"&gt;better JIT optimizations&lt;/a&gt; of integer operations. Special shout-outs to &lt;a class="reference external" href="https://github.com/loganchien"&gt;Logan Chien&lt;/a&gt; for the &lt;a class="reference external" href="https://github.com/pypy/pypy/pull/5002"&gt;RISC-V backend work&lt;/a&gt;, to &lt;a class="reference external" href="https://github.com/nirit100"&gt;Nico Rittinghaus&lt;/a&gt; for better integer optimization in the JIT, and the CPython team that has worked on the repl.&lt;/p&gt; &lt;p&gt;The release includes two different interpreters:&lt;/p&gt; &lt;ul class="simple"&gt; &lt;li&gt;&lt;p&gt;PyPy2.7, which is an interpreter supporting the syntax and the features of Python 2.7 including the stdlib for CPython 2.7.18+ (the &lt;code class="docutils literal"&gt;+&lt;/code&gt; is for backported security updates)&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;PyPy3.10, which is an interpreter supporting the syntax and the features of Python 3.10, including the stdlib for CPython 3.10.14.&lt;/p&gt;&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;The interpreters are based on much the same codebase, thus the dual release. This is a micro release, all APIs are compatible with the other 7.3 releases. It follows after 7.3.16 release on April 23, 2024.&lt;/p&gt; &lt;p&gt;We recommend updating. You can find links to download the releases here:&lt;/p&gt; &lt;blockquote&gt; &lt;p&gt;&lt;a class="reference external" href="https://pypy.org/download.html"&gt;https://pypy.org/download.html&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt; &lt;p&gt;We would like to thank our donors for the continued support of the PyPy project. If PyPy is not quite good enough for your needs, we are available for &lt;a class="reference external" href="https://www.pypy.org/pypy-sponsors.html"&gt;direct consulting&lt;/a&gt; work. If PyPy is helping you out, we would love to hear about it and encourage submissions to our &lt;a class="reference external" href="https://pypy.org/blog"&gt;blog&lt;/a&gt; via a pull request to &lt;a class="reference external" href="https://github.com/pypy/pypy.org"&gt;https://github.com/pypy/pypy.org&lt;/a&gt;&lt;/p&gt; &lt;p&gt;We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: bug fixes, &lt;a class="reference external" href="https://doc.pypy.org/"&gt;PyPy&lt;/a&gt; and &lt;a class="reference external" href="https://rpython.readthedocs.org"&gt;RPython&lt;/a&gt; documentation improvements, or general &lt;a class="reference external" href="https://doc.pypy.org/en/latest/project-ideas.html"&gt;help&lt;/a&gt; with making RPython's JIT even better.&lt;/p&gt; &lt;p&gt;If you are a python library maintainer and use C-extensions, please consider making a &lt;a class="reference external" href="https://hpyproject.org/"&gt;HPy&lt;/a&gt; / &lt;a class="reference external" href="https://cffi.readthedocs.io"&gt;CFFI&lt;/a&gt; / &lt;a class="reference external" href="https://cppyy.readthedocs.io"&gt;cppyy&lt;/a&gt; version of your library that would be performant on PyPy. In any case, both &lt;a class="reference external" href="https://github.com/joerick/cibuildwheel"&gt;cibuildwheel&lt;/a&gt; and the &lt;a class="reference external" href="https://github.com/matthew-brett/multibuild"&gt;multibuild system&lt;/a&gt; support building wheels for PyPy.&lt;/p&gt; &lt;section id="risc-v-backend-for-the-jit"&gt; &lt;span id="risc-v-jit-backend"&gt;&lt;/span&gt;&lt;h3&gt;RISC-V backend for the JIT&lt;/h3&gt; &lt;p&gt;PyPy's JIT has added support for generating 64-bit RISC-V machine code at runtime (RV64-IMAD, specifically). So far we are not releasing binaries for any RISC-V platforms, but there are &lt;a class="reference external" href="https://rpython.readthedocs.io/en/latest/riscv.html"&gt;instructions&lt;/a&gt; on how to cross-compile binaries.&lt;/p&gt; &lt;/section&gt; &lt;section id="repl-improvements"&gt; &lt;span id="improved-repl"&gt;&lt;/span&gt;&lt;h3&gt;REPL Improvements&lt;/h3&gt; &lt;p&gt;The biggest user-visible change of the release is new features in the repl of PyPy3.10. CPython 3.13 has adopted and extended PyPy's pure-Python repl, adding a number of features and fixing a number or bugs in the process. We have backported and added the following features:&lt;/p&gt; &lt;ul class="simple"&gt; &lt;li&gt;&lt;p&gt;Prompts and tracebacks use terminal colors, as well as &lt;a class="reference external" href="https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda"&gt;terminal hyperlinks&lt;/a&gt; for file names.&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Bracketed-paste"&gt;Bracketed paste&lt;/a&gt; enable pasting several lines of input into the terminal without auto-indentation getting in the way.&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;A special interactive help browser (F1), history browser (F2), explicit paste mode (F3).&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;Support for Ctrl-&amp;lt;left/right&amp;gt; to jump over whole words at a time.&lt;/p&gt;&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;See the &lt;a class="reference external" href="https://docs.python.org/3.13/whatsnew/3.13.html#a-better-interactive-interpreter"&gt;CPython documentation for further details&lt;/a&gt;. Thanks to Łukasz Langa, Pablo Galindo Salgado and the other CPython devs involved in this work.&lt;/p&gt; &lt;/section&gt; &lt;section id="better-jit-optimizations-of-integer-operations"&gt; &lt;span id="better-jit-optimizations"&gt;&lt;/span&gt;&lt;h3&gt;Better JIT optimizations of integer operations&lt;/h3&gt; &lt;p&gt;The optimizers of PyPy's JIT have become much better at reasoning about and optimizing integer operations. This is done with a new &lt;a class="reference external" href="https://pypy.org/posts/2024/08/toy-knownbits.html"&gt;"knownbits" abstract domain&lt;/a&gt;. In many programs that do bit-manipulation of integers, some of the bits of the integer variables of the program can be statically known. Here's a simple example:&lt;/p&gt; &lt;div class="code"&gt;&lt;pre class="code python"&gt;&lt;a id="rest_code_bf930ab507984607a20043060866170d-1" name="rest_code_bf930ab507984607a20043060866170d-1" href="https://www.pypy.org/posts/2024/08/pypy-v7317-release.html#rest_code_bf930ab507984607a20043060866170d-1"&gt;&lt;/a&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;a id="rest_code_bf930ab507984607a20043060866170d-2" name="rest_code_bf930ab507984607a20043060866170d-2" href="https://www.pypy.org/posts/2024/08/pypy-v7317-release.html#rest_code_bf930ab507984607a20043060866170d-2"&gt;&lt;/a&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;a id="rest_code_bf930ab507984607a20043060866170d-3" name="rest_code_bf930ab507984607a20043060866170d-3" href="https://www.pypy.org/posts/2024/08/pypy-v7317-release.html#rest_code_bf930ab507984607a20043060866170d-3"&gt;&lt;/a&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;a id="rest_code_bf930ab507984607a20043060866170d-4" name="rest_code_bf930ab507984607a20043060866170d-4" href="https://www.pypy.org/posts/2024/08/pypy-v7317-release.html#rest_code_bf930ab507984607a20043060866170d-4"&gt;&lt;/a&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;a id="rest_code_bf930ab507984607a20043060866170d-5" name="rest_code_bf930ab507984607a20043060866170d-5" href="https://www.pypy.org/posts/2024/08/pypy-v7317-release.html#rest_code_bf930ab507984607a20043060866170d-5"&gt;&lt;/a&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;a id="rest_code_bf930ab507984607a20043060866170d-6" name="rest_code_bf930ab507984607a20043060866170d-6" href="https://www.pypy.org/posts/2024/08/pypy-v7317-release.html#rest_code_bf930ab507984607a20043060866170d-6"&gt;&lt;/a&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;/pre&gt;&lt;/div&gt; &lt;p&gt;With the new abstract domain, the JIT can optimize the &lt;code class="docutils literal"&gt;if&lt;/code&gt;-condition to &lt;code class="docutils literal"&gt;True&lt;/code&gt;, because it already knows that the lowest bit of &lt;code class="docutils literal"&gt;x&lt;/code&gt; must be set. This optimization applies to all Python-integers that fit into a machine word (PyPy optimistically picks between two different representations for &lt;code class="docutils literal"&gt;int&lt;/code&gt;, depending on the size of the value). Unfortunately there is very little impact of this change on almost all Python code, because intensive bit-manipulation is rare in Python. However, the change leads to significant performance improvements in &lt;a class="reference external" href="https://docs.pydrofoil.org/en/latest/"&gt;Pydrofoil&lt;/a&gt; (the RPython-based RISC-V/ARM emulators that are automatically generated from high-level &lt;a class="reference external" href="https://github.com/rems-project/sail/"&gt;Sail&lt;/a&gt; specifications of the respective ISAs, and that use the RPython JIT to improve performance).&lt;/p&gt; &lt;/section&gt; &lt;section id="pypy-versions-and-speed-pypy-org"&gt; &lt;h3&gt;PyPy versions and speed.pypy.org&lt;/h3&gt; &lt;p&gt;The keen-eyed will have noticed no mention of Python version 3.9 in the releases above. Typically we will maintain only one version of Python3, but due to PyPy3.9 support on conda-forge we maintained multiple versions from the first release of PyPy3.10 in PyPy v7.3.12 (Dec 2022). Conda-forge is &lt;a class="reference external" href="https://pypy.org/posts/2024/08/conda-forge-proposes-dropping-support-for-pypy.html"&gt;sunsetting its PyPy support&lt;/a&gt;, which means we can drop PyPy3.9. Since that was the major driver of benchmarks at &lt;a class="reference external" href="https://speed.pypy.org"&gt;https://speed.pypy.org&lt;/a&gt;, we revamped the site to showcase PyPy3.9, PyPy3.10, and various versions of cpython on the home page. For historical reasons, the "baseline" for comparison is still cpython 3.7.19.&lt;/p&gt; &lt;p&gt;We will keep the buildbots building PyPY3.9 until the end of August, these builds will still be available on the &lt;a class="reference external" href="https://buildbot.pypy.org/nightly/"&gt;nightly builds&lt;/a&gt; tab of the buildbot.&lt;/p&gt; &lt;/section&gt; &lt;section id="what-is-pypy"&gt; &lt;h3&gt;What is PyPy?&lt;/h3&gt; &lt;p&gt;PyPy is a Python interpreter, a drop-in replacement for CPython It's fast (&lt;a class="reference external" href="https://speed.pypy.org"&gt;PyPy and CPython&lt;/a&gt; performance comparison) due to its integrated tracing JIT compiler.&lt;/p&gt; &lt;p&gt;We also welcome developers of other &lt;a class="reference external" href="https://rpython.readthedocs.io/en/latest/examples.html"&gt;dynamic languages&lt;/a&gt; to see what RPython can do for them.&lt;/p&gt; &lt;p&gt;We provide binary builds for:&lt;/p&gt; &lt;ul class="simple"&gt; &lt;li&gt;&lt;p&gt;&lt;strong&gt;x86&lt;/strong&gt; machines on most common operating systems (Linux 32/64 bits, Mac OS 64 bits, Windows 64 bits)&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;64-bit &lt;strong&gt;ARM&lt;/strong&gt; machines running Linux (&lt;code class="docutils literal"&gt;aarch64&lt;/code&gt;) and macos (&lt;code class="docutils literal"&gt;macos_arm64&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;PyPy supports Windows 32-bit, Linux PPC64 big- and little-endian, Linux ARM 32 bit, RISC-V RV64IMAFD Linux, and s390x Linux but does not release binaries. Please reach out to us if you wish to sponsor binary releases for those platforms. Downstream packagers provide binary builds for debian, Fedora, conda, OpenBSD, FreeBSD, Gentoo, and more.&lt;/p&gt; &lt;/section&gt; &lt;section id="what-else-is-new"&gt; &lt;h3&gt;What else is new?&lt;/h3&gt; &lt;p&gt;For more information about the 7.3.17 release, see the &lt;a class="reference external" href="https://doc.pypy.org/en/latest/release-v7.3.17.html#changelog"&gt;full changelog&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Please update, and continue to help us make pypy better.&lt;/p&gt; &lt;p&gt;Cheers, The PyPy Team&lt;/p&gt; &lt;/section&gt; &lt;/section&gt;</description><category>release</category><guid>https://www.pypy.org/posts/2024/08/pypy-v7317-release.html</guid><pubDate>Wed, 28 Aug 2024 12:22:08 GMT</pubDate></item><item><title>Conda-forge proposes sunsetting support for PyPy</title><link>https://www.pypy.org/posts/2024/08/conda-forge-proposes-dropping-support-for-pypy.html</link><dc:creator>mattip</dc:creator><description>&lt;p&gt;Conda-forge has kindly been providing support for PyPy since 2019. The conda-forge team has been very patient and generous with resources, but it seems the uptake of PyPy has not justified the effort. Major packages still are not &lt;a href="https://conda-forge.org/status/migration/?name=pypy38"&gt;available on PyPy&lt;/a&gt;, others find it hard to &lt;a href="https://github.com/conda-forge/numpy-feedstock/pull/310"&gt;update versions&lt;/a&gt;. We don't get much feedback at all about people using PyPy, and even less about PyPy on conda-forge. The conda-forge team has proposed &lt;a href="https://github.com/conda-forge/conda-forge.github.io/pull/2259"&gt;sunsetting PyPy&lt;/a&gt; going forward, which means current packages would remain but no new packages would be built. If you have an opinion, you can comment on that PR, or on this blog post.&lt;/p&gt; &lt;p&gt;Since conda-forge supports PyPy3.9 but not PyPy3.10, we have continued releasing PyPy3.9 even though we typically support only one version of PyPy3. With the sunsetting proposal, we will not release any more updates to PyPy3.9. I opened a &lt;a href="https://github.com/orgs/pypy/discussions/4998"&gt;poll&lt;/a&gt; about the intention to drop PyPy3.9. If you have an opinion, please chime in.&lt;/p&gt;</description><category>conda-forge</category><guid>https://www.pypy.org/posts/2024/08/conda-forge-proposes-dropping-support-for-pypy.html</guid><pubDate>Fri, 09 Aug 2024 06:27:41 GMT</pubDate></item></channel></rss>