<!DOCTYPE html>
<html lang="en-us">
<head>
<meta charset="UTF-8">
<title>True streaming for libvips</title>
<meta name="description" content="A fast image processing library with low memory needs."/>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="theme-color" content="#157878">
<link href='https://fonts.googleapis.com/css?family=Open+Sans:400,700' rel='stylesheet' type='text/css'>
<link href='https://fonts.googleapis.com/css?family=Cabin:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="/assets/css/style.css?v=e926315de91139ac2a6fc1a99556e1fe8f3ddce3">
<link rel="shortcut icon" type="image/x-icon" href="favicon.ico">
<link type="application/atom+xml" rel="alternate" href="https://www.libvips.org/feed.xml" title="libvips" />
</head>
<body>
<section class="page-header">
<h1 class="project-name"><a href="/">libvips</a></h1>
<h2 class="project-tagline">A fast image processing library with low memory needs.</h2>
<a href="https://github.com/libvips/libvips/releases" class="btn">Download</a>
<a href="/install.html" class="btn">Install</a>
<a href="/API/current" class="btn">Documentation</a>
<a href="https://github.com/libvips/libvips/issues" class="btn">Issues</a>
<a href="https://github.com/libvips/libvips/wiki" class="btn">Wiki</a>
<a href="https://github.com/libvips/libvips" class="btn">libvips projects</a>
<a href="https://github.com/libvips/libvips" class="btn">libvips on GitHub</a>
</section>
<section class="main-content">
<p>An interesting feature has just landed in libvips git master (and should be in the upcoming libvips 8.9): true streaming. This has been talked about on and off for five years or more, but it’s now finally happened!
This post explains what this feature is and why it could be useful.</p>
<h1 id="overview">Overview</h1>
<p>Previously, libvips let you use files and areas of memory as the source and destination of image processing pipelines.</p>
<p>The new <code class="language-plaintext highlighter-rouge">VipsConnection</code> classes let you connect image processing pipelines efficiently to <em>any</em> kind of data object, for example, pipes. You can now do this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cat k2.jpg | \
  vips invert stdin[shrink=2] .jpg[Q=90] | \
  cat > x.jpg
</code></pre></div></div>
<p>The magic filename <code class="language-plaintext highlighter-rouge">"stdin"</code> opens a stream attached to file descriptor 0 (<code class="language-plaintext highlighter-rouge">stdin</code>), calls <code class="language-plaintext highlighter-rouge">vips_image_new_from_source()</code>, and passes that image into the operation. Writing to a filename with nothing before the suffix will open a stream to <code class="language-plaintext highlighter-rouge">stdout</code> and write in that format.</p>
<h1 id="why-is-this-useful">Why is this useful</h1>
<p>To see why this is a useful thing to be able to do, imagine how something like a thumbnailing service on S3 works.</p>
<p>S3 keeps data (images in this case) in <em>buckets</em> and lets you read and write buckets using HTTP <code class="language-plaintext highlighter-rouge">GET</code> and <code class="language-plaintext highlighter-rouge">POST</code> requests to addresses like <code class="language-plaintext highlighter-rouge">http://johnsmith.s3.amazonaws.com/photos/puppy.jpg</code>.</p>
<p>Processing with a system that works in whole images, like ImageMagick, happens like this:</p>
<p><img src="/assets/images/magick-s3.png" alt="Processing with image-at-a-time systems" /></p>
<p>Reading from the left, first the data is downloaded from the bucket into a large area of
memory, then the image is decompressed, then processed, perhaps in several stages, then recompressed, then finally uploaded back to cloud storage.</p>
<p>Each stage must complete before the next stage can start, and each stage needs at least two large areas of memory to function.</p>
<p>Current libvips is able to execute decode, process and encode all at the same time, in parallel, and without needing any intermediate images. It looks more like this:</p>
<p><img src="/assets/images/old-libvips-s3.png" alt="Processing with current libvips" /></p>
<p>Because the middle sections are overlapped we get much better <em>latency</em>: the total time the whole process takes from start to finish is much lower.</p>
<p>However, current libvips still needs the compressed input image to be read to memory before it can start, and can’t start to upload the result to cloud storage until it has finished compressing the whole output image.</p>
<p>This is where true streaming comes in. libvips git master can now decode directly from a pipe and encode directly to a pipe.
It looks more like this:</p>
<p><img src="/assets/images/new-libvips-s3.png" alt="Processing with libvips streams" /></p>
<p>Now <em>everything</em> overlaps, and latency should drop again.</p>
<h1 id="api">API</h1>
<p>Here’s how it looks in Python:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">pyvips</span>

<span class="n">source</span> <span class="o">=</span> <span class="n">pyvips</span><span class="p">.</span><span class="n">Source</span><span class="p">.</span><span class="n">new_from_descriptor</span><span class="p">(</span><span class="mi">4132</span><span class="p">)</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">pyvips</span><span class="p">.</span><span class="n">Image</span><span class="p">.</span><span class="n">new_from_source</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="s">""</span><span class="p">)</span>
<span class="k">if</span> <span class="n">image</span><span class="p">.</span><span class="n">width</span> <span class="o">></span> <span class="mi">1000</span><span class="p">:</span>
    <span class="c1"># big image! .. shrink on load</span>
    <span class="n">image</span> <span class="o">=</span> <span class="n">pyvips</span><span class="p">.</span><span class="n">Image</span><span class="p">.</span><span class="n">new_from_source</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="s">""</span><span class="p">,</span> <span class="n">shrink</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">image</span><span class="p">.</span><span class="n">invert</span><span class="p">()</span>
<span class="n">target</span> <span class="o">=</span> <span class="n">pyvips</span><span class="p">.</span><span class="n">Target</span><span class="p">.</span><span class="n">new_to_descriptor</span><span class="p">(</span><span class="mi">2487</span><span class="p">)</span>
<span class="c1"># write_to_target() requires a format suffix; ".jpg" is assumed here</span>
<span class="n">image</span><span class="p">.</span><span class="n">write_to_target</span><span class="p">(</span><span class="n">target</span><span class="p">,</span> <span class="s">".jpg"</span><span class="p">)</span>
</code></pre></div></div>
<p>The neat part is that you can open the source twice, once to get the header and decide how to process it, and a second time with the parameters you want.</p>
<p>Behind the scenes, the source is buffering bytes as they arrive from the input. If you reuse the source, it’ll automatically rewind and reuse the buffered bytes until they run out. Once you switch from reading the header to processing pixels, the buffer is discarded and bytes from the source are fed directly into the decompressor.</p>
<p>The mechanism that supports this is a set of calls loaders can use on sources to hint what kind of access pattern they are likely to need, and what part of the image (header, pixels) they are working on.</p>
<h1 id="custom-sources">Custom sources</h1>
<p>libvips ships with streams that can attach to files, areas of memory, and file descriptors (eg.
pipes).</p>
<p>You can add your own connection types by subclassing <code class="language-plaintext highlighter-rouge">VipsSource</code> and <code class="language-plaintext highlighter-rouge">VipsTarget</code> and implementing <code class="language-plaintext highlighter-rouge">read</code> and <code class="language-plaintext highlighter-rouge">write</code> methods, but this can be awkward for languages other than C or C++.</p>
<p>To make custom streams easy in languages like Python, there are classes called <code class="language-plaintext highlighter-rouge">VipsSourceCustom</code> and <code class="language-plaintext highlighter-rouge">VipsTargetCustom</code>. You can make your own stream objects like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">sys</span>
<span class="kn">import</span> <span class="n">pyvips</span>

<span class="nb">file</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="p">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">"rb"</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">read_handler</span><span class="p">(</span><span class="n">size</span><span class="p">):</span>
    <span class="k">return</span> <span class="nb">file</span><span class="p">.</span><span class="n">read</span><span class="p">(</span><span class="n">size</span><span class="p">)</span>

<span class="n">source</span> <span class="o">=</span> <span class="n">pyvips</span><span class="p">.</span><span class="n">SourceCustom</span><span class="p">()</span>
<span class="n">source</span><span class="p">.</span><span class="n">on_read</span><span class="p">(</span><span class="n">read_handler</span><span class="p">)</span>
</code></pre></div></div>
<p>This makes a very simple source which just reads from a file.
Without a seek handler, <code class="language-plaintext highlighter-rouge">Source</code> will treat this as a pipe and do automatic header buffering.</p>
<p>Like any source, you can use it to make an image:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">image</span> <span class="o">=</span> <span class="n">pyvips</span><span class="p">.</span><span class="n">Image</span><span class="p">.</span><span class="n">new_from_source</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="s">''</span><span class="p">)</span>
</code></pre></div></div>
<p>Or perhaps:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">image</span> <span class="o">=</span> <span class="n">pyvips</span><span class="p">.</span><span class="n">Image</span><span class="p">.</span><span class="n">thumbnail_source</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="mi">128</span><span class="p">)</span>
</code></pre></div></div>
<p>You could make one with a seek handler like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">sys</span>
<span class="kn">import</span> <span class="n">pyvips</span>

<span class="nb">file</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="p">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">"rb"</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">read_handler</span><span class="p">(</span><span class="n">size</span><span class="p">):</span>
    <span class="k">return</span> <span class="nb">file</span><span class="p">.</span><span class="n">read</span><span class="p">(</span><span class="n">size</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">seek_handler</span><span
class="p">(</span><span class="n">offset</span><span class="p">,</span> <span class="n">whence</span><span class="p">):</span>
    <span class="nb">file</span><span class="p">.</span><span class="n">seek</span><span class="p">(</span><span class="n">offset</span><span class="p">,</span> <span class="n">whence</span><span class="p">)</span>
    <span class="k">return</span> <span class="nb">file</span><span class="p">.</span><span class="n">tell</span><span class="p">()</span>

<span class="n">source</span> <span class="o">=</span> <span class="n">pyvips</span><span class="p">.</span><span class="n">SourceCustom</span><span class="p">()</span>
<span class="n">source</span><span class="p">.</span><span class="n">on_read</span><span class="p">(</span><span class="n">read_handler</span><span class="p">)</span>
<span class="n">source</span><span class="p">.</span><span class="n">on_seek</span><span class="p">(</span><span class="n">seek_handler</span><span class="p">)</span>
</code></pre></div></div>
<p>A seek method is optional, but will help file formats like TIFF which seek a lot during read.</p>
<h1 id="custom-output-streams">Custom output streams</h1>
<p>Output streams are almost the same:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">sys</span>
<span class="kn">import</span> <span class="n">pyvips</span>

<span class="nb">file</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="p">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="s">"wb"</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">write_handler</span><span class="p">(</span><span class="n">chunk</span><span class="p">):</span>
    <span class="k">return</span> <span class="nb">file</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">chunk</span><span class="p">)</span>

<span class="k">def</span> <span
class="nf">finish_handler</span><span class="p">():</span>
    <span class="nb">file</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>

<span class="n">target</span> <span class="o">=</span> <span class="n">pyvips</span><span class="p">.</span><span class="n">TargetCustom</span><span class="p">()</span>
<span class="n">target</span><span class="p">.</span><span class="n">on_write</span><span class="p">(</span><span class="n">write_handler</span><span class="p">)</span>
<span class="n">target</span><span class="p">.</span><span class="n">on_finish</span><span class="p">(</span><span class="n">finish_handler</span><span class="p">)</span>
</code></pre></div></div>
<p>So you can now do this!</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">image</span> <span class="o">=</span> <span class="n">pyvips</span><span class="p">.</span><span class="n">Image</span><span class="p">.</span><span class="n">new_from_source</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="s">''</span><span class="p">)</span>
<span class="n">image</span><span class="p">.</span><span class="n">write_to_target</span><span class="p">(</span><span class="n">target</span><span class="p">,</span> <span class="s">'.png'</span><span class="p">)</span>
</code></pre></div></div>
<p>And it’ll copy between your two objects.</p>
<h1 id="loader-and-saver-api">Loader and saver API</h1>
<p>There’s quite a large chunk of new API for loaders and savers to use to hook themselves up to streams. We’ve rewritten jpg, png, webp, hdr (Radiance), tif (though only load, not save), svg and ppm/pfm/pnm to work only via this new class.</p>
<p>We plan to rework more loaders and savers in the next few libvips versions.
The old file and buffer API will become a thin layer over the new connection system.</p> <footer class="site-footer"> <span class="site-footer-credits"> <a href="https://github.com/libvips/libvips">View on GitHub</a> </span> <a style='float: right;' href="/feed.xml"><img src="/assets/images/rss.png" alt="subscribe via rss"/></a> </footer> </section> <script type="text/javascript"> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-48550036-2', 'auto'); ga('send', 'pageview'); </script> </body> </html>