<!doctype html><html lang=en-us><head><meta charset=utf-8><meta http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=no"><title>Chapter 1: Combining Existing Transformations - MLIR</title><meta name=description content="Multi-Level IR Compiler Framework"><meta name=generator content="Hugo 0.119.0"><link href=https://mlir.llvm.org/index.xml rel=alternate type=application/rss+xml><link rel=canonical href=https://mlir.llvm.org/docs/Tutorials/transform/Ch1/><link rel=stylesheet href=https://mlir.llvm.org/css/theme.css><script src=https://use.fontawesome.com/releases/v5.0.6/js/all.js></script> <link rel=stylesheet href=https://mlir.llvm.org/css/chroma.min.css><script src=https://cdn.jsdelivr.net/npm/jquery@3.3.1/dist/jquery.min.js></script> <script src=https://cdn.jsdelivr.net/npm/jquery.easing@1.4.1/jquery.easing.min.js></script> <script src=https://mlir.llvm.org/js/bundle.js></script> <script type=text/javascript src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script> <script type=text/x-mathjax-config> MathJax.Hub.Config({ tex2jax: { inlineMath: [['$', '$'] ], displayMath: [ ['$$','$$'], ["\\[","\\]"] ] } }); </script><link rel=apple-touch-icon sizes=180x180 href="/apple-touch-icon.png?v=1"><link rel=icon type=image/png sizes=32x32 href="/favicon-32x32.png?v=1"><link rel=icon type=image/png sizes=16x16 href="/favicon-16x16.png?v=1"><link rel=manifest href="/site.webmanifest?v=1"><link rel=mask-icon href="/safari-pinned-tab.svg?v=1" color=#3775e0><link rel="shortcut icon" href="/favicon.ico?v=1"><meta name=msapplication-TileColor content="#2d89ef"><meta name=theme-color content="#ffffff"><link rel=icon href=/favicon.svg type=image/svg+xml sizes=any><style>:root{}</style></head><body><div class=container><header><h1><div><img src=https://mlir.llvm.org//mlir-logo.png width=40px align=absmiddle> MLIR</div></h1><p 
class=description>Multi-Level IR Compiler Framework</p></header><div class=global-menu><nav><ul><li class=parent><a href>Community<i class="fas fa-angle-right"></i></a><ul class=sub-menu><li class=child><a href=https://llvm.discourse.group/c/mlir/31>Forums</a></li><li class=child><a href=https://discord.gg/xS7Z362>Chat</a></li></ul></li><li><a href=/getting_started/Debugging/>Debugging Tips</a></li><li><a href=/getting_started/Faq/>FAQ</a></li><li class=parent><a href=https://github.com/llvm/llvm-project/tree/main/mlir>Source<i class="fas fa-angle-right"></i></a><ul class=sub-menu><li class=child><a href=/doxygen/>Doxygen</a></li><li class=child><a href=https://github.com/llvm/llvm-project/tree/main/mlir>GitHub</a></li></ul></li><li><a href="https://bugs.llvm.org/buglist.cgi?bug_status=__open__&list_id=177877&order=changeddate%20DESC%2Cpriority%2Cbug_severity&product=MLIR&query_format=specific">Bugs</a></li><li><a href=https://github.com/llvm/mlir-www/tree/main/website/static/LogoAssets>Logo Assets</a></li><li><a href=https://www.youtube.com/MLIRCompiler>Youtube Channel</a></li></ul></nav></div><div class=content-container><main><h1>Chapter 1: Combining Existing Transformations</h1><h2 id=introduction>Introduction <a class=headline-hash href=#introduction>¶</a></h2><p>The Transform dialect allows one to precisely target transformations at specific operations in the IR and to chain them, that is, to apply a transformation to operations produced by the previous transformation. To achieve this, transformations are themselves expressed as operations in the IR. We call the IR containing these operations the transform IR, and the IR being transformed the payload IR.</p><p>Transform IR operations operate on values that may be associated with payload IR operations, values, or attributes. We call the first two kinds of values operation handles and value handles, respectively.
We call the last kind of values parameters.</p><p>The application of transform IR always starts from one top-level operation. In the C++ API, this operation is passed to the <code>applyTransforms</code> function. This top-level operation specifies if other transformations should be performed and how. The most common top-level operation, <code>transform.named_sequence</code> merely applies other transform operations listed in its body one after the other, similarly to a function or a macro.</p><p>Let us illustrate this with a simple sequence of transformations on the common “fully connected + bias + ReLU” ML layer, which boils down to performing a matrix multiplication, followed by an (elementwise) matrix addition and taking an elementwise maximum with 0. This can be expressed using the following IR:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@fc_relu</span><span class=p>(</span><span class=nv>%lhs</span><span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>,</span> <span class=nv>%rhs</span><span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>,</span> </span></span><span class=line><span class=cl> <span class=nv>%bias</span><span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>,</span> <span class=nv>%output</span><span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>)</span> </span></span><span class=line><span class=cl> <span class=p>-></span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span 
class=p>></span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=c>// Matrix-matrix multiplication. </span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%matmul</span> <span class=p>=</span> linalg<span class=p>.</span>matmul ins<span class=p>(</span><span class=nv>%lhs</span><span class=p>,</span> <span class=nv>%rhs</span><span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>,</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>)</span> </span></span><span class=line><span class=cl> outs<span class=p>(</span><span class=nv>%output</span><span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>)</span> <span class=p>-></span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>></span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl> <span class=c>// Elementwise addition. 
</span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%biased</span> <span class=p>=</span> linalg<span class=p>.</span>elemwise_binary <span class=p>{</span> <span class=nl>fun =</span> <span class=nv>#linalg.binary_fn</span><span class=p><</span>add<span class=p>></span> <span class=p>}</span> </span></span><span class=line><span class=cl> ins<span class=p>(</span><span class=nv>%matmul</span><span class=p>,</span> <span class=nv>%bias</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>,</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>)</span> </span></span><span class=line><span class=cl> outs<span class=p>(</span><span class=nv>%output</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>)</span> <span class=p>-></span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>></span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl> <span class=c>// Elementwise max with 0 (ReLU). 
</span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%c0f</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>0.0</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> <span class=nv>%relued</span> <span class=p>=</span> linalg<span class=p>.</span>elemwise_binary <span class=p>{</span> <span class=nl>fun =</span> <span class=nv>#linalg.binary_fn</span><span class=p><</span>max_signed<span class=p>></span> <span class=p>}</span> </span></span><span class=line><span class=cl> ins<span class=p>(</span><span class=nv>%biased</span><span class=p>,</span> <span class=nv>%c0f</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>,</span> <span class=k>f32</span><span class=p>)</span> </span></span><span class=line><span class=cl> outs<span class=p>(</span><span class=nv>%output</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>)</span> <span class=p>-></span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>></span> </span></span><span class=line><span class=cl> <span class=kt>func</span><span class=p>.</span><span class=kt>return</span> <span class=nv>%relued</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>></span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><h2 id=top-level-sequence-operation>Top-Level Sequence Operation <a class=headline-hash href=#top-level-sequence-operation>¶</a></h2><p>For performance reasons, we would like to tile and fuse these operations to exploit cache locality. 
This is a sequence of transformations that need to be performed one after another, so we naturally start with the corresponding top-level transform operation.</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl>module attributes <span class=p>{</span>transform<span class=p>.</span>with_named_sequence<span class=p>}</span> <span class=p>{</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>named_sequence <span class=nf>@__transform_main</span><span class=p>(</span> </span></span><span class=line><span class=cl> <span class=nv>%arg0</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg1</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.matmul"</span><span class=p>>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg2</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.elemwise_binary"</span><span class=p>>):</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>yield </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>There are several aspects worth noticing in this operation.</p><p>Its special name, <code>@__transform_main</code> and the first argument are mandated by the interpreter pass, similarly to how the entry point of C programs needs to be called <code>main</code> and may have the <code>int (int argc, char** argv)</code> signature. This argument will be associated with the top-level payload operation, most often the operation that the pass is applied to. 
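</p><p>By analogy with a C program whose <code>main</code> does nothing, the smallest valid top-level operation has only the mandatory first argument and an empty body. The following is a minimal sketch, mirroring the example above:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir>module attributes {transform.with_named_sequence} {
  // Mandatory entry point; %arg0 is associated with the payload root.
  transform.named_sequence @__transform_main(%arg0: !transform.any_op) {
    transform.yield
  }
}
</code></pre></div><p>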
Note that none of this is required when applying the transformation <em>programmatically</em> via <code>applyTransforms</code> or <code>applyNamedSequence</code>.</p><p>The remaining entry block arguments are optional and can be associated with payload attributes, operations or values that are useful in the sequence. These are also specified when calling <code>applyTransforms</code>. In our case, we are interested in the matrix multiplication and elementwise operations that we are going to tile and fuse.</p><p>All value handles have Transform dialect types. These types specify certain properties of the payload IR entities associated with them. In this example, <code>transform.any_op</code> indicates that the handle is associated with arbitrary payload operations. On the contrary, <code>transform.op<"X"></code> indicates that the handle is associated <em>only</em> with payload operations of kind <code>X</code>. These constraints are verified when the handle/payload association is created. For entry block arguments of top-level transform operations, this happens early in the <code>applyTransforms</code> function. If the constraints are not satisfied, the transform application fails and produces diagnostics for the user.</p><p>Finally, the operation is wrapped in a module with the <code>transform.with_named_sequence</code> attribute that triggers all necessary verifications if multiple named sequences exist.</p><h2 id=failure-propagation>Failure Propagation <a class=headline-hash href=#failure-propagation>¶</a></h2><p>The Transform dialect infrastructure has a particular mechanism for handling diagnostics that supports recoverable errors. It is best understood by considering the (unnamed) sequence operation that has a mandatory attribute specifying the failure propagation mode. 
There are two options:</p><ul><li>“propagate” makes the sequence transformation fail if any of the nested transformations fails;</li><li>“suppress” makes the sequence succeed even if one of the nested transformations fails, but without attempting to perform the transformations following the failed one in the sequence.</li></ul><p>The latter allows the transformation script surrounding the sequence to continue despite errors within the sequence, assuming they are recoverable. As we are only building the transformation script, it is preferable to propagate failures so we know when something did not apply.</p><p>To check or debug a transform sequence, it is possible to print various entities associated with the transform IR values. For example, we can print the operations associated with the handles:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl>transform<span class=p>.</span>sequence failures<span class=p>(</span>propagate<span class=p>)</span> <span class=p>{</span> </span></span><span class=line><span class=cl><span class=nl>^bb0</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg1</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.matmul"</span><span class=p>>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg2</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.elemwise_binary"</span><span class=p>>):</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>debug<span class=p>.</span>emit_remark_at <span class=nv>%arg1</span><span class=p>,</span> <span class=s>"matmul"</span> </span></span><span
class=line><span class=cl> <span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.matmul"</span><span class=p>></span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>debug<span class=p>.</span>emit_remark_at <span class=nv>%arg2</span><span class=p>,</span> <span class=s>"elemwise_binaries"</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.elemwise_binary"</span><span class=p>></span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>yield </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><h2 id=transform-dialect-interpreter>Transform Dialect Interpreter <a class=headline-hash href=#transform-dialect-interpreter>¶</a></h2><p>Since we don’t want to recompile the compiler every time we change a transformation, we can use a Transform dialect interpreter pass to apply this transformation sequence to the payload IR. As we will see in the next chapter, it is possible to define custom passes or even integrate the transform interpreter into a larger pass. For now, we can use the existing test pass:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-sh data-lang=sh><span class=line><span class=cl>$ mlir-opt sequence.mlir --pass-pipeline<span class=o>=</span><span class=s2>" </span></span></span><span class=line><span class=cl><span class=s2> builtin.module(transform-interpreter{ </span></span></span><span class=line><span class=cl><span class=s2> debug-bind-trailing-args=linalg.matmul,linalg.elemwise_binary})"</span> </span></span></code></pre></div><p>The <code>sequence.mlir</code> file contains <em>both</em> the payload IR function <em>and</em> the transform IR sequence nested in the same module. 
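</p><p>Schematically, the file is laid out as follows (a condensed sketch; the full bodies are spelled out in the listings above):</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir>// sequence.mlir (sketch): payload IR and transform IR share one module.
module attributes {transform.with_named_sequence} {
  // Payload IR: the function to be transformed.
  func.func @fc_relu( ... ) { ... }  // arguments and body elided

  // Transform IR: the entry point picked up by the interpreter pass.
  transform.named_sequence @__transform_main( ... ) {  // arguments elided
    transform.yield
  }
}
</code></pre></div><p>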
The transform interpreter pass will apply the <code>@__transform_main</code> named sequence to the anchor operation of the pass. In our case, we also asked the interpreter pass to associate the two extra arguments of the top-level sequence with all <code>linalg.matmul</code> and <code>linalg.elemwise_binary</code> payload operations through the respective pass options. Running this pass results in the expected remarks:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-sh data-lang=sh><span class=line><span class=cl>sequence.mlir:7:13: remark: matmul </span></span><span class=line><span class=cl> %matmul <span class=o>=</span> linalg.matmul ins<span class=o>(</span>%lhs, %rhs: tensor<512x512xf32>, tensor<512x512xf32><span class=o>)</span> </span></span><span class=line><span class=cl> ^ </span></span><span class=line><span class=cl>sequence.mlir:7:13: note: see current operation: %0 <span class=o>=</span> linalg.matmul ins<span class=o>(</span>%arg0, %arg1 : tensor<512x512xf32>, tensor<512x512xf32><span class=o>)</span> outs<span class=o>(</span>%arg3 : tensor<512x512xf32><span class=o>)</span> -> tensor<512x512xf32> </span></span><span class=line><span class=cl>sequence.mlir:10:13: remark: elemwise_binaries </span></span><span class=line><span class=cl> %biased <span class=o>=</span> linalg.elemwise_binary <span class=o>{</span> <span class=nv>fun</span> <span class=o>=</span> <span class=c1>#linalg.binary_fn<add> }</span> </span></span><span class=line><span class=cl> ^ </span></span><span class=line><span class=cl>sequence.mlir:10:13: note: see current operation: %1 <span class=o>=</span> linalg.elemwise_binary <span class=o>{</span><span class=nv>fun</span> <span class=o>=</span> <span class=c1>#linalg.binary_fn<add>} ins(%0, %arg2 : tensor<512x512xf32>, tensor<512x512xf32>) outs(%arg3 : tensor<512x512xf32>) -> tensor<512x512xf32></span> </span></span><span class=line><span class=cl>sequence.mlir:14:13: remark: elemwise_binaries 
</span></span><span class=line><span class=cl> %relued <span class=o>=</span> linalg.elemwise_binary <span class=o>{</span> <span class=nv>fun</span> <span class=o>=</span> <span class=c1>#linalg.binary_fn<max_signed> }</span> </span></span><span class=line><span class=cl> ^ </span></span><span class=line><span class=cl>sequence.mlir:14:13: note: see current operation: %2 <span class=o>=</span> linalg.elemwise_binary <span class=o>{</span><span class=nv>fun</span> <span class=o>=</span> <span class=c1>#linalg.binary_fn<max_signed>} ins(%1, %cst : tensor<512x512xf32>, f32) outs(%arg3 : tensor<512x512xf32>) -> tensor<512x512xf32></span> </span></span></code></pre></div><p>Note that <code>%arg2</code> is associated with both elementwise payload operations. Any handle is associated with a list of entities. Individual transformations may or may not care about the order of elements in that list.</p><h2 id=specifying-transformations>Specifying Transformations <a class=headline-hash href=#specifying-transformations>¶</a></h2><p>Now that we have handles to the operations we want to transform, we are ready to apply the transformations. 
Let us first try tiling the matmul operation itself.</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl>module attributes <span class=p>{</span>transform<span class=p>.</span>with_named_sequence<span class=p>}</span> <span class=p>{</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>named_sequence <span class=nf>@__transform_main</span><span class=p>(</span> </span></span><span class=line><span class=cl> <span class=nv>%arg0</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg1</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.matmul"</span><span class=p>>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg2</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.elemwise_binary"</span><span class=p>>)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=c>// The actual tiling transformation takes tile sizes as attributes. 
</span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%loop</span><span class=p>,</span> <span class=nv>%tiled</span> <span class=p>=</span> transform<span class=p>.</span>structured<span class=p>.</span>tile_using_forall <span class=nv>%arg1</span> </span></span><span class=line><span class=cl> tile_sizes <span class=p>[</span><span class=m>4</span><span class=p>,</span> <span class=m>32</span><span class=p>]</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.matmul"</span><span class=p>>)</span> </span></span><span class=line><span class=cl> <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>yield </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>The transformation returns two handles, as indicated in its <a href=https://mlir.llvm.org/docs/Dialects/Transform/#transformstructuredtile_using_forall-transformtileusingforallop>documentation</a>:</p><ul><li>A handle to <code>linalg.generic</code> operating on the subset of the original data.</li><li>A handle to the <code>scf.forall</code> “multi-for” loop around tensors.</li></ul><p>Running this transformation with the same command as above expectedly produces the tiled code.</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@fc_relu</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=kt>tensor</span><span 
class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg1</span><span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg2</span><span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg3</span><span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>)</span> <span class=p>-></span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>></span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%cst</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>0.000000e+00</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> scf<span class=p>.</span>forall <span class=p>(</span><span class=nv>%arg4</span><span class=p>,</span> <span class=nv>%arg5</span><span class=p>)</span> in <span class=p>(</span><span class=m>128</span><span class=p>,</span> <span class=m>16</span><span class=p>)</span> shared_outs<span class=p>(</span><span class=nv>%arg6</span> <span class=p>=</span> <span class=nv>%arg3</span><span class=p>)</span> <span class=p>-></span> <span class=p>(</span><span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%3</span> <span class=p>=</span> affine<span 
class=p>.</span>apply affine_map<span class=p><(</span>d0<span class=p>)</span> <span class=p>-></span> <span class=p>(</span>d0 <span class=p>*</span> <span class=m>4</span><span class=p>)>(</span><span class=nv>%arg4</span><span class=p>)</span> </span></span><span class=line><span class=cl> <span class=nv>%4</span> <span class=p>=</span> affine<span class=p>.</span>apply affine_map<span class=p><(</span>d0<span class=p>)</span> <span class=p>-></span> <span class=p>(</span>d0 <span class=p>*</span> <span class=m>32</span><span class=p>)>(</span><span class=nv>%arg5</span><span class=p>)</span> </span></span><span class=line><span class=cl> <span class=nv>%extracted_slice</span> <span class=p>=</span> <span class=kt>tensor</span><span class=p>.</span>extract_slice <span class=nv>%arg0</span><span class=p>[</span><span class=nv>%3</span><span class=p>,</span> <span class=m>0</span><span class=p>]</span> <span class=p>[</span><span class=m>4</span><span class=p>,</span> <span class=m>512</span><span class=p>]</span> <span class=p>[</span><span class=m>1</span><span class=p>,</span> <span class=m>1</span><span class=p>]</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>></span> to <span class=kt>tensor</span><span class=p><</span><span class=m>4x512x</span><span class=k>f32</span><span class=p>></span> </span></span><span class=line><span class=cl> <span class=nv>%extracted_slice_0</span> <span class=p>=</span> <span class=kt>tensor</span><span class=p>.</span>extract_slice <span class=nv>%arg1</span><span class=p>[</span><span class=m>0</span><span class=p>,</span> <span class=nv>%4</span><span class=p>]</span> <span class=p>[</span><span class=m>512</span><span class=p>,</span> <span class=m>32</span><span class=p>]</span> <span class=p>[</span><span class=m>1</span><span class=p>,</span> <span class=m>1</span><span 
class=p>]</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>></span> to <span class=kt>tensor</span><span class=p><</span><span class=m>512x32x</span><span class=k>f32</span><span class=p>></span> </span></span><span class=line><span class=cl> <span class=nv>%extracted_slice_1</span> <span class=p>=</span> <span class=kt>tensor</span><span class=p>.</span>extract_slice <span class=nv>%arg6</span><span class=p>[</span><span class=nv>%3</span><span class=p>,</span> <span class=nv>%4</span><span class=p>]</span> <span class=p>[</span><span class=m>4</span><span class=p>,</span> <span class=m>32</span><span class=p>]</span> <span class=p>[</span><span class=m>1</span><span class=p>,</span> <span class=m>1</span><span class=p>]</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>></span> to <span class=kt>tensor</span><span class=p><</span><span class=m>4x32x</span><span class=k>f32</span><span class=p>></span> </span></span><span class=line><span class=cl> <span class=nv>%5</span> <span class=p>=</span> linalg<span class=p>.</span>matmul </span></span><span class=line><span class=cl> ins<span class=p>(</span><span class=nv>%extracted_slice</span><span class=p>,</span> <span class=nv>%extracted_slice_0</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>4x512x</span><span class=k>f32</span><span class=p>>,</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x32x</span><span class=k>f32</span><span class=p>>)</span> </span></span><span class=line><span class=cl> outs<span class=p>(</span><span class=nv>%extracted_slice_1</span> <span class=p>:</span> <span 
class=kt>tensor</span><span class=p><</span><span class=m>4x32x</span><span class=k>f32</span><span class=p>>)</span> <span class=p>-></span> <span class=kt>tensor</span><span class=p><</span><span class=m>4x32x</span><span class=k>f32</span><span class=p>></span> </span></span><span class=line><span class=cl> scf<span class=p>.</span>forall<span class=p>.</span>in_parallel <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=kt>tensor</span><span class=p>.</span>parallel_insert_slice <span class=nv>%5</span> into <span class=nv>%arg6</span><span class=p>[</span><span class=nv>%3</span><span class=p>,</span> <span class=nv>%4</span><span class=p>]</span> <span class=p>[</span><span class=m>4</span><span class=p>,</span> <span class=m>32</span><span class=p>]</span> <span class=p>[</span><span class=m>1</span><span class=p>,</span> <span class=m>1</span><span class=p>]</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>4x32x</span><span class=k>f32</span><span class=p>></span> into <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>></span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> linalg<span class=p>.</span>elemwise_binary <span class=p>{</span><span class=nl>fun =</span> <span class=nv>#linalg.binary_fn</span><span class=p><</span>add<span class=p>>}</span> </span></span><span class=line><span class=cl> ins<span class=p>(</span><span class=nv>%0</span><span class=p>,</span> <span class=nv>%arg2</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>,</span> <span class=kt>tensor</span><span 
class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>)</span> </span></span><span class=line><span class=cl> outs<span class=p>(</span><span class=nv>%arg3</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>)</span> <span class=p>-></span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>></span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> linalg<span class=p>.</span>elemwise_binary <span class=p>{</span><span class=nl>fun =</span> <span class=nv>#linalg.binary_fn</span><span class=p><</span>max_signed<span class=p>>}</span> </span></span><span class=line><span class=cl> ins<span class=p>(</span><span class=nv>%1</span><span class=p>,</span> <span class=nv>%cst</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>,</span> <span class=k>f32</span><span class=p>)</span> </span></span><span class=line><span class=cl> outs<span class=p>(</span><span class=nv>%arg3</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>>)</span> <span class=p>-></span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>></span> </span></span><span class=line><span class=cl> <span class=kt>return</span> <span class=nv>%2</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p><</span><span class=m>512x512x</span><span class=k>f32</span><span class=p>></span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>Besides producing new handles, the tiling transform operation <em>consumes</em> the operand handle. 
This means that the handle is <em>invalidated</em> after this operation, and must no longer be used. Transform operations are required to mark all their operands as either consumed or readonly. Transform operations usually consume the operand if the associated payload operations are erased or recreated (which means erased and created anew with similar structure). As handles are essentially references to payload operations, they would become dangling if the payload no longer exists.</p><h2 id=handle-invalidation-and-expensive-checks-mode>Handle Invalidation and Expensive Checks Mode <a class=headline-hash href=#handle-invalidation-and-expensive-checks-mode>¶</a></h2><p>Undefined behavior is difficult to grapple with when it does happen, so the Transform dialect interpreter defaults to performing a set of additional, potentially expensive, checks that detect most undefined behavior in the transform IR. For example, if we wanted to use the <code>%arg1</code> handle after it is consumed, it would cause undefined behavior that manifests as an assertion in the debug build, and likely as a segmentation fault in release mode.</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl>module attributes <span class=p>{</span>transform<span class=p>.</span>with_named_sequence<span class=p>}</span> <span class=p>{</span> </span></span><span class=line><span class=cl>  transform<span class=p>.</span>named_sequence <span class=nf>@__transform_main</span><span class=p>(</span> </span></span><span class=line><span class=cl>      <span class=nv>%arg0</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>,</span> </span></span><span class=line><span class=cl>      <span class=nv>%arg1</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>&#34;linalg.matmul&#34;</span><span class=p>>,</span> </span></span><span
class=line><span class=cl> <span class=nv>%arg2</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.elemwise_binary"</span><span class=p>>)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=c>// The actual tiling transformation takes tile sizes as attributes. </span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%loop</span><span class=p>,</span> <span class=nv>%tiled</span> <span class=p>=</span> transform<span class=p>.</span>structured<span class=p>.</span>tile_using_forall <span class=nv>%arg1</span> tile_sizes <span class=p>[</span><span class=m>4</span><span class=p>,</span> <span class=m>32</span><span class=p>]</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.matmul"</span><span class=p>>)</span> <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl> <span class=c>// This is trying to use an invalidated handle leading to undefined behavior. 
</span></span></span><span class=line><span class=cl><span class=c></span> transform<span class=p>.</span>debug<span class=p>.</span>emit_remark_at <span class=nv>%arg1</span><span class=p>,</span> <span class=s>"remark"</span> <span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.matmul"</span><span class=p>></span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>yield </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>However, with the expensive checks enabled in the interpreter, a nice diagnostic is produced:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-sh data-lang=sh><span class=line><span class=cl>sequence.mlir:28:3: error: op uses a handle invalidated by a previously executed transform op </span></span><span class=line><span class=cl> transform.debug.emit_remark_at %mm, <span class=s2>"elemwise_binaries"</span> : !transform.any_op </span></span><span class=line><span class=cl> ^ </span></span><span class=line><span class=cl>sequence.mlir:26:9: note: handle to invalidated ops </span></span><span class=line><span class=cl> %mm <span class=o>=</span> transform.cast %matmul : !transform.op<<span class=s2>"linalg.matmul"</span>> to !transform.any_op </span></span><span class=line><span class=cl> ^ </span></span><span class=line><span class=cl>sequence.mlir:27:19: note: invalidated by this transform op that consumes its operand <span class=c1>#0 and invalidates all handles to payload IR entities associated with this operand and entities nested in them</span> </span></span><span class=line><span class=cl> %loop, %tiled <span class=o>=</span> transform.structured.tile_using_forall %mm tile_sizes <span class=o>[</span>4, 32<span class=o>]</span> </span></span></code></pre></div><p>When compile-time performance is a 
concern, and the transformation sequence is sufficiently stable, it is possible to disable expensive checks in the interpreter for improved performance by providing the <code>disable-expensive-checks</code> option to the pass or by setting the corresponding flag in the <code>TransformOptions</code> passed into <code>applyTransforms</code>.</p><p>One may observe that some operations such as <code>transform.cast</code> do not consume the operand (because they don’t erase the corresponding operation). So what would happen if we tried to use that operand instead?</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl>module attributes <span class=p>{</span>transform<span class=p>.</span>with_named_sequence<span class=p>}</span> <span class=p>{</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>named_sequence <span class=nf>@__transform_main</span> </span></span><span class=line><span class=cl> <span class=nv>%arg0</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg1</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.matmul"</span><span class=p>>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg2</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.elemwise_binary"</span><span class=p>>)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=c>// We can cast one type to another as long as operations are compatible </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// with both types. This creates "aliasing" handles. 
</span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%casted</span> <span class=p>=</span> transform<span class=p>.</span>cast <span class=nv>%arg1</span> <span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.matmul"</span><span class=p>></span> </span></span><span class=line><span class=cl> to <span class=p>!</span>transform<span class=p>.</span>any_op </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl> <span class=c>// The actual tiling transformation takes tile sizes as attributes. </span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%loop</span><span class=p>,</span> <span class=nv>%tiled</span> <span class=p>=</span> transform<span class=p>.</span>structured<span class=p>.</span>tile_using_forall <span class=nv>%arg1</span> </span></span><span class=line><span class=cl> tile_sizes <span class=p>[</span><span class=m>4</span><span class=p>,</span> <span class=m>32</span><span class=p>]</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.matmul"</span><span class=p>>)</span> </span></span><span class=line><span class=cl> <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl> <span class=c>// Consuming an operand invalidates the consumed handle and any other handle </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// that is associated with the same payload operations, or payload </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// operations nested in them. 
</span></span></span><span class=line><span class=cl><span class=c></span> transform<span class=p>.</span>debug<span class=p>.</span>emit_remark_at <span class=nv>%casted</span><span class=p>,</span> <span class=s>"remark"</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>any_op </span></span><span class=line><span class=cl> transform<span class=p>.</span>yield </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>Both <code>%arg1</code> and <code>%casted</code> reference the same payload operation. Extending the reference analogy, these references alias. Naturally, when the payload operation is erased, all references to it become dangling. This is also the case for handles. In fact, consuming an operand invalidates the operand handle as well as any other handle that is associated with any of the same payload operations. The payload IR consideration is recursive: a handle associated with a payload operation <em>nested</em> in the erased one is also invalidated (because erasing the operation also erases its regions and all contained operations). 
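</p><p>For instance, a handle to an operation <em>nested</em> in a loop becomes invalid as soon as the loop handle is consumed. The following is a hypothetical sketch, separate from the running example; the matched operations and handle names are illustrative:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir>// Match a loop, then match an operation nested within that loop.
%loop = transform.structured.match ops{["scf.forall"]} in %arg0
    : (!transform.any_op) -> !transform.any_op
%inner = transform.structured.match ops{["linalg.matmul"]} in %loop
    : (!transform.any_op) -> !transform.any_op
// Outlining consumes %loop. Because the matmul payload operation is nested
// in the loop, %inner is invalidated as well.
%func, %call = transform.loop.outline %loop {func_name = "outlined"}
    : (!transform.any_op) -> (!transform.any_op, !transform.op&lt;"func.call"&gt;)
// Using %inner past this point is undefined behavior.
</code></pre></div><p>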
The expensive-checks mode can also handle this case.</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-sh data-lang=sh><span class=line><span class=cl>sequence.mlir:28:3: error: op uses a handle invalidated by a previously executed transform op </span></span><span class=line><span class=cl>  transform.debug.emit_remark_at %matmul, <span class=s2>&#34;elemwise_binaries&#34;</span> : !transform.op&lt;<span class=s2>&#34;linalg.matmul&#34;</span>&gt; </span></span><span class=line><span class=cl>  ^ </span></span><span class=line><span class=cl>sequence.mlir:21:29: note: handle to invalidated ops </span></span><span class=line><span class=cl>^bb0<span class=o>(</span>%root: !transform.any_op, %matmul: !transform.op&lt;<span class=s2>&#34;linalg.matmul&#34;</span>&gt;, %elemwise: !transform.op&lt;<span class=s2>&#34;linalg.elemwise_binary&#34;</span>&gt;<span class=o>)</span>: </span></span><span class=line><span class=cl>  ^ </span></span><span class=line><span class=cl>sequence.mlir:27:19: note: invalidated by this transform op that consumes its operand <span class=c1>#0 and invalidates all handles to payload IR entities associated with this operand and entities nested in them</span> </span></span><span class=line><span class=cl>  %loop, %tiled <span class=o>=</span> transform.structured.tile_using_forall %mm tile_sizes <span class=o>[</span>4, 32<span class=o>]</span> </span></span></code></pre></div><h2 id=chaining-transformations-with-handles>Chaining Transformations with Handles <a class=headline-hash href=#chaining-transformations-with-handles>¶</a></h2><p>Going back to the transformation sequence, we have tiled the matrix multiplication, but we also want to tile and fuse the elementwise operations. The typical way of doing this in the structured operations paradigm is to tile the last operation in some acyclic dataflow graph, and then progressively fuse the operations that produce its operands. 
This removes the need to explicitly tile all operations as fusion can adapt their sizes and inject recomputation if desired. So instead of tiling the matmul operation, we are going to tile the last operation in the chain, and then fuse the preceding operations into the loops produced by tiling.</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl>module attributes <span class=p>{</span>transform<span class=p>.</span>with_named_sequence<span class=p>}</span> <span class=p>{</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>named_sequence <span class=nf>@__transform_main</span><span class=p>(</span> </span></span><span class=line><span class=cl> <span class=nv>%arg0</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg1</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.matmul"</span><span class=p>>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg2</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.elemwise_binary"</span><span class=p>>)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=c>// Since the %arg2 handle is associated with both elementwise operations, </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// we need to split it into two handles so we can target only the second </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// elementwise operation. 
</span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%add</span><span class=p>,</span> <span class=nv>%max</span> <span class=p>=</span> transform<span class=p>.</span>split_handle <span class=nv>%arg2</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.elemwise_binary"</span><span class=p>>)</span> </span></span><span class=line><span class=cl> <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl> <span class=c>// The actual tiling transformation takes tile sizes as attributes. It </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// produces a handle to the loop generated during tiling. </span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%tiled_max</span><span class=p>,</span> <span class=nv>%loop</span> <span class=p>=</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>structured<span class=p>.</span>tile_using_forall <span class=nv>%max</span> tile_sizes <span class=p>[</span><span class=m>8</span><span class=p>,</span> <span class=m>32</span><span class=p>]</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>)</span> <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl> <span class=c>// We can now fuse the other operations into the loop. 
Here, we fuse </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// operations one by one. This requires the operation that is being fused to </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// define the value used within the loop, so the order of such fusions is </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// important. We could also use "transform.merge_handles" to obtain a single </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// handle to all operations and give it to `fuse_into_containing_op` that </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// would take care of the ordering in this case. </span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%add_fused</span><span class=p>,</span> <span class=nv>%loop_0</span> <span class=p>=</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>structured<span class=p>.</span>fuse_into_containing_op <span class=nv>%add</span> into <span class=nv>%loop</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> <span class=nv>%matmul_fused</span><span class=p>,</span> <span class=nv>%loop_1</span> <span class=p>=</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>structured<span class=p>.</span>fuse_into_containing_op <span class=nv>%arg1</span> into <span 
class=nv>%loop_0</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.matmul"</span><span class=p>>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl> transform<span class=p>.</span>yield </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>This achieves the desired tiling and fusion.</p><h2 id=more-handle-invalidation>More Handle Invalidation <a class=headline-hash href=#more-handle-invalidation>¶</a></h2><p>Finally, let us assume there exists an efficient microkernel, or a hardware instruction expressed as an intrinsic function, for a 4x4 matrix multiplication. For this purpose, we need to tile the fused operation to the desired size, and then outline it. 
The resulting function call can then be replaced with a call to the microkernel.</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl>module attributes <span class=p>{</span>transform<span class=p>.</span>with_named_sequence<span class=p>}</span> <span class=p>{</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>named_sequence <span class=nf>@__transform_main</span><span class=p>(</span> </span></span><span class=line><span class=cl> <span class=nv>%arg0</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg1</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.matmul"</span><span class=p>>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg2</span><span class=p>:</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.elemwise_binary"</span><span class=p>>)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=c>// Since the %arg2 handle is associated with both elementwise operations, </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// we need to split it into two handles so we can target only the second </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// elementwise operation. 
</span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%add</span><span class=p>,</span> <span class=nv>%max</span> <span class=p>=</span> transform<span class=p>.</span>split_handle <span class=nv>%arg2</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"linalg.elemwise_binary"</span><span class=p>>)</span> </span></span><span class=line><span class=cl> <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl> <span class=c>// The actual tiling transformation takes tile sizes as attributes. It </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// produces a handle to the loop generated during tiling. </span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%tiled</span><span class=p>,</span> <span class=nv>%loop</span> <span class=p>=</span> transform<span class=p>.</span>structured<span class=p>.</span>tile_using_forall <span class=nv>%max</span> </span></span><span class=line><span class=cl> tile_sizes <span class=p>[</span><span class=m>8</span><span class=p>,</span> <span class=m>32</span><span class=p>]</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>)</span> <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl> <span class=c>// We can now fuse the other operations into the loop. 
Here, we fuse </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// operations one by one. This requires the operation that is being fused to </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// define the value used within the loop, so the order of such fusions is </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// important. We could also use "transform.merge_handles" to obtain a single </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// handle to all operations and give it to `fuse_into_containing_op` that </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// would take care of the ordering in this case. </span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%add_fused</span><span class=p>,</span> <span class=nv>%loop_0</span> <span class=p>=</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>structured<span class=p>.</span>fuse_into_containing_op <span class=nv>%add</span> into <span class=nv>%loop</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> <span class=nv>%matmul_fused</span><span class=p>,</span> <span class=nv>%loop_1</span> <span class=p>=</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>structured<span class=p>.</span>fuse_into_containing_op <span class=nv>%arg1</span> into <span 
class=nv>%loop_0</span> </span></span><span class=line><span class=cl>        <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>&#34;linalg.matmul&#34;</span><span class=p>>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl>       <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl>    <span class=c>// Tile again to get the desired size. Note that this time it tiles the </span></span></span><span class=line><span class=cl><span class=c></span>    <span class=c>// &#34;add&#34; operation and fuses matmul into the loop, but doesn&#39;t affect the </span></span></span><span class=line><span class=cl><span class=c></span>    <span class=c>// &#34;max&#34; operation. This illustrates the precise targeting with the </span></span></span><span class=line><span class=cl><span class=c></span>    <span class=c>// transform dialect. Otherwise, it is difficult to differentiate &#34;add&#34; and </span></span></span><span class=line><span class=cl><span class=c></span>    <span class=c>// &#34;max&#34;, both of which have the same kind. 
</span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%tiled_2</span><span class=p>,</span> <span class=nv>%loop_2</span> <span class=p>=</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>structured<span class=p>.</span>tile_using_forall <span class=nv>%add_fused</span> tile_sizes <span class=p>[</span><span class=m>4</span><span class=p>,</span> <span class=m>4</span><span class=p>]</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>)</span> <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> <span class=nv>%matmul_fused_2</span><span class=p>,</span> <span class=nv>%loop_3</span> <span class=p>=</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>structured<span class=p>.</span>fuse_into_containing_op <span class=nv>%matmul_fused</span> into <span class=nv>%loop_2</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl> <span class=c>// Since outlining is currently only implemented for region-holding </span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// operations such as loops, use tiling to size 1 to materialize the outer 
</span></span></span><span class=line><span class=cl><span class=c></span> <span class=c>// loop that is going to be outlined. </span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%_</span><span class=p>,</span> <span class=nv>%outline_target</span> <span class=p>=</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>structured<span class=p>.</span>tile_using_forall <span class=nv>%tiled_2</span> tile_sizes <span class=p>[</span><span class=m>1</span><span class=p>]</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>)</span> <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> transform<span class=p>.</span>structured<span class=p>.</span>fuse_into_containing_op <span class=nv>%matmul_fused_2</span> </span></span><span class=line><span class=cl> into <span class=nv>%outline_target</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>any_op<span class=p>)</span> </span></span><span class=line><span class=cl> <span class=nv>%func</span><span class=p>,</span> <span class=nv>%call</span> <span class=p>=</span> transform<span class=p>.</span>loop<span class=p>.</span>outline <span class=nv>%outline_target</span> </span></span><span class=line><span class=cl> <span class=p>{</span><span class=nl>func_name =</span> <span 
class=s>"outlined"</span><span class=p>}</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>)</span> <span class=p>-></span> <span class=p>(!</span>transform<span class=p>.</span>any_op<span class=p>,</span> <span class=p>!</span>transform<span class=p>.</span>op<span class=p><</span><span class=s>"func.call"</span><span class=p>>)</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl> transform<span class=p>.</span>yield </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>This additional transformation also illustrates handle invalidation for nested operations. The <code>transform.loop.outline</code> operation consumes the handle to the loop, which invalidates it together with all handles to any operations nested in it, such as the fused <code>linalg.matmul</code>. Attempting to use an invalidated handle causes undefined behavior.
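</p><p>If the payload must be referenced again after such a consuming transform, a common remedy is to re-match the relevant operations inside the result of the transform instead of reusing a stale handle. The snippet below is an illustrative sketch only; it assumes the <code>%func</code> handle produced by the outlining above:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir>// Illustrative sketch: rather than reusing a handle invalidated by the
// outlining, re-match the payload operations inside the new function.
%matmuls = transform.structured.match ops{["linalg.matmul"]} in %func
    : (!transform.any_op) -> !transform.any_op
transform.debug.emit_remark_at %matmuls, "matmul in outlined function"
    : !transform.any_op
</code></pre></div><p>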
(Note that it isn’t strictly necessary for this specific form of the outlining to consume the operand, as the implementation only <em>moves</em> the region without recreating the operations, but the author of the transformation chose to invalidate the handle anyway.)</p><p>Attempting to access the fusion result after outlining produces the following error:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-sh data-lang=sh><span class=line><span class=cl>test/Examples/transform/Ch1/invalidation-2.mlir:109:3: error: op uses a handle invalidated by a previously executed transform op </span></span><span class=line><span class=cl>  transform.debug.emit_remark_at %outline_target, <span class=s2>"outlined loop"</span> : !transform.any_op </span></span><span class=line><span class=cl>  ^ </span></span><span class=line><span class=cl>test/Examples/transform/Ch1/invalidation-2.mlir:102:25: note: handle to invalidated ops </span></span><span class=line><span class=cl>  %outline_target, %_ <span class=o>=</span> transform.structured.tile_using_forall %tiled_2 tile_sizes <span class=o>[</span>1<span class=o>]</span> </span></span><span class=line><span class=cl>  ^ </span></span><span class=line><span class=cl>test/Examples/transform/Ch1/invalidation-2.mlir:106:18: note: invalidated by this transform op that consumes its operand <span class=c1>#0 and invalidates all handles to payload IR entities associated with this operand and entities nested in them</span> </span></span><span class=line><span class=cl>  %func, %call <span class=o>=</span> transform.loop.outline %outline_target <span class=o>{</span><span class=nv>func_name</span> <span class=o>=</span> <span class=s2>"outlined"</span><span class=o>}</span> </span></span><span class=line><span class=cl>  ^ </span></span><span class=line><span class=cl>test/Examples/transform/Ch1/invalidation-2.mlir:24:13: note: ancestor payload op </span></span><span class=line><span class=cl>  %biased <span class=o>=</span> 
linalg.elemwise_binary <span class=o>{</span> <span class=nv>fun</span> <span class=o>=</span> <span class=c1>#linalg.binary_fn<add> }</span> </span></span><span class=line><span class=cl>  ^ </span></span><span class=line><span class=cl>test/Examples/transform/Ch1/invalidation-2.mlir:24:13: note: nested payload op </span></span><span class=line><span class=cl>  %matmul <span class=o>=</span> linalg.matmul ins<span class=o>(</span>%lhs, %rhs: tensor<512x512xf32>, tensor<512x512xf32><span class=o>)</span> </span></span></code></pre></div><p>Note that the “add” elementwise operation is reported as the ancestor payload op because it was used to produce the tiled loop, which therefore carries its source location.</p><p>Finally, we would like to replace the call to the outlined function with a call to the microkernel. Unfortunately, the Transform dialect doesn’t support this transformation (and cannot, if the call is to be rewritten into a custom, out-of-tree operation). Therefore, we need to define new transform operations; the next chapters describe how this is done.</p><h2 id=tracking-ir-modifications>Tracking IR Modifications <a class=headline-hash href=#tracking-ir-modifications>¶</a></h2><p>The Transform dialect automatically tracks all IR changes that are made as part of transform ops. (Implementations must use the provided rewriter to modify IR.) If a payload op is erased, it is automatically removed from all handles with which it is currently associated. If a payload op is replaced, the Transform dialect tries to find the replacement op and updates all handles accordingly. If a multi-result op is replaced with values that are defined by multiple ops, or if an op is replaced with an op of a different type, an error is produced, because it is unclear whether the direct replacements actually represent the computation of the original op. There are ways to customize this behavior. 
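</p><p>As a concrete illustration of the default tracking, consider pattern application, which may erase or replace payload operations while patterns run; handles pointing into the target remain usable afterwards. The snippet below is a sketch only and assumes a <code>%func</code> handle to a payload function:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir>// Illustrative sketch: the rewrites performed while applying patterns are
// tracked, so handles to ops inside %func are updated, not invalidated.
transform.apply_patterns to %func {
  transform.apply_patterns.canonicalization
} : !transform.any_op
</code></pre></div><p>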
More details can be found in the documentation of <code>transform::TrackingListener</code>.</p><nav class=pagination><a class="nav nav-prev" href=https://mlir.llvm.org/docs/Tutorials/transform/Ch0/ title="Chapter 0: A Primer on “Structured” Linalg Operations"><i class="fas fa-arrow-left" aria-hidden=true></i> Prev - Chapter 0: A Primer on “Structured” Linalg Operations</a> <a class="nav nav-next" href=https://mlir.llvm.org/docs/Tutorials/transform/Ch2/ title="Chapter 2: Adding a Simple New Transformation Operation">Next - Chapter 2: Adding a Simple New Transformation Operation <i class="fas fa-arrow-right" aria-hidden=true></i></a></nav></main></div></div></body></html>