# Passes

This document describes the available MLIR passes and their contracts.
## General Transformation Passes

### `-canonicalize`

_Canonicalize operations_

This pass performs various types of canonicalizations over a set of operations by iteratively applying the canonicalization patterns of all loaded dialects until either a fixpoint is reached or the maximum number of iterations/rewrites is exhausted. Canonicalization is best-effort and does not guarantee that the entire IR is in a canonical form after running this pass. See [Operation Canonicalization](/docs/Canonicalization/) for more details.

#### Options

```
-top-down          : Seed the worklist in general top-down order
-region-simplify   : Perform control flow optimizations to the region tree
-max-iterations    : Max. iterations between applying patterns / simplifying regions
-max-num-rewrites  : Max. number of pattern rewrites within an iteration
-test-convergence  : Test only: Fail pass on non-convergence to detect cyclic patterns
-disable-patterns  : Labels of patterns that should be filtered out during application
-enable-patterns   : Labels of patterns that should be used during application; all other patterns are filtered out
```
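For illustration, a minimal before/after sketch of the kind of rewrite this pass performs (hypothetical input; this particular fold comes from the `arith` dialect's canonicalization patterns):

```mlir
// Before: the add with zero is redundant.
func.func @fold_add_zero(%arg0: i32) -> i32 {
  %c0 = arith.constant 0 : i32
  %0 = arith.addi %arg0, %c0 : i32
  return %0 : i32
}

// After -canonicalize: the fold removes the op and the now-dead constant.
func.func @fold_add_zero(%arg0: i32) -> i32 {
  return %arg0 : i32
}
```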
### `-composite-fixed-point-pass`

_Composite fixed point pass_

This composite pass runs a provided set of passes until a fixed point is reached or the maximum number of iterations is exhausted.

#### Options

```
-name           : Composite pass display name
-pipeline       : Composite pass inner pipeline
-max-iterations : Maximum number of iterations of the inner pipeline
```

### `-control-flow-sink`

_Sink operations into conditional blocks_

This pass implements control-flow sinking on operations that implement `RegionBranchOpInterface` by moving dominating operations whose only uses are in conditionally executed regions into those regions, so that execution paths where their results are not needed do not perform unnecessary computations.

This is similar (but opposite) to loop-invariant code motion, which hoists operations out of regions executed more than once. The implementation of control-flow sinking uses a simple and conservative cost model: operations are never duplicated and are only moved into singly-executed regions.

It is recommended to run canonicalization first to remove unreachable blocks: ops in unreachable blocks may prevent other operations from being sunk, as they may contain uses of their results.

#### Statistics

```
num-sunk : Number of operations sunk
```
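A sketch of the rewrite on hypothetical values, using `scf.if` (which implements `RegionBranchOpInterface`):

```mlir
// Before: %prod is computed even when %cond is false.
%prod = arith.muli %a, %b : i32
%r = scf.if %cond -> (i32) {
  scf.yield %prod : i32
} else {
  scf.yield %c : i32
}

// After -control-flow-sink: the multiply only runs on the "then" path.
%r = scf.if %cond -> (i32) {
  %prod = arith.muli %a, %b : i32
  scf.yield %prod : i32
} else {
  scf.yield %c : i32
}
```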
### `-cse`

_Eliminate common sub-expressions_

This pass implements a generalized algorithm for common sub-expression elimination. This pass relies on information provided by the `Memory SideEffect` interface to identify when it is safe to eliminate operations. See [Common subexpression elimination](https://en.wikipedia.org/wiki/Common_subexpression_elimination) for more general details on this optimization.

#### Statistics

```
num-cse'd : Number of operations CSE'd
num-dce'd : Number of operations DCE'd
```
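For example, two structurally identical, side-effect-free ops are merged (a sketch with hypothetical values):

```mlir
// Before: both additions compute the same value.
%0 = arith.addi %a, %b : i32
%1 = arith.addi %a, %b : i32
%2 = arith.muli %0, %1 : i32

// After -cse: the duplicate is removed and its uses are rewired.
%0 = arith.addi %a, %b : i32
%2 = arith.muli %0, %0 : i32
```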
### `-generate-runtime-verification`

_Generate additional runtime op verification checks_

This pass generates op-specific runtime checks using the `RuntimeVerifiableOpInterface`. It can be run for debugging purposes after passes that are suspected to introduce faulty IR.

### `-inline`

_Inline function calls_

#### Options

```
-default-pipeline   : The optimizer pipeline used for callables that do not have a dedicated optimizer pipeline in opPipelineList
-op-pipelines       : Callable operation specific optimizer pipelines (in the form of `dialect.op(pipeline)`)
-max-iterations     : Maximum number of iterations when inlining within an SCC
-inlining-threshold : If the ratio between the number of the operations in the callee and the number of the operations in the caller exceeds this value (in percentage), then the callee is not inlined even if it is legal to inline it
```
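As a sketch of how these options compose on the command line (the exact pipeline string and quoting here are assumptions; consult `mlir-opt --help` for the authoritative syntax on your build):

```
mlir-opt input.mlir -inline="default-pipeline=canonicalize max-iterations=4"
```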
### `-loop-invariant-code-motion`

_Hoist loop invariant instructions outside of the loop_

### `-loop-invariant-subset-hoisting`

_Hoist loop invariant subset ops outside of the loop_

### `-mem2reg`

_Promotes memory slots into values._

This pass removes loads out of and stores into a memory slot, turning them into direct uses of SSA values. This is done generically using the `PromotableAllocationOpInterface`, `PromotableOpInterface` and `PromotableMemOpInterface` interfaces.

This pass will attempt to compute which definitions of the content of the memory slot reach operations that use the memory slot pointer. It will rewire or remove operations that use the slot pointer so they no longer use it. If any of this is not possible, the IR will be left without mutation.

This pass only supports unstructured control flow. Promotion of operations within subregions will not happen.

#### Options

```
-region-simplify : Perform control flow optimizations to the region tree
```

#### Statistics

```
promoted slots : Total amount of memory slots promoted
new block args : Total amount of new block arguments inserted in blocks
```
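A minimal sketch (assuming `memref.alloca`, whose loads and stores implement the promotion interfaces):

```mlir
// Before: a scalar slot that is stored to, then loaded from.
func.func @promote(%v: i32) -> i32 {
  %slot = memref.alloca() : memref<i32>
  memref.store %v, %slot[] : memref<i32>
  %loaded = memref.load %slot[] : memref<i32>
  return %loaded : i32
}

// After -mem2reg: the slot is removed and %v is used directly.
func.func @promote(%v: i32) -> i32 {
  return %v : i32
}
```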
### `-print-ir`

_Print IR on the debug stream_

Print the entire IR on the debug stream. This is meant for debugging purposes to inspect the IR at a specific point in the pipeline.

#### Options

```
-label : Label
```

### `-print-op-stats`

_Print statistics of operations_

#### Options

```
-json : print the stats as JSON
```

### `-remove-dead-values`

_Remove dead values_

The goal of this pass is optimization (reducing runtime) by removing unnecessary instructions. Unlike other passes that rely on local information gathered from patterns to accomplish optimization, this pass uses a full analysis of the IR, specifically, liveness analysis, and is thus more powerful.

Currently, this pass performs the following optimizations:
(A) Removes function arguments that are not live,
(B) Removes function return values that are not live across all callers of the function,
(C) Removes unnecessary operands, results, region arguments, and region terminator operands of region branch ops, and
(D) Removes simple and region branch ops that have all non-live results and don't affect memory in any way,

iff the IR doesn't have any non-function symbol ops, non-call symbol user ops, or branch ops.

Here, a "simple op" refers to an op that isn't a symbol op, symbol-user op, region branch op, branch op, region branch terminator op, or return-like.

It is noteworthy that we do not refer to non-live values as "dead" in this file, to avoid confusion with dead code analysis's "dead", which refers to unreachable code (code that never executes on hardware), while "non-live" refers to code that executes on hardware but is unnecessary. Thus, while the removal of dead code helps little in reducing runtime, removing non-live values should theoretically have a significant impact (depending on the amount removed).

It is also important to note that unlike other passes (like `canonicalize`) that apply op-specific optimizations through patterns, this pass uses different interfaces to handle various types of ops and tries to cover all existing ops through these interfaces.

Its reliance on (a) liveness analysis and (b) interfaces is what makes it so powerful: it can optimize ops that don't have a canonicalizer, and even when an op does have a canonicalizer, it can perform more aggressive optimizations, as observed in the test files associated with this pass.

Example of optimization (A):

```
int add_2_to_y(int x, int y) {
  return 2 + y
}

print(add_2_to_y(3, 4))
print(add_2_to_y(5, 6))
```

becomes

```
int add_2_to_y(int y) {
  return 2 + y
}

print(add_2_to_y(4))
print(add_2_to_y(6))
```

Example of optimization (B):

```
int, int get_incremented_values(int y) {
  store y somewhere in memory
  return y + 1, y + 2
}

y1, y2 = get_incremented_values(4)
y3, y4 = get_incremented_values(6)
print(y2)
```

becomes

```
int get_incremented_values(int y) {
  store y somewhere in memory
  return y + 2
}

y2 = get_incremented_values(4)
y4 = get_incremented_values(6)
print(y2)
```

Example of optimization (C):

Assume only `%result1` is live here. Then,

```
%result1, %result2, %result3 = scf.while (%arg1 = %operand1, %arg2 = %operand2) {
  %terminator_operand2 = add %arg2, %arg2
  %terminator_operand3 = mul %arg2, %arg2
  %terminator_operand4 = add %arg1, %arg1
  scf.condition(%terminator_operand1) %terminator_operand2, %terminator_operand3, %terminator_operand4
} do {
^bb0(%arg3, %arg4, %arg5):
  %terminator_operand6 = add %arg4, %arg4
  %terminator_operand5 = add %arg5, %arg5
  scf.yield %terminator_operand5, %terminator_operand6
}
```

becomes

```
%result1, %result2 = scf.while (%arg2 = %operand2) {
  %terminator_operand2 = add %arg2, %arg2
  %terminator_operand3 = mul %arg2, %arg2
  scf.condition(%terminator_operand1) %terminator_operand2, %terminator_operand3
} do {
^bb0(%arg3, %arg4):
  %terminator_operand6 = add %arg4, %arg4
  scf.yield %terminator_operand6
}
```

It is interesting to see that `%result2` won't be removed even though it is not live, because `%terminator_operand3` forwards to it and cannot be removed: it also forwards to `%arg4`, which is live.

Example of optimization (D):

```
int square_and_double_of_y(int y) {
  square = y ^ 2
  double = y * 2
  return square, double
}

sq, do = square_and_double_of_y(5)
print(do)
```

becomes

```
int square_and_double_of_y(int y) {
  double = y * 2
  return double
}

do = square_and_double_of_y(5)
print(do)
```

### `-sccp`

_Sparse Conditional Constant Propagation_

This pass implements a general algorithm for sparse conditional constant propagation. The algorithm detects values that are known to be constant and optimistically propagates this information throughout the IR. Any values proven to be constant are replaced, and removed if possible.

This implementation is based on the algorithm described by Wegman and Zadeck in ["Constant Propagation with Conditional Branches"](https://dl.acm.org/doi/10.1145/103135.103136) (1991).
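A small sketch of the optimistic propagation (hypothetical IR; trivially dead ops left behind by the replacement can be erased here):

```mlir
// Before: %0 is the constant 1 on every path through the scf.if.
func.func @f(%cond: i1) -> i32 {
  %c1 = arith.constant 1 : i32
  %0 = scf.if %cond -> (i32) {
    scf.yield %c1 : i32
  } else {
    scf.yield %c1 : i32
  }
  return %0 : i32
}

// After -sccp: uses of %0 are replaced with the proven constant, and
// the side-effect-free scf.if becomes dead and is removed.
func.func @f(%cond: i1) -> i32 {
  %c1 = arith.constant 1 : i32
  return %c1 : i32
}
```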
### `-snapshot-op-locations`

_Generate new locations from the current IR_

This pass allows for generating new locations from the IR during any stage of compilation, by snapshotting the IR to a file and using that file to generate new locations for the operations.

Depending on the value of the `tag` option, different resulting locations may be generated:

- If unset, the original location of the operation is replaced.

Example:

```mlir
// old:
... loc("original_source.cpp":1:1)

// new:
... loc("snapshot_source.mlir":10:10)
```

- If set, the new location is fused with the original location in the form of a [`Name Location`](/docs/Dialects/Builtin/#nameloc) with the specified tag.

Example:

```mlir
// old:
... loc("original_source.cpp":1:1)

// new:
... loc(fused["original_source.cpp":1:1, "snapshot"("snapshot_source.mlir":10:10)])
```

#### Options

```
-filename          : The filename to print the generated IR
-tag               : A tag to use when fusing the new locations with the original. If unset, the locations are replaced.
-print-debuginfo   : Print debug info in MLIR output
-print-op-generic  : Print the generic op form
-print-local-scope : Print with local scope and inline information (eliding aliases for attributes, types, and locations)
-pretty-debuginfo  : Print pretty debug info in MLIR output
```

### `-sroa`

_Scalar Replacement of Aggregates_

Replaces allocations of aggregates with independent allocations of their elements.

Allocators must implement `DestructurableAllocationOpInterface` to provide the list of memory slots for which destructuring should be attempted.

This pass will only be applied if all accessors of the aggregate implement the `DestructurableAccessorOpInterface`. If the accessors provide a view into the struct, users of the view must ensure it is used in a type-safe manner and within bounds by implementing `TypeSafeOpInterface`.

#### Statistics

```
destructured slots        : Total amount of memory slots destructured
slots with memory benefit : Total amount of memory slots in which the destructured size was smaller than the total size after eliminating unused fields
max subelement number     : Maximal number of sub-elements a successfully destructured slot initially had
```

### `-strip-debuginfo`

_Strip debug info from all operations_

This pass strips the IR of any location information, by replacing all operation locations with [`unknown`](/docs/Dialects/Builtin/#unknownloc).
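For instance (a sketch, with locations shown as they print under `-mlir-print-debuginfo`):

```mlir
// Before
%0 = arith.addi %a, %b : i32 loc("input.cpp":10:4)

// After -strip-debuginfo
%0 = arith.addi %a, %b : i32 loc(unknown)
```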
### `-symbol-dce`

_Eliminate dead symbols_

This pass deletes all symbols that are found to be unreachable. This is done by computing the set of operations that are known to be live, propagating that liveness to other symbols, and then deleting all symbols that are not within this live set. Live symbols are those that have a [visibility](/docs/SymbolsAndSymbolTables/#symbol-visibility) that extends beyond the IR, e.g. `public`, or those that are referenced by live symbols or other non-Symbol operations.

For example, consider the following input:

```mlir
func.func private @dead_private_function()
func.func private @live_private_function()

// Note: The `public` isn't necessary here, as this is the default.
func.func public @public_function() {
  "foo.return"() {uses = [@live_private_function]} : () -> ()
}
```

A known live function, `public_function`, contains a reference to an otherwise non-live function `live_private_function`. After running `symbol-dce`, only these two symbols should remain, as the final symbol `dead_private_function` is not visible outside of the current IR and there are no links to known-live operations. After running, we get the expected:

```mlir
func.func private @live_private_function()

func.func public @public_function() {
  "foo.return"() {uses = [@live_private_function]} : () -> ()
}
```

See [Symbols and SymbolTables](/docs/SymbolsAndSymbolTables/) for more information on `Symbols`.

#### Statistics

```
num-dce'd : Number of symbols DCE'd
```

### `-symbol-privatize`

_Mark symbols private_

This pass marks all top-level symbols of the operation it is run on as `private`, except those listed in the `exclude` pass option.

#### Options

```
-exclude : Comma separated list of symbols that should not be marked private
```

### `-topological-sort`

_Sort regions without SSA dominance in topological order_

Recursively sorts all nested regions without SSA dominance in topological order. The main purpose is readability, as well as potentially aiding the processing of certain transformations and analyses. The pass sorts the operations in all nested regions such that, as much as possible, all users appear after their producers.

This sort is stable. If the block is already topologically sorted, the IR is not changed. Operations that form a cycle are moved to the end of the regions in a stable order.

### `-view-op-graph`

_Print Graphviz visualization of an operation_

This pass prints a Graphviz graph of a module.

- Operations are represented as nodes;
- Uses (data flow) as edges;
- Control flow as dashed edges;
- Regions/blocks as subgraphs.

By default, only data flow edges are printed.

Note: See https://www.graphviz.org/doc/info/lang.html for more information about the Graphviz DOT language.

#### Options

```
-max-label-len            : Limit attribute/type length to number of chars
-print-attrs              : Print attributes of operations
-print-control-flow-edges : Print control flow edges
-print-data-flow-edges    : Print data flow edges
-print-result-types       : Print result types of operations
```
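One plausible workflow for rendering the graph (assuming the DOT text is emitted on the debug/error stream, so it is captured from stderr here):

```
mlir-opt input.mlir -view-op-graph -o /dev/null 2> ops.dot
dot -Tsvg ops.dot > ops.svg
```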
## Bufferization Passes

### `-buffer-deallocation-simplification`

_Optimizes `bufferization.dealloc` operations for more efficient codegen_

This pass uses static alias analysis to reduce the number of alias checks required at runtime. Such checks are sometimes necessary to make sure that memrefs aren't deallocated before their last usage (use-after-free) or that some memref isn't deallocated twice (double-free).

### `-buffer-hoisting`

_Optimizes placement of allocation operations by moving them into common dominators and out of nested regions_

This pass implements an approach to aggressively move allocations upwards into common dominators and out of nested regions.

### `-buffer-loop-hoisting`

_Optimizes placement of allocation operations by moving them out of loop nests_

This pass implements an approach to aggressively move allocations upwards out of loop nests. It does not move allocations into common dominators.

### `-buffer-results-to-out-params`

_Converts memref-typed function results to out-params_

Some calling conventions prefer to pass output memrefs as "out params". The conversion to this calling convention must be done as an atomic transformation of the entire program (hence this is a module pass). For example, if a call is rewritten, the callee needs to be rewritten as well; otherwise the IR will end up invalid.

This pass is expected to run immediately after bufferization is finished. At that point, tensor-typed results will have been converted to memref-typed results, and can be consistently converted to out params.

All memref-typed results are appended to the function argument list.

The main issue with this pass (and the out-param calling convention) is that buffers for results need to be allocated in the caller. This currently only works for static shaped memrefs.

If the `hoist-static-allocs` option is on, the pass tries to eliminate the allocation for the returned memref and avoid the memory copy if possible. This optimization applies to returned memrefs that have a static shape and are allocated by `memref.alloc` in the function; the memref passed as a function argument is then used in place of the allocated memref.

#### Options

```
-add-result-attr     : Add the attribute 'bufferize.result' to all output parameters.
-hoist-static-allocs : Hoist static allocations to call sites.
```
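A sketch of the signature rewrite (hypothetical function; with `-hoist-static-allocs` the copy below can often be elided by writing into `%out` directly):

```mlir
// Before: the result buffer is allocated and returned by the callee.
func.func @producer() -> memref<16xf32> {
  %0 = memref.alloc() : memref<16xf32>
  return %0 : memref<16xf32>
}

// After -buffer-results-to-out-params: the result becomes a trailing
// out-parameter that the caller must allocate.
func.func @producer(%out: memref<16xf32>) {
  %0 = memref.alloc() : memref<16xf32>
  memref.copy %0, %out : memref<16xf32> to memref<16xf32>
  return
}
```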
### `-bufferization-lower-deallocations`

_Lowers `bufferization.dealloc` operations to `memref.dealloc` operations_

This pass lowers `bufferization.dealloc` operations to the `memref` dialect. It can be applied to a `builtin.module` or operations implementing the `FunctionOpInterface`. For the latter, only simple `dealloc` operations can be lowered, because the library function necessary for the fully generic lowering cannot be inserted; in this case, an error will be emitted. Next to `memref.dealloc` operations, it may also emit operations from the `arith`, `scf`, and `func` dialects to build conditional deallocations and library functions to avoid code-size blow-up.

### `-drop-equivalent-buffer-results`

_Remove MemRef return values that are equivalent to a bbArg_

This pass removes MemRef return values from functions if they are equivalent to a function bbArg. In that case, the return value is redundant and the respective CallOp operand can be used at the call site.

Note: If a bbArg buffer is not returned directly but cast beforehand, the buffer is still considered equivalent.

### `-eliminate-empty-tensors`

_Try to eliminate all tensor.empty ops._

Tries to eliminate "tensor.empty" ops inside `op`. This transformation looks for subset ops that insert a tensor that originates from a "tensor.empty" op (as per the reverse use-def chain). Such "tensor.empty" ops are replaced with the destination subset.

E.g.:

```
%0 = tensor.empty() : tensor<10xf32>
%1 = linalg.fill ... outs(%0 : tensor<10xf32>)
%2 = tensor.insert_slice %1 into %t ...
```

In the above example, the subset op is "tensor.insert_slice". When tracing back the reverse use-def chain of the source, we end up at a "tensor.empty" op. The "tensor.empty" op is replaced with a "tensor.extract_slice" op.
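The rewritten form would look roughly like this (a sketch with hypothetical, fully spelled-out slice parameters and a `tensor<64xf32>` destination):

```mlir
// Sketch of the result: the empty tensor becomes a slice of the
// destination, so the fill can later bufferize in-place into %t.
%0 = tensor.extract_slice %t[0] [10] [1] : tensor<64xf32> to tensor<10xf32>
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<10xf32>) -> tensor<10xf32>
%2 = tensor.insert_slice %1 into %t[0] [10] [1] : tensor<10xf32> into tensor<64xf32>
```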
### `-empty-tensor-to-alloc-tensor`

_Replace all empty ops by alloc_tensor ops._

`tensor.empty` ops return a tensor of unspecified contents whose only purpose is to carry the tensor shape. This pass converts such ops to `bufferization.alloc_tensor` ops, which bufferize to buffer allocations.
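The rewrite itself is a one-to-one replacement:

```mlir
// Before
%0 = tensor.empty() : tensor<10xf32>

// After -empty-tensor-to-alloc-tensor
%0 = bufferization.alloc_tensor() : tensor<10xf32>
```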
<p>For testing/debugging purposes, <code>test-analysis-only=1 print-conflicts=1</code> prints analysis results and explains why an OpOperand was decided to bufferize out-of-place. This is useful for understanding why One-Shot Bufferize chose to insert a certain buffer copy.</p><p><code>bufferize-function-boundaries</code> is an experimental flag for bufferizing <code>FuncOp</code>, <code>ReturnOp</code> and <code>CallOp</code>. This feature is still under development and supports only simple cases at the moment. In particular:</p><ul><li>Recursive or circular function call graphs are not supported.</li><li>External functions (without bodies) that return a tensor are not supported.</li><li>Functions with multiple blocks or multiple ReturnOps are not supported.</li><li>Layout maps on function signatures can be controlled with a separate <code>function-boundary-type-conversion</code> option, which is similar to <code>unknown-type-conversion</code> but supports an additional <code>infer-layout-map</code> option. <code>fully-dynamic-layout-map</code> and <code>identity-layout-map</code> ensure that function signatures bufferize to easily predictable types, potentially at the cost of additional casts and copies, respectively. When layout maps are inferred, function return types may be more precise, but less predictable. Function argument types cannot be inferred and always have fully dynamic layout maps with <code>infer-layout-map</code>.</li></ul><p>One-Shot Bufferize implements the following contract around function calls: The buffer of function arguments is always writable (unless annotated with <code>bufferization.writable = false</code>). A buffer copy may be inserted at the call site where necessary. Alias sets and equivalence info are propagated through function calls. Whenever a function is bufferized, all functions that it calls have already been analyzed and bufferized, so exact alias and equivalence information is available. This is why recursive function calls are not yet supported.</p><p>One-Shot Bufferize gathers additional information during the analysis phase when function boundary bufferization is activated. E.g., whether a function argument is read/written and which returned values are aliasing/equivalent. For debugging purposes, such information can be printed with <code>test-analysis-only</code>.</p><p>The order in which ops are analyzed is important. The analysis is greedy, and ops that are analyzed earlier are more likely to bufferize in-place. The heuristic can be set with <code>analysis-heuristic</code>. At the moment, the following heuristics are available:</p><ul><li><code>bottom-up</code> (default): Analyze ops from bottom to top.</li><li><code>top-down</code>: Analyze ops from top to bottom.</li><li><code>fuzzer</code>: Randomize the ordering of ops with <code>analysis-fuzzer-seed</code>.</li><li><code>bottom-up-from-terminators</code>: Traverse the reverse use-def chains of tensor IR, starting from region branch terminators (bottom-up). Nested regions are traversed before enclosing regions. Analyze the traversed ops first, then analyze the remaining ops bottom-up. This heuristic is useful for bufferizing loop constructs: One-Shot Bufferize currently supports only IR where yielded tensor values bufferize to equivalent region iter_args, and first analyzing all ops on the path from the &ldquo;yielding&rdquo; op to the beginning of the loop body makes it more likely for the region iter_args and yielded values to bufferize to equivalent buffers.</li></ul>
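<p>For example, the analysis heuristic and other options can be set on the pass from the command line along these lines (a usage sketch):</p><pre tabindex=0><code>mlir-opt -one-shot-bufferize=&#34;analysis-heuristic=top-down allow-unknown-ops=1&#34; input.mlir </code></pre>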
<h4 id=options-10>Options&nbsp;<a class=headline-hash href=#options-10>¶</a></h4><pre tabindex=0><code>-allow-return-allocs-from-loops : Allows returning/yielding new allocations from a loop. -allow-unknown-ops : Allows unknown (not bufferizable) ops in the input IR. -analysis-fuzzer-seed : Test only: Analyze ops in random order with a given seed (fuzzer) -analysis-heuristic : Heuristic that controls the IR traversal during analysis -bufferize-function-boundaries : Bufferize function boundaries (experimental). -check-parallel-regions : Account for parallel regions in RaW analysis. -copy-before-write : Skip the analysis. Make a buffer copy on every write. -dialect-filter : Restrict bufferization to ops from these dialects. -dump-alias-sets : Test only: Annotate tensor IR with alias sets -no-analysis-func-filter : Skip analysis of functions with these symbol names. Set copyBeforeWrite to true when bufferizing them. -function-boundary-type-conversion : Controls layout maps when bufferizing function signatures. -must-infer-memory-space : The memory space of memref types must always be inferred. If unset, a default memory space of 0 is used. -use-encoding-for-memory-space : Use the Tensor encoding attribute for the memory space. Mutually exclusive with the &#39;must-infer-memory-space&#39; option -test-analysis-only : Test only: Only run inplaceability analysis and annotate IR -print-conflicts : Test only: Annotate IR with RaW conflicts. Requires test-analysis-only. -unknown-type-conversion : Controls layout maps for non-inferrable memref types. -buffer-alignment : Sets the alignment of newly allocated buffers. </code></pre><h4 id=statistics-5>Statistics&nbsp;<a class=headline-hash href=#statistics-5>¶</a></h4><pre tabindex=0><code>num-buffer-alloc : Number of buffer allocations num-tensor-in-place : Number of in-place tensor OpOperands num-tensor-out-of-place : Number of out-of-place tensor OpOperands </code></pre><h3 id=-optimize-allocation-liveness><code>-optimize-allocation-liveness</code>&nbsp;<a class=headline-hash href=#-optimize-allocation-liveness>¶</a></h3><p><em>This pass optimizes the liveness of temp allocations in the input function</em></p><p>This pass finds all operations that have a memory allocation effect, searches for the corresponding deallocation, and moves the deallocation right after the last user of the allocation, thereby shortening the live ranges of the allocations.</p><p>The pass is expected to run after the deallocation pipeline.</p><h3 id=-ownership-based-buffer-deallocation><code>-ownership-based-buffer-deallocation</code>&nbsp;<a class=headline-hash href=#-ownership-based-buffer-deallocation>¶</a></h3><p><em>Adds all required dealloc operations for all allocations in the input program</em></p><p>This pass implements an algorithm to automatically introduce all required deallocation operations for all buffers in the input program. This ensures that the resulting program does not have any memory leaks.</p><p>The Buffer Deallocation pass operates on the level of operations implementing the FunctionOpInterface. Such operations can take MemRefs as arguments, but can also return them. 
To ensure compatibility among all functions (including external ones), some rules have to be enforced. They are simply assumed to hold for all external functions. Functions whose definition is available should ideally already adhere to the ABI. Otherwise, all MemRef write operations in the input IR must dominate all MemRef read operations in the input IR. Then, the pass may modify the input IR by inserting <code>bufferization.clone</code> operations such that the output IR adheres to the function boundary ABI:</p><ul><li>When a MemRef is passed as a function argument, ownership is never acquired. It is always the caller&rsquo;s responsibility to deallocate such MemRefs.</li><li>Returning a MemRef from a function always passes ownership to the caller, i.e., it is also the caller&rsquo;s responsibility to deallocate MemRefs returned from a called function.</li><li>A function must not return a MemRef with the same allocated base buffer as one of its arguments (in this case a copy has to be created). Note that in this context two subviews of the same buffer that don&rsquo;t overlap are also considered aliases.</li></ul><p>It is recommended to bufferize all operations first such that no tensor values remain in the IR once this pass is applied. That way all allocated MemRefs will be properly deallocated without any additional manual work. Otherwise, the pass that bufferizes the remaining tensors is responsible for adding the corresponding deallocation operations. Note that this pass does not consider any values of tensor type and assumes that MemRef values defined by <code>bufferization.to_memref</code> do not return ownership and do not have to be deallocated. <code>bufferization.to_tensor</code> operations are handled similarly to <code>bufferization.clone</code> operations with the exception that the result value is not handled because it&rsquo;s a tensor (not a MemRef).</p><p>Input</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>#map0</span> <span class=p>=</span> affine_map<span class=p>&lt;(</span>d0<span class=p>)</span> <span class=p>-&gt;</span> <span class=p>(</span>d0<span class=p>)&gt;</span> </span></span><span class=line><span class=cl>module <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@condBranch</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=k>i1</span><span class=p>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg1</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg2</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> cf<span class=p>.</span>cond_br <span class=nv>%arg0</span><span class=p>,</span> <span class=nl>^bb1</span><span class=p>,</span> <span class=nl>^bb2 </span></span></span><span class=line><span class=cl><span class=nl> ^bb1</span><span class=p>:</span> </span></span><span class=line><span class=cl> cf<span class=p>.</span>br <span class=nl>^bb3</span><span class=p>(</span><span class=nv>%arg1</span> <span class=p>:</span> <span
class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;)</span> </span></span><span class=line><span class=cl> <span class=nl>^bb2</span><span class=p>:</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> linalg<span class=p>.</span>generic <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nl>indexing_maps =</span> <span class=p>[</span><span class=nv>#map0</span><span class=p>,</span> <span class=nv>#map0</span><span class=p>],</span> </span></span><span class=line><span class=cl> <span class=nl>iterator_types =</span> <span class=p>[</span><span class=s>&#34;parallel&#34;</span><span class=p>]}</span> </span></span><span class=line><span class=cl> outs<span class=p>(</span><span class=nv>%arg1</span><span class=p>,</span> <span class=nv>%0</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nl>^bb0</span><span class=p>(</span><span class=nv>%gen1_arg0</span><span class=p>:</span> <span class=k>f32</span><span class=p>,</span> <span class=nv>%gen1_arg1</span><span class=p>:</span> <span class=k>f32</span><span class=p>):</span> </span></span><span class=line><span class=cl> <span class=nv>%tmp1</span> <span class=p>=</span> math<span class=p>.</span>exp <span class=nv>%gen1_arg0</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> linalg<span class=p>.</span>yield <span class=nv>%tmp1</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> cf<span class=p>.</span>br <span class=nl>^bb3</span><span class=p>(</span><span class=nv>%0</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;)</span> </span></span><span class=line><span class=cl> <span class=nl>^bb3</span><span class=p>(</span><span class=nv>%1</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;):</span> </span></span><span class=line><span class=cl> <span class=s>&#34;memref.copy&#34;</span><span class=p>(</span><span class=nv>%1</span><span class=p>,</span> <span class=nv>%arg2</span><span class=p>)</span> <span class=p>:</span> <span class=p>(</span><span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>-&gt;</span> <span class=p>()</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl><span class=p>}</span> 
</span></span></code></pre></div><p>Output</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>#map</span> <span class=p>=</span> affine_map<span class=p>&lt;(</span>d0<span class=p>)</span> <span class=p>-&gt;</span> <span class=p>(</span>d0<span class=p>)&gt;</span> </span></span><span class=line><span class=cl>module <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@condBranch</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=k>i1</span><span class=p>,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg1</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg2</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%false</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> false </span></span><span class=line><span class=cl> <span class=nv>%true</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> true </span></span><span class=line><span class=cl> cf<span class=p>.</span>cond_br <span class=nv>%arg0</span><span class=p>,</span> <span class=nl>^bb1</span><span class=p>,</span> <span class=nl>^bb2 </span></span></span><span class=line><span class=cl><span class=nl> ^bb1</span><span class=p>:</span> <span class=c>// pred: ^bb0 </span></span></span><span class=line><span class=cl><span class=c></span> cf<span class=p>.</span>br <span class=nl>^bb3</span><span class=p>(</span><span class=nv>%arg1</span><span class=p>,</span> <span class=nv>%false</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=k>i1</span><span class=p>)</span> </span></span><span class=line><span class=cl> <span class=nl>^bb2</span><span class=p>:</span> <span class=c>// pred: ^bb0 </span></span></span><span class=line><span class=cl><span class=c></span> <span class=nv>%alloc</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> linalg<span class=p>.</span>generic <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nl>indexing_maps =</span> <span class=p>[</span><span class=nv>#map</span><span class=p>,</span> <span class=nv>#map</span><span class=p>],</span> </span></span><span class=line><span class=cl> <span class=nl>iterator_types =</span> <span class=p>[</span><span class=s>&#34;parallel&#34;</span><span class=p>]}</span> </span></span><span class=line><span class=cl> outs<span class=p>(</span><span class=nv>%arg1</span><span class=p>,</span> <span class=nv>%alloc</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=kt>memref</span><span 
class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;)</span> </span></span><span class=line><span class=cl> <span class=nl>^bb0</span><span class=p>(</span><span class=nv>%out</span><span class=p>:</span> <span class=k>f32</span><span class=p>,</span> <span class=nv>%out_0</span><span class=p>:</span> <span class=k>f32</span><span class=p>):</span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> math<span class=p>.</span>exp <span class=nv>%out</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> linalg<span class=p>.</span>yield <span class=nv>%2</span><span class=p>,</span> <span class=nv>%out_0</span> <span class=p>:</span> <span class=k>f32</span><span class=p>,</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> cf<span class=p>.</span>br <span class=nl>^bb3</span><span class=p>(</span><span class=nv>%alloc</span><span class=p>,</span> <span class=nv>%true</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=k>i1</span><span class=p>)</span> </span></span><span class=line><span class=cl> <span class=nl>^bb3</span><span class=p>(</span><span class=nv>%0</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=nv>%1</span><span class=p>:</span> <span class=k>i1</span><span class=p>):</span> <span class=c>// 2 preds: ^bb1, ^bb2 </span></span></span><span class=line><span class=cl><span class=c></span> <span class=kt>memref</span><span class=p>.</span>copy <span class=nv>%0</span><span class=p>,</span> <span class=nv>%arg2</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;</span> to <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%base_buffer</span><span class=p>,</span> <span class=nv>%offset</span><span class=p>,</span> <span class=nv>%sizes</span><span class=p>,</span> <span class=nv>%strides</span> <span class=p>=</span> </span></span><span class=line><span class=cl> <span class=kt>memref</span><span class=p>.</span>extract_strided_metadata <span class=nv>%0</span> <span class=p>:</span> </span></span><span class=line><span class=cl> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>f32</span><span class=p>&gt;</span> <span class=p>-&gt;</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=k>index</span><span class=p>,</span> <span class=k>index</span><span class=p>,</span> <span class=k>index</span> </span></span><span class=line><span class=cl> bufferization<span class=p>.</span>dealloc <span class=p>(</span><span class=nv>%base_buffer</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=k>f32</span><span class=p>&gt;)</span> if <span class=p>(</span><span class=nv>%1</span><span class=p>)</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> </span></span><span class=line><span class=cl> <span class=p>}</span> 
</span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>The <code>private-function-dynamic-ownership</code> pass option allows the pass to add additional arguments to private functions to dynamically give ownership of MemRefs to callees. This can enable earlier deallocations and allows the pass to bypass the function boundary ABI, potentially leading to fewer MemRef clones being inserted. For example, the private function</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> private <span class=nf>@passthrough</span><span class=p>(</span><span class=nv>%memref</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i32</span><span class=p>&gt;)</span> <span class=p>-&gt;</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i32</span><span class=p>&gt;</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> <span class=nv>%memref</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>would be converted to</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> private <span class=nf>@passthrough</span><span class=p>(</span><span class=nv>%memref</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i32</span><span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nv>%ownership</span><span class=p>:</span> <span class=k>i1</span><span class=p>)</span> <span class=p>-&gt;</span> <span class=p>(</span><span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i32</span><span class=p>&gt;,</span> <span class=k>i1</span><span class=p>)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> <span class=nv>%memref</span><span class=p>,</span> <span class=nv>%ownership</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i32</span><span class=p>&gt;,</span> <span class=k>i1</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>and thus allows the returned MemRef to alias with the MemRef passed as an argument (which would otherwise be forbidden according to the function boundary ABI).</p><h4 id=options-11>Options&nbsp;<a class=headline-hash href=#options-11>¶</a></h4><pre tabindex=0><code>-private-function-dynamic-ownership : Allows adding additional arguments to private functions to dynamically pass ownership of memrefs to callees. This can enable earlier deallocations. 
</code></pre><h3 id=-promote-buffers-to-stack><code>-promote-buffers-to-stack</code>&nbsp;<a class=headline-hash href=#-promote-buffers-to-stack>¶</a></h3><p><em>Promotes heap-based allocations to automatically managed stack-based allocations</em></p><p>This pass implements a simple algorithm to convert heap-based memory allocations to stack-based ones. It uses a built-in heuristic to decide whether it makes sense to convert an allocation. Furthermore, dynamically shaped buffers that are limited by the rank of the tensor can be converted. They are only transformed if they are considered to be small.</p><h4 id=options-12>Options&nbsp;<a class=headline-hash href=#options-12>¶</a></h4><pre tabindex=0><code>-max-alloc-size-in-bytes : Maximal size in bytes to promote allocations to stack. -max-rank-of-allocated-memref : Maximal memref rank to promote dynamic buffers. </code></pre><h2 id=conversion-passes>Conversion Passes&nbsp;<a class=headline-hash href=#conversion-passes>¶</a></h2><h3 id=-arm-neon-2d-to-intr><code>-arm-neon-2d-to-intr</code>&nbsp;<a class=headline-hash href=#-arm-neon-2d-to-intr>¶</a></h3><p><em>Convert Arm NEON structured ops to intrinsics</em></p><p>Creates a pass to lower Arm NEON 2D ops to intrinsics, i.e., equivalent ops operating on flattened 1D vectors and mapping more directly to the corresponding Arm NEON instruction.</p><h3 id=-convert-affine-for-to-gpu><code>-convert-affine-for-to-gpu</code>&nbsp;<a class=headline-hash href=#-convert-affine-for-to-gpu>¶</a></h3><p><em>Convert top-level AffineFor Ops to GPU kernels</em></p><h4 id=options-13>Options&nbsp;<a class=headline-hash href=#options-13>¶</a></h4><pre tabindex=0><code>-gpu-block-dims : Number of GPU block dimensions for mapping -gpu-thread-dims : Number of GPU thread dimensions for mapping </code></pre><h3 id=-convert-amdgpu-to-rocdl><code>-convert-amdgpu-to-rocdl</code>&nbsp;<a class=headline-hash href=#-convert-amdgpu-to-rocdl>¶</a></h3><p><em>Convert AMDGPU dialect to ROCDL dialect</em></p><p>This pass converts supported AMDGPU ops to ROCDL dialect intrinsics.</p><h4 id=options-14>Options&nbsp;<a class=headline-hash href=#options-14>¶</a></h4><pre tabindex=0><code>-chipset : Chipset that these operations will run on </code></pre><h3 id=-convert-arith-to-amdgpu><code>-convert-arith-to-amdgpu</code>&nbsp;<a class=headline-hash href=#-convert-arith-to-amdgpu>¶</a></h3><p><em>Convert Arith operations to AMDGPU-specific implementations</em></p><p>Convert <code>arith</code> operations (currently extf and truncf on 8-bit floats) to operations in the <code>amdgpu</code> dialect. 
This pass is done in two steps in order to avoid running a notional arith-to-rocdl and arith-to-llvm simultaneously.</p><h4 id=options-15>Options&nbsp;<a class=headline-hash href=#options-15>¶</a></h4><pre tabindex=0><code>-chipset : Chipset that these operations will run on -saturate-fp8-truncf : Use saturating truncation for 8-bit float types -allow-packed-f16-round-to-zero : Whether we should allow f32-&gt;f16 packed round-to-zero conversion </code></pre><h3 id=-convert-arith-to-arm-sme><code>-convert-arith-to-arm-sme</code>&nbsp;<a class=headline-hash href=#-convert-arith-to-arm-sme>¶</a></h3><p><em>Convert Arith dialect to ArmSME dialect</em></p><h3 id=-convert-arith-to-emitc><code>-convert-arith-to-emitc</code>&nbsp;<a class=headline-hash href=#-convert-arith-to-emitc>¶</a></h3><p><em>Convert Arith dialect to EmitC dialect</em></p><h3 id=-convert-arith-to-llvm><code>-convert-arith-to-llvm</code>&nbsp;<a class=headline-hash href=#-convert-arith-to-llvm>¶</a></h3><p><em>Convert Arith dialect to LLVM dialect</em></p><p>This pass converts supported Arith ops to LLVM dialect instructions.</p><h4 id=options-16>Options&nbsp;<a class=headline-hash href=#options-16>¶</a></h4><pre tabindex=0><code>-index-bitwidth : Bitwidth of the index type, 0 to use size of machine word </code></pre><h3 id=-convert-arith-to-spirv><code>-convert-arith-to-spirv</code>&nbsp;<a class=headline-hash href=#-convert-arith-to-spirv>¶</a></h3><p><em>Convert Arith dialect to SPIR-V dialect</em></p><h4 id=options-17>Options&nbsp;<a class=headline-hash href=#options-17>¶</a></h4><pre tabindex=0><code>-emulate-lt-32-bit-scalar-types : Emulate narrower scalar types with 32-bit ones if not supported by the target </code></pre><h3 id=-convert-arm-sme-to-llvm><code>-convert-arm-sme-to-llvm</code>&nbsp;<a class=headline-hash href=#-convert-arm-sme-to-llvm>¶</a></h3><p><em>Lower the operations from the ArmSME dialect into the LLVM dialect</em></p><h4 id=options-18>Options&nbsp;<a class=headline-hash href=#options-18>¶</a></h4><pre tabindex=0><code>-dump-tile-live-ranges : Dump the live ranges of SME tiles (for debugging) </code></pre><h3 id=-convert-arm-sme-to-scf><code>-convert-arm-sme-to-scf</code>&nbsp;<a class=headline-hash href=#-convert-arm-sme-to-scf>¶</a></h3><p><em>Lower the operations from the ArmSME dialect into the SCF dialect</em></p><h3 id=-convert-async-to-llvm><code>-convert-async-to-llvm</code>&nbsp;<a class=headline-hash href=#-convert-async-to-llvm>¶</a></h3><p><em>Convert the operations from the async dialect into the LLVM dialect</em></p><p>Convert <code>async.execute</code> operations to LLVM coroutines and use the async runtime API to execute them.</p><h3 id=-convert-bufferization-to-memref><code>-convert-bufferization-to-memref</code>&nbsp;<a class=headline-hash href=#-convert-bufferization-to-memref>¶</a></h3><p><em>Convert operations from the Bufferization dialect to the MemRef dialect</em></p><p>This pass converts bufferization operations into memref operations.</p><p>In its current state, this pass only transforms <code>bufferization.clone</code> operations into <code>memref.alloc</code> and <code>memref.copy</code> operations, and lowers <code>bufferization.dealloc</code> operations (in the same way as the <code>-bufferization-lower-deallocations</code> pass). The conversion of <code>clone</code> operations is needed, since some clone operations could remain after applying several transformation processes. Currently, only <code>canonicalize</code> transforms clone operations or even eliminates them. This can lead to errors if any clone op survives after all conversion passes (starting from the bufferization dialect) have been performed.</p>
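<p>For illustration, a single clone of a statically shaped buffer would be rewritten roughly as follows (a sketch):</p><pre tabindex=0><code>%dst = bufferization.clone %src : memref&lt;2xf32&gt; to memref&lt;2xf32&gt; </code></pre><p>becomes</p><pre tabindex=0><code>%dst = memref.alloc() : memref&lt;2xf32&gt; memref.copy %src, %dst : memref&lt;2xf32&gt; to memref&lt;2xf32&gt; </code></pre>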
<p>See: <a href=https://llvm.discourse.group/t/bufferization-error-related-to-memref-clone/4665>https://llvm.discourse.group/t/bufferization-error-related-to-memref-clone/4665</a></p><p>To avoid such errors, this pass can be run as a final clean-up pass to transform the remaining operations before proceeding in other dialects (e.g., memref).</p><p>Note that this pass only transforms the operations without any further analysis. It does not perform any memory analysis or optimization and hence does not resolve memory leaks.</p><h3 id=-convert-cf-to-llvm><code>-convert-cf-to-llvm</code>&nbsp;<a class=headline-hash href=#-convert-cf-to-llvm>¶</a></h3><p><em>Convert ControlFlow operations to the LLVM dialect</em></p><p>Convert ControlFlow operations into LLVM IR dialect operations.</p><p>If other operations are present and their results are required by the LLVM IR dialect operations, the pass will fail. Any LLVM IR operations or types already present in the IR will be kept as is.</p><h4 id=options-19>Options&nbsp;<a class=headline-hash href=#options-19>¶</a></h4><pre tabindex=0><code>-index-bitwidth : Bitwidth of the index type, 0 to use size of machine word </code></pre><h3 id=-convert-cf-to-spirv><code>-convert-cf-to-spirv</code>&nbsp;<a class=headline-hash href=#-convert-cf-to-spirv>¶</a></h3><p><em>Convert ControlFlow dialect to SPIR-V dialect</em></p><h4 id=options-20>Options&nbsp;<a class=headline-hash href=#options-20>¶</a></h4><pre tabindex=0><code>-emulate-lt-32-bit-scalar-types : Emulate narrower scalar types with 32-bit ones if not supported by the target </code></pre><h3 id=-convert-complex-to-libm><code>-convert-complex-to-libm</code>&nbsp;<a class=headline-hash href=#-convert-complex-to-libm>¶</a></h3><p><em>Convert Complex dialect to libm calls</em></p><p>This pass converts supported Complex ops to libm calls.</p><h3 id=-convert-complex-to-llvm><code>-convert-complex-to-llvm</code>&nbsp;<a class=headline-hash href=#-convert-complex-to-llvm>¶</a></h3><p><em>Convert Complex dialect to LLVM dialect</em></p><h4 id=options-21>Options&nbsp;<a class=headline-hash href=#options-21>¶</a></h4><pre tabindex=0><code>-complex-range : Control the intermediate calculation of complex number division </code></pre><h3 id=-convert-complex-to-spirv><code>-convert-complex-to-spirv</code>&nbsp;<a class=headline-hash href=#-convert-complex-to-spirv>¶</a></h3><p><em>Convert Complex dialect to SPIRV dialect</em></p><h3 id=-convert-complex-to-standard><code>-convert-complex-to-standard</code>&nbsp;<a class=headline-hash href=#-convert-complex-to-standard>¶</a></h3><p><em>Convert Complex dialect to standard dialect</em></p><h4 id=options-22>Options&nbsp;<a class=headline-hash href=#options-22>¶</a></h4><pre tabindex=0><code>-complex-range : Control the intermediate calculation of complex number division </code></pre><h3 id=-convert-func-to-emitc><code>-convert-func-to-emitc</code>&nbsp;<a class=headline-hash href=#-convert-func-to-emitc>¶</a></h3><p><em>Convert Func dialect to EmitC dialect</em></p><h3 id=-convert-func-to-llvm><code>-convert-func-to-llvm</code>&nbsp;<a class=headline-hash href=#-convert-func-to-llvm>¶</a></h3><p><em>Convert from the Func dialect to the LLVM dialect</em></p><p>Convert Func dialect operations into the LLVM IR dialect operations.</p>
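<p>For instance, a function returning two values might be rewritten along the following lines (a sketch; see the Output IR notes below):</p><pre tabindex=0><code>func.func @pair(%a: i64, %b: i64) -&gt; (i64, i64) { return %a, %b : i64, i64 } </code></pre><p>becomes roughly</p><pre tabindex=0><code>llvm.func @pair(%a: i64, %b: i64) -&gt; !llvm.struct&lt;(i64, i64)&gt; { %0 = llvm.mlir.undef : !llvm.struct&lt;(i64, i64)&gt; %1 = llvm.insertvalue %a, %0[0] : !llvm.struct&lt;(i64, i64)&gt; %2 = llvm.insertvalue %b, %1[1] : !llvm.struct&lt;(i64, i64)&gt; llvm.return %2 : !llvm.struct&lt;(i64, i64)&gt; } </code></pre>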
<h4 id=input-invariant>Input invariant&nbsp;<a class=headline-hash href=#input-invariant>¶</a></h4><ul><li>no <code>tensor</code> types;</li><li>all <code>vector</code> types are one-dimensional;</li><li>all blocks are reachable by following the successors of the first basic block;</li></ul><p>If other operations are present and their results are required by the LLVM IR dialect operations, the pass will fail. Any LLVM IR operations or types already present in the IR will be kept as is.</p><p>An LLVM datalayout string can be attached as an attribute to the module on which the pass anchors. Such an attribute is attached by calling the <code>set-llvm-module-datalayout</code> pass. If present, an llvm::DataLayout object is created from this attribute and used in the conversion to LLVM.</p><h4 id=output-ir>Output IR&nbsp;<a class=headline-hash href=#output-ir>¶</a></h4><p>Functions converted to LLVM IR. Function argument types are converted one-to-one. Function results are converted one-to-one and, when more than one value is returned, packed into an LLVM IR struct type. Function calls and returns are updated accordingly. Block argument types are updated to use LLVM IR types.</p><h4 id=options-23>Options&nbsp;<a class=headline-hash href=#options-23>¶</a></h4><pre tabindex=0><code>-use-bare-ptr-memref-call-conv : Replace FuncOp&#39;s MemRef arguments with bare pointers to the MemRef element types -index-bitwidth : Bitwidth of the index type, 0 to use size of machine word </code></pre><h3 id=-convert-func-to-spirv><code>-convert-func-to-spirv</code>&nbsp;<a class=headline-hash href=#-convert-func-to-spirv>¶</a></h3><p><em>Convert Func dialect to SPIR-V dialect</em></p><h4 id=options-24>Options&nbsp;<a class=headline-hash href=#options-24>¶</a></h4><pre tabindex=0><code>-emulate-lt-32-bit-scalar-types : Emulate narrower scalar types with 32-bit ones if not supported by the target </code></pre><h3 id=-convert-gpu-to-llvm-spv><code>-convert-gpu-to-llvm-spv</code>&nbsp;<a class=headline-hash href=#-convert-gpu-to-llvm-spv>¶</a></h3><p><em>Generate LLVM operations to be ingested by a SPIR-V backend for gpu operations</em></p><h4 id=options-25>Options&nbsp;<a class=headline-hash href=#options-25>¶</a></h4><pre tabindex=0><code>-use-64bit-index : Use 64-bit integers to convert index types </code></pre><h3 id=-convert-gpu-to-nvvm><code>-convert-gpu-to-nvvm</code>&nbsp;<a class=headline-hash href=#-convert-gpu-to-nvvm>¶</a></h3><p><em>Generate NVVM operations for gpu operations</em></p><h4 id=options-26>Options&nbsp;<a class=headline-hash href=#options-26>¶</a></h4><pre tabindex=0><code>-index-bitwidth : Bitwidth of the index type, 0 to use size of machine word -has-redux : Target gpu supports redux -use-bare-ptr-memref-call-conv : Replace memref arguments in GPU functions with bare pointers. All memrefs must have static shape. 
-allowed-dialects : Run conversion patterns of only the specified dialects </code></pre><h3 id=-convert-gpu-to-rocdl><code>-convert-gpu-to-rocdl</code>&nbsp;<a class=headline-hash href=#-convert-gpu-to-rocdl>¶</a></h3><p><em>Generate ROCDL operations for gpu operations</em></p><h4 id=options-27>Options&nbsp;<a class=headline-hash href=#options-27>¶</a></h4><pre tabindex=0><code>-chipset : Chipset that these operations will run on -index-bitwidth : Bitwidth of the index type, 0 to use size of machine word -use-bare-ptr-memref-call-conv : Replace memref arguments in GPU functions with bare pointers. All memrefs must have static shape -runtime : Runtime that the code will be run on (default is Unknown, can also use HIP or OpenCL) -allowed-dialects : Run conversion patterns of only the specified dialects </code></pre><h3 id=-convert-gpu-to-spirv><code>-convert-gpu-to-spirv</code>&nbsp;<a class=headline-hash href=#-convert-gpu-to-spirv>¶</a></h3><p><em>Convert GPU dialect to SPIR-V dialect</em></p><p>This pass converts supported GPU device ops to SPIR-V ops. It does not handle GPU host ops.</p><p>A <code>gpu.func</code> op can have parameters to pass in resources. But in SPIR-V, entry functions cannot take parameters; they use descriptors to access resources. By default, parameters to a <code>gpu.func</code> op will be converted to global variables. These global variables will be assigned sequential binding numbers following their order in the original <code>gpu.func</code> op, starting from 0, in set 0. One can attach <code>spirv.interface_var_abi</code> to those parameters to control the set and binding if desired.</p><h4 id=options-28>Options&nbsp;<a class=headline-hash href=#options-28>¶</a></h4><pre tabindex=0><code>-use-64bit-index : Use 64-bit integers to convert index types </code></pre><h3 id=-convert-index-to-llvm><code>-convert-index-to-llvm</code>&nbsp;<a class=headline-hash href=#-convert-index-to-llvm>¶</a></h3><p><em>Lower the <code>index</code> dialect to the <code>llvm</code> dialect.</em></p><p>This pass lowers Index dialect operations to LLVM dialect operations. Operation conversions are 1-to-1 except for the exotic divides: <code>ceildivs</code>, <code>ceildivu</code>, and <code>floordivs</code>, which expand to a series of LLVM operations. Importantly, the index bitwidth should be correctly set to the target pointer width via <code>index-bitwidth</code>.</p><h4 id=options-29>Options&nbsp;<a class=headline-hash href=#options-29>¶</a></h4><pre tabindex=0><code>-index-bitwidth : Bitwidth of the index type, 0 to use size of machine word </code></pre><h3 id=-convert-index-to-spirv><code>-convert-index-to-spirv</code>&nbsp;<a class=headline-hash href=#-convert-index-to-spirv>¶</a></h3><p><em>Lower the <code>index</code> dialect to the <code>spirv</code> dialect.</em></p><p>This pass lowers Index dialect operations to SPIR-V dialect operations. Operation conversions are 1-to-1 except for the exotic divides: <code>ceildivs</code>, <code>ceildivu</code>, and <code>floordivs</code>. The index bitwidth will be 32 or 64, as specified by <code>use-64bit-index</code>.</p>
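<p>For instance, the unsigned ceiling divide expands along these lines (a pseudo-code sketch of one common expansion; the actual lowering may differ in detail):</p><pre tabindex=0><code>// ceildivu(n, m), guarding n == 0 to avoid computing n + m - 1 (which could overflow): n == 0 ? 0 : (n - 1) / m + 1 </code></pre>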
<h4 id=options-30>Options&nbsp;<a class=headline-hash href=#options-30>¶</a></h4><pre tabindex=0><code>-use-64bit-index : Use 64-bit integers to convert index types </code></pre><h3 id=-convert-linalg-to-std><code>-convert-linalg-to-std</code>&nbsp;<a class=headline-hash href=#-convert-linalg-to-std>¶</a></h3><p><em>Convert the operations from the linalg dialect into the Standard dialect</em></p><h3 id=-convert-math-to-emitc><code>-convert-math-to-emitc</code>&nbsp;<a class=headline-hash href=#-convert-math-to-emitc>¶</a></h3><p><em>Convert some Math operations to EmitC call_opaque operations</em></p><p>This pass converts supported Math ops to <code>call_opaque</code> ops targeting libc/libm functions. Unlike the convert-math-to-funcs pass, converting to <code>call_opaque</code> ops allows overloading the same function with different argument types.</p><h4 id=options-31>Options&nbsp;<a class=headline-hash href=#options-31>¶</a></h4><pre tabindex=0><code>-language-target : Select the language standard target for callees (c99 or cpp11). </code></pre><h3 id=-convert-math-to-funcs><code>-convert-math-to-funcs</code>&nbsp;<a class=headline-hash href=#-convert-math-to-funcs>¶</a></h3><p><em>Convert Math operations to calls of outlined implementations.</em></p><p>This pass converts supported Math ops to calls of compiler-generated functions implementing these operations in software. The LLVM dialect is used for LinkonceODR linkage of the generated functions.</p><h4 id=options-32>Options&nbsp;<a class=headline-hash href=#options-32>¶</a></h4><pre tabindex=0><code>-min-width-of-fpowi-exponent : Convert FPowI only if the width of its exponent&#39;s integer type is greater than or equal to this value -convert-ctlz : Convert math.ctlz to a software implementation. Enable for targets that do not natively support ctlz. </code></pre><h3 id=-convert-math-to-libm><code>-convert-math-to-libm</code>&nbsp;<a class=headline-hash href=#-convert-math-to-libm>¶</a></h3><p><em>Convert Math dialect to libm calls</em></p><p>This pass converts supported Math ops to libm calls.</p><h3 id=-convert-math-to-llvm><code>-convert-math-to-llvm</code>&nbsp;<a class=headline-hash href=#-convert-math-to-llvm>¶</a></h3><p><em>Convert Math dialect to LLVM dialect</em></p><h4 id=options-33>Options&nbsp;<a class=headline-hash href=#options-33>¶</a></h4><pre tabindex=0><code>-approximate-log1p : Enable approximation of Log1p. 
</code></pre><h3 id=-convert-math-to-rocdl><code>-convert-math-to-rocdl</code>&nbsp;<a class=headline-hash href=#-convert-math-to-rocdl>¶</a></h3><p><em>Convert Math dialect to ROCDL library calls</em></p><p>This pass converts supported Math ops to ROCDL library calls.</p><h3 id=-convert-math-to-spirv><code>-convert-math-to-spirv</code>&nbsp;<a class=headline-hash href=#-convert-math-to-spirv>¶</a></h3><p><em>Convert Math dialect to SPIR-V dialect</em></p><h3 id=-convert-memref-to-emitc><code>-convert-memref-to-emitc</code>&nbsp;<a class=headline-hash href=#-convert-memref-to-emitc>¶</a></h3><p><em>Convert MemRef dialect to EmitC dialect</em></p><h3 id=-convert-memref-to-spirv><code>-convert-memref-to-spirv</code>&nbsp;<a class=headline-hash href=#-convert-memref-to-spirv>¶</a></h3><p><em>Convert MemRef dialect to SPIR-V dialect</em></p><h4 id=options-34>Options&nbsp;<a class=headline-hash href=#options-34>¶</a></h4><pre tabindex=0><code>-bool-num-bits : The number of bits to store a boolean value -use-64bit-index : Use 64-bit integers to convert index types </code></pre><h3 id=-convert-mesh-to-mpi><code>-convert-mesh-to-mpi</code>&nbsp;<a class=headline-hash href=#-convert-mesh-to-mpi>¶</a></h3><p><em>Convert Mesh dialect to MPI dialect.</em></p><p>This pass converts communication operations from the Mesh dialect to the MPI dialect. If it finds the DLTI attribute &ldquo;MPI:comm_world-rank&rdquo; on the module, it will use that integer value instead of calling MPI_Comm_rank. This allows optimizations like constant shape propagation and fusion because shard/partition sizes depend on the rank.</p><h3 id=-convert-nvgpu-to-nvvm><code>-convert-nvgpu-to-nvvm</code>&nbsp;<a class=headline-hash href=#-convert-nvgpu-to-nvvm>¶</a></h3><p><em>Convert NVGPU dialect to NVVM dialect</em></p><p>This pass converts supported NVGPU ops to NVVM dialect intrinsics.</p><h3 id=-convert-nvvm-to-llvm><code>-convert-nvvm-to-llvm</code>&nbsp;<a class=headline-hash href=#-convert-nvvm-to-llvm>¶</a></h3><p><em>Convert NVVM to PTX with Inline Assembly in LLVM dialect</em></p><p>This pass generates PTX instructions using inline assembly for NVVM operations that implement <code>BasicPtxBuilderInterface</code>.</p><h3 id=-convert-openacc-to-scf><code>-convert-openacc-to-scf</code>&nbsp;<a class=headline-hash href=#-convert-openacc-to-scf>¶</a></h3><p><em>Convert the OpenACC ops to OpenACC with SCF dialect</em></p><h3 id=-convert-openmp-to-llvm><code>-convert-openmp-to-llvm</code>&nbsp;<a class=headline-hash href=#-convert-openmp-to-llvm>¶</a></h3><p><em>Convert the OpenMP ops to OpenMP ops with LLVM dialect</em></p><h3 id=-convert-parallel-loops-to-gpu><code>-convert-parallel-loops-to-gpu</code>&nbsp;<a class=headline-hash href=#-convert-parallel-loops-to-gpu>¶</a></h3><p><em>Convert mapped scf.parallel ops to gpu launch operations</em></p><p>Creates a pass that converts scf.parallel operations into a gpu.launch operation. The mapping of loop dimensions to launch dimensions is derived from mapping attributes. See ParallelToGpuLaunchLowering::matchAndRewrite for a description of the attributes used.</p>
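<p>A mapped loop might look as follows (a sketch; the attribute mirrors the gpu dialect&rsquo;s loop mapping attribute):</p><pre tabindex=0><code>scf.parallel (%i) = (%c0) to (%n) step (%c1) { ... } {mapping = [#gpu.loop_dim_map&lt;processor = block_x, map = (d0) -&gt; (d0), bound = (d0) -&gt; (d0)&gt;]} </code></pre>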
<h3 id=-convert-pdl-to-pdl-interp><code>-convert-pdl-to-pdl-interp</code>&nbsp;<a class=headline-hash href=#-convert-pdl-to-pdl-interp>¶</a></h3><p><em>Convert PDL ops to PDL interpreter ops</em></p><h3 id=-convert-scf-to-cf><code>-convert-scf-to-cf</code>&nbsp;<a class=headline-hash href=#-convert-scf-to-cf>¶</a></h3><p><em>Convert SCF dialect to ControlFlow dialect, replacing structured control flow with a CFG</em></p><h3 id=-convert-scf-to-emitc><code>-convert-scf-to-emitc</code>&nbsp;<a class=headline-hash href=#-convert-scf-to-emitc>¶</a></h3><p><em>Convert SCF dialect to EmitC dialect, maintaining structured control flow</em></p><h3 id=-convert-scf-to-openmp><code>-convert-scf-to-openmp</code>&nbsp;<a class=headline-hash href=#-convert-scf-to-openmp>¶</a></h3><p><em>Convert SCF parallel loop to OpenMP parallel + workshare constructs.</em></p><h4 id=options-35>Options&nbsp;<a class=headline-hash href=#options-35>¶</a></h4><pre tabindex=0><code>-num-threads : Number of threads to use </code></pre><h3 id=-convert-scf-to-spirv><code>-convert-scf-to-spirv</code>&nbsp;<a class=headline-hash href=#-convert-scf-to-spirv>¶</a></h3><p><em>Convert SCF dialect to SPIR-V dialect.</em></p><p>Converts SCF ops into SPIR-V structured control flow ops. SPIR-V structured control flow ops do not support yielding values, so for SCF ops that yield values, SPIR-V variables are created to hold the values, and load/store operations are emitted to update them.</p><h3 id=-convert-shape-constraints><code>-convert-shape-constraints</code>&nbsp;<a class=headline-hash href=#-convert-shape-constraints>¶</a></h3><p><em>Convert shape constraint operations to the standard dialect</em></p><p>This pass eliminates shape constraints from the program, converting them to eager (side-effecting) error handling code.</p><p>This pass is separate from the regular convert-shape-to-standard, despite converting between the same dialects, because converting shape constraints can happen at a different point in the program than general shape computation lowering.</p><h3 id=-convert-shape-to-std><code>-convert-shape-to-std</code>&nbsp;<a class=headline-hash href=#-convert-shape-to-std>¶</a></h3><p><em>Convert operations from the shape dialect into the standard dialect</em></p><h3 id=-convert-spirv-to-llvm><code>-convert-spirv-to-llvm</code>&nbsp;<a class=headline-hash href=#-convert-spirv-to-llvm>¶</a></h3><p><em>Convert SPIR-V dialect to LLVM dialect</em></p><p>See <a href=https://mlir.llvm.org/docs/SPIRVToLLVMDialectConversion/>https://mlir.llvm.org/docs/SPIRVToLLVMDialectConversion/</a> for more details.</p><h4 id=options-36>Options&nbsp;<a class=headline-hash href=#options-36>¶</a></h4><pre tabindex=0><code>-client-api : Derive StorageClass to address space mapping from the client API </code></pre><h3 id=-convert-tensor-to-linalg><code>-convert-tensor-to-linalg</code>&nbsp;<a class=headline-hash href=#-convert-tensor-to-linalg>¶</a></h3><p><em>Convert some Tensor dialect ops to Linalg dialect</em></p><h3 id=-convert-tensor-to-spirv><code>-convert-tensor-to-spirv</code>&nbsp;<a class=headline-hash href=#-convert-tensor-to-spirv>¶</a></h3><p><em>Convert Tensor dialect to SPIR-V dialect</em></p><h4 id=options-37>Options&nbsp;<a class=headline-hash href=#options-37>¶</a></h4><pre tabindex=0><code>-emulate-lt-32-bit-scalar-types : Emulate narrower scalar types with 32-bit ones if not supported by the target </code></pre><h3 
id=-convert-to-llvm><code>-convert-to-llvm</code>&nbsp;<a class=headline-hash href=#-convert-to-llvm>¶</a></h3><p><em>Convert to LLVM via dialect interfaces found in the input IR</em></p><p>This is a generic pass to convert to LLVM; it uses the <code>ConvertToLLVMPatternInterface</code> dialect interface to delegate the injection of conversion patterns to dialects.</p><p>If <code>dynamic</code> is set to <code>true</code>, the pass will look for <code>ConvertToLLVMAttrInterface</code> attributes and use them to further configure the conversion process. This option also uses the <code>DataLayoutAnalysis</code> analysis to configure the type converter. Enabling this option incurs extra overhead.</p><h4 id=options-38>Options&nbsp;<a class=headline-hash href=#options-38>¶</a></h4><pre tabindex=0><code>-filter-dialects : Test conversion patterns of only the specified dialects -dynamic : Use op conversion attributes to configure the conversion </code></pre><h3 id=-convert-ub-to-llvm><code>-convert-ub-to-llvm</code>&nbsp;<a class=headline-hash href=#-convert-ub-to-llvm>¶</a></h3><p><em>Convert UB dialect to LLVM dialect</em></p><p>This pass converts supported UB ops to LLVM dialect instructions.</p><h4 id=options-39>Options&nbsp;<a class=headline-hash href=#options-39>¶</a></h4><pre tabindex=0><code>-index-bitwidth : Bitwidth of the index type, 0 to use size of machine word </code></pre><h3 id=-convert-ub-to-spirv><code>-convert-ub-to-spirv</code>&nbsp;<a class=headline-hash href=#-convert-ub-to-spirv>¶</a></h3><p><em>Convert UB dialect to SPIR-V dialect</em></p><p>This pass converts supported UB ops to SPIR-V dialect ops.</p><h3 id=-convert-vector-to-arm-sme><code>-convert-vector-to-arm-sme</code>&nbsp;<a class=headline-hash href=#-convert-vector-to-arm-sme>¶</a></h3><p><em>Lower the operations from the vector dialect into the ArmSME dialect</em></p><p>Pass that converts vector dialect operations into equivalent ArmSME dialect operations.</p><h3 id=-convert-vector-to-gpu><code>-convert-vector-to-gpu</code>&nbsp;<a class=headline-hash href=#-convert-vector-to-gpu>¶</a></h3><p><em>Lower the operations from the vector dialect into the GPU dialect</em></p><h4 id=options-40>Options&nbsp;<a class=headline-hash href=#options-40>¶</a></h4><pre tabindex=0><code>-use-nvgpu : convert to NvGPU ops instead of GPU dialect ops </code></pre><h3 id=-convert-vector-to-llvm><code>-convert-vector-to-llvm</code>&nbsp;<a class=headline-hash href=#-convert-vector-to-llvm>¶</a></h3><p><em>Lower the operations from the vector dialect into the LLVM dialect</em></p><p>Convert operations from the vector dialect into the LLVM IR dialect operations. The lowering pass provides several options to control the kinds of optimizations that are allowed. It also provides options that enable the use of one or more architecture-specific dialects (AMX, X86Vector, ArmNeon, ArmSVE, etc.) in combination with the architecture-neutral vector dialect lowering.</p><h4 id=options-41>Options&nbsp;<a class=headline-hash href=#options-41>¶</a></h4><pre tabindex=0><code>-reassociate-fp-reductions : Allows llvm to reassociate floating-point reductions for speed -force-32bit-vector-indices : Allows compiler to assume vector indices fit in 32-bit if that yields faster code -enable-amx : Enables the use of AMX dialect while lowering the vector dialect. -enable-arm-neon : Enables the use of ArmNeon dialect while lowering the vector dialect. -enable-arm-sve : Enables the use of ArmSVE dialect while lowering the vector dialect. 
-enable-x86vector : Enables the use of X86Vector dialect while lowering the vector dialect. -vector-contract-lowering : control the lowering of `vector.contract` operations. -vector-transpose-lowering : control the lowering of `vector.transpose` operations. </code></pre><h3 id=-convert-vector-to-scf><code>-convert-vector-to-scf</code>&nbsp;<a class=headline-hash href=#-convert-vector-to-scf>¶</a></h3><p><em>Lower the operations from the vector dialect into the SCF dialect</em></p><h4 id=options-42>Options&nbsp;<a class=headline-hash href=#options-42>¶</a></h4><pre tabindex=0><code>-full-unroll : Perform full unrolling when converting vector transfers to SCF -target-rank : Target vector rank to which transfer ops should be lowered -lower-tensors : Lower transfer ops that operate on tensors -lower-scalable : Add scalable vector specific lowerings (that introduce loops) </code></pre><h3 id=-convert-vector-to-spirv><code>-convert-vector-to-spirv</code>&nbsp;<a class=headline-hash href=#-convert-vector-to-spirv>¶</a></h3><p><em>Convert Vector dialect to SPIR-V dialect</em></p><h3 id=-convert-vector-to-xegpu><code>-convert-vector-to-xegpu</code>&nbsp;<a class=headline-hash href=#-convert-vector-to-xegpu>¶</a></h3><p><em>Lower the operations from the vector dialect into the XeGPU dialect</em></p><h3 id=-finalize-memref-to-llvm><code>-finalize-memref-to-llvm</code>&nbsp;<a class=headline-hash href=#-finalize-memref-to-llvm>¶</a></h3><p><em>Finalize MemRef dialect to LLVM dialect conversion</em></p><p>Finalize the conversion of the operations from the MemRef dialect to the LLVM dialect. This conversion will not convert some complex MemRef operations. Make sure to run <code>expand-strided-metadata</code> beforehand for these.</p><h4 id=options-43>Options&nbsp;<a class=headline-hash href=#options-43>¶</a></h4><pre tabindex=0><code>-use-aligned-alloc : Use aligned_alloc in place of malloc for heap allocations -index-bitwidth : Bitwidth of the index type, 0 to use size of machine word -use-generic-functions : Use generic allocation and deallocation functions instead of the classic &#39;malloc&#39;, &#39;aligned_alloc&#39; and &#39;free&#39; functions </code></pre><h3 id=-gpu-to-llvm><code>-gpu-to-llvm</code>&nbsp;<a class=headline-hash href=#-gpu-to-llvm>¶</a></h3><p><em>Convert GPU dialect to LLVM dialect with GPU runtime calls</em></p><p>Creates a pass to convert GPU operations into a sequence of GPU runtime calls.</p><p>This pass does not generate code to call GPU runtime APIs directly but instead uses a small wrapper library that exports a stable and conveniently typed ABI on top of GPU runtimes such as CUDA or ROCm (HIP).</p><h4 id=options-44>Options&nbsp;<a class=headline-hash href=#options-44>¶</a></h4><pre tabindex=0><code>-use-bare-pointers-for-host : Use bare pointers to pass memref arguments to host functions. All memrefs must have static shape. -use-bare-pointers-for-kernels : Use bare pointers to pass memref arguments to kernels. The kernel must use the same setting for this option. -intersperse-sizes-for-kernels : Inserts a size_t argument following each memref argument, containing the static size in bytes of the buffer. Incompatible arguments are rejected. This is intended for use by the Vulkan runtime with the kernel bare pointer calling convention, to enable dynamic binding of buffers as arguments without static type info. 
</code></pre><h3 id=-lift-cf-to-scf><code>-lift-cf-to-scf</code>&nbsp;<a class=headline-hash href=#-lift-cf-to-scf>¶</a></h3><p><em>Lift ControlFlow dialect to SCF dialect</em></p><p>Lifts ControlFlow operations to SCF dialect operations.</p><p>This pass is prefixed with &ldquo;lift&rdquo; instead of &ldquo;convert&rdquo; as it is not always guaranteed to replace all ControlFlow ops. If a region contains only a single kind of return-like operation, all ControlFlow operations will be replaced successfully. Otherwise, a single ControlFlow switch branching to one block per return-like operation kind remains.</p><p>This pass may need to create unreachable terminators in case of infinite loops, which is only supported for &lsquo;func.func&rsquo; for now. If you potentially have infinite loops inside CFG regions not belonging to &lsquo;func.func&rsquo;, consider using the <code>transformCFGToSCF</code> function directly with a corresponding <code>CFGToSCFInterface::createUnreachableTerminator</code> implementation.</p><h3 id=-lower-affine><code>-lower-affine</code>&nbsp;<a class=headline-hash href=#-lower-affine>¶</a></h3><p><em>Lower Affine operations to a combination of Arith and SCF operations</em></p><p>Convert operations from the affine dialect into operations from the SCF and standard dialects.</p><p><code>affine.for</code> operations are converted to <code>scf.for</code> operations that are free of certain structural restrictions (on their bounds and step). <code>affine.if</code> is similarly converted to the <code>scf.if</code> operation. <code>affine.apply</code> operations are converted into sequences of primitive arithmetic operations from the arith dialect that have the same effect, using operands of the <code>index</code> type. Consequently, named maps and sets that are no longer in use may be removed from the module.</p><p>For example, <code>%r = affine.apply affine_map&lt;(d0, d1)[s0] -> (d0 + 2*d1 + s0)>(%d0, %d1)[%s0]</code> can be converted into:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>%d0</span> <span class=p>=</span> <span class=p>&lt;...&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%d1</span> <span class=p>=</span> <span class=p>&lt;...&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%s0</span> <span class=p>=</span> <span class=p>&lt;...&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%0</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>2</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl><span class=nv>%1</span> <span class=p>=</span> arith<span class=p>.</span>muli <span class=nv>%0</span><span class=p>,</span> <span class=nv>%d1</span> </span></span><span class=line><span class=cl><span class=nv>%2</span> <span class=p>=</span> arith<span class=p>.</span>addi <span class=nv>%d0</span><span class=p>,</span> <span class=nv>%1</span> </span></span><span class=line><span class=cl><span class=nv>%r</span> <span class=p>=</span> arith<span class=p>.</span>addi <span class=nv>%2</span><span class=p>,</span> <span class=nv>%s0</span> </span></span></code></pre></div><h4 id=input-invariant-1>Input invariant&nbsp;<a class=headline-hash href=#input-invariant-1>¶</a></h4><ul><li>no <code>Tensor</code> types;</li></ul><p>These restrictions may be lifted in the future.</p>
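<p>Similarly, an <code>affine.for</code> with constant bounds lowers to an <code>scf.for</code> with explicitly materialized bound and step values, roughly:</p><pre tabindex=0><code>affine.for %i = 0 to 10 { ... } </code></pre><p>becomes</p><pre tabindex=0><code>%c0 = arith.constant 0 : index %c10 = arith.constant 10 : index %c1 = arith.constant 1 : index scf.for %i = %c0 to %c10 step %c1 { ... } </code></pre>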
class=headline-hash href=#output-ir-1>¶</a></h4><p>Functions with <code>affine.for</code> and <code>affine.if</code> operations eliminated. These functions may contain operations from the Standard dialect in addition to those already present before the pass.</p><h4 id=invariants>Invariants&nbsp;<a class=headline-hash href=#invariants>¶</a></h4><ul><li>Functions without a body are not modified.</li><li>The semantics of the other functions is preserved.</li><li>Individual operations other than those mentioned above are not modified if they do not depend on the loop iterator value or on the result of <code>affine.apply</code>.</li></ul><h3 id=-lower-host-to-llvm><code>-lower-host-to-llvm</code>&nbsp;<a class=headline-hash href=#-lower-host-to-llvm>¶</a></h3><p><em>Lowers the host module code and <code>gpu.launch_func</code> to LLVM</em></p><p>Creates a pass to emulate <code>gpu.launch_func</code> call in LLVM dialect and lower the host module code to LLVM.</p><p>This transformation creates a sequence of global variables that are later linked to the variables in the kernel module, and a series of copies to/from them to emulate the memory transfer from the host or to the device sides. It also converts the remaining Arithmetic, Func, and MemRef dialects into LLVM dialect, emitting C wrappers.</p><h3 id=-map-memref-spirv-storage-class><code>-map-memref-spirv-storage-class</code>&nbsp;<a class=headline-hash href=#-map-memref-spirv-storage-class>¶</a></h3><p><em>Map numeric MemRef memory spaces to SPIR-V storage classes</em></p><h4 id=options-45>Options&nbsp;<a class=headline-hash href=#options-45>¶</a></h4><pre tabindex=0><code>-client-api : The client API to use for populating mappings </code></pre><h3 id=-reconcile-unrealized-casts><code>-reconcile-unrealized-casts</code>&nbsp;<a class=headline-hash href=#-reconcile-unrealized-casts>¶</a></h3><p><em>Simplify and eliminate unrealized conversion casts</em></p><p>Eliminate <code>unrealized_conversion_cast</code> operations, commonly introduced by partial dialect conversions, that transitively convert a value to another value of the same type, that is:</p><pre tabindex=0><code>%0 = &#34;producer.op&#34;() : () -&gt; !type.A %1 = unrealized_conversion_cast %0 : !type.A to !type.B %2 = unrealized_conversion_cast %1 : !type.B to !type.C %3 = unrealized_conversion_cast %2 : !type.C to !type.A &#34;consumer.op&#34;(%3) : (!type.A) -&gt; () </code></pre><p>Such situations appear when the consumer operation is converted by one pass and the producer operation is converted by another pass, each of which produces an unrealized cast. This pass can be used to clean up the IR.</p><h3 id=-set-llvm-module-datalayout><code>-set-llvm-module-datalayout</code>&nbsp;<a class=headline-hash href=#-set-llvm-module-datalayout>¶</a></h3><p><em>Attach a datalayout string as a module attribute</em></p><p>Verify that the dataLayout string is a valid LLVM datalayout string and attach it as an attribute <code>LLVMDialect::getDataLayoutAttrName()</code> to the module, overriding the existing one.</p><h4 id=options-46>Options&nbsp;<a class=headline-hash href=#options-46>¶</a></h4><pre tabindex=0><code>-data-layout : String description (LLVM format) of the data layout that is expected on the produced module </code></pre><h3 id=-tosa-to-arith><code>-tosa-to-arith</code>&nbsp;<a class=headline-hash href=#-tosa-to-arith>¶</a></h3><p><em>Lower TOSA to the Arith dialect</em></p><p>Pass that converts TOSA operations to the equivalent operations using the operations in the Arith dialect. 
The ApplyScale operator is optionally included as it is often preserved until the final invocation.</p><h4 id=options-47>Options&nbsp;<a class=headline-hash href=#options-47>¶</a></h4><pre tabindex=0><code>-include-apply-rescale : Whether to include the lowering for tosa.apply_rescale to arith -use-32-bit : Whether to prioritize lowering to 32-bit operations </code></pre><h3 id=-tosa-to-linalg><code>-tosa-to-linalg</code>&nbsp;<a class=headline-hash href=#-tosa-to-linalg>¶</a></h3><p><em>Lower TOSA to LinAlg on tensors</em></p><p>Pass that converts TOSA operations to the equivalent operations using the tensor operations in LinAlg.</p><h4 id=options-48>Options&nbsp;<a class=headline-hash href=#options-48>¶</a></h4><pre tabindex=0><code>-disable-tosa-decompositions : Disable tosa decompositions pass -aggressive-reduce-constant : Always perform the reduce constant optimization </code></pre><h3 id=-tosa-to-linalg-named><code>-tosa-to-linalg-named</code>&nbsp;<a class=headline-hash href=#-tosa-to-linalg-named>¶</a></h3><p><em>Lower TOSA to LinAlg named operations</em></p><p>Pass that converts TOSA operations to the equivalent operations using the Linalg named operations.</p><h4 id=options-49>Options&nbsp;<a class=headline-hash href=#options-49>¶</a></h4><pre tabindex=0><code>-prefer-conv2d-kernel-layout-hwcf : Prefer generating linalg.conv_2d_nhwc_hwcf over linalg.conv_2d_nhwc_fhwc </code></pre><h3 id=-tosa-to-mlprogram><code>-tosa-to-mlprogram</code>&nbsp;<a class=headline-hash href=#-tosa-to-mlprogram>¶</a></h3><p><em>Lower TOSA to the MLProgram dialect</em></p><p>Pass that converts TOSA&rsquo;s variable operator operations to the equivalent MLProgram operations.</p><h3 id=-tosa-to-scf><code>-tosa-to-scf</code>&nbsp;<a class=headline-hash href=#-tosa-to-scf>¶</a></h3><p><em>Lower TOSA to the SCF dialect</em></p><p>Pass that converts TOSA&rsquo;s control flow operations to the equivalent SCF operations.</p><h3 id=-tosa-to-tensor><code>-tosa-to-tensor</code>&nbsp;<a class=headline-hash href=#-tosa-to-tensor>¶</a></h3><p><em>Lower TOSA to the Tensor dialect</em></p><p>Pass that converts TOSA operations to the equivalent operations using the operations in the Tensor dialect.</p><h2 id=acc-dialect-passes>&lsquo;acc&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#acc-dialect-passes>¶</a></h2><h3 id=-openacc-legalize-data-values><code>-openacc-legalize-data-values</code>&nbsp;<a class=headline-hash href=#-openacc-legalize-data-values>¶</a></h3><p><em>Legalizes SSA values in compute regions with results from data clause operations</em></p><p>This pass replaces uses of the <code>varPtr</code> in compute regions (kernels, parallel, serial) with the result of data clause operations (<code>accPtr</code>).</p><h4 id=options-50>Options&nbsp;<a class=headline-hash href=#options-50>¶</a></h4><pre tabindex=0><code>-host-to-device : Replace varPtr uses with accPtr if true. Replace accPtr uses with varPtr if false -apply-to-acc-data-construct : Replaces varPtr uses with accPtr for acc compute regions contained within acc.data or acc.declare region. </code></pre>
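<p>A minimal sketch of the rewrite (the data clause operation, types, and names below are illustrative assumptions, not taken from the pass documentation); with <code>-host-to-device</code> set, the use of the varPtr inside the compute region is replaced by the accPtr:</p><pre tabindex=0><code>// %dev (accPtr) is the result of a data clause operation on %host (varPtr).
%dev = acc.copyin varPtr(%host : memref&lt;10xf32&gt;) -&gt; memref&lt;10xf32&gt;
acc.parallel dataOperands(%dev : memref&lt;10xf32&gt;) {
  // Before the pass this load refers to %host; afterwards it uses %dev.
  // With -host-to-device=false the replacement runs in the other direction.
  %v = memref.load %host[%i] : memref&lt;10xf32&gt;
  acc.yield
}
</code></pre>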
<h2 id=affine-dialect-passes>&lsquo;affine&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#affine-dialect-passes>¶</a></h2><h3 id=-affine-data-copy-generate><code>-affine-data-copy-generate</code>&nbsp;<a class=headline-hash href=#-affine-data-copy-generate>¶</a></h3><p><em>Generate explicit copying for affine memory operations</em></p><h4 id=options-51>Options&nbsp;<a class=headline-hash href=#options-51>¶</a></h4><pre tabindex=0><code>-fast-mem-capacity : Set fast memory space capacity in KiB (default: unlimited) -fast-mem-space : Fast memory space identifier for copy generation (default: 1) -generate-dma : Generate DMA instead of point-wise copy -min-dma-transfer : Minimum DMA transfer size supported by the target in bytes -slow-mem-space : Slow memory space identifier for copy generation (default: 0) -skip-non-unit-stride-loops : Testing purposes: avoid non-unit stride loop choice depths for copy placement -tag-mem-space : Tag memory space identifier for copy generation (default: 0) </code></pre><h3 id=-affine-expand-index-ops><code>-affine-expand-index-ops</code>&nbsp;<a class=headline-hash href=#-affine-expand-index-ops>¶</a></h3><p><em>Lower affine operations operating on indices into more fundamental operations</em></p><h3 id=-affine-expand-index-ops-as-affine><code>-affine-expand-index-ops-as-affine</code>&nbsp;<a class=headline-hash href=#-affine-expand-index-ops-as-affine>¶</a></h3><p><em>Lower affine operations operating on indices into affine.apply operations</em></p><h3 id=-affine-loop-coalescing><code>-affine-loop-coalescing</code>&nbsp;<a class=headline-hash href=#-affine-loop-coalescing>¶</a></h3><p><em>Coalesce nested loops with independent bounds into a single loop</em></p><h3 id=-affine-loop-fusion><code>-affine-loop-fusion</code>&nbsp;<a class=headline-hash href=#-affine-loop-fusion>¶</a></h3><p><em>Fuse affine loop nests</em></p><p>This pass performs fusion of loop nests using a slicing-based approach. The transformation works at MLIR <code>Block</code> granularity and applies to all blocks the pass is run on. It combines two fusion strategies: producer-consumer fusion and sibling fusion. Producer-consumer fusion is aimed at fusing pairs of loops where the first one writes to a memref that the second reads. Sibling fusion targets pairs of loops that share no dependences between them but that load from the same memref. The fused loop nests, when possible, are rewritten to access significantly smaller local buffers instead of the original memrefs, and the latter are often either completely optimized away or contracted. This transformation leads to enhanced locality and lower memory footprint through the elimination or contraction of temporaries/intermediate memrefs. These benefits are sometimes achieved at the expense of redundant computation, guided by a cost model that evaluates available choices such as the depth at which a source slice should be materialized in the destination slice.</p><p>Example 1: Producer-consumer fusion.
Input:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@producer_consumer_fusion</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=nv>%arg1</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%cst</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>0.000000e+00</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg2</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>10</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%cst</span><span class=p>,</span> <span class=nv>%0</span><span class=p>[</span><span class=nv>%arg2</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%cst</span><span class=p>,</span> <span class=nv>%1</span><span class=p>[</span><span class=nv>%arg2</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg2</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>10</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%0</span><span class=p>[</span><span class=nv>%arg2</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%3</span> <span class=p>=</span> arith<span class=p>.</span>addf <span class=nv>%2</span><span class=p>,</span> <span class=nv>%2</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%3</span><span class=p>,</span> <span class=nv>%arg0</span><span 
class=p>[</span><span class=nv>%arg2</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg2</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>10</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%1</span><span class=p>[</span><span class=nv>%arg2</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%3</span> <span class=p>=</span> arith<span class=p>.</span>mulf <span class=nv>%2</span><span class=p>,</span> <span class=nv>%2</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%3</span><span class=p>,</span> <span class=nv>%arg1</span><span class=p>[</span><span class=nv>%arg2</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>Output:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@producer_consumer_fusion</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=nv>%arg1</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>1x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>1x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%cst</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>0.000000e+00</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg2</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>10</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span 
class=p>.</span>store <span class=nv>%cst</span><span class=p>,</span> <span class=nv>%0</span><span class=p>[</span><span class=m>0</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>1x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%cst</span><span class=p>,</span> <span class=nv>%1</span><span class=p>[</span><span class=m>0</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>1x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%1</span><span class=p>[</span><span class=m>0</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>1x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%3</span> <span class=p>=</span> arith<span class=p>.</span>mulf <span class=nv>%2</span><span class=p>,</span> <span class=nv>%2</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%3</span><span class=p>,</span> <span class=nv>%arg1</span><span class=p>[</span><span class=nv>%arg2</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%4</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%0</span><span class=p>[</span><span class=m>0</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>1x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%5</span> <span class=p>=</span> arith<span class=p>.</span>addf <span class=nv>%4</span><span class=p>,</span> <span class=nv>%4</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%5</span><span class=p>,</span> <span class=nv>%arg0</span><span class=p>[</span><span class=nv>%arg2</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>Example 2: Sibling fusion. 
Input:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@sibling_fusion</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=nv>%arg1</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg2</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=nv>%arg3</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg4</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg5</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>3</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg6</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>3</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%arg0</span><span class=p>[</span><span class=nv>%arg5</span><span class=p>,</span> <span class=nv>%arg6</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%arg1</span><span class=p>[</span><span class=nv>%arg5</span><span class=p>,</span> <span class=nv>%arg6</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> arith<span class=p>.</span>mulf <span class=nv>%0</span><span class=p>,</span> <span class=nv>%1</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%2</span><span class=p>,</span> <span class=nv>%arg3</span><span class=p>[</span><span class=nv>%arg5</span><span class=p>,</span> <span class=nv>%arg6</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg5</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>3</span> <span 
class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg6</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>3</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%arg0</span><span class=p>[</span><span class=nv>%arg5</span><span class=p>,</span> <span class=nv>%arg6</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%arg2</span><span class=p>[</span><span class=nv>%arg5</span><span class=p>,</span> <span class=nv>%arg6</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> arith<span class=p>.</span>addf <span class=nv>%0</span><span class=p>,</span> <span class=nv>%1</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%2</span><span class=p>,</span> <span class=nv>%arg4</span><span class=p>[</span><span class=nv>%arg5</span><span class=p>,</span> <span class=nv>%arg6</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>Output:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@sibling_fusion</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=nv>%arg1</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg2</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=nv>%arg3</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg4</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg5</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>3</span> <span 
class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg6</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>3</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%arg0</span><span class=p>[</span><span class=nv>%arg5</span><span class=p>,</span> <span class=nv>%arg6</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%arg1</span><span class=p>[</span><span class=nv>%arg5</span><span class=p>,</span> <span class=nv>%arg6</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> arith<span class=p>.</span>mulf <span class=nv>%0</span><span class=p>,</span> <span class=nv>%1</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%2</span><span class=p>,</span> <span class=nv>%arg3</span><span class=p>[</span><span class=nv>%arg5</span><span class=p>,</span> <span class=nv>%arg6</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%3</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%arg0</span><span class=p>[</span><span class=nv>%arg5</span><span class=p>,</span> <span class=nv>%arg6</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%4</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%arg2</span><span class=p>[</span><span class=nv>%arg5</span><span class=p>,</span> <span class=nv>%arg6</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%5</span> <span class=p>=</span> arith<span class=p>.</span>addf <span class=nv>%3</span><span class=p>,</span> <span class=nv>%4</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%5</span><span class=p>,</span> <span class=nv>%arg4</span><span class=p>[</span><span class=nv>%arg5</span><span class=p>,</span> <span class=nv>%arg6</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> </span></span><span class=line><span 
class=cl><span class=p>}</span> </span></span></code></pre></div><h4 id=options-52>Options&nbsp;<a class=headline-hash href=#options-52>¶</a></h4><pre tabindex=0><code>-compute-tolerance : Fractional increase in additional computation tolerated while fusing -fast-mem-space : Faster memory space number to promote fusion buffers to -local-buf-threshold : Threshold size (KiB) for promoting local buffers to fast memory space -maximal : Enables maximal loop fusion -mode : Fusion mode to attempt </code></pre><h3 id=-affine-loop-invariant-code-motion><code>-affine-loop-invariant-code-motion</code>&nbsp;<a class=headline-hash href=#-affine-loop-invariant-code-motion>¶</a></h3><p><em>Hoist loop invariant instructions outside of affine loops</em></p><h3 id=-affine-loop-normalize><code>-affine-loop-normalize</code>&nbsp;<a class=headline-hash href=#-affine-loop-normalize>¶</a></h3><p><em>Apply normalization transformations to affine loop-like ops</em></p><h4 id=options-53>Options&nbsp;<a class=headline-hash href=#options-53>¶</a></h4><pre tabindex=0><code>-promote-single-iter : Promote single iteration loops </code></pre><h3 id=-affine-loop-tile><code>-affine-loop-tile</code>&nbsp;<a class=headline-hash href=#-affine-loop-tile>¶</a></h3><p><em>Tile affine loop nests</em></p><h4 id=options-54>Options&nbsp;<a class=headline-hash href=#options-54>¶</a></h4><pre tabindex=0><code>-cache-size : Set size of cache to tile for in KiB (default: 512) -separate : Separate full and partial tiles (default: false) -tile-size : Use this tile size for all loops -tile-sizes : List of tile sizes for each perfect nest (overridden by -tile-size) </code></pre><h3 id=-affine-loop-unroll><code>-affine-loop-unroll</code>&nbsp;<a class=headline-hash href=#-affine-loop-unroll>¶</a></h3><p><em>Unroll affine loops</em></p><h4 id=options-55>Options&nbsp;<a class=headline-hash href=#options-55>¶</a></h4><pre tabindex=0><code>-unroll-factor : Use this unroll factor for all loops being unrolled -unroll-up-to-factor : Allow unrolling up to the factor specified -unroll-full : Fully unroll loops -unroll-num-reps : Unroll innermost loops repeatedly this many times -unroll-full-threshold : Unroll all loops with trip count less than or equal to this -cleanup-unroll : Fully unroll the cleanup loop when possible. </code></pre><h3 id=-affine-loop-unroll-jam><code>-affine-loop-unroll-jam</code>&nbsp;<a class=headline-hash href=#-affine-loop-unroll-jam>¶</a></h3><p><em>Unroll and jam affine loops</em></p><h4 id=options-56>Options&nbsp;<a class=headline-hash href=#options-56>¶</a></h4><pre tabindex=0><code>-unroll-jam-factor : Use this unroll jam factor for all loops (default 4) </code></pre><h3 id=-affine-parallelize><code>-affine-parallelize</code>&nbsp;<a class=headline-hash href=#-affine-parallelize>¶</a></h3><p><em>Convert affine.for ops into 1-D affine.parallel</em></p><h4 id=options-57>Options&nbsp;<a class=headline-hash href=#options-57>¶</a></h4><pre tabindex=0><code>-max-nested : Maximum number of nested parallel loops to produce. Defaults to unlimited (UINT_MAX). -parallel-reductions : Whether to parallelize reduction loops. Defaults to false.
</code></pre><h3 id=-affine-pipeline-data-transfer><code>-affine-pipeline-data-transfer</code>&nbsp;<a class=headline-hash href=#-affine-pipeline-data-transfer>¶</a></h3><p><em>Pipeline non-blocking data transfers between explicitly managed levels of the memory hierarchy</em></p><p>This pass performs a transformation to overlap non-blocking DMA operations in a loop with computations through double buffering. This is achieved by advancing dma_start operations with respect to other operations.</p><p>Input</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@pipelinedatatransfer</span><span class=p>()</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>256x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>32x</span><span class=k>f32</span><span class=p>,</span> <span class=m>1</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>1x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%c0</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>0</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%c128</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>128</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%i0</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>8</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>dma_start <span class=nv>%0</span><span class=p>[</span><span class=nv>%i0</span><span class=p>],</span> <span class=nv>%1</span><span class=p>[</span><span class=nv>%i0</span><span class=p>],</span> <span class=nv>%2</span><span class=p>[</span><span class=nv>%c0</span><span class=p>],</span> <span class=nv>%c128</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>256x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>32x</span><span class=k>f32</span><span class=p>,</span> <span class=m>1</span><span class=p>&gt;,</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>1x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>dma_wait <span class=nv>%2</span><span class=p>[</span><span class=nv>%c0</span><span class=p>],</span> <span class=nv>%c128</span> <span class=p>:</span> 
<span class=kt>memref</span><span class=p>&lt;</span><span class=m>1x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%3</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%1</span><span class=p>[</span><span class=nv>%i0</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>32x</span><span class=k>f32</span><span class=p>,</span> <span class=m>1</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%4</span> <span class=p>=</span> <span class=s>&#34;compute&#34;</span><span class=p>(</span><span class=nv>%3</span><span class=p>)</span> <span class=p>:</span> <span class=p>(</span><span class=k>f32</span><span class=p>)</span> <span class=p>-&gt;</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%4</span><span class=p>,</span> <span class=nv>%1</span><span class=p>[</span><span class=nv>%i0</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>32x</span><span class=k>f32</span><span class=p>,</span> <span class=m>1</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>Output</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl>module <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@pipelinedatatransfer</span><span class=p>()</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%c8</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>8</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%c0</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>0</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>256x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%c0_0</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>0</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%c128</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>128</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x32x</span><span class=k>f32</span><span class=p>,</span> <span class=m>1</span><span class=p>&gt;</span> 
</span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x1x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>dma_start <span class=nv>%0</span><span class=p>[</span><span class=nv>%c0</span><span class=p>],</span> <span class=nv>%1</span><span class=p>[</span><span class=nv>%c0</span> mod <span class=m>2</span><span class=p>,</span> <span class=nv>%c0</span><span class=p>],</span> <span class=nv>%2</span><span class=p>[</span><span class=nv>%c0</span> mod <span class=m>2</span><span class=p>,</span> symbol<span class=p>(</span><span class=nv>%c0_0</span><span class=p>)],</span> <span class=nv>%c128</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>256x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x32x</span><span class=k>f32</span><span class=p>,</span> <span class=m>1</span><span class=p>&gt;,</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x1x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg0</span> <span class=p>=</span> <span class=m>1</span> to <span class=m>8</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>dma_start <span class=nv>%0</span><span class=p>[</span><span class=nv>%arg0</span><span class=p>],</span> <span class=nv>%1</span><span class=p>[</span><span class=nv>%arg0</span> mod <span class=m>2</span><span class=p>,</span> <span class=nv>%arg0</span><span class=p>],</span> <span class=nv>%2</span><span class=p>[</span><span class=nv>%arg0</span> mod <span class=m>2</span><span class=p>,</span> symbol<span class=p>(</span><span class=nv>%c0_0</span><span class=p>)],</span> <span class=nv>%c128</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>256x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x32x</span><span class=k>f32</span><span class=p>,</span> <span class=m>1</span><span class=p>&gt;,</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x1x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%8</span> <span class=p>=</span> affine<span class=p>.</span>apply <span class=nv>#map3</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>)</span> </span></span><span class=line><span class=cl> <span class=nv>%9</span> <span class=p>=</span> affine<span class=p>.</span>apply <span class=nv>#map4</span><span class=p>(</span><span class=nv>%8</span><span class=p>)</span> </span></span><span class=line><span class=cl> <span class=nv>%10</span> <span class=p>=</span> affine<span class=p>.</span>apply <span class=nv>#map4</span><span class=p>(</span><span class=nv>%8</span><span class=p>)</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>dma_wait <span class=nv>%2</span><span class=p>[</span><span class=nv>%8</span> mod <span class=m>2</span><span class=p>,</span> symbol<span class=p>(</span><span class=nv>%c0_0</span><span 
class=p>)],</span> <span class=nv>%c128</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x1x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%11</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%1</span><span class=p>[</span><span class=nv>%8</span> mod <span class=m>2</span><span class=p>,</span> <span class=nv>%8</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x32x</span><span class=k>f32</span><span class=p>,</span> <span class=m>1</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%12</span> <span class=p>=</span> <span class=s>&#34;compute&#34;</span><span class=p>(</span><span class=nv>%11</span><span class=p>)</span> <span class=p>:</span> <span class=p>(</span><span class=k>f32</span><span class=p>)</span> <span class=p>-&gt;</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%12</span><span class=p>,</span> <span class=nv>%1</span><span class=p>[</span><span class=nv>%8</span> mod <span class=m>2</span><span class=p>,</span> <span class=nv>%8</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x32x</span><span class=k>f32</span><span class=p>,</span> <span class=m>1</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=nv>%3</span> <span class=p>=</span> affine<span class=p>.</span>apply <span class=nv>#map3</span><span class=p>(</span><span class=nv>%c8</span><span class=p>)</span> </span></span><span class=line><span class=cl> <span class=nv>%4</span> <span class=p>=</span> affine<span class=p>.</span>apply <span class=nv>#map4</span><span class=p>(</span><span class=nv>%3</span><span class=p>)</span> </span></span><span class=line><span class=cl> <span class=nv>%5</span> <span class=p>=</span> affine<span class=p>.</span>apply <span class=nv>#map4</span><span class=p>(</span><span class=nv>%3</span><span class=p>)</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>dma_wait <span class=nv>%2</span><span class=p>[</span><span class=nv>%3</span> mod <span class=m>2</span><span class=p>,</span> symbol<span class=p>(</span><span class=nv>%c0_0</span><span class=p>)],</span> <span class=nv>%c128</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x1x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%6</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%1</span><span class=p>[</span><span class=nv>%3</span> mod <span class=m>2</span><span class=p>,</span> <span class=nv>%3</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x32x</span><span class=k>f32</span><span class=p>,</span> <span class=m>1</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%7</span> <span class=p>=</span> <span class=s>&#34;compute&#34;</span><span class=p>(</span><span class=nv>%6</span><span class=p>)</span> <span class=p>:</span> <span class=p>(</span><span class=k>f32</span><span class=p>)</span> <span class=p>-&gt;</span> 
<span class=k>f32</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%7</span><span class=p>,</span> <span class=nv>%1</span><span class=p>[</span><span class=nv>%3</span> mod <span class=m>2</span><span class=p>,</span> <span class=nv>%3</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x32x</span><span class=k>f32</span><span class=p>,</span> <span class=m>1</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=kt>memref</span><span class=p>.</span>dealloc <span class=nv>%2</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x1x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=kt>memref</span><span class=p>.</span>dealloc <span class=nv>%1</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>2x32x</span><span class=k>f32</span><span class=p>,</span> <span class=m>1</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><h3 id=-affine-scalrep><code>-affine-scalrep</code>&nbsp;<a class=headline-hash href=#-affine-scalrep>¶</a></h3><p><em>Replace affine memref accesses by scalars by forwarding stores to loads and eliminating redundant loads</em></p><p>This pass performs store to load forwarding and redundant load elimination for affine memref accesses and potentially eliminates the entire memref if all its accesses are forwarded.</p><p>Input</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@store_load_affine_apply</span><span class=p>()</span> <span class=p>-&gt;</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%cf7</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>7.0</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> <span class=nv>%m</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%i0</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>10</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%i1</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>10</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%cf7</span><span class=p>,</span> <span class=nv>%m</span><span class=p>[</span><span class=nv>%i0</span><span class=p>,</span> <span class=nv>%i1</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span 
class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%v0</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%m</span><span class=p>[</span><span class=nv>%i0</span><span class=p>,</span> <span class=nv>%i1</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%v1</span> <span class=p>=</span> arith<span class=p>.</span>addf <span class=nv>%v0</span><span class=p>,</span> <span class=nv>%v0</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> <span class=nv>%m</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>Output</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl>module <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@store_load_affine_apply</span><span class=p>()</span> <span class=p>-&gt;</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%cst</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>7.000000e+00</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg0</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>10</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg1</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>10</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%cst</span><span class=p>,</span> <span class=nv>%0</span><span class=p>[</span><span class=nv>%arg0</span><span class=p>,</span> <span class=nv>%arg1</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> arith<span class=p>.</span>addf <span class=nv>%cst</span><span class=p>,</span> <span class=nv>%cst</span> <span class=p>:</span> <span class=k>f32</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span 
class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> <span class=nv>%0</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>10x10x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><h3 id=-affine-simplify-structures><code>-affine-simplify-structures</code>&nbsp;<a class=headline-hash href=#-affine-simplify-structures>¶</a></h3><p><em>Simplify affine expressions in maps/sets and normalize memrefs</em></p><h3 id=-affine-super-vectorize><code>-affine-super-vectorize</code>&nbsp;<a class=headline-hash href=#-affine-super-vectorize>¶</a></h3><p><em>Vectorize to a target independent n-D vector abstraction</em></p><h4 id=options-58>Options&nbsp;<a class=headline-hash href=#options-58>¶</a></h4><pre tabindex=0><code>-virtual-vector-size : Specify an n-D virtual vector size for vectorization. This must be greater than zero. -test-fastest-varying : Specify a 1-D, 2-D or 3-D pattern of fastest varying memory dimensions to match. See defaultPatterns in Vectorize.cpp for a description and examples. This is used for testing purposes -vectorize-reductions : Vectorize known reductions expressed via iter_args. Switched off by default. </code></pre><h2 id=amdgpu-dialect-passes>&lsquo;amdgpu&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#amdgpu-dialect-passes>¶</a></h2><h3 id=-amdgpu-emulate-atomics><code>-amdgpu-emulate-atomics</code>&nbsp;<a class=headline-hash href=#-amdgpu-emulate-atomics>¶</a></h3><p><em>Emulate atomic operations on chipsets that do not support them</em></p><p>This pass rewrites any AMDGPU-specific atomic operation that is not supported on the given <code>chipset</code> into a compare-and-swap loop.</p><h4 id=options-59>Options&nbsp;<a class=headline-hash href=#options-59>¶</a></h4><pre tabindex=0><code>-chipset : Chipset that these operations will run on </code></pre><h3 id=-amdgpu-resolve-strided-metadata><code>-amdgpu-resolve-strided-metadata</code>&nbsp;<a class=headline-hash href=#-amdgpu-resolve-strided-metadata>¶</a></h3><p><em>Resolve memref.extract_strided_metadata on AMDGPU ops</em></p><p>This pass rewrites <code>memref.extract_strided_metadata</code> operations that target AMDGPU dialect cast operations.</p><p>The patterns in this pass should normally be run alongside those in -expand-strided-metadata, and creating a pass that combines those two sets of patterns is the recommended way to use this functionality. However, this pass (which will likely need a second -expand-strided-metadata after it) is provided so that simple use cases do not need to create custom passes. These patterns have not been added to -expand-strided-metadata to prevent the memref dialect from depending on platform-specific code.</p><h3 id=-amdgpu-transfer-read-to-load><code>-amdgpu-transfer-read-to-load</code>&nbsp;<a class=headline-hash href=#-amdgpu-transfer-read-to-load>¶</a></h3><p><em>Lower vector transfer_read operations to vector load</em></p><p>This pass creates a lowering of vector transfer_read ops: a transfer_read is lowered to a combination of vector.load, arith.select and vector.broadcast.</p><p>This makes it possible for a masked transfer_read to be lowered to a buffer load with a bounds check, allowing a more optimized global load access pattern than the existing lowering through llvm.intr.masked.load on vectors.</p>
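<p>A schematic sketch of the rewrite (the memref type is illustrative, and <code>%in_bounds</code> stands in for the bounds check that the pass materializes):</p><pre tabindex=0><code>// Illustrative input: a transfer_read with a padding value %pad.
%v = vector.transfer_read %mem[%i], %pad : memref&lt;64xf32&gt;, vector&lt;4xf32&gt;

// After the pass, roughly:
%loaded  = vector.load %mem[%i] : memref&lt;64xf32&gt;, vector&lt;4xf32&gt;
%pad_vec = vector.broadcast %pad : f32 to vector&lt;4xf32&gt;
%v       = arith.select %in_bounds, %loaded, %pad_vec : vector&lt;4xf32&gt;
</code></pre>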
<p>This pattern makes it possible for a masked transfer_read to be lowered towards a buffer load with bounds checking, allowing a more optimized global load access pattern than the existing lowering through llvm.intr.masked.load on vectors.</p><h2 id=arith-dialect-passes>&lsquo;arith&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#arith-dialect-passes>¶</a></h2><h3 id=-arith-emulate-unsupported-floats><code>-arith-emulate-unsupported-floats</code>&nbsp;<a class=headline-hash href=#-arith-emulate-unsupported-floats>¶</a></h3><p><em>Emulate operations on unsupported floats with extf/truncf</em></p><p>Emulate arith and vector floating point operations that use float types which are unsupported on a target by inserting extf/truncf pairs around all such operations in order to produce arithmetic that can be performed while preserving the original rounding behavior.</p><p>This pass does not attempt to reason about the operations being performed to determine when type conversions can be elided.</p><h4 id=options-60>Options&nbsp;<a class=headline-hash href=#options-60>¶</a></h4><pre tabindex=0><code>-source-types : MLIR types without arithmetic support on a given target -target-type : MLIR type to convert the unsupported source types to </code></pre><h3 id=-arith-emulate-wide-int><code>-arith-emulate-wide-int</code>&nbsp;<a class=headline-hash href=#-arith-emulate-wide-int>¶</a></h3><p><em>Emulate 2*N-bit integer operations using N-bit operations</em></p><p>Emulate arith integer operations that use too wide integer types with equivalent operations on supported narrow integer types. This is done by splitting original integer values into two halves.</p><p>This pass is intended to preserve semantics but not necessarily to provide the most efficient implementation. TODO: Optimize op emulation.</p><p>Currently, only power-of-two integer bitwidths are supported.</p><h4 id=options-61>Options&nbsp;<a class=headline-hash href=#options-61>¶</a></h4><pre tabindex=0><code>-widest-int-supported : Widest integer type supported by the target </code></pre><h3 id=-arith-expand><code>-arith-expand</code>&nbsp;<a class=headline-hash href=#-arith-expand>¶</a></h3><p><em>Legalize Arith ops to be convertible to LLVM.</em></p><h4 id=options-62>Options&nbsp;<a class=headline-hash href=#options-62>¶</a></h4><pre tabindex=0><code>-include-bf16 : Enable the BF16 expansion patterns </code></pre><h3 id=-arith-int-range-narrowing><code>-arith-int-range-narrowing</code>&nbsp;<a class=headline-hash href=#-arith-int-range-narrowing>¶</a></h3><p><em>Reduce integer operations bitwidth based on integer range analysis</em></p><p>This pass runs integer range analysis and tries to narrow arith ops to the specified bitwidth based on its results.</p><p><code>bitwidthsSupported</code> is assumed to be no wider than the <code>index</code> type. TODO: get index width from DLTI.</p>
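<p>As a hedged sketch (the values and types here are hypothetical), if range analysis proves that <code>%a</code>, <code>%b</code>, and their sum all fit in 32 bits, an <code>i64</code> addition could be narrowed to an <code>i32</code> one:</p><pre tabindex=0><code>// Before: %a and %b are proven by range analysis to fit in 32 bits.
%sum = arith.addi %a, %b : i64

// After: compute in i32, then widen the result back.
%a32 = arith.trunci %a : i64 to i32
%b32 = arith.trunci %b : i64 to i32
%sum32 = arith.addi %a32, %b32 : i32
%sum = arith.extsi %sum32 : i32 to i64
</code></pre>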
<h4 id=options-63>Options&nbsp;<a class=headline-hash href=#options-63>¶</a></h4><pre tabindex=0><code>-int-bitwidths-supported : Integer bitwidths supported </code></pre><h3 id=-arith-unsigned-when-equivalent><code>-arith-unsigned-when-equivalent</code>&nbsp;<a class=headline-hash href=#-arith-unsigned-when-equivalent>¶</a></h3><p><em>Replace signed ops with unsigned ones where they are proven equivalent</em></p><p>Replace signed ops with their unsigned equivalents when integer range analysis determines that their arguments and results are all guaranteed to be non-negative when interpreted as signed integers. When this occurs, we know that the semantics of the signed and unsigned operations are the same, since they share the same behavior when their operands and results are in the range [0, signed_max(type)].</p><p>The affected ops include division, remainder, shifts, min, max, and integer comparisons.</p><h3 id=-int-range-optimizations><code>-int-range-optimizations</code>&nbsp;<a class=headline-hash href=#-int-range-optimizations>¶</a></h3><p><em>Do optimizations based on integer range analysis</em></p><p>This pass runs integer range analysis and applies optimizations based on its results. It replaces operations with known-constant results with said constants, and rewrites <code>(0 &lt;= %x &lt; D) mod D</code> to <code>%x</code>.</p><h2 id=arm_sme-dialect-passes>&lsquo;arm_sme&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#arm_sme-dialect-passes>¶</a></h2><h3 id=-arm-sme-outer-product-fusion><code>-arm-sme-outer-product-fusion</code>&nbsp;<a class=headline-hash href=#-arm-sme-outer-product-fusion>¶</a></h3><p><em>Fuse &lsquo;arm_sme.outerproduct&rsquo; operations into 2-way or 4-way widening variants</em></p><p>This pass fuses &lsquo;arm_sme.outerproduct&rsquo; operations that are chained via the accumulator into 2-way or 4-way ArmSME outer product operations.</p><p>For example:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>%a0_ext</span> <span class=p>=</span> arith<span class=p>.</span>extf <span class=nv>%a0</span> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xf16<span class=p>&gt;</span> to <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xf32<span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%b0_ext</span> <span class=p>=</span> arith<span class=p>.</span>extf <span class=nv>%b0</span> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xf16<span class=p>&gt;</span> to <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xf32<span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%a1_ext</span> <span class=p>=</span> arith<span class=p>.</span>extf <span class=nv>%a1</span> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xf16<span class=p>&gt;</span> to <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xf32<span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%b1_ext</span> <span class=p>=</span> arith<span class=p>.</span>extf <span class=nv>%b1</span> <span class=p>:</span> <span
class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xf16<span class=p>&gt;</span> to <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xf32<span class=p>&gt;</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl><span class=nv>%0</span> <span class=p>=</span> arm_sme<span class=p>.</span>outerproduct <span class=nv>%a0_ext</span><span class=p>,</span> <span class=nv>%b0_ext</span> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xf32<span class=p>&gt;,</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xf32<span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%1</span> <span class=p>=</span> arm_sme<span class=p>.</span>outerproduct <span class=nv>%a1_ext</span><span class=p>,</span> <span class=nv>%b1_ext</span> acc<span class=p>(</span><span class=nv>%0</span><span class=p>)</span> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xf32<span class=p>&gt;,</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xf32<span class=p>&gt;</span> </span></span></code></pre></div><p>Becomes:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>%a_packed</span> <span class=p>=</span> <span class=kt>vector</span><span class=p>.</span>interleave <span class=nv>%a0</span><span class=p>,</span> <span class=nv>%a1</span> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xf16<span class=p>&gt;</span> <span class=p>-&gt;</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>8</span><span class=p>]</span>xf16<span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%b_packed</span> <span class=p>=</span> <span class=kt>vector</span><span class=p>.</span>interleave <span class=nv>%b0</span><span class=p>,</span> <span class=nv>%b1</span> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xf16<span class=p>&gt;</span> <span class=p>-&gt;</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>8</span><span class=p>]</span>xf16<span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%0</span> <span class=p>=</span> arm_sme<span class=p>.</span>fmopa_2way <span class=nv>%a_packed</span><span class=p>,</span> <span class=nv>%b_packed</span> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>8</span><span class=p>]</span>xf16<span class=p>&gt;,</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>8</span><span class=p>]</span>xf16<span class=p>&gt;</span> into <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]x[</span><span class=m>4</span><span class=p>]</span>xf32<span class=p>&gt;</span> </span></span></code></pre></div><p>For further information on the 2-way or 4-way widening ops see: <a href=https://mlir.llvm.org/docs/Dialects/ArmSME/#arm_smefmopa_2way-arm_smefmopa_2wayop>https://mlir.llvm.org/docs/Dialects/ArmSME/#arm_smefmopa_2way-arm_smefmopa_2wayop</a> <a 
href=https://mlir.llvm.org/docs/Dialects/ArmSME/#arm_smesmopa_4way-arm_smesmopa_4wayop>https://mlir.llvm.org/docs/Dialects/ArmSME/#arm_smesmopa_4way-arm_smesmopa_4wayop</a></p><h3 id=-arm-sme-vector-legalization><code>-arm-sme-vector-legalization</code>&nbsp;<a class=headline-hash href=#-arm-sme-vector-legalization>¶</a></h3><p><em>Legalize vectors for ArmSME</em></p><p>This pass legalizes vector operations so that they can be lowered to ArmSME. This includes decomposing operations that operate on vector types larger than a single SME tile (e.g. <code>vector&lt;[8]x[8]xf32></code>) into multiple SME tile-sized operations, as well as rewrites needed to get operations into forms compatible with SME lowerings.</p><p>Note: Decomposition is currently limited to vector types that are an exact multiple of SME tiles. That is, vector types scalable in two dimensions, with both the rows and columns divisible by the SVE vector length for the element type.</p><h3 id=-enable-arm-streaming><code>-enable-arm-streaming</code>&nbsp;<a class=headline-hash href=#-enable-arm-streaming>¶</a></h3><p><em>Enable Armv9 Streaming SVE mode</em></p><p>Enables the Armv9 Streaming SVE mode [1] for func.func ops by annotating them with attributes. See options for more details.</p><p>[1] <a href=https://developer.arm.com/documentation/ddi0616/aa>https://developer.arm.com/documentation/ddi0616/aa</a></p><h4 id=options-64>Options&nbsp;<a class=headline-hash href=#options-64>¶</a></h4><pre tabindex=0><code>-streaming-mode : Select how streaming-mode is managed at the function-level. -za-mode : Select how ZA-storage is managed at the function-level. -if-required-by-ops : Only apply the selected streaming/ZA modes if the function contains ops that implement the ArmSMETileOpInterface. -if-scalable-and-supported : Only apply the selected streaming/ZA modes if the function contains supported scalable vector operations. </code></pre><h3 id=-test-arm-sme-tile-allocation><code>-test-arm-sme-tile-allocation</code>&nbsp;<a class=headline-hash href=#-test-arm-sme-tile-allocation>¶</a></h3><p><em>Tests SME &lsquo;virtual tile&rsquo; allocation</em></p><p>This pass does tile allocation for SME &ldquo;virtual tiles&rdquo;. It is run at the &lsquo;func.func&rsquo; op level, and assigns tile IDs (via an attribute) to all ops that implement the <code>ArmSMETileOpInterface</code>. Note: This pass is only intended to be used for testing; tile allocation is done as part of the ArmSME to LLVM conversion (<code>convert-arm-sme-to-llvm</code>).</p><h4 id=options-65>Options&nbsp;<a class=headline-hash href=#options-65>¶</a></h4><pre tabindex=0><code>-dump-tile-live-ranges : Dump the live ranges of SME tiles (for debugging) -preprocess-only : Only preprocess IR so it is ready for tile allocation (but do not allocate any tiles) </code></pre><h2 id=arm_sve-dialect-passes>&lsquo;arm_sve&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#arm_sve-dialect-passes>¶</a></h2><h3 id=-arm-sve-legalize-vector-storage><code>-arm-sve-legalize-vector-storage</code>&nbsp;<a class=headline-hash href=#-arm-sve-legalize-vector-storage>¶</a></h3><p><em>Ensures stores of SVE vector types will be legal</em></p><p>This pass ensures that loads, stores, and allocations of SVE vector types will be legal in the LLVM backend.
It does this at the memref level, so this pass must be applied before lowering all the way to LLVM.</p><p>This pass currently addresses two issues.</p><h4 id=loading-and-storing-predicate-types>Loading and storing predicate types&nbsp;<a class=headline-hash href=#loading-and-storing-predicate-types>¶</a></h4><p>It is only legal to load/store predicate types equal to (or greater than) a full predicate register, which in MLIR is <code>vector&lt;[16]xi1></code>. Smaller predicate types (<code>vector&lt;[1|2|4|8]xi1></code>) must be converted to/from a full predicate type (referred to as a <code>svbool</code>) before and after storing and loading respectively. This pass does this by widening allocations and inserting conversion intrinsics. Note: Non-powers-of-two masks (e.g. <code>vector&lt;[7]xi1></code>), which are not SVE predicates, are ignored.</p><p>For example:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>%alloca</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloca<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xi1<span class=p>&gt;&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%mask</span> <span class=p>=</span> <span class=kt>vector</span><span class=p>.</span><span class=kt>constant</span>_mask <span class=p>[</span><span class=m>4</span><span class=p>]</span> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xi1<span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=kt>memref</span><span class=p>.</span>store <span class=nv>%mask</span><span class=p>,</span> <span class=nv>%alloca</span><span class=p>[]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xi1<span class=p>&gt;&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%reload</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>load <span class=nv>%alloca</span><span class=p>[]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xi1<span class=p>&gt;&gt;</span> </span></span></code></pre></div><p>Becomes:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>%alloca</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloca<span class=p>()</span> <span class=p>{</span><span class=nl>alignment =</span> <span class=m>1</span> <span class=p>:</span> <span class=k>i64</span><span class=p>}</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=kt>vector</span><span class=p>&lt;[</span><span class=m>16</span><span class=p>]</span>xi1<span class=p>&gt;&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%mask</span> <span class=p>=</span> <span class=kt>vector</span><span class=p>.</span><span class=kt>constant</span>_mask <span class=p>[</span><span class=m>4</span><span class=p>]</span> <span class=p>:</span> <span class=kt>vector</span><span 
class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xi1<span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%svbool</span> <span class=p>=</span> arm_sve<span class=p>.</span>convert_to_svbool <span class=nv>%mask</span> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xi1<span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=kt>memref</span><span class=p>.</span>store <span class=nv>%svbool</span><span class=p>,</span> <span class=nv>%alloca</span><span class=p>[]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=kt>vector</span><span class=p>&lt;[</span><span class=m>16</span><span class=p>]</span>xi1<span class=p>&gt;&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%reload_svbool</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>load <span class=nv>%alloca</span><span class=p>[]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=kt>vector</span><span class=p>&lt;[</span><span class=m>16</span><span class=p>]</span>xi1<span class=p>&gt;&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%reload</span> <span class=p>=</span> arm_sve<span class=p>.</span>convert_from_svbool <span class=nv>%reload_svbool</span> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;[</span><span class=m>4</span><span class=p>]</span>xi1<span class=p>&gt;</span> </span></span></code></pre></div><h4 id=relax-alignments-for-sve-vector-allocas>Relax alignments for SVE vector allocas&nbsp;<a class=headline-hash href=#relax-alignments-for-sve-vector-allocas>¶</a></h4><p>The storage for SVE vector types only needs to have an alignment that matches the element type (for example 4 byte alignment for <code>f32</code>s). However, the LLVM backend currently defaults to aligning to <code>base size</code> x <code>element size</code> bytes. For non-legal vector types like <code>vector&lt;[8]xf32></code> this results in 8 x 4 = 32-byte alignment, but the backend only supports up to 16-byte alignment for SVE vectors on the stack. Explicitly setting a smaller alignment prevents this issue.</p><h2 id=async-dialect-passes>&lsquo;async&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#async-dialect-passes>¶</a></h2><h3 id=-async-func-to-async-runtime><code>-async-func-to-async-runtime</code>&nbsp;<a class=headline-hash href=#-async-func-to-async-runtime>¶</a></h3><p><em>Lower async.func operations to the explicit async.runtime and async.coro operations</em></p><h3 id=-async-parallel-for><code>-async-parallel-for</code>&nbsp;<a class=headline-hash href=#-async-parallel-for>¶</a></h3><p><em>Convert scf.parallel operations to multiple async compute ops executed concurrently for non-overlapping iteration ranges</em></p><h4 id=options-66>Options&nbsp;<a class=headline-hash href=#options-66>¶</a></h4><pre tabindex=0><code>-async-dispatch : Dispatch async compute tasks using recursive work splitting. If `false` async compute tasks will be launched using a simple for loop in the caller thread. -num-workers : The number of available workers to execute async operations. If `-1` the value will be retrieved from the runtime. -min-task-size : The minimum task size for sharding the parallel operation.
</code></pre><h3 id=-async-runtime-policy-based-ref-counting><code>-async-runtime-policy-based-ref-counting</code>&nbsp;<a class=headline-hash href=#-async-runtime-policy-based-ref-counting>¶</a></h3><p><em>Policy based reference counting for Async runtime operations</em></p><p>This pass works at the async runtime abstraction level, after all <code>async.execute</code> and <code>async.await</code> operations are lowered to the async runtime API calls, and async coroutine operations.</p><p>This pass doesn&rsquo;t rely on liveness analysis of reference-counted values, and instead uses a simple policy to create reference counting operations. If the program violates any of the assumptions, then this pass might lead to memory leaks or runtime errors.</p><p>The default reference counting policy assumptions:</p><ol><li>Async token can be awaited or added to the group only once.</li><li>Async value or group can be awaited only once.</li></ol><p>Under these assumptions, reference counting only needs to drop a reference:</p><ol><li>After <code>async.runtime.await</code> operation for async tokens and groups (as error handling is not yet implemented for the sync await).</li><li>After <code>async.runtime.is_error</code> operation for async tokens and groups (this is the last operation in the coroutine resume function).</li><li>After <code>async.runtime.load</code> operation for async values.</li></ol><p>This pass introduces significantly less runtime overhead compared to the automatic reference counting.</p><h3 id=-async-runtime-ref-counting><code>-async-runtime-ref-counting</code>&nbsp;<a class=headline-hash href=#-async-runtime-ref-counting>¶</a></h3><p><em>Automatic reference counting for Async runtime operations</em></p><p>This pass works at the async runtime abstraction level, after all <code>async.execute</code> and <code>async.await</code> operations are lowered to the async runtime API calls, and async coroutine operations.</p><p>It relies on the LLVM coroutines switched-resume lowering semantics for the correct placing of the reference counting operations.</p><p>See: <a href=https://llvm.org/docs/Coroutines.html#switched-resume-lowering>https://llvm.org/docs/Coroutines.html#switched-resume-lowering</a></p><h3 id=-async-runtime-ref-counting-opt><code>-async-runtime-ref-counting-opt</code>&nbsp;<a class=headline-hash href=#-async-runtime-ref-counting-opt>¶</a></h3><p><em>Optimize automatic reference counting operations for the Async runtime by removing redundant operations</em></p><h3 id=-async-to-async-runtime><code>-async-to-async-runtime</code>&nbsp;<a class=headline-hash href=#-async-to-async-runtime>¶</a></h3><p><em>Lower all high level async operations (e.g.
async.execute) to the explicit async.runtime and async.coro operations</em></p><h2 id=emitc-dialect-passes>&lsquo;emitc&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#emitc-dialect-passes>¶</a></h2><h3 id=-form-expressions><code>-form-expressions</code>&nbsp;<a class=headline-hash href=#-form-expressions>¶</a></h3><p><em>Form C-style expressions from C-operator ops</em></p><p>The pass wraps emitc ops modelling C operators in emitc.expression ops and then folds single-use expressions into their users where possible.</p><h2 id=func-dialect-passes>&lsquo;func&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#func-dialect-passes>¶</a></h2><h3 id=-duplicate-function-elimination><code>-duplicate-function-elimination</code>&nbsp;<a class=headline-hash href=#-duplicate-function-elimination>¶</a></h3><p><em>Deduplicate functions</em></p><p>Deduplicate functions that are equivalent in all aspects but their symbol name. The pass chooses one representative per equivalence class, erases the remainder, and updates function calls accordingly.</p><h2 id=gpu-dialect-passes>&lsquo;gpu&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#gpu-dialect-passes>¶</a></h2><h3 id=-gpu-async-region><code>-gpu-async-region</code>&nbsp;<a class=headline-hash href=#-gpu-async-region>¶</a></h3><p><em>Make GPU ops async</em></p><h3 id=-gpu-decompose-memrefs><code>-gpu-decompose-memrefs</code>&nbsp;<a class=headline-hash href=#-gpu-decompose-memrefs>¶</a></h3><p><em>Decomposes memref index computation into explicit ops.</em></p><p>This pass decomposes memref index computation into explicit computations on sizes/strides, obtained from <code>memref.extract_strided_metadata</code>, which it tries to place outside of the <code>gpu.launch</code> body. Memrefs are then reconstructed using <code>memref.reinterpret_cast</code>. This is needed because some targets (SPIR-V) lower memrefs to bare pointers, and sizes/strides for dynamically-sized memrefs are not available inside <code>gpu.launch</code>.</p><h3 id=-gpu-eliminate-barriers><code>-gpu-eliminate-barriers</code>&nbsp;<a class=headline-hash href=#-gpu-eliminate-barriers>¶</a></h3><p><em>Erase unnecessary barriers</em></p><p>Barrier elimination pass. If a barrier does not enforce any conflicting pair of memory effects, including a pair that is enforced by another barrier, it is unnecessary and can be removed. Adapted from &ldquo;High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs&rdquo; by Moses, Ivanov, Domke, Endo, Doerfert, and Zinenko in PPoPP 2023 and the implementation in Polygeist.</p><h3 id=-gpu-kernel-outlining><code>-gpu-kernel-outlining</code>&nbsp;<a class=headline-hash href=#-gpu-kernel-outlining>¶</a></h3><p><em>Outline gpu.launch bodies to kernel functions</em></p><h4 id=options-67>Options&nbsp;<a class=headline-hash href=#options-67>¶</a></h4><pre tabindex=0><code>-data-layout-str : String description of the data layout </code></pre><h3 id=-gpu-launch-sink-index-computations><code>-gpu-launch-sink-index-computations</code>&nbsp;<a class=headline-hash href=#-gpu-launch-sink-index-computations>¶</a></h3><p><em>Sink index computations into gpu.launch body</em></p><h3 id=-gpu-map-parallel-loops><code>-gpu-map-parallel-loops</code>&nbsp;<a class=headline-hash href=#-gpu-map-parallel-loops>¶</a></h3><p><em>Greedily maps loops to GPU hardware dimensions.</em></p><p>Maps the parallel loops found in the given function to workgroups.
The first loop encountered will be mapped to the global workgroup and the second loop encountered to the local workgroup. Within each mapping, the first three dimensions are mapped to x/y/z hardware ids and all following dimensions are mapped to sequential loops.</p><h3 id=-gpu-module-to-binary><code>-gpu-module-to-binary</code>&nbsp;<a class=headline-hash href=#-gpu-module-to-binary>¶</a></h3><p><em>Transforms a GPU module into a GPU binary.</em></p><p>This pass searches for all nested GPU modules and serializes the module using the target attributes attached to the module, producing a GPU binary with an object for every target.</p><p>The <code>format</code> argument can have the following values:</p><ol><li><code>offloading</code>, <code>llvm</code>: produces an offloading representation.</li><li><code>assembly</code>, <code>isa</code>: produces assembly code.</li><li><code>binary</code>, <code>bin</code>: produces binaries.</li><li><code>fatbinary</code>, <code>fatbin</code>: produces fatbinaries.</li></ol><h4 id=options-68>Options&nbsp;<a class=headline-hash href=#options-68>¶</a></h4><pre tabindex=0><code>-toolkit : Toolkit path. -l : Extra files to link to. -opts : Command line options to pass to the tools. -format : The target representation of the compilation process. -section : ELF section where binary is to be located. </code></pre><h3 id=-nvvm-attach-target><code>-nvvm-attach-target</code>&nbsp;<a class=headline-hash href=#-nvvm-attach-target>¶</a></h3><p><em>Attaches an NVVM target attribute to a GPU Module.</em></p><p>This pass searches for all GPU Modules in the immediate regions and attaches an NVVM target if the module matches the name specified by the <code>module</code> argument.</p><p>Example:</p><pre tabindex=0><code>// File: in.mlir: gpu.module @nvvm_module_1 {...} gpu.module @nvvm_module_2 {...} gpu.module @rocdl_module_1 {...} // mlir-opt --nvvm-attach-target=&#34;module=nvvm.* chip=sm_90&#34; in.mlir gpu.module @nvvm_module_1 [#nvvm.target&lt;chip = &#34;sm_90&#34;&gt;] {...} gpu.module @nvvm_module_2 [#nvvm.target&lt;chip = &#34;sm_90&#34;&gt;] {...} gpu.module @rocdl_module_1 {...} </code></pre><h4 id=options-69>Options&nbsp;<a class=headline-hash href=#options-69>¶</a></h4><pre tabindex=0><code>-module : Regex used to identify the modules to attach the target to. -triple : Target triple. -chip : Target chip. -features : Target features. -O : Optimization level. -fast : Enable fast math mode. -ftz : Enable flush to zero for denormals. -l : Extra bitcode library paths to link to.
-ptxas-cmd-options : Command line options passed to downstream compiler </code></pre><h3 id=-rocdl-attach-target><code>-rocdl-attach-target</code>&nbsp;<a class=headline-hash href=#-rocdl-attach-target>¶</a></h3><p><em>Attaches a ROCDL target attribute to a GPU Module.</em></p><p>This pass searches for all GPU Modules in the immediate regions and attaches a ROCDL target if the module matches the name specified by the <code>module</code> argument.</p><p>Example:</p><pre tabindex=0><code>// File: in.mlir: gpu.module @nvvm_module_1 {...} gpu.module @nvvm_module_2 {...} gpu.module @rocdl_module_1 {...} // mlir-opt --rocdl-attach-target=&#34;module=rocdl.* chip=gfx90a&#34; in.mlir gpu.module @nvvm_module_1 {...} gpu.module @nvvm_module_2 {...} gpu.module @rocdl_module_1 [#rocdl.target&lt;chip = &#34;gfx90a&#34;&gt;] {...} </code></pre><h4 id=options-70>Options&nbsp;<a class=headline-hash href=#options-70>¶</a></h4><pre tabindex=0><code>-module : Regex used to identify the modules to attach the target to. -triple : Target triple. -chip : Target chip. -features : Target features. -abi : ABI version. -O : Optimization level. -wave64 : Use Wave64 mode. -fast : Enable fast relaxed math opt. -daz : Enable denormals are zero opt. -finite-only : Enable finite only opt. -unsafe-math : Enable unsafe math opt. -correct-sqrt : Enable correct rounded sqrt. -l : Extra bitcode library paths to link to. </code></pre><h3 id=-spirv-attach-target><code>-spirv-attach-target</code>&nbsp;<a class=headline-hash href=#-spirv-attach-target>¶</a></h3><p><em>Attaches a SPIR-V target attribute to a GPU Module.</em></p><p>This pass searches for all GPU Modules in the immediate regions and attaches a SPIR-V target if the module matches the name specified by the <code>module</code> argument.</p><p>Example:</p><pre tabindex=0><code>// Given the following file: in1.mlir: gpu.module @nvvm_module_1 {...} gpu.module @spirv_module_1 {...} // With // mlir-opt --spirv-attach-target=&#34;module=spirv.* ver=v1.0 caps=Kernel&#34; in1.mlir // it will generate, gpu.module @nvvm_module_1 {...} gpu.module @spirv_module_1 [#spirv.target&lt;#spirv.vce&lt;v1.0, [Kernel], []&gt;, #spirv.resource_limits&lt;&gt;&gt;] {...} </code></pre><h4 id=options-71>Options&nbsp;<a class=headline-hash href=#options-71>¶</a></h4><pre tabindex=0><code>-module : Regex used to identify the modules to attach the target to. -ver : SPIR-V Version. -caps : List of supported SPIR-V Capabilities -exts : List of supported SPIR-V Extensions -client_api : Client API -vendor : Device Vendor -device_type : Device Type -device_id : Device ID </code></pre><h2 id=linalg-dialect-passes>&lsquo;linalg&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#linalg-dialect-passes>¶</a></h2><h3 id=-convert-elementwise-to-linalg><code>-convert-elementwise-to-linalg</code>&nbsp;<a class=headline-hash href=#-convert-elementwise-to-linalg>¶</a></h3><p><em>Convert ElementwiseMappable ops to linalg</em></p><p>Convert ops with the <code>ElementwiseMappable</code> trait to linalg parallel loops.</p><p>This pass only converts ops that operate on ranked tensors. It can be run on any op that contains linalg ops (most commonly a FunctionOpInterface op).</p>
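<p>As a hedged sketch (the map and value names are hypothetical, and the exact output may differ), an elementwise op on tensors such as <code>%0 = arith.addf %a, %b : tensor&lt;4xf32&gt;</code> becomes a <code>linalg.generic</code> whose body applies the scalar op:</p><pre tabindex=0><code>#map = affine_map&lt;(d0) -&gt; (d0)&gt;
%0 = linalg.generic
       {indexing_maps = [#map, #map, #map], iterator_types = [&#34;parallel&#34;]}
       ins(%a, %b : tensor&lt;4xf32&gt;, tensor&lt;4xf32&gt;) outs(%a : tensor&lt;4xf32&gt;) {
^bb0(%x: f32, %y: f32, %out: f32):
  %sum = arith.addf %x, %y : f32
  linalg.yield %sum : f32
} -&gt; tensor&lt;4xf32&gt;
</code></pre>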
<h3 id=-convert-linalg-to-affine-loops><code>-convert-linalg-to-affine-loops</code>&nbsp;<a class=headline-hash href=#-convert-linalg-to-affine-loops>¶</a></h3><p><em>Lower the operations from the linalg dialect into affine loops</em></p><h3 id=-convert-linalg-to-loops><code>-convert-linalg-to-loops</code>&nbsp;<a class=headline-hash href=#-convert-linalg-to-loops>¶</a></h3><p><em>Lower the operations from the linalg dialect into loops</em></p><p>Lowers the <code>linalg</code> ops to loop nests using <code>scf.for</code>.</p><p>Pre-condition: the operands used by the <code>linalg</code> ops have buffer semantics, i.e., tensor operands and results must be converted to memrefs via bufferization.</p><h3 id=-convert-linalg-to-parallel-loops><code>-convert-linalg-to-parallel-loops</code>&nbsp;<a class=headline-hash href=#-convert-linalg-to-parallel-loops>¶</a></h3><p><em>Lower the operations from the linalg dialect into parallel loops</em></p><h3 id=-linalg-block-pack-matmul><code>-linalg-block-pack-matmul</code>&nbsp;<a class=headline-hash href=#-linalg-block-pack-matmul>¶</a></h3><p><em>Convert linalg matmul ops to block layout and back</em></p><p>Pack a matmul operation into a blocked layout with two levels of subdivision:</p><ul><li>major 2D blocks - outer dimensions, consist of minor blocks</li><li>minor 2D blocks - inner dimensions, consist of scalar elements</li></ul><p>A 2D matmul MxNxK gets reshaped into a blocked 4D representation as: [MB][NB][mb][nb] += [MB][KB][mb][kb] * [NB][KB][nb][kb] where the (MB, NB, KB) dimensions represent the major blocks, and the (mb, nb, kb) are the minor blocks of their respective original 2D dimensions (M, N, K).</p><p>Depending on the initial operands&rsquo; data layout and the specified packing options, the major block dimensions might get transposed, e.g., [MB][KB] -> [KB][MB]. The minor blocks can also be transposed, e.g., [mb][kb] -> [kb][mb]. Any present batch dimensions remain unchanged.
The final result is unpacked back to the original shape.</p><p>For example, given a matmul operation:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl> <span class=nv>%res</span> <span class=p>=</span> linalg<span class=p>.</span>matmul ins<span class=p>(</span><span class=nv>%A</span><span class=p>,</span> <span class=nv>%B</span><span class=p>)</span> outs<span class=p>(</span><span class=nv>%C</span><span class=p>)</span> </span></span></code></pre></div><p>the default transformation result can be represented as:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl> <span class=nv>%A_packed</span> <span class=p>=</span> pack <span class=nv>%A</span> <span class=p>:</span> <span class=m>2</span>D <span class=p>&lt;</span>MxK<span class=p>&gt;</span> <span class=p>-&gt;</span> <span class=m>4</span>D <span class=p>&lt;</span>MBxKBxmbxkb<span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%B_packed</span> <span class=p>=</span> pack <span class=nv>%B</span> <span class=p>:</span> <span class=m>2</span>D <span class=p>&lt;</span>KxN<span class=p>&gt;</span> <span class=p>-&gt;</span> <span class=m>4</span>D <span class=p>&lt;</span>NBxKBxnbxkb<span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%C_packed</span> <span class=p>=</span> pack <span class=nv>%C</span> <span class=p>:</span> <span class=m>2</span>D <span class=p>&lt;</span>MxN<span class=p>&gt;</span> <span class=p>-&gt;</span> <span class=m>4</span>D <span class=p>&lt;</span>MBxNBxmbxnb<span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%res_packed</span> <span class=p>=</span> linalg<span class=p>.</span>mmt4d ins<span class=p>(</span><span class=nv>%A_packed</span><span class=p>,</span> <span class=nv>%B_packed</span><span class=p>)</span> outs<span class=p>(</span><span class=nv>%C_packed</span><span class=p>)</span> </span></span><span class=line><span class=cl> <span class=nv>%res</span> <span class=p>=</span> unpack <span class=nv>%res_packed</span> <span class=p>:</span> <span class=m>4</span>D <span class=p>&lt;</span>MBxNBxmbxnb<span class=p>&gt;</span> <span class=p>-&gt;</span> <span class=m>2</span>D <span class=p>&lt;</span>MxN<span class=p>&gt;</span> </span></span></code></pre></div><h4 id=options-72>Options&nbsp;<a class=headline-hash href=#options-72>¶</a></h4><pre tabindex=0><code>-block-factors : Block factors (mb, nb, kb) for relayout -allow-padding : Allow packing padding -mnk-padded-multiples : Next multiples of the packing sizes -mnk-order : Permutation of matmul (M, N, K) dimensions order -lhs-transpose-outer-blocks : Transpose LHS outer block layout [MB][KB] -&gt; [KB][MB] -lhs-transpose-inner-blocks : Transpose LHS inner block layout [mb][kb] -&gt; [kb][mb] -rhs-transpose-outer-blocks : Transpose RHS outer block layout [KB][NB] -&gt; [NB][KB] -rhs-transpose-inner-blocks : Transpose RHS inner block layout [kb][nb] -&gt; [nb][kb] </code></pre><h3 id=-linalg-detensorize><code>-linalg-detensorize</code>&nbsp;<a class=headline-hash href=#-linalg-detensorize>¶</a></h3><p><em>Detensorize linalg ops</em></p><p>Detensoring is the process through which a tensor value is converted to one or potentially more primitive values. During this process, operations with such detensored operands are also converted to an equivalent form that works on primitives.</p>
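<p>As a rough sketch (names are hypothetical, and the actual rewrite also updates users and may use different glue ops), a 0-D linalg-on-tensor op whose operands can all be detensored:</p><pre tabindex=0><code>#scalar = affine_map&lt;() -&gt; ()&gt;
%0 = linalg.generic {indexing_maps = [#scalar, #scalar, #scalar], iterator_types = []}
       ins(%a, %b : tensor&lt;i32&gt;, tensor&lt;i32&gt;) outs(%init : tensor&lt;i32&gt;) {
^bb0(%x: i32, %y: i32, %out: i32):
  %s = arith.addi %x, %y : i32
  linalg.yield %s : i32
} -&gt; tensor&lt;i32&gt;
</code></pre><p>could be replaced by the primitive computation itself:</p><pre tabindex=0><code>%a_elem = tensor.extract %a[] : tensor&lt;i32&gt;
%b_elem = tensor.extract %b[] : tensor&lt;i32&gt;
%s = arith.addi %a_elem, %b_elem : i32
</code></pre>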
<p>The detensoring process is driven by linalg-on-tensor ops. In particular, a linalg-on-tensor op is checked to see whether <em>all</em> its operands can be detensored. If so, those operands are converted to their primitive counterparts and the linalg op is replaced by an equivalent op that takes those new primitive values as operands. Therefore, detensoring an op can be divided into two main logical phases:</p><ol><li>Detect/match an op that can be detensored.</li><li>Detensor the operands of the op and replace it with a primitive equivalent.</li></ol><p>In addition to detensoring individual ops, this pass detensors internal control flow inside a function. All blocks except for the entry block are detensored by converting their arguments whenever possible.</p><p>This can be run on any FunctionOpInterface op and must not be run on others. This is because it performs specific legalization of the blocks that make up the body, which it assumes to be a FunctionOpInterface.</p><h4 id=options-73>Options&nbsp;<a class=headline-hash href=#options-73>¶</a></h4><pre tabindex=0><code>-aggressive-mode : Detensorize all ops that qualify for detensoring along with branch operands and basic-block arguments. </code></pre><h3 id=-linalg-fold-into-elementwise><code>-linalg-fold-into-elementwise</code>&nbsp;<a class=headline-hash href=#-linalg-fold-into-elementwise>¶</a></h3><p><em>Fold transform, broadcast and other ops into elementwise</em></p><h3 id=-linalg-fold-unit-extent-dims><code>-linalg-fold-unit-extent-dims</code>&nbsp;<a class=headline-hash href=#-linalg-fold-unit-extent-dims>¶</a></h3><p><em>Remove unit-extent dimension in Linalg ops on tensors</em></p><h4 id=options-74>Options&nbsp;<a class=headline-hash href=#options-74>¶</a></h4><pre tabindex=0><code>-use-rank-reducing-slices : Generate rank-reducing slices instead of reassociative reshapes </code></pre><h3 id=-linalg-fuse-elementwise-ops><code>-linalg-fuse-elementwise-ops</code>&nbsp;<a class=headline-hash href=#-linalg-fuse-elementwise-ops>¶</a></h3><p><em>Fuse elementwise operations on tensors</em></p><h3 id=-linalg-generalize-named-ops><code>-linalg-generalize-named-ops</code>&nbsp;<a class=headline-hash href=#-linalg-generalize-named-ops>¶</a></h3><p><em>Convert named ops into generic ops</em></p><h3 id=-linalg-inline-scalar-operands><code>-linalg-inline-scalar-operands</code>&nbsp;<a class=headline-hash href=#-linalg-inline-scalar-operands>¶</a></h3><p><em>Inline scalar operands into linalg generic ops</em></p><h3 id=-linalg-named-op-conversion><code>-linalg-named-op-conversion</code>&nbsp;<a class=headline-hash href=#-linalg-named-op-conversion>¶</a></h3><p><em>Convert from one named linalg op to another.</em></p><h3 id=-linalg-specialize-generic-ops><code>-linalg-specialize-generic-ops</code>&nbsp;<a class=headline-hash href=#-linalg-specialize-generic-ops>¶</a></h3><p><em>Convert generic ops back to named ops</em></p><h2 id=llvm-dialect-passes>&lsquo;llvm&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#llvm-dialect-passes>¶</a></h2><h3 id=-ensure-debug-info-scope-on-llvm-func><code>-ensure-debug-info-scope-on-llvm-func</code>&nbsp;<a class=headline-hash href=#-ensure-debug-info-scope-on-llvm-func>¶</a></h3><p><em>Materialize LLVM debug info subprogram attribute on every LLVMFuncOp</em></p><p>Having a debug info subprogram attribute on a function is required for emitting line tables from MLIR
FileLocCol locations.</p><p>This is not intended to be a proper replacement for frontends to emit complete debug information; however, it is a convenient way to get line tables for debugging purposes. This allows stepping through code in a debugger line-by-line or getting a backtrace with line numbers.</p><h4 id=options-75>Options&nbsp;<a class=headline-hash href=#options-75>¶</a></h4><pre tabindex=0><code>-emission-kind : Emission kind to generate debug info. </code></pre><h3 id=-llvm-add-comdats><code>-llvm-add-comdats</code>&nbsp;<a class=headline-hash href=#-llvm-add-comdats>¶</a></h3><p><em>Add comdats to linkonce and linkonce_odr functions</em></p><p>Add an <code>any</code> COMDAT to every linkonce and linkonce_odr function. This is necessary on Windows to link these functions as the system linker won&rsquo;t link weak symbols without a COMDAT. It also provides better behavior than standard weak symbols on ELF-based platforms. This pass will still add COMDATs on platforms that do not support them, for example macOS, so should only be run when the target platform supports COMDATs.</p><h3 id=-llvm-legalize-for-export><code>-llvm-legalize-for-export</code>&nbsp;<a class=headline-hash href=#-llvm-legalize-for-export>¶</a></h3><p><em>Legalize LLVM dialect to be convertible to LLVM IR</em></p><h3 id=-llvm-optimize-for-nvvm-target><code>-llvm-optimize-for-nvvm-target</code>&nbsp;<a class=headline-hash href=#-llvm-optimize-for-nvvm-target>¶</a></h3><p><em>Optimize NVVM IR</em></p><h3 id=-llvm-request-c-wrappers><code>-llvm-request-c-wrappers</code>&nbsp;<a class=headline-hash href=#-llvm-request-c-wrappers>¶</a></h3><p><em>Request C wrapper emission for all functions</em></p><p>Annotate every builtin function in the module with the LLVM dialect attribute that instructs the conversion to LLVM to emit the C wrapper for the function. This pass is expected to be applied immediately before the conversion of builtin functions to LLVM to avoid the attribute being dropped by other passes.</p><h2 id=math-dialect-passes>&lsquo;math&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#math-dialect-passes>¶</a></h2><h3 id=-math-extend-to-supported-types><code>-math-extend-to-supported-types</code>&nbsp;<a class=headline-hash href=#-math-extend-to-supported-types>¶</a></h3><p><em>Legalize floating-point math ops on low-precision floats</em></p><p>On many targets, the math functions are not implemented for floating-point types less precise than IEEE single-precision (aka f32), such as half-floats, bfloat16, or 8-bit floats.</p><p>This pass explicitly legalizes these math functions by inserting <code>arith.extf</code> and <code>arith.truncf</code> pairs around said op, which preserves the original semantics while enabling lowering. The extra supported floating-point types for the target are passed as arguments. Types f64 and f32 are implicitly supported.</p>
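<p>As a hedged sketch (assuming bf16 is not natively supported by the target and was not listed as an extra type), a math op on bf16 would be computed in f32:</p><pre tabindex=0><code>// Before.
%r = math.sin %x : bf16

// After legalization.
%x_ext = arith.extf %x : bf16 to f32
%r_f32 = math.sin %x_ext : f32
%r = arith.truncf %r_f32 : f32 to bf16
</code></pre>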
<p>As an exception, this pass does not legalize <code>math.fma</code>, because that is an operation frequently implemented at low precisions.</p><h4 id=options-76>Options&nbsp;<a class=headline-hash href=#options-76>¶</a></h4><pre tabindex=0><code>-extra-types : MLIR types with arithmetic support on a given target (f64 and f32 are implicitly supported) -target-type : MLIR type to convert the unsupported source types to </code></pre><h3 id=-math-uplift-to-fma><code>-math-uplift-to-fma</code>&nbsp;<a class=headline-hash href=#-math-uplift-to-fma>¶</a></h3><p><em>Uplift arith ops to math.fma.</em></p><p>Uplift sequences of addf and mulf ops to math.fma if the fastmath flags allow it.</p><h2 id=memref-dialect-passes>&lsquo;memref&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#memref-dialect-passes>¶</a></h2><h3 id=-expand-realloc><code>-expand-realloc</code>&nbsp;<a class=headline-hash href=#-expand-realloc>¶</a></h3><p><em>Expand memref.realloc operations into their components</em></p><p>The <code>memref.realloc</code> operation performs a conditional allocation and copy to increase the size of a buffer if necessary. This pass converts a <code>realloc</code> operation into a sequence of simpler operations such that other passes at a later stage in the compilation pipeline do not have to consider the <code>realloc</code> operation anymore (e.g., the buffer deallocation pass and the conversion pass to LLVM).</p><p>Example of an expansion:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>%realloc</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>realloc <span class=nv>%alloc</span> <span class=p>(</span><span class=nv>%size</span><span class=p>)</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>?x</span><span class=k>f32</span><span class=p>&gt;</span> to <span class=kt>memref</span><span class=p>&lt;</span><span class=m>?x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span></code></pre></div><p>is expanded to</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>%c0</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>0</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl><span class=nv>%dim</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>dim <span class=nv>%alloc</span><span class=p>,</span> <span class=nv>%c0</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>?x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%is_old_smaller</span> <span class=p>=</span> arith<span class=p>.</span>cmpi ult<span class=p>,</span> <span class=nv>%dim</span><span class=p>,</span> <span class=nv>%size</span> </span></span><span class=line><span class=cl><span class=nv>%realloc</span> <span class=p>=</span> scf<span class=p>.</span>if <span class=nv>%is_old_smaller</span> <span class=p>-&gt;</span> <span class=p>(</span><span class=kt>memref</span><span class=p>&lt;</span><span class=m>?x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%new_alloc</span> <span
class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>(</span><span class=nv>%size</span><span class=p>)</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>?x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%subview</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>subview <span class=nv>%new_alloc</span><span class=p>[</span><span class=m>0</span><span class=p>]</span> <span class=p>[</span><span class=nv>%dim</span><span class=p>]</span> <span class=p>[</span><span class=m>1</span><span class=p>]</span> </span></span><span class=line><span class=cl> <span class=kt>memref</span><span class=p>.</span>copy <span class=nv>%alloc</span><span class=p>,</span> <span class=nv>%subview</span> </span></span><span class=line><span class=cl> <span class=kt>memref</span><span class=p>.</span>dealloc <span class=nv>%alloc</span> </span></span><span class=line><span class=cl> scf<span class=p>.</span>yield <span class=nv>%new_alloc</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>?x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=p>}</span> else <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%reinterpret_cast</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>reinterpret_cast <span class=nv>%alloc</span> to </span></span><span class=line><span class=cl> offset<span class=p>:</span> <span class=p>[</span><span class=m>0</span><span class=p>],</span> sizes<span class=p>:</span> <span class=p>[</span><span class=nv>%size</span><span class=p>],</span> strides<span class=p>:</span> <span class=p>[</span><span class=m>1</span><span class=p>]</span> </span></span><span class=line><span class=cl> scf<span class=p>.</span>yield <span class=nv>%reinterpret_cast</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>?x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><h4 id=options-77>Options&nbsp;<a class=headline-hash href=#options-77>¶</a></h4><pre tabindex=0><code>-emit-deallocs : Emit deallocation operations for the original MemRef </code></pre><h3 id=-expand-strided-metadata><code>-expand-strided-metadata</code>&nbsp;<a class=headline-hash href=#-expand-strided-metadata>¶</a></h3><p><em>Expand memref operations into easier to analyze constructs</em></p><p>The pass expands memref operations that modify the metadata of a memref (sizes, offset, strides) into a sequence of easier to analyze constructs. In particular, this pass transforms operations into an explicit sequence of operations that model the effect of this operation on the different metadata. This pass uses affine constructs to materialize these effects.</p>
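<p>As a loose sketch (types and names hypothetical; the actual output depends on the input op), a <code>memref.subview</code> with a dynamic offset could be expanded into explicit metadata plus a <code>memref.reinterpret_cast</code>:</p><pre tabindex=0><code>// Before.
%sub = memref.subview %base[%off] [4] [1]
  : memref&lt;16xf32&gt; to memref&lt;4xf32, strided&lt;[1], offset: ?&gt;&gt;

// After: the base buffer and metadata are made explicit, and the new
// offset is computed with an affine construct.
%buf, %offset, %size, %stride = memref.extract_strided_metadata %base
  : memref&lt;16xf32&gt; -&gt; memref&lt;f32&gt;, index, index, index
%new_off = affine.apply affine_map&lt;()[s0] -&gt; (s0)&gt;()[%off]
%sub = memref.reinterpret_cast %buf to
  offset: [%new_off], sizes: [4], strides: [1]
  : memref&lt;f32&gt; to memref&lt;4xf32, strided&lt;[1], offset: ?&gt;&gt;
</code></pre>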
<p>Supported ops include:</p><ul><li><code>memref.collapse_shape</code></li><li><code>memref.expand_shape</code></li><li><code>memref.extract_aligned_pointer_as_index</code></li><li><code>memref.extract_strided_metadata</code></li><li><code>memref.subview</code></li></ul><h3 id=-fold-memref-alias-ops><code>-fold-memref-alias-ops</code>&nbsp;<a class=headline-hash href=#-fold-memref-alias-ops>¶</a></h3><p><em>Fold memref alias ops into consumer load/store ops</em></p><p>The pass folds loading/storing from/to memref aliasing ops to loading/storing from/to the original memref.</p><h3 id=-memref-emulate-wide-int><code>-memref-emulate-wide-int</code>&nbsp;<a class=headline-hash href=#-memref-emulate-wide-int>¶</a></h3><p><em>Emulate 2*N-bit integer operations using N-bit operations</em></p><p>Emulate memref integer operations that use too wide integer types with equivalent operations on supported narrow integer types. This is done by splitting original integer values into two halves.</p><p>Currently, only power-of-two integer bitwidths are supported.</p><h4 id=options-78>Options&nbsp;<a class=headline-hash href=#options-78>¶</a></h4><pre tabindex=0><code>-widest-int-supported : Widest integer type supported by the target </code></pre><h3 id=-memref-expand><code>-memref-expand</code>&nbsp;<a class=headline-hash href=#-memref-expand>¶</a></h3><p><em>Legalize memref operations to be convertible to LLVM.</em></p><h3 id=-normalize-memrefs><code>-normalize-memrefs</code>&nbsp;<a class=headline-hash href=#-normalize-memrefs>¶</a></h3><p><em>Normalize memrefs</em></p><p>This pass transforms memref types with a non-trivial <a href=https://mlir.llvm.org/docs/Dialects/Builtin/#affine-map-layout>layout map</a> into memref types with an identity layout map, e.g. (i, j) -> (i, j). This pass is inter-procedural, in the sense that it can modify function interfaces and call sites that pass memref types. In order to modify memref types while preserving the original behavior, users of those memref types are also modified to incorporate the resulting layout map. For instance, an <a href=https://mlir.llvm.org/docs/Dialects/Affine/#affineload-mliraffineloadop>AffineLoadOp</a> will be updated to compose the layout map with the affine expression contained in the op. Operations marked with the <a href=https://mlir.llvm.org/docs/Traits/#memrefsnormalizable>MemRefsNormalizable</a> trait are expected to be normalizable. Supported operations include affine operations, memref.alloc, memref.dealloc, and func.return.</p><p>Given an appropriate layout map specified in the code, this transformation can express tiled or linearized access to multi-dimensional data structures, but will not modify memref types without an explicit layout map.</p><p>Currently this pass is limited to modifying only functions where all memref types can be normalized.
If a function contains any operations that are not MemRefNormalizable, then the function and any functions that call or are called by it will not be modified.</p><p>Input</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>#tile</span> <span class=p>=</span> affine_map<span class=p>&lt;(</span>i<span class=p>)</span> <span class=p>-&gt;</span> <span class=p>(</span>i floordiv <span class=m>4</span><span class=p>,</span> i mod <span class=m>4</span><span class=p>)&gt;</span> </span></span><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@matmul</span><span class=p>(</span><span class=nv>%A</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>16x</span><span class=k>f64</span><span class=p>,</span> <span class=nv>#tile</span><span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nv>%B</span><span class=p>:</span> <span class=k>index</span><span class=p>,</span> <span class=nv>%C</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>16x</span><span class=k>f64</span><span class=p>&gt;)</span> <span class=p>-&gt;</span> <span class=p>(</span><span class=kt>memref</span><span class=p>&lt;</span><span class=m>16x</span><span class=k>f64</span><span class=p>,</span> <span class=nv>#tile</span><span class=p>&gt;)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg3</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>16</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%a</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%A</span><span class=p>[</span><span class=nv>%arg3</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>16x</span><span class=k>f64</span><span class=p>,</span> <span class=nv>#tile</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%p</span> <span class=p>=</span> arith<span class=p>.</span>mulf <span class=nv>%a</span><span class=p>,</span> <span class=nv>%a</span> <span class=p>:</span> <span class=k>f64</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%p</span><span class=p>,</span> <span class=nv>%A</span><span class=p>[</span><span class=nv>%arg3</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>16x</span><span class=k>f64</span><span class=p>,</span> <span class=nv>#tile</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=nv>%c</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>16x</span><span class=k>f64</span><span class=p>,</span> <span class=nv>#tile</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%d</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%c</span><span class=p>[</span><span class=m>0</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span
class=p>&lt;</span><span class=m>16x</span><span class=k>f64</span><span class=p>,</span> <span class=nv>#tile</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> <span class=nv>%A</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>16x</span><span class=k>f64</span><span class=p>,</span> <span class=nv>#tile</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>Output</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@matmul</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>4x4x</span><span class=k>f64</span><span class=p>&gt;,</span> <span class=nv>%arg1</span><span class=p>:</span> <span class=k>index</span><span class=p>,</span> <span class=nv>%arg2</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>16x</span><span class=k>f64</span><span class=p>&gt;)</span> </span></span><span class=line><span class=cl> <span class=p>-&gt;</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>4x4x</span><span class=k>f64</span><span class=p>&gt;</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg3</span> <span class=p>=</span> <span class=m>0</span> to <span class=m>16</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%3</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%arg0</span><span class=p>[</span><span class=nv>%arg3</span> floordiv <span class=m>4</span><span class=p>,</span> <span class=nv>%arg3</span> mod <span class=m>4</span><span class=p>]:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>4x4x</span><span class=k>f64</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%4</span> <span class=p>=</span> arith<span class=p>.</span>mulf <span class=nv>%3</span><span class=p>,</span> <span class=nv>%3</span> <span class=p>:</span> <span class=k>f64</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%4</span><span class=p>,</span> <span class=nv>%arg0</span><span class=p>[</span><span class=nv>%arg3</span> floordiv <span class=m>4</span><span class=p>,</span> <span class=nv>%arg3</span> mod <span class=m>4</span><span class=p>]:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>4x4x</span><span class=k>f64</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>4x4x</span><span class=k>f64</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> affine<span class=p>.</span>apply <span class=nv>#map1</span><span class=p>()</span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> affine<span 
class=p>.</span>load <span class=nv>%0</span><span class=p>[</span><span class=m>0</span><span class=p>,</span> <span class=m>0</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>4x4x</span><span class=k>f64</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> <span class=nv>%arg0</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>4x4x</span><span class=k>f64</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>Input</p><pre tabindex=0><code>#linear8 = affine_map&lt;(i, j) -&gt; (i * 8 + j)&gt; func.func @linearize(%arg0: memref&lt;8x8xi32, #linear8&gt;, %arg1: memref&lt;8x8xi32, #linear8&gt;, %arg2: memref&lt;8x8xi32, #linear8&gt;) { %c8 = arith.constant 8 : index %c0 = arith.constant 0 : index %c1 = arith.constant 1 : index affine.for %arg3 = %c0 to %c8 { affine.for %arg4 = %c0 to %c8 { affine.for %arg5 = %c0 to %c8 { %0 = affine.load %arg0[%arg3, %arg5] : memref&lt;8x8xi32, #linear8&gt; %1 = affine.load %arg1[%arg5, %arg4] : memref&lt;8x8xi32, #linear8&gt; %2 = affine.load %arg2[%arg3, %arg4] : memref&lt;8x8xi32, #linear8&gt; %3 = arith.muli %0, %1 : i32 %4 = arith.addi %2, %3 : i32 affine.store %4, %arg2[%arg3, %arg4] : memref&lt;8x8xi32, #linear8&gt; } } } return } </code></pre><p>Output</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@linearize</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>64x</span><span class=k>i32</span><span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg1</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>64x</span><span class=k>i32</span><span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg2</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>64x</span><span class=k>i32</span><span class=p>&gt;)</span> <span class=p>{</span> </span></span><span class=line><span class=cl><span class=nv>%c8</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>8</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl><span class=nv>%c0</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>0</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl>affine<span class=p>.</span>for <span class=nv>%arg3</span> <span class=p>=</span> <span class=nv>%c0</span> to <span class=nv>%c8</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg4</span> <span class=p>=</span> <span class=nv>%c0</span> to <span class=nv>%c8</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg5</span> <span class=p>=</span> <span class=nv>%c0</span> to <span class=nv>%c8</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> affine<span 
<div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@linearize</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>64x</span><span class=k>i32</span><span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg1</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>64x</span><span class=k>i32</span><span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nv>%arg2</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>64x</span><span class=k>i32</span><span class=p>&gt;)</span> <span class=p>{</span> </span></span><span class=line><span class=cl><span class=nv>%c8</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>8</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl><span class=nv>%c0</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>0</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl>affine<span class=p>.</span>for <span class=nv>%arg3</span> <span class=p>=</span> <span class=nv>%c0</span> to <span class=nv>%c8</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg4</span> <span class=p>=</span> <span class=nv>%c0</span> to <span class=nv>%c8</span> <span class=p>{</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>for <span class=nv>%arg5</span> <span class=p>=</span> <span class=nv>%c0</span> to <span class=nv>%c8</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%arg0</span><span class=p>[</span><span class=nv>%arg3</span> <span class=p>*</span> <span class=m>8</span> <span class=err>+</span> <span class=nv>%arg5</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>64x</span><span class=k>i32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%arg1</span><span class=p>[</span><span class=nv>%arg5</span> <span class=p>*</span> <span class=m>8</span> <span class=err>+</span> <span class=nv>%arg4</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>64x</span><span class=k>i32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> affine<span class=p>.</span>load <span class=nv>%arg2</span><span class=p>[</span><span class=nv>%arg3</span> <span class=p>*</span> <span class=m>8</span> <span class=err>+</span> <span class=nv>%arg4</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>64x</span><span class=k>i32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%3</span> <span class=p>=</span> arith<span class=p>.</span>muli <span class=nv>%0</span><span class=p>,</span> <span class=nv>%1</span> <span class=p>:</span> <span class=k>i32</span> </span></span><span class=line><span class=cl> <span class=nv>%4</span> <span class=p>=</span> arith<span class=p>.</span>addi <span class=nv>%2</span><span class=p>,</span> <span class=nv>%3</span> <span class=p>:</span> <span class=k>i32</span> </span></span><span class=line><span class=cl> affine<span class=p>.</span>store <span class=nv>%4</span><span class=p>,</span> <span class=nv>%arg2</span><span class=p>[</span><span class=nv>%arg3</span> <span class=p>*</span> <span class=m>8</span> <span class=err>+</span> <span class=nv>%arg4</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>64x</span><span class=k>i32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span><span class=line><span class=cl><span class=kt>return</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><h3 id=-resolve-ranked-shaped-type-result-dims><code>-resolve-ranked-shaped-type-result-dims</code>&nbsp;<a class=headline-hash href=#-resolve-ranked-shaped-type-result-dims>¶</a></h3><p><em>Resolve memref.dim of result values of ranked shape type</em></p><p>The pass resolves memref.dim of results of operations that implement the <code>ReifyRankedShapedTypeOpInterface</code> in terms of the shapes of their operands.</p><h3 id=-resolve-shaped-type-result-dims><code>-resolve-shaped-type-result-dims</code>&nbsp;<a class=headline-hash href=#-resolve-shaped-type-result-dims>¶</a></h3><p><em>Resolve memref.dim of result values</em></p><p>The pass resolves memref.dim of results of operations that implement the <code>InferShapedTypeOpInterface</code> or <code>ReifyRankedShapedTypeOpInterface</code> in terms of the shapes of their operands.</p>
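<p>As a minimal sketch (using a hypothetical <code>test.shape_preserving</code> op assumed to implement one of these interfaces):</p><pre tabindex=0><code>// Before: the dimension is queried on the op&#39;s result.
%0 = &#34;test.shape_preserving&#34;(%arg0) : (memref&lt;?xf32&gt;) -&gt; memref&lt;?xf32&gt;
%dim = memref.dim %0, %c0 : memref&lt;?xf32&gt;

// After (sketch): the query is resolved to the op&#39;s operand.
%0 = &#34;test.shape_preserving&#34;(%arg0) : (memref&lt;?xf32&gt;) -&gt; memref&lt;?xf32&gt;
%dim = memref.dim %arg0, %c0 : memref&lt;?xf32&gt;
</code></pre>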
<h2 id=mesh-dialect-passes>&lsquo;mesh&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#mesh-dialect-passes>¶</a></h2><h3 id=-mesh-spmdization><code>-mesh-spmdization</code>&nbsp;<a class=headline-hash href=#-mesh-spmdization>¶</a></h3><p><em>Partition a function into SPMD form.</em></p><p>This pass fits in right after a pass that annotates the function with shardings, such as the <code>ShardingPropagation</code> pass. It operates on fully annotated IR.</p><p>Fully annotated IR requires that all ranked tensor operands, results, and block arguments are annotated with the <code>mesh.shard</code> operation.</p><p>All direct descendant operations in the function must implement the <code>ShardingInterface</code> interface, or all their ranked tensor operands and results must have full replication sharding.</p><p>The input IR must have sharding annotations that each operation implementing <code>ShardingInterface</code> can handle during spmdization with its <code>spmdize</code> method. This can be achieved with the <code>ShardingPropagation</code> pass.</p><p>If the function has multiple terminating blocks, it is the responsibility of the one who annotates the function with shardings to make sure that all returns are consistent, that is, have the same sharding.</p><p>Example:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl>mesh<span class=p>.</span>mesh <span class=nf>@mesh_1d</span><span class=p>(</span><span class=nl>shape =</span> <span class=m>2</span><span class=p>)</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@f</span><span class=p>(</span> </span></span><span class=line><span class=cl> <span class=nv>%arg0</span><span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i8</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=p>)</span> <span class=p>-&gt;</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i8</span><span class=p>&gt;</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> mesh<span class=p>.</span>shard <span class=nv>%arg0</span> to <span class=p>&lt;</span><span class=nf>@mesh_1d</span><span class=p>,</span> <span class=p>[[</span><span class=m>0</span><span class=p>]]&gt;</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i8</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> mesh<span class=p>.</span>shard <span class=nv>%0</span> to <span class=p>&lt;</span><span class=nf>@mesh_1d</span><span class=p>,</span> <span class=p>[[</span><span class=m>0</span><span class=p>]]&gt;</span> annotate_for_users<span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i8</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> tosa<span class=p>.</span>abs <span class=nv>%1</span> <span class=p>:</span> <span class=p>(</span><span class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i8</span><span class=p>&gt;)</span> <span class=p>-&gt;</span> <span
class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i8</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%3</span> <span class=p>=</span> mesh<span class=p>.</span>shard <span class=nv>%2</span> to <span class=p>&lt;</span><span class=nf>@mesh_1d</span><span class=p>,</span> <span class=p>[[</span><span class=m>0</span><span class=p>]]&gt;</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i8</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%4</span> <span class=p>=</span> mesh<span class=p>.</span>shard <span class=nv>%3</span> to <span class=p>&lt;</span><span class=nf>@mesh_1d</span><span class=p>,</span> <span class=p>[[]]&gt;</span> annotate_for_users<span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i8</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> <span class=nv>%4</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i8</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>Spmdizing the above would result in</p><ul><li>Performing the element-wise <code>abs</code> operation on each device.</li><li>Resharding to full replication with an all-gather.</li></ul><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl>mesh<span class=p>.</span>mesh <span class=nf>@mesh_1d</span><span class=p>(</span><span class=nl>shape =</span> <span class=m>2</span><span class=p>)</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@f</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>1x</span><span class=k>i8</span><span class=p>&gt;)</span> <span class=p>-&gt;</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i8</span><span class=p>&gt;</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> tosa<span class=p>.</span>abs <span class=nv>%arg0</span> <span class=p>:</span> <span class=p>(</span><span class=kt>tensor</span><span class=p>&lt;</span><span class=m>1x</span><span class=k>i8</span><span class=p>&gt;)</span> <span class=p>-&gt;</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>1x</span><span class=k>i8</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> mesh<span class=p>.</span>all_gather <span class=nv>%0</span> on <span class=nf>@mesh_1d</span> <span class=nl>mesh_axes =</span> <span class=p>[</span><span class=m>0</span><span class=p>]</span> <span class=nl>gather_axis =</span> <span class=m>0</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>1x</span><span class=k>i8</span><span class=p>&gt;</span> <span class=p>-&gt;</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i8</span><span class=p>&gt;</span> </span></span><span class=line><span 
class=cl> <span class=kt>return</span> <span class=nv>%1</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x</span><span class=k>i8</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><h3 id=-sharding-propagation><code>-sharding-propagation</code>&nbsp;<a class=headline-hash href=#-sharding-propagation>¶</a></h3><p><em>Sharding propagation</em></p><p>Propagates sharding information throughout the graph. After this pass, each of the operations&rsquo; operands and results is annotated with a <code>mesh.shard</code> operation, and the operations themselves are annotated with sharding option attributes.</p><h2 id=ml_program-dialect-passes>&lsquo;ml_program&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#ml_program-dialect-passes>¶</a></h2><h3 id=-mlprogram-pipeline-globals><code>-mlprogram-pipeline-globals</code>&nbsp;<a class=headline-hash href=#-mlprogram-pipeline-globals>¶</a></h3><p><em>Optimize <code>ml_program</code> global operations for read and store</em></p><p><code>ml_program</code>&rsquo;s load and store operations can be optimized for write-write or write-read sets of operations. This avoids re-reading a global tensor whose value is already known in the IR.</p><p>The pass is designed to handle both nested regions and function calls safely.</p><h2 id=nvgpu-dialect-passes>&lsquo;nvgpu&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#nvgpu-dialect-passes>¶</a></h2><h3 id=-nvgpu-optimize-shared-memory><code>-nvgpu-optimize-shared-memory</code>&nbsp;<a class=headline-hash href=#-nvgpu-optimize-shared-memory>¶</a></h3><p><em>Optimizes accesses to shared memory memrefs in order to reduce bank conflicts.</em></p><h2 id=reducer-passes>Reducer Passes&nbsp;<a class=headline-hash href=#reducer-passes>¶</a></h2><h3 id=-opt-reduction-pass><code>-opt-reduction-pass</code>&nbsp;<a class=headline-hash href=#-opt-reduction-pass>¶</a></h3><p><em>A wrapper pass that reduces the file with optimization passes</em></p><h4 id=options-79>Options&nbsp;<a class=headline-hash href=#options-79>¶</a></h4><pre tabindex=0><code>-opt-pass : The optimization passes used for reduction, e.g., symbol-dce
-test : The location of the tester which tests the interestingness of the file
-test-arg : Arguments of the tester
</code></pre><h3 id=-reduction-tree><code>-reduction-tree</code>&nbsp;<a class=headline-hash href=#-reduction-tree>¶</a></h3><p><em>Reduce the input with the reduction-tree algorithm</em></p><h4 id=options-80>Options&nbsp;<a class=headline-hash href=#options-80>¶</a></h4><pre tabindex=0><code>-traversal-mode : The graph traversal mode; the default is single-path mode
-test : The location of the tester which tests the interestingness of the file
-test-arg : Arguments of the tester
</code></pre><h2 id=scf-dialect-passes>&lsquo;scf&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#scf-dialect-passes>¶</a></h2><h3 id=-scf-for-loop-canonicalization><code>-scf-for-loop-canonicalization</code>&nbsp;<a class=headline-hash href=#-scf-for-loop-canonicalization>¶</a></h3><p><em>Canonicalize operations within scf.for loop bodies</em></p><h3 id=-scf-for-loop-peeling><code>-scf-for-loop-peeling</code>&nbsp;<a class=headline-hash href=#-scf-for-loop-peeling>¶</a></h3><p><em>Peel <code>for</code> loops at their upper bounds.</em></p><h4 id=options-81>Options&nbsp;<a class=headline-hash href=#options-81>¶</a></h4><pre tabindex=0><code>-peel-front : Peel the first iteration out of the loop.
-skip-partial : Do not peel loops inside of the last, partial iteration of another already peeled loop.
</code></pre>
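<p>For illustration, a minimal sketch of upper-bound peeling (hypothetical IR with a placeholder <code>test.use</code> op; the exact bound computation emitted by the pass may differ):</p><pre tabindex=0><code>// Before: the step does not necessarily divide the trip count.
scf.for %iv = %c0 to %ub step %c4 {
  &#34;test.use&#34;(%iv) : (index) -&gt; ()
}

// After (sketch): a main loop over the evenly divisible range,
// followed by a peeled copy for the remaining partial iteration.
%split = affine.apply affine_map&lt;()[s0] -&gt; (s0 - s0 mod 4)&gt;()[%ub]
scf.for %iv = %c0 to %split step %c4 {
  &#34;test.use&#34;(%iv) : (index) -&gt; ()
}
scf.for %iv = %split to %ub step %c4 {
  &#34;test.use&#34;(%iv) : (index) -&gt; ()
}
</code></pre>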
<h3 id=-scf-for-loop-range-folding><code>-scf-for-loop-range-folding</code>&nbsp;<a class=headline-hash href=#-scf-for-loop-range-folding>¶</a></h3><p><em>Fold add/mul ops into loop range</em></p>
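<p>As a minimal sketch of this folding (hypothetical IR with a placeholder <code>test.use</code> op):</p><pre tabindex=0><code>// Before: the induction variable is offset inside the loop.
scf.for %i = %c0 to %ub step %c1 {
  %0 = arith.addi %i, %c4 : index
  &#34;test.use&#34;(%0) : (index) -&gt; ()
}

// After (sketch): the addition is folded into the loop bounds.
%lb = arith.addi %c0, %c4 : index
%new_ub = arith.addi %ub, %c4 : index
scf.for %i = %lb to %new_ub step %c1 {
  &#34;test.use&#34;(%i) : (index) -&gt; ()
}
</code></pre>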
<h3 id=-scf-for-loop-specialization><code>-scf-for-loop-specialization</code>&nbsp;<a class=headline-hash href=#-scf-for-loop-specialization>¶</a></h3><p><em>Specialize <code>for</code> loops for vectorization</em></p><h3 id=-scf-for-to-while><code>-scf-for-to-while</code>&nbsp;<a class=headline-hash href=#-scf-for-to-while>¶</a></h3><p><em>Convert SCF for loops to SCF while loops</em></p><p>This pass transforms <code>scf.for</code> operations into <code>scf.while</code> operations. The for-loop condition is placed in the &lsquo;before&rsquo; region of the while operation, and the induction variable incrementation and loop body are placed in the &lsquo;after&rsquo; region. The loop-carried values of the while op are the induction variable (IV) of the for-loop plus any iter_args specified for the for-loop. Any &lsquo;yield&rsquo; ops in the for-loop are rewritten to additionally yield the (incremented) induction variable.</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=err>#</span> Before<span class=p>:</span> </span></span><span class=line><span class=cl> scf<span class=p>.</span>for <span class=nv>%i</span> <span class=p>=</span> <span class=nv>%c0</span> to <span class=nv>%arg1</span> step <span class=nv>%c1</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> arith<span class=p>.</span>addi <span class=nv>%arg2</span><span class=p>,</span> <span class=nv>%arg2</span> <span class=p>:</span> <span class=k>i32</span> </span></span><span class=line><span class=cl> <span class=kt>memref</span><span class=p>.</span>store <span class=nv>%0</span><span class=p>,</span> <span class=nv>%arg0</span><span class=p>[</span><span class=nv>%i</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>?x</span><span class=k>i32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl><span class=err>#</span> After<span class=p>:</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> scf<span class=p>.</span>while <span class=p>(</span><span class=nv>%i</span> <span class=p>=</span> <span class=nv>%c0</span><span class=p>)</span> <span class=p>:</span> <span class=p>(</span><span class=k>index</span><span class=p>)</span> <span class=p>-&gt;</span> <span class=k>index</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> arith<span class=p>.</span>cmpi slt<span class=p>,</span> <span class=nv>%i</span><span class=p>,</span> <span class=nv>%arg1</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> scf<span class=p>.</span>condition<span class=p>(</span><span class=nv>%1</span><span class=p>)</span> <span class=nv>%i</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=p>}</span> do <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nl>^bb0</span><span class=p>(</span><span class=nv>%i</span><span class=p>:</span> <span class=k>index</span><span class=p>):</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> arith<span class=p>.</span>addi <span class=nv>%i</span><span class=p>,</span> <span class=nv>%c1</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> arith<span class=p>.</span>addi <span class=nv>%arg2</span><span class=p>,</span> <span class=nv>%arg2</span> <span class=p>:</span> <span class=k>i32</span> </span></span><span class=line><span class=cl> <span class=kt>memref</span><span class=p>.</span>store <span class=nv>%2</span><span class=p>,</span> <span class=nv>%arg0</span><span class=p>[</span><span class=nv>%i</span><span class=p>]</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>?x</span><span class=k>i32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> scf<span class=p>.</span>yield <span class=nv>%1</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=p>}</span> </span></span></code></pre></div><h3 id=-scf-forall-to-for><code>-scf-forall-to-for</code>&nbsp;<a class=headline-hash href=#-scf-forall-to-for>¶</a></h3><p><em>Convert SCF forall loops to SCF for loops</em></p><h3 id=-scf-forall-to-parallel><code>-scf-forall-to-parallel</code>&nbsp;<a class=headline-hash href=#-scf-forall-to-parallel>¶</a></h3><p><em>Convert SCF forall loops to SCF parallel loops</em></p><h3 id=-scf-parallel-loop-fusion><code>-scf-parallel-loop-fusion</code>&nbsp;<a class=headline-hash href=#-scf-parallel-loop-fusion>¶</a></h3><p><em>Fuse adjacent parallel loops</em></p><h3 id=-scf-parallel-loop-specialization><code>-scf-parallel-loop-specialization</code>&nbsp;<a class=headline-hash href=#-scf-parallel-loop-specialization>¶</a></h3><p><em>Specialize parallel loops for vectorization</em></p><h3 id=-scf-parallel-loop-tiling><code>-scf-parallel-loop-tiling</code>&nbsp;<a class=headline-hash href=#-scf-parallel-loop-tiling>¶</a></h3><p><em>Tile parallel loops</em></p><h4 id=options-82>Options&nbsp;<a class=headline-hash href=#options-82>¶</a></h4><pre tabindex=0><code>-parallel-loop-tile-sizes : Factors to tile parallel loops by
-no-min-max-bounds : Perform tiling with fixed upper bound with inbound check inside the internal loops
</code></pre>
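<p>For illustration, a minimal sketch with <code>parallel-loop-tile-sizes=4</code> (hypothetical IR with a placeholder <code>test.payload</code> op; the min/bound computations generated for partial tiles are omitted):</p><pre tabindex=0><code>// Before: a flat parallel loop.
scf.parallel (%i) = (%c0) to (%c16) step (%c1) {
  &#34;test.payload&#34;(%i) : (index) -&gt; ()
}

// After (sketch): an outer loop over tiles and an inner loop within
// each tile; the original index is recovered from both.
scf.parallel (%i0) = (%c0) to (%c16) step (%c4) {
  scf.parallel (%i1) = (%c0) to (%c4) step (%c1) {
    %i = arith.addi %i0, %i1 : index
    &#34;test.payload&#34;(%i) : (index) -&gt; ()
  }
}
</code></pre>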
<h3 id=-test-scf-parallel-loop-collapsing><code>-test-scf-parallel-loop-collapsing</code>&nbsp;<a class=headline-hash href=#-test-scf-parallel-loop-collapsing>¶</a></h3><p><em>Test parallel loops collapsing transformation</em></p><p>This pass is purely for testing the scf::collapseParallelLoops transformation. The transformation does not have opinions on how a parallel loop should be collapsed, so this pass is structured for the common case on GPUs of collapsing to a 3d parallel loop. Three lists can be provided to collapsed-indices-{0,1,2} to represent how the loop should be collapsed; together they must reference every iterator in the original parallel loop.</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=err>#</span> Before<span class=p>:</span> </span></span><span class=line><span class=cl>scf<span class=p>.</span>parallel <span class=p>(</span><span class=nv>%arg0</span><span class=p>,</span> <span class=nv>%arg1</span><span class=p>)</span> </span></span><span class=line><span class=cl> <span class=p>=</span> <span class=p>(</span><span class=nv>%c0</span><span class=p>,</span> <span class=nv>%c0</span><span class=p>)</span> to <span class=p>(</span><span class=nv>%c2</span><span class=p>,</span> <span class=nv>%c2</span><span class=p>)</span> step <span class=p>(</span><span class=nv>%c1</span><span class=p>,</span> <span class=nv>%c1</span><span class=p>)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=s>&#34;test.sink&#34;</span><span class=p>(</span><span class=nv>%5</span><span class=p>,</span> <span class=nv>%3</span><span class=p>)</span> <span class=p>:</span> <span class=p>(</span><span class=k>index</span><span class=p>,</span> <span class=k>index</span><span class=p>)</span> <span class=p>-&gt;</span> <span class=p>()</span> </span></span><span class=line><span class=cl> scf<span class=p>.</span>yield </span></span><span class=line><span class=cl><span class=p>}</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl><span class=err>#</span> After<span class=p>:</span> </span></span><span class=line><span class=cl>scf<span class=p>.</span>parallel <span class=p>(</span><span class=nv>%arg0</span><span class=p>)</span> <span class=p>=</span> <span class=p>(</span><span class=nv>%c0</span><span class=p>)</span> to <span class=p>(</span><span class=nv>%c4</span><span class=p>)</span> step <span class=p>(</span><span class=nv>%c1</span><span class=p>)</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> arith<span class=p>.</span>remsi <span class=nv>%arg0</span><span class=p>,</span> <span class=nv>%c2</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> arith<span class=p>.</span>divsi <span class=nv>%arg0</span><span class=p>,</span> <span class=nv>%c2</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> arith<span class=p>.</span>muli <span class=nv>%0</span><span class=p>,</span> <span class=nv>%c7</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%3</span> <span class=p>=</span> arith<span class=p>.</span>addi <span class=nv>%2</span><span class=p>,</span> <span class=nv>%c3</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%4</span> <span class=p>=</span> arith<span class=p>.</span>muli <span class=nv>%1</span><span class=p>,</span> <span class=nv>%c7</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%5</span> <span class=p>=</span> arith<span class=p>.</span>addi <span class=nv>%4</span><span class=p>,</span> <span class=nv>%c3</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=s>&#34;test.sink&#34;</span><span class=p>(</span><span class=nv>%5</span><span class=p>,</span> <span class=nv>%3</span><span class=p>)</span> <span class=p>:</span> <span class=p>(</span><span class=k>index</span><span class=p>,</span> <span class=k>index</span><span class=p>)</span> <span class=p>-&gt;</span> <span class=p>()</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div>
<h4 id=options-83>Options&nbsp;<a class=headline-hash href=#options-83>¶</a></h4><pre tabindex=0><code>-collapsed-indices-0 : Which loop indices to combine into the position 0 loop index
-collapsed-indices-1 : Which loop indices to combine into the position 1 loop index
-collapsed-indices-2 : Which loop indices to combine into the position 2 loop index
</code></pre><h2 id=shape-dialect-passes>&lsquo;shape&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#shape-dialect-passes>¶</a></h2><h3 id=-outline-shape-computation><code>-outline-shape-computation</code>&nbsp;<a class=headline-hash href=#-outline-shape-computation>¶</a></h3><p><em>Using shape.func to preserve shape computation</em></p><p>This pass outlines the shape computation part of high-level IR by adding shape.func ops and populating the corresponding mapping information into ShapeMappingAnalysis. The shape computation part is usually introduced by shape reification, and each single dynamic shape is denoted by shape.with_shape.</p><p>There are two main reasons this shape-outline pass is needed:</p><ol><li>Many passes don&rsquo;t take the shape reification part into consideration. Therefore we need to &ldquo;remove&rdquo; the shape reification part temporarily for these passes.</li><li>Sometimes we cannot redo shape reification after converting from dialect A to dialect B.
Because op-level shape reification is only implemented on A.</li></ol><p>Input:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=kt>func</span><span class=p>.</span><span class=kt>func</span> <span class=nf>@main</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=nv>%arg1</span><span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x4x?x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>-&gt;</span> </span></span><span class=line><span class=cl> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%c2</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>2</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%c0</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>0</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%c4</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>4</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> shape<span class=p>.</span>shape_of <span class=nv>%arg0</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;</span> <span class=p>-&gt;</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>3x</span><span class=k>index</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> shape<span class=p>.</span>get_extent <span class=nv>%0</span><span class=p>,</span> <span class=nv>%c2</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>3x</span><span class=k>index</span><span class=p>&gt;,</span> <span class=k>index</span> <span class=p>-&gt;</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> <span class=s>&#34;test.abs&#34;</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>)</span> <span class=p>:</span> <span class=p>(</span><span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>-&gt;</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%3</span> <span class=p>=</span> shape<span class=p>.</span>with_shape <span class=nv>%2</span><span class=p>,</span> <span class=nv>%0</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>3x</span><span class=k>index</span><span class=p>&gt;</span> </span></span><span 
class=line><span class=cl> <span class=nv>%4</span> <span class=p>=</span> shape<span class=p>.</span>value_of <span class=nv>%3</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%5</span> <span class=p>=</span> <span class=s>&#34;test.concat&#34;</span><span class=p>(</span><span class=nv>%4</span><span class=p>,</span> <span class=nv>%arg1</span><span class=p>)</span> <span class=p>{</span><span class=nl>axis =</span> <span class=m>0</span> <span class=p>:</span> <span class=k>i64</span><span class=p>}</span> <span class=p>:</span> <span class=p>(</span><span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x4x?x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>-&gt;</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%6</span> <span class=p>=</span> shape<span class=p>.</span>get_extent <span class=nv>%0</span><span class=p>,</span> <span class=nv>%c0</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>3x</span><span class=k>index</span><span class=p>&gt;,</span> <span class=k>index</span> <span class=p>-&gt;</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%7</span> <span class=p>=</span> arith<span class=p>.</span>addi <span class=nv>%6</span><span class=p>,</span> <span class=nv>%c2</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%8</span> <span class=p>=</span> shape<span class=p>.</span>from_extents <span class=nv>%7</span><span class=p>,</span> <span class=nv>%c4</span><span class=p>,</span> <span class=nv>%1</span> <span class=p>:</span> <span class=k>index</span><span class=p>,</span> <span class=k>index</span><span class=p>,</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%9</span> <span class=p>=</span> shape<span class=p>.</span>with_shape <span class=nv>%5</span><span class=p>,</span> <span class=nv>%8</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=p>!</span>shape<span class=p>.</span>shape </span></span><span class=line><span class=cl> <span class=nv>%10</span> <span class=p>=</span> shape<span class=p>.</span>value_of <span class=nv>%9</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> <span class=nv>%10</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>Output</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=kt>func</span><span 
class=p>.</span><span class=kt>func</span> <span class=nf>@main</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=nv>%arg1</span><span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x4x?x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>-&gt;</span> </span></span><span class=line><span class=cl> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> <span class=s>&#34;test.abs&#34;</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>)</span> <span class=p>:</span> <span class=p>(</span><span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>-&gt;</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> <span class=s>&#34;test.concat&#34;</span><span class=p>(</span><span class=nv>%0</span><span class=p>,</span> <span class=nv>%arg1</span><span class=p>)</span> <span class=p>{</span><span class=nl>axis =</span> <span class=m>0</span> <span class=p>:</span> <span class=k>i64</span><span class=p>}</span> <span class=p>:</span> <span class=p>(</span><span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>2x4x?x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>-&gt;</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> <span class=nv>%1</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span><span class=line><span class=cl>shape<span class=p>.</span><span class=kt>func</span> private <span class=nf>@shape_cal_1</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>-&gt;</span> <span class=p>!</span>shape<span class=p>.</span>shape <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%c2</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>2</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%c0</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>0</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%c4</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>4</span> <span 
class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> shape_of <span class=nv>%arg0</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;</span> <span class=p>-&gt;</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>3x</span><span class=k>index</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=nv>%1</span> <span class=p>=</span> get_extent <span class=nv>%0</span><span class=p>,</span> <span class=nv>%c2</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>3x</span><span class=k>index</span><span class=p>&gt;,</span> <span class=k>index</span> <span class=p>-&gt;</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> get_extent <span class=nv>%0</span><span class=p>,</span> <span class=nv>%c0</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>3x</span><span class=k>index</span><span class=p>&gt;,</span> <span class=k>index</span> <span class=p>-&gt;</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%3</span> <span class=p>=</span> arith<span class=p>.</span>addi <span class=nv>%2</span><span class=p>,</span> <span class=nv>%c2</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=nv>%4</span> <span class=p>=</span> from_extents <span class=nv>%3</span><span class=p>,</span> <span class=nv>%c4</span><span class=p>,</span> <span class=nv>%1</span> <span class=p>:</span> <span class=k>index</span><span class=p>,</span> <span class=k>index</span><span class=p>,</span> <span class=k>index</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> <span class=nv>%4</span> <span class=p>:</span> <span class=p>!</span>shape<span class=p>.</span>shape </span></span><span class=line><span class=cl><span class=p>}</span> </span></span><span class=line><span class=cl>shape<span class=p>.</span><span class=kt>func</span> private <span class=nf>@shape_cal_0</span><span class=p>(</span><span class=nv>%arg0</span><span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;)</span> <span class=p>-&gt;</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>3x</span><span class=k>index</span><span class=p>&gt;</span> <span class=p>{</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> shape_of <span class=nv>%arg0</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>?x4x?x</span><span class=k>f32</span><span class=p>&gt;</span> <span class=p>-&gt;</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>3x</span><span class=k>index</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=kt>return</span> <span class=nv>%0</span> <span class=p>:</span> <span class=kt>tensor</span><span class=p>&lt;</span><span class=m>3x</span><span class=k>index</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=p>}</span> </span></span></code></pre></div><p>For the above example, the shape computation is 
inlined in the input IR, where it is used to compute the shapes of two values (test.abs and test.concat). The shape computation part is outlined in the output IR.</p><p>The shape mapping information will be:</p><pre tabindex=0><code>// ---- Shape Mapping Information -----
// - Shape for: %0 = &#34;test.abs&#34;(%arg0) : (tensor&lt;?x4x?xf32&gt;) -&gt; tensor&lt;?x4x?xf32&gt;
//     :: @shape_cal_0(&lt;block argument&gt; of type &#39;tensor&lt;?x4x?xf32&gt;&#39; at index: 0)
// - Shape for: %1 = &#34;test.concat&#34;(%0, %arg1) {axis = 0 : i64} : (tensor&lt;?x4x?xf32&gt;, tensor&lt;2x4x?xf32&gt;) -&gt; tensor&lt;?x4x?xf32&gt;
//     :: @shape_cal_1(&lt;block argument&gt; of type &#39;tensor&lt;?x4x?xf32&gt;&#39; at index: 0)
</code></pre><h3 id=-remove-shape-constraints><code>-remove-shape-constraints</code>&nbsp;<a class=headline-hash href=#-remove-shape-constraints>¶</a></h3><p><em>Replace all cstr_ ops with a true witness</em></p>
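<p>A minimal sketch of the rewrite (illustrative IR):</p><pre tabindex=0><code>// Before: a broadcastability constraint produces a witness.
%w = shape.cstr_broadcastable %a, %b : !shape.shape, !shape.shape

// After (sketch): the constraint is replaced by a true witness.
%w = shape.const_witness true
</code></pre>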
<h3 id=-shape-to-shape-lowering><code>-shape-to-shape-lowering</code>&nbsp;<a class=headline-hash href=#-shape-to-shape-lowering>¶</a></h3><p><em>Legalize Shape dialect to be convertible to Arith</em></p><h2 id=sparse_tensor-dialect-passes>&lsquo;sparse_tensor&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#sparse_tensor-dialect-passes>¶</a></h2><h3 id=-lower-sparse-foreach-to-scf><code>-lower-sparse-foreach-to-scf</code>&nbsp;<a class=headline-hash href=#-lower-sparse-foreach-to-scf>¶</a></h3><p><em>Decompose a complex sparse operation into multiple stages</em></p><p>A pass that lowers the sparse_tensor.foreach operation to the scf dialect.</p><h3 id=-lower-sparse-iteration-to-scf><code>-lower-sparse-iteration-to-scf</code>&nbsp;<a class=headline-hash href=#-lower-sparse-iteration-to-scf>¶</a></h3><p><em>Lower sparse_tensor.iterate/coiterate into scf loops</em></p><p>This pass lowers <code>sparse_tensor.iterate</code> operations into <code>scf.for/while</code> operations. The pass is not yet stabilized.</p><h3 id=-lower-sparse-ops-to-foreach><code>-lower-sparse-ops-to-foreach</code>&nbsp;<a class=headline-hash href=#-lower-sparse-ops-to-foreach>¶</a></h3><p><em>Applies sparse tensor rewriting rules after sparsification</em></p><p>A pass that lowers high-level sparse operations to sparse_tensor.foreach.</p><h4 id=options-84>Options&nbsp;<a class=headline-hash href=#options-84>¶</a></h4><pre tabindex=0><code>-enable-runtime-library : Enable runtime library for manipulating sparse tensors
-enable-convert : Enable rewriting rules for the convert operator
</code></pre><h3 id=-pre-sparsification-rewrite><code>-pre-sparsification-rewrite</code>&nbsp;<a class=headline-hash href=#-pre-sparsification-rewrite>¶</a></h3><p><em>Applies sparse tensor rewriting rules prior to sparsification</em></p><p>A pass that applies rewriting rules to sparse tensor operations prior to running the actual sparsification pass.</p><h3 id=-sparse-assembler><code>-sparse-assembler</code>&nbsp;<a class=headline-hash href=#-sparse-assembler>¶</a></h3><p><em>Add [dis]assemble operations on external sparse tensors</em></p><p>Unlike dense tensors, MLIR does <strong>not</strong> provide a direct <code>_mlir_ciface_</code> ABI for passing sparse tensors as arguments from and to external methods (within MLIR-generated methods, sparse tensors can be freely passed around, but this eventually uses a bespoke parameter passing format that is subject to change, like opaque pointers when the sparse runtime support library is used or the constituent arrays and structs for direct IR codegen). The sparse assembler pass, however, can be used to obtain a stable <code>_mlir_ciface_</code> API for passing sparse tensors from and to an external environment, such as Python, PyTorch, or JAX.</p><p>The pass converts public entry methods that use sparse tensors as input parameters and/or output return values into wrapper methods that [dis]assemble the individual tensors that constitute the actual storage used externally into MLIR sparse tensors. This pass can be used to prepare the public entry methods of a program that is compiled by the MLIR sparsifier to interface with an external runtime, e.g., when passing sparse tensors as numpy arrays from and to Python. Note that eventual bufferization decisions (e.g. who [de]allocates the underlying memory) should be resolved in agreement with the external runtime.</p><p>By default, the pass uses the [dis]assemble operations to input and output sparse tensors. When the direct-out option is set, however, the output directly returns the MLIR-allocated buffers to the external runtime.</p><p>The pass should always run before the actual sparsification passes.</p><h4 id=options-85>Options&nbsp;<a class=headline-hash href=#options-85>¶</a></h4><pre tabindex=0><code>-direct-out : Directly returns buffers externally </code></pre><h3 id=-sparse-buffer-rewrite><code>-sparse-buffer-rewrite</code>&nbsp;<a class=headline-hash href=#-sparse-buffer-rewrite>¶</a></h3><p><em>Rewrite sparse primitives on buffers to actual code</em></p><p>A pass that rewrites sparse primitives on buffers to the MLIR implementation of the primitives. For example, the sparse_tensor.sort operator is implemented in this pass.</p><h4 id=options-86>Options&nbsp;<a class=headline-hash href=#options-86>¶</a></h4><pre tabindex=0><code>-enable-buffer-initialization : Enable zero-initialization of the memory buffers </code></pre><h3 id=-sparse-gpu-codegen><code>-sparse-gpu-codegen</code>&nbsp;<a class=headline-hash href=#-sparse-gpu-codegen>¶</a></h3><p><em>Generates GPU code during sparsification</em></p><p>Enables the sparsifier to use GPU acceleration. When the number of GPU threads is set to zero, the pass tries to enable GPU acceleration by means of direct library calls (like cuSPARSE).</p><h4 id=options-87>Options&nbsp;<a class=headline-hash href=#options-87>¶</a></h4><pre tabindex=0><code>-num-threads : Sets the number of GPU threads
-enable-runtime-library : Enable runtime library for manipulating sparse tensors
</code></pre><h3 id=-sparse-reinterpret-map><code>-sparse-reinterpret-map</code>&nbsp;<a class=headline-hash href=#-sparse-reinterpret-map>¶</a></h3><p><em>Reinterprets sparse tensor type mappings</em></p><p>A pass that reinterprets the mappings in all sparse tensor types in a way that enables subsequent sparsification.
This involves expressing all <code>linalg.generic</code> operations in terms of level coordinates (rather than the dimension coordinates of the input tensors), so that the iteration space aligns with the potentially remapped level space, and resolving cycles in the resulting iteration graphs with explicit sparse tensor conversions where needed.</p><h4 id=options-88>Options&nbsp;<a class=headline-hash href=#options-88>¶</a></h4><pre tabindex=0><code>-scope : Set the reinterpretation scope </code></pre><h3 id=-sparse-space-collapse><code>-sparse-space-collapse</code>&nbsp;<a class=headline-hash href=#-sparse-space-collapse>¶</a></h3><p><em>Sparse space collapsing pass</em></p><p>This pass collapses consecutive sparse spaces (extracted from the same tensor) into one multi-dimensional space. The pass is not yet stabilized.</p><h3 id=-sparse-storage-specifier-to-llvm><code>-sparse-storage-specifier-to-llvm</code>&nbsp;<a class=headline-hash href=#-sparse-storage-specifier-to-llvm>¶</a></h3><p><em>Lower sparse storage specifier to llvm structure</em></p><p>This pass rewrites sparse tensor storage specifier-related operations into the LLVM dialect, and converts the sparse tensor storage specifier into an llvm.struct.</p><p>Example of the conversion:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl>Before<span class=p>:</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> sparse_tensor<span class=p>.</span>storage_specifier<span class=p>.</span>get <span class=nv>%arg0</span> dim_sz at <span class=m>0</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>!</span>sparse_tensor<span class=p>.</span>storage_specifier<span class=p>&lt;</span><span class=nv>#CSR</span><span class=p>&gt;</span> to <span class=k>i64</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl>After<span class=p>:</span> </span></span><span class=line><span class=cl> <span class=nv>%0</span> <span class=p>=</span> llvm<span class=p>.</span>extractvalue <span class=nv>%arg0</span><span class=p>[</span><span class=m>0</span><span class=p>,</span> <span class=m>0</span><span class=p>]</span> <span class=p>:</span> <span class=p>!</span>llvm<span class=p>.</span>struct<span class=p>&lt;(</span>array<span class=p>&lt;</span><span class=m>2 x</span> <span class=k>i64</span><span class=p>&gt;,</span> array<span class=p>&lt;</span><span class=m>3 x</span> <span class=k>i64</span><span class=p>&gt;)&gt;</span> </span></span></code></pre></div><h3 id=-sparse-tensor-codegen><code>-sparse-tensor-codegen</code>&nbsp;<a class=headline-hash href=#-sparse-tensor-codegen>¶</a></h3><p><em>Convert sparse tensors and primitives to actual code</em></p><p>A pass that converts sparse tensor types and primitives to actual compiler-visible buffers and compiler IR that implements these primitives on the selected sparse tensor storage schemes.</p><p>This pass provides an alternative to the SparseTensorConversion pass, eliminating the dependence on a runtime support library, and providing many more opportunities for subsequent compiler optimization of the generated code.</p><p>Example of the conversion:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl> Before<span class=p>:</span> </span></span><span class=line><span class=cl> <span class=kt>func</span><span class=p>.</span><span
<h3 id=-sparse-space-collapse><code>-sparse-space-collapse</code>&nbsp;<a class=headline-hash href=#-sparse-space-collapse>¶</a></h3><p><em>Sparse space collapsing pass</em></p><p>This pass collapses consecutive sparse spaces (extracted from the same tensor) into one multi-dimensional space. The pass is not yet stabilized.</p><h3 id=-sparse-storage-specifier-to-llvm><code>-sparse-storage-specifier-to-llvm</code>&nbsp;<a class=headline-hash href=#-sparse-storage-specifier-to-llvm>¶</a></h3><p><em>Lower sparse storage specifier to LLVM structure</em></p><p>This pass rewrites sparse tensor storage specifier-related operations into the LLVM dialect, and converts each sparse tensor storage specifier into an <code>llvm.struct</code>.</p><p>Example of the conversion:</p><pre tabindex=0><code class=language-mlir>Before:
  %0 = sparse_tensor.storage_specifier.get %arg0 dim_sz at 0
       : !sparse_tensor.storage_specifier&lt;#CSR&gt; to i64

After:
  %0 = llvm.extractvalue %arg0[0, 0]
       : !llvm.struct&lt;(array&lt;2 x i64&gt;, array&lt;3 x i64&gt;)&gt;
</code></pre><h3 id=-sparse-tensor-codegen><code>-sparse-tensor-codegen</code>&nbsp;<a class=headline-hash href=#-sparse-tensor-codegen>¶</a></h3><p><em>Convert sparse tensors and primitives to actual code</em></p><p>A pass that converts sparse tensor types and primitives to actual compiler-visible buffers and compiler IR that implements these primitives on the selected sparse tensor storage schemes.</p><p>This pass provides an alternative to the SparseTensorConversion pass, eliminating the dependence on a runtime support library and providing many more opportunities for subsequent compiler optimization of the generated code.</p><p>Example of the conversion:</p><pre tabindex=0><code class=language-mlir>Before:
  func.func @foo(%arg0: tensor&lt;8x8xf32, #CSR&gt;) -&gt; memref&lt;?xindex&gt; {
    %0 = sparse_tensor.pointers %arg0 {dimension = 1 : index}
       : tensor&lt;8x8xf32, #CSR&gt; to memref&lt;?xindex&gt;
    return %0 : memref&lt;?xindex&gt;
  }

After:
  func.func @foo(%arg0: memref&lt;2xindex&gt;,
                 %arg1: memref&lt;3xindex&gt;,
                 %arg2: memref&lt;?xindex&gt;,
                 %arg3: memref&lt;?xindex&gt;,
                 %arg4: memref&lt;?xf32&gt;) -&gt; memref&lt;?xindex&gt; {
    return %arg2 : memref&lt;?xindex&gt;
  }
</code></pre><h4 id=options-89>Options&nbsp;<a class=headline-hash href=#options-89>¶</a></h4><pre tabindex=0><code>-enable-buffer-initialization : Enable zero-initialization of the memory buffers
-create-sparse-deallocs : Specify whether the temporary buffers created by the sparse compiler should be deallocated. For compatibility with core bufferization passes. This option is only used when enable-runtime-library=false. See also create-deallocs for BufferizationOption.
</code></pre>
<h3 id=-sparse-tensor-conversion><code>-sparse-tensor-conversion</code>&nbsp;<a class=headline-hash href=#-sparse-tensor-conversion>¶</a></h3><p><em>Convert sparse tensors and primitives to library calls</em></p><p>A pass that converts sparse tensor primitives into calls to a runtime support library. Sparse tensor types are converted into opaque pointers to the underlying sparse storage schemes.</p><p>The use of opaque pointers together with a runtime support library keeps the conversion relatively simple, but at the expense of IR opacity, which obscures opportunities for subsequent optimization of the IR. An alternative is provided by the SparseTensorCodegen pass.</p><p>Example of the conversion:</p><pre tabindex=0><code class=language-mlir>Before:
  func.func @foo(%arg0: tensor&lt;8x8xf32, #CSR&gt;) -&gt; memref&lt;?xindex&gt; {
    %0 = sparse_tensor.pointers %arg0 {dimension = 1 : index}
       : tensor&lt;8x8xf32, #CSR&gt; to memref&lt;?xindex&gt;
    return %0 : memref&lt;?xindex&gt;
  }

After:
  func.func @foo(%arg0: !llvm.ptr) -&gt; memref&lt;?xindex&gt; {
    %c1 = arith.constant 1 : index
    %0 = call @sparsePointers0(%arg0, %c1)
       : (!llvm.ptr, index) -&gt; memref&lt;?xindex&gt;
    return %0 : memref&lt;?xindex&gt;
  }
</code></pre>
<h3 id=-sparse-vectorization><code>-sparse-vectorization</code>&nbsp;<a class=headline-hash href=#-sparse-vectorization>¶</a></h3><p><em>Vectorizes loops after sparsification</em></p><p>A pass that converts loops after sparsification into vector loops. The vector dialect is used as the target to provide an architecture-neutral way of exploiting any platform that supports SIMD instructions.</p><p>The vector length (viz. <code>vl</code>) describes the number of packed data elements (e.g., both vector&lt;16xf32&gt; and vector&lt;16xf64&gt; have a vector length of 16 even though the actual bitwidths differ). A small multiple of the actual vector length supported by the hardware typically results in efficient SIMD code, since the backend will map longer vectors to multiple vector registers, thereby effectively unrolling an additional level within the generated for-loop.</p><p>Example of the conversion:</p><pre tabindex=0><code class=language-mlir>Before:
  %3 = memref.load %2[] : memref&lt;f32&gt;
  %4 = scf.for %arg3 = %c0 to %c1024 step %c1 iter_args(%arg4 = %3) -&gt; (f32) {
    %6 = memref.load %0[%arg3] : memref&lt;?xf32&gt;
    %7 = memref.load %1[%arg3] : memref&lt;1024xf32&gt;
    %8 = arith.mulf %6, %7 : f32
    %9 = arith.addf %arg4, %8 : f32
    scf.yield %9 : f32
  }
  memref.store %4, %2[] : memref&lt;f32&gt;

After:
  %3 = memref.load %2[] : memref&lt;f32&gt;
  %4 = vector.insertelement %3, %cst[%c0 : index] : vector&lt;32xf32&gt;
  %5 = scf.for %arg3 = %c0 to %c1024 step %c32 iter_args(%arg4 = %4) -&gt; (vector&lt;32xf32&gt;) {
    %8 = vector.load %0[%arg3] : memref&lt;?xf32&gt;, vector&lt;32xf32&gt;
    %9 = vector.load %1[%arg3] : memref&lt;1024xf32&gt;, vector&lt;32xf32&gt;
    %10 = arith.mulf %8, %9 : vector&lt;32xf32&gt;
    %11 = arith.addf %arg4, %10 : vector&lt;32xf32&gt;
    scf.yield %11 : vector&lt;32xf32&gt;
  }
  %6 = vector.reduction &lt;add&gt;, %5 : vector&lt;32xf32&gt; into f32
  memref.store %6, %2[] : memref&lt;f32&gt;
</code></pre><h4 id=options-90>Options&nbsp;<a class=headline-hash href=#options-90>¶</a></h4><pre tabindex=0><code>-vl : Set the vector length (use 0 to disable vectorization)
-enable-vla-vectorization : Enable vector length agnostic vectorization
-enable-simd-index32 : Enable i32 indexing into vectors (for efficient gather/scatter)
</code></pre>
<h3 id=-sparsification><code>-sparsification</code>&nbsp;<a class=headline-hash href=#-sparsification>¶</a></h3><p><em>Automatically generate sparse tensor code from sparse tensor types</em></p><p>A pass that implements the core functionality of a <strong>sparsifier</strong>. Each Linalg operation (MLIR&rsquo;s tensor index notation) that operates on sparse tensor types is converted into code in which the sparsity is explicit, both in terms of the co-iterating looping logic and the selected sparse storage schemes.</p><p>See the <code>SparseTensor</code> dialect documentation for more background.</p><p>Example input:</p><pre tabindex=0><code class=language-mlir>#matvec = {
  indexing_maps = [
    affine_map&lt;(i,j) -&gt; (i,j)&gt;, // A
    affine_map&lt;(i,j) -&gt; (j)&gt;,   // b
    affine_map&lt;(i,j) -&gt; (i)&gt;    // x (out)
  ],
  iterator_types = ["parallel", "reduction"],
  doc = "X(i) += A(i,j) * B(j)"
}

// Multiply a sparse matrix A with a dense vector b into a dense vector x.
func.func @kernel_matvec(%arga: tensor&lt;?x?xf64, #SparseMatrix&gt;,
                         %argb: tensor&lt;?xf64&gt;,
                         %argx: tensor&lt;?xf64&gt;) -&gt; tensor&lt;?xf64&gt; {
  %0 = linalg.generic #matvec
    ins(%arga, %argb: tensor&lt;?x?xf64, #SparseMatrix&gt;, tensor&lt;?xf64&gt;)
    outs(%argx: tensor&lt;?xf64&gt;) {
    ^bb(%a: f64, %b: f64, %x: f64):
      %0 = arith.mulf %a, %b : f64
      %1 = arith.addf %x, %0 : f64
      linalg.yield %1 : f64
  } -&gt; tensor&lt;?xf64&gt;
  return %0 : tensor&lt;?xf64&gt;
}
</code></pre>
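<p>For a CSR-like encoding of <code>%arga</code>, the sparsifier turns this kernel into a loop nest that co-iterates over the stored nonzeros only. A simplified sketch of the resulting loop structure, assuming the positions/coordinates/values buffers of the sparse matrix have already been extracted (all names illustrative):</p><pre tabindex=0><code class=language-mlir>scf.for %i = %c0 to %dim_i step %c1 {
  // Positions delimit the stored nonzeros of row %i.
  %i1 = arith.addi %i, %c1 : index
  %lo = memref.load %positions[%i] : memref&lt;?xindex&gt;
  %hi = memref.load %positions[%i1] : memref&lt;?xindex&gt;
  // Iterate over the stored nonzeros of row %i only.
  scf.for %jj = %lo to %hi step %c1 {
    %j  = memref.load %coordinates[%jj] : memref&lt;?xindex&gt;
    %a  = memref.load %values[%jj] : memref&lt;?xf64&gt;
    %b  = memref.load %bvals[%j] : memref&lt;?xf64&gt;
    %x0 = memref.load %xvals[%i] : memref&lt;?xf64&gt;
    %m  = arith.mulf %a, %b : f64
    %x1 = arith.addf %x0, %m : f64
    memref.store %x1, %xvals[%i] : memref&lt;?xf64&gt;
  }
}
</code></pre>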
<h4 id=options-91>Options&nbsp;<a class=headline-hash href=#options-91>¶</a></h4><pre tabindex=0><code>-parallelization-strategy : Set the parallelization strategy
-sparse-emit-strategy : Emit functional code or interfaces (to debug) for sparse loops
-enable-runtime-library : Enable runtime library for manipulating sparse tensors
</code></pre><h3 id=-sparsification-and-bufferization><code>-sparsification-and-bufferization</code>&nbsp;<a class=headline-hash href=#-sparsification-and-bufferization>¶</a></h3><p><em>Mini-pipeline that combines bufferization and sparsification</em></p><p>This pass forms a mini-pipeline that combines bufferization and sparsification.</p><h4 id=options-92>Options&nbsp;<a class=headline-hash href=#options-92>¶</a></h4><pre tabindex=0><code>-vl : Set the vector length (use 0 to disable vectorization)
-enable-vla-vectorization : Enable vector length agnostic vectorization
-enable-simd-index32 : Enable i32 indexing into vectors (for efficient gather/scatter)
-enable-gpu-libgen : Enable GPU acceleration by means of direct library calls
-sparse-emit-strategy : Emit functional code or interfaces (to debug) for sparse loops
-parallelization-strategy : Set the parallelization strategy
</code></pre><h3 id=-stage-sparse-ops><code>-stage-sparse-ops</code>&nbsp;<a class=headline-hash href=#-stage-sparse-ops>¶</a></h3><p><em>Decompose a complex sparse operation into multiple stages</em></p><p>A pass that decomposes a complex sparse operation into multiple stages. E.g., a direct CSR -&gt; CSC conversion is staged into CSR -&gt; COO (unordered) -&gt; sort -&gt; CSC.</p>
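<p>For example, the staging rewrites a single conversion such as the following (a sketch; the intermediate COO and sort operations introduced by the pass are elided):</p><pre tabindex=0><code class=language-mlir>// A direct format-to-format conversion ...
%csc = sparse_tensor.convert %csr
     : tensor&lt;?x?xf64, #CSR&gt; to tensor&lt;?x?xf64, #CSC&gt;
// ... is decomposed into a conversion to unordered COO, an explicit
// sort into the target level order, and a final conversion to CSC.
</code></pre>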
<h2 id=spv-dialect-passes>&lsquo;spv&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#spv-dialect-passes>¶</a></h2><h3 id=-decorate-spirv-composite-type-layout><code>-decorate-spirv-composite-type-layout</code>&nbsp;<a class=headline-hash href=#-decorate-spirv-composite-type-layout>¶</a></h3><p><em>Decorate SPIR-V composite type with layout info</em></p><p>Module pass that converts composite types used by objects in the StorageBuffer, PhysicalStorageBuffer, Uniform, and PushConstant storage classes to attach layout information. Right now this pass only supports Vulkan layout rules.</p><h3 id=-spirv-canonicalize-gl><code>-spirv-canonicalize-gl</code>&nbsp;<a class=headline-hash href=#-spirv-canonicalize-gl>¶</a></h3><p><em>Canonicalize GLSL ops</em></p><p>Pass to run canonicalization patterns that involve GL ops. These patterns cannot be run as part of default canonicalization because GL ops aren&rsquo;t always available, so they should be invoked specifically when needed.</p><h3 id=-spirv-lower-abi-attrs><code>-spirv-lower-abi-attrs</code>&nbsp;<a class=headline-hash href=#-spirv-lower-abi-attrs>¶</a></h3><p><em>Lower ABI attributes specified during SPIR-V lowering</em></p><p>Operation pass that lowers the ABI attributes specified during SPIR-V lowering. Specifically:</p><ol><li>Creates the global variables for arguments of entry point function using the specification in the <code>spirv.interface_var_abi</code> attribute for each argument.</li><li>Inserts the EntryPointOp and the ExecutionModeOp for entry point functions using the specification in the <code>spirv.entry_point_abi</code> attribute.</li></ol><h3 id=-spirv-rewrite-inserts><code>-spirv-rewrite-inserts</code>&nbsp;<a class=headline-hash href=#-spirv-rewrite-inserts>¶</a></h3><p><em>Rewrite sequential chains of <code>spirv.CompositeInsert</code> operations into <code>spirv.CompositeConstruct</code> operations</em></p><h3 id=-spirv-unify-aliased-resource><code>-spirv-unify-aliased-resource</code>&nbsp;<a class=headline-hash href=#-spirv-unify-aliased-resource>¶</a></h3><p><em>Unify access of multiple aliased resources into access of one single resource</em></p><h3 id=-spirv-update-vce><code>-spirv-update-vce</code>&nbsp;<a class=headline-hash href=#-spirv-update-vce>¶</a></h3><p><em>Deduce and attach minimal (version, capabilities, extensions) requirements to spirv.module ops</em></p><p>Operation pass that deduces and attaches the minimal version/capabilities/extensions requirements for spirv.module ops. For each spirv.module op, this pass requires a <code>spirv.target_env</code> attribute on it or an enclosing module-like op to drive the deduction. The reason is that an op can be enabled by multiple extensions/capabilities, so we need to know which one to pick. <code>spirv.target_env</code> gives the hard limit on what the target environment can support; this pass deduces what is actually needed for a specific spirv.module op.</p><h3 id=-spirv-webgpu-prepare><code>-spirv-webgpu-prepare</code>&nbsp;<a class=headline-hash href=#-spirv-webgpu-prepare>¶</a></h3><p><em>Prepare SPIR-V to target WebGPU by expanding unsupported ops and replacing them with supported ones</em></p><h2 id=tensor-dialect-passes>&lsquo;tensor&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#tensor-dialect-passes>¶</a></h2><h3 id=-fold-tensor-subset-ops><code>-fold-tensor-subset-ops</code>&nbsp;<a class=headline-hash href=#-fold-tensor-subset-ops>¶</a></h3><p><em>Fold tensor subset ops into producer/consumer ops</em></p><p>The pass folds tensor subset ops into producer/consumer ops.</p><p>At the moment, the following foldings occur when possible (see the sketch below):</p><ul><li><code>tensor.extract_slice</code> into <code>vector.transfer_read</code></li><li><code>vector.transfer_write</code> into <code>tensor.insert_slice</code></li></ul>
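<p>A sketch of the first folding, assuming a 1-D tensor (all names illustrative):</p><pre tabindex=0><code class=language-mlir>// Before: the read goes through an extracted slice.
%s = tensor.extract_slice %t[%off] [16] [1] : tensor&lt;64xf32&gt; to tensor&lt;16xf32&gt;
%v = vector.transfer_read %s[%c0], %pad : tensor&lt;16xf32&gt;, vector&lt;16xf32&gt;

// After: the slice offset is folded into a read of the original tensor.
%v = vector.transfer_read %t[%off], %pad : tensor&lt;64xf32&gt;, vector&lt;16xf32&gt;
</code></pre>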
<h2 id=transform-dialect-passes>&lsquo;transform&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#transform-dialect-passes>¶</a></h2><h3 id=-transform-dialect-check-uses><code>-transform-dialect-check-uses</code>&nbsp;<a class=headline-hash href=#-transform-dialect-check-uses>¶</a></h3><p><em>Warn about potential use-after-free in the transform dialect</em></p><p>This pass analyzes operations from the transform dialect and its extensions and warns if a transform IR value may be used by an operation after it was &ldquo;freed&rdquo; by some other operation, as described by side effects on the <code>TransformMappingResource</code>. This statically detects situations that lead to errors when interpreting the Transform IR.</p><p>The pass is capable of handling branching control flow and reports all <em>potential</em> use-after-free situations, e.g., a may-use-after-free is reported if at least one of the control flow paths between the definition of a value and its use contains an operation with a &ldquo;free&rdquo; effect on the <code>TransformMappingResource</code>. It does not currently perform an SCCP-style data flow analysis to prove that some branches are not taken; however, SCCP and other control flow simplifications can be performed on the transform IR prior to this pass, provided that transform ops implement the relevant control flow interfaces.</p><h3 id=-transform-infer-effects><code>-transform-infer-effects</code>&nbsp;<a class=headline-hash href=#-transform-infer-effects>¶</a></h3><p><em>Infer transform side effects for symbols</em></p><p>This pass analyzes the definitions of transform dialect callable symbol operations, such as <code>transform.named_sequence</code>, and annotates the symbol arguments with attributes indicating the side effects that the nested operations have on them.</p><h3 id=-transform-interpreter><code>-transform-interpreter</code>&nbsp;<a class=headline-hash href=#-transform-interpreter>¶</a></h3><p><em>Transform dialect interpreter</em></p><p>This pass runs the transform dialect interpreter and applies the named sequence transformation specified by the provided name (defaults to <code>TransformDialect::kTransformEntryPointSymbolName</code>, i.e. <code>__transform_main</code>).</p><p>Additional options can be used to narrow down the pass applicability for debugging purposes:</p><ul><li><code>debugPayloadRootTag</code> makes the transform script apply to the payload operation that has a <code>transform.target_tag</code> string attribute with the given value, rather than to the anchor operation of the pass.</li><li><code>debugBindTrailingArgs</code> allows one to bind values to trailing arguments of the transform entry point as follows:<ul><li>arguments of <code>TransformHandleTypeInterface</code> type can be bound to all payload operations with the name provided as a simple string;</li><li>arguments of <code>TransformValueHandleTypeInterface</code> type can be bound to a flattened list of results of all operations with the name provided as a string prefixed with <code>^</code>;</li><li>arguments of <code>TransformParamTypeInterface</code> type can be bound to integer constants provided as a <code>;</code>-separated list prefixed with <code>#</code>.</li></ul></li><li><code>entryPoint</code> specifies the name of the transform symbol to serve as the entry point.</li></ul><h4 id=options-93>Options&nbsp;<a class=headline-hash href=#options-93>¶</a></h4><pre tabindex=0><code>-debug-payload-root-tag : Select the operation with 'transform.target_tag' attribute having the given value as payload IR root. If empty, select the pass anchor operation as the payload IR root.
-debug-bind-trailing-args : Binds trailing arguments of the entry point to the payload operations with specified names.
-disable-expensive-checks : Disable expensive checks in the interpreter for a faster run.
-entry-point : Entry point of the pass pipeline.
</code></pre>
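<p>A minimal transform script that this pass can interpret, using the default entry point name (the body merely yields, so it performs no transformation):</p><pre tabindex=0><code class=language-mlir>module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
      %root: !transform.any_op {transform.readonly}) {
    // Transformations on handles derived from %root would go here.
    transform.yield
  }
}
</code></pre>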
<h3 id=-transform-preload-library><code>-transform-preload-library</code>&nbsp;<a class=headline-hash href=#-transform-preload-library>¶</a></h3><p><em>Preload transform dialect library</em></p><p>This pass preloads a transform library and makes it available to subsequent transform interpreter passes. The preloading occurs into the Transform dialect itself and thus provides only limited functionality that does not scale.</p><p>Warning: Only a single such pass should exist for a given MLIR context. This is a temporary solution until a resource-based solution is available.</p><h4 id=options-94>Options&nbsp;<a class=headline-hash href=#options-94>¶</a></h4><pre tabindex=0><code>-transform-library-paths : Optional paths to files with modules that should be merged into the transform module to provide the definitions of external named sequences.
</code></pre><h2 id=vector-dialect-passes>&lsquo;vector&rsquo; Dialect Passes&nbsp;<a class=headline-hash href=#vector-dialect-passes>¶</a></h2><h3 id=-lower-vector-mask><code>-lower-vector-mask</code>&nbsp;<a class=headline-hash href=#-lower-vector-mask>¶</a></h3><p><em>Lower &lsquo;vector.mask&rsquo; operations</em></p><h3 id=-lower-vector-multi-reduction><code>-lower-vector-multi-reduction</code>&nbsp;<a class=headline-hash href=#-lower-vector-multi-reduction>¶</a></h3><p><em>Lower &lsquo;vector.multi_reduction&rsquo; operations</em></p><h4 id=options-95>Options&nbsp;<a class=headline-hash href=#options-95>¶</a></h4><pre tabindex=0><code>-lowering-strategy : Select the strategy to control how multi_reduction is lowered.
</code></pre>
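<p>For reference, a typical operation handled by this pass; it reduces along dimension 1 of a 2-D vector and is lowered to simpler vector operations according to the selected strategy:</p><pre tabindex=0><code class=language-mlir>%0 = vector.multi_reduction &lt;add&gt;, %v, %acc [1]
   : vector&lt;4x8xf32&gt; to vector&lt;4xf32&gt;
</code></pre>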
<h2 id=tosa-dialect-passes>TOSA Dialect Passes&nbsp;<a class=headline-hash href=#tosa-dialect-passes>¶</a></h2><h3 id=-tosa-infer-shapes><code>-tosa-infer-shapes</code>&nbsp;<a class=headline-hash href=#-tosa-infer-shapes>¶</a></h3><p><em>Propagate shapes across TOSA operations</em></p><p>Pass that uses operand types and propagates shapes to TOSA operations. This includes legalizing rankless and dynamic shapes towards static ones.</p><h3 id=-tosa-layerwise-constant-fold><code>-tosa-layerwise-constant-fold</code>&nbsp;<a class=headline-hash href=#-tosa-layerwise-constant-fold>¶</a></h3><p><em>Fold layerwise operations on constant tensors</em></p><p>Pass that enables folding of full-layer operations on constant tensors.</p><h4 id=options-96>Options&nbsp;<a class=headline-hash href=#options-96>¶</a></h4><pre tabindex=0><code>-aggressive-reduce-constant : Always perform the reduce-constant optimization. May add more tosa.const ops but reduces runtime calculations.
</code></pre><h3 id=-tosa-make-broadcastable><code>-tosa-make-broadcastable</code>&nbsp;<a class=headline-hash href=#-tosa-make-broadcastable>¶</a></h3><p><em>TOSA rank reshape to enable broadcasting</em></p><p>Pass that enables broadcast by making all input arrays have the same number of dimensions. It inserts RESHAPE operations to prepend dimensions of size one until the number of dimensions is equal, implementing an approach similar to step 1 of NumPy 4-step broadcasting: <a href=https://numpy.org/doc/stable/reference/ufuncs.html#broadcasting>https://numpy.org/doc/stable/reference/ufuncs.html#broadcasting</a></p>
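<p>A sketch of the rewrite (the attribute-based <code>tosa.reshape</code> form shown here varies across TOSA versions; names are illustrative): a rank-1 operand of an elementwise <code>tosa.add</code> is reshaped to rank 2 so both inputs broadcast:</p><pre tabindex=0><code class=language-mlir>%b2 = tosa.reshape %b {new_shape = array&lt;i64: 1, 5&gt;}
    : (tensor&lt;5xf32&gt;) -&gt; tensor&lt;1x5xf32&gt;
%0  = tosa.add %a, %b2
    : (tensor&lt;4x5xf32&gt;, tensor&lt;1x5xf32&gt;) -&gt; tensor&lt;4x5xf32&gt;
</code></pre>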
<h3 id=-tosa-optional-decompositions><code>-tosa-optional-decompositions</code>&nbsp;<a class=headline-hash href=#-tosa-optional-decompositions>¶</a></h3><p><em>Applies optional TOSA operation decompositions</em></p><p>Pass to apply the TOSA operation decompositions exposed as populate functions in include/mlir/Dialect/Tosa/Transforms/Passes.h</p><h3 id=-tosa-reduce-transposes><code>-tosa-reduce-transposes</code>&nbsp;<a class=headline-hash href=#-tosa-reduce-transposes>¶</a></h3><p><em>Reduce transposes through other operators</em></p><p>Pass that identifies and reduces tosa.TRANSPOSE operations through chains of operators.</p><p>The pass traverses dependencies of tosa.TRANSPOSE operations until they terminate in either a tosa.RESHAPE that we can fold the hoisted tosa.TRANSPOSE into, a tosa.TRANSPOSE that forms the identity with the hoisted one, or a tosa.CONST with a dense elements attribute. It then propagates the hoisted transform upward through the intervening operators if the support is implemented. Finally, it checks that no duplication of either the chain that was hoisted through or the resulting new chain will occur, and if so, it replaces the hoisted tosa.TRANSPOSE.</p><p>The pass has an important use-case in cleaning up the results of frameworks that introduce a lot of data-layout transformations when legalizing to TOSA, a common one being transformations between NHWC and NCHW layouts.</p><h3 id=-tosa-validate><code>-tosa-validate</code>&nbsp;<a class=headline-hash href=#-tosa-validate>¶</a></h3><p><em>Validates TOSA dialect</em></p><p>This pass validates whether input TOSA operations match the specification for given criteria, e.g., a TOSA profile.</p><h4 id=options-97>Options&nbsp;<a class=headline-hash href=#options-97>¶</a></h4><pre tabindex=0><code>-profile : Validate if operations match for the given profile set
-extension : Validate if operations match for the given extension set
-strict-op-spec-alignment : Verify if the properties of certain operations align with the spec requirement
-allow-invalid-op-datatype-combinations : Disable checks for operations that are determined to be invalid due to their operand/result datatypes not aligning with the 'Supported Data Types' sections of the specification
-level : Validate if operator parameters are within the specification for the given level
</code></pre><h2 id=xegpu-dialect-passes>XeGPU Dialect Passes&nbsp;<a class=headline-hash href=#xegpu-dialect-passes>¶</a></h2><h3 id=-xegpu-fold-alias-ops><code>-xegpu-fold-alias-ops</code>&nbsp;<a class=headline-hash href=#-xegpu-fold-alias-ops>¶</a></h3><p><em>Fold alias ops into XeGPU ops</em></p><p>The pass folds aliasing ops into the XeGPU ops that consume them, so that the XeGPU ops operate on the original source references.</p><h3 id=-xegpu-subgroup-distribute><code>-xegpu-subgroup-distribute</code>&nbsp;<a class=headline-hash href=#-xegpu-subgroup-distribute>¶</a></h3><p><em>Distribute XeGPU ops to work items</em></p><p>The pass distributes subgroup level (SIMD) XeGPU ops to work items.</p><h4 id=options-98>Options&nbsp;<a class=headline-hash href=#options-98>¶</a></h4><pre tabindex=0><code>-print-analysis-only : Print the result of the subgroup map propagation analysis and exit.
</code></pre></main></div></body></html>