<!doctype html><html lang=en-us><head><meta charset=utf-8><meta http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=no"><title>'xegpu' Dialect - MLIR</title><meta name=description content="Multi-Level IR Compiler Framework"><meta name=generator content="Hugo 0.119.0"><link href=https://mlir.llvm.org/index.xml rel=alternate type=application/rss+xml><link rel=canonical href=https://mlir.llvm.org/docs/Dialects/XeGPU/><link rel=stylesheet href=https://mlir.llvm.org/css/theme.css><script src=https://use.fontawesome.com/releases/v5.0.6/js/all.js></script> <link rel=stylesheet href=https://mlir.llvm.org/css/chroma.min.css><script src=https://cdn.jsdelivr.net/npm/jquery@3.3.1/dist/jquery.min.js></script> <script src=https://cdn.jsdelivr.net/npm/jquery.easing@1.4.1/jquery.easing.min.js></script> <script src=https://mlir.llvm.org/js/bundle.js></script> <script type=text/javascript src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script> <script type=text/x-mathjax-config> MathJax.Hub.Config({ tex2jax: { inlineMath: [['$', '$'] ], displayMath: [ ['$$','$$'], ["\\[","\\]"] ] } }); </script><link rel=apple-touch-icon sizes=180x180 href="/apple-touch-icon.png?v=1"><link rel=icon type=image/png sizes=32x32 href="/favicon-32x32.png?v=1"><link rel=icon type=image/png sizes=16x16 href="/favicon-16x16.png?v=1"><link rel=manifest href="/site.webmanifest?v=1"><link rel=mask-icon href="/safari-pinned-tab.svg?v=1" color=#3775e0><link rel="shortcut icon" href="/favicon.ico?v=1"><meta name=msapplication-TileColor content="#2d89ef"><meta name=theme-color content="#ffffff"><link rel=icon href=/favicon.svg type=image/svg+xml sizes=any><style>:root{}</style></head><body><div class=container><header><h1><div><img src=https://mlir.llvm.org//mlir-logo.png width=40px align=absmiddle> MLIR</div></h1><p class=description>Multi-Level IR Compiler 
Framework</p></header><div class=global-menu><nav><ul><li class=parent><a href>Community<i class="fas fa-angle-right"></i></a><ul class=sub-menu><li class=child><a href=https://llvm.discourse.group/c/mlir/31>Forums</a></li><li class=child><a href=https://discord.gg/xS7Z362>Chat</a></li></ul></li><li><a href=/getting_started/Debugging/>Debugging Tips</a></li><li><a href=/getting_started/Faq/>FAQ</a></li><li class=parent><a href=https://github.com/llvm/llvm-project/tree/main/mlir>Source<i class="fas fa-angle-right"></i></a><ul class=sub-menu><li class=child><a href=/doxygen/>Doxygen</a></li><li class=child><a href=https://github.com/llvm/llvm-project/tree/main/mlir>GitHub</a></li></ul></li><li><a href="https://bugs.llvm.org/buglist.cgi?bug_status=__open__&amp;list_id=177877&amp;order=changeddate%20DESC%2Cpriority%2Cbug_severity&amp;product=MLIR&amp;query_format=specific">Bugs</a></li><li><a href=https://github.com/llvm/mlir-www/tree/main/website/static/LogoAssets>Logo Assets</a></li><li><a href=https://www.youtube.com/MLIRCompiler>Youtube Channel</a></li></ul></nav></div><div class=content-container><main><h1>'xegpu' Dialect</h1><p><em>The XeGPU dialect that models Intel GPU&rsquo;s ISA</em></p><p>The XeGPU dialect models Intel Xe ISA semantics but works at vector and TensorDesc data type. It provides 1:1 mappings to match Xe instructions like DPAS and 2D block load. 
The matrix size being processed at this level exactly matches the hardware instructions or the intrinsic supported by the lower-level GPU compiler.</p><p><nav id=TableOfContents><ul><li><a href=#operations>Operations</a><ul><li><a href=#xegpualloc_nbarrier-xegpuallocnbarrierop><code>xegpu.alloc_nbarrier</code> (xegpu::AllocNbarrierOp)</a></li><li><a href=#xegpuatomic_rmw-xegpuatomicrmwop><code>xegpu.atomic_rmw</code> (xegpu::AtomicRMWOp)</a></li><li><a href=#xegpucreate_nd_tdesc-xegpucreatenddescop><code>xegpu.create_nd_tdesc</code> (xegpu::CreateNdDescOp)</a></li><li><a href=#xegpucreate_tdesc-xegpucreatedescop><code>xegpu.create_tdesc</code> (xegpu::CreateDescOp)</a></li><li><a href=#xegpudpas-xegpudpasop><code>xegpu.dpas</code> (xegpu::DpasOp)</a></li><li><a href=#xegpufence-xegpufenceop><code>xegpu.fence</code> (xegpu::FenceOp)</a></li><li><a href=#xegpuinit_nbarrier-xegpuinitnbarrierop><code>xegpu.init_nbarrier</code> (xegpu::InitNbarrierOp)</a></li><li><a href=#xegpuload-xegpuloadgatherop><code>xegpu.load</code> (xegpu::LoadGatherOp)</a></li><li><a href=#xegpuload_nd-xegpuloadndop><code>xegpu.load_nd</code> (xegpu::LoadNdOp)</a></li><li><a href=#xegpunbarrier_arrive-xegpunbarrierarriveop><code>xegpu.nbarrier_arrive</code> (xegpu::NbarrierArriveOp)</a></li><li><a href=#xegpunbarrier_wait-xegpunbarrierwaitop><code>xegpu.nbarrier_wait</code> (xegpu::NbarrierWaitOp)</a></li><li><a href=#xegpuprefetch-xegpuprefetchop><code>xegpu.prefetch</code> (xegpu::PrefetchOp)</a></li><li><a href=#xegpuprefetch_nd-xegpuprefetchndop><code>xegpu.prefetch_nd</code> (xegpu::PrefetchNdOp)</a></li><li><a href=#xegpustore-xegpustorescatterop><code>xegpu.store</code> (xegpu::StoreScatterOp)</a></li><li><a href=#xegpustore_nd-xegpustorendop><code>xegpu.store_nd</code> (xegpu::StoreNdOp)</a></li><li><a href=#xegpuupdate_nd_offset-xegpuupdatendoffsetop><code>xegpu.update_nd_offset</code> (xegpu::UpdateNdOffsetOp)</a></li><li><a 
href=#xegpuupdate_offset-xegpuupdateoffsetop><code>xegpu.update_offset</code> (xegpu::UpdateOffsetOp)</a></li></ul></li><li><a href=#attributes-11>Attributes</a><ul><li><a href=#blocktensordescattr>BlockTensorDescAttr</a></li><li><a href=#cachepolicyattr>CachePolicyAttr</a></li><li><a href=#fencescopeattr>FenceScopeAttr</a></li><li><a href=#memoryspaceattr>MemorySpaceAttr</a></li><li><a href=#sgmapattr>SGMapAttr</a></li><li><a href=#scattertensordescattr>ScatterTensorDescAttr</a></li></ul></li><li><a href=#types>Types</a><ul><li><a href=#nbarriertype>NbarrierType</a></li><li><a href=#tensordesctype>TensorDescType</a></li></ul></li><li><a href=#enums>Enums</a><ul><li><a href=#cmpfpredicate>CmpFPredicate</a></li><li><a href=#cmpipredicate>CmpIPredicate</a></li><li><a href=#integeroverflowflags>IntegerOverflowFlags</a></li><li><a href=#roundingmode>RoundingMode</a></li><li><a href=#atomicrmwkind>AtomicRMWKind</a></li><li><a href=#fastmathflags>FastMathFlags</a></li><li><a href=#cachepolicy>CachePolicy</a></li><li><a href=#fencescope>FenceScope</a></li><li><a href=#memoryspace>MemorySpace</a></li></ul></li></ul></nav><h2 id=operations>Operations&nbsp;<a class=headline-hash href=#operations>¶</a></h2><p><a href=https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/XeGPU/IR/XeGPU.td>source</a></p><h3 id=xegpualloc_nbarrier-xegpuallocnbarrierop><code>xegpu.alloc_nbarrier</code> (xegpu::AllocNbarrierOp)&nbsp;<a class=headline-hash href=#xegpualloc_nbarrier-xegpuallocnbarrierop>¶</a></h3><p><em>It allocates a set of named barriers.</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.alloc_nbarrier` $nbarrier_num attr-dict </code></pre><p>AllocNbarrier is to create a set of named barriers as specified by <code>nbarrier_num</code>. Named barriers are workgroup level resources, and are shared by all threads in the workgroup. For example, there are up to 32 barriers (range 0-31) for each XeCore on PVC. 
A typical use case is that a workgroup is partitioned into N subgroups of threads (N &lt;= 32), and each subgroup coordinates its work using a separate barrier, with barrier ids ranging from 0 to N-1.</p><h4 id=attributes>Attributes:&nbsp;<a class=headline-hash href=#attributes>¶</a></h4><table><tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr><tr><td><code>nbarrier_num</code></td><td>::mlir::IntegerAttr</td><td>64-bit signless integer attribute</td></tr></table><h3 id=xegpuatomic_rmw-xegpuatomicrmwop><code>xegpu.atomic_rmw</code> (xegpu::AtomicRMWOp)&nbsp;<a class=headline-hash href=#xegpuatomic_rmw-xegpuatomicrmwop>¶</a></h3><p><em>Atomic read-modify-write operation on the TensorDesc.</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.atomic_rmw` $kind $tensorDesc `,` $mask `,` $value attr-dict `:` qualified(type($tensorDesc)) `,` type($mask) `,` type($value) `-&gt;` type($result) </code></pre><p>The <code>xegpu.atomic_rmw</code> operation provides a way to perform a read-modify-write operation on the region described by the <code>TensorDesc</code> free from data races. The <code>kind</code> enumeration specifies the modification to be performed. The <code>mask</code> operand has the same shape as <code>TensorDesc</code>, and is used to enable or disable specific data points of the <code>TensorDesc</code>. 
The <code>value</code> operand represents the new value to be applied during the modification.</p><p>Traits: <code>AlwaysSpeculatableImplTrait</code></p><p>Interfaces: <code>ConditionallySpeculatable</code>, <code>NoMemoryEffect (MemoryEffectOpInterface)</code></p><p>Effects: <code>MemoryEffects::Effect{}</code></p><h4 id=attributes-1>Attributes:&nbsp;<a class=headline-hash href=#attributes-1>¶</a></h4><table><tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr><tr><td><code>kind</code></td><td>::mlir::arith::AtomicRMWKindAttr</td><td><details><summary>allowed 64-bit signless integer cases: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14</summary><p>Enum cases:</p><ul><li>addf (<code>addf</code>)</li><li>addi (<code>addi</code>)</li><li>assign (<code>assign</code>)</li><li>maximumf (<code>maximumf</code>)</li><li>maxs (<code>maxs</code>)</li><li>maxu (<code>maxu</code>)</li><li>minimumf (<code>minimumf</code>)</li><li>mins (<code>mins</code>)</li><li>minu (<code>minu</code>)</li><li>mulf (<code>mulf</code>)</li><li>muli (<code>muli</code>)</li><li>ori (<code>ori</code>)</li><li>andi (<code>andi</code>)</li><li>maxnumf (<code>maxnumf</code>)</li><li>minnumf (<code>minnumf</code>)</li></ul></details></td></tr></table><h4 id=operands>Operands:&nbsp;<a class=headline-hash href=#operands>¶</a></h4><table><thead><tr><th style=text-align:center>Operand</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>tensorDesc</code></td><td>TensorDesc describing regions of interested data.</td></tr><tr><td style=text-align:center><code>mask</code></td><td>vector of 1-bit signless integer values of ranks 1 or 1-bit signless integer</td></tr><tr><td style=text-align:center><code>value</code></td><td>vector of 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit 
signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type values of ranks 1/2/3/4 or 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type</td></tr></tbody></table><h4 id=results>Results:&nbsp;<a class=headline-hash href=#results>¶</a></h4><table><thead><tr><th style=text-align:center>Result</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>result</code></td><td>vector of 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type values of ranks 1/2/3/4 or 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type</td></tr></tbody></table><h3 
id=xegpucreate_nd_tdesc-xegpucreatenddescop><code>xegpu.create_nd_tdesc</code> (xegpu::CreateNdDescOp)&nbsp;<a class=headline-hash href=#xegpucreate_nd_tdesc-xegpucreatenddescop>¶</a></h3><p><em>Create nd-tensor descriptor operation</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.create_nd_tdesc` $source `` custom&lt;DynamicIndexList&gt;($offsets, $const_offsets) (`,` custom&lt;DynamicIndexList&gt;($shape, $const_shape)^ `,` custom&lt;DynamicIndexList&gt;($strides, $const_strides))? attr-dict `:` type($source) `-&gt;` qualified(type($TensorDesc)) </code></pre><p>The &ldquo;create_nd_tdesc&rdquo; operation creates a TensorDescType which represents a sub-view of a 1D/2D memory region inside the one or two innermost dimensions of the source. (It can be extended to support n-D memory regions if needed in the future.) Elements in the subview are contiguous in each dimension. It encodes the following important information for supporting Intel hardware features:</p><ul><li><p>source: an object representing (the starting address/pointer of) a memory region. It can be either a memref object, or simply a pointer represented by the uint64_t type. For the case of dynamic memrefs or a pointer, the shape and layout information of the memory region should be explicitly passed via the <code>shape</code> and <code>strides</code> parameters.</p></li><li><p>offsets: index values representing the offsets from the &ldquo;source&rdquo; in each dimension at which the subview of the target memory will be created. It is encoded via &ldquo;offsets&rdquo; and &ldquo;const_offsets&rdquo;, so that it can accept various forms, such as operands (e.g., [%c0, %c]) and attributes (e.g., [2, 4]).</p></li><li><p>shape: the shape information of the memory region pointed to by the &ldquo;source&rdquo;. It is typically encoded via the MemRefType of the source, e.g., memref&lt;4096x4096xf16>. 
But if &ldquo;source&rdquo; is simply a pointer represented as uint64_t type, or a memref type without shape information e.g., memref&lt;?x?xf16>, the shape information has to be explicitly passed via the &ldquo;shape&rdquo; and &ldquo;const_shape&rdquo; arguments.</p></li><li><p>strides: the strides of the memory region pointed by the &ldquo;source&rdquo;. Similar to shape, it is typically encoded via the MemRefType of the source too. But if &ldquo;source&rdquo; is simply a pointer represented as uint64_t type, or a memref type without shape information e.g., memref&lt;?x?xf16>, the strides information has to be explicitly passed via the &ldquo;strides&rdquo; and &ldquo;const_strides&rdquo; argument.</p></li></ul><p>Example 1 (suppose the tensor shape inferred by the compiler is 8x16):</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>%0</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>1024x1024x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%c0</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>0</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl><span class=nv>%c1</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>1</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl><span class=nv>%1</span> <span class=p>=</span> xegpu<span class=p>.</span>create_nd_tdesc <span class=nv>%0</span><span class=p>[</span><span class=nv>%c0</span><span class=p>,</span> <span class=nv>%c0</span><span class=p>]:</span> <span class=kt>memref</span><span class=p>&lt;</span><span 
class=m>1024x1024x</span><span class=k>f32</span><span class=p>&gt;</span> <span class=p>-&gt;</span> TensorDesc<span class=p>&lt;</span><span class=m>8x16x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span></code></pre></div><p>Example 2 (suppose the tensor shape inferred by the compiler is 8x16):</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>%0</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>(</span><span class=nv>%h</span><span class=p>,</span> <span class=nv>%w</span><span class=p>)</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>?x?x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%c0</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>0</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl><span class=nv>%c1</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>1</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl><span class=nv>%1</span> <span class=p>=</span> xegpu<span class=p>.</span>create_nd_tdesc <span class=nv>%0</span><span class=p>[</span><span class=nv>%c0</span><span class=p>,</span> <span class=nv>%c0</span><span class=p>],</span> <span class=p>[</span><span class=nv>%h</span><span class=p>,</span> <span class=nv>%w</span><span class=p>],</span> <span class=p>[</span><span class=nv>%w</span><span class=p>,</span> <span class=nv>%c1</span><span class=p>]:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>?x?x</span><span class=k>f32</span><span class=p>&gt;</span> <span class=p>-&gt;</span> TensorDesc<span class=p>&lt;</span><span 
class=m>8x16x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span></code></pre></div><p>Example 3 (suppose the tensor shape inferred by the compiler is 8x16):</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>%0</span> <span class=p>=</span> <span class=p>...</span> <span class=p>:</span> ui64 </span></span><span class=line><span class=cl><span class=nv>%c0</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>0</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl><span class=nv>%c1</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> <span class=m>1</span> <span class=p>:</span> <span class=k>index</span> </span></span><span class=line><span class=cl><span class=nv>%1</span> <span class=p>=</span> xegpu<span class=p>.</span>create_nd_tdesc <span class=nv>%0</span><span class=p>[</span><span class=nv>%c0</span><span class=p>,</span> <span class=nv>%c0</span><span class=p>],</span> <span class=p>[</span><span class=nv>%h</span><span class=p>,</span> <span class=nv>%w</span><span class=p>],</span> <span class=p>[</span><span class=nv>%w</span><span class=p>,</span> <span class=nv>%c1</span><span class=p>]:</span> ui64 <span class=p>-&gt;</span> TensorDesc<span class=p>&lt;</span><span class=m>8x16x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span></code></pre></div><p>Traits: <code>AlwaysSpeculatableImplTrait</code>, <code>AttrSizedOperandSegments</code></p><p>Interfaces: <code>ConditionallySpeculatable</code>, <code>NoMemoryEffect (MemoryEffectOpInterface)</code>, <code>OffsetSizeAndStrideOpInterface</code>, <code>ViewLikeOpInterface</code></p><p>Effects: <code>MemoryEffects::Effect{}</code></p><h4 id=attributes-2>Attributes:&nbsp;<a class=headline-hash 
href=#attributes-2>¶</a></h4><table><tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr><tr><td><code>const_offsets</code></td><td>::mlir::DenseI64ArrayAttr</td><td>i64 dense array attribute</td></tr><tr><td><code>const_shape</code></td><td>::mlir::DenseI64ArrayAttr</td><td>i64 dense array attribute</td></tr><tr><td><code>const_strides</code></td><td>::mlir::DenseI64ArrayAttr</td><td>i64 dense array attribute</td></tr></table><h4 id=operands-1>Operands:&nbsp;<a class=headline-hash href=#operands-1>¶</a></h4><table><thead><tr><th style=text-align:center>Operand</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>source</code></td><td>non-0-ranked.memref of 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type values or 64-bit unsigned integer or 32-bit unsigned integer or 64-bit signless integer or 32-bit signless integer</td></tr><tr><td style=text-align:center><code>offsets</code></td><td>variadic of index</td></tr><tr><td style=text-align:center><code>shape</code></td><td>variadic of index</td></tr><tr><td style=text-align:center><code>strides</code></td><td>variadic of index</td></tr></tbody></table><h4 id=results-1>Results:&nbsp;<a class=headline-hash href=#results-1>¶</a></h4><table><thead><tr><th style=text-align:center>Result</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>TensorDesc</code></td><td>TensorDesc describing regions of interested data.</td></tr></tbody></table><h3 id=xegpucreate_tdesc-xegpucreatedescop><code>xegpu.create_tdesc</code> (xegpu::CreateDescOp)&nbsp;<a 
class=headline-hash href=#xegpucreate_tdesc-xegpucreatedescop>¶</a></h3><p><em>Create scattered tensor descriptors (TensorDesc).</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.create_tdesc` $source `,` $offsets attr-dict `:` type($source) `,` type($offsets) `-&gt;` qualified(type($TensorDesc)) </code></pre><p>&ldquo;create_tdesc&rdquo; is similar to &ldquo;create_nd_tdesc&rdquo; in that it also creates a Tensor Descriptor (TensorDescType) for a memory region. While &ldquo;create_nd_tdesc&rdquo; is for creating contiguous subviews, &ldquo;create_tdesc&rdquo; is for creating non-contiguous (scattered) subviews, allowing each work-item in a subgroup to specify its own offset. It accepts the following parameters:</p><ul><li>source: a 1D memref or pointer (uint64_t) representing the flattened memory object.</li><li>offsets: a vector containing the offset of each access point. Its size is fixed to the hardware-supported subgroup size, e.g., 16 on PVC, implying that each element in the vector corresponds to a work-item (SIMT lane) in the subgroup.</li></ul><p>The first dimension of the result TensorDesc corresponds to work-items, so it should match the dimension of offsets. It may also have a second dimension corresponding to the chunk_size if the chunk size is larger than 1.</p><p>Example 1. 
It assumes the subgroup size is 4, and accesses a[0], a[16], a[32], a[64]</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>%a</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>1024x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%0</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> dense<span class=p>&lt;[</span><span class=m>0</span><span class=p>,</span> <span class=m>16</span><span class=p>,</span> <span class=m>32</span><span class=p>,</span> <span class=m>64</span><span class=p>]&gt;</span> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>4x</span><span class=k>index</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%1</span> <span class=p>=</span> xegpu<span class=p>.</span>create_tdesc <span class=nv>%a</span><span class=p>,</span> <span class=nv>%0</span><span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>1024x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>4x</span><span class=k>index</span><span class=p>&gt;</span> <span class=p>-&gt;</span> TensorDesc<span class=p>&lt;</span><span class=m>4x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span></code></pre></div><p>Example 2. It assumes the subgroup size is 4, and each work-item accesses 8 elements. 
It will access a total of 32 data elements: a[0:7], a[16:23], a[32:39], a[64:71]</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>%0</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>1024x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%off</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> dense<span class=p>&lt;[</span><span class=m>0</span><span class=p>,</span> <span class=m>16</span><span class=p>,</span> <span class=m>32</span><span class=p>,</span> <span class=m>64</span><span class=p>]&gt;</span> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>4x</span><span class=k>index</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%1</span> <span class=p>=</span> xegpu<span class=p>.</span>create_tdesc <span class=nv>%0</span><span class=p>,</span> <span class=nv>%off</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>1024x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>4x</span><span class=k>index</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>-&gt;</span> TensorDesc<span class=p>&lt;</span><span class=m>4x8x</span><span class=k>f32</span><span class=p>,</span> <span class=nv>#xegpu.scattered_tdesc_attr</span><span class=p>&lt;</span><span class=nl>chunk_size =</span> <span class=m>8</span><span class=p>&gt;&gt;</span> </span></span></code></pre></div><p>Example 3. It is similar to Example 2, but there are some overlaps among work-items. 
It accesses: a[0:7], a[4:11], a[8:15], a[12:19]</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=nv>%0</span> <span class=p>=</span> <span class=kt>memref</span><span class=p>.</span>alloc<span class=p>()</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>1024x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%off</span> <span class=p>=</span> arith<span class=p>.</span><span class=kt>constant</span> dense<span class=p>&lt;[</span><span class=m>0</span><span class=p>,</span> <span class=m>4</span><span class=p>,</span> <span class=m>8</span><span class=p>,</span> <span class=m>12</span><span class=p>]&gt;</span> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>4x</span><span class=k>index</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl><span class=nv>%1</span> <span class=p>=</span> xegpu<span class=p>.</span>create_tdesc <span class=nv>%0</span><span class=p>,</span> <span class=nv>%off</span> <span class=p>:</span> <span class=kt>memref</span><span class=p>&lt;</span><span class=m>1024x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>4x</span><span class=k>index</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> <span class=p>-&gt;</span> TensorDesc<span class=p>&lt;</span><span class=m>4x8x</span><span class=k>f32</span><span class=p>,</span> <span class=nv>#xegpu.scattered_tdesc_attr</span><span class=p>&lt;</span><span class=nl>chunk_size =</span> <span class=m>8</span><span class=p>&gt;&gt;</span> </span></span></code></pre></div><p>Traits: <code>AlwaysSpeculatableImplTrait</code></p><p>Interfaces: <code>ConditionallySpeculatable</code>, <code>NoMemoryEffect 
(MemoryEffectOpInterface)</code>, <code>ViewLikeOpInterface</code></p><p>Effects: <code>MemoryEffects::Effect{}</code></p><h4 id=operands-2>Operands:&nbsp;<a class=headline-hash href=#operands-2>¶</a></h4><table><thead><tr><th style=text-align:center>Operand</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>source</code></td><td>non-0-ranked.memref of 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type values or 64-bit unsigned integer or 32-bit unsigned integer or 64-bit signless integer or 32-bit signless integer</td></tr><tr><td style=text-align:center><code>offsets</code></td><td>vector of index values of ranks 1</td></tr></tbody></table><h4 id=results-2>Results:&nbsp;<a class=headline-hash href=#results-2>¶</a></h4><table><thead><tr><th style=text-align:center>Result</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>TensorDesc</code></td><td>TensorDesc describing regions of interested data.</td></tr></tbody></table><h3 id=xegpudpas-xegpudpasop><code>xegpu.dpas</code> (xegpu::DpasOp)&nbsp;<a class=headline-hash href=#xegpudpas-xegpudpasop>¶</a></h3><p><em>It performs mma computation</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.dpas` $lhs `,` $rhs (`,` $acc^)? attr-dict `:` type($lhs)`,` type($rhs) (`,` type($acc)^)? 
`-&gt;` type($result) </code></pre><p>DPAS performs matrix multiplication on a matrix A of <code>mxk</code> size and a matrix B of <code>kxn</code> size, and accumulates with a matrix C of <code>mxn</code> size into a result matrix of the same size, where <code>m=8</code>, <code>n=16</code> and <code>k=8 * 32/bit_width_of_elem_type</code>. So for the fp16 data type, the matrices are <code>A: vector&lt;8x16xf16></code>, <code>B: vector&lt;16x16xf16></code>, and <code>C/D: vector&lt;8x16xf32></code>. Besides the matrix size requirements, DPAS also requires A and B to be loaded with the required data layout. Specifically,</p><pre><code>VNNI layout is required for the B operand. It is achieved by adding the `packed` attribute to the `load_nd` operator. Due to the VNNI transformation, B operands can be represented as a 3D vector, with the last dimension representing the VNNI factor, which is computed as `32/bit_width_of_elem_type`. Thus, `B: vector&lt;16x16xf16&gt;` can be represented as `B: vector&lt;8x16x2xf16&gt;`. Note: on PVC, the hardware can perform the load with VNNI transformation when the data element type is 16-bit or lower precision, taking 2 or 4 elements from the first dimension and inserting them into the newly added innermost dimension. 
</code></pre><p>Traits: <code>AlwaysSpeculatableImplTrait</code></p><p>Interfaces: <code>ConditionallySpeculatable</code>, <code>NoMemoryEffect (MemoryEffectOpInterface)</code></p><p>Effects: <code>MemoryEffects::Effect{}</code></p><h4 id=operands-3>Operands:&nbsp;<a class=headline-hash href=#operands-3>¶</a></h4><table><thead><tr><th style=text-align:center>Operand</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>lhs</code></td><td>vector of 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type values of ranks 2/3</td></tr><tr><td style=text-align:center><code>rhs</code></td><td>vector of 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type values of ranks 2/3</td></tr><tr><td style=text-align:center><code>acc</code></td><td>vector of 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit 
float or 32-bit float or 64-bit float or bfloat16 type or tf32 type values of ranks 2</td></tr></tbody></table><h4 id=results-3>Results:&nbsp;<a class=headline-hash href=#results-3>¶</a></h4><table><thead><tr><th style=text-align:center>Result</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>result</code></td><td>vector of 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type values of ranks 2</td></tr></tbody></table><h3 id=xegpufence-xegpufenceop><code>xegpu.fence</code> (xegpu::FenceOp)&nbsp;<a class=headline-hash href=#xegpufence-xegpufenceop>¶</a></h3><p><em>It synchronizes memory accesses.</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.fence` `memory_kind` `=` `` $memory_kind `,` `fence_scope` `=` `` $fence_scope attr-dict </code></pre><p>It synchronizes memory accesses between a write and the following read or write. 1. <code>Memory_kind</code> describes the memory kind. &ldquo;global&rdquo; means the global memory, &ldquo;slm&rdquo; means the shared local memory. 2. <code>Fence_scope</code> describes the scope of the fence. &ldquo;Workgroup&rdquo; means that the scope would be within each workgroup. 
&ldquo;GPU&rdquo; means the scope would be across workgroups within the GPU.</p><h4 id=attributes-3>Attributes:&nbsp;<a class=headline-hash href=#attributes-3>¶</a></h4><table><tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr><tr><td><code>memory_kind</code></td><td>::mlir::xegpu::MemorySpaceAttr</td><td><details><summary>Describe the location of data described by a `TensorDesc`: Global device memory (`Global`) or Shared local memory (`SLM`).</summary><p>Enum cases:</p><ul><li>global (<code>Global</code>)</li><li>slm (<code>SLM</code>)</li></ul></details></td></tr><tr><td><code>fence_scope</code></td><td>::mlir::xegpu::FenceScopeAttr</td><td><details><summary>Describes the scope of fence. "workgroup" means that the scope is within each work group. "gpu" means the scope is across work groups within the gpu.</summary><p>Enum cases:</p><ul><li>workgroup (<code>Workgroup</code>)</li><li>gpu (<code>GPU</code>)</li></ul></details></td></tr></table><h3 id=xegpuinit_nbarrier-xegpuinitnbarrierop><code>xegpu.init_nbarrier</code> (xegpu::InitNbarrierOp)&nbsp;<a class=headline-hash href=#xegpuinit_nbarrier-xegpuinitnbarrierop>¶</a></h3><p><em>It assigns a named barrier to the current thread.</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.init_nbarrier` $nbarrier_id `,` $participant_thread_num attr-dict `:` type($nbarrier_id) `,` type($participant_thread_num) `-&gt;` qualified(type($result)) </code></pre><p>InitNbarrierOp assigns the named barrier with the specified barrier ID (0~31) to the current thread. Multiple threads may bind to the same named barrier, and the <code>participant_thread_num</code> specifies the total number of threads associated with the nbarrier. 
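</p><p>For example, a thread may bind itself to named barrier 1 shared by 32 threads (a sketch; the SSA value names and constant values are illustrative):</p><pre tabindex=0><code>%id = arith.constant 1 : i8
%num = arith.constant 32 : i8
%nbarrier = xegpu.init_nbarrier %id, %num : i8, i8 -&gt; !xegpu.nbarrier </code></pre><p>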
It returns an object of NbarrierType representing the barrier.</p><h4 id=operands-4>Operands:&nbsp;<a class=headline-hash href=#operands-4>¶</a></h4><table><thead><tr><th style=text-align:center>Operand</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>nbarrier_id</code></td><td>8-bit signless integer</td></tr><tr><td style=text-align:center><code>participant_thread_num</code></td><td>8-bit signless integer</td></tr></tbody></table><h4 id=results-4>Results:&nbsp;<a class=headline-hash href=#results-4>¶</a></h4><table><thead><tr><th style=text-align:center>Result</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>result</code></td><td>!xegpu.nbarrier a custom XeGPU type representing a barrier.</td></tr></tbody></table><h3 id=xegpuload-xegpuloadgatherop><code>xegpu.load</code> (xegpu::LoadGatherOp)&nbsp;<a class=headline-hash href=#xegpuload-xegpuloadgatherop>¶</a></h3><p><em>Load a set of scattered data points from memory.</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.load` $TensorDesc `,` $mask prop-dict attr-dict `:` qualified(type($TensorDesc)) `,` type($mask) `-&gt;` type($value) </code></pre><p>It (aka. load) loads data for each work-item. The output describes the data being loaded at the subgroup level, so its size is consistent with the number of work-items in a subgroup. When the chunk size is larger than 2, the output vector is a 2D vector, with dim-1 corresponding to work-items, and dim-0 corresponding to the chunk size loaded by each work-item. Specifically, there is a transpose effect on the result (as compared to the TensorDesc) due to the hardware implementation. Therefore, a transpose attribute is introduced on purpose, making sure users are aware of this implicit transformation.</p><p>The mask operand masks out memory accesses so that it is safe to pass out-of-boundary addresses/offsets as long as they are masked. 
It applies to slots of SIMD lanes.</p><p>Example 1:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> xegpu<span class=p>.</span>load <span class=nv>%1</span><span class=p>,</span> <span class=nv>%0</span> <span class=p>{</span><span class=nl>l1_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>cached<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nl>l2_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>uncached<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nl>l3_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>uncached<span class=p>&gt;}</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>!</span>xegpu<span class=p>.</span><span class=kt>tensor</span>_desc<span class=p>&lt;</span><span class=m>16x</span><span class=k>f32</span><span class=p>,</span> <span class=nv>#xegpu.scatter_tdesc_attr</span><span class=p>&lt;</span><span class=nl>memory_space=</span>global<span class=p>&gt;&gt;,</span> </span></span><span class=line><span class=cl> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>16x</span><span class=k>i1</span><span class=p>&gt;</span> <span class=p>-&gt;</span> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>16x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span></code></pre></div><p>Example 2:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl> <span class=nv>%2</span> <span class=p>=</span> xegpu<span class=p>.</span>load <span class=nv>%1</span><span class=p>,</span> <span class=nv>%0</span> <span class=p>{</span>transpose<span class=p>,</span> </span></span><span class=line><span class=cl> <span 
class=nl>l1_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>cached<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nl>l2_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>uncached<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nl>l3_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>uncached<span class=p>&gt;}</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>!</span>xegpu<span class=p>.</span><span class=kt>tensor</span>_desc<span class=p>&lt;</span><span class=m>16x8x</span><span class=k>f32</span><span class=p>,</span> <span class=nv>#xegpu.scatter_tdesc_attr</span><span class=p>&lt;</span><span class=nl>memory_space=</span>global<span class=p>,</span> <span class=nl>chunk_size=</span><span class=m>8</span><span class=p>&gt;&gt;,</span> </span></span><span class=line><span class=cl> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>16x</span><span class=k>i1</span><span class=p>&gt;</span> <span class=p>-&gt;</span> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>8x16x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span></code></pre></div><p>Interfaces: <code>MemoryEffectOpInterface (MemoryEffectOpInterface)</code></p><p>Effects: <code>MemoryEffects::Effect{MemoryEffects::Read on ::mlir::SideEffects::DefaultResource}</code></p><h4 id=attributes-4>Attributes:&nbsp;<a class=headline-hash href=#attributes-4>¶</a></h4><table><tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr><tr><td><code>transpose</code></td><td>::mlir::UnitAttr</td><td>unit attribute</td></tr><tr><td><code>l1_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached 
(<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr><tr><td><code>l2_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr><tr><td><code>l3_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr></table><h4 id=operands-5>Operands:&nbsp;<a class=headline-hash href=#operands-5>¶</a></h4><table><thead><tr><th style=text-align:center>Operand</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>TensorDesc</code></td><td>TensorDesc describing regions of interested data.</td></tr><tr><td style=text-align:center><code>mask</code></td><td>vector of 1-bit signless integer values of ranks 1 or 1-bit signless integer</td></tr></tbody></table><h4 id=results-5>Results:&nbsp;<a class=headline-hash href=#results-5>¶</a></h4><table><thead><tr><th style=text-align:center>Result</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>value</code></td><td>vector of 1-bit signless integer or 8-bit signless integer or 16-bit 
signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type values of ranks 1/2/3/4 or 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type</td></tr></tbody></table><h3 id=xegpuload_nd-xegpuloadndop><code>xegpu.load_nd</code> (xegpu::LoadNdOp)&nbsp;<a class=headline-hash href=#xegpuload_nd-xegpuloadndop>¶</a></h3><p><em>Loads an n-D block from memory (represented by TensorDesc) to registers (represented by vector)</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.load_nd` $TensorDesc prop-dict attr-dict `:` qualified(type($TensorDesc)) `-&gt;` type($value) </code></pre><p>LoadNdOp essentially mimics the hardware block read instruction to read a block of data from memory to registers. It takes a set of optional cache hints for each level of cache, L1, L2 and L3. If the hardware does not have a corresponding cache, the corresponding cache hint attribute will be masked. VNNI transformation is a hardware feature of Intel GPUs, which is used to do data packing during the load for the B operand of a matrix operation, if the bit width of the data type is less than 32 bits, e.g., fp16. 
Transpose is another Intel hardware feature, which performs a transpose operation when loading the data if the data type is fp32 or fp64. This implies that vnni and transpose cannot be used at the same time.</p><p>Example:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl> xegpu<span class=p>.</span>load_nd <span class=nv>%1</span> <span class=p>{</span><span class=nl>transpose =</span> <span class=p>[</span><span class=m>1</span><span class=p>,</span> <span class=m>0</span><span class=p>],</span> </span></span><span class=line><span class=cl> <span class=nl>l1_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>cached<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nl>l2_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>uncached<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nl>l3_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>streaming<span class=p>&gt;}</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>!</span>xegpu<span class=p>.</span><span class=kt>tensor</span>_desc<span class=p>&lt;</span><span class=m>8x16x</span><span class=k>f32</span><span class=p>&gt;</span> <span class=p>-&gt;</span> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>16x8x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span></code></pre></div><p>Interfaces: <code>MemoryEffectOpInterface (MemoryEffectOpInterface)</code></p><p>Effects: <code>MemoryEffects::Effect{MemoryEffects::Read on ::mlir::SideEffects::DefaultResource}</code></p><h4 id=attributes-5>Attributes:&nbsp;<a class=headline-hash href=#attributes-5>¶</a></h4><table><tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr><tr><td><code>packed</code></td><td>::mlir::UnitAttr</td><td>unit 
attribute</td></tr><tr><td><code>transpose</code></td><td>::mlir::DenseI64ArrayAttr</td><td>i64 dense array attribute</td></tr><tr><td><code>l1_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr><tr><td><code>l2_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr><tr><td><code>l3_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr></table><h4 id=operands-6>Operands:&nbsp;<a class=headline-hash href=#operands-6>¶</a></h4><table><thead><tr><th style=text-align:center>Operand</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>TensorDesc</code></td><td>TensorDesc describing regions of interested data.</td></tr></tbody></table><h4 id=results-6>Results:&nbsp;<a class=headline-hash 
href=#results-6>¶</a></h4><table><thead><tr><th style=text-align:center>Result</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>value</code></td><td>vector of 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type values of ranks 1/2/3/4 or 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type</td></tr></tbody></table><h3 id=xegpunbarrier_arrive-xegpunbarrierarriveop><code>xegpu.nbarrier_arrive</code> (xegpu::NbarrierArriveOp)&nbsp;<a class=headline-hash href=#xegpunbarrier_arrive-xegpunbarrierarriveop>¶</a></h3><p><em>It signals the arrival at the named barrier.</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.nbarrier_arrive` $nbarrier attr-dict `:` qualified(type($nbarrier)) </code></pre><p>NbarrierArriveOp signals the hardware (or other threads) that the current thread has produced its data for the consumer threads. 
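</p><p>For example (a sketch; <code>%nbarrier</code> is assumed to be produced by <code>xegpu.init_nbarrier</code>):</p><pre tabindex=0><code>xegpu.nbarrier_arrive %nbarrier : !xegpu.nbarrier </code></pre><p>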
When the hardware has been signalled by <code>participant_thread_num</code> threads for the named barrier, it will notify the threads waiting for the named barrier to continue their work.</p><h4 id=operands-7>Operands:&nbsp;<a class=headline-hash href=#operands-7>¶</a></h4><table><thead><tr><th style=text-align:center>Operand</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>nbarrier</code></td><td>!xegpu.nbarrier a custom XeGPU type representing a barrier.</td></tr></tbody></table><h3 id=xegpunbarrier_wait-xegpunbarrierwaitop><code>xegpu.nbarrier_wait</code> (xegpu::NbarrierWaitOp)&nbsp;<a class=headline-hash href=#xegpunbarrier_wait-xegpunbarrierwaitop>¶</a></h3><p><em>It waits for a named barrier.</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.nbarrier_wait` $nbarrier attr-dict `:` qualified(type($nbarrier)) </code></pre><p>NbarrierWaitOp signals the hardware which named barrier the current thread is waiting for, such that it can get notified when the named barrier is completed.</p><h4 id=operands-8>Operands:&nbsp;<a class=headline-hash href=#operands-8>¶</a></h4><table><thead><tr><th style=text-align:center>Operand</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>nbarrier</code></td><td>!xegpu.nbarrier a custom XeGPU type representing a barrier.</td></tr></tbody></table><h3 id=xegpuprefetch-xegpuprefetchop><code>xegpu.prefetch</code> (xegpu::PrefetchOp)&nbsp;<a class=headline-hash href=#xegpuprefetch-xegpuprefetchop>¶</a></h3><p><em>Prefetches a set of scattered data points to cache</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.prefetch` $TensorDesc prop-dict attr-dict `:` qualified(type($TensorDesc)) </code></pre><p>It issues instructions to prefetch a set of scattered data points from memory to each level of the cache based on their cache policy. 
As compared to prefetch_nd, which works on non-scattered TensorDesc, it works on scattered TensorDesc instead.</p><p>Example:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl> xegpu<span class=p>.</span>prefetch <span class=nv>%tdesc</span> <span class=p>{</span><span class=nl>l1_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>cached<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nl>l2_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>cached<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nl>l3_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>cached<span class=p>&gt;}</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>!</span>xegpu<span class=p>.</span><span class=kt>tensor</span>_desc<span class=p>&lt;</span><span class=m>16x</span><span class=k>f16</span><span class=p>&gt;</span> </span></span></code></pre></div><h4 id=attributes-6>Attributes:&nbsp;<a class=headline-hash href=#attributes-6>¶</a></h4><table><tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr><tr><td><code>l1_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr><tr><td><code>l2_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached 
(<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr><tr><td><code>l3_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr></table><h4 id=operands-9>Operands:&nbsp;<a class=headline-hash href=#operands-9>¶</a></h4><table><thead><tr><th style=text-align:center>Operand</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>TensorDesc</code></td><td>TensorDesc describing regions of interested data.</td></tr></tbody></table><h3 id=xegpuprefetch_nd-xegpuprefetchndop><code>xegpu.prefetch_nd</code> (xegpu::PrefetchNdOp)&nbsp;<a class=headline-hash href=#xegpuprefetch_nd-xegpuprefetchndop>¶</a></h3><p><em>Prefetches an n-D block to cache</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.prefetch_nd` $TensorDesc prop-dict attr-dict `:` qualified(type($TensorDesc)) </code></pre><p>It issues an instruction to prefetch a block of data from contiguous memory regions to each level of the cache based on their cache policy.</p><p>Example:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl> xegpu<span class=p>.</span>prefetch_nd <span class=nv>%tdesc</span> <span class=p>{</span><span class=nl>l1_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>cached<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span 
class=nl>l2_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>cached<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nl>l3_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>cached<span class=p>&gt;}</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=p>!</span>xegpu<span class=p>.</span><span class=kt>tensor</span>_desc<span class=p>&lt;</span><span class=m>8x16x</span><span class=k>f16</span><span class=p>&gt;</span> </span></span></code></pre></div><h4 id=attributes-7>Attributes:&nbsp;<a class=headline-hash href=#attributes-7>¶</a></h4><table><tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr><tr><td><code>l1_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr><tr><td><code>l2_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr><tr><td><code>l3_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming 
(<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr></table><h4 id=operands-10>Operands:&nbsp;<a class=headline-hash href=#operands-10>¶</a></h4><table><thead><tr><th style=text-align:center>Operand</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>TensorDesc</code></td><td>TensorDesc describing regions of interested data.</td></tr></tbody></table><h3 id=xegpustore-xegpustorescatterop><code>xegpu.store</code> (xegpu::StoreScatterOp)&nbsp;<a class=headline-hash href=#xegpustore-xegpustorescatterop>¶</a></h3><p><em>Store data to scattered memory locations.</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.store` $value `,` $TensorDesc `,` $mask prop-dict attr-dict `:` type($value) `,` qualified(type($TensorDesc)) `,` type($mask) </code></pre><p>It (aka. store) stores data to scattered memory locations. The value is typically a 1D vector. But when the chunk size of the TensorDesc is larger than 1, it will be a 2D vector instead. For the latter case, dim-1 of the value corresponds to the SIMD lanes and dim-0 of the value corresponds to the chunk size stored per lane. So <code>store_scatter</code> has a transpose effect, which is similar to <code>load_gather</code>. 
Therefore, a transpose attribute is introduced on purpose, making sure users are aware of this implicit transformation.</p><p>Example 1:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl> <span class=nv>%3</span> <span class=p>=</span> xegpu<span class=p>.</span>store <span class=nv>%0</span><span class=p>,</span> <span class=nv>%1</span><span class=p>,</span> <span class=nv>%2</span> <span class=p>{</span><span class=nl>l1_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>uncached<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nl>l2_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>write_back<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nl>l3_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>write_through<span class=p>&gt;}</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>16x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=p>!</span>xegpu<span class=p>.</span><span class=kt>tensor</span>_desc<span class=p>&lt;</span><span class=m>16x</span><span class=k>f32</span><span class=p>,</span> <span class=nv>#xegpu.scattered_tdesc_attr</span><span class=p>&lt;&gt;&gt;,</span> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>16x</span><span class=k>i1</span><span class=p>&gt;</span> </span></span></code></pre></div><p>Example 2:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl> <span class=nv>%3</span> <span class=p>=</span> xegpu<span class=p>.</span>store <span class=nv>%0</span><span class=p>,</span> <span class=nv>%1</span><span class=p>,</span> <span class=nv>%2</span> <span class=p>{</span>transpose<span class=p>,</span> 
</span></span><span class=line><span class=cl> <span class=nl>l1_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>uncached<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nl>l2_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>write_back<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nl>l3_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>write_through<span class=p>&gt;}</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>8x16x</span><span class=k>f32</span><span class=p>&gt;,</span> <span class=p>!</span>xegpu<span class=p>.</span><span class=kt>tensor</span>_desc<span class=p>&lt;</span><span class=m>16x8x</span><span class=k>f32</span><span class=p>,</span> <span class=nv>#xegpu.scattered_tdesc_attr</span><span class=p>&lt;</span><span class=nl>chunk_size=</span><span class=m>8</span><span class=p>&gt;&gt;,</span> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>16x</span><span class=k>i1</span><span class=p>&gt;</span> </span></span></code></pre></div><p>Interfaces: <code>MemoryEffectOpInterface (MemoryEffectOpInterface)</code></p><p>Effects: <code>MemoryEffects::Effect{MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}</code></p><h4 id=attributes-8>Attributes:&nbsp;<a class=headline-hash href=#attributes-8>¶</a></h4><table><tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr><tr><td><code>transpose</code></td><td>::mlir::UnitAttr</td><td>unit attribute</td></tr><tr><td><code>l1_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming 
(<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr><tr><td><code>l2_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr><tr><td><code>l3_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr></table><h4 id=operands-11>Operands:&nbsp;<a class=headline-hash href=#operands-11>¶</a></h4><table><thead><tr><th style=text-align:center>Operand</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>value</code></td><td>vector of 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type values of ranks 1/2/3/4 or 1-bit signless integer or 8-bit signless integer or 16-bit signless integer 
or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type</td></tr><tr><td style=text-align:center><code>TensorDesc</code></td><td>TensorDesc describing regions of interested data.</td></tr><tr><td style=text-align:center><code>mask</code></td><td>vector of 1-bit signless integer values of ranks 1 or 1-bit signless integer</td></tr></tbody></table><h3 id=xegpustore_nd-xegpustorendop><code>xegpu.store_nd</code> (xegpu::StoreNdOp)&nbsp;<a class=headline-hash href=#xegpustore_nd-xegpustorendop>¶</a></h3><p><em>Stores an n-D block register region back to memory, currently only supports 2D</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.store_nd` $value `,` $TensorDesc prop-dict attr-dict `:` type($value) `,` qualified(type($TensorDesc)) </code></pre><p>StoreNdOp essentially mimics the hardware block write instruction to write a block of data from registers into the memory region as described by the TensorDesc. It takes a set of optional cache hints for each level of cache, L1, L2 and L3. 
If the hardware does not have a corresponding cache, the corresponding cache hint attribute will be masked.</p><p>Example:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl> xegpu<span class=p>.</span>store_nd <span class=nv>%3</span><span class=p>,</span> <span class=nv>%2</span> <span class=p>{</span><span class=nl>l1_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>uncached<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nl>l2_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>write_back<span class=p>&gt;,</span> </span></span><span class=line><span class=cl> <span class=nl>l3_hint =</span> <span class=nv>#xegpu.cache_hint</span><span class=p>&lt;</span>write_through<span class=p>&gt;}</span> </span></span><span class=line><span class=cl> <span class=p>:</span> <span class=kt>vector</span><span class=p>&lt;</span><span class=m>8x16x</span><span class=k>f16</span><span class=p>&gt;,</span> <span class=p>!</span>xegpu<span class=p>.</span><span class=kt>tensor</span>_desc<span class=p>&lt;</span><span class=m>8x16x</span><span class=k>f16</span><span class=p>&gt;</span> </span></span></code></pre></div><p>Interfaces: <code>MemoryEffectOpInterface (MemoryEffectOpInterface)</code></p><p>Effects: <code>MemoryEffects::Effect{MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}</code></p><h4 id=attributes-9>Attributes:&nbsp;<a class=headline-hash href=#attributes-9>¶</a></h4><table><tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr><tr><td><code>l1_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate 
(<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr><tr><td><code>l2_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr><tr><td><code>l3_hint</code></td><td>::mlir::xegpu::CachePolicyAttr</td><td><details><summary>Describe the cache settings for prefetch/load/store operators</summary><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul></details></td></tr></table><h4 id=operands-12>Operands:&nbsp;<a class=headline-hash href=#operands-12>¶</a></h4><table><thead><tr><th style=text-align:center>Operand</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>value</code></td><td>vector of 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type values of ranks 1/2/3/4 or 1-bit signless integer or 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless 
integer or 1-bit signed integer or 8-bit signed integer or 16-bit signed integer or 32-bit signed integer or 64-bit signed integer or 1-bit unsigned integer or 8-bit unsigned integer or 16-bit unsigned integer or 32-bit unsigned integer or 64-bit unsigned integer or 16-bit float or 32-bit float or 64-bit float or bfloat16 type or tf32 type</td></tr><tr><td style=text-align:center><code>TensorDesc</code></td><td>TensorDesc describing regions of interested data.</td></tr></tbody></table><h3 id=xegpuupdate_nd_offset-xegpuupdatendoffsetop><code>xegpu.update_nd_offset</code> (xegpu::UpdateNdOffsetOp)&nbsp;<a class=headline-hash href=#xegpuupdate_nd_offset-xegpuupdatendoffsetop>¶</a></h3><p><em>It updates the offsets for the TensorDesc.</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.update_nd_offset` $TensorDesc `,` custom&lt;DynamicIndexList&gt;($offsets, $const_offsets) attr-dict `:` qualified(type($result)) </code></pre><p>The op updates the offset of the given TensorDesc. The offsets are relative to the current position, expressed in number of elements. 
It results in a TensorDesc of the same type as the input.</p><p>Example:</p><pre tabindex=0><code> %2 = xegpu.update_nd_offset %1, [0, 16]: !xegpu.tensor_desc&lt;8x16xf32&gt; </code></pre><h4 id=attributes-10>Attributes:&nbsp;<a class=headline-hash href=#attributes-10>¶</a></h4><table><tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr><tr><td><code>const_offsets</code></td><td>::mlir::DenseI64ArrayAttr</td><td>i64 dense array attribute</td></tr></table><h4 id=operands-13>Operands:&nbsp;<a class=headline-hash href=#operands-13>¶</a></h4><table><thead><tr><th style=text-align:center>Operand</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>TensorDesc</code></td><td>TensorDesc describing regions of interested data.</td></tr><tr><td style=text-align:center><code>offsets</code></td><td>variadic of index</td></tr></tbody></table><h4 id=results-7>Results:&nbsp;<a class=headline-hash href=#results-7>¶</a></h4><table><thead><tr><th style=text-align:center>Result</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>result</code></td><td>TensorDesc describing regions of interested data.</td></tr></tbody></table><h3 id=xegpuupdate_offset-xegpuupdateoffsetop><code>xegpu.update_offset</code> (xegpu::UpdateOffsetOp)&nbsp;<a class=headline-hash href=#xegpuupdate_offset-xegpuupdateoffsetop>¶</a></h3><p><em>It updates the offsets for the given tensor descriptor</em></p><p>Syntax:</p><pre tabindex=0><code>operation ::= `xegpu.update_offset` $TensorDesc `,` $offsets attr-dict `:` qualified(type($TensorDesc)) `,` type($offsets) </code></pre><p>It behaves similarly to <code>update_nd_offset</code> in that it updates the offset of a TensorDesc, and the offsets are relative to the current position, expressed in number of elements. However, <code>update_nd_offset</code> updates the start point of a 2D block, so its offsets contain two elements representing the shift in each dimension. 
<code>update_offset</code> updates the offset per work-item, so its offsets contain values representing the shifts for each work-item.</p><p>Example:</p><pre tabindex=0><code>%off = arith.constant dense&lt;[32, 32, 32, 32]&gt; : vector&lt;4xindex&gt;
%2 = xegpu.update_offset %1, %off : !xegpu.tensor_desc&lt;4x2xf32, #xegpu.scattered_tdesc_attr&lt;&gt;&gt;, vector&lt;4xindex&gt;
</code></pre><h4 id=operands-14>Operands:&nbsp;<a class=headline-hash href=#operands-14>¶</a></h4><table><thead><tr><th style=text-align:center>Operand</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>TensorDesc</code></td><td>TensorDesc describing regions of interested data.</td></tr><tr><td style=text-align:center><code>offsets</code></td><td>vector of index values of ranks 1</td></tr></tbody></table><h4 id=results-8>Results:&nbsp;<a class=headline-hash href=#results-8>¶</a></h4><table><thead><tr><th style=text-align:center>Result</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center><code>result</code></td><td>TensorDesc describing regions of interested data.</td></tr></tbody></table><h2 id=attributes-11>Attributes&nbsp;<a class=headline-hash href=#attributes-11>¶</a></h2><h3 id=blocktensordescattr>BlockTensorDescAttr&nbsp;<a class=headline-hash href=#blocktensordescattr>¶</a></h3><p><em>A composite attribute for <code>TensorDescType</code></em></p><p>Syntax:</p><pre tabindex=0><code>#xegpu.block_tdesc_attr&lt; MemorySpaceAttr, # memory_space IntegerAttr, # array_length BoolAttr # boundary_check &gt; </code></pre><p><code>BlockTensorDesc</code> (or <code>block_tdesc_attr</code>) is a composite attribute defined for <code>TensorDescType</code> for describing the following properties of a <code>TensorDesc</code>. 1. <code>memory_space</code>: It describes where the data block described by the TensorDesc is located, <code>Global</code> device memory or <code>Shared</code> local memory. It defaults to <code>Global</code>. 2. 
<code>array_length</code>: It describes how many horizontally consecutive blocks will be loaded by a hardware load instruction. If the TensorDesc shape is 8x16 with array_length = 2, the loaded block shape will actually be 8x32. Its default value is 1. 3. <code>boundary_check</code>: It is used to indicate to the hardware whether to do an out-of-boundary check. The default value is true.</p><h4 id=parameters>Parameters:&nbsp;<a class=headline-hash href=#parameters>¶</a></h4><table><thead><tr><th style=text-align:center>Parameter</th><th style=text-align:center>C++ type</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center>memory_space</td><td style=text-align:center><code>MemorySpaceAttr</code></td><td></td></tr><tr><td style=text-align:center>array_length</td><td style=text-align:center><code>IntegerAttr</code></td><td>1</td></tr><tr><td style=text-align:center>boundary_check</td><td style=text-align:center><code>BoolAttr</code></td><td>true</td></tr></tbody></table><h3 id=cachepolicyattr>CachePolicyAttr&nbsp;<a class=headline-hash href=#cachepolicyattr>¶</a></h3><p><em>Describe the cache settings for prefetch/load/store operators</em></p><p>Syntax:</p><pre tabindex=0><code>#xegpu.cache_hint&lt; ::mlir::xegpu::CachePolicy # value &gt; </code></pre><p>Enum cases:</p><ul><li>cached (<code>CACHED</code>)</li><li>uncached (<code>UNCACHED</code>)</li><li>streaming (<code>STREAMING</code>)</li><li>read_invalidate (<code>READ_INVALIDATE</code>)</li><li>write_back (<code>WRITE_BACK</code>)</li><li>write_through (<code>WRITE_THROUGH</code>)</li></ul><h4 id=parameters-1>Parameters:&nbsp;<a class=headline-hash href=#parameters-1>¶</a></h4><table><thead><tr><th style=text-align:center>Parameter</th><th style=text-align:center>C++ type</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center>value</td><td style=text-align:center><code>::mlir::xegpu::CachePolicy</code></td><td>an enum of type CachePolicy</td></tr></tbody></table><h3 
id=fencescopeattr>FenceScopeAttr&nbsp;<a class=headline-hash href=#fencescopeattr>¶</a></h3><p><em>Describes the scope of fence. &ldquo;workgroup&rdquo; means that the scope is within each work group. &ldquo;gpu&rdquo; means the scope is across work groups within the gpu.</em></p><p>Syntax:</p><pre tabindex=0><code>#xegpu.fence_scope&lt; ::mlir::xegpu::FenceScope # value &gt; </code></pre><p>Enum cases:</p><ul><li>workgroup (<code>Workgroup</code>)</li><li>gpu (<code>GPU</code>)</li></ul><h4 id=parameters-2>Parameters:&nbsp;<a class=headline-hash href=#parameters-2>¶</a></h4><table><thead><tr><th style=text-align:center>Parameter</th><th style=text-align:center>C++ type</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center>value</td><td style=text-align:center><code>::mlir::xegpu::FenceScope</code></td><td>an enum of type FenceScope</td></tr></tbody></table><h3 id=memoryspaceattr>MemorySpaceAttr&nbsp;<a class=headline-hash href=#memoryspaceattr>¶</a></h3><p><em>Describe the location of data described by a <code>TensorDesc</code>: Global device memory (<code>Global</code>) or Shared local memory (<code>SLM</code>).</em></p><p>Syntax:</p><pre tabindex=0><code>#xegpu.memory_space&lt; ::mlir::xegpu::MemorySpace # value &gt; </code></pre><p>Enum cases:</p><ul><li>global (<code>Global</code>)</li><li>slm (<code>SLM</code>)</li></ul><h4 id=parameters-3>Parameters:&nbsp;<a class=headline-hash href=#parameters-3>¶</a></h4><table><thead><tr><th style=text-align:center>Parameter</th><th style=text-align:center>C++ type</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center>value</td><td style=text-align:center><code>::mlir::xegpu::MemorySpace</code></td><td>an enum of type MemorySpace</td></tr></tbody></table><h3 id=sgmapattr>SGMapAttr&nbsp;<a class=headline-hash href=#sgmapattr>¶</a></h3><p><em>Describes the mapping between work item (WI) and the 2D tensor specified by the tensor descriptor.</em></p><p>To distribute the XeGPU 
operation to work items, the tensor_desc must be specified with the sg_map attribute at tensor descriptor creation time. Within the <code>sg_map</code>, <code>wi_layout</code> specifies the layout of work items, describing the mapping of work items to the tensor. wi_layout[0] x wi_layout[1] must be equal to the total number of work items within a subgroup. <code>wi_data</code> specifies the minimum number of data elements assigned to each work item for a single distribution.</p><p>E.g., #xegpu.sg_map&lt;wi_layout = [1, 16], wi_data = [1, 1]>. In this example, the subgroup has 16 work items in wi_layout=[1, 16], each accessing 1 element as specified by wi_data=[1, 1].</p><p><code>wi_data[0] * wi_data[1]</code> can be greater than 1, meaning that each work item operates on multiple elements, which is eventually lowered to a &ldquo;SIMT-flavor&rdquo; vector, like a SPIR-V vector or LLVM vector, or packed into a storage data type. The multiple elements indicated by <code>wi_data</code> can only be from one dimension and must be contiguous in memory along either dimension.</p><h4 id=parameters-4>Parameters:&nbsp;<a class=headline-hash href=#parameters-4>¶</a></h4><table><thead><tr><th style=text-align:center>Parameter</th><th style=text-align:center>C++ type</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center>wi_layout</td><td style=text-align:center><code>::llvm::ArrayRef&lt;uint32_t></code></td><td></td></tr><tr><td style=text-align:center>wi_data</td><td style=text-align:center><code>::llvm::ArrayRef&lt;uint32_t></code></td><td></td></tr></tbody></table><h3 id=scattertensordescattr>ScatterTensorDescAttr&nbsp;<a class=headline-hash href=#scattertensordescattr>¶</a></h3><p><em>A composite attribute for <code>TensorDescType</code></em></p><p>Syntax:</p><pre tabindex=0><code>#xegpu.scatter_tdesc_attr&lt; MemorySpaceAttr, # memory_space IntegerAttr # chunk_size &gt; </code></pre><p><code>ScatterTensorDesc</code> is a composite attribute defined 
for <code>TensorDescType</code> for describing the following properties of a <code>TensorDesc</code>:</p><ol><li><p><code>memory_space</code>: It describes where the data block described by the TensorDesc is located, <code>Global</code> device memory or <code>Shared</code> local memory. It defaults to <code>Global</code>.</p></li><li><p><code>chunk_size</code>: indicates the number of contiguous elements accessed for each offset; the default is 1. It is used with the <code>scattered</code> attr only.</p></li></ol><h4 id=parameters-5>Parameters:&nbsp;<a class=headline-hash href=#parameters-5>¶</a></h4><table><thead><tr><th style=text-align:center>Parameter</th><th style=text-align:center>C++ type</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center>memory_space</td><td style=text-align:center><code>MemorySpaceAttr</code></td><td>Data memory location</td></tr><tr><td style=text-align:center>chunk_size</td><td style=text-align:center><code>IntegerAttr</code></td><td>Number of contiguous elements</td></tr></tbody></table><h2 id=types>Types&nbsp;<a class=headline-hash href=#types>¶</a></h2><h3 id=nbarriertype>NbarrierType&nbsp;<a class=headline-hash href=#nbarriertype>¶</a></h3><p><em>!xegpu.nbarrier a custom XeGPU type representing a barrier.</em></p><p>Syntax: <code>!xegpu.nbarrier</code></p><h3 id=tensordesctype>TensorDescType&nbsp;<a class=headline-hash href=#tensordesctype>¶</a></h3><p><em>TensorDesc describing regions of interested data.</em></p><p>TensorDesc is a type designed to describe regions of the interested data as well as some features that are unique to Intel hardware. Different from the builtin tensor type in MLIR, it essentially contains only the metadata and doesn&rsquo;t hold the data itself. It is designed mainly to support 2D block load/store and DPAS (matrix multiplication instruction) on Intel GPU. 
It encodes the following information:</p><ul><li>shape: the sizes/shape of the interested data block, e.g., 8x16 means 8 rows and each row contains 16 contiguous data elements. The rows can be either contiguous or not, depending on whether the encoding attribute is set.</li><li>element_type: the data type of the data element, e.g., f16, f32.</li></ul><p>Similar to the builtin tensor, it also provides an optional attribute to encode the following information via the TensorDescAttr object:</p><ul><li>memory_space (xegpu::MemorySpace): [optional] where the data is located, global memory or shared memory. It defaults to Global.</li><li>array_length (int): [optional] The number of contiguous blocks with size as <code>shape</code>, that will be loaded by a block load at a time. It defaults to 1.</li><li>boundary_check (bool): [optional] indicates whether the operation detects the boundary and pads with zero for out-of-boundary access. The default is to do the boundary check.</li></ul><p>Syntax:</p><pre tabindex=0><code>TensorDesc-type ::= `tensor_desc` `&lt;` dim-list element-type (attr-list)? `&gt;` element-type ::= float-type | integer-type | index-type dim-list := (static-dim-list `x`)? static-dim-list ::= decimal-literal `x` decimal-literal attr-list = (, memory_space = value)? (, arr_len = value)? (, boundary_check = value)? (, scattered = value)? (, sg_map `&lt;` wi_layout = value, wi_data = value `&gt;`)? 
</code></pre><p>Examples:</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-mlir data-lang=mlir><span class=line><span class=cl><span class=c>// A block TensorDesc with 8x16 i32 elements </span></span></span><span class=line><span class=cl><span class=c></span>xegpu<span class=p>.</span><span class=kt>tensor</span>_desc<span class=p>&lt;</span><span class=m>8x16x</span><span class=k>i32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl><span class=c>// A block TensorDesc with 8x16 f32 elements </span></span></span><span class=line><span class=cl><span class=c></span>xegpu<span class=p>.</span><span class=kt>tensor</span>_desc<span class=p>&lt;</span><span class=m>8x16x</span><span class=k>f32</span><span class=p>&gt;</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl><span class=c>// A TensorDesc with 8x16 f32 elements for a memory region in shared memory space. 
</span></span></span><span class=line><span class=cl><span class=c></span>xegpu<span class=p>.</span><span class=kt>tensor</span>_desc<span class=p>&lt;</span><span class=m>8x16x</span><span class=k>f32</span><span class=p>,</span> <span class=nv>#xegpu.tdesc_attr</span><span class=p>&lt;</span><span class=nl>memory_space =</span> slm<span class=p>&gt;&gt;</span> </span></span><span class=line><span class=cl> </span></span><span class=line><span class=cl><span class=c>// A TensorDesc with a sg_map </span></span></span><span class=line><span class=cl><span class=c></span>xegpu<span class=p>.</span><span class=kt>tensor</span>_desc<span class=p>&lt;</span><span class=m>8x16x</span><span class=k>f32</span><span class=p>,</span> <span class=nv>#xegpu.sg_map</span><span class=p>&lt;</span><span class=nl>wi_layout =</span> <span class=p>[</span><span class=m>1</span><span class=p>,</span> <span class=m>16</span><span class=p>],</span> <span class=nl>wi_data =</span> <span class=p>[</span><span class=m>1</span><span class=p>,</span> <span class=m>1</span><span class=p>]&gt;&gt;</span> </span></span></code></pre></div><h4 id=parameters-6>Parameters:&nbsp;<a class=headline-hash href=#parameters-6>¶</a></h4><table><thead><tr><th style=text-align:center>Parameter</th><th style=text-align:center>C++ type</th><th>Description</th></tr></thead><tbody><tr><td style=text-align:center>shape</td><td style=text-align:center><code>::llvm::ArrayRef&lt;int64_t></code></td><td></td></tr><tr><td style=text-align:center>elementType</td><td style=text-align:center><code>mlir::Type</code></td><td></td></tr><tr><td style=text-align:center>encoding</td><td style=text-align:center><code>mlir::Attribute</code></td><td></td></tr><tr><td style=text-align:center>sg_map</td><td style=text-align:center><code>mlir::Attribute</code></td><td></td></tr></tbody></table><h2 id=enums>Enums&nbsp;<a class=headline-hash href=#enums>¶</a></h2><h3 id=cmpfpredicate>CmpFPredicate&nbsp;<a class=headline-hash 
href=#cmpfpredicate>¶</a></h3><p><em>Allowed 64-bit signless integer cases: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15</em></p><h4 id=cases>Cases:&nbsp;<a class=headline-hash href=#cases>¶</a></h4><table><thead><tr><th style=text-align:center>Symbol</th><th style=text-align:center>Value</th><th>String</th></tr></thead><tbody><tr><td style=text-align:center>AlwaysFalse</td><td style=text-align:center><code>0</code></td><td>false</td></tr><tr><td style=text-align:center>OEQ</td><td style=text-align:center><code>1</code></td><td>oeq</td></tr><tr><td style=text-align:center>OGT</td><td style=text-align:center><code>2</code></td><td>ogt</td></tr><tr><td style=text-align:center>OGE</td><td style=text-align:center><code>3</code></td><td>oge</td></tr><tr><td style=text-align:center>OLT</td><td style=text-align:center><code>4</code></td><td>olt</td></tr><tr><td style=text-align:center>OLE</td><td style=text-align:center><code>5</code></td><td>ole</td></tr><tr><td style=text-align:center>ONE</td><td style=text-align:center><code>6</code></td><td>one</td></tr><tr><td style=text-align:center>ORD</td><td style=text-align:center><code>7</code></td><td>ord</td></tr><tr><td style=text-align:center>UEQ</td><td style=text-align:center><code>8</code></td><td>ueq</td></tr><tr><td style=text-align:center>UGT</td><td style=text-align:center><code>9</code></td><td>ugt</td></tr><tr><td style=text-align:center>UGE</td><td style=text-align:center><code>10</code></td><td>uge</td></tr><tr><td style=text-align:center>ULT</td><td style=text-align:center><code>11</code></td><td>ult</td></tr><tr><td style=text-align:center>ULE</td><td style=text-align:center><code>12</code></td><td>ule</td></tr><tr><td style=text-align:center>UNE</td><td style=text-align:center><code>13</code></td><td>une</td></tr><tr><td style=text-align:center>UNO</td><td style=text-align:center><code>14</code></td><td>uno</td></tr><tr><td style=text-align:center>AlwaysTrue</td><td 
style=text-align:center><code>15</code></td><td>true</td></tr></tbody></table><h3 id=cmpipredicate>CmpIPredicate&nbsp;<a class=headline-hash href=#cmpipredicate>¶</a></h3><p><em>Allowed 64-bit signless integer cases: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9</em></p><h4 id=cases-1>Cases:&nbsp;<a class=headline-hash href=#cases-1>¶</a></h4><table><thead><tr><th style=text-align:center>Symbol</th><th style=text-align:center>Value</th><th>String</th></tr></thead><tbody><tr><td style=text-align:center>eq</td><td style=text-align:center><code>0</code></td><td>eq</td></tr><tr><td style=text-align:center>ne</td><td style=text-align:center><code>1</code></td><td>ne</td></tr><tr><td style=text-align:center>slt</td><td style=text-align:center><code>2</code></td><td>slt</td></tr><tr><td style=text-align:center>sle</td><td style=text-align:center><code>3</code></td><td>sle</td></tr><tr><td style=text-align:center>sgt</td><td style=text-align:center><code>4</code></td><td>sgt</td></tr><tr><td style=text-align:center>sge</td><td style=text-align:center><code>5</code></td><td>sge</td></tr><tr><td style=text-align:center>ult</td><td style=text-align:center><code>6</code></td><td>ult</td></tr><tr><td style=text-align:center>ule</td><td style=text-align:center><code>7</code></td><td>ule</td></tr><tr><td style=text-align:center>ugt</td><td style=text-align:center><code>8</code></td><td>ugt</td></tr><tr><td style=text-align:center>uge</td><td style=text-align:center><code>9</code></td><td>uge</td></tr></tbody></table><h3 id=integeroverflowflags>IntegerOverflowFlags&nbsp;<a class=headline-hash href=#integeroverflowflags>¶</a></h3><p><em>Integer overflow arith flags</em></p><h4 id=cases-2>Cases:&nbsp;<a class=headline-hash href=#cases-2>¶</a></h4><table><thead><tr><th style=text-align:center>Symbol</th><th style=text-align:center>Value</th><th>String</th></tr></thead><tbody><tr><td style=text-align:center>none</td><td style=text-align:center><code>0</code></td><td>none</td></tr><tr><td 
style=text-align:center>nsw</td><td style=text-align:center><code>1</code></td><td>nsw</td></tr><tr><td style=text-align:center>nuw</td><td style=text-align:center><code>2</code></td><td>nuw</td></tr></tbody></table><h3 id=roundingmode>RoundingMode&nbsp;<a class=headline-hash href=#roundingmode>¶</a></h3><p><em>Floating point rounding mode</em></p><h4 id=cases-3>Cases:&nbsp;<a class=headline-hash href=#cases-3>¶</a></h4><table><thead><tr><th style=text-align:center>Symbol</th><th style=text-align:center>Value</th><th>String</th></tr></thead><tbody><tr><td style=text-align:center>to_nearest_even</td><td style=text-align:center><code>0</code></td><td>to_nearest_even</td></tr><tr><td style=text-align:center>downward</td><td style=text-align:center><code>1</code></td><td>downward</td></tr><tr><td style=text-align:center>upward</td><td style=text-align:center><code>2</code></td><td>upward</td></tr><tr><td style=text-align:center>toward_zero</td><td style=text-align:center><code>3</code></td><td>toward_zero</td></tr><tr><td style=text-align:center>to_nearest_away</td><td style=text-align:center><code>4</code></td><td>to_nearest_away</td></tr></tbody></table><h3 id=atomicrmwkind>AtomicRMWKind&nbsp;<a class=headline-hash href=#atomicrmwkind>¶</a></h3><p><em>Allowed 64-bit signless integer cases: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14</em></p><h4 id=cases-4>Cases:&nbsp;<a class=headline-hash href=#cases-4>¶</a></h4><table><thead><tr><th style=text-align:center>Symbol</th><th style=text-align:center>Value</th><th>String</th></tr></thead><tbody><tr><td style=text-align:center>addf</td><td style=text-align:center><code>0</code></td><td>addf</td></tr><tr><td style=text-align:center>addi</td><td style=text-align:center><code>1</code></td><td>addi</td></tr><tr><td style=text-align:center>assign</td><td style=text-align:center><code>2</code></td><td>assign</td></tr><tr><td style=text-align:center>maximumf</td><td 
style=text-align:center><code>3</code></td><td>maximumf</td></tr><tr><td style=text-align:center>maxs</td><td style=text-align:center><code>4</code></td><td>maxs</td></tr><tr><td style=text-align:center>maxu</td><td style=text-align:center><code>5</code></td><td>maxu</td></tr><tr><td style=text-align:center>minimumf</td><td style=text-align:center><code>6</code></td><td>minimumf</td></tr><tr><td style=text-align:center>mins</td><td style=text-align:center><code>7</code></td><td>mins</td></tr><tr><td style=text-align:center>minu</td><td style=text-align:center><code>8</code></td><td>minu</td></tr><tr><td style=text-align:center>mulf</td><td style=text-align:center><code>9</code></td><td>mulf</td></tr><tr><td style=text-align:center>muli</td><td style=text-align:center><code>10</code></td><td>muli</td></tr><tr><td style=text-align:center>ori</td><td style=text-align:center><code>11</code></td><td>ori</td></tr><tr><td style=text-align:center>andi</td><td style=text-align:center><code>12</code></td><td>andi</td></tr><tr><td style=text-align:center>maxnumf</td><td style=text-align:center><code>13</code></td><td>maxnumf</td></tr><tr><td style=text-align:center>minnumf</td><td style=text-align:center><code>14</code></td><td>minnumf</td></tr></tbody></table><h3 id=fastmathflags>FastMathFlags&nbsp;<a class=headline-hash href=#fastmathflags>¶</a></h3><p><em>Floating point fast math flags</em></p><h4 id=cases-5>Cases:&nbsp;<a class=headline-hash href=#cases-5>¶</a></h4><table><thead><tr><th style=text-align:center>Symbol</th><th style=text-align:center>Value</th><th>String</th></tr></thead><tbody><tr><td style=text-align:center>none</td><td style=text-align:center><code>0</code></td><td>none</td></tr><tr><td style=text-align:center>reassoc</td><td style=text-align:center><code>1</code></td><td>reassoc</td></tr><tr><td style=text-align:center>nnan</td><td style=text-align:center><code>2</code></td><td>nnan</td></tr><tr><td style=text-align:center>ninf</td><td 
style=text-align:center><code>4</code></td><td>ninf</td></tr><tr><td style=text-align:center>nsz</td><td style=text-align:center><code>8</code></td><td>nsz</td></tr><tr><td style=text-align:center>arcp</td><td style=text-align:center><code>16</code></td><td>arcp</td></tr><tr><td style=text-align:center>contract</td><td style=text-align:center><code>32</code></td><td>contract</td></tr><tr><td style=text-align:center>afn</td><td style=text-align:center><code>64</code></td><td>afn</td></tr><tr><td style=text-align:center>fast</td><td style=text-align:center><code>127</code></td><td>fast</td></tr></tbody></table><h3 id=cachepolicy>CachePolicy&nbsp;<a class=headline-hash href=#cachepolicy>¶</a></h3><p><em>Cache policy</em></p><h4 id=cases-6>Cases:&nbsp;<a class=headline-hash href=#cases-6>¶</a></h4><table><thead><tr><th style=text-align:center>Symbol</th><th style=text-align:center>Value</th><th>String</th></tr></thead><tbody><tr><td style=text-align:center>CACHED</td><td style=text-align:center><code>0</code></td><td>cached</td></tr><tr><td style=text-align:center>UNCACHED</td><td style=text-align:center><code>1</code></td><td>uncached</td></tr><tr><td style=text-align:center>STREAMING</td><td style=text-align:center><code>2</code></td><td>streaming</td></tr><tr><td style=text-align:center>READ_INVALIDATE</td><td style=text-align:center><code>3</code></td><td>read_invalidate</td></tr><tr><td style=text-align:center>WRITE_BACK</td><td style=text-align:center><code>4</code></td><td>write_back</td></tr><tr><td style=text-align:center>WRITE_THROUGH</td><td style=text-align:center><code>5</code></td><td>write_through</td></tr></tbody></table><h3 id=fencescope>FenceScope&nbsp;<a class=headline-hash href=#fencescope>¶</a></h3><p><em>The enumeration for the scope of a fence operation.</em></p><h4 id=cases-7>Cases:&nbsp;<a class=headline-hash href=#cases-7>¶</a></h4><table><thead><tr><th style=text-align:center>Symbol</th><th
style=text-align:center>Value</th><th>String</th></tr></thead><tbody><tr><td style=text-align:center>Workgroup</td><td style=text-align:center><code>0</code></td><td>workgroup</td></tr><tr><td style=text-align:center>GPU</td><td style=text-align:center><code>1</code></td><td>gpu</td></tr></tbody></table><h3 id=memoryspace>MemorySpace&nbsp;<a class=headline-hash href=#memoryspace>¶</a></h3><p><em>The address space of the memory the tensor descriptor is created for</em></p><h4 id=cases-8>Cases:&nbsp;<a class=headline-hash href=#cases-8>¶</a></h4><table><thead><tr><th style=text-align:center>Symbol</th><th style=text-align:center>Value</th><th>String</th></tr></thead><tbody><tr><td style=text-align:center>Global</td><td style=text-align:center><code>0</code></td><td>global</td></tr><tr><td style=text-align:center>SLM</td><td style=text-align:center><code>3</code></td><td>slm</td></tr></tbody></table><div class=edit-meta><br></div><nav class=pagination><a class="nav nav-prev" href=https://mlir.llvm.org/docs/Dialects/X86Vector/ title="'x86vector' Dialect"><i class="fas fa-arrow-left" aria-hidden=true></i> Prev - 'x86vector' Dialect</a> <a class="nav nav-next" href=https://mlir.llvm.org/docs/Dialects/Builtin/ title="Builtin Dialect">Next - Builtin Dialect <i class="fas fa-arrow-right" aria-hidden=true></i></a></nav><footer><p class=powered>Powered by <a href=https://gohugo.io>Hugo</a>. Theme by <a href=https://themes.gohugo.io/hugo-theme-techdoc/>TechDoc</a>.
Designed by <a href=https://github.com/thingsym/hugo-theme-techdoc>Thingsym</a>.</p></footer></main></div></div></body></html>
