<!DOCTYPE html> <html class="writer-html5" lang="en" > <head> <meta charset="utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <title>pytorch-quantization master documentation</title> <link rel="stylesheet" href="_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="_static/css/theme.css" type="text/css" /> <!--[if lt IE 9]> <script src="_static/js/html5shiv.min.js"></script> <![endif]--> <script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script> <script src="_static/jquery.js"></script> <script src="_static/underscore.js"></script> <script src="_static/doctools.js"></script> <script src="_static/js/theme.js"></script> <link rel="index" title="Index" href="genindex.html" /> <link rel="search" title="Search" href="search.html" /> <script src="//assets.adobedtm.com/5d4962a43b79/c1061d2c5e7b/launch-191c2462b890.min.js"></script> </head> <body class="wy-body-for-nav"> <div class="wy-grid-for-nav"> <nav data-toggle="wy-nav-shift" class="wy-nav-side"> <div class="wy-side-scroll"> <div class="wy-side-nav-search" > <a href="#" class="icon icon-home"> pytorch-quantization </a> <div class="version"> 2.2.1 </div> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> <p class="caption" role="heading"><span class="caption-text">User Guide</span></p> <ul> <li class="toctree-l1"><a class="reference internal" href="index.html#document-userguide">Basic Functionalities</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#post-training-quantization">Post training quantization</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#quantization-aware-training">Quantization Aware Training</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#export-to-onnx">Export to ONNX</a></li> </ul> <p class="caption" role="heading"><span class="caption-text">Tutorials</span></p> 
<ul> <li class="toctree-l1"><a class="reference internal" href="index.html#document-tutorials/quant_resnet50">Quantizing Resnet50</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#document-tutorials/creating_custom_quantized_modules">Creating Custom Quantized Modules</a></li> </ul> <p class="caption" role="heading"><span class="caption-text">Package Reference</span></p> <ul> <li class="toctree-l1"><a class="reference internal" href="index.html#document-calib">pytorch_quantization.calib</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#document-nn">pytorch_quantization.nn</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#document-functional">pytorch_quantization.nn.functional</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#document-optim">pytorch_quantization.optim.helper</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#document-tensor_quant">pytorch_quantization.tensor_quant</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#document-utils">pytorch_quantization.utils</a></li> </ul> </div> </div> </nav> <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <i data-toggle="wy-nav-top" class="fa fa-bars"></i> <a href="#">pytorch-quantization</a> </nav> <div class="wy-nav-content"> <div class="rst-content"> <div role="navigation" aria-label="Page navigation"> <ul class="wy-breadcrumbs"> <li><a href="#" class="icon icon-home"></a> »</li> <li>pytorch-quantization master documentation</li> <li class="wy-breadcrumbs-aside"> </li> </ul> <hr/> </div> <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article"> <div itemprop="articleBody"> <div class="section" id="pytorch-quantization-s-documentation"> <h1>pytorch-quantization’s documentation<a class="headerlink" 
href="#pytorch-quantization-s-documentation" title="Permalink to this headline"></a></h1> <div class="toctree-wrapper compound"> <span id="document-userguide"></span><div class="section" id="basic-functionalities"> <h2>Basic Functionalities<a class="headerlink" href="#basic-functionalities" title="Permalink to this headline"></a></h2> <div class="section" id="quantization-function"> <h3>Quantization function<a class="headerlink" href="#quantization-function" title="Permalink to this headline"></a></h3> <p><code class="docutils literal notranslate"><span class="pre">tensor_quant</span></code> and <code class="docutils literal notranslate"><span class="pre">fake_tensor_quant</span></code> are the two basic functions for quantizing a tensor. <code class="docutils literal notranslate"><span class="pre">fake_tensor_quant</span></code> returns a fake-quantized tensor (float values). <code class="docutils literal notranslate"><span class="pre">tensor_quant</span></code> returns a quantized tensor (integer values) together with the scale.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">tensor_quant</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">amax</span><span class="p">,</span> <span class="n">num_bits</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">output_dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">float</span><span class="p">,</span> <span class="n">unsigned</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="n">fake_tensor_quant</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">amax</span><span class="p">,</span> <span class="n">num_bits</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">output_dtype</span><span class="o">=</span><span
class="n">torch</span><span class="o">.</span><span class="n">float</span><span class="p">,</span> <span class="n">unsigned</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> </pre></div> </div> <p>Example:</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">tensor_quant</span> <span class="c1"># Generate random input. With fixed seed 12345, x should be</span> <span class="c1"># tensor([0.9817, 0.8796, 0.9921, 0.4611, 0.0832, 0.1784, 0.3674, 0.5676, 0.3376, 0.2119])</span> <span class="n">torch</span><span class="o">.</span><span class="n">manual_seed</span><span class="p">(</span><span class="mi">12345</span><span class="p">)</span> <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span> <span class="c1"># fake quantize tensor x. fake_quant_x will be</span> <span class="c1"># tensor([0.9843, 0.8828, 0.9921, 0.4609, 0.0859, 0.1797, 0.3672, 0.5703, 0.3359, 0.2109])</span> <span class="n">fake_quant_x</span> <span class="o">=</span> <span class="n">tensor_quant</span><span class="o">.</span><span class="n">fake_tensor_quant</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x</span><span class="o">.</span><span class="n">abs</span><span class="p">()</span><span class="o">.</span><span class="n">max</span><span class="p">())</span> <span class="c1"># quantize tensor x. 
quant_x will be</span> <span class="c1"># tensor([126., 113., 127., 59., 11., 23., 47., 73., 43., 27.])</span> <span class="c1"># with scale=128.0057</span> <span class="n">quant_x</span><span class="p">,</span> <span class="n">scale</span> <span class="o">=</span> <span class="n">tensor_quant</span><span class="o">.</span><span class="n">tensor_quant</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x</span><span class="o">.</span><span class="n">abs</span><span class="p">()</span><span class="o">.</span><span class="n">max</span><span class="p">())</span> </pre></div> </div> <p>The backward pass of both functions is defined by the <a class="reference external" href="https://arxiv.org/abs/1308.3432">Straight-Through Estimator (STE)</a>.</p> </div> <div class="section" id="descriptor-and-quantizer"> <h3>Descriptor and quantizer<a class="headerlink" href="#descriptor-and-quantizer" title="Permalink to this headline"></a></h3> <p><code class="docutils literal notranslate"><span class="pre">QuantDescriptor</span></code> defines how a tensor should be quantized. There are also some predefined <code class="docutils literal notranslate"><span class="pre">QuantDescriptor</span></code> instances, e.g. 
<code class="docutils literal notranslate"><span class="pre">QUANT_DESC_8BIT_PER_TENSOR</span></code> and <code class="docutils literal notranslate"><span class="pre">QUANT_DESC_8BIT_CONV2D_WEIGHT_PER_CHANNEL</span></code>.</p> <p><code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> is the module for quantizing tensors and is configured by <code class="docutils literal notranslate"><span class="pre">QuantDescriptor</span></code>.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pytorch_quantization.tensor_quant</span> <span class="kn">import</span> <span class="n">QuantDescriptor</span> <span class="kn">from</span> <span class="nn">pytorch_quantization.nn.modules.tensor_quantizer</span> <span class="kn">import</span> <span class="n">TensorQuantizer</span> <span class="n">quant_desc</span> <span class="o">=</span> <span class="n">QuantDescriptor</span><span class="p">(</span><span class="n">num_bits</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">fake_quant</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="n">unsigned</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="n">quantizer</span> <span class="o">=</span> <span class="n">TensorQuantizer</span><span class="p">(</span><span class="n">quant_desc</span><span class="p">)</span> <span class="n">torch</span><span class="o">.</span><span class="n">manual_seed</span><span class="p">(</span><span class="mi">12345</span><span class="p">)</span> <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> 
<span class="mi">9</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">7</span><span class="p">)</span> <span class="n">quant_x</span> <span class="o">=</span> <span class="n">quantizer</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> </pre></div> </div> <p>If <code class="docutils literal notranslate"><span class="pre">amax</span></code> is given in the <a class="reference internal" href="index.html#pytorch_quantization.tensor_quant.QuantDescriptor" title="pytorch_quantization.tensor_quant.QuantDescriptor"><code class="xref py py-func docutils literal notranslate"><span class="pre">QuantDescriptor</span></code></a>, <a class="reference internal" href="index.html#pytorch_quantization.nn.TensorQuantizer" title="pytorch_quantization.nn.TensorQuantizer"><code class="xref py py-func docutils literal notranslate"><span class="pre">TensorQuantizer</span></code></a> will use it to quantize. Otherwise, <a class="reference internal" href="index.html#pytorch_quantization.nn.TensorQuantizer" title="pytorch_quantization.nn.TensorQuantizer"><code class="xref py py-func docutils literal notranslate"><span class="pre">TensorQuantizer</span></code></a> will compute amax and then quantize. amax is computed with respect to the <code class="docutils literal notranslate"><span class="pre">axis</span></code> specified. 
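To make the axis convention concrete, here is a plain-Python sketch (illustrative only, not the library's implementation): with `axis=(0,)`, an amax value is computed for each index along axis 0, reducing over all remaining axes.

```python
# Sketch of QuantDescriptor's axis convention: `axis` lists the dimensions
# to KEEP; amax is the maximum absolute value over all remaining dimensions.
def per_axis_amax(rows):
    """Per-index amax along axis 0 of a 2-D nested list, i.e. axis=(0,)."""
    return [max(abs(v) for v in row) for row in rows]

x = [[0.5, -2.0, 1.0],
     [3.0, -0.25, 0.75]]
print(per_axis_amax(x))  # [2.0, 3.0]
```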
Note that <code class="docutils literal notranslate"><span class="pre">axis</span></code> of QuantDescriptor specifies the axes to keep (amax is computed over all remaining axes), as opposed to the axis argument of <a class="reference external" href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.amax.html">max()</a>.</p> </div> <div class="section" id="quantized-module"> <h3>Quantized module<a class="headerlink" href="#quantized-module" title="Permalink to this headline"></a></h3> <p>There are two major types of quantized modules, <code class="docutils literal notranslate"><span class="pre">Conv</span></code> and <code class="docutils literal notranslate"><span class="pre">Linear</span></code>. Both can replace their <code class="docutils literal notranslate"><span class="pre">torch.nn</span></code> counterparts and apply quantization to both weights and activations.</p> <p>Both take <code class="docutils literal notranslate"><span class="pre">quant_desc_input</span></code> and <code class="docutils literal notranslate"><span class="pre">quant_desc_weight</span></code> in addition to the arguments of the original module.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">torch</span> <span class="kn">import</span> <span class="n">nn</span> <span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">tensor_quant</span> <span class="kn">import</span> <span class="nn">pytorch_quantization.nn</span> <span class="k">as</span> <span class="nn">quant_nn</span> <span class="c1"># PyTorch's module</span> <span class="n">fc1</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">in_features</span><span class="p">,</span> <span class="n">out_features</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="n">conv1</span> <span
class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_channels</span><span class="p">,</span> <span class="n">out_channels</span><span class="p">,</span> <span class="n">kernel_size</span><span class="p">)</span> <span class="c1"># quantized version</span> <span class="n">quant_fc1</span> <span class="o">=</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span> <span class="n">in_features</span><span class="p">,</span> <span class="n">out_features</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">quant_desc_input</span><span class="o">=</span><span class="n">tensor_quant</span><span class="o">.</span><span class="n">QUANT_DESC_8BIT_PER_TENSOR</span><span class="p">,</span> <span class="n">quant_desc_weight</span><span class="o">=</span><span class="n">tensor_quant</span><span class="o">.</span><span class="n">QUANT_DESC_8BIT_LINEAR_WEIGHT_PER_ROW</span><span class="p">)</span> <span class="n">quant_conv1</span> <span class="o">=</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">(</span> <span class="n">in_channels</span><span class="p">,</span> <span class="n">out_channels</span><span class="p">,</span> <span class="n">kernel_size</span><span class="p">,</span> <span class="n">quant_desc_input</span><span class="o">=</span><span class="n">tensor_quant</span><span class="o">.</span><span class="n">QUANT_DESC_8BIT_PER_TENSOR</span><span class="p">,</span> <span class="n">quant_desc_weight</span><span class="o">=</span><span class="n">tensor_quant</span><span class="o">.</span><span class="n">QUANT_DESC_8BIT_CONV2D_WEIGHT_PER_CHANNEL</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="section" id="post-training-quantization"> <h2>Post training 
quantization<a class="headerlink" href="#post-training-quantization" title="Permalink to this headline"></a></h2> <p>A model can be post-training quantized by simply calling <code class="docutils literal notranslate"><span class="pre">quant_modules.initialize()</span></code> before model creation.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">quant_modules</span> <span class="n">quant_modules</span><span class="o">.</span><span class="n">initialize</span><span class="p">()</span> <span class="n">model</span> <span class="o">=</span> <span class="n">torchvision</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">resnet50</span><span class="p">()</span> </pre></div> </div> <p>If a model is not entirely defined by modules, then TensorQuantizer should be created manually and added at the right places in the model.</p> <div class="section" id="calibration"> <h3>Calibration<a class="headerlink" href="#calibration" title="Permalink to this headline"></a></h3> <p>Calibration is the TensorRT terminology for passing data samples to the quantizer and deciding the best amax for activations. 
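As a plain-Python illustration of the idea (a hypothetical helper, not the library's calibrator API), the simplest form of calibration just tracks the largest absolute value observed across all fed samples:

```python
# Hypothetical sketch of "max" calibration: amax is the running maximum
# absolute value over every activation sample seen during calibration.
def max_calibrate(batches):
    amax = 0.0
    for batch in batches:
        amax = max(amax, max(abs(v) for v in batch))
    return amax

batches = [[0.1, -0.9, 0.4], [1.5, -0.2], [-0.7, 0.3]]
print(max_calibrate(batches))  # 1.5
```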
We support four calibration methods:</p> <ul class="simple"> <li><p><code class="docutils literal notranslate"><span class="pre">max</span></code>: Simply use the global maximum absolute value</p></li> <li><p><code class="docutils literal notranslate"><span class="pre">entropy</span></code>: TensorRT’s entropy calibration</p></li> <li><p><code class="docutils literal notranslate"><span class="pre">percentile</span></code>: Clip outliers based on a given percentile</p></li> <li><p><code class="docutils literal notranslate"><span class="pre">mse</span></code>: MSE (Mean Squared Error) based calibration</p></li> </ul> <p>In the ResNet50 example above, the calibration method is set to <code class="docutils literal notranslate"><span class="pre">mse</span></code>; it can be used as in the following example:</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># Find the TensorQuantizer and enable calibration</span> <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">module</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">named_modules</span><span class="p">():</span> <span class="k">if</span> <span class="n">name</span><span class="o">.</span><span class="n">endswith</span><span class="p">(</span><span class="s1">'_quantizer'</span><span class="p">):</span> <span class="n">module</span><span class="o">.</span><span class="n">enable_calib</span><span class="p">()</span> <span class="n">module</span><span class="o">.</span><span class="n">disable_quant</span><span class="p">()</span> <span class="c1"># Use full precision data to calibrate</span> <span class="c1"># Feeding data samples</span> <span class="n">model</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="c1"># ...</span> <span class="c1"># Finalize calibration</span> <span class="k">for</span> <span class="n">name</span><span
class="p">,</span> <span class="n">module</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">named_modules</span><span class="p">():</span> <span class="k">if</span> <span class="n">name</span><span class="o">.</span><span class="n">endswith</span><span class="p">(</span><span class="s1">'_quantizer'</span><span class="p">):</span> <span class="n">module</span><span class="o">.</span><span class="n">load_calib_amax</span><span class="p">()</span> <span class="n">module</span><span class="o">.</span><span class="n">disable_calib</span><span class="p">()</span> <span class="n">module</span><span class="o">.</span><span class="n">enable_quant</span><span class="p">()</span> <span class="c1"># If running on GPU, it needs to call .cuda() again because new tensors will be created by the calibration process</span> <span class="n">model</span><span class="o">.</span><span class="n">cuda</span><span class="p">()</span> <span class="c1"># Keep running the quantized model</span> <span class="c1"># ...</span> </pre></div> </div> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Calibration needs to be performed before exporting the model to ONNX.</p> </div> </div> </div> <div class="section" id="quantization-aware-training"> <h2>Quantization Aware Training<a class="headerlink" href="#quantization-aware-training" title="Permalink to this headline"></a></h2> <p>Quantization Aware Training is based on the Straight Through Estimator (STE) derivative approximation. It is sometimes known as “quantization aware training”. We don’t use the name because it doesn’t reflect the underlying assumption; if anything, the STE approximation makes training “unaware” of quantization.</p> <p>After calibration is done, Quantization Aware Training simply selects a training schedule and continues training the calibrated model. Usually, it doesn’t need to fine-tune very long. 
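A cosine-annealing fine-tuning schedule of the kind we use can be sketched in plain Python; the helper name and the concrete numbers are illustrative, not prescribed by the library:

```python
import math

# Sketch: fine-tune for ~10% of the original schedule, starting at 1% of the
# initial training learning rate, annealed along the decreasing half of a
# cosine down to 1% of the fine-tuning learning rate.
def finetune_lr(epoch, finetune_epochs, base_lr):
    start = 0.01 * base_lr                     # 1% of the initial training LR
    end = 0.01 * start                         # 1% of the fine-tuning LR
    t = epoch / max(finetune_epochs - 1, 1)    # progress from 0 to 1
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * t))

original_epochs, base_lr = 100, 0.1
finetune_epochs = original_epochs // 10        # ~10% of the original schedule
lrs = [finetune_lr(e, finetune_epochs, base_lr) for e in range(finetune_epochs)]
# lrs[0] is about 1e-3 (1% of base_lr); lrs[-1] is about 1e-5 (1% of that)
```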
We usually use around 10% of the original training schedule, starting at 1% of the initial training learning rate, with a cosine annealing learning rate schedule that follows the decreasing half of a cosine period, down to 1% of the initial fine-tuning learning rate (0.01% of the initial training learning rate).</p> <div class="section" id="some-recommendations"> <h3>Some recommendations<a class="headerlink" href="#some-recommendations" title="Permalink to this headline"></a></h3> <p>Quantization Aware Training (essentially a discrete numerical optimization problem) is not a mathematically solved problem. Based on our experience, here are some recommendations:</p> <ul class="simple"> <li><p>For the STE approximation to work well, it is better to use a small learning rate. A large learning rate is more likely to enlarge the variance introduced by the STE approximation and destroy the trained network.</p></li> <li><p>Do not change the quantization representation (scale) during training, at least not too frequently. Changing the scale every step is effectively like changing the data format (e8m7, e5m10, e3m4, etc.) every step, which will easily hurt convergence.</p></li> </ul> </div> </div> <div class="section" id="export-to-onnx"> <h2>Export to ONNX<a class="headerlink" href="#export-to-onnx" title="Permalink to this headline"></a></h2> <p>The goal of exporting to ONNX is to deploy to TensorRT, not to ONNX Runtime. So we only export the fake-quantized model into a form TensorRT will take. Fake quantization will be broken into a pair of QuantizeLinear/DequantizeLinear ONNX ops. TensorRT will take the generated ONNX graph and execute it in int8 in the most optimized way within its capability.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Currently, we only support exporting int8 and fp8 fake-quantized modules. 
Additionally, quantized modules need to be calibrated before exporting to ONNX.</p> </div> <p>A fake-quantized model can be exported to ONNX like any other PyTorch model. Please learn more about exporting a PyTorch model to ONNX at <a class="reference external" href="https://pytorch.org/docs/stable/onnx.html?highlight=onnx#module-torch.onnx">torch.onnx</a>. For example:</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pytorch_quantization</span> <span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">nn</span> <span class="k">as</span> <span class="n">quant_nn</span> <span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">quant_modules</span> <span class="n">quant_modules</span><span class="o">.</span><span class="n">initialize</span><span class="p">()</span> <span class="n">model</span> <span class="o">=</span> <span class="n">torchvision</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">resnet50</span><span class="p">()</span> <span class="c1"># load the calibrated model</span> <span class="n">state_dict</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"quant_resnet50-entropy-1024.pth"</span><span class="p">,</span> <span class="n">map_location</span><span class="o">=</span><span class="s2">"cpu"</span><span class="p">)</span> <span class="n">model</span><span class="o">.</span><span class="n">load_state_dict</span><span class="p">(</span><span class="n">state_dict</span><span class="p">)</span> <span class="n">model</span><span class="o">.</span><span class="n">cuda</span><span class="p">()</span> <span class="n">dummy_input</span> <span class="o">=</span> <span class="n">torch</span><span
class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">224</span><span class="p">,</span> <span class="mi">224</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span> <span class="n">input_names</span> <span class="o">=</span> <span class="p">[</span> <span class="s2">"actual_input_1"</span> <span class="p">]</span> <span class="n">output_names</span> <span class="o">=</span> <span class="p">[</span> <span class="s2">"output1"</span> <span class="p">]</span> <span class="k">with</span> <span class="n">pytorch_quantization</span><span class="o">.</span><span class="n">enable_onnx_export</span><span class="p">():</span> <span class="c1"># enable_onnx_checker needs to be disabled. See notes below.</span> <span class="n">torch</span><span class="o">.</span><span class="n">onnx</span><span class="o">.</span><span class="n">export</span><span class="p">(</span> <span class="n">model</span><span class="p">,</span> <span class="n">dummy_input</span><span class="p">,</span> <span class="s2">"quant_resnet50.onnx"</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">opset_version</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">enable_onnx_checker</span><span class="o">=</span><span class="kc">False</span> <span class="p">)</span> </pre></div> </div> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Note that <code class="docutils literal notranslate"><span class="pre">axis</span></code> is added to <code class="docutils literal notranslate"><span class="pre">QuantizeLinear</span></code> and <code class="docutils literal notranslate"><span class="pre">DequantizeLinear</span></code> in opset13.</p> 
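The QuantizeLinear/DequantizeLinear pair that fake quantization lowers to can be illustrated with scalar arithmetic. This is a hedged sketch of the signed int8 math (amax mapped to 127), not TensorRT's implementation:

```python
# ONNX-style Q/DQ pair for signed int8: quantize divides by the scale,
# rounds, and clamps to [-128, 127]; dequantize multiplies back by the scale.
def quantize_linear(x, scale):
    return max(-128, min(127, round(x / scale)))

def dequantize_linear(q, scale):
    return q * scale

amax = 2.0
scale = amax / 127.0                  # amax maps to the largest int8 code, 127
q = quantize_linear(1.5, scale)       # 95
x_fake = dequantize_linear(q, scale)  # close to 1.5, snapped to the int8 grid
```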
</div> </div> </div> <div class="toctree-wrapper compound"> <span id="document-tutorials/quant_resnet50"></span><div class="section" id="quantizing-resnet50"> <h2>Quantizing Resnet50<a class="headerlink" href="#quantizing-resnet50" title="Permalink to this headline"></a></h2> <div class="section" id="create-a-quantized-model"> <h3>Create a quantized model<a class="headerlink" href="#create-a-quantized-model" title="Permalink to this headline"></a></h3> <p>Import the necessary Python modules:</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">os</span> <span class="kn">import</span> <span class="nn">sys</span> <span class="kn">import</span> <span class="nn">torch</span> <span class="kn">import</span> <span class="nn">torch.utils.data</span> <span class="kn">from</span> <span class="nn">torch</span> <span class="kn">import</span> <span class="n">nn</span> <span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">nn</span> <span class="k">as</span> <span class="n">quant_nn</span> <span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">calib</span> <span class="kn">from</span> <span class="nn">pytorch_quantization.tensor_quant</span> <span class="kn">import</span> <span class="n">QuantDescriptor</span> <span class="kn">from</span> <span class="nn">torchvision</span> <span class="kn">import</span> <span class="n">models</span> <span class="n">sys</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s2">"path to torchvision/references/classification/"</span><span class="p">)</span> <span class="kn">from</span> <span class="nn">train</span> <span class="kn">import</span> <span class="n">evaluate</span><span class="p">,</span> <span class="n">train_one_epoch</span><span class="p">,</span> <span class="n">load_data</span> </pre></div> </div> <div class="section" 
id="adding-quantized-modules"> <h4>Adding quantized modules<a class="headerlink" href="#adding-quantized-modules" title="Permalink to this headline"></a></h4> <p>The first step is to add quantizer modules to the neural network graph. This package provides a number of quantized layer modules, which contain quantizers for inputs and weights, e.g. <code class="docutils literal notranslate"><span class="pre">quant_nn.QuantLinear</span></code>, which can be used in place of <code class="docutils literal notranslate"><span class="pre">nn.Linear</span></code>. These quantized layers can be substituted automatically, via monkey-patching, or by manually modifying the model definition.</p> <p>Automatic layer substitution is done with <code class="docutils literal notranslate"><span class="pre">quant_modules</span></code>. This should be called before model creation.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">quant_modules</span> <span class="n">quant_modules</span><span class="o">.</span><span class="n">initialize</span><span class="p">()</span> </pre></div> </div> <p>This will apply to all instances of each module. If you do not want all modules to be quantized, you should instead substitute the quantized modules manually. Stand-alone quantizers can also be added to the model with <code class="docutils literal notranslate"><span class="pre">quant_nn.TensorQuantizer</span></code>.</p> </div> </div> <div class="section" id="post-training-quantization"> <h3>Post training quantization<a class="headerlink" href="#post-training-quantization" title="Permalink to this headline"></a></h3> <p>For efficient inference, we want to select a fixed range for each quantizer. 
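One simple way to pick such a fixed range is a high percentile of the observed absolute activation values, which discards extreme outliers. A plain-Python sketch (hypothetical helper, not the toolkit's histogram calibrator):

```python
# Hypothetical percentile-based amax: sort observed |activations| and pick
# the value at the requested percentile rank (lower interpolation).
def percentile_amax(values, pct=99.0):
    ordered = sorted(abs(v) for v in values)
    idx = int(pct / 100.0 * (len(ordered) - 1))
    return ordered[idx]

samples = [0.1, -0.4, 0.9, -3.5, 0.2, 0.6, -0.8, 0.3, 0.7, -0.5]
print(percentile_amax(samples, pct=90.0))   # 0.9 -- the 3.5 outlier is clipped
print(percentile_amax(samples, pct=100.0))  # 3.5 -- equivalent to max calibration
```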
Starting with a pre-trained model, the simplest way to do this is by calibration.</p> <div class="section" id="calibration"> <h4>Calibration<a class="headerlink" href="#calibration" title="Permalink to this headline"></a></h4> <p>We will use histogram-based calibration for activations and the default max calibration for weights.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">quant_desc_input</span> <span class="o">=</span> <span class="n">QuantDescriptor</span><span class="p">(</span><span class="n">calib_method</span><span class="o">=</span><span class="s1">'histogram'</span><span class="p">)</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">QuantConv2d</span><span class="o">.</span><span class="n">set_default_quant_desc_input</span><span class="p">(</span><span class="n">quant_desc_input</span><span class="p">)</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">QuantLinear</span><span class="o">.</span><span class="n">set_default_quant_desc_input</span><span class="p">(</span><span class="n">quant_desc_input</span><span class="p">)</span> <span class="n">model</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">resnet50</span><span class="p">(</span><span class="n">pretrained</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="n">model</span><span class="o">.</span><span class="n">cuda</span><span class="p">()</span> </pre></div> </div> <p>To collect activation histograms, we must feed sample data into the model. First, create ImageNet dataloaders as done in the training script. Then, enable calibration in each quantizer and feed training data into the model. 1024 samples (2 batches of 512) should be sufficient to estimate the distribution of activations. 
Use training data for calibration so that validation also measures generalization of the selected ranges.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">data_path</span> <span class="o">=</span> <span class="s2">"PATH to imagenet"</span> <span class="n">batch_size</span> <span class="o">=</span> <span class="mi">512</span> <span class="n">traindir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="s1">'train'</span><span class="p">)</span> <span class="n">valdir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="s1">'val'</span><span class="p">)</span> <span class="n">dataset</span><span class="p">,</span> <span class="n">dataset_test</span><span class="p">,</span> <span class="n">train_sampler</span><span class="p">,</span> <span class="n">test_sampler</span> <span class="o">=</span> <span class="n">load_data</span><span class="p">(</span><span class="n">traindir</span><span class="p">,</span> <span class="n">valdir</span><span class="p">,</span> <span class="kc">False</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span> <span class="n">data_loader</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">DataLoader</span><span class="p">(</span> <span class="n">dataset</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">batch_size</span><span class="p">,</span> <span 
class="n">sampler</span><span class="o">=</span><span class="n">train_sampler</span><span class="p">,</span> <span class="n">num_workers</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">pin_memory</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="n">data_loader_test</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">DataLoader</span><span class="p">(</span> <span class="n">dataset_test</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">batch_size</span><span class="p">,</span> <span class="n">sampler</span><span class="o">=</span><span class="n">test_sampler</span><span class="p">,</span> <span class="n">num_workers</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">pin_memory</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> </pre></div> </div> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span> <span class="k">def</span> <span class="nf">collect_stats</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">data_loader</span><span class="p">,</span> <span class="n">num_batches</span><span class="p">):</span> <span class="sd">"""Feed data to the network and collect statistic"""</span> <span class="c1"># Enable calibrators</span> <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">module</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">named_modules</span><span class="p">():</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">module</span><span class="p">,</span> <span 
class="n">quant_nn</span><span class="o">.</span><span class="n">TensorQuantizer</span><span class="p">):</span> <span class="k">if</span> <span class="n">module</span><span class="o">.</span><span class="n">_calibrator</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span> <span class="n">module</span><span class="o">.</span><span class="n">disable_quant</span><span class="p">()</span> <span class="n">module</span><span class="o">.</span><span class="n">enable_calib</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="n">module</span><span class="o">.</span><span class="n">disable</span><span class="p">()</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">_</span><span class="p">)</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="nb">enumerate</span><span class="p">(</span><span class="n">data_loader</span><span class="p">),</span> <span class="n">total</span><span class="o">=</span><span class="n">num_batches</span><span class="p">):</span> <span class="n">model</span><span class="p">(</span><span class="n">image</span><span class="o">.</span><span class="n">cuda</span><span class="p">())</span> <span class="k">if</span> <span class="n">i</span> <span class="o">>=</span> <span class="n">num_batches</span><span class="p">:</span> <span class="k">break</span> <span class="c1"># Disable calibrators</span> <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">module</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">named_modules</span><span class="p">():</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">module</span><span class="p">,</span> <span 
class="n">quant_nn</span><span class="o">.</span><span class="n">TensorQuantizer</span><span class="p">):</span> <span class="k">if</span> <span class="n">module</span><span class="o">.</span><span class="n">_calibrator</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span> <span class="n">module</span><span class="o">.</span><span class="n">enable_quant</span><span class="p">()</span> <span class="n">module</span><span class="o">.</span><span class="n">disable_calib</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="n">module</span><span class="o">.</span><span class="n">enable</span><span class="p">()</span> <span class="k">def</span> <span class="nf">compute_amax</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span> <span class="c1"># Load calib result</span> <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">module</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">named_modules</span><span class="p">():</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">module</span><span class="p">,</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">TensorQuantizer</span><span class="p">):</span> <span class="k">if</span> <span class="n">module</span><span class="o">.</span><span class="n">_calibrator</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">module</span><span class="o">.</span><span class="n">_calibrator</span><span class="p">,</span> <span class="n">calib</span><span class="o">.</span><span 
class="n">MaxCalibrator</span><span class="p">):</span> <span class="n">module</span><span class="o">.</span><span class="n">load_calib_amax</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="n">module</span><span class="o">.</span><span class="n">load_calib_amax</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="sa">F</span><span class="s2">"</span><span class="si">{</span><span class="n">name</span><span class="si">:</span><span class="s2">40</span><span class="si">}</span><span class="s2">: </span><span class="si">{</span><span class="n">module</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span> <span class="n">model</span><span class="o">.</span><span class="n">cuda</span><span class="p">()</span> <span class="c1"># It is a bit slow since we collect histograms on CPU</span> <span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span> <span class="n">collect_stats</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">data_loader</span><span class="p">,</span> <span class="n">num_batches</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span> <span class="n">compute_amax</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s2">"percentile"</span><span class="p">,</span> <span class="n">percentile</span><span class="o">=</span><span class="mf">99.99</span><span class="p">)</span> </pre></div> </div> <p>After calibration is done, quantizers will have <code class="docutils literal notranslate"><span class="pre">amax</span></code> set, which represents the absolute maximum input value representable in the quantized space. 
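Concretely, <code>amax</code> is just a maximum of absolute values, reduced either over the whole tensor or separately per output channel. A toy sketch with hypothetical numbers (plain Python lists standing in for a weight tensor):

```python
# Hypothetical weight values for 3 output channels (each row stands in for
# one channel of a conv weight, flattened).
weights = [
    [0.2, -1.0, 0.4],    # channel 0
    [0.1, 0.05, -0.5],   # channel 1
    [2.0, -0.25, 0.3],   # channel 2
]

# Per-tensor amax: a single range shared by the whole tensor.
per_tensor_amax = max(abs(w) for row in weights for w in row)

# Per-channel amax (reduce over everything except axis 0): one range per
# output channel, so small-magnitude channels keep fine resolution.
per_channel_amax = [max(abs(w) for w in row) for row in weights]

print(per_tensor_amax)    # -> 2.0
print(per_channel_amax)   # -> [1.0, 0.5, 2.0]
```

This is why the weight quantizers printed below report <code>axis=(0)</code> and a range of amax values, while the input quantizers report a single per-tensor amax.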
By default, weight ranges are per channel while activation ranges are per tensor. We can see the condensed amaxes by printing each <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> module.</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">conv1</span><span class="o">.</span><span class="n">_input_quantizer</span> <span class="p">:</span> <span class="n">TensorQuantizer</span><span class="p">(</span><span class="mi">8</span><span class="n">bit</span> <span class="n">fake</span> <span class="n">per</span><span class="o">-</span><span class="n">tensor</span> <span class="n">amax</span><span class="o">=</span><span class="mf">2.6400</span> <span class="n">calibrator</span><span class="o">=</span><span class="n">MaxCalibrator</span><span class="p">(</span><span class="n">track_amax</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="n">quant</span><span class="p">)</span> <span class="n">conv1</span><span class="o">.</span><span class="n">_weight_quantizer</span> <span class="p">:</span> <span class="n">TensorQuantizer</span><span class="p">(</span><span class="mi">8</span><span class="n">bit</span> <span class="n">fake</span> <span class="n">axis</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="n">amax</span><span class="o">=</span><span class="p">[</span><span class="mf">0.0000</span><span class="p">,</span> <span class="mf">0.7817</span><span class="p">](</span><span class="mi">64</span><span class="p">)</span> <span class="n">calibrator</span><span class="o">=</span><span class="n">MaxCalibrator</span><span class="p">(</span><span class="n">track_amax</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="n">quant</span><span class="p">)</span> <span class="n">layer1</span><span class="mf">.0</span><span 
class="o">.</span><span class="n">conv1</span><span class="o">.</span><span class="n">_input_quantizer</span> <span class="p">:</span> <span class="n">TensorQuantizer</span><span class="p">(</span><span class="mi">8</span><span class="n">bit</span> <span class="n">fake</span> <span class="n">per</span><span class="o">-</span><span class="n">tensor</span> <span class="n">amax</span><span class="o">=</span><span class="mf">6.8645</span> <span class="n">calibrator</span><span class="o">=</span><span class="n">MaxCalibrator</span><span class="p">(</span><span class="n">track_amax</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="n">quant</span><span class="p">)</span> <span class="n">layer1</span><span class="mf">.0</span><span class="o">.</span><span class="n">conv1</span><span class="o">.</span><span class="n">_weight_quantizer</span> <span class="p">:</span> <span class="n">TensorQuantizer</span><span class="p">(</span><span class="mi">8</span><span class="n">bit</span> <span class="n">fake</span> <span class="n">axis</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="n">amax</span><span class="o">=</span><span class="p">[</span><span class="mf">0.0000</span><span class="p">,</span> <span class="mf">0.7266</span><span class="p">](</span><span class="mi">64</span><span class="p">)</span> <span class="n">calibrator</span><span class="o">=</span><span class="n">MaxCalibrator</span><span class="p">(</span><span class="n">track_amax</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="n">quant</span><span class="p">)</span> <span class="o">...</span> </pre></div> </div> </div> <div class="section" id="evaluate-the-calibrated-model"> <h4>Evaluate the calibrated model<a class="headerlink" href="#evaluate-the-calibrated-model" title="Permalink to this headline"></a></h4> <p>Next we will evaluate the classification 
accuracy of our post training quantized model on the ImageNet validation set.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">criterion</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">CrossEntropyLoss</span><span class="p">()</span> <span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span> <span class="n">evaluate</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">data_loader_test</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s2">"cuda"</span><span class="p">,</span> <span class="n">print_freq</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span> <span class="c1"># Save the model</span> <span class="n">torch</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">state_dict</span><span class="p">(),</span> <span class="s2">"/tmp/quant_resnet50-calibrated.pth"</span><span class="p">)</span> </pre></div> </div> <p>This should yield 76.1% top-1 accuracy, which is close to the pre-trained model accuracy of 76.2%.</p> </div> <div class="section" id="use-different-calibration"> <h4>Use different calibration<a class="headerlink" href="#use-different-calibration" title="Permalink to this headline"></a></h4> <p>We can try different calibrations without recollecting the histograms, and see which one gets the best accuracy.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span> <span class="n">compute_amax</span><span class="p">(</span><span 
class="n">model</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s2">"percentile"</span><span class="p">,</span> <span class="n">percentile</span><span class="o">=</span><span class="mf">99.9</span><span class="p">)</span> <span class="n">evaluate</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">data_loader_test</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s2">"cuda"</span><span class="p">,</span> <span class="n">print_freq</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span> <span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span> <span class="k">for</span> <span class="n">method</span> <span class="ow">in</span> <span class="p">[</span><span class="s2">"mse"</span><span class="p">,</span> <span class="s2">"entropy"</span><span class="p">]:</span> <span class="nb">print</span><span class="p">(</span><span class="sa">F</span><span class="s2">"</span><span class="si">{</span><span class="n">method</span><span class="si">}</span><span class="s2"> calibration"</span><span class="p">)</span> <span class="n">compute_amax</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="n">method</span><span class="p">)</span> <span class="n">evaluate</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">data_loader_test</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s2">"cuda"</span><span class="p">,</span> <span class="n">print_freq</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span> </pre></div> 
</div> <p>MSE and entropy calibration should both reach over 76% top-1 accuracy. The 99.9% percentile clips too many values for ResNet-50 and gives slightly lower accuracy.</p> </div> </div> <div class="section" id="quantization-aware-training"> <h3>Quantization Aware Training<a class="headerlink" href="#quantization-aware-training" title="Permalink to this headline"></a></h3> <p>Optionally, we can fine-tune the calibrated model to improve accuracy further.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">criterion</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">CrossEntropyLoss</span><span class="p">()</span> <span class="n">optimizer</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">optim</span><span class="o">.</span><span class="n">SGD</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">parameters</span><span class="p">(),</span> <span class="n">lr</span><span class="o">=</span><span class="mf">0.0001</span><span class="p">)</span> <span class="n">lr_scheduler</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">optim</span><span class="o">.</span><span class="n">lr_scheduler</span><span class="o">.</span><span class="n">StepLR</span><span class="p">(</span><span class="n">optimizer</span><span class="p">,</span> <span class="n">step_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">gamma</span><span class="o">=</span><span class="mf">0.1</span><span class="p">)</span> <span class="c1"># Training takes about one and a half hours per epoch on a single V100</span> <span class="n">train_one_epoch</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">optimizer</span><span class="p">,</span> <span
class="n">data_loader</span><span class="p">,</span> <span class="s2">"cuda"</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span> <span class="c1"># Save the model</span> <span class="n">torch</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">state_dict</span><span class="p">(),</span> <span class="s2">"/tmp/quant_resnet50-finetuned.pth"</span><span class="p">)</span> </pre></div> </div> <p>After one epoch of fine-tuning, we can achieve over 76.4% top-1 accuracy. Fine-tuning for more epochs with learning rate annealing can improve accuracy further. For example, fine-tuning for 15 epochs with cosine annealing starting with a learning rate of 0.001 can get over 76.7%. It should be noted that the same fine-tuning schedule will improve the accuracy of the unquantized model as well.</p> <div class="section" id="further-optimization"> <h4>Further optimization<a class="headerlink" href="#further-optimization" title="Permalink to this headline"></a></h4> <p>For efficient inference on TensorRT, we need to know more about the runtime optimizations TensorRT applies. TensorRT supports fusion of quantizing convolution and residual add. The new fused operator has two inputs. Let us call them conv-input and residual-input. Here the fused operator’s output precision must match the residual input precision. When there is another quantizing node after the fused operator, we can insert a pair of quantizing/dequantizing nodes between the residual-input and the Elementwise-Addition node, so that the quantizing node after the Convolution node is fused with the Convolution node, and the Convolution node is completely quantized with INT8 input and output.
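The rule described above — quantize both inputs of the addition so the add can stay in INT8 and fuse with the convolution — can be sketched in plain Python, with hypothetical scalar stand-ins for the real tensor ops and quantizer modules:

```python
def fake_quantize(x, amax, bound=127):
    """Scalar stand-in for an INT8 quantize/dequantize pair."""
    clipped = max(-amax, min(amax, x))
    return round(clipped * bound / amax) * amax / bound

def residual_add(conv_out, identity, amax=4.0, quantize=True):
    """Sketch of the quantized residual add: with quantize=True, BOTH
    inputs pass through a quantizer, so the addition sees two INT8-scaled
    operands and the whole conv + add can fuse in INT8; otherwise the
    add runs in float."""
    if quantize:
        return fake_quantize(conv_out, amax) + fake_quantize(identity, amax)
    return conv_out + identity

print(residual_add(1.0, 2.0))                   # close to 3.0, with INT8 error
print(residual_add(1.0, 2.0, quantize=False))   # -> 3.0 exactly
```

The extra quantizer on the residual input mirrors the one TensorRT already expects on the convolution output, which is exactly what the manual model changes below add.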
We cannot use automatic monkey-patching to apply this optimization, so we need to manually insert the quantizing/dequantizing nodes.</p> <p>First, create a copy of resnet.py from <a class="reference external" href="https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py">https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py</a>, modify the constructor, and add an explicit bool flag ‘quantize’:</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">resnet50</span><span class="p">(</span><span class="n">pretrained</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span> <span class="n">progress</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">True</span><span class="p">,</span> <span class="n">quantize</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">:</span> <span class="n">Any</span><span class="p">)</span> <span class="o">-></span> <span class="n">ResNet</span><span class="p">:</span> <span class="k">return</span> <span class="n">_resnet</span><span class="p">(</span><span class="s1">'resnet50'</span><span class="p">,</span> <span class="n">Bottleneck</span><span class="p">,</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="n">pretrained</span><span class="p">,</span> <span class="n">progress</span><span class="p">,</span> <span class="n">quantize</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span
class="k">def</span> <span class="nf">_resnet</span><span class="p">(</span><span class="n">arch</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">block</span><span class="p">:</span> <span class="n">Type</span><span class="p">[</span><span class="n">Union</span><span class="p">[</span><span class="n">BasicBlock</span><span class="p">,</span> <span class="n">Bottleneck</span><span class="p">]],</span> <span class="n">layers</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">int</span><span class="p">],</span> <span class="n">pretrained</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span> <span class="n">progress</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span> <span class="n">quantize</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">:</span> <span class="n">Any</span><span class="p">)</span> <span class="o">-></span> <span class="n">ResNet</span><span class="p">:</span> <span class="n">model</span> <span class="o">=</span> <span class="n">ResNet</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="n">layers</span><span class="p">,</span> <span class="n">quantize</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="k">class</span> <span class="nc">ResNet</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">block</span><span class="p">:</span> <span class="n">Type</span><span class="p">[</span><span class="n">Union</span><span class="p">[</span><span 
class="n">BasicBlock</span><span class="p">,</span> <span class="n">Bottleneck</span><span class="p">]],</span> <span class="n">layers</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">int</span><span class="p">],</span> <span class="n">quantize</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span> <span class="n">num_classes</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1000</span><span class="p">,</span> <span class="n">zero_init_residual</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span> <span class="n">groups</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">width_per_group</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">64</span><span class="p">,</span> <span class="n">replace_stride_with_dilation</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">List</span><span class="p">[</span><span class="nb">bool</span><span class="p">]]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="n">norm_layer</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Callable</span><span class="p">[</span><span class="o">...</span><span class="p">,</span> <span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">]]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span> <span class="nb">super</span><span class="p">(</span><span class="n">ResNet</span><span class="p">,</span> 
<span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span> <span class="bp">self</span><span class="o">.</span><span class="n">_quantize</span> <span class="o">=</span> <span class="n">quantize</span> </pre></div> </div> <p>When this <code class="docutils literal notranslate"><span class="pre">self._quantize</span></code> flag is set to <code class="docutils literal notranslate"><span class="pre">True</span></code>, we need to replace all the <code class="docutils literal notranslate"><span class="pre">nn.Conv2d</span></code> layers with <code class="docutils literal notranslate"><span class="pre">quant_nn.QuantConv2d</span></code>.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">conv3x3</span><span class="p">(</span><span class="n">in_planes</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">out_planes</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">stride</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">groups</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">dilation</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">quantize</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">)</span> <span class="o">-></span> <span class="n">nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">:</span> <span class="sd">"""3x3 convolution with padding"""</span> <span class="k">if</span> <span class="n">quantize</span><span
class="p">:</span> <span class="k">return</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">QuantConv2d</span><span class="p">(</span><span class="n">in_planes</span><span class="p">,</span> <span class="n">out_planes</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">stride</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="n">dilation</span><span class="p">,</span> <span class="n">groups</span><span class="o">=</span><span class="n">groups</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">dilation</span><span class="o">=</span><span class="n">dilation</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="n">nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_planes</span><span class="p">,</span> <span class="n">out_planes</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">stride</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="n">dilation</span><span class="p">,</span> <span class="n">groups</span><span class="o">=</span><span class="n">groups</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">dilation</span><span class="o">=</span><span class="n">dilation</span><span class="p">)</span> <span class="k">def</span> <span class="nf">conv1x1</span><span class="p">(</span><span class="n">in_planes</span><span 
class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">out_planes</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">stride</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">quantize</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">)</span> <span class="o">-></span> <span class="n">nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">:</span> <span class="sd">"""1x1 convolution"""</span> <span class="k">if</span> <span class="n">quantize</span><span class="p">:</span> <span class="k">return</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">QuantConv2d</span><span class="p">(</span><span class="n">in_planes</span><span class="p">,</span> <span class="n">out_planes</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">stride</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="n">nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_planes</span><span class="p">,</span> <span class="n">out_planes</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">stride</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> </pre></div> </div> <p>The residual conv 
add can be found in both <code class="docutils literal notranslate"><span class="pre">BasicBlock</span></code> and <code class="docutils literal notranslate"><span class="pre">Bottleneck</span></code>. We first need to declare the quantization node in the <code class="docutils literal notranslate"><span class="pre">__init__</span></code> function.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inplanes</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">planes</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">stride</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">downsample</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="n">groups</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">base_width</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">64</span><span class="p">,</span> <span class="n">dilation</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">norm_layer</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Callable</span><span class="p">[</span><span class="o">...</span><span class="p">,</span> <span class="n">nn</span><span class="o">.</span><span
class="n">Module</span><span class="p">]]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="n">quantize</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span> <span class="c1"># other code...</span> <span class="bp">self</span><span class="o">.</span><span class="n">_quantize</span> <span class="o">=</span> <span class="n">quantize</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">_quantize</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">residual_quantizer</span> <span class="o">=</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">TensorQuantizer</span><span class="p">(</span><span class="n">quant_nn</span><span class="o">.</span><span class="n">QuantConv2d</span><span class="o">.</span><span class="n">default_quant_desc_input</span><span class="p">)</span> </pre></div> </div> <p>Finally, we need to patch the <code class="docutils literal notranslate"><span class="pre">forward</span></code> function in both <code class="docutils literal notranslate"><span class="pre">BasicBlock</span></code> and <code class="docutils literal notranslate"><span class="pre">Bottleneck</span></code>, inserting the extra quantization/dequantization nodes there.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">:</span> <span class="n">Tensor</span><span class="p">)</span> <span class="o">-></span> <span class="n">Tensor</span><span class="p">:</span> <span class="c1"># other code...</span> <span class="k">if</span> <span class="bp">self</span><span
class="o">.</span><span class="n">_quantize</span><span class="p">:</span> <span class="n">out</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">residual_quantizer</span><span class="p">(</span><span class="n">identity</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="n">out</span> <span class="o">+=</span> <span class="n">identity</span> <span class="n">out</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">relu</span><span class="p">(</span><span class="n">out</span><span class="p">)</span> <span class="k">return</span> <span class="n">out</span> </pre></div> </div> <p>The final ResNet code with the residual connection quantized can be found at <a class="reference external" href="https://github.com/NVIDIA/TensorRT/blob/master/tools/pytorch-quantization/examples/torchvision/models/classification/resnet.py">https://github.com/NVIDIA/TensorRT/blob/master/tools/pytorch-quantization/examples/torchvision/models/classification/resnet.py</a></p> </div> </div> </div> <span id="document-tutorials/creating_custom_quantized_modules"></span><div class="section" id="creating-custom-quantized-modules"> <h2>Creating Custom Quantized Modules<a class="headerlink" href="#creating-custom-quantized-modules" title="Permalink to this headline"></a></h2> <p>There are several quantized modules provided by the quantization tool as follows:</p> <ul class="simple"> <li><p>QuantConv1d, QuantConv2d, QuantConv3d, QuantConvTranspose1d, QuantConvTranspose2d, QuantConvTranspose3d</p></li> <li><p>QuantLinear</p></li> <li><p>QuantAvgPool1d, QuantAvgPool2d, QuantAvgPool3d, QuantMaxPool1d, QuantMaxPool2d, QuantMaxPool3d</p></li> </ul> <p>To quantize a module, we need to quantize the input and weights if present. 
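Concretely, “quantizing” a tensor here means fake quantization: scaling values onto an integer grid, rounding, clamping, and scaling back. The following is a minimal, non-library sketch of that arithmetic (illustrative only; it is not the pytorch_quantization implementation):

```python
import torch

def fake_quant(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Illustrative per-tensor fake quantization: quantize, then dequantize."""
    bound = 2 ** (num_bits - 1) - 1       # 127 for signed 8-bit
    scale = x.abs().max() / bound         # map dynamic range onto integer grid
    # Round onto the grid, clamp to the representable range, scale back.
    return torch.clamp(torch.round(x / scale), -bound, bound) * scale

x = torch.randn(4, 4)
xq = fake_quant(x)  # same shape, values snapped to representable levels
```

The round-trip error is at most half a quantization step; choosing the dynamic range (amax) well is exactly what calibration aims at.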
Following are the three major use-cases:</p> <ol class="arabic simple"> <li><p>Create a quantized wrapper for modules that have only inputs.</p></li> <li><p>Create a quantized wrapper for modules that have inputs as well as weights.</p></li> <li><p>Directly add the <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> module to the inputs of an operation in the model graph.</p></li> </ol> <p>The first two methods are most useful when you need to automatically replace the original modules (nodes in the graph) with their quantized versions. The third method is useful when you need to manually add quantization to the model graph at very specific places (more manual, more control).</p> <p>Let’s see each use-case with examples below.</p> <div class="section" id="quantizing-modules-with-only-inputs"> <h3>Quantizing Modules With Only Inputs<a class="headerlink" href="#quantizing-modules-with-only-inputs" title="Permalink to this headline"></a></h3> <p>A suitable example would be quantizing the <code class="docutils literal notranslate"><span class="pre">pooling</span></code> module variants.</p> <p>Essentially, we need to provide a wrapper function that takes the original module and adds the <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> module around it so that the input is first quantized and then fed into the original module.</p> <ul class="simple"> <li><p>Create the wrapper by subclassing the original module (<code class="docutils literal notranslate"><span class="pre">pooling.MaxPool2d</span></code>) along with the utilities module (<code class="docutils literal notranslate"><span class="pre">_utils.QuantInputMixin</span></code>).</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">QuantMaxPool2d</span><span class="p">(</span><span class="n">pooling</span><span class="o">.</span><span
class="n">MaxPool2d</span><span class="p">,</span> <span class="n">_utils</span><span class="o">.</span><span class="n">QuantInputMixin</span><span class="p">):</span> </pre></div> </div> <ul class="simple"> <li><p>The <code class="docutils literal notranslate"><span class="pre">__init__</span></code> function calls the original module’s init function with the corresponding arguments. There is just one additional argument, <code class="docutils literal notranslate"><span class="pre">**kwargs</span></code>, which contains the quantization configuration information. The <code class="docutils literal notranslate"><span class="pre">QuantInputMixin</span></code> utility contains the method <code class="docutils literal notranslate"><span class="pre">pop_quant_desc_in_kwargs</span></code> which extracts this configuration information from the input or returns a default if that input is <code class="docutils literal notranslate"><span class="pre">None</span></code>. 
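As a rough illustration of that popping behaviour, here is a simplified, hypothetical sketch (the dict-based descriptor and the names are assumptions for illustration, not the actual <code class="docutils literal notranslate"><span class="pre">_utils</span></code> implementation):

```python
# Hypothetical sketch of descriptor popping; the dict-based descriptor and
# simplified names are assumptions, not pytorch_quantization's real API.
DEFAULT_QUANT_DESC_INPUT = {"num_bits": 8, "axis": None}

def pop_quant_desc(kwargs, default=DEFAULT_QUANT_DESC_INPUT):
    """Remove 'quant_desc_input' from kwargs; fall back to the default."""
    desc = kwargs.pop("quant_desc_input", None)
    return default if desc is None else desc

kwargs = {"kernel_size": 3, "quant_desc_input": {"num_bits": 4, "axis": None}}
desc = pop_quant_desc(kwargs)  # kwargs no longer carries the descriptor
```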
Finally the <code class="docutils literal notranslate"><span class="pre">init_quantizer</span></code> method is called that initializes the <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> module which would quantize the inputs.</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">kernel_size</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">dilation</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">return_indices</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">ceil_mode</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span> <span class="nb">super</span><span class="p">(</span><span class="n">QuantMaxPool2d</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">kernel_size</span><span class="p">,</span> <span class="n">stride</span><span class="p">,</span> <span class="n">padding</span><span class="p">,</span> <span class="n">dilation</span><span class="p">,</span> <span class="n">return_indices</span><span class="p">,</span> <span class="n">ceil_mode</span><span class="p">)</span> <span class="n">quant_desc_input</span> <span class="o">=</span> <span class="n">_utils</span><span class="o">.</span><span class="n">pop_quant_desc_in_kwargs</span><span class="p">(</span><span class="bp">self</span><span 
class="o">.</span><span class="vm">__class__</span><span class="p">,</span> <span class="n">input_only</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">init_quantizer</span><span class="p">(</span><span class="n">quant_desc_input</span><span class="p">)</span> </pre></div> </div> <ul class="simple"> <li><p>After the initialization, the <code class="docutils literal notranslate"><span class="pre">forward</span></code> function needs to be defined in our wrapper module; it quantizes the input using the <code class="docutils literal notranslate"><span class="pre">_input_quantizer</span></code> that was initialized in the <code class="docutils literal notranslate"><span class="pre">__init__</span></code> function and then forwards the quantized input to the base module with a <code class="docutils literal notranslate"><span class="pre">super</span></code> call.</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">):</span> <span class="n">quant_input</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span><span class="p">(</span><span class="nb">input</span><span class="p">)</span> <span class="k">return</span> <span class="nb">super</span><span class="p">(</span><span class="n">QuantMaxPool2d</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">forward</span><span class="p">(</span><span class="n">quant_input</span><span class="p">)</span> </pre></div> </div> <ul class="simple"> <li><p>Finally, we need to define a getter method for the 
<code class="docutils literal notranslate"><span class="pre">_input_quantizer</span></code>. This could, for example, be used to disable the quantization for a particular module using <code class="docutils literal notranslate"><span class="pre">module.input_quantizer.disable()</span></code>, which is helpful while experimenting with different per-layer quantization configurations.</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="nd">@property</span> <span class="k">def</span> <span class="nf">input_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span> </pre></div> </div> <p>A complete quantized pooling module would look like the following:</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">QuantMaxPool2d</span><span class="p">(</span><span class="n">pooling</span><span class="o">.</span><span class="n">MaxPool2d</span><span class="p">,</span> <span class="n">_utils</span><span class="o">.</span><span class="n">QuantInputMixin</span><span class="p">):</span> <span class="sd">"""Quantized 2D maxpool"""</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">kernel_size</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">dilation</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">return_indices</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">ceil_mode</span><span
class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span> <span class="nb">super</span><span class="p">(</span><span class="n">QuantMaxPool2d</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">kernel_size</span><span class="p">,</span> <span class="n">stride</span><span class="p">,</span> <span class="n">padding</span><span class="p">,</span> <span class="n">dilation</span><span class="p">,</span> <span class="n">return_indices</span><span class="p">,</span> <span class="n">ceil_mode</span><span class="p">)</span> <span class="n">quant_desc_input</span> <span class="o">=</span> <span class="n">_utils</span><span class="o">.</span><span class="n">pop_quant_desc_in_kwargs</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="p">,</span> <span class="n">input_only</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">init_quantizer</span><span class="p">(</span><span class="n">quant_desc_input</span><span class="p">)</span> <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">):</span> <span class="n">quant_input</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span><span class="p">(</span><span class="nb">input</span><span class="p">)</span> <span class="k">return</span> <span class="nb">super</span><span class="p">(</span><span class="n">QuantMaxPool2d</span><span class="p">,</span> <span 
class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">forward</span><span class="p">(</span><span class="n">quant_input</span><span class="p">)</span> <span class="nd">@property</span> <span class="k">def</span> <span class="nf">input_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span> </pre></div> </div> </div> <div class="section" id="quantizing-modules-with-weights-and-inputs"> <h3>Quantizing Modules With Weights and Inputs<a class="headerlink" href="#quantizing-modules-with-weights-and-inputs" title="Permalink to this headline"></a></h3> <p>We give an example of quantizing the <code class="docutils literal notranslate"><span class="pre">torch.nn.Linear</span></code> module. The only additional change from the previous example of quantizing pooling modules is that we also need to accommodate the quantization of weights in the Linear module.</p> <ul class="simple"> <li><p>We create the quantized linear module as follows:</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">QuantLinear</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">,</span> <span class="n">_utils</span><span class="o">.</span><span class="n">QuantMixin</span><span class="p">):</span> </pre></div> </div> <ul class="simple"> <li><p>In the <code class="docutils literal notranslate"><span class="pre">__init__</span></code> function, we first use the <code class="docutils literal notranslate"><span class="pre">pop_quant_desc_in_kwargs</span></code> function to extract the quantization descriptors for both inputs and weights. 
Second, we initialize the <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> modules for both inputs and weights using these quantization descriptors.</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">in_features</span><span class="p">,</span> <span class="n">out_features</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span> <span class="nb">super</span><span class="p">(</span><span class="n">QuantLinear</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">in_features</span><span class="p">,</span> <span class="n">out_features</span><span class="p">,</span> <span class="n">bias</span><span class="p">)</span> <span class="n">quant_desc_input</span><span class="p">,</span> <span class="n">quant_desc_weight</span> <span class="o">=</span> <span class="n">_utils</span><span class="o">.</span><span class="n">pop_quant_desc_in_kwargs</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">init_quantizer</span><span class="p">(</span><span class="n">quant_desc_input</span><span class="p">,</span> <span class="n">quant_desc_weight</span><span class="p">)</span> </pre></div> </div> <ul class="simple"> <li><p>Also, override the <code class="docutils literal notranslate"><span class="pre">forward</span></code> function call and pass 
the inputs and weights through <code class="docutils literal notranslate"><span class="pre">_input_quantizer</span></code> and <code class="docutils literal notranslate"><span class="pre">_weight_quantizer</span></code> respectively before passing the quantized arguments to the actual <code class="docutils literal notranslate"><span class="pre">F.linear</span></code> call. This step adds the actual input/weight <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> to the module and eventually the model.</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">):</span> <span class="n">quant_input</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span><span class="p">(</span><span class="nb">input</span><span class="p">)</span> <span class="n">quant_weight</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_weight_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">weight</span><span class="p">)</span> <span class="n">output</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">linear</span><span class="p">(</span><span class="n">quant_input</span><span class="p">,</span> <span class="n">quant_weight</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">bias</span><span class="p">)</span> <span class="k">return</span> <span class="n">output</span> </pre></div> </div> <ul class="simple"> <li><p>Also, as in the pooling example, we add 
the getter methods for the <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> modules associated with inputs/weights. This could be used to, for example, disable the quantization mechanism by calling <code class="docutils literal notranslate"><span class="pre">module_obj.weight_quantizer.disable()</span></code>.</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="nd">@property</span> <span class="k">def</span> <span class="nf">input_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span> <span class="nd">@property</span> <span class="k">def</span> <span class="nf">weight_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_weight_quantizer</span> </pre></div> </div> <ul class="simple"> <li><p>With all of the above changes, the quantized Linear module would look like the following:</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">QuantLinear</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">,</span> <span class="n">_utils</span><span class="o">.</span><span class="n">QuantMixin</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">in_features</span><span class="p">,</span> <span class="n">out_features</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="o">**</span><span
class="n">kwargs</span><span class="p">):</span> <span class="nb">super</span><span class="p">(</span><span class="n">QuantLinear</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">in_features</span><span class="p">,</span> <span class="n">out_features</span><span class="p">,</span> <span class="n">bias</span><span class="p">)</span> <span class="n">quant_desc_input</span><span class="p">,</span> <span class="n">quant_desc_weight</span> <span class="o">=</span> <span class="n">_utils</span><span class="o">.</span><span class="n">pop_quant_desc_in_kwargs</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">init_quantizer</span><span class="p">(</span><span class="n">quant_desc_input</span><span class="p">,</span> <span class="n">quant_desc_weight</span><span class="p">)</span> <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">):</span> <span class="n">quant_input</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span><span class="p">(</span><span class="nb">input</span><span class="p">)</span> <span class="n">quant_weight</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_weight_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">weight</span><span class="p">)</span> <span class="n">output</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">linear</span><span 
class="p">(</span><span class="n">quant_input</span><span class="p">,</span> <span class="n">quant_weight</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">bias</span><span class="p">)</span> <span class="k">return</span> <span class="n">output</span> <span class="nd">@property</span> <span class="k">def</span> <span class="nf">input_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span> <span class="nd">@property</span> <span class="k">def</span> <span class="nf">weight_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_weight_quantizer</span> </pre></div> </div> </div> <div class="section" id="directly-quantizing-inputs-in-graph"> <h3>Directly Quantizing Inputs In Graph<a class="headerlink" href="#directly-quantizing-inputs-in-graph" title="Permalink to this headline"></a></h3> <p>It is also possible to directly quantize graph inputs without creating wrappers as explained above.</p> <p>Here’s an example:</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">test_input</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">double</span><span class="p">)</span> <span class="n">quantizer</span> <span class="o">=</span> <span 
class="n">TensorQuantizer</span><span class="p">(</span><span class="n">quant_nn</span><span class="o">.</span><span class="n">QuantLinear</span><span class="o">.</span><span class="n">default_quant_desc_input</span><span class="p">)</span> <span class="n">quant_input</span> <span class="o">=</span> <span class="n">quantizer</span><span class="p">(</span><span class="n">test_input</span><span class="p">)</span> <span class="n">out</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">adaptive_avg_pool2d</span><span class="p">(</span><span class="n">quant_input</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> </pre></div> </div> <p>Assume that there is a <code class="docutils literal notranslate"><span class="pre">F.adaptive_avg_pool2d</span></code> operation in the graph and we’d like to quantize this operation. In the example above, we use <code class="docutils literal notranslate"><span class="pre">TensorQuantizer(quant_nn.QuantLinear.default_quant_desc_input)</span></code> to define a quantizer that we then use to actually quantize the <code class="docutils literal notranslate"><span class="pre">test_input</span></code> and then feed this quantized input to the <code class="docutils literal notranslate"><span class="pre">F.adaptive_avg_pool2d</span></code> operation. 
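To experiment with this pattern without the library installed, the same flow can be sketched with a hypothetical <code class="docutils literal notranslate"><span class="pre">FakeQuantStub</span></code> standing in for <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> (an assumption for illustration only, not the library's implementation):

```python
import torch
import torch.nn.functional as F

class FakeQuantStub(torch.nn.Module):
    """Hypothetical stand-in for TensorQuantizer: per-tensor signed 8-bit
    fake quantization. Not the pytorch_quantization implementation."""
    def forward(self, x):
        bound = 127
        scale = x.abs().max() / bound
        return torch.clamp(torch.round(x / scale), -bound, bound) * scale

quantizer = FakeQuantStub()
test_input = torch.randn(1, 5, 5, 5)
quant_input = quantizer(test_input)          # quantize the graph input directly
out = F.adaptive_avg_pool2d(quant_input, 3)  # feed the quantized tensor onward
```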
Note that this quantizer is the same as the ones we used earlier while creating quantized versions of torch’s modules.</p> </div> </div> </div> <div class="toctree-wrapper compound"> <span id="document-calib"></span><div class="section" id="module-pytorch_quantization.calib"> <span id="pytorch-quantization-calib"></span><h2>pytorch_quantization.calib<a class="headerlink" href="#module-pytorch_quantization.calib" title="Permalink to this headline"></a></h2> <p><code class="docutils literal notranslate"><span class="pre">pytorch_quantization.calib</span></code> provides Calibrator classes that collect data statistics and determine quantization parameters.</p> <div class="section" id="maxcalibrator"> <h3><span class="hidden-section">MaxCalibrator</span><a class="headerlink" href="#maxcalibrator" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.calib.MaxCalibrator"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.calib.</span></span><span class="sig-name descname"><span class="pre">MaxCalibrator</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">num_bits</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">axis</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">unsigned</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">track_amax</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.calib.MaxCalibrator" title="Permalink to this definition"></a></dt> <dd><p>Max calibrator; tracks the maximum value globally.</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul 
class="simple"> <li><p><strong>calib_desc</strong> – A MaxCalibDescriptor.</p></li> <li><p><strong>num_bits</strong> – An integer. Number of bits of quantization.</p></li> <li><p><strong>axis</strong> – A tuple. See QuantDescriptor.</p></li> <li><p><strong>unsigned</strong> – A boolean. If True, use unsigned quantization.</p></li> </ul> </dd> </dl> <dl class="simple"> <dt>Readonly Properties:</dt><dd><p>amaxs: A list of amax values, saved as NumPy arrays since they are likely to be used for plotting.</p> </dd> </dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.calib.MaxCalibrator.collect"> <span class="sig-name descname"><span class="pre">collect</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.calib.MaxCalibrator.collect" title="Permalink to this definition"></a></dt> <dd><p>Tracks the absolute max of all tensors</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><p><strong>x</strong> – A tensor</p> </dd> <dt class="field-even">Raises</dt> <dd class="field-even"><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#RuntimeError" title="(in Python v3.12)"><strong>RuntimeError</strong></a> – If amax shape changes</p> </dd> </dl> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.calib.MaxCalibrator.compute_amax"> <span class="sig-name descname"><span class="pre">compute_amax</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.calib.MaxCalibrator.compute_amax" title="Permalink to this definition"></a></dt> <dd><p>Return the absolute max of all tensors collected</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.calib.MaxCalibrator.reset"> <span class="sig-name 
descname"><span class="pre">reset</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.calib.MaxCalibrator.reset" title="Permalink to this definition"></a></dt> <dd><p>Reset the collected absolute max</p> </dd></dl> </dd></dl> </div> <div class="section" id="histogramcalibrator"> <h3><span class="hidden-section">HistogramCalibrator</span><a class="headerlink" href="#histogramcalibrator" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.calib.HistogramCalibrator"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.calib.</span></span><span class="sig-name descname"><span class="pre">HistogramCalibrator</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">num_bits</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">axis</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">unsigned</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">num_bins</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">2048</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">grow_method</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">skip_zeros</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">torch_hist</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em><span class="sig-paren">)</span><a 
class="headerlink" href="#pytorch_quantization.calib.HistogramCalibrator" title="Permalink to this definition"></a></dt> <dd><p>Unified histogram calibrator</p> <p>The histogram is collected only once; compute_amax() then performs entropy, percentile, or MSE calibration based on its arguments.</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>num_bits</strong> – An integer. Number of bits of quantization.</p></li> <li><p><strong>axis</strong> – A tuple. See QuantDescriptor.</p></li> <li><p><strong>unsigned</strong> – A boolean. If True, use unsigned quantization.</p></li> <li><p><strong>num_bins</strong> – An integer. Number of histogram bins. Default 2048.</p></li> <li><p><strong>grow_method</strong> – A string. DEPRECATED. Default None.</p></li> <li><p><strong>skip_zeros</strong> – A boolean. If True, skips zeros when collecting data for the histogram. Default False.</p></li> <li><p><strong>torch_hist</strong> – A boolean. If True, collect the histogram with torch.histc instead of np.histogram. If the input tensor is on the GPU, histc will also run on the GPU. 
Default True.</p></li> </ul> </dd> </dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.calib.HistogramCalibrator.collect"> <span class="sig-name descname"><span class="pre">collect</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.calib.HistogramCalibrator.collect" title="Permalink to this definition"></a></dt> <dd><p>Collect histogram</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.calib.HistogramCalibrator.compute_amax"> <span class="sig-name descname"><span class="pre">compute_amax</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">method</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://docs.python.org/3/library/stdtypes.html#str" title="(in Python v3.12)"><span class="pre">str</span></a></span></em>, <em class="sig-param"><span class="o"><span class="pre">*</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://docs.python.org/3/library/functions.html#int" title="(in Python v3.12)"><span class="pre">int</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">start_bin</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://docs.python.org/3/library/functions.html#int" title="(in Python v3.12)"><span class="pre">int</span></a></span><span class="w"> 
</span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">128</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">percentile</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://docs.python.org/3/library/functions.html#float" title="(in Python v3.12)"><span class="pre">float</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">99.99</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.calib.HistogramCalibrator.compute_amax" title="Permalink to this definition"></a></dt> <dd><p>Compute the amax from the collected histogram</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><p><strong>method</strong> – A string. One of [‘entropy’, ‘mse’, ‘percentile’]</p> </dd> <dt class="field-even">Keyword Arguments</dt> <dd class="field-even"><ul class="simple"> <li><p><strong>stride</strong> – An integer. Default 1</p></li> <li><p><strong>start_bin</strong> – An integer. Default 128</p></li> <li><p><strong>percentile</strong> – A float in [0, 100]. 
Default 99.99.</p></li> </ul> </dd> <dt class="field-odd">Returns</dt> <dd class="field-odd"><p><em>amax</em> – a tensor</p> </dd> </dl> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.calib.HistogramCalibrator.reset"> <span class="sig-name descname"><span class="pre">reset</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.calib.HistogramCalibrator.reset" title="Permalink to this definition"></a></dt> <dd><p>Reset the collected histogram</p> </dd></dl> </dd></dl> </div> </div> <span id="document-nn"></span><div class="section" id="module-pytorch_quantization.nn"> <span id="pytorch-quantization-nn"></span><h2>pytorch_quantization.nn<a class="headerlink" href="#module-pytorch_quantization.nn" title="Permalink to this headline"></a></h2> <div class="section" id="tensorquantizer"> <h3><span class="hidden-section">TensorQuantizer</span><a class="headerlink" href="#tensorquantizer" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">TensorQuantizer</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">quant_desc=<pytorch_quantization.tensor_quant.ScaledQuantDescriptor</span> <span class="pre">object></span></span></em>, <em class="sig-param"><span class="n"><span class="pre">disabled=False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">if_quant=True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">if_clip=False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">if_calib=False</span></span></em><span 
class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.TensorQuantizer" title="Permalink to this definition"></a></dt> <dd><p>Tensor quantizer module</p> <p>This module uses the tensor_quant or fake_tensor_quant function to quantize a tensor. It also wraps the variables and moving statistics needed when training a quantized network.</p> <dl class="simple"> <dt>Experimental features:</dt><dd><p>The <code class="docutils literal notranslate"><span class="pre">clip</span></code> stage learns the range before quantization is enabled; the <code class="docutils literal notranslate"><span class="pre">calib</span></code> stage runs calibration</p> </dd> </dl> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>quant_desc</strong> – An instance of <a class="reference internal" href="index.html#pytorch_quantization.tensor_quant.QuantDescriptor" title="pytorch_quantization.tensor_quant.QuantDescriptor"><code class="xref py py-func docutils literal notranslate"><span class="pre">QuantDescriptor</span></code></a>.</p></li> <li><p><strong>disabled</strong> – A boolean. If True, bypass the whole module and return the input. Default False.</p></li> <li><p><strong>if_quant</strong> – A boolean. If True, run the main quantization body. Default True.</p></li> <li><p><strong>if_clip</strong> – A boolean. If True, clip before quantization and learn amax. Default False.</p></li> <li><p><strong>if_calib</strong> – A boolean. If True, run calibration. Not implemented yet. 
Settings of calibration will probably go to <a class="reference internal" href="index.html#pytorch_quantization.tensor_quant.QuantDescriptor" title="pytorch_quantization.tensor_quant.QuantDescriptor"><code class="xref py py-func docutils literal notranslate"><span class="pre">QuantDescriptor</span></code></a>.</p></li> </ul> </dd> </dl> <dl class="simple"> <dt>Readonly Properties:</dt><dd><ul class="simple"> <li><p>axis:</p></li> <li><p>fake_quant:</p></li> <li><p>scale:</p></li> <li><p>step_size:</p></li> </ul> </dd> <dt>Mutable Properties:</dt><dd><ul class="simple"> <li><p>num_bits:</p></li> <li><p>unsigned:</p></li> <li><p>amax:</p></li> </ul> </dd> </dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer.__init__"> <span class="sig-name descname"><span class="pre">__init__</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">quant_desc=<pytorch_quantization.tensor_quant.ScaledQuantDescriptor</span> <span class="pre">object></span></span></em>, <em class="sig-param"><span class="n"><span class="pre">disabled=False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">if_quant=True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">if_clip=False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">if_calib=False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.TensorQuantizer.__init__" title="Permalink to this definition"></a></dt> <dd><p>Initialize quantizer and set up required variables</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer.disable"> <span class="sig-name descname"><span class="pre">disable</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.TensorQuantizer.disable" title="Permalink 
to this definition"></a></dt> <dd><p>Bypass the module</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer.disable_clip"> <span class="sig-name descname"><span class="pre">disable_clip</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.TensorQuantizer.disable_clip" title="Permalink to this definition"></a></dt> <dd><p>Disable clip stage</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer.enable_clip"> <span class="sig-name descname"><span class="pre">enable_clip</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.TensorQuantizer.enable_clip" title="Permalink to this definition"></a></dt> <dd><p>Enable clip stage</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer.forward"> <span class="sig-name descname"><span class="pre">forward</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inputs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.TensorQuantizer.forward" title="Permalink to this definition"></a></dt> <dd><p>Apply tensor_quant function to inputs</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><p><strong>inputs</strong> – A Tensor of type float32.</p> </dd> <dt class="field-even">Returns</dt> <dd class="field-even"><p><em>outputs</em> – A Tensor of type output_dtype</p> </dd> </dl> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer.init_learn_amax"> <span class="sig-name descname"><span class="pre">init_learn_amax</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" 
href="#pytorch_quantization.nn.TensorQuantizer.init_learn_amax" title="Permalink to this definition"></a></dt> <dd><p>Initialize learned amax from fixed amax</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer.load_calib_amax"> <span class="sig-name descname"><span class="pre">load_calib_amax</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.TensorQuantizer.load_calib_amax" title="Permalink to this definition"></a></dt> <dd><p>Load amax from calibrator.</p> <p>Updates the amax buffer with the value computed by the calibrator, creating it if necessary. *args and **kwargs are passed directly to compute_amax, except “strict” in kwargs. 
Refer to compute_amax for more details.</p> </dd></dl> </dd></dl> <div class="section" id="quantized-modules"> <h4>Quantized Modules<a class="headerlink" href="#quantized-modules" title="Permalink to this headline"></a></h4> </div> </div> <div class="section" id="quantconvnd"> <h3>_QuantConvNd<a class="headerlink" href="#quantconvnd" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.modules.quant_conv._QuantConvNd"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.modules.quant_conv.</span></span><span class="sig-name descname"><span class="pre">_QuantConvNd</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">in_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">transposed</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_padding</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_mode</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">quant_desc_input</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">quant_desc_weight</span></span></em><span class="sig-paren">)</span><a class="headerlink" 
href="#pytorch_quantization.nn.modules.quant_conv._QuantConvNd" title="Permalink to this definition"></a></dt> <dd><p>Base class of quantized Conv layers, inherited from _ConvNd</p> <p>Descriptions of the original arguments can be found in torch.nn.modules.conv</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>quant_desc_input</strong> – An instance of <a class="reference internal" href="index.html#pytorch_quantization.tensor_quant.QuantDescriptor" title="pytorch_quantization.tensor_quant.QuantDescriptor"><code class="xref py py-class docutils literal notranslate"><span class="pre">QuantDescriptor</span></code></a>. Quantization descriptor of input.</p></li> <li><p><strong>quant_desc_weight</strong> – An instance of <a class="reference internal" href="index.html#pytorch_quantization.tensor_quant.QuantDescriptor" title="pytorch_quantization.tensor_quant.QuantDescriptor"><code class="xref py py-class docutils literal notranslate"><span class="pre">QuantDescriptor</span></code></a>. 
Quantization descriptor of weight.</p></li> </ul> </dd> <dt class="field-even">Raises</dt> <dd class="field-even"><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#ValueError" title="(in Python v3.12)"><strong>ValueError</strong></a> – If unsupported arguments are passed in.</p> </dd> </dl> <dl class="simple"> <dt>Readonly properties:</dt><dd><ul class="simple"> <li><p>input_quantizer:</p></li> <li><p>weight_quantizer:</p></li> </ul> </dd> <dt>Static methods:</dt><dd><ul class="simple"> <li><p>set_default_quant_desc_input: Set default_quant_desc_input</p></li> <li><p>set_default_quant_desc_weight: Set default_quant_desc_weight</p></li> </ul> </dd> </dl> </dd></dl> </div> <div class="section" id="quantconv1d"> <h3><span class="hidden-section">QuantConv1d</span><a class="headerlink" href="#quantconv1d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantConv1d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantConv1d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">in_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'zeros'</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantConv1d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 1D Conv</p> </dd></dl> </div> <div class="section" id="quantconv2d"> <h3><span class="hidden-section">QuantConv2d</span><a class="headerlink" href="#quantconv2d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantConv2d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantConv2d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">in_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span 
class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'zeros'</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantConv2d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 2D Conv</p> </dd></dl> </div> <div class="section" id="quantconv3d"> <h3><span class="hidden-section">QuantConv3d</span><a class="headerlink" href="#quantconv3d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantConv3d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantConv3d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span 
class="pre">in_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'zeros'</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantConv3d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 3D Conv</p> </dd></dl> </div> <div class="section" id="quantconvtranspose1d"> <h3><span class="hidden-section">QuantConvTranspose1d</span><a class="headerlink" href="#quantconvtranspose1d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" 
id="pytorch_quantization.nn.QuantConvTranspose1d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantConvTranspose1d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">in_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'zeros'</span></span></em>, <em 
class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantConvTranspose1d" title="Permalink to this definition"></a></dt> <dd><p>Quantized ConvTranspose1d</p> </dd></dl> </div> <div class="section" id="quantconvtranspose2d"> <h3><span class="hidden-section">QuantConvTranspose2d</span><a class="headerlink" href="#quantconvtranspose2d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantConvTranspose2d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantConvTranspose2d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">in_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em 
class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'zeros'</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantConvTranspose2d" title="Permalink to this definition"></a></dt> <dd><p>Quantized ConvTranspose2d</p> </dd></dl> </div> <div class="section" id="quantconvtranspose3d"> <h3><span class="hidden-section">QuantConvTranspose3d</span><a class="headerlink" href="#quantconvtranspose3d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantConvTranspose3d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantConvTranspose3d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">in_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em 
class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'zeros'</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantConvTranspose3d" title="Permalink to this definition"></a></dt> <dd><p>Quantized ConvTranspose3d</p> </dd></dl> </div> <div class="section" id="quantlinear"> <h3><span class="hidden-section">QuantLinear</span><a class="headerlink" href="#quantlinear" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantLinear"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name 
descname"><span class="pre">QuantLinear</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">in_features</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_features</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantLinear" title="Permalink to this definition"></a></dt> <dd><p>Quantized version of nn.Linear</p> <p>Applies a quantized linear transformation to the incoming data, y = dequant(quant(x)quant(A)^T + b).</p> <p>The Module name is kept as “Linear” instead of “QuantLinear” so that the module can easily be dropped into a preexisting model and load pretrained weights. An alias “QuantLinear” is defined below. The base code is a copy of nn.Linear; see the detailed comments on the original arguments there.</p> <p>Quantization descriptors are passed in kwargs. If not present, default_quant_desc_input and default_quant_desc_weight are used.</p> <dl class="field-list simple"> <dt class="field-odd">Keyword Arguments</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>quant_desc_input</strong> – An instance of <a class="reference internal" href="index.html#pytorch_quantization.tensor_quant.QuantDescriptor" title="pytorch_quantization.tensor_quant.QuantDescriptor"><code class="xref py py-class docutils literal notranslate"><span class="pre">QuantDescriptor</span></code></a>. 
Quantization descriptor of input.</p></li> <li><p><strong>quant_desc_weight</strong> – An instance of <a class="reference internal" href="index.html#pytorch_quantization.tensor_quant.QuantDescriptor" title="pytorch_quantization.tensor_quant.QuantDescriptor"><code class="xref py py-class docutils literal notranslate"><span class="pre">QuantDescriptor</span></code></a>. Quantization descriptor of weight.</p></li> </ul> </dd> <dt class="field-even">Raises</dt> <dd class="field-even"><ul class="simple"> <li><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#ValueError" title="(in Python v3.12)"><strong>ValueError</strong></a> – If unsupported arguments are passed in.</p></li> <li><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#KeyError" title="(in Python v3.12)"><strong>KeyError</strong></a> – If unsupported kwargs are passed in.</p></li> </ul> </dd> </dl> <dl class="simple"> <dt>Readonly properties:</dt><dd><ul class="simple"> <li><p>input_quantizer:</p></li> <li><p>weight_quantizer:</p></li> </ul> </dd> <dt>Static methods:</dt><dd><ul class="simple"> <li><p>set_default_quant_desc_input: Set default_quant_desc_input</p></li> <li><p>set_default_quant_desc_weight: Set default_quant_desc_weight</p></li> </ul> </dd> </dl> </dd></dl> </div> <div class="section" id="quantmaxpool1d"> <h3><span class="hidden-section">QuantMaxPool1d</span><a class="headerlink" href="#quantmaxpool1d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantMaxPool1d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantMaxPool1d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em 
class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">return_indices</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ceil_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantMaxPool1d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 1D maxpool</p> </dd></dl> </div> <div class="section" id="quantmaxpool2d"> <h3><span class="hidden-section">QuantMaxPool2d</span><a class="headerlink" href="#quantmaxpool2d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantMaxPool2d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantMaxPool2d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span 
class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">return_indices</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ceil_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantMaxPool2d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 2D maxpool</p> </dd></dl> </div> <div class="section" id="quantmaxpool3d"> <h3><span class="hidden-section">QuantMaxPool3d</span><a class="headerlink" href="#quantmaxpool3d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantMaxPool3d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantMaxPool3d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">return_indices</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ceil_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantMaxPool3d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 3D maxpool</p> </dd></dl> </div> <div class="section" id="quantavgpool1d"> <h3><span class="hidden-section">QuantAvgPool1d</span><a class="headerlink" href="#quantavgpool1d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantAvgPool1d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantAvgPool1d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ceil_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">count_include_pad</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantAvgPool1d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 1D average pool</p> </dd></dl> </div> <div class="section" id="quantavgpool2d"> <h3><span class="hidden-section">QuantAvgPool2d</span><a class="headerlink" href="#quantavgpool2d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantAvgPool2d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantAvgPool2d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ceil_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">count_include_pad</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">divisor_override</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantAvgPool2d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 2D average pool</p> </dd></dl> </div> <div class="section" id="quantavgpool3d"> <h3><span class="hidden-section">QuantAvgPool3d</span><a class="headerlink" href="#quantavgpool3d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantAvgPool3d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantAvgPool3d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ceil_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">count_include_pad</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">divisor_override</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantAvgPool3d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 3D average pool</p> </dd></dl> </div> <div class="section" id="quantadaptiveavgpool1d"> <h3><span class="hidden-section">QuantAdaptiveAvgPool1d</span><a class="headerlink" href="#quantadaptiveavgpool1d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantAdaptiveAvgPool1d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantAdaptiveAvgPool1d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">output_size</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" 
href="#pytorch_quantization.nn.QuantAdaptiveAvgPool1d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 1D adaptive average pool</p> </dd></dl> </div> <div class="section" id="quantadaptiveavgpool2d"> <h3><span class="hidden-section">QuantAdaptiveAvgPool2d</span><a class="headerlink" href="#quantadaptiveavgpool2d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantAdaptiveAvgPool2d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantAdaptiveAvgPool2d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">output_size</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantAdaptiveAvgPool2d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 2D adaptive average pool</p> </dd></dl> </div> <div class="section" id="quantadaptiveavgpool3d"> <h3><span class="hidden-section">QuantAdaptiveAvgPool3d</span><a class="headerlink" href="#quantadaptiveavgpool3d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantAdaptiveAvgPool3d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantAdaptiveAvgPool3d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">output_size</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span 
class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantAdaptiveAvgPool3d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 3D adaptive average pool</p> </dd></dl> </div> <div class="section" id="clip"> <h3><span class="hidden-section">Clip</span><a class="headerlink" href="#clip" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.Clip"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">Clip</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">clip_value_min</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">clip_value_max</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">learn_min</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">learn_max</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.Clip" title="Permalink to this definition"></a></dt> <dd><p>Clip tensor</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>clip_value_min</strong> – A number or tensor of lower bound to clip</p></li> <li><p><strong>clip_value_max</strong> – A number or tensor of upper bound to clip</p></li> <li><p><strong>learn_min</strong> – A boolean. If True, learn min. clip_value_min will be used to initialize. 
Default False</p></li> <li><p><strong>learn_max</strong> – A boolean. Similar to learn_min but for max.</p></li> </ul> </dd> <dt class="field-even">Raises</dt> <dd class="field-even"><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#ValueError" title="(in Python v3.12)"><strong>ValueError</strong></a> – </p> </dd> </dl> </dd></dl> </div> <div class="section" id="quantlstm"> <h3><span class="hidden-section">QuantLSTM</span><a class="headerlink" href="#quantlstm" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantLSTM"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantLSTM</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantLSTM" title="Permalink to this definition"></a></dt> <dd><p>Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.</p> </dd></dl> </div> <div class="section" id="quantlstmcell"> <h3><span class="hidden-section">QuantLSTMCell</span><a class="headerlink" href="#quantlstmcell" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantLSTMCell"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantLSTMCell</span></span><span 
class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">input_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hidden_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantLSTMCell" title="Permalink to this definition"></a></dt> <dd><p>A long short-term memory (LSTM) cell.</p> </dd></dl> </div> </div> <span id="document-functional"></span><div class="section" id="module-pytorch_quantization.nn.functional"> <span id="pytorch-quantization-nn-functional"></span><h2>pytorch_quantization.nn.functional<a class="headerlink" href="#module-pytorch_quantization.nn.functional" title="Permalink to this headline"></a></h2> <p>Some supporting functions</p> <div class="section" id="clipfunction"> <h3><span class="hidden-section">ClipFunction</span><a class="headerlink" href="#clipfunction" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.functional.ClipFunction"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.functional.</span></span><span class="sig-name descname"><span class="pre">ClipFunction</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" 
href="#pytorch_quantization.nn.functional.ClipFunction" title="Permalink to this definition"></a></dt> <dd><p>A universal tensor clip function</p> <p>PyTorch’s clamp() only supports a scalar range and doesn’t support broadcast. This implementation uses min/max, which is more general. The gradient is defined according to IBM’s PACT paper <a class="reference external" href="https://arxiv.org/abs/1805.06085">https://arxiv.org/abs/1805.06085</a>, which is also the behavior of TensorFlow’s clip_by_value()</p> </dd></dl> <p><code class="docutils literal notranslate"><span class="pre">clip</span></code> is an alias of <code class="docutils literal notranslate"><span class="pre">ClipFunction.apply</span></code></p> </div> </div> <span id="document-optim"></span><div class="section" id="module-pytorch_quantization.optim.helper"> <span id="pytorch-quantization-optim-helper"></span><h2>pytorch_quantization.optim.helper<a class="headerlink" href="#module-pytorch_quantization.optim.helper" title="Permalink to this headline"></a></h2> <p>Helper functions for quant optimizer/trainer</p> <dl class="py function"> <dt class="sig sig-object py" id="pytorch_quantization.optim.helper.freeze_parameters"> <span class="sig-prename descclassname"><span class="pre">pytorch_quantization.optim.helper.</span></span><span class="sig-name descname"><span class="pre">freeze_parameters</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">model</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">patterns</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.optim.helper.freeze_parameters" title="Permalink to this definition"></a></dt> <dd><p>Set requires_grad to False if any pattern matches the parameter name</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>model</strong> – A Module</p></li> <li><p><strong>patterns</strong> – 
A list of strings that will be used to match parameter names. If a parameter name contains any pattern, it will be frozen.</p></li> </ul> </dd> </dl> </dd></dl> <dl class="py function"> <dt class="sig sig-object py" id="pytorch_quantization.optim.helper.group_parameters"> <span class="sig-prename descclassname"><span class="pre">pytorch_quantization.optim.helper.</span></span><span class="sig-name descname"><span class="pre">group_parameters</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">model</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">patterns_list</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">lrs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">momentums</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">weight_decays</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.optim.helper.group_parameters" title="Permalink to this definition"></a></dt> <dd><p>Group parameters for using the per-parameter options of an optimizer</p> <p>Returns a list of dicts in the format PyTorch optimizers expect; see <a class="reference external" href="https://pytorch.org/docs/stable/optim.html#per-parameter-options">https://pytorch.org/docs/stable/optim.html#per-parameter-options</a> for more details.</p> <div class="admonition-example admonition"> <p class="admonition-title">Example</p> <div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="p">[</span> <span class="gp">>>> 
</span> <span class="p">{</span><span class="s1">'params'</span><span class="p">:</span> <span class="n">model</span><span class="o">.</span><span class="n">base</span><span class="o">.</span><span class="n">parameters</span><span class="p">()},</span> <span class="gp">>>> </span> <span class="p">{</span><span class="s1">'params'</span><span class="p">:</span> <span class="n">model</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">parameters</span><span class="p">(),</span> <span class="s1">'lr'</span><span class="p">:</span> <span class="mf">1e-3</span><span class="p">}</span> <span class="gp">>>> </span><span class="p">]</span> </pre></div> </div> </div> <p>Parameters will be grouped w.r.t. the first level of <cite>patterns_list</cite>, e.g. <cite>patterns_list=[[‘conv1’, ‘conv2’], [‘conv3’]]</cite> will return 2 groups, one with <cite>conv1</cite> and <cite>conv2</cite> in the name, and the other with <cite>conv3</cite> in the name.</p> <p>If lrs, momentums or weight_decays are supplied, they will be added to the groups as well.</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>model</strong> – A module</p></li> <li><p><strong>patterns_list</strong> – A list of lists of strings. 
WARNING: patterns must be EXCLUSIVE; the function doesn’t perform an exclusivity check.</p></li> <li><p><strong>lrs</strong> – A list of floats with the same length as keys_list, or None.</p></li> <li><p><strong>momentums</strong> – A list of floats with the same length as keys_list, or None.</p></li> <li><p><strong>weight_decays</strong> – A list of floats with the same length as keys_list, or None.</p></li> </ul> </dd> <dt class="field-even">Returns</dt> <dd class="field-even"><p><em>param_group</em> – A list of dicts</p> </dd> </dl> </dd></dl> <dl class="py function"> <dt class="sig sig-object py" id="pytorch_quantization.optim.helper.match_parameters"> <span class="sig-prename descclassname"><span class="pre">pytorch_quantization.optim.helper.</span></span><span class="sig-name descname"><span class="pre">match_parameters</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">model</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">patterns</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.optim.helper.match_parameters" title="Permalink to this definition"></a></dt> <dd><p>Returns a generator over module parameters whose names match a key</p> <p>It is useful to group parameters and apply different functions to different groups. This function provides an easy way to group them.</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>model</strong> – A Module</p></li> <li><p><strong>patterns</strong> – A list of strings that will be used to match parameter names. 
If a parameter name contains any pattern, it will be yielded</p></li> </ul> </dd> <dt class="field-even">Yields</dt> <dd class="field-even"><p><em>param</em> – Module parameters</p> </dd> </dl> </dd></dl> <dl class="py function"> <dt class="sig sig-object py" id="pytorch_quantization.optim.helper.quant_weight_inplace"> <span class="sig-prename descclassname"><span class="pre">pytorch_quantization.optim.helper.</span></span><span class="sig-name descname"><span class="pre">quant_weight_inplace</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">model</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.optim.helper.quant_weight_inplace" title="Permalink to this definition"></a></dt> <dd><p>Make quantization inplace</p> <p>Searches for quantized modules, including QuantConvNd and QuantLinear, and makes weight quantization in place using weight_quantizer.</p> <p>Most publications on quantization aware training use STE by default, which is really an approximation of the derivative of the non-differentiable quantization function; it works to some extent, but is by no means the F=ma of the problem. 
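As a plain-Python sketch of what in-place fake quantization with max calibration amounts to (illustrative only; the library itself operates on torch tensors through each quantized module's weight_quantizer):

```python
# Hedged sketch of in-place fake quantization with "max" calibration.
# Pure-Python stand-in, NOT the library implementation, which works on
# torch tensors via each quantized module's weight_quantizer.
def fake_quant_inplace(weights, num_bits=8):
    amax = max(abs(w) for w in weights)        # "max" calibration
    scale = (2 ** (num_bits - 1) - 1) / amax   # 127 / amax for 8 bits
    for i, w in enumerate(weights):
        weights[i] = round(w * scale) / scale  # quantize, then dequantize
    return weights

w = [0.1, -0.75, 0.5]
fake_quant_inplace(w)  # w now holds the fake-quantized values
```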
Inplace quantization can be used to implement relax-and-round, which is a common method in discrete optimization and integer programming.</p> </dd></dl> </div> <span id="document-tensor_quant"></span><div class="section" id="module-pytorch_quantization.tensor_quant"> <span id="pytorch-quantization-tensor-quant"></span><h2>pytorch_quantization.tensor_quant<a class="headerlink" href="#module-pytorch_quantization.tensor_quant" title="Permalink to this headline"></a></h2> <p>Basic tensor quantization functions</p> <div class="section" id="quantdescriptor"> <h3><span class="hidden-section">QuantDescriptor</span><a class="headerlink" href="#quantdescriptor" title="Permalink to this headline"></a></h3> <dl class="py attribute"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.QuantDescriptor"> <span class="sig-prename descclassname"><span class="pre">pytorch_quantization.tensor_quant.</span></span><span class="sig-name descname"><span class="pre">QuantDescriptor</span></span><a class="headerlink" href="#pytorch_quantization.tensor_quant.QuantDescriptor" title="Permalink to this definition"></a></dt> <dd><p>alias of <a class="reference internal" href="#pytorch_quantization.tensor_quant.ScaledQuantDescriptor" title="pytorch_quantization.tensor_quant.ScaledQuantDescriptor"><code class="xref py py-class docutils literal notranslate"><span class="pre">pytorch_quantization.tensor_quant.ScaledQuantDescriptor</span></code></a></p> </dd></dl> </div> <div class="section" id="scaledquantdescriptor"> <h3><span class="hidden-section">ScaledQuantDescriptor</span><a class="headerlink" href="#scaledquantdescriptor" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.ScaledQuantDescriptor"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.tensor_quant.</span></span><span 
class="sig-name descname"><span class="pre">ScaledQuantDescriptor</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">num_bits</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">8</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.tensor_quant.ScaledQuantDescriptor" title="Permalink to this definition"></a></dt> <dd><p>Supportive descriptor of quantization</p> <p>Describes how a tensor should be quantized. A QuantDescriptor and a tensor together define a quantized tensor.</p> <dl class="field-list"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul> <li><p><strong>num_bits</strong> – An integer or a tuple of two integers. Specifically, <cite>num_bits</cite> can be:</p> <ol class="arabic simple"> <li><dl class="simple"> <dt>A positive integer argument for integer quantization. <cite>num_bits</cite> specifies</dt><dd><p>the number of bits used for integer quantization.</p> </dd> </dl> </li> <li><dl class="simple"> <dt>A constant integer tuple (4,3) for E4M3 floating point quantization emulating</dt><dd><p>Nvidia’s FP8 quantization. E4M3 quantization only supports per-tensor quantization.</p> </dd> </dl> </li> </ol> <p>Default: 8.</p> </li> <li><p><strong>name</strong> – Seems a nice thing to have</p></li> </ul> </dd> <dt class="field-even">Keyword Arguments</dt> <dd class="field-even"><ul class="simple"> <li><p><strong>fake_quant</strong> – A boolean. If True, use fake quantization mode. Default True.</p></li> <li><p><strong>axis</strong> – None, int or tuple of ints. 
Axes which will have their own max for computing the scaling factor. If None (the default), use a per-tensor scale. Must be in the range [-rank(input_tensor), rank(input_tensor)). E.g., for a KCRS weight tensor, quant_axis=(0) will yield per-channel scaling. Default None.</p></li> <li><p><strong>amax</strong> – A float or list/ndarray of floats of user-specified absolute max range. If supplied, ignore quant_axis and use this to quantize. If learn_amax is True, will be used to initialize the learnable amax. Default None.</p></li> <li><p><strong>learn_amax</strong> – A boolean. If True, learn amax. Default False.</p></li> <li><p><strong>scale_amax</strong> – A float. If supplied, multiply amax by scale_amax. Default None. It is useful for quick experiments.</p></li> <li><p><strong>calib_method</strong> – A string. One of [“max”, “histogram”], indicating which calibration to use. Except for the simple max calibration, the other methods are all histogram based. Default “max”.</p></li> <li><p><strong>unsigned</strong> – A Boolean. If True, use unsigned. 
Default False.</p></li> </ul> </dd> <dt class="field-odd">Raises</dt> <dd class="field-odd"><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#TypeError" title="(in Python v3.12)"><strong>TypeError</strong></a> – If an unsupported type is passed in.</p> </dd> </dl> <dl class="simple"> <dt>Read-only properties:</dt><dd><ul class="simple"> <li><p>fake_quant:</p></li> <li><p>name:</p></li> <li><p>learn_amax:</p></li> <li><p>scale_amax:</p></li> <li><p>axis:</p></li> <li><p>calib_method:</p></li> <li><p>num_bits:</p></li> <li><p>amax:</p></li> <li><p>unsigned:</p></li> </ul> </dd> </dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.ScaledQuantDescriptor.dict"> <span class="sig-name descname"><span class="pre">dict</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.tensor_quant.ScaledQuantDescriptor.dict" title="Permalink to this definition"></a></dt> <dd><p>Serialize to dict</p> <p>The built-in __dict__ attribute returns all the attributes, including those that have default values and those with the protected prefix “_”. This method only returns attributes whose values differ from the default and that don’t have “_” in the key. 
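The filtering rule can be sketched in plain Python (a hypothetical stand-in, not the library source; DEFAULTS here is an assumed subset of the descriptor's default values, for illustration):

```python
# Hypothetical sketch of the documented filtering: keep only attributes that
# differ from their defaults and whose keys don't carry the protected "_" prefix.
# DEFAULTS is an assumed subset of the descriptor's defaults (illustrative).
DEFAULTS = {"num_bits": 8, "fake_quant": True, "axis": None, "unsigned": False}

def to_dict(attrs):
    return {k: v for k, v in attrs.items()
            if not k.startswith("_") and DEFAULTS.get(k, object()) != v}

to_dict({"num_bits": 8, "axis": (0,), "_scale": 1.0})  # {'axis': (0,)}
```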
Constructing an instance from the dict returned by this method should yield exactly the same instance.</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.ScaledQuantDescriptor.from_yaml"> <em class="property"><span class="pre">classmethod</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">from_yaml</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">yaml_str</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.tensor_quant.ScaledQuantDescriptor.from_yaml" title="Permalink to this definition"></a></dt> <dd><p>Create a descriptor from a yaml string</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.ScaledQuantDescriptor.to_yaml"> <span class="sig-name descname"><span class="pre">to_yaml</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.tensor_quant.ScaledQuantDescriptor.to_yaml" title="Permalink to this definition"></a></dt> <dd><p>Create a yaml serialization. Some attributes need special treatment to have a human readable form, including amax and axis.</p> </dd></dl> </dd></dl> </div> <div class="section" id="tensorquantfunction"> <h3><span class="hidden-section">TensorQuantFunction</span><a class="headerlink" href="#tensorquantfunction" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.TensorQuantFunction"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.tensor_quant.</span></span><span class="sig-name descname"><span class="pre">TensorQuantFunction</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span 
class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.tensor_quant.TensorQuantFunction" title="Permalink to this definition"></a></dt> <dd><p>A universal tensor quantization function</p> <p>Takes an input tensor and outputs a quantized tensor. The granularity of the scale can be interpreted from the shape of amax. output_dtype indicates whether the quantized value will be stored as integer or float. The reason we want to store it in float is that the PyTorch function consuming the quantized value may not accept integer input, e.g. Conv2D.</p> <p>It uses 2^num_bits - 1 values instead of 2^num_bits, e.g., for num_bits=8, it uses [-127, 127] instead of [-128, 127].</p> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.TensorQuantFunction.backward"> <em class="property"><span class="pre">static</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">backward</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">ctx</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">grad_outputs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">grad_scale</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.tensor_quant.TensorQuantFunction.backward" title="Permalink to this definition"></a></dt> <dd><p>Implements straight-through estimation with clipping. 
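The clipped pass-through rule can be sketched in plain Python (illustrative only; the real backward operates on torch tensors inside autograd): the gradient is passed where the input lies inside [-amax, amax] and zeroed outside.

```python
# Hedged sketch of clipped straight-through estimation (STE), not the
# library's autograd code: pass the gradient where |input| <= amax, else zero.
def ste_backward(inputs, grad_outputs, amax):
    return [g if -amax <= x <= amax else 0.0
            for x, g in zip(inputs, grad_outputs)]

ste_backward([0.5, 2.0, -3.0], [1.0, 1.0, 1.0], amax=1.0)  # [1.0, 0.0, 0.0]
```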
For -amax <= input <= amax the gradient passes straight through, otherwise the gradient is zero.</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>ctx</strong> – A Context object with saved tensors from forward.</p></li> <li><p><strong>grad_outputs</strong> – A tensor of gradients of the outputs.</p></li> <li><p><strong>grad_scale</strong> – A tensor of gradients of the scale.</p></li> </ul> </dd> <dt class="field-even">Returns</dt> <dd class="field-even"><p><em>grad_inputs</em> – A tensor of gradients.</p> </dd> </dl> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.TensorQuantFunction.forward"> <em class="property"><span class="pre">static</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">forward</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">ctx</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">inputs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">amax</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">num_bits</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">8</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">unsigned</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">narrow_range</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.tensor_quant.TensorQuantFunction.forward" title="Permalink to this definition"></a></dt> <dd><p>Following the TensorFlow convention, the max value is passed in and used to decide 
the scale, instead of inputting the scale directly, though inputting the scale directly may be more natural to use.</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>ctx</strong> – A Context object to store tensors for backward.</p></li> <li><p><strong>inputs</strong> – A Tensor of type float32.</p></li> <li><p><strong>amax</strong> – A Tensor of type float32. Inputs will be quantized within the range [-amax, amax]; amax will be broadcast to the inputs tensor.</p></li> <li><p><strong>num_bits</strong> – An integer used to calculate the scaling factor, scale = (2^(num_bits-1) - 1) / max. Effectively, it indicates how many integer bits are used to represent the value. Default 8.</p></li> <li><p><strong>output_dtype</strong> – A type of Tensor. torch.int32 or torch.float32.</p></li> <li><p><strong>unsigned</strong> – A boolean. Use unsigned integer range, e.g. [0, 255] for num_bits=8. Default False.</p></li> <li><p><strong>narrow_range</strong> – A boolean. Use symmetric integer range for signed quantization, e.g. [-127,127] instead of [-128,127] for num_bits=8. Default True.</p></li> </ul> </dd> <dt class="field-even">Returns</dt> <dd class="field-even"><p><em>outputs</em> – A Tensor of type output_dtype. scale: A Tensor of type float32. 
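The documented arithmetic can be sketched for a scalar in plain Python (an illustration under the narrow-range convention, not the library's tensor implementation):

```python
# Hedged scalar sketch of the forward arithmetic with narrow_range=True:
# scale = (2^(num_bits-1) - 1) / amax, input clipped to [-amax, amax],
# then rounded to the nearest integer level.
def quantize(x, amax, num_bits=8):
    bound = 2 ** (num_bits - 1) - 1        # 127 for num_bits=8
    scale = bound / amax
    clipped = max(-amax, min(amax, x))
    return round(clipped * scale), scale   # outputs / scale dequantizes

quantize(0.25, amax=1.0)  # (32, 127.0)
```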
outputs / scale will dequantize the outputs tensor.</p> </dd> <dt class="field-odd">Raises</dt> <dd class="field-odd"><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#ValueError" title="(in Python v3.12)"><strong>ValueError</strong></a> – </p> </dd> </dl> </dd></dl> </dd></dl> <p><code class="docutils literal notranslate"><span class="pre">tensor_quant</span></code> is an alias of <code class="docutils literal notranslate"><span class="pre">TensorQuantFunction.apply</span></code></p> <p><code class="docutils literal notranslate"><span class="pre">fake_tensor_quant</span></code> is an alias of <code class="docutils literal notranslate"><span class="pre">FakeTensorQuantFunction.apply</span></code></p> </div> </div> <span id="document-utils"></span><div class="section" id="pytorch-quantization-utils"> <h2>pytorch_quantization.utils<a class="headerlink" href="#pytorch-quantization-utils" title="Permalink to this headline"></a></h2> <div class="section" id="module-pytorch_quantization.utils.quant_logging"> <span id="pytorch-quantization-utils-quant-logging"></span><h3>pytorch_quantization.utils.quant_logging<a class="headerlink" href="#module-pytorch_quantization.utils.quant_logging" title="Permalink to this headline"></a></h3> <p>A WAR for code that messes up the logging format</p> <dl class="py function"> <dt class="sig sig-object py" id="pytorch_quantization.utils.quant_logging.reset_logger_handler"> <span class="sig-prename descclassname"><span class="pre">pytorch_quantization.utils.quant_logging.</span></span><span class="sig-name descname"><span class="pre">reset_logger_handler</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.utils.quant_logging.reset_logger_handler" title="Permalink to this definition"></a></dt> <dd><p>Remove all handlers in the root logger</p> </dd></dl> </div> <div class="section" id="module-pytorch_quantization.utils.reduce_amax"> <span 
id="pytorch-quantization-utils-reduce-amax"></span><h3>pytorch_quantization.utils.reduce_amax<a class="headerlink" href="#module-pytorch_quantization.utils.reduce_amax" title="Permalink to this headline"></a></h3> <p>Function to get the absolute maximum of a tensor. Follows the NumPy fashion, which is more generic than PyTorch’s.</p> <dl class="py function"> <dt class="sig sig-object py" id="pytorch_quantization.utils.reduce_amax.reduce_amax"> <span class="sig-prename descclassname"><span class="pre">pytorch_quantization.utils.reduce_amax.</span></span><span class="sig-name descname"><span class="pre">reduce_amax</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">input</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">axis</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">keepdims</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.utils.reduce_amax.reduce_amax" title="Permalink to this definition"></a></dt> <dd><p>Compute the absolute maximum value of a tensor.</p> <p>Reduces input_tensor along the dimensions given in axis. Unless keepdims is true, the rank of the tensor is reduced by 1 for each entry in axis. If keepdims is true, the reduced dimensions are retained with length 1.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Gradient computation is disabled, as this function is never meant for learning the reduced amax.</p> </div> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>input</strong> – Input tensor</p></li> <li><p><strong>axis</strong> – The dimensions to reduce. None or int or tuple of ints. 
If None (the default), reduces all dimensions. Must be in the range [-rank(input_tensor), rank(input_tensor)).</p></li> <li><p><strong>keepdims</strong> – A boolean. If true, retains reduced dimensions with length 1. Default True.</p></li> <li><p><strong>granularity</strong> – DEPRECATED. Specifies whether the statistic has to be calculated at tensor or channel granularity.</p></li> </ul> </dd> <dt class="field-even">Returns</dt> <dd class="field-even"><p>The reduced tensor.</p> </dd> <dt class="field-odd">Raises</dt> <dd class="field-odd"><ul class="simple"> <li><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#ValueError" title="(in Python v3.12)"><strong>ValueError</strong></a> – If any axis doesn’t make sense or is not supported.</p></li> <li><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#ValueError" title="(in Python v3.12)"><strong>ValueError</strong></a> – If an unknown granularity is passed in.</p></li> </ul> </dd> </dl> </dd></dl> </div> </div> </div> </div> <div class="section" id="indices"> <h1>Indices<a class="headerlink" href="#indices" title="Permalink to this headline"></a></h1> <ul class="simple"> <li><p><a class="reference internal" href="genindex.html"><span class="std std-ref">Index</span></a></p></li> </ul> </div> </div> </div> <footer> <hr/> <div role="contentinfo"> <div class="footer"> <p> Copyright © 2024 NVIDIA Corporation </p> <p> <a class="Link" href="https://www.nvidia.com/en-us/about-nvidia/privacy-policy/" target="_blank" rel="noopener" data-cms-ai="0">Privacy Policy</a> | <a class="Link" href="https://www.nvidia.com/en-us/about-nvidia/privacy-center/" target="_blank" rel="noopener" data-cms-ai="0">Manage My Privacy</a> | <a class="Link" href="https://www.nvidia.com/en-us/preferences/start/" target="_blank" rel="noopener" data-cms-ai="0">Do Not Sell or Share My Data</a> | <a class="Link" 
href="https://www.nvidia.com/en-us/about-nvidia/terms-of-service/" target="_blank" rel="noopener" data-cms-ai="0">Terms of Service</a> | <a class="Link" href="https://www.nvidia.com/en-us/about-nvidia/accessibility/" target="_blank" rel="noopener" data-cms-ai="0">Accessibility</a> | <a class="Link" href="https://www.nvidia.com/en-us/about-nvidia/company-policies/" target="_blank" rel="noopener" data-cms-ai="0">Corporate Policies</a> | <a class="Link" href="https://www.nvidia.com/en-us/product-security/" target="_blank" rel="noopener" data-cms-ai="0">Product Security</a> | <a class="Link" href="https://www.nvidia.com/en-us/contact/" target="_blank" rel="noopener" data-cms-ai="0">Contact</a> </p> </div> </div> </footer> </div> </div> </section> </div> <script> jQuery(function () { SphinxRtdTheme.Navigation.enable(true); }); </script> <script type="text/javascript">_satellite.pageBottom();</script> </body> </html>