<!DOCTYPE html> <html class="writer-html5" lang="en" > <head> <meta charset="utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <title>pytorch-quantization master documentation</title> <link rel="stylesheet" href="_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="_static/css/theme.css" type="text/css" /> <!--[if lt IE 9]> <script src="_static/js/html5shiv.min.js"></script> <![endif]--> <script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script> <script src="_static/jquery.js"></script> <script src="_static/underscore.js"></script> <script src="_static/doctools.js"></script> <script src="_static/js/theme.js"></script> <link rel="index" title="Index" href="genindex.html" /> <link rel="search" title="Search" href="search.html" /> <script src="//assets.adobedtm.com/5d4962a43b79/c1061d2c5e7b/launch-191c2462b890.min.js"></script> </head> <body class="wy-body-for-nav"> <div class="wy-grid-for-nav"> <nav data-toggle="wy-nav-shift" class="wy-nav-side"> <div class="wy-side-scroll"> <div class="wy-side-nav-search" > <a href="#" class="icon icon-home"> pytorch-quantization </a> <div class="version"> 2.2.1 </div> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> <p class="caption" role="heading"><span class="caption-text">User Guide</span></p> <ul> <li class="toctree-l1"><a class="reference internal" href="index.html#document-userguide">Basic Functionalities</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#post-training-quantization">Post training quantization</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#quantization-aware-training">Quantization Aware Training</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#export-to-onnx">Export to ONNX</a></li> </ul> <p class="caption" role="heading"><span class="caption-text">Tutorials</span></p> 
<ul> <li class="toctree-l1"><a class="reference internal" href="index.html#document-tutorials/quant_resnet50">Quantizing Resnet50</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#document-tutorials/creating_custom_quantized_modules">Creating Custom Quantized Modules</a></li> </ul> <p class="caption" role="heading"><span class="caption-text">Package Reference</span></p> <ul> <li class="toctree-l1"><a class="reference internal" href="index.html#document-calib">pytorch_quantization.calib</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#document-nn">pytorch_quantization.nn</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#document-functional">pytorch_quantization.nn.functional</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#document-optim">pytorch_quantization.optim.helper</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#document-tensor_quant">pytorch_quantization.tensor_quant</a></li> <li class="toctree-l1"><a class="reference internal" href="index.html#document-utils">pytorch_quantization.utils</a></li> </ul> </div> </div> </nav> <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <i data-toggle="wy-nav-top" class="fa fa-bars"></i> <a href="#">pytorch-quantization</a> </nav> <div class="wy-nav-content"> <div class="rst-content"> <div role="navigation" aria-label="Page navigation"> <ul class="wy-breadcrumbs"> <li><a href="#" class="icon icon-home"></a> »</li> <li>pytorch-quantization master documentation</li> <li class="wy-breadcrumbs-aside"> </li> </ul> <hr/> </div> <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article"> <div itemprop="articleBody"> <div class="section" id="pytorch-quantization-s-documentation"> <h1>pytorch-quantization’s documentation<a class="headerlink" 
href="#pytorch-quantization-s-documentation" title="Permalink to this headline"></a></h1> <div class="toctree-wrapper compound"> <span id="document-userguide"></span><div class="section" id="basic-functionalities"> <h2>Basic Functionalities<a class="headerlink" href="#basic-functionalities" title="Permalink to this headline"></a></h2> <div class="section" id="quantization-function"> <h3>Quantization function<a class="headerlink" href="#quantization-function" title="Permalink to this headline"></a></h3> <p><code class="docutils literal notranslate"><span class="pre">tensor_quant</span></code> and <code class="docutils literal notranslate"><span class="pre">fake_tensor_quant</span></code> are the two basic functions for quantizing a tensor. <code class="docutils literal notranslate"><span class="pre">fake_tensor_quant</span></code> returns a fake-quantized tensor (float values). <code class="docutils literal notranslate"><span class="pre">tensor_quant</span></code> returns a quantized tensor (integer values) together with the scale.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">tensor_quant</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">amax</span><span class="p">,</span> <span class="n">num_bits</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">output_dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">float</span><span class="p">,</span> <span class="n">unsigned</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="n">fake_tensor_quant</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">amax</span><span class="p">,</span> <span class="n">num_bits</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">output_dtype</span><span class="o">=</span><span
class="n">torch</span><span class="o">.</span><span class="n">float</span><span class="p">,</span> <span class="n">unsigned</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> </pre></div> </div> <p>Example:</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">tensor_quant</span> <span class="c1"># Generate random input. With fixed seed 12345, x should be</span> <span class="c1"># tensor([0.9817, 0.8796, 0.9921, 0.4611, 0.0832, 0.1784, 0.3674, 0.5676, 0.3376, 0.2119])</span> <span class="n">torch</span><span class="o">.</span><span class="n">manual_seed</span><span class="p">(</span><span class="mi">12345</span><span class="p">)</span> <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span> <span class="c1"># fake quantize tensor x. fake_quant_x will be</span> <span class="c1"># tensor([0.9843, 0.8828, 0.9921, 0.4609, 0.0859, 0.1797, 0.3672, 0.5703, 0.3359, 0.2109])</span> <span class="n">fake_quant_x</span> <span class="o">=</span> <span class="n">tensor_quant</span><span class="o">.</span><span class="n">fake_tensor_quant</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x</span><span class="o">.</span><span class="n">abs</span><span class="p">()</span><span class="o">.</span><span class="n">max</span><span class="p">())</span> <span class="c1"># quantize tensor x. 
quant_x will be</span> <span class="c1"># tensor([126., 113., 127., 59., 11., 23., 47., 73., 43., 27.])</span> <span class="c1"># with scale=128.0057</span> <span class="n">quant_x</span><span class="p">,</span> <span class="n">scale</span> <span class="o">=</span> <span class="n">tensor_quant</span><span class="o">.</span><span class="n">tensor_quant</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x</span><span class="o">.</span><span class="n">abs</span><span class="p">()</span><span class="o">.</span><span class="n">max</span><span class="p">())</span> </pre></div> </div> <p>The backward pass of both functions is defined by the <a class="reference external" href="https://arxiv.org/abs/1308.3432">Straight-Through Estimator (STE)</a>.</p> </div> <div class="section" id="descriptor-and-quantizer"> <h3>Descriptor and quantizer<a class="headerlink" href="#descriptor-and-quantizer" title="Permalink to this headline"></a></h3> <p><code class="docutils literal notranslate"><span class="pre">QuantDescriptor</span></code> defines how a tensor should be quantized. There are also some predefined <code class="docutils literal notranslate"><span class="pre">QuantDescriptor</span></code> instances, e.g. 
<code class="docutils literal notranslate"><span class="pre">QUANT_DESC_8BIT_PER_TENSOR</span></code> and <code class="docutils literal notranslate"><span class="pre">QUANT_DESC_8BIT_CONV2D_WEIGHT_PER_CHANNEL</span></code>.</p> <p><code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> is the module for quantizing tensors and is configured by <code class="docutils literal notranslate"><span class="pre">QuantDescriptor</span></code>.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pytorch_quantization.tensor_quant</span> <span class="kn">import</span> <span class="n">QuantDescriptor</span> <span class="kn">from</span> <span class="nn">pytorch_quantization.nn.modules.tensor_quantizer</span> <span class="kn">import</span> <span class="n">TensorQuantizer</span> <span class="n">quant_desc</span> <span class="o">=</span> <span class="n">QuantDescriptor</span><span class="p">(</span><span class="n">num_bits</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">fake_quant</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="n">unsigned</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="n">quantizer</span> <span class="o">=</span> <span class="n">TensorQuantizer</span><span class="p">(</span><span class="n">quant_desc</span><span class="p">)</span> <span class="n">torch</span><span class="o">.</span><span class="n">manual_seed</span><span class="p">(</span><span class="mi">12345</span><span class="p">)</span> <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> 
<span class="mi">9</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">7</span><span class="p">)</span> <span class="n">quant_x</span> <span class="o">=</span> <span class="n">quantizer</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> </pre></div> </div> <p>If <code class="docutils literal notranslate"><span class="pre">amax</span></code> is given in the <a class="reference internal" href="index.html#pytorch_quantization.tensor_quant.QuantDescriptor" title="pytorch_quantization.tensor_quant.QuantDescriptor"><code class="xref py py-func docutils literal notranslate"><span class="pre">QuantDescriptor</span></code></a>, <a class="reference internal" href="index.html#pytorch_quantization.nn.TensorQuantizer" title="pytorch_quantization.nn.TensorQuantizer"><code class="xref py py-func docutils literal notranslate"><span class="pre">TensorQuantizer</span></code></a> will use it to quantize. Otherwise, <a class="reference internal" href="index.html#pytorch_quantization.nn.TensorQuantizer" title="pytorch_quantization.nn.TensorQuantizer"><code class="xref py py-func docutils literal notranslate"><span class="pre">TensorQuantizer</span></code></a> will compute amax and then quantize. amax is computed with respect to the <code class="docutils literal notranslate"><span class="pre">axis</span></code> specified. 
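To make the axis convention concrete, here is a plain-Python sketch (illustrative only, not the library's implementation): with `axis=(0,)`, an amax value is computed for each index along axis 0, reducing over all remaining axes.

```python
# Sketch of QuantDescriptor's axis convention: `axis` lists the dimensions
# to KEEP; amax is the maximum absolute value over all remaining dimensions.
def per_axis_amax(rows):
    """Per-index amax along axis 0 of a 2-D nested list, i.e. axis=(0,)."""
    return [max(abs(v) for v in row) for row in rows]

x = [[0.5, -2.0, 1.0],
     [3.0, -0.25, 0.75]]
print(per_axis_amax(x))  # [2.0, 3.0]
```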
Note that <code class="docutils literal notranslate"><span class="pre">axis</span></code> of QuantDescriptor specifies the axes to keep (amax is computed over all remaining axes), as opposed to the axis argument of <a class="reference external" href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.amax.html">max()</a>.</p> </div> <div class="section" id="quantized-module"> <h3>Quantized module<a class="headerlink" href="#quantized-module" title="Permalink to this headline"></a></h3> <p>There are two major types of quantized modules, <code class="docutils literal notranslate"><span class="pre">Conv</span></code> and <code class="docutils literal notranslate"><span class="pre">Linear</span></code>. Both can replace their <code class="docutils literal notranslate"><span class="pre">torch.nn</span></code> counterparts and apply quantization to both weights and activations.</p> <p>Both take <code class="docutils literal notranslate"><span class="pre">quant_desc_input</span></code> and <code class="docutils literal notranslate"><span class="pre">quant_desc_weight</span></code> in addition to the arguments of the original module.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">torch</span> <span class="kn">import</span> <span class="n">nn</span> <span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">tensor_quant</span> <span class="kn">import</span> <span class="nn">pytorch_quantization.nn</span> <span class="k">as</span> <span class="nn">quant_nn</span> <span class="c1"># PyTorch's module</span> <span class="n">fc1</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">in_features</span><span class="p">,</span> <span class="n">out_features</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="n">conv1</span> <span
class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_channels</span><span class="p">,</span> <span class="n">out_channels</span><span class="p">,</span> <span class="n">kernel_size</span><span class="p">)</span> <span class="c1"># quantized version</span> <span class="n">quant_fc1</span> <span class="o">=</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span> <span class="n">in_features</span><span class="p">,</span> <span class="n">out_features</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">quant_desc_input</span><span class="o">=</span><span class="n">tensor_quant</span><span class="o">.</span><span class="n">QUANT_DESC_8BIT_PER_TENSOR</span><span class="p">,</span> <span class="n">quant_desc_weight</span><span class="o">=</span><span class="n">tensor_quant</span><span class="o">.</span><span class="n">QUANT_DESC_8BIT_LINEAR_WEIGHT_PER_ROW</span><span class="p">)</span> <span class="n">quant_conv1</span> <span class="o">=</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">(</span> <span class="n">in_channels</span><span class="p">,</span> <span class="n">out_channels</span><span class="p">,</span> <span class="n">kernel_size</span><span class="p">,</span> <span class="n">quant_desc_input</span><span class="o">=</span><span class="n">tensor_quant</span><span class="o">.</span><span class="n">QUANT_DESC_8BIT_PER_TENSOR</span><span class="p">,</span> <span class="n">quant_desc_weight</span><span class="o">=</span><span class="n">tensor_quant</span><span class="o">.</span><span class="n">QUANT_DESC_8BIT_CONV2D_WEIGHT_PER_CHANNEL</span><span class="p">)</span> </pre></div> </div> </div> </div> <div class="section" id="post-training-quantization"> <h2>Post training 
quantization<a class="headerlink" href="#post-training-quantization" title="Permalink to this headline"></a></h2> <p>A model can be post-training quantized by simply calling <code class="docutils literal notranslate"><span class="pre">quant_modules.initialize()</span></code> before model creation.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">quant_modules</span> <span class="n">quant_modules</span><span class="o">.</span><span class="n">initialize</span><span class="p">()</span> <span class="n">model</span> <span class="o">=</span> <span class="n">torchvision</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">resnet50</span><span class="p">()</span> </pre></div> </div> <p>If a model is not entirely defined by modules, then TensorQuantizer should be created manually and added at the right places in the model.</p> <div class="section" id="calibration"> <h3>Calibration<a class="headerlink" href="#calibration" title="Permalink to this headline"></a></h3> <p>Calibration is the TensorRT terminology for passing data samples to the quantizer and deciding the best amax for activations. 
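As a plain-Python illustration of the idea (a hypothetical helper, not the library's calibrator API), the simplest form of calibration just tracks the largest absolute value observed across all fed samples:

```python
# Hypothetical sketch of "max" calibration: amax is the running maximum
# absolute value over every activation sample seen during calibration.
def max_calibrate(batches):
    amax = 0.0
    for batch in batches:
        amax = max(amax, max(abs(v) for v in batch))
    return amax

batches = [[0.1, -0.9, 0.4], [1.5, -0.2], [-0.7, 0.3]]
print(max_calibrate(batches))  # 1.5
```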
We support four calibration methods:</p> <ul class="simple"> <li><p><code class="docutils literal notranslate"><span class="pre">max</span></code>: Simply use the global maximum absolute value</p></li> <li><p><code class="docutils literal notranslate"><span class="pre">entropy</span></code>: TensorRT’s entropy calibration</p></li> <li><p><code class="docutils literal notranslate"><span class="pre">percentile</span></code>: Clip outliers based on a given percentile</p></li> <li><p><code class="docutils literal notranslate"><span class="pre">mse</span></code>: MSE (Mean Squared Error) based calibration</p></li> </ul> <p>In the ResNet50 example above, the calibration method is set to <code class="docutils literal notranslate"><span class="pre">mse</span></code>; it can be used as in the following example:</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># Find the TensorQuantizer and enable calibration</span> <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">module</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">named_modules</span><span class="p">():</span> <span class="k">if</span> <span class="n">name</span><span class="o">.</span><span class="n">endswith</span><span class="p">(</span><span class="s1">'_quantizer'</span><span class="p">):</span> <span class="n">module</span><span class="o">.</span><span class="n">enable_calib</span><span class="p">()</span> <span class="n">module</span><span class="o">.</span><span class="n">disable_quant</span><span class="p">()</span> <span class="c1"># Use full precision data to calibrate</span> <span class="c1"># Feeding data samples</span> <span class="n">model</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="c1"># ...</span> <span class="c1"># Finalize calibration</span> <span class="k">for</span> <span class="n">name</span><span
class="p">,</span> <span class="n">module</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">named_modules</span><span class="p">():</span> <span class="k">if</span> <span class="n">name</span><span class="o">.</span><span class="n">endswith</span><span class="p">(</span><span class="s1">'_quantizer'</span><span class="p">):</span> <span class="n">module</span><span class="o">.</span><span class="n">load_calib_amax</span><span class="p">()</span> <span class="n">module</span><span class="o">.</span><span class="n">disable_calib</span><span class="p">()</span> <span class="n">module</span><span class="o">.</span><span class="n">enable_quant</span><span class="p">()</span> <span class="c1"># If running on GPU, it needs to call .cuda() again because new tensors will be created by the calibration process</span> <span class="n">model</span><span class="o">.</span><span class="n">cuda</span><span class="p">()</span> <span class="c1"># Keep running the quantized model</span> <span class="c1"># ...</span> </pre></div> </div> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Calibration needs to be performed before exporting the model to ONNX.</p> </div> </div> </div> <div class="section" id="quantization-aware-training"> <h2>Quantization Aware Training<a class="headerlink" href="#quantization-aware-training" title="Permalink to this headline"></a></h2> <p>Quantization Aware Training is based on the Straight Through Estimator (STE) derivative approximation. It is sometimes known as “quantization aware training”. We don’t use the name because it doesn’t reflect the underlying assumption; if anything, the STE approximation makes training “unaware” of quantization.</p> <p>After calibration is done, Quantization Aware Training simply selects a training schedule and continues training the calibrated model. Usually, it doesn’t need to fine-tune very long. 
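A cosine-annealing fine-tuning schedule of the kind we use can be sketched in plain Python; the helper name and the concrete numbers are illustrative, not prescribed by the library:

```python
import math

# Sketch: fine-tune for ~10% of the original schedule, starting at 1% of the
# initial training learning rate, annealed along the decreasing half of a
# cosine down to 1% of the fine-tuning learning rate.
def finetune_lr(epoch, finetune_epochs, base_lr):
    start = 0.01 * base_lr                     # 1% of the initial training LR
    end = 0.01 * start                         # 1% of the fine-tuning LR
    t = epoch / max(finetune_epochs - 1, 1)    # progress from 0 to 1
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * t))

original_epochs, base_lr = 100, 0.1
finetune_epochs = original_epochs // 10        # ~10% of the original schedule
lrs = [finetune_lr(e, finetune_epochs, base_lr) for e in range(finetune_epochs)]
# lrs[0] is about 1e-3 (1% of base_lr); lrs[-1] is about 1e-5 (1% of that)
```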
We usually use around 10% of the original training schedule, starting at 1% of the initial training learning rate, with a cosine annealing learning rate schedule that follows the decreasing half of a cosine period, down to 1% of the initial fine-tuning learning rate (0.01% of the initial training learning rate).</p> <div class="section" id="some-recommendations"> <h3>Some recommendations<a class="headerlink" href="#some-recommendations" title="Permalink to this headline"></a></h3> <p>Quantization Aware Training (essentially a discrete numerical optimization problem) is not a mathematically solved problem. Based on our experience, here are some recommendations:</p> <ul class="simple"> <li><p>For the STE approximation to work well, it is better to use a small learning rate. A large learning rate is more likely to enlarge the variance introduced by the STE approximation and destroy the trained network.</p></li> <li><p>Do not change the quantization representation (scale) during training, at least not too frequently. Changing the scale every step is effectively like changing the data format (e8m7, e5m10, e3m4, etc.) every step, which will easily hurt convergence.</p></li> </ul> </div> </div> <div class="section" id="export-to-onnx"> <h2>Export to ONNX<a class="headerlink" href="#export-to-onnx" title="Permalink to this headline"></a></h2> <p>The goal of exporting to ONNX is to deploy to TensorRT, not to ONNX Runtime. So we only export the fake-quantized model into a form TensorRT will take. Fake quantization will be broken into a pair of QuantizeLinear/DequantizeLinear ONNX ops. TensorRT will take the generated ONNX graph and execute it in int8 in the most optimized way within its capability.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Currently, we only support exporting int8 and fp8 fake-quantized modules. 
Additionally, quantized modules need to be calibrated before exporting to ONNX.</p> </div> <p>A fake-quantized model can be exported to ONNX like any other PyTorch model. Please learn more about exporting a PyTorch model to ONNX at <a class="reference external" href="https://pytorch.org/docs/stable/onnx.html?highlight=onnx#module-torch.onnx">torch.onnx</a>. For example:</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pytorch_quantization</span> <span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">nn</span> <span class="k">as</span> <span class="n">quant_nn</span> <span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">quant_modules</span> <span class="n">quant_modules</span><span class="o">.</span><span class="n">initialize</span><span class="p">()</span> <span class="n">model</span> <span class="o">=</span> <span class="n">torchvision</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">resnet50</span><span class="p">()</span> <span class="c1"># load the calibrated model</span> <span class="n">state_dict</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"quant_resnet50-entropy-1024.pth"</span><span class="p">,</span> <span class="n">map_location</span><span class="o">=</span><span class="s2">"cpu"</span><span class="p">)</span> <span class="n">model</span><span class="o">.</span><span class="n">load_state_dict</span><span class="p">(</span><span class="n">state_dict</span><span class="p">)</span> <span class="n">model</span><span class="o">.</span><span class="n">cuda</span><span class="p">()</span> <span class="n">dummy_input</span> <span class="o">=</span> <span class="n">torch</span><span
class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">224</span><span class="p">,</span> <span class="mi">224</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span> <span class="n">input_names</span> <span class="o">=</span> <span class="p">[</span> <span class="s2">"actual_input_1"</span> <span class="p">]</span> <span class="n">output_names</span> <span class="o">=</span> <span class="p">[</span> <span class="s2">"output1"</span> <span class="p">]</span> <span class="k">with</span> <span class="n">pytorch_quantization</span><span class="o">.</span><span class="n">enable_onnx_export</span><span class="p">():</span> <span class="c1"># enable_onnx_checker needs to be disabled. See notes below.</span> <span class="n">torch</span><span class="o">.</span><span class="n">onnx</span><span class="o">.</span><span class="n">export</span><span class="p">(</span> <span class="n">model</span><span class="p">,</span> <span class="n">dummy_input</span><span class="p">,</span> <span class="s2">"quant_resnet50.onnx"</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">opset_version</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">enable_onnx_checker</span><span class="o">=</span><span class="kc">False</span> <span class="p">)</span> </pre></div> </div> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Note that <code class="docutils literal notranslate"><span class="pre">axis</span></code> is added to <code class="docutils literal notranslate"><span class="pre">QuantizeLinear</span></code> and <code class="docutils literal notranslate"><span class="pre">DequantizeLinear</span></code> in opset13.</p> 
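The QuantizeLinear/DequantizeLinear pair that fake quantization lowers to can be illustrated with scalar arithmetic. This is a hedged sketch of the signed int8 math (amax mapped to 127), not TensorRT's implementation:

```python
# ONNX-style Q/DQ pair for signed int8: quantize divides by the scale,
# rounds, and clamps to [-128, 127]; dequantize multiplies back by the scale.
def quantize_linear(x, scale):
    return max(-128, min(127, round(x / scale)))

def dequantize_linear(q, scale):
    return q * scale

amax = 2.0
scale = amax / 127.0                  # amax maps to the largest int8 code, 127
q = quantize_linear(1.5, scale)       # 95
x_fake = dequantize_linear(q, scale)  # close to 1.5, snapped to the int8 grid
```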
</div> </div> </div> <div class="toctree-wrapper compound"> <span id="document-tutorials/quant_resnet50"></span><div class="section" id="quantizing-resnet50"> <h2>Quantizing Resnet50<a class="headerlink" href="#quantizing-resnet50" title="Permalink to this headline"></a></h2> <div class="section" id="create-a-quantized-model"> <h3>Create a quantized model<a class="headerlink" href="#create-a-quantized-model" title="Permalink to this headline"></a></h3> <p>Import the necessary Python modules:</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">os</span> <span class="kn">import</span> <span class="nn">sys</span> <span class="kn">import</span> <span class="nn">torch</span> <span class="kn">import</span> <span class="nn">torch.utils.data</span> <span class="kn">from</span> <span class="nn">torch</span> <span class="kn">import</span> <span class="n">nn</span> <span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">nn</span> <span class="k">as</span> <span class="n">quant_nn</span> <span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">calib</span> <span class="kn">from</span> <span class="nn">pytorch_quantization.tensor_quant</span> <span class="kn">import</span> <span class="n">QuantDescriptor</span> <span class="kn">from</span> <span class="nn">torchvision</span> <span class="kn">import</span> <span class="n">models</span> <span class="n">sys</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s2">"path to torchvision/references/classification/"</span><span class="p">)</span> <span class="kn">from</span> <span class="nn">train</span> <span class="kn">import</span> <span class="n">evaluate</span><span class="p">,</span> <span class="n">train_one_epoch</span><span class="p">,</span> <span class="n">load_data</span> </pre></div> </div> <div class="section" 
id="adding-quantized-modules"> <h4>Adding quantized modules<a class="headerlink" href="#adding-quantized-modules" title="Permalink to this headline"></a></h4> <p>The first step is to add quantizer modules to the neural network graph. This package provides a number of quantized layer modules, which contain quantizers for inputs and weights, e.g. <code class="docutils literal notranslate"><span class="pre">quant_nn.QuantLinear</span></code>, which can be used in place of <code class="docutils literal notranslate"><span class="pre">nn.Linear</span></code>. These quantized layers can be substituted automatically, via monkey-patching, or by manually modifying the model definition.</p> <p>Automatic layer substitution is done with <code class="docutils literal notranslate"><span class="pre">quant_modules</span></code>. This should be called before model creation.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pytorch_quantization</span> <span class="kn">import</span> <span class="n">quant_modules</span> <span class="n">quant_modules</span><span class="o">.</span><span class="n">initialize</span><span class="p">()</span> </pre></div> </div> <p>This will apply to all instances of each module. If you do not want all modules to be quantized, you should instead substitute the quantized modules manually. Stand-alone quantizers can also be added to the model with <code class="docutils literal notranslate"><span class="pre">quant_nn.TensorQuantizer</span></code>.</p> </div> </div> <div class="section" id="post-training-quantization"> <h3>Post training quantization<a class="headerlink" href="#post-training-quantization" title="Permalink to this headline"></a></h3> <p>For efficient inference, we want to select a fixed range for each quantizer. 
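One simple way to pick such a fixed range is a high percentile of the observed absolute activation values, which discards extreme outliers. A plain-Python sketch (hypothetical helper, not the toolkit's histogram calibrator):

```python
# Hypothetical percentile-based amax: sort observed |activations| and pick
# the value at the requested percentile rank (lower interpolation).
def percentile_amax(values, pct=99.0):
    ordered = sorted(abs(v) for v in values)
    idx = int(pct / 100.0 * (len(ordered) - 1))
    return ordered[idx]

samples = [0.1, -0.4, 0.9, -3.5, 0.2, 0.6, -0.8, 0.3, 0.7, -0.5]
print(percentile_amax(samples, pct=90.0))   # 0.9 -- the 3.5 outlier is clipped
print(percentile_amax(samples, pct=100.0))  # 3.5 -- equivalent to max calibration
```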
Starting with a pre-trained model, the simplest way to do this is by calibration.</p> <div class="section" id="calibration"> <h4>Calibration<a class="headerlink" href="#calibration" title="Permalink to this headline"></a></h4> <p>We will use histogram-based calibration for activations and the default max calibration for weights.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">quant_desc_input</span> <span class="o">=</span> <span class="n">QuantDescriptor</span><span class="p">(</span><span class="n">calib_method</span><span class="o">=</span><span class="s1">'histogram'</span><span class="p">)</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">QuantConv2d</span><span class="o">.</span><span class="n">set_default_quant_desc_input</span><span class="p">(</span><span class="n">quant_desc_input</span><span class="p">)</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">QuantLinear</span><span class="o">.</span><span class="n">set_default_quant_desc_input</span><span class="p">(</span><span class="n">quant_desc_input</span><span class="p">)</span> <span class="n">model</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">resnet50</span><span class="p">(</span><span class="n">pretrained</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="n">model</span><span class="o">.</span><span class="n">cuda</span><span class="p">()</span> </pre></div> </div> <p>To collect activation histograms, we must feed sample data into the model. First, create ImageNet dataloaders as done in the training script. Then, enable calibration in each quantizer and feed training data into the model. 1024 samples (2 batches of 512) should be sufficient to estimate the distribution of activations. 
Use training data for calibration so that validation also measures generalization of the selected ranges.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">data_path</span> <span class="o">=</span> <span class="s2">"PATH to imagenet"</span> <span class="n">batch_size</span> <span class="o">=</span> <span class="mi">512</span> <span class="n">traindir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="s1">'train'</span><span class="p">)</span> <span class="n">valdir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">data_path</span><span class="p">,</span> <span class="s1">'val'</span><span class="p">)</span> <span class="n">dataset</span><span class="p">,</span> <span class="n">dataset_test</span><span class="p">,</span> <span class="n">train_sampler</span><span class="p">,</span> <span class="n">test_sampler</span> <span class="o">=</span> <span class="n">load_data</span><span class="p">(</span><span class="n">traindir</span><span class="p">,</span> <span class="n">valdir</span><span class="p">,</span> <span class="kc">False</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span> <span class="n">data_loader</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">DataLoader</span><span class="p">(</span> <span class="n">dataset</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">batch_size</span><span class="p">,</span> <span 
class="n">sampler</span><span class="o">=</span><span class="n">train_sampler</span><span class="p">,</span> <span class="n">num_workers</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">pin_memory</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="n">data_loader_test</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">DataLoader</span><span class="p">(</span> <span class="n">dataset_test</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">batch_size</span><span class="p">,</span> <span class="n">sampler</span><span class="o">=</span><span class="n">test_sampler</span><span class="p">,</span> <span class="n">num_workers</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">pin_memory</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> </pre></div> </div> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span> <span class="k">def</span> <span class="nf">collect_stats</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">data_loader</span><span class="p">,</span> <span class="n">num_batches</span><span class="p">):</span> <span class="sd">"""Feed data to the network and collect statistic"""</span> <span class="c1"># Enable calibrators</span> <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">module</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">named_modules</span><span class="p">():</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">module</span><span class="p">,</span> <span 
class="n">quant_nn</span><span class="o">.</span><span class="n">TensorQuantizer</span><span class="p">):</span> <span class="k">if</span> <span class="n">module</span><span class="o">.</span><span class="n">_calibrator</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span> <span class="n">module</span><span class="o">.</span><span class="n">disable_quant</span><span class="p">()</span> <span class="n">module</span><span class="o">.</span><span class="n">enable_calib</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="n">module</span><span class="o">.</span><span class="n">disable</span><span class="p">()</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">_</span><span class="p">)</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="nb">enumerate</span><span class="p">(</span><span class="n">data_loader</span><span class="p">),</span> <span class="n">total</span><span class="o">=</span><span class="n">num_batches</span><span class="p">):</span> <span class="n">model</span><span class="p">(</span><span class="n">image</span><span class="o">.</span><span class="n">cuda</span><span class="p">())</span> <span class="k">if</span> <span class="n">i</span> <span class="o">>=</span> <span class="n">num_batches</span><span class="p">:</span> <span class="k">break</span> <span class="c1"># Disable calibrators</span> <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">module</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">named_modules</span><span class="p">():</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">module</span><span class="p">,</span> <span 
class="n">quant_nn</span><span class="o">.</span><span class="n">TensorQuantizer</span><span class="p">):</span> <span class="k">if</span> <span class="n">module</span><span class="o">.</span><span class="n">_calibrator</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span> <span class="n">module</span><span class="o">.</span><span class="n">enable_quant</span><span class="p">()</span> <span class="n">module</span><span class="o">.</span><span class="n">disable_calib</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="n">module</span><span class="o">.</span><span class="n">enable</span><span class="p">()</span> <span class="k">def</span> <span class="nf">compute_amax</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span> <span class="c1"># Load calib result</span> <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">module</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">named_modules</span><span class="p">():</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">module</span><span class="p">,</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">TensorQuantizer</span><span class="p">):</span> <span class="k">if</span> <span class="n">module</span><span class="o">.</span><span class="n">_calibrator</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">module</span><span class="o">.</span><span class="n">_calibrator</span><span class="p">,</span> <span class="n">calib</span><span class="o">.</span><span 
class="n">MaxCalibrator</span><span class="p">):</span> <span class="n">module</span><span class="o">.</span><span class="n">load_calib_amax</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="n">module</span><span class="o">.</span><span class="n">load_calib_amax</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="sa">F</span><span class="s2">"</span><span class="si">{</span><span class="n">name</span><span class="si">:</span><span class="s2">40</span><span class="si">}</span><span class="s2">: </span><span class="si">{</span><span class="n">module</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span> <span class="n">model</span><span class="o">.</span><span class="n">cuda</span><span class="p">()</span> <span class="c1"># It is a bit slow since we collect histograms on CPU</span> <span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span> <span class="n">collect_stats</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">data_loader</span><span class="p">,</span> <span class="n">num_batches</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span> <span class="n">compute_amax</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s2">"percentile"</span><span class="p">,</span> <span class="n">percentile</span><span class="o">=</span><span class="mf">99.99</span><span class="p">)</span> </pre></div> </div> <p>After calibration is done, quantizers will have <code class="docutils literal notranslate"><span class="pre">amax</span></code> set, which represents the absolute maximum input value representable in the quantized space. 
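Concretely, <code>amax</code> is just a maximum of absolute values, reduced either over the whole tensor or separately per output channel. A toy sketch with hypothetical numbers (plain Python lists standing in for a weight tensor):

```python
# Hypothetical weight values for 3 output channels (each row stands in for
# one channel of a conv weight, flattened).
weights = [
    [0.2, -1.0, 0.4],    # channel 0
    [0.1, 0.05, -0.5],   # channel 1
    [2.0, -0.25, 0.3],   # channel 2
]

# Per-tensor amax: a single range shared by the whole tensor.
per_tensor_amax = max(abs(w) for row in weights for w in row)

# Per-channel amax (reduce over everything except axis 0): one range per
# output channel, so small-magnitude channels keep fine resolution.
per_channel_amax = [max(abs(w) for w in row) for row in weights]

print(per_tensor_amax)    # -> 2.0
print(per_channel_amax)   # -> [1.0, 0.5, 2.0]
```

This is why the weight quantizers printed below report <code>axis=(0)</code> and a range of amax values, while the input quantizers report a single per-tensor amax.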
By default, weight ranges are per channel while activation ranges are per tensor. We can see the condensed amaxes by printing each <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> module.</p> <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">conv1</span><span class="o">.</span><span class="n">_input_quantizer</span> <span class="p">:</span> <span class="n">TensorQuantizer</span><span class="p">(</span><span class="mi">8</span><span class="n">bit</span> <span class="n">fake</span> <span class="n">per</span><span class="o">-</span><span class="n">tensor</span> <span class="n">amax</span><span class="o">=</span><span class="mf">2.6400</span> <span class="n">calibrator</span><span class="o">=</span><span class="n">MaxCalibrator</span><span class="p">(</span><span class="n">track_amax</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="n">quant</span><span class="p">)</span> <span class="n">conv1</span><span class="o">.</span><span class="n">_weight_quantizer</span> <span class="p">:</span> <span class="n">TensorQuantizer</span><span class="p">(</span><span class="mi">8</span><span class="n">bit</span> <span class="n">fake</span> <span class="n">axis</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="n">amax</span><span class="o">=</span><span class="p">[</span><span class="mf">0.0000</span><span class="p">,</span> <span class="mf">0.7817</span><span class="p">](</span><span class="mi">64</span><span class="p">)</span> <span class="n">calibrator</span><span class="o">=</span><span class="n">MaxCalibrator</span><span class="p">(</span><span class="n">track_amax</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="n">quant</span><span class="p">)</span> <span class="n">layer1</span><span class="mf">.0</span><span 
class="o">.</span><span class="n">conv1</span><span class="o">.</span><span class="n">_input_quantizer</span> <span class="p">:</span> <span class="n">TensorQuantizer</span><span class="p">(</span><span class="mi">8</span><span class="n">bit</span> <span class="n">fake</span> <span class="n">per</span><span class="o">-</span><span class="n">tensor</span> <span class="n">amax</span><span class="o">=</span><span class="mf">6.8645</span> <span class="n">calibrator</span><span class="o">=</span><span class="n">MaxCalibrator</span><span class="p">(</span><span class="n">track_amax</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="n">quant</span><span class="p">)</span> <span class="n">layer1</span><span class="mf">.0</span><span class="o">.</span><span class="n">conv1</span><span class="o">.</span><span class="n">_weight_quantizer</span> <span class="p">:</span> <span class="n">TensorQuantizer</span><span class="p">(</span><span class="mi">8</span><span class="n">bit</span> <span class="n">fake</span> <span class="n">axis</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="n">amax</span><span class="o">=</span><span class="p">[</span><span class="mf">0.0000</span><span class="p">,</span> <span class="mf">0.7266</span><span class="p">](</span><span class="mi">64</span><span class="p">)</span> <span class="n">calibrator</span><span class="o">=</span><span class="n">MaxCalibrator</span><span class="p">(</span><span class="n">track_amax</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="n">quant</span><span class="p">)</span> <span class="o">...</span> </pre></div> </div> </div> <div class="section" id="evaluate-the-calibrated-model"> <h4>Evaluate the calibrated model<a class="headerlink" href="#evaluate-the-calibrated-model" title="Permalink to this headline"></a></h4> <p>Next we will evaluate the classification 
accuracy of our post training quantized model on the ImageNet validation set.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">criterion</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">CrossEntropyLoss</span><span class="p">()</span> <span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span> <span class="n">evaluate</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">data_loader_test</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s2">"cuda"</span><span class="p">,</span> <span class="n">print_freq</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span> <span class="c1"># Save the model</span> <span class="n">torch</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">state_dict</span><span class="p">(),</span> <span class="s2">"/tmp/quant_resnet50-calibrated.pth"</span><span class="p">)</span> </pre></div> </div> <p>This should yield 76.1% top-1 accuracy, which is close to the pre-trained model accuracy of 76.2%.</p> </div> <div class="section" id="use-different-calibration"> <h4>Use different calibration<a class="headerlink" href="#use-different-calibration" title="Permalink to this headline"></a></h4> <p>We can try different calibrations without recollecting the histograms, and see which one gets the best accuracy.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span> <span class="n">compute_amax</span><span class="p">(</span><span 
class="n">model</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s2">"percentile"</span><span class="p">,</span> <span class="n">percentile</span><span class="o">=</span><span class="mf">99.9</span><span class="p">)</span> <span class="n">evaluate</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">data_loader_test</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s2">"cuda"</span><span class="p">,</span> <span class="n">print_freq</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span> <span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span> <span class="k">for</span> <span class="n">method</span> <span class="ow">in</span> <span class="p">[</span><span class="s2">"mse"</span><span class="p">,</span> <span class="s2">"entropy"</span><span class="p">]:</span> <span class="nb">print</span><span class="p">(</span><span class="sa">F</span><span class="s2">"</span><span class="si">{</span><span class="n">method</span><span class="si">}</span><span class="s2"> calibration"</span><span class="p">)</span> <span class="n">compute_amax</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="n">method</span><span class="p">)</span> <span class="n">evaluate</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">data_loader_test</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s2">"cuda"</span><span class="p">,</span> <span class="n">print_freq</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span> </pre></div> 
</div> <p>MSE and entropy calibration should both reach over 76% top-1 accuracy. The 99.9% percentile clips too many values for ResNet-50 and gives slightly lower accuracy.</p> </div> </div> <div class="section" id="quantization-aware-training"> <h3>Quantization Aware Training<a class="headerlink" href="#quantization-aware-training" title="Permalink to this headline"></a></h3> <p>Optionally, we can fine-tune the calibrated model to improve accuracy further.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">criterion</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">CrossEntropyLoss</span><span class="p">()</span> <span class="n">optimizer</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">optim</span><span class="o">.</span><span class="n">SGD</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">parameters</span><span class="p">(),</span> <span class="n">lr</span><span class="o">=</span><span class="mf">0.0001</span><span class="p">)</span> <span class="n">lr_scheduler</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">optim</span><span class="o">.</span><span class="n">lr_scheduler</span><span class="o">.</span><span class="n">StepLR</span><span class="p">(</span><span class="n">optimizer</span><span class="p">,</span> <span class="n">step_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">gamma</span><span class="o">=</span><span class="mf">0.1</span><span class="p">)</span> <span class="c1"># Training takes about one and a half hours per epoch on a single V100</span> <span class="n">train_one_epoch</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">optimizer</span><span class="p">,</span> <span
class="n">data_loader</span><span class="p">,</span> <span class="s2">"cuda"</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span> <span class="c1"># Save the model</span> <span class="n">torch</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">state_dict</span><span class="p">(),</span> <span class="s2">"/tmp/quant_resnet50-finetuned.pth"</span><span class="p">)</span> </pre></div> </div> <p>After one epoch of fine-tuning, we can achieve over 76.4% top-1 accuracy. Fine-tuning for more epochs with learning rate annealing can improve accuracy further. For example, fine-tuning for 15 epochs with cosine annealing starting with a learning rate of 0.001 can get over 76.7%. It should be noted that the same fine-tuning schedule will improve the accuracy of the unquantized model as well.</p> <div class="section" id="further-optimization"> <h4>Further optimization<a class="headerlink" href="#further-optimization" title="Permalink to this headline"></a></h4> <p>For efficient inference on TensorRT, we need to know more about the runtime optimizations TensorRT applies. TensorRT supports fusion of quantizing convolution and residual add. The new fused operator has two inputs. Let us call them conv-input and residual-input. Here the fused operator’s output precision must match the residual input precision. When there is another quantizing node after the fused operator, we can insert a pair of quantizing/dequantizing nodes between the residual-input and the Elementwise-Addition node, so that the quantizing node after the Convolution node is fused with the Convolution node, and the Convolution node is completely quantized with INT8 input and output.
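The rule described above — quantize both inputs of the addition so the add can stay in INT8 and fuse with the convolution — can be sketched in plain Python, with hypothetical scalar stand-ins for the real tensor ops and quantizer modules:

```python
def fake_quantize(x, amax, bound=127):
    """Scalar stand-in for an INT8 quantize/dequantize pair."""
    clipped = max(-amax, min(amax, x))
    return round(clipped * bound / amax) * amax / bound

def residual_add(conv_out, identity, amax=4.0, quantize=True):
    """Sketch of the quantized residual add: with quantize=True, BOTH
    inputs pass through a quantizer, so the addition sees two INT8-scaled
    operands and the whole conv + add can fuse in INT8; otherwise the
    add runs in float."""
    if quantize:
        return fake_quantize(conv_out, amax) + fake_quantize(identity, amax)
    return conv_out + identity

print(residual_add(1.0, 2.0))                   # close to 3.0, with INT8 error
print(residual_add(1.0, 2.0, quantize=False))   # -> 3.0 exactly
```

The extra quantizer on the residual input mirrors the one TensorRT already expects on the convolution output, which is exactly what the manual model changes below add.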
We cannot use automatic monkey-patching to apply this optimization, so we need to manually insert the quantizing/dequantizing nodes.</p> <p>First, create a copy of resnet.py from <a class="reference external" href="https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py">https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py</a>, modify the constructor, and add an explicit bool flag ‘quantize’:</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">resnet50</span><span class="p">(</span><span class="n">pretrained</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span> <span class="n">progress</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">True</span><span class="p">,</span> <span class="n">quantize</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">:</span> <span class="n">Any</span><span class="p">)</span> <span class="o">-></span> <span class="n">ResNet</span><span class="p">:</span> <span class="k">return</span> <span class="n">_resnet</span><span class="p">(</span><span class="s1">'resnet50'</span><span class="p">,</span> <span class="n">Bottleneck</span><span class="p">,</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="n">pretrained</span><span class="p">,</span> <span class="n">progress</span><span class="p">,</span> <span class="n">quantize</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span
class="k">def</span> <span class="nf">_resnet</span><span class="p">(</span><span class="n">arch</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">block</span><span class="p">:</span> <span class="n">Type</span><span class="p">[</span><span class="n">Union</span><span class="p">[</span><span class="n">BasicBlock</span><span class="p">,</span> <span class="n">Bottleneck</span><span class="p">]],</span> <span class="n">layers</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">int</span><span class="p">],</span> <span class="n">pretrained</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span> <span class="n">progress</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span> <span class="n">quantize</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">:</span> <span class="n">Any</span><span class="p">)</span> <span class="o">-></span> <span class="n">ResNet</span><span class="p">:</span> <span class="n">model</span> <span class="o">=</span> <span class="n">ResNet</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="n">layers</span><span class="p">,</span> <span class="n">quantize</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="k">class</span> <span class="nc">ResNet</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">block</span><span class="p">:</span> <span class="n">Type</span><span class="p">[</span><span class="n">Union</span><span class="p">[</span><span 
class="n">BasicBlock</span><span class="p">,</span> <span class="n">Bottleneck</span><span class="p">]],</span> <span class="n">layers</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">int</span><span class="p">],</span> <span class="n">quantize</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span> <span class="n">num_classes</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1000</span><span class="p">,</span> <span class="n">zero_init_residual</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span> <span class="n">groups</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">width_per_group</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">64</span><span class="p">,</span> <span class="n">replace_stride_with_dilation</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">List</span><span class="p">[</span><span class="nb">bool</span><span class="p">]]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="n">norm_layer</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Callable</span><span class="p">[</span><span class="o">...</span><span class="p">,</span> <span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">]]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span> <span class="nb">super</span><span class="p">(</span><span class="n">ResNet</span><span class="p">,</span> 
<span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span> <span class="bp">self</span><span class="o">.</span><span class="n">_quantize</span> <span class="o">=</span> <span class="n">quantize</span> </pre></div> </div> <p>When this <code class="docutils literal notranslate"><span class="pre">self._quantize</span></code> flag is set to <code class="docutils literal notranslate"><span class="pre">True</span></code>, we need to replace all the <code class="docutils literal notranslate"><span class="pre">nn.Conv2d</span></code> layers with <code class="docutils literal notranslate"><span class="pre">quant_nn.QuantConv2d</span></code>.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">conv3x3</span><span class="p">(</span><span class="n">in_planes</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">out_planes</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">stride</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">groups</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">dilation</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">quantize</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">)</span> <span class="o">-></span> <span class="n">nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">:</span> <span class="sd">"""3x3 convolution with padding"""</span> <span class="k">if</span> <span class="n">quantize</span><span
class="p">:</span> <span class="k">return</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">QuantConv2d</span><span class="p">(</span><span class="n">in_planes</span><span class="p">,</span> <span class="n">out_planes</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">stride</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="n">dilation</span><span class="p">,</span> <span class="n">groups</span><span class="o">=</span><span class="n">groups</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">dilation</span><span class="o">=</span><span class="n">dilation</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="n">nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_planes</span><span class="p">,</span> <span class="n">out_planes</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">stride</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="n">dilation</span><span class="p">,</span> <span class="n">groups</span><span class="o">=</span><span class="n">groups</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">dilation</span><span class="o">=</span><span class="n">dilation</span><span class="p">)</span> <span class="k">def</span> <span class="nf">conv1x1</span><span class="p">(</span><span class="n">in_planes</span><span 
class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">out_planes</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">stride</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">quantize</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">)</span> <span class="o">-></span> <span class="n">nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">:</span> <span class="sd">"""1x1 convolution"""</span> <span class="k">if</span> <span class="n">quantize</span><span class="p">:</span> <span class="k">return</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">QuantConv2d</span><span class="p">(</span><span class="n">in_planes</span><span class="p">,</span> <span class="n">out_planes</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">stride</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="n">nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_planes</span><span class="p">,</span> <span class="n">out_planes</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">stride</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> </pre></div> </div> <p>The residual conv 
add can be found in both <code class="docutils literal notranslate"><span class="pre">BasicBlock</span></code> and <code class="docutils literal notranslate"><span class="pre">Bottleneck</span></code>. We first need to declare the quantization node in the <code class="docutils literal notranslate"><span class="pre">__init__</span></code> function.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inplanes</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">planes</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">stride</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">downsample</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="n">groups</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">base_width</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">64</span><span class="p">,</span> <span class="n">dilation</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">norm_layer</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Callable</span><span class="p">[</span><span class="o">...</span><span class="p">,</span> <span class="n">nn</span><span class="o">.</span><span
class="n">Module</span><span class="p">]]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="n">quantize</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span> <span class="c1"># other code...</span> <span class="bp">self</span><span class="o">.</span><span class="n">_quantize</span> <span class="o">=</span> <span class="n">quantize</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">_quantize</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">residual_quantizer</span> <span class="o">=</span> <span class="n">quant_nn</span><span class="o">.</span><span class="n">TensorQuantizer</span><span class="p">(</span><span class="n">quant_nn</span><span class="o">.</span><span class="n">QuantConv2d</span><span class="o">.</span><span class="n">default_quant_desc_input</span><span class="p">)</span> </pre></div> </div> <p>Finally, we need to patch the <code class="docutils literal notranslate"><span class="pre">forward</span></code> function in both <code class="docutils literal notranslate"><span class="pre">BasicBlock</span></code> and <code class="docutils literal notranslate"><span class="pre">Bottleneck</span></code>, inserting the extra quantization/dequantization nodes there.</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">:</span> <span class="n">Tensor</span><span class="p">)</span> <span class="o">-></span> <span class="n">Tensor</span><span class="p">:</span> <span class="c1"># other code...</span> <span class="k">if</span> <span class="bp">self</span><span
class="o">.</span><span class="n">_quantize</span><span class="p">:</span> <span class="n">out</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">residual_quantizer</span><span class="p">(</span><span class="n">identity</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="n">out</span> <span class="o">+=</span> <span class="n">identity</span> <span class="n">out</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">relu</span><span class="p">(</span><span class="n">out</span><span class="p">)</span> <span class="k">return</span> <span class="n">out</span> </pre></div> </div> <p>The final ResNet code with the residual connection quantized can be found at <a class="reference external" href="https://github.com/NVIDIA/TensorRT/blob/master/tools/pytorch-quantization/examples/torchvision/models/classification/resnet.py">https://github.com/NVIDIA/TensorRT/blob/master/tools/pytorch-quantization/examples/torchvision/models/classification/resnet.py</a></p> </div> </div> </div> <span id="document-tutorials/creating_custom_quantized_modules"></span><div class="section" id="creating-custom-quantized-modules"> <h2>Creating Custom Quantized Modules<a class="headerlink" href="#creating-custom-quantized-modules" title="Permalink to this headline"></a></h2> <p>There are several quantized modules provided by the quantization tool as follows:</p> <ul class="simple"> <li><p>QuantConv1d, QuantConv2d, QuantConv3d, QuantConvTranspose1d, QuantConvTranspose2d, QuantConvTranspose3d</p></li> <li><p>QuantLinear</p></li> <li><p>QuantAvgPool1d, QuantAvgPool2d, QuantAvgPool3d, QuantMaxPool1d, QuantMaxPool2d, QuantMaxPool3d</p></li> </ul> <p>To quantize a module, we need to quantize the input and weights if present. 
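Concretely, “quantizing” a tensor here means fake quantization: scaling values onto an integer grid, rounding, clamping, and scaling back. The following is a minimal, non-library sketch of that arithmetic (illustrative only; it is not the pytorch_quantization implementation):

```python
import torch

def fake_quant(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Illustrative per-tensor fake quantization: quantize, then dequantize."""
    bound = 2 ** (num_bits - 1) - 1       # 127 for signed 8-bit
    scale = x.abs().max() / bound         # map dynamic range onto integer grid
    # Round onto the grid, clamp to the representable range, scale back.
    return torch.clamp(torch.round(x / scale), -bound, bound) * scale

x = torch.randn(4, 4)
xq = fake_quant(x)  # same shape, values snapped to representable levels
```

The round-trip error is at most half a quantization step; choosing the dynamic range (amax) well is exactly what calibration aims at.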
Following are the three major use-cases:</p> <ol class="arabic simple"> <li><p>Create a quantized wrapper for modules that have only inputs.</p></li> <li><p>Create a quantized wrapper for modules that have inputs as well as weights.</p></li> <li><p>Directly add the <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> module to the inputs of an operation in the model graph.</p></li> </ol> <p>The first two methods are most useful when you need to automatically replace the original modules (nodes in the graph) with their quantized versions. The third method is useful when you need to manually add quantization to the model graph at very specific places (more manual, more control).</p> <p>Let’s see each use-case with examples below.</p> <div class="section" id="quantizing-modules-with-only-inputs"> <h3>Quantizing Modules With Only Inputs<a class="headerlink" href="#quantizing-modules-with-only-inputs" title="Permalink to this headline"></a></h3> <p>A suitable example would be quantizing the <code class="docutils literal notranslate"><span class="pre">pooling</span></code> module variants.</p> <p>Essentially, we need to provide a wrapper function that takes the original module and adds the <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> module around it so that the input is first quantized and then fed into the original module.</p> <ul class="simple"> <li><p>Create the wrapper by subclassing the original module (<code class="docutils literal notranslate"><span class="pre">pooling.MaxPool2d</span></code>) along with the utilities module (<code class="docutils literal notranslate"><span class="pre">_utils.QuantInputMixin</span></code>).</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">QuantMaxPool2d</span><span class="p">(</span><span class="n">pooling</span><span class="o">.</span><span
class="n">MaxPool2d</span><span class="p">,</span> <span class="n">_utils</span><span class="o">.</span><span class="n">QuantInputMixin</span><span class="p">):</span> </pre></div> </div> <ul class="simple"> <li><p>The <code class="docutils literal notranslate"><span class="pre">__init__</span></code> function calls the original module’s init function with the corresponding arguments. There is just one additional argument, <code class="docutils literal notranslate"><span class="pre">**kwargs</span></code>, which contains the quantization configuration information. The <code class="docutils literal notranslate"><span class="pre">QuantInputMixin</span></code> utility contains the method <code class="docutils literal notranslate"><span class="pre">pop_quant_desc_in_kwargs</span></code> which extracts this configuration information from the input or returns a default if that input is <code class="docutils literal notranslate"><span class="pre">None</span></code>. 
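As a rough illustration of that popping behaviour, here is a simplified, hypothetical sketch (the dict-based descriptor and the names are assumptions for illustration, not the actual <code class="docutils literal notranslate"><span class="pre">_utils</span></code> implementation):

```python
# Hypothetical sketch of descriptor popping; the dict-based descriptor and
# simplified names are assumptions, not pytorch_quantization's real API.
DEFAULT_QUANT_DESC_INPUT = {"num_bits": 8, "axis": None}

def pop_quant_desc(kwargs, default=DEFAULT_QUANT_DESC_INPUT):
    """Remove 'quant_desc_input' from kwargs; fall back to the default."""
    desc = kwargs.pop("quant_desc_input", None)
    return default if desc is None else desc

kwargs = {"kernel_size": 3, "quant_desc_input": {"num_bits": 4, "axis": None}}
desc = pop_quant_desc(kwargs)  # kwargs no longer carries the descriptor
```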
Finally the <code class="docutils literal notranslate"><span class="pre">init_quantizer</span></code> method is called that initializes the <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> module which would quantize the inputs.</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">kernel_size</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">dilation</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">return_indices</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">ceil_mode</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span> <span class="nb">super</span><span class="p">(</span><span class="n">QuantMaxPool2d</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">kernel_size</span><span class="p">,</span> <span class="n">stride</span><span class="p">,</span> <span class="n">padding</span><span class="p">,</span> <span class="n">dilation</span><span class="p">,</span> <span class="n">return_indices</span><span class="p">,</span> <span class="n">ceil_mode</span><span class="p">)</span> <span class="n">quant_desc_input</span> <span class="o">=</span> <span class="n">_utils</span><span class="o">.</span><span class="n">pop_quant_desc_in_kwargs</span><span class="p">(</span><span class="bp">self</span><span 
class="o">.</span><span class="vm">__class__</span><span class="p">,</span> <span class="n">input_only</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">init_quantizer</span><span class="p">(</span><span class="n">quant_desc_input</span><span class="p">)</span> </pre></div> </div> <ul class="simple"> <li><p>After the initialization, the <code class="docutils literal notranslate"><span class="pre">forward</span></code> function needs to be defined in our wrapper module; it quantizes the input using the <code class="docutils literal notranslate"><span class="pre">_input_quantizer</span></code> that was initialized in the <code class="docutils literal notranslate"><span class="pre">__init__</span></code> function and then forwards the quantized input to the base module with a <code class="docutils literal notranslate"><span class="pre">super</span></code> call.</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">):</span> <span class="n">quant_input</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span><span class="p">(</span><span class="nb">input</span><span class="p">)</span> <span class="k">return</span> <span class="nb">super</span><span class="p">(</span><span class="n">QuantMaxPool2d</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">forward</span><span class="p">(</span><span class="n">quant_input</span><span class="p">)</span> </pre></div> </div> <ul class="simple"> <li><p>Finally, we need to define a getter method for the 
<code class="docutils literal notranslate"><span class="pre">_input_quantizer</span></code>. This could, for example, be used to disable the quantization for a particular module using <code class="docutils literal notranslate"><span class="pre">module.input_quantizer.disable()</span></code>, which is helpful while experimenting with different per-layer quantization configurations.</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="nd">@property</span> <span class="k">def</span> <span class="nf">input_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span> </pre></div> </div> <p>A complete quantized pooling module would look like the following:</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">QuantMaxPool2d</span><span class="p">(</span><span class="n">pooling</span><span class="o">.</span><span class="n">MaxPool2d</span><span class="p">,</span> <span class="n">_utils</span><span class="o">.</span><span class="n">QuantInputMixin</span><span class="p">):</span> <span class="sd">"""Quantized 2D maxpool"""</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">kernel_size</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">dilation</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">return_indices</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">ceil_mode</span><span
class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span> <span class="nb">super</span><span class="p">(</span><span class="n">QuantMaxPool2d</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">kernel_size</span><span class="p">,</span> <span class="n">stride</span><span class="p">,</span> <span class="n">padding</span><span class="p">,</span> <span class="n">dilation</span><span class="p">,</span> <span class="n">return_indices</span><span class="p">,</span> <span class="n">ceil_mode</span><span class="p">)</span> <span class="n">quant_desc_input</span> <span class="o">=</span> <span class="n">_utils</span><span class="o">.</span><span class="n">pop_quant_desc_in_kwargs</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="p">,</span> <span class="n">input_only</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">init_quantizer</span><span class="p">(</span><span class="n">quant_desc_input</span><span class="p">)</span> <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">):</span> <span class="n">quant_input</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span><span class="p">(</span><span class="nb">input</span><span class="p">)</span> <span class="k">return</span> <span class="nb">super</span><span class="p">(</span><span class="n">QuantMaxPool2d</span><span class="p">,</span> <span 
class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">forward</span><span class="p">(</span><span class="n">quant_input</span><span class="p">)</span> <span class="nd">@property</span> <span class="k">def</span> <span class="nf">input_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span> </pre></div> </div> </div> <div class="section" id="quantizing-modules-with-weights-and-inputs"> <h3>Quantizing Modules With Weights and Inputs<a class="headerlink" href="#quantizing-modules-with-weights-and-inputs" title="Permalink to this headline"></a></h3> <p>We give an example of quantizing the <code class="docutils literal notranslate"><span class="pre">torch.nn.Linear</span></code> module. The only additional change from the previous example of quantizing pooling modules is that we also need to accommodate the quantization of weights in the Linear module.</p> <ul class="simple"> <li><p>We create the quantized linear module as follows:</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">QuantLinear</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">,</span> <span class="n">_utils</span><span class="o">.</span><span class="n">QuantMixin</span><span class="p">):</span> </pre></div> </div> <ul class="simple"> <li><p>In the <code class="docutils literal notranslate"><span class="pre">__init__</span></code> function, we first use the <code class="docutils literal notranslate"><span class="pre">pop_quant_desc_in_kwargs</span></code> function to extract the quantization descriptors for both inputs and weights. 
Second, we initialize the <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> modules for both inputs and weights using these quantization descriptors.</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">in_features</span><span class="p">,</span> <span class="n">out_features</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span> <span class="nb">super</span><span class="p">(</span><span class="n">QuantLinear</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">in_features</span><span class="p">,</span> <span class="n">out_features</span><span class="p">,</span> <span class="n">bias</span><span class="p">)</span> <span class="n">quant_desc_input</span><span class="p">,</span> <span class="n">quant_desc_weight</span> <span class="o">=</span> <span class="n">_utils</span><span class="o">.</span><span class="n">pop_quant_desc_in_kwargs</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">init_quantizer</span><span class="p">(</span><span class="n">quant_desc_input</span><span class="p">,</span> <span class="n">quant_desc_weight</span><span class="p">)</span> </pre></div> </div> <ul class="simple"> <li><p>Also, override the <code class="docutils literal notranslate"><span class="pre">forward</span></code> function call and pass 
the inputs and weights through <code class="docutils literal notranslate"><span class="pre">_input_quantizer</span></code> and <code class="docutils literal notranslate"><span class="pre">_weight_quantizer</span></code> respectively before passing the quantized arguments to the actual <code class="docutils literal notranslate"><span class="pre">F.linear</span></code> call. This step adds the actual input/weight <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> to the module and eventually the model.</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">):</span> <span class="n">quant_input</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span><span class="p">(</span><span class="nb">input</span><span class="p">)</span> <span class="n">quant_weight</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_weight_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">weight</span><span class="p">)</span> <span class="n">output</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">linear</span><span class="p">(</span><span class="n">quant_input</span><span class="p">,</span> <span class="n">quant_weight</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">bias</span><span class="p">)</span> <span class="k">return</span> <span class="n">output</span> </pre></div> </div> <ul class="simple"> <li><p>Also, as in the pooling example, we add 
the getter methods for the <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> modules associated with inputs/weights. This could be used to, for example, disable the quantization mechanism by calling <code class="docutils literal notranslate"><span class="pre">module_obj.weight_quantizer.disable()</span></code>.</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="nd">@property</span> <span class="k">def</span> <span class="nf">input_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span> <span class="nd">@property</span> <span class="k">def</span> <span class="nf">weight_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_weight_quantizer</span> </pre></div> </div> <ul class="simple"> <li><p>With all of the above changes, the quantized Linear module would look like the following:</p></li> </ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">QuantLinear</span><span class="p">(</span><span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">,</span> <span class="n">_utils</span><span class="o">.</span><span class="n">QuantMixin</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">in_features</span><span class="p">,</span> <span class="n">out_features</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="o">**</span><span
class="n">kwargs</span><span class="p">):</span> <span class="nb">super</span><span class="p">(</span><span class="n">QuantLinear</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">in_features</span><span class="p">,</span> <span class="n">out_features</span><span class="p">,</span> <span class="n">bias</span><span class="p">)</span> <span class="n">quant_desc_input</span><span class="p">,</span> <span class="n">quant_desc_weight</span> <span class="o">=</span> <span class="n">_utils</span><span class="o">.</span><span class="n">pop_quant_desc_in_kwargs</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">init_quantizer</span><span class="p">(</span><span class="n">quant_desc_input</span><span class="p">,</span> <span class="n">quant_desc_weight</span><span class="p">)</span> <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">):</span> <span class="n">quant_input</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span><span class="p">(</span><span class="nb">input</span><span class="p">)</span> <span class="n">quant_weight</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_weight_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">weight</span><span class="p">)</span> <span class="n">output</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">linear</span><span 
class="p">(</span><span class="n">quant_input</span><span class="p">,</span> <span class="n">quant_weight</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">bias</span><span class="p">)</span> <span class="k">return</span> <span class="n">output</span> <span class="nd">@property</span> <span class="k">def</span> <span class="nf">input_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_input_quantizer</span> <span class="nd">@property</span> <span class="k">def</span> <span class="nf">weight_quantizer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_weight_quantizer</span> </pre></div> </div> </div> <div class="section" id="directly-quantizing-inputs-in-graph"> <h3>Directly Quantizing Inputs In Graph<a class="headerlink" href="#directly-quantizing-inputs-in-graph" title="Permalink to this headline"></a></h3> <p>It is also possible to directly quantize graph inputs without creating wrappers as explained above.</p> <p>Here’s an example:</p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">test_input</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">double</span><span class="p">)</span> <span class="n">quantizer</span> <span class="o">=</span> <span 
class="n">TensorQuantizer</span><span class="p">(</span><span class="n">quant_nn</span><span class="o">.</span><span class="n">QuantLinear</span><span class="o">.</span><span class="n">default_quant_desc_input</span><span class="p">)</span> <span class="n">quant_input</span> <span class="o">=</span> <span class="n">quantizer</span><span class="p">(</span><span class="n">test_input</span><span class="p">)</span> <span class="n">out</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">adaptive_avg_pool2d</span><span class="p">(</span><span class="n">quant_input</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> </pre></div> </div> <p>Assume that there is a <code class="docutils literal notranslate"><span class="pre">F.adaptive_avg_pool2d</span></code> operation in the graph and we’d like to quantize this operation. In the example above, we use <code class="docutils literal notranslate"><span class="pre">TensorQuantizer(quant_nn.QuantLinear.default_quant_desc_input)</span></code> to define a quantizer that we then use to actually quantize the <code class="docutils literal notranslate"><span class="pre">test_input</span></code> and then feed this quantized input to the <code class="docutils literal notranslate"><span class="pre">F.adaptive_avg_pool2d</span></code> operation. 
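To experiment with this pattern without the library installed, the same flow can be sketched with a hypothetical <code class="docutils literal notranslate"><span class="pre">FakeQuantStub</span></code> standing in for <code class="docutils literal notranslate"><span class="pre">TensorQuantizer</span></code> (an assumption for illustration only, not the library's implementation):

```python
import torch
import torch.nn.functional as F

class FakeQuantStub(torch.nn.Module):
    """Hypothetical stand-in for TensorQuantizer: per-tensor signed 8-bit
    fake quantization. Not the pytorch_quantization implementation."""
    def forward(self, x):
        bound = 127
        scale = x.abs().max() / bound
        return torch.clamp(torch.round(x / scale), -bound, bound) * scale

quantizer = FakeQuantStub()
test_input = torch.randn(1, 5, 5, 5)
quant_input = quantizer(test_input)          # quantize the graph input directly
out = F.adaptive_avg_pool2d(quant_input, 3)  # feed the quantized tensor onward
```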
Note that this quantizer is the same as the ones we used earlier while creating quantized versions of torch’s modules.</p> </div> </div> </div> <div class="toctree-wrapper compound"> <span id="document-calib"></span><div class="section" id="module-pytorch_quantization.calib"> <span id="pytorch-quantization-calib"></span><h2>pytorch_quantization.calib<a class="headerlink" href="#module-pytorch_quantization.calib" title="Permalink to this headline"></a></h2> <p><code class="docutils literal notranslate"><span class="pre">pytorch_quantization.calib</span></code> provides Calibrator classes that collect data statistics and determine quantization parameters.</p> <div class="section" id="maxcalibrator"> <h3><span class="hidden-section">MaxCalibrator</span><a class="headerlink" href="#maxcalibrator" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.calib.MaxCalibrator"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.calib.</span></span><span class="sig-name descname"><span class="pre">MaxCalibrator</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">num_bits</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">axis</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">unsigned</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">track_amax</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.calib.MaxCalibrator" title="Permalink to this definition"></a></dt> <dd><p>Max calibrator; tracks the maximum value globally.</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul 
class="simple"> <li><p><strong>calib_desc</strong> – A MaxCalibDescriptor.</p></li> <li><p><strong>num_bits</strong> – An integer. Number of bits of quantization.</p></li> <li><p><strong>axis</strong> – A tuple. See QuantDescriptor.</p></li> <li><p><strong>unsigned</strong> – A boolean. If True, use unsigned quantization.</p></li> </ul> </dd> </dl> <dl class="simple"> <dt>Readonly Properties:</dt><dd><p>amaxs: A list of amax values, saved as NumPy arrays since they are likely to be used for plotting.</p> </dd> </dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.calib.MaxCalibrator.collect"> <span class="sig-name descname"><span class="pre">collect</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.calib.MaxCalibrator.collect" title="Permalink to this definition"></a></dt> <dd><p>Tracks the absolute max of all tensors</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><p><strong>x</strong> – A tensor</p> </dd> <dt class="field-even">Raises</dt> <dd class="field-even"><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#RuntimeError" title="(in Python v3.12)"><strong>RuntimeError</strong></a> – If amax shape changes</p> </dd> </dl> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.calib.MaxCalibrator.compute_amax"> <span class="sig-name descname"><span class="pre">compute_amax</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.calib.MaxCalibrator.compute_amax" title="Permalink to this definition"></a></dt> <dd><p>Return the absolute max of all tensors collected</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.calib.MaxCalibrator.reset"> <span class="sig-name 
descname"><span class="pre">reset</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.calib.MaxCalibrator.reset" title="Permalink to this definition"></a></dt> <dd><p>Reset the collected absolute max</p> </dd></dl> </dd></dl> </div> <div class="section" id="histogramcalibrator"> <h3><span class="hidden-section">HistogramCalibrator</span><a class="headerlink" href="#histogramcalibrator" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.calib.HistogramCalibrator"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.calib.</span></span><span class="sig-name descname"><span class="pre">HistogramCalibrator</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">num_bits</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">axis</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">unsigned</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">num_bins</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">2048</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">grow_method</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">skip_zeros</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">torch_hist</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em><span class="sig-paren">)</span><a 
class="headerlink" href="#pytorch_quantization.calib.HistogramCalibrator" title="Permalink to this definition"></a></dt> <dd><p>Unified histogram calibrator</p> <p>The histogram is collected only once; compute_amax() then performs entropy, percentile, or MSE calibration based on its arguments.</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>num_bits</strong> – An integer. Number of bits of quantization.</p></li> <li><p><strong>axis</strong> – A tuple. See QuantDescriptor.</p></li> <li><p><strong>unsigned</strong> – A boolean. If True, use unsigned quantization.</p></li> <li><p><strong>num_bins</strong> – An integer. Number of histogram bins. Default 2048.</p></li> <li><p><strong>grow_method</strong> – A string. DEPRECATED. Default None.</p></li> <li><p><strong>skip_zeros</strong> – A boolean. If True, skips zeros when collecting data for the histogram. Default False.</p></li> <li><p><strong>torch_hist</strong> – A boolean. If True, collect the histogram with torch.histc instead of np.histogram. If the input tensor is on the GPU, histc will also run on the GPU. 
Default True.</p></li> </ul> </dd> </dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.calib.HistogramCalibrator.collect"> <span class="sig-name descname"><span class="pre">collect</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">x</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.calib.HistogramCalibrator.collect" title="Permalink to this definition"></a></dt> <dd><p>Collect histogram</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.calib.HistogramCalibrator.compute_amax"> <span class="sig-name descname"><span class="pre">compute_amax</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">method</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://docs.python.org/3/library/stdtypes.html#str" title="(in Python v3.12)"><span class="pre">str</span></a></span></em>, <em class="sig-param"><span class="o"><span class="pre">*</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://docs.python.org/3/library/functions.html#int" title="(in Python v3.12)"><span class="pre">int</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">start_bin</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://docs.python.org/3/library/functions.html#int" title="(in Python v3.12)"><span class="pre">int</span></a></span><span class="w"> 
</span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">128</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">percentile</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><a class="reference external" href="https://docs.python.org/3/library/functions.html#float" title="(in Python v3.12)"><span class="pre">float</span></a></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">99.99</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.calib.HistogramCalibrator.compute_amax" title="Permalink to this definition"></a></dt> <dd><p>Compute the amax from the collected histogram</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><p><strong>method</strong> – A string. One of [‘entropy’, ‘mse’, ‘percentile’]</p> </dd> <dt class="field-even">Keyword Arguments</dt> <dd class="field-even"><ul class="simple"> <li><p><strong>stride</strong> – An integer. Default 1</p></li> <li><p><strong>start_bin</strong> – An integer. Default 128</p></li> <li><p><strong>percentile</strong> – A float in [0, 100]. 
Default 99.99.</p></li> </ul> </dd> <dt class="field-odd">Returns</dt> <dd class="field-odd"><p><em>amax</em> – a tensor</p> </dd> </dl> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.calib.HistogramCalibrator.reset"> <span class="sig-name descname"><span class="pre">reset</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.calib.HistogramCalibrator.reset" title="Permalink to this definition"></a></dt> <dd><p>Reset the collected histogram</p> </dd></dl> </dd></dl> </div> </div> <span id="document-nn"></span><div class="section" id="module-pytorch_quantization.nn"> <span id="pytorch-quantization-nn"></span><h2>pytorch_quantization.nn<a class="headerlink" href="#module-pytorch_quantization.nn" title="Permalink to this headline"></a></h2> <div class="section" id="tensorquantizer"> <h3><span class="hidden-section">TensorQuantizer</span><a class="headerlink" href="#tensorquantizer" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">TensorQuantizer</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">quant_desc=<pytorch_quantization.tensor_quant.ScaledQuantDescriptor</span> <span class="pre">object></span></span></em>, <em class="sig-param"><span class="n"><span class="pre">disabled=False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">if_quant=True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">if_clip=False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">if_calib=False</span></span></em><span 
class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.TensorQuantizer" title="Permalink to this definition"></a></dt> <dd><p>Tensor quantizer module</p> <p>This module uses the tensor_quant or fake_tensor_quant function to quantize a tensor. It also wraps the variables and moving statistics needed when training a quantized network.</p> <dl class="simple"> <dt>Experimental features:</dt><dd><p>The <code class="docutils literal notranslate"><span class="pre">clip</span></code> stage learns the range before quantization is enabled; the <code class="docutils literal notranslate"><span class="pre">calib</span></code> stage runs calibration</p> </dd> </dl> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>quant_desc</strong> – An instance of <a class="reference internal" href="index.html#pytorch_quantization.tensor_quant.QuantDescriptor" title="pytorch_quantization.tensor_quant.QuantDescriptor"><code class="xref py py-func docutils literal notranslate"><span class="pre">QuantDescriptor</span></code></a>.</p></li> <li><p><strong>disabled</strong> – A boolean. If True, bypass the whole module and return the input. Default False.</p></li> <li><p><strong>if_quant</strong> – A boolean. If True, run the main quantization body. Default True.</p></li> <li><p><strong>if_clip</strong> – A boolean. If True, clip before quantization and learn amax. Default False.</p></li> <li><p><strong>if_calib</strong> – A boolean. If True, run calibration. Not implemented yet. 
Settings of calibration will probably go to <a class="reference internal" href="index.html#pytorch_quantization.tensor_quant.QuantDescriptor" title="pytorch_quantization.tensor_quant.QuantDescriptor"><code class="xref py py-func docutils literal notranslate"><span class="pre">QuantDescriptor</span></code></a>.</p></li> </ul> </dd> </dl> <dl class="simple"> <dt>Readonly Properties:</dt><dd><ul class="simple"> <li><p>axis:</p></li> <li><p>fake_quant:</p></li> <li><p>scale:</p></li> <li><p>step_size:</p></li> </ul> </dd> <dt>Mutable Properties:</dt><dd><ul class="simple"> <li><p>num_bits:</p></li> <li><p>unsigned:</p></li> <li><p>amax:</p></li> </ul> </dd> </dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer.__init__"> <span class="sig-name descname"><span class="pre">__init__</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">quant_desc=<pytorch_quantization.tensor_quant.ScaledQuantDescriptor</span> <span class="pre">object></span></span></em>, <em class="sig-param"><span class="n"><span class="pre">disabled=False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">if_quant=True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">if_clip=False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">if_calib=False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.TensorQuantizer.__init__" title="Permalink to this definition"></a></dt> <dd><p>Initialize quantizer and set up required variables</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer.disable"> <span class="sig-name descname"><span class="pre">disable</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.TensorQuantizer.disable" title="Permalink 
to this definition"></a></dt> <dd><p>Bypass the module</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer.disable_clip"> <span class="sig-name descname"><span class="pre">disable_clip</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.TensorQuantizer.disable_clip" title="Permalink to this definition"></a></dt> <dd><p>Disable clip stage</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer.enable_clip"> <span class="sig-name descname"><span class="pre">enable_clip</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.TensorQuantizer.enable_clip" title="Permalink to this definition"></a></dt> <dd><p>Enable clip stage</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer.forward"> <span class="sig-name descname"><span class="pre">forward</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inputs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.TensorQuantizer.forward" title="Permalink to this definition"></a></dt> <dd><p>Apply tensor_quant function to inputs</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><p><strong>inputs</strong> – A Tensor of type float32.</p> </dd> <dt class="field-even">Returns</dt> <dd class="field-even"><p><em>outputs</em> – A Tensor of type output_dtype</p> </dd> </dl> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer.init_learn_amax"> <span class="sig-name descname"><span class="pre">init_learn_amax</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" 
href="#pytorch_quantization.nn.TensorQuantizer.init_learn_amax" title="Permalink to this definition"></a></dt> <dd><p>Initialize learned amax from fixed amax</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.nn.TensorQuantizer.load_calib_amax"> <span class="sig-name descname"><span class="pre">load_calib_amax</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.TensorQuantizer.load_calib_amax" title="Permalink to this definition"></a></dt> <dd><p>Load amax from calibrator.</p> <p>Updates the amax buffer with the value computed by the calibrator, creating it if necessary. *args and **kwargs are passed directly to compute_amax, except “strict” in kwargs. 
Refer to compute_amax for more details.</p> </dd></dl> </dd></dl> <div class="section" id="quantized-modules"> <h4>Quantized Modules<a class="headerlink" href="#quantized-modules" title="Permalink to this headline"></a></h4> </div> </div> <div class="section" id="quantconvnd"> <h3>_QuantConvNd<a class="headerlink" href="#quantconvnd" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.modules.quant_conv._QuantConvNd"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.modules.quant_conv.</span></span><span class="sig-name descname"><span class="pre">_QuantConvNd</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">in_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">transposed</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_padding</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_mode</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">quant_desc_input</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">quant_desc_weight</span></span></em><span class="sig-paren">)</span><a class="headerlink" 
href="#pytorch_quantization.nn.modules.quant_conv._QuantConvNd" title="Permalink to this definition"></a></dt> <dd><p>Base class of quantized Conv layers, inherited from _ConvNd</p> <p>Descriptions of the original arguments can be found in torch.nn.modules.conv</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>quant_desc_input</strong> – An instance of <a class="reference internal" href="index.html#pytorch_quantization.tensor_quant.QuantDescriptor" title="pytorch_quantization.tensor_quant.QuantDescriptor"><code class="xref py py-class docutils literal notranslate"><span class="pre">QuantDescriptor</span></code></a>. Quantization descriptor of input.</p></li> <li><p><strong>quant_desc_weight</strong> – An instance of <a class="reference internal" href="index.html#pytorch_quantization.tensor_quant.QuantDescriptor" title="pytorch_quantization.tensor_quant.QuantDescriptor"><code class="xref py py-class docutils literal notranslate"><span class="pre">QuantDescriptor</span></code></a>. 
Quantization descriptor of weight.</p></li> </ul> </dd> <dt class="field-even">Raises</dt> <dd class="field-even"><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#ValueError" title="(in Python v3.12)"><strong>ValueError</strong></a> – If unsupported arguments are passed in.</p> </dd> </dl> <dl class="simple"> <dt>Readonly properties:</dt><dd><ul class="simple"> <li><p>input_quantizer:</p></li> <li><p>weight_quantizer:</p></li> </ul> </dd> <dt>Static methods:</dt><dd><ul class="simple"> <li><p>set_default_quant_desc_input: Set default_quant_desc_input</p></li> <li><p>set_default_quant_desc_weight: Set default_quant_desc_weight</p></li> </ul> </dd> </dl> </dd></dl> </div> <div class="section" id="quantconv1d"> <h3><span class="hidden-section">QuantConv1d</span><a class="headerlink" href="#quantconv1d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantConv1d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantConv1d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">in_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'zeros'</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantConv1d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 1D Conv</p> </dd></dl> </div> <div class="section" id="quantconv2d"> <h3><span class="hidden-section">QuantConv2d</span><a class="headerlink" href="#quantconv2d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantConv2d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantConv2d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">in_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span 
class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'zeros'</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantConv2d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 2D Conv</p> </dd></dl> </div> <div class="section" id="quantconv3d"> <h3><span class="hidden-section">QuantConv3d</span><a class="headerlink" href="#quantconv3d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantConv3d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantConv3d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span 
class="pre">in_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'zeros'</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantConv3d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 3D Conv</p> </dd></dl> </div> <div class="section" id="quantconvtranspose1d"> <h3><span class="hidden-section">QuantConvTranspose1d</span><a class="headerlink" href="#quantconvtranspose1d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" 
id="pytorch_quantization.nn.QuantConvTranspose1d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantConvTranspose1d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">in_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'zeros'</span></span></em>, <em 
class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantConvTranspose1d" title="Permalink to this definition"></a></dt> <dd><p>Quantized ConvTranspose1d</p> </dd></dl> </div> <div class="section" id="quantconvtranspose2d"> <h3><span class="hidden-section">QuantConvTranspose2d</span><a class="headerlink" href="#quantconvtranspose2d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantConvTranspose2d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantConvTranspose2d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">in_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em 
class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'zeros'</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantConvTranspose2d" title="Permalink to this definition"></a></dt> <dd><p>Quantized ConvTranspose2d</p> </dd></dl> </div> <div class="section" id="quantconvtranspose3d"> <h3><span class="hidden-section">QuantConvTranspose3d</span><a class="headerlink" href="#quantconvtranspose3d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantConvTranspose3d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantConvTranspose3d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">in_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_channels</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em 
class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">groups</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">'zeros'</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantConvTranspose3d" title="Permalink to this definition"></a></dt> <dd><p>Quantized ConvTranspose3d</p> </dd></dl> </div> <div class="section" id="quantlinear"> <h3><span class="hidden-section">QuantLinear</span><a class="headerlink" href="#quantlinear" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantLinear"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name 
descname"><span class="pre">QuantLinear</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">in_features</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">out_features</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantLinear" title="Permalink to this definition"></a></dt> <dd><p>Quantized version of nn.Linear</p> <p>Applies a quantized linear transformation to the incoming data, y = dequant(quant(x)quant(A)^T + b).</p> <p>The Module name is kept as “Linear” instead of “QuantLinear” so that the module can easily be dropped into a preexisting model and load pretrained weights. An alias “QuantLinear” is defined below. The base code is a copy of nn.Linear; see the detailed comments on the original arguments there.</p> <p>Quantization descriptors are passed in kwargs. If not present, default_quant_desc_input and default_quant_desc_weight are used.</p> <dl class="field-list simple"> <dt class="field-odd">Keyword Arguments</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>quant_desc_input</strong> – An instance of <a class="reference internal" href="index.html#pytorch_quantization.tensor_quant.QuantDescriptor" title="pytorch_quantization.tensor_quant.QuantDescriptor"><code class="xref py py-class docutils literal notranslate"><span class="pre">QuantDescriptor</span></code></a>. 
Quantization descriptor of input.</p></li> <li><p><strong>quant_desc_weight</strong> – An instance of <a class="reference internal" href="index.html#pytorch_quantization.tensor_quant.QuantDescriptor" title="pytorch_quantization.tensor_quant.QuantDescriptor"><code class="xref py py-class docutils literal notranslate"><span class="pre">QuantDescriptor</span></code></a>. Quantization descriptor of weight.</p></li> </ul> </dd> <dt class="field-even">Raises</dt> <dd class="field-even"><ul class="simple"> <li><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#ValueError" title="(in Python v3.12)"><strong>ValueError</strong></a> – If unsupported arguments are passed in.</p></li> <li><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#KeyError" title="(in Python v3.12)"><strong>KeyError</strong></a> – If unsupported kwargs are passed in.</p></li> </ul> </dd> </dl> <dl class="simple"> <dt>Readonly properties:</dt><dd><ul class="simple"> <li><p>input_quantizer:</p></li> <li><p>weight_quantizer:</p></li> </ul> </dd> <dt>Static methods:</dt><dd><ul class="simple"> <li><p>set_default_quant_desc_input: Set default_quant_desc_input</p></li> <li><p>set_default_quant_desc_weight: Set default_quant_desc_weight</p></li> </ul> </dd> </dl> </dd></dl> </div> <div class="section" id="quantmaxpool1d"> <h3><span class="hidden-section">QuantMaxPool1d</span><a class="headerlink" href="#quantmaxpool1d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantMaxPool1d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantMaxPool1d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em 
class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">return_indices</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ceil_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantMaxPool1d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 1D maxpool</p> </dd></dl> </div> <div class="section" id="quantmaxpool2d"> <h3><span class="hidden-section">QuantMaxPool2d</span><a class="headerlink" href="#quantmaxpool2d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantMaxPool2d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantMaxPool2d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span 
class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">return_indices</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ceil_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantMaxPool2d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 2D maxpool</p> </dd></dl> </div> <div class="section" id="quantmaxpool3d"> <h3><span class="hidden-section">QuantMaxPool3d</span><a class="headerlink" href="#quantmaxpool3d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantMaxPool3d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantMaxPool3d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">dilation</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">return_indices</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ceil_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantMaxPool3d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 3D maxpool</p> </dd></dl> </div> <div class="section" id="quantavgpool1d"> <h3><span class="hidden-section">QuantAvgPool1d</span><a class="headerlink" href="#quantavgpool1d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantAvgPool1d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantAvgPool1d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ceil_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">count_include_pad</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantAvgPool1d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 1D average pool</p> </dd></dl> </div> <div class="section" id="quantavgpool2d"> <h3><span class="hidden-section">QuantAvgPool2d</span><a class="headerlink" href="#quantavgpool2d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantAvgPool2d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantAvgPool2d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ceil_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">count_include_pad</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">divisor_override</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantAvgPool2d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 2D average pool</p> </dd></dl> </div> <div class="section" id="quantavgpool3d"> <h3><span class="hidden-section">QuantAvgPool3d</span><a class="headerlink" href="#quantavgpool3d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantAvgPool3d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantAvgPool3d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">kernel_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">stride</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">padding</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ceil_mode</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">count_include_pad</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">divisor_override</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantAvgPool3d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 3D average pool</p> </dd></dl> </div> <div class="section" id="quantadaptiveavgpool1d"> <h3><span class="hidden-section">QuantAdaptiveAvgPool1d</span><a class="headerlink" href="#quantadaptiveavgpool1d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantAdaptiveAvgPool1d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantAdaptiveAvgPool1d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">output_size</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" 
href="#pytorch_quantization.nn.QuantAdaptiveAvgPool1d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 1D adaptive average pool</p> </dd></dl> </div> <div class="section" id="quantadaptiveavgpool2d"> <h3><span class="hidden-section">QuantAdaptiveAvgPool2d</span><a class="headerlink" href="#quantadaptiveavgpool2d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantAdaptiveAvgPool2d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantAdaptiveAvgPool2d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">output_size</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantAdaptiveAvgPool2d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 2D adaptive average pool</p> </dd></dl> </div> <div class="section" id="quantadaptiveavgpool3d"> <h3><span class="hidden-section">QuantAdaptiveAvgPool3d</span><a class="headerlink" href="#quantadaptiveavgpool3d" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantAdaptiveAvgPool3d"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantAdaptiveAvgPool3d</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">output_size</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span 
class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantAdaptiveAvgPool3d" title="Permalink to this definition"></a></dt> <dd><p>Quantized 3D adaptive average pool</p> </dd></dl> </div> <div class="section" id="clip"> <h3><span class="hidden-section">Clip</span><a class="headerlink" href="#clip" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.Clip"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">Clip</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">clip_value_min</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">clip_value_max</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">learn_min</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">learn_max</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.Clip" title="Permalink to this definition"></a></dt> <dd><p>Clip tensor</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>clip_value_min</strong> – A number or tensor of lower bound to clip</p></li> <li><p><strong>clip_value_max</strong> – A number or tensor of upper bound to clip</p></li> <li><p><strong>learn_min</strong> – A boolean. If True, learn min. clip_value_min will be used to initialize. 
Default False</p></li> <li><p><strong>learn_max</strong> – A boolean. Similar to learn_min but for max.</p></li> </ul> </dd> <dt class="field-even">Raises</dt> <dd class="field-even"><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#ValueError" title="(in Python v3.12)"><strong>ValueError</strong></a> – </p> </dd> </dl> </dd></dl> </div> <div class="section" id="quantlstm"> <h3><span class="hidden-section">QuantLSTM</span><a class="headerlink" href="#quantlstm" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantLSTM"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantLSTM</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantLSTM" title="Permalink to this definition"></a></dt> <dd><p>Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.</p> </dd></dl> </div> <div class="section" id="quantlstmcell"> <h3><span class="hidden-section">QuantLSTMCell</span><a class="headerlink" href="#quantlstmcell" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.QuantLSTMCell"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.</span></span><span class="sig-name descname"><span class="pre">QuantLSTMCell</span></span><span 
class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">input_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">hidden_size</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">bias</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.nn.QuantLSTMCell" title="Permalink to this definition"></a></dt> <dd><p>A long short-term memory (LSTM) cell.</p> </dd></dl> </div> </div> <span id="document-functional"></span><div class="section" id="module-pytorch_quantization.nn.functional"> <span id="pytorch-quantization-nn-functional"></span><h2>pytorch_quantization.nn.functional<a class="headerlink" href="#module-pytorch_quantization.nn.functional" title="Permalink to this headline"></a></h2> <p>Some supporting functions</p> <div class="section" id="clipfunction"> <h3><span class="hidden-section">ClipFunction</span><a class="headerlink" href="#clipfunction" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.nn.functional.ClipFunction"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.nn.functional.</span></span><span class="sig-name descname"><span class="pre">ClipFunction</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" 
href="#pytorch_quantization.nn.functional.ClipFunction" title="Permalink to this definition"></a></dt> <dd><p>A universal tensor clip function</p> <p>PyTorch’s clamp() only supports a scalar range and doesn’t support broadcast. This implementation uses min/max, which is more general. The gradient is defined according to IBM’s PACT paper <a class="reference external" href="https://arxiv.org/abs/1805.06085">https://arxiv.org/abs/1805.06085</a>, which is also the behavior of TensorFlow’s clip_by_value()</p> </dd></dl> <p><code class="docutils literal notranslate"><span class="pre">clip</span></code> is an alias of <code class="docutils literal notranslate"><span class="pre">ClipFunction.apply</span></code></p> </div> </div> <span id="document-optim"></span><div class="section" id="module-pytorch_quantization.optim.helper"> <span id="pytorch-quantization-optim-helper"></span><h2>pytorch_quantization.optim.helper<a class="headerlink" href="#module-pytorch_quantization.optim.helper" title="Permalink to this headline"></a></h2> <p>Helper functions for quant optimizer/trainer</p> <dl class="py function"> <dt class="sig sig-object py" id="pytorch_quantization.optim.helper.freeze_parameters"> <span class="sig-prename descclassname"><span class="pre">pytorch_quantization.optim.helper.</span></span><span class="sig-name descname"><span class="pre">freeze_parameters</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">model</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">patterns</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.optim.helper.freeze_parameters" title="Permalink to this definition"></a></dt> <dd><p>Set requires_grad to False if any pattern matches the parameter name</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>model</strong> – A Module</p></li> <li><p><strong>patterns</strong> – 
A list of strings that will be used to match parameter names. If a parameter name contains any pattern, it will be frozen.</p></li> </ul> </dd> </dl> </dd></dl> <dl class="py function"> <dt class="sig sig-object py" id="pytorch_quantization.optim.helper.group_parameters"> <span class="sig-prename descclassname"><span class="pre">pytorch_quantization.optim.helper.</span></span><span class="sig-name descname"><span class="pre">group_parameters</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">model</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">patterns_list</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">lrs</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">momentums</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">weight_decays</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.optim.helper.group_parameters" title="Permalink to this definition"></a></dt> <dd><p>Group parameters for using the per-parameter options of an optimizer</p> <p>Returns a list of dicts in the format PyTorch optimizers expect; see <a class="reference external" href="https://pytorch.org/docs/stable/optim.html#per-parameter-options">https://pytorch.org/docs/stable/optim.html#per-parameter-options</a> for more details.</p> <div class="admonition-example admonition"> <p class="admonition-title">Example</p> <div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="p">[</span> <span class="gp">>>> 
</span> <span class="p">{</span><span class="s1">'params'</span><span class="p">:</span> <span class="n">model</span><span class="o">.</span><span class="n">base</span><span class="o">.</span><span class="n">parameters</span><span class="p">()},</span> <span class="gp">>>> </span> <span class="p">{</span><span class="s1">'params'</span><span class="p">:</span> <span class="n">model</span><span class="o">.</span><span class="n">classifier</span><span class="o">.</span><span class="n">parameters</span><span class="p">(),</span> <span class="s1">'lr'</span><span class="p">:</span> <span class="mf">1e-3</span><span class="p">}</span> <span class="gp">>>> </span><span class="p">]</span> </pre></div> </div> </div> <p>Parameters will be grouped w.r.t. the first level of <cite>patterns_list</cite>, e.g. <cite>patterns_list=[[‘conv1’, ‘conv2’], [‘conv3’]]</cite> will return 2 groups, one with <cite>conv1</cite> and <cite>conv2</cite> in the name, and the other with <cite>conv3</cite> in the name.</p> <p>If lrs, momentums or weight_decays are supplied, they will be added to the groups as well.</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>model</strong> – A module</p></li> <li><p><strong>patterns_list</strong> – A list of lists of strings. 
WARNING: patterns must be EXCLUSIVE; the function doesn’t perform an exclusivity check.</p></li> <li><p><strong>lrs</strong> – A list of floats with the same length as keys_list, or None.</p></li> <li><p><strong>momentums</strong> – A list of floats with the same length as keys_list, or None.</p></li> <li><p><strong>weight_decays</strong> – A list of floats with the same length as keys_list, or None.</p></li> </ul> </dd> <dt class="field-even">Returns</dt> <dd class="field-even"><p><em>param_group</em> – A list of dicts</p> </dd> </dl> </dd></dl> <dl class="py function"> <dt class="sig sig-object py" id="pytorch_quantization.optim.helper.match_parameters"> <span class="sig-prename descclassname"><span class="pre">pytorch_quantization.optim.helper.</span></span><span class="sig-name descname"><span class="pre">match_parameters</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">model</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">patterns</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.optim.helper.match_parameters" title="Permalink to this definition"></a></dt> <dd><p>Returns a generator over module parameters whose names match a key</p> <p>It is useful to group parameters and apply different functions to different groups. This function provides an easy way to group them.</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>model</strong> – A Module</p></li> <li><p><strong>patterns</strong> – A list of strings that will be used to match parameter names. 
If a parameter name contains any pattern, it will be yielded</p></li> </ul> </dd> <dt class="field-even">Yields</dt> <dd class="field-even"><p><em>param</em> – Module parameters</p> </dd> </dl> </dd></dl> <dl class="py function"> <dt class="sig sig-object py" id="pytorch_quantization.optim.helper.quant_weight_inplace"> <span class="sig-prename descclassname"><span class="pre">pytorch_quantization.optim.helper.</span></span><span class="sig-name descname"><span class="pre">quant_weight_inplace</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">model</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.optim.helper.quant_weight_inplace" title="Permalink to this definition"></a></dt> <dd><p>Make quantization inplace</p> <p>Searches for quantized modules, including QuantConvNd and QuantLinear, and makes weight quantization in place using weight_quantizer.</p> <p>Most publications on quantization aware training use STE by default, which is really an approximation of the derivative of the non-differentiable quantization function; it works to some extent, but is by no means the F=ma of the problem. 
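As a plain-Python sketch of what in-place fake quantization with max calibration amounts to (illustrative only; the library itself operates on torch tensors through each quantized module's weight_quantizer):

```python
# Hedged sketch of in-place fake quantization with "max" calibration.
# Pure-Python stand-in, NOT the library implementation, which works on
# torch tensors via each quantized module's weight_quantizer.
def fake_quant_inplace(weights, num_bits=8):
    amax = max(abs(w) for w in weights)        # "max" calibration
    scale = (2 ** (num_bits - 1) - 1) / amax   # 127 / amax for 8 bits
    for i, w in enumerate(weights):
        weights[i] = round(w * scale) / scale  # quantize, then dequantize
    return weights

w = [0.1, -0.75, 0.5]
fake_quant_inplace(w)  # w now holds the fake-quantized values
```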
Inplace quantization can be used to implement relax-and-round, which is a common method in discrete optimization and integer programming.</p> </dd></dl> </div> <span id="document-tensor_quant"></span><div class="section" id="module-pytorch_quantization.tensor_quant"> <span id="pytorch-quantization-tensor-quant"></span><h2>pytorch_quantization.tensor_quant<a class="headerlink" href="#module-pytorch_quantization.tensor_quant" title="Permalink to this headline"></a></h2> <p>Basic tensor quantization functions</p> <div class="section" id="quantdescriptor"> <h3><span class="hidden-section">QuantDescriptor</span><a class="headerlink" href="#quantdescriptor" title="Permalink to this headline"></a></h3> <dl class="py attribute"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.QuantDescriptor"> <span class="sig-prename descclassname"><span class="pre">pytorch_quantization.tensor_quant.</span></span><span class="sig-name descname"><span class="pre">QuantDescriptor</span></span><a class="headerlink" href="#pytorch_quantization.tensor_quant.QuantDescriptor" title="Permalink to this definition"></a></dt> <dd><p>alias of <a class="reference internal" href="#pytorch_quantization.tensor_quant.ScaledQuantDescriptor" title="pytorch_quantization.tensor_quant.ScaledQuantDescriptor"><code class="xref py py-class docutils literal notranslate"><span class="pre">pytorch_quantization.tensor_quant.ScaledQuantDescriptor</span></code></a></p> </dd></dl> </div> <div class="section" id="scaledquantdescriptor"> <h3><span class="hidden-section">ScaledQuantDescriptor</span><a class="headerlink" href="#scaledquantdescriptor" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.ScaledQuantDescriptor"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.tensor_quant.</span></span><span 
class="sig-name descname"><span class="pre">ScaledQuantDescriptor</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">num_bits</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">8</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.tensor_quant.ScaledQuantDescriptor" title="Permalink to this definition"></a></dt> <dd><p>Supportive descriptor of quantization</p> <p>Describes how a tensor should be quantized. A QuantDescriptor and a tensor together define a quantized tensor.</p> <dl class="field-list"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul> <li><p><strong>num_bits</strong> – An integer or a tuple of two integers. Specifically, <cite>num_bits</cite> can be:</p> <ol class="arabic simple"> <li><dl class="simple"> <dt>A positive integer argument for integer quantization. <cite>num_bits</cite> specifies</dt><dd><p>the number of bits used for integer quantization.</p> </dd> </dl> </li> <li><dl class="simple"> <dt>A constant integer tuple (4,3) for E4M3 floating point quantization emulating</dt><dd><p>Nvidia’s FP8 quantization. E4M3 quantization only supports per-tensor quantization.</p> </dd> </dl> </li> </ol> <p>Default: 8.</p> </li> <li><p><strong>name</strong> – Seems a nice thing to have</p></li> </ul> </dd> <dt class="field-even">Keyword Arguments</dt> <dd class="field-even"><ul class="simple"> <li><p><strong>fake_quant</strong> – A boolean. If True, use fake quantization mode. Default True.</p></li> <li><p><strong>axis</strong> – None, int or tuple of ints. 
Axes which will have their own max for computing the scaling factor. If None (the default), use a per-tensor scale. Must be in the range [-rank(input_tensor), rank(input_tensor)). E.g., for a KCRS weight tensor, quant_axis=(0) will yield per-channel scaling. Default None.</p></li> <li><p><strong>amax</strong> – A float or list/ndarray of floats of user-specified absolute max range. If supplied, ignore quant_axis and use this to quantize. If learn_amax is True, will be used to initialize the learnable amax. Default None.</p></li> <li><p><strong>learn_amax</strong> – A boolean. If True, learn amax. Default False.</p></li> <li><p><strong>scale_amax</strong> – A float. If supplied, multiply amax by scale_amax. Default None. It is useful for quick experiments.</p></li> <li><p><strong>calib_method</strong> – A string. One of [“max”, “histogram”], indicating which calibration to use. Except for the simple max calibration, the other methods are all histogram based. Default “max”.</p></li> <li><p><strong>unsigned</strong> – A Boolean. If True, use unsigned. 
Default False.</p></li> </ul> </dd> <dt class="field-odd">Raises</dt> <dd class="field-odd"><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#TypeError" title="(in Python v3.12)"><strong>TypeError</strong></a> – If an unsupported type is passed in.</p> </dd> </dl> <dl class="simple"> <dt>Read-only properties:</dt><dd><ul class="simple"> <li><p>fake_quant:</p></li> <li><p>name:</p></li> <li><p>learn_amax:</p></li> <li><p>scale_amax:</p></li> <li><p>axis:</p></li> <li><p>calib_method:</p></li> <li><p>num_bits:</p></li> <li><p>amax:</p></li> <li><p>unsigned:</p></li> </ul> </dd> </dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.ScaledQuantDescriptor.dict"> <span class="sig-name descname"><span class="pre">dict</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.tensor_quant.ScaledQuantDescriptor.dict" title="Permalink to this definition"></a></dt> <dd><p>Serialize to dict</p> <p>The built-in __dict__ attribute returns all the attributes, including those that have default values and those with the protected prefix “_”. This method only returns attributes whose values differ from the default and that don’t have “_” in the key. 
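The filtering rule can be sketched in plain Python (a hypothetical stand-in, not the library source; DEFAULTS here is an assumed subset of the descriptor's default values, for illustration):

```python
# Hypothetical sketch of the documented filtering: keep only attributes that
# differ from their defaults and whose keys don't carry the protected "_" prefix.
# DEFAULTS is an assumed subset of the descriptor's defaults (illustrative).
DEFAULTS = {"num_bits": 8, "fake_quant": True, "axis": None, "unsigned": False}

def to_dict(attrs):
    return {k: v for k, v in attrs.items()
            if not k.startswith("_") and DEFAULTS.get(k, object()) != v}

to_dict({"num_bits": 8, "axis": (0,), "_scale": 1.0})  # {'axis': (0,)}
```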
Constructing an instance from the dict returned by this method should yield exactly the same instance.</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.ScaledQuantDescriptor.from_yaml"> <em class="property"><span class="pre">classmethod</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">from_yaml</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">yaml_str</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.tensor_quant.ScaledQuantDescriptor.from_yaml" title="Permalink to this definition"></a></dt> <dd><p>Create a descriptor from a yaml string</p> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.ScaledQuantDescriptor.to_yaml"> <span class="sig-name descname"><span class="pre">to_yaml</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.tensor_quant.ScaledQuantDescriptor.to_yaml" title="Permalink to this definition"></a></dt> <dd><p>Create a yaml serialization. Some attributes need special treatment to have a human readable form, including amax and axis.</p> </dd></dl> </dd></dl> </div> <div class="section" id="tensorquantfunction"> <h3><span class="hidden-section">TensorQuantFunction</span><a class="headerlink" href="#tensorquantfunction" title="Permalink to this headline"></a></h3> <dl class="py class"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.TensorQuantFunction"> <em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">pytorch_quantization.tensor_quant.</span></span><span class="sig-name descname"><span class="pre">TensorQuantFunction</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">*</span></span><span class="n"><span 
class="pre">args</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.tensor_quant.TensorQuantFunction" title="Permalink to this definition"></a></dt> <dd><p>A universal tensor quantization function</p> <p>Takes an input tensor and outputs a quantized tensor. The granularity of the scale can be interpreted from the shape of amax. output_dtype indicates whether the quantized value will be stored as integer or float. The reason we want to store it in float is that the PyTorch function consuming the quantized value may not accept integer input, e.g. Conv2D.</p> <p>It uses 2^num_bits - 1 values instead of 2^num_bits, e.g., for num_bits=8, it uses [-127, 127] instead of [-128, 127].</p> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.TensorQuantFunction.backward"> <em class="property"><span class="pre">static</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">backward</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">ctx</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">grad_outputs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">grad_scale</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.tensor_quant.TensorQuantFunction.backward" title="Permalink to this definition"></a></dt> <dd><p>Implements straight-through estimation with clipping. 
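The clipped pass-through rule can be sketched in plain Python (illustrative only; the real backward operates on torch tensors inside autograd): the gradient is passed where the input lies inside [-amax, amax] and zeroed outside.

```python
# Hedged sketch of clipped straight-through estimation (STE), not the
# library's autograd code: pass the gradient where |input| <= amax, else zero.
def ste_backward(inputs, grad_outputs, amax):
    return [g if -amax <= x <= amax else 0.0
            for x, g in zip(inputs, grad_outputs)]

ste_backward([0.5, 2.0, -3.0], [1.0, 1.0, 1.0], amax=1.0)  # [1.0, 0.0, 0.0]
```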
For -amax <= input <= amax the gradient passes straight through, otherwise the gradient is zero.</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>ctx</strong> – A Context object with saved tensors from forward.</p></li> <li><p><strong>grad_outputs</strong> – A tensor of gradients of the outputs.</p></li> <li><p><strong>grad_scale</strong> – A tensor of gradients of the scale.</p></li> </ul> </dd> <dt class="field-even">Returns</dt> <dd class="field-even"><p><em>grad_inputs</em> – A tensor of gradients.</p> </dd> </dl> </dd></dl> <dl class="py method"> <dt class="sig sig-object py" id="pytorch_quantization.tensor_quant.TensorQuantFunction.forward"> <em class="property"><span class="pre">static</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">forward</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">ctx</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">inputs</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">amax</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">num_bits</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">8</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">unsigned</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">narrow_range</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.tensor_quant.TensorQuantFunction.forward" title="Permalink to this definition"></a></dt> <dd><p>Following the TensorFlow convention, the max value is passed in and used to decide 
the scale, instead of inputting the scale directly, though inputting the scale directly may be more natural to use.</p> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>ctx</strong> – A Context object to store tensors for backward.</p></li> <li><p><strong>inputs</strong> – A Tensor of type float32.</p></li> <li><p><strong>amax</strong> – A Tensor of type float32. Inputs will be quantized within the range [-amax, amax]; amax will be broadcast to the inputs tensor.</p></li> <li><p><strong>num_bits</strong> – An integer used to calculate the scaling factor, scale = (2^(num_bits-1) - 1) / max. Effectively, it indicates how many integer bits are used to represent the value. Default 8.</p></li> <li><p><strong>output_dtype</strong> – A type of Tensor. torch.int32 or torch.float32.</p></li> <li><p><strong>unsigned</strong> – A boolean. Use unsigned integer range, e.g. [0, 255] for num_bits=8. Default False.</p></li> <li><p><strong>narrow_range</strong> – A boolean. Use symmetric integer range for signed quantization, e.g. [-127,127] instead of [-128,127] for num_bits=8. Default True.</p></li> </ul> </dd> <dt class="field-even">Returns</dt> <dd class="field-even"><p><em>outputs</em> – A Tensor of type output_dtype. scale: A Tensor of type float32. 
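The documented arithmetic can be sketched for a scalar in plain Python (an illustration under the narrow-range convention, not the library's tensor implementation):

```python
# Hedged scalar sketch of the forward arithmetic with narrow_range=True:
# scale = (2^(num_bits-1) - 1) / amax, input clipped to [-amax, amax],
# then rounded to the nearest integer level.
def quantize(x, amax, num_bits=8):
    bound = 2 ** (num_bits - 1) - 1        # 127 for num_bits=8
    scale = bound / amax
    clipped = max(-amax, min(amax, x))
    return round(clipped * scale), scale   # outputs / scale dequantizes

quantize(0.25, amax=1.0)  # (32, 127.0)
```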
outputs / scale will dequantize the outputs tensor.</p> </dd> <dt class="field-odd">Raises</dt> <dd class="field-odd"><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#ValueError" title="(in Python v3.12)"><strong>ValueError</strong></a> – </p> </dd> </dl> </dd></dl> </dd></dl> <p><code class="docutils literal notranslate"><span class="pre">tensor_quant</span></code> is an alias of <code class="docutils literal notranslate"><span class="pre">TensorQuantFunction.apply</span></code></p> <p><code class="docutils literal notranslate"><span class="pre">fake_tensor_quant</span></code> is an alias of <code class="docutils literal notranslate"><span class="pre">FakeTensorQuantFunction.apply</span></code></p> </div> </div> <span id="document-utils"></span><div class="section" id="pytorch-quantization-utils"> <h2>pytorch_quantization.utils<a class="headerlink" href="#pytorch-quantization-utils" title="Permalink to this headline"></a></h2> <div class="section" id="module-pytorch_quantization.utils.quant_logging"> <span id="pytorch-quantization-utils-quant-logging"></span><h3>pytorch_quantization.utils.quant_logging<a class="headerlink" href="#module-pytorch_quantization.utils.quant_logging" title="Permalink to this headline"></a></h3> <p>A WAR for code that messes up the logging format</p> <dl class="py function"> <dt class="sig sig-object py" id="pytorch_quantization.utils.quant_logging.reset_logger_handler"> <span class="sig-prename descclassname"><span class="pre">pytorch_quantization.utils.quant_logging.</span></span><span class="sig-name descname"><span class="pre">reset_logger_handler</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.utils.quant_logging.reset_logger_handler" title="Permalink to this definition"></a></dt> <dd><p>Remove all handlers in the root logger</p> </dd></dl> </div> <div class="section" id="module-pytorch_quantization.utils.reduce_amax"> <span 
id="pytorch-quantization-utils-reduce-amax"></span><h3>pytorch_quantization.utils.reduce_amax<a class="headerlink" href="#module-pytorch_quantization.utils.reduce_amax" title="Permalink to this headline"></a></h3> <p>Function to get the absolute maximum of a tensor. Follows the NumPy fashion, which is more generic than PyTorch’s.</p> <dl class="py function"> <dt class="sig sig-object py" id="pytorch_quantization.utils.reduce_amax.reduce_amax"> <span class="sig-prename descclassname"><span class="pre">pytorch_quantization.utils.reduce_amax.</span></span><span class="sig-name descname"><span class="pre">reduce_amax</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">input</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">axis</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">keepdims</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#pytorch_quantization.utils.reduce_amax.reduce_amax" title="Permalink to this definition"></a></dt> <dd><p>Compute the absolute maximum value of a tensor.</p> <p>Reduces input_tensor along the dimensions given in axis. Unless keepdims is true, the rank of the tensor is reduced by 1 for each entry in axis. If keepdims is true, the reduced dimensions are retained with length 1.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Gradient computation is disabled, as this function is never meant for learning the reduced amax.</p> </div> <dl class="field-list simple"> <dt class="field-odd">Parameters</dt> <dd class="field-odd"><ul class="simple"> <li><p><strong>input</strong> – Input tensor</p></li> <li><p><strong>axis</strong> – The dimensions to reduce. None or int or tuple of ints. 
If None (the default), reduces all dimensions. Must be in the range [-rank(input_tensor), rank(input_tensor)).</p></li> <li><p><strong>keepdims</strong> – A boolean. If true, retains reduced dimensions with length 1. Default True.</p></li> <li><p><strong>granularity</strong> – DEPRECATED. Specifies whether the statistic has to be calculated at tensor or channel granularity.</p></li> </ul> </dd> <dt class="field-even">Returns</dt> <dd class="field-even"><p>The reduced tensor.</p> </dd> <dt class="field-odd">Raises</dt> <dd class="field-odd"><ul class="simple"> <li><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#ValueError" title="(in Python v3.12)"><strong>ValueError</strong></a> – If any axis doesn’t make sense or is not supported.</p></li> <li><p><a class="reference external" href="https://docs.python.org/3/library/exceptions.html#ValueError" title="(in Python v3.12)"><strong>ValueError</strong></a> – If an unknown granularity is passed in.</p></li> </ul> </dd> </dl> </dd></dl> </div> </div> </div> </div> <div class="section" id="indices"> <h1>Indices<a class="headerlink" href="#indices" title="Permalink to this headline"></a></h1> <ul class="simple"> <li><p><a class="reference internal" href="genindex.html"><span class="std std-ref">Index</span></a></p></li> </ul> </div> </div> </div> <footer> <hr/> <div role="contentinfo"> <div class="footer"> <p> Copyright © 2024 NVIDIA Corporation </p> <p> <a class="Link" href="https://www.nvidia.com/en-us/about-nvidia/privacy-policy/" target="_blank" rel="noopener" data-cms-ai="0">Privacy Policy</a> | <a class="Link" href="https://www.nvidia.com/en-us/about-nvidia/privacy-center/" target="_blank" rel="noopener" data-cms-ai="0">Manage My Privacy</a> | <a class="Link" href="https://www.nvidia.com/en-us/preferences/start/" target="_blank" rel="noopener" data-cms-ai="0">Do Not Sell or Share My Data</a> | <a class="Link" 
href="https://www.nvidia.com/en-us/about-nvidia/terms-of-service/" target="_blank" rel="noopener" data-cms-ai="0">Terms of Service</a> | <a class="Link" href="https://www.nvidia.com/en-us/about-nvidia/accessibility/" target="_blank" rel="noopener" data-cms-ai="0">Accessibility</a> | <a class="Link" href="https://www.nvidia.com/en-us/about-nvidia/company-policies/" target="_blank" rel="noopener" data-cms-ai="0">Corporate Policies</a> | <a class="Link" href="https://www.nvidia.com/en-us/product-security/" target="_blank" rel="noopener" data-cms-ai="0">Product Security</a> | <a class="Link" href="https://www.nvidia.com/en-us/contact/" target="_blank" rel="noopener" data-cms-ai="0">Contact</a> </p> </div> </div> </footer> </div> </div> </section> </div> <script> jQuery(function () { SphinxRtdTheme.Navigation.enable(true); }); </script> <script type="text/javascript">_satellite.pageBottom();</script> </body> </html>