<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="description" content="Keras documentation"> <meta name="author" content="Keras Team"> <link rel="shortcut icon" href="https://keras.io/img/favicon.ico"> <link rel="canonical" href="https://keras.io/examples/vision/involution/" />  <meta property="og:title" content="Keras documentation: Involutional neural networks"> <meta property="og:image" content="https://keras.io/img/logo-k-keras-wb.png"> <meta name="twitter:title" content="Keras documentation: Involutional neural networks"> <meta name="twitter:image" content="https://keras.io/img/k-keras-social.png"> <meta name="twitter:card" content="summary"> <title>Involutional neural networks</title>  <link href="/css/bootstrap.min.css" rel="stylesheet">  <link href="https://fonts.googleapis.com/css2?family=Open+Sans:wght@400;600;700;800&display=swap" rel="stylesheet">  <link href="/css/docs.css" rel="stylesheet"> <link href="/css/monokai.css" rel="stylesheet">  <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start': new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0], j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src= 'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f); })(window,document,'script','dataLayer','GTM-5DNGF4N'); </script> <script> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','https://www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-175165319-128', 'auto'); ga('send', 'pageview'); </script>  <script async defer src="https://buttons.github.io/buttons.js"></script> </head> <body>  <noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-5DNGF4N" 
height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>  <div class='k-page'> <div class="k-nav" id="nav-menu"> <a href='/'><img src='/img/logo-small.png' class='logo-small' /></a> <div class="nav flex-column nav-pills" role="tablist" aria-orientation="vertical"> <a class="nav-link" href="/about/" role="tab" aria-selected="">About Keras</a> <a class="nav-link" href="/getting_started/" role="tab" aria-selected="">Getting started</a> <a class="nav-link" href="/guides/" role="tab" aria-selected="">Developer guides</a> <a class="nav-link active" href="/examples/" role="tab" aria-selected="">Code examples</a> <a class="nav-sublink active" href="/examples/vision/">Computer Vision</a> <a class="nav-sublink2" href="/examples/vision/image_classification_from_scratch/">Image classification from scratch</a> <a class="nav-sublink2" href="/examples/vision/mnist_convnet/">Simple MNIST convnet</a> <a class="nav-sublink2" href="/examples/vision/image_classification_efficientnet_fine_tuning/">Image classification via fine-tuning with EfficientNet</a> <a class="nav-sublink2" href="/examples/vision/image_classification_with_vision_transformer/">Image classification with Vision Transformer</a> <a class="nav-sublink2" href="/examples/vision/attention_mil_classification/">Classification using Attention-based Deep Multiple Instance Learning</a> <a class="nav-sublink2" href="/examples/vision/mlp_image_classification/">Image classification with modern MLP models</a> <a class="nav-sublink2" href="/examples/vision/mobilevit/">A mobile-friendly Transformer-based model for image classification</a> <a class="nav-sublink2" href="/examples/vision/xray_classification_with_tpus/">Pneumonia Classification on TPU</a> <a class="nav-sublink2" href="/examples/vision/cct/">Compact Convolutional Transformers</a> <a class="nav-sublink2" href="/examples/vision/convmixer/">Image classification with ConvMixer</a> <a class="nav-sublink2" href="/examples/vision/eanet/">Image 
classification with EANet (External Attention Transformer)</a> <a class="nav-sublink2 active" href="/examples/vision/involution/">Involutional neural networks</a> <a class="nav-sublink2" href="/examples/vision/perceiver_image_classification/">Image classification with Perceiver</a> <a class="nav-sublink2" href="/examples/vision/reptile/">Few-Shot learning with Reptile</a> <a class="nav-sublink2" href="/examples/vision/semisupervised_simclr/">Semi-supervised image classification using contrastive pretraining with SimCLR</a> <a class="nav-sublink2" href="/examples/vision/swin_transformers/">Image classification with Swin Transformers</a> <a class="nav-sublink2" href="/examples/vision/vit_small_ds/">Train a Vision Transformer on small datasets</a> <a class="nav-sublink2" href="/examples/vision/shiftvit/">A Vision Transformer without Attention</a> <a class="nav-sublink2" href="/examples/vision/image_classification_using_global_context_vision_transformer/">Image Classification using Global Context Vision Transformer</a> <a class="nav-sublink2" href="/examples/vision/temporal_latent_bottleneck/">When Recurrence meets Transformers</a> <a class="nav-sublink2" href="/examples/vision/oxford_pets_image_segmentation/">Image segmentation with a U-Net-like architecture</a> <a class="nav-sublink2" href="/examples/vision/deeplabv3_plus/">Multiclass semantic segmentation using DeepLabV3+</a> <a class="nav-sublink2" href="/examples/vision/basnet_segmentation/">Highly accurate boundaries segmentation using BASNet</a> <a class="nav-sublink2" href="/examples/vision/fully_convolutional_network/">Image Segmentation using Composable Fully-Convolutional Networks</a> <a class="nav-sublink2" href="/examples/vision/retinanet/">Object Detection with RetinaNet</a> <a class="nav-sublink2" href="/examples/vision/keypoint_detection/">Keypoint Detection with Transfer Learning</a> <a class="nav-sublink2" href="/examples/vision/object_detection_using_vision_transformer/">Object detection with Vision 
Transformers</a> <a class="nav-sublink2" href="/examples/vision/3D_image_classification/">3D image classification from CT scans</a> <a class="nav-sublink2" href="/examples/vision/depth_estimation/">Monocular depth estimation</a> <a class="nav-sublink2" href="/examples/vision/nerf/">3D volumetric rendering with NeRF</a> <a class="nav-sublink2" href="/examples/vision/pointnet_segmentation/">Point cloud segmentation with PointNet</a> <a class="nav-sublink2" href="/examples/vision/pointnet/">Point cloud classification</a> <a class="nav-sublink2" href="/examples/vision/captcha_ocr/">OCR model for reading Captchas</a> <a class="nav-sublink2" href="/examples/vision/handwriting_recognition/">Handwriting recognition</a> <a class="nav-sublink2" href="/examples/vision/autoencoder/">Convolutional autoencoder for image denoising</a> <a class="nav-sublink2" href="/examples/vision/mirnet/">Low-light image enhancement using MIRNet</a> <a class="nav-sublink2" href="/examples/vision/super_resolution_sub_pixel/">Image Super-Resolution using an Efficient Sub-Pixel CNN</a> <a class="nav-sublink2" href="/examples/vision/edsr/">Enhanced Deep Residual Networks for single-image super-resolution</a> <a class="nav-sublink2" href="/examples/vision/zero_dce/">Zero-DCE for low-light image enhancement</a> <a class="nav-sublink2" href="/examples/vision/cutmix/">CutMix data augmentation for image classification</a> <a class="nav-sublink2" href="/examples/vision/mixup/">MixUp augmentation for image classification</a> <a class="nav-sublink2" href="/examples/vision/randaugment/">RandAugment for Image Classification for Improved Robustness</a> <a class="nav-sublink2" href="/examples/vision/image_captioning/">Image captioning</a> <a class="nav-sublink2" href="/examples/vision/nl_image_search/">Natural language image search with a Dual Encoder</a> <a class="nav-sublink2" href="/examples/vision/visualizing_what_convnets_learn/">Visualizing what convnets learn</a> <a class="nav-sublink2" 
href="/examples/vision/integrated_gradients/">Model interpretability with Integrated Gradients</a> <a class="nav-sublink2" href="/examples/vision/probing_vits/">Investigating Vision Transformer representations</a> <a class="nav-sublink2" href="/examples/vision/grad_cam/">Grad-CAM class activation visualization</a> <a class="nav-sublink2" href="/examples/vision/near_dup_search/">Near-duplicate image search</a> <a class="nav-sublink2" href="/examples/vision/semantic_image_clustering/">Semantic Image Clustering</a> <a class="nav-sublink2" href="/examples/vision/siamese_contrastive/">Image similarity estimation using a Siamese Network with a contrastive loss</a> <a class="nav-sublink2" href="/examples/vision/siamese_network/">Image similarity estimation using a Siamese Network with a triplet loss</a> <a class="nav-sublink2" href="/examples/vision/metric_learning/">Metric learning for image similarity search</a> <a class="nav-sublink2" href="/examples/vision/metric_learning_tf_similarity/">Metric learning for image similarity search using TensorFlow Similarity</a> <a class="nav-sublink2" href="/examples/vision/nnclr/">Self-supervised contrastive learning with NNCLR</a> <a class="nav-sublink2" href="/examples/vision/video_classification/">Video Classification with a CNN-RNN Architecture</a> <a class="nav-sublink2" href="/examples/vision/conv_lstm/">Next-Frame Video Prediction with Convolutional LSTMs</a> <a class="nav-sublink2" href="/examples/vision/video_transformers/">Video Classification with Transformers</a> <a class="nav-sublink2" href="/examples/vision/vivit/">Video Vision Transformer</a> <a class="nav-sublink2" href="/examples/vision/bit/">Image Classification using BigTransfer (BiT)</a> <a class="nav-sublink2" href="/examples/vision/gradient_centralization/">Gradient Centralization for Better Training Performance</a> <a class="nav-sublink2" href="/examples/vision/token_learner/">Learning to tokenize in Vision Transformers</a> <a class="nav-sublink2" 
href="/examples/vision/knowledge_distillation/">Knowledge Distillation</a> <a class="nav-sublink2" href="/examples/vision/fixres/">FixRes: Fixing train-test resolution discrepancy</a> <a class="nav-sublink2" href="/examples/vision/cait/">Class Attention Image Transformers with LayerScale</a> <a class="nav-sublink2" href="/examples/vision/patch_convnet/">Augmenting convnets with aggregated attention</a> <a class="nav-sublink2" href="/examples/vision/learnable_resizer/">Learning to Resize</a> <a class="nav-sublink2" href="/examples/vision/adamatch/">Semi-supervision and domain adaptation with AdaMatch</a> <a class="nav-sublink2" href="/examples/vision/barlow_twins/">Barlow Twins for Contrastive SSL</a> <a class="nav-sublink2" href="/examples/vision/consistency_training/">Consistency training with supervision</a> <a class="nav-sublink2" href="/examples/vision/deit/">Distilling Vision Transformers</a> <a class="nav-sublink2" href="/examples/vision/focal_modulation_network/">Focal Modulation: A replacement for Self-Attention</a> <a class="nav-sublink2" href="/examples/vision/forwardforward/">Using the Forward-Forward Algorithm for Image Classification</a> <a class="nav-sublink2" href="/examples/vision/masked_image_modeling/">Masked image modeling with Autoencoders</a> <a class="nav-sublink2" href="/examples/vision/sam/">Segment Anything Model with 🤗Transformers</a> <a class="nav-sublink2" href="/examples/vision/segformer/">Semantic segmentation with SegFormer and Hugging Face Transformers</a> <a class="nav-sublink2" href="/examples/vision/simsiam/">Self-supervised contrastive learning with SimSiam</a> <a class="nav-sublink2" href="/examples/vision/supervised-contrastive-learning/">Supervised Contrastive Learning</a> <a class="nav-sublink2" href="/examples/vision/yolov8/">Efficient Object Detection with YOLOV8 and KerasCV</a> <a class="nav-sublink" href="/examples/nlp/">Natural Language Processing</a> <a class="nav-sublink" href="/examples/structured_data/">Structured 
Data</a> <a class="nav-sublink" href="/examples/timeseries/">Timeseries</a> <a class="nav-sublink" href="/examples/generative/">Generative Deep Learning</a> <a class="nav-sublink" href="/examples/audio/">Audio Data</a> <a class="nav-sublink" href="/examples/rl/">Reinforcement Learning</a> <a class="nav-sublink" href="/examples/graph/">Graph Data</a> <a class="nav-sublink" href="/examples/keras_recipes/">Quick Keras Recipes</a> <a class="nav-link" href="/api/" role="tab" aria-selected="">Keras 3 API documentation</a> <a class="nav-link" href="/2.18/api/" role="tab" aria-selected="">Keras 2 API documentation</a> <a class="nav-link" href="/keras_tuner/" role="tab" aria-selected="">KerasTuner: Hyperparam Tuning</a> <a class="nav-link" href="/keras_hub/" role="tab" aria-selected="">KerasHub: Pretrained Models</a> </div> </div> <div class='k-main'> <div class='k-main-top'> <script> function displayDropdownMenu() { e = document.getElementById("nav-menu"); if (e.style.display == "block") { e.style.display = "none"; } else { e.style.display = "block"; document.getElementById("dropdown-nav").style.display = "block"; } } function resetMobileUI() { if (window.innerWidth <= 840) { document.getElementById("nav-menu").style.display = "none"; document.getElementById("dropdown-nav").style.display = "block"; } else { document.getElementById("nav-menu").style.display = "block"; document.getElementById("dropdown-nav").style.display = "none"; } var navmenu = document.getElementById("nav-menu"); var menuheight = navmenu.clientHeight; var kmain = document.getElementById("k-main-id"); kmain.style.minHeight = (menuheight + 100) + 'px'; } window.onresize = resetMobileUI; window.addEventListener("load", (event) => { resetMobileUI() }); </script> <div id='dropdown-nav' onclick="displayDropdownMenu();"> <svg viewBox="-20 -20 120 120" width="60" height="60"> <rect width="100" height="20"></rect> <rect y="30" width="100" height="20"></rect> <rect y="60" width="100" height="20"></rect> </svg> 
</div> <form class="bd-search d-flex align-items-center k-search-form" id="search-form"> <input type="search" class="k-search-input" id="search-input" placeholder="Search Keras documentation..." aria-label="Search Keras documentation..." autocomplete="off"> <button class="k-search-btn"> <svg width="13" height="13" viewBox="0 0 13 13"><title>search</title><path d="m4.8495 7.8226c0.82666 0 1.5262-0.29146 2.0985-0.87438 0.57232-0.58292 0.86378-1.2877 0.87438-2.1144 0.010599-0.82666-0.28086-1.5262-0.87438-2.0985-0.59352-0.57232-1.293-0.86378-2.0985-0.87438-0.8055-0.010599-1.5103 0.28086-2.1144 0.87438-0.60414 0.59352-0.8956 1.293-0.87438 2.0985 0.021197 0.8055 0.31266 1.5103 0.87438 2.1144 0.56172 0.60414 1.2665 0.8956 2.1144 0.87438zm4.4695 0.2115 3.681 3.6819-1.259 1.284-3.6817-3.7 0.0019784-0.69479-0.090043-0.098846c-0.87973 0.76087-1.92 1.1413-3.1207 1.1413-1.3553 0-2.5025-0.46363-3.4417-1.3909s-1.4088-2.0686-1.4088-3.4239c0-1.3553 0.4696-2.4966 1.4088-3.4239 0.9392-0.92727 2.0864-1.3969 3.4417-1.4088 1.3553-0.011889 2.4906 0.45771 3.406 1.4088 0.9154 0.95107 1.379 2.0924 1.3909 3.4239 0 1.2126-0.38043 2.2588-1.1413 3.1385l0.098834 0.090049z"></path></svg> </button> </form> <script> var form = document.getElementById('search-form'); form.onsubmit = function(e) { e.preventDefault(); var query = document.getElementById('search-input').value; window.location.href = '/search.html?query=' + query; return false; } </script> </div> <div class='k-main-inner' id='k-main-id'> <div class='k-location-slug'> ► <a href='/examples/'>Code examples</a> / <a href='/examples/vision/'>Computer Vision</a> / Involutional neural networks </div> <div class='k-content'> <h1 id="involutional-neural-networks">Involutional neural networks</h1> Author: <a href="https://twitter.com/ariG23498">Aritra Roy Gosthipaty</a> Date created: 2021/07/25 Last modified: 2021/07/25 Description: Deep dive into location-specific and channel-agnostic "involution" kernels. 
<div class='example_version_banner keras_3'>ⓘ This example uses Keras 3</div> <img class="k-inline-icon" src="https://colab.research.google.com/img/colab_favicon.ico"/> <a href="https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/vision/ipynb/involution.ipynb">View in Colab</a> •<img class="k-inline-icon" src="https://github.com/favicon.ico"/> <a href="https://github.com/keras-team/keras-io/blob/master/examples/vision/involution.py">GitHub source</a> <hr /> <h2 id="introduction">Introduction</h2> Convolution has been the basis of most modern neural networks for computer vision. A convolution kernel is spatial-agnostic and channel-specific. Because of this, it cannot adapt to different visual patterns at different spatial locations. In addition to these location-related problems, the receptive field of convolution makes it challenging to capture long-range spatial interactions. To address these issues, Li et al. rethink the properties of convolution in <a href="https://arxiv.org/abs/2103.06255">Involution: Inverting the Inherence of Convolution for Visual Recognition</a>. The authors propose the "involution kernel", which is location-specific and channel-agnostic. Because the operation is location-specific, the authors argue that self-attention falls under the design paradigm of involution. This example describes the involution kernel, compares two image classification models (one built with convolution, the other with involution), and draws a parallel with the self-attention layer. <hr /> <h2 id="setup">Setup</h2> <div class="codehilite"><pre><code>import os

os.environ["KERAS_BACKEND"] = "tensorflow"

import tensorflow as tf
import keras
import matplotlib.pyplot as plt

# Set seed for reproducibility.
tf.random.set_seed(42)
</code></pre></div> <hr /> <h2 id="convolution">Convolution</h2> Convolution remains the mainstay of deep neural networks for computer vision. 
To understand involution, it is necessary to first discuss the convolution operation. <img alt="Imgur" src="https://i.imgur.com/MSKLsm5.png" /> Consider an input tensor X with dimensions H, W and C_in. We take a collection of C_out convolution kernels, each of shape K, K, C_in. With the multiply-add operation between the input tensor and the kernels, we obtain an output tensor Y with dimensions H, W, C_out. In the diagram above <code>C_out=3</code>. This makes the output tensor of shape H, W and 3. Note that the convolution kernel does not depend on the spatial position of the input tensor, which makes it location-agnostic. On the other hand, each channel in the output tensor is produced by its own convolution filter, which makes it channel-specific. <hr /> <h2 id="involution">Involution</h2> The idea is to have an operation that is both location-specific and channel-agnostic. Implementing these properties poses a challenge: with a fixed number of involution kernels (one for each spatial position), we would not be able to process variable-resolution input tensors. To solve this problem, the authors generate each kernel conditioned on its spatial position. With this method, variable-resolution input tensors can be processed with ease. The diagram below provides an intuition for this kernel generation method. <img alt="Imgur" src="https://i.imgur.com/jtrGGQg.png" /> <div class="codehilite"><pre><code>class Involution(keras.layers.Layer):
    def __init__(
        self, channel, group_number, kernel_size, stride, reduction_ratio, name
    ):
        super().__init__(name=name)

        # Initialize the parameters.
        self.channel = channel
        self.group_number = group_number
        self.kernel_size = kernel_size
        self.stride = stride
        self.reduction_ratio = reduction_ratio

    def build(self, input_shape):
        # Get the shape of the input.
        (_, height, width, num_channels) = input_shape

        # Scale the height and width with respect to the strides.
        height = height // self.stride
        width = width // self.stride

        # Define a layer that average pools the input tensor
        # if stride is more than 1.
        self.stride_layer = (
            keras.layers.AveragePooling2D(
                pool_size=self.stride, strides=self.stride, padding="same"
            )
            if self.stride > 1
            else tf.identity
        )
        # Define the kernel generation layer.
        self.kernel_gen = keras.Sequential(
            [
                keras.layers.Conv2D(
                    filters=self.channel // self.reduction_ratio, kernel_size=1
                ),
                keras.layers.BatchNormalization(),
                keras.layers.ReLU(),
                keras.layers.Conv2D(
                    filters=self.kernel_size * self.kernel_size * self.group_number,
                    kernel_size=1,
                ),
            ]
        )
        # Define the reshape layers.
        self.kernel_reshape = keras.layers.Reshape(
            target_shape=(
                height,
                width,
                self.kernel_size * self.kernel_size,
                1,
                self.group_number,
            )
        )
        self.input_patches_reshape = keras.layers.Reshape(
            target_shape=(
                height,
                width,
                self.kernel_size * self.kernel_size,
                num_channels // self.group_number,
                self.group_number,
            )
        )
        self.output_reshape = keras.layers.Reshape(
            target_shape=(height, width, num_channels)
        )

    def call(self, x):
        # Generate the kernel with respect to the input tensor.
        # B, H, W, K*K*G
        kernel_input = self.stride_layer(x)
        kernel = self.kernel_gen(kernel_input)

        # Reshape the kernel.
        # B, H, W, K*K, 1, G
        kernel = self.kernel_reshape(kernel)

        # Extract input patches.
        # B, H, W, K*K*C
        input_patches = tf.image.extract_patches(
            images=x,
            sizes=[1, self.kernel_size, self.kernel_size, 1],
            strides=[1, self.stride, self.stride, 1],
            rates=[1, 1, 1, 1],
            padding="SAME",
        )

        # Reshape the input patches to align with later operations.
        # B, H, W, K*K, C//G, G
        input_patches = self.input_patches_reshape(input_patches)

        # Compute the multiply-add operation of kernels and patches.
        # B, H, W, K*K, C//G, G
        output = tf.multiply(kernel, input_patches)
        # B, H, W, C//G, G
        output = tf.reduce_sum(output, axis=3)

        # Reshape the output tensor.
        # B, H, W, C
        output = self.output_reshape(output)

        # Return the output tensor and the kernel.
        return output, kernel
</code></pre></div> <hr /> <h2 id="testing-the-involution-layer">Testing the Involution layer</h2> <div class="codehilite"><pre><code># Define the input tensor.
input_tensor = tf.random.normal((32, 256, 256, 3))

# Compute involution with stride 1.
output_tensor, _ = Involution(
    channel=3, group_number=1, kernel_size=5, stride=1, reduction_ratio=1, name="inv_1"
)(input_tensor)
print(f"with stride 1 output shape: {output_tensor.shape}")

# Compute involution with stride 2.
output_tensor, _ = Involution(
    channel=3, group_number=1, kernel_size=5, stride=2, reduction_ratio=1, name="inv_2"
)(input_tensor)
print(f"with stride 2 output shape: {output_tensor.shape}")

# Compute involution with stride 1, channel 16 and reduction ratio 2.
output_tensor, _ = Involution(
    channel=16, group_number=1, kernel_size=5, stride=1, reduction_ratio=2, name="inv_3"
)(input_tensor)
print(
    "with channel 16 and reduction ratio 2 output shape: {}".format(output_tensor.shape)
)
</code></pre></div> <div class="k-default-codeblock"> <div class="codehilite"><pre><code>with stride 1 output shape: (32, 256, 256, 3)
with stride 2 output shape: (32, 128, 128, 3)
with channel 16 and reduction ratio 2 output shape: (32, 256, 256, 3)
</code></pre></div> </div> <hr /> <h2 id="image-classification">Image Classification</h2> In this section, we build two image-classification models: one with convolutions and the other with involutions. The architecture is heavily inspired by this <a href="https://www.tensorflow.org/tutorials/images/cnn">Convolutional Neural Network (CNN)</a> tutorial from Google. <hr /> <h2 id="get-the-cifar10-dataset">Get the CIFAR10 Dataset</h2> <div class="codehilite"><pre><code># Load the CIFAR10 dataset.
print("loading the CIFAR10 dataset...")
(
    (train_images, train_labels),
    (
        test_images,
        test_labels,
    ),
) = keras.datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1.
(train_images, test_images) = (train_images / 255.0, test_images / 255.0)

# Shuffle and batch the dataset.
train_ds = (
    tf.data.Dataset.from_tensor_slices((train_images, train_labels))
    .shuffle(256)
    .batch(256)
)
test_ds = tf.data.Dataset.from_tensor_slices((test_images, test_labels)).batch(256)
</code></pre></div> <div class="k-default-codeblock"> <div class="codehilite"><pre><code>loading the CIFAR10 dataset...
</code></pre></div> </div> <hr /> <h2 id="visualise-the-data">Visualise the data</h2> <div class="codehilite"><pre><code>class_names = [
    "airplane",
    "automobile",
    "bird",
    "cat",
    "deer",
    "dog",
    "frog",
    "horse",
    "ship",
    "truck",
]

plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i])
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()
</code></pre></div> <img alt="png" src="/img/examples/vision/involution/involution_13_0.png" /> <hr /> <h2 id="convolutional-neural-network">Convolutional Neural Network</h2> <div class="codehilite"><pre><code># Build the conv model.
print("building the convolution model...")
conv_model = keras.Sequential(
    [
        keras.layers.Conv2D(32, (3, 3), input_shape=(32, 32, 3), padding="same"),
        keras.layers.ReLU(name="relu1"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(64, (3, 3), padding="same"),
        keras.layers.ReLU(name="relu2"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(64, (3, 3), padding="same"),
        keras.layers.ReLU(name="relu3"),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(10),
    ]
)

# Compile the model with the necessary loss function and optimizer.
print("compiling the convolution model...")
conv_model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Train the model.
print("conv model training...")
conv_hist = conv_model.fit(train_ds, epochs=20, validation_data=test_ds)
</code></pre></div> <div class="k-default-codeblock"> <div class="codehilite"><pre><code>building the convolution model...
compiling the convolution model...
conv model training...
Epoch 1/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 6s 15ms/step - accuracy: 0.3068 - loss: 1.9000 - val_accuracy: 0.4861 - val_loss: 1.4593
Epoch 2/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.5153 - loss: 1.3603 - val_accuracy: 0.5741 - val_loss: 1.1913
Epoch 3/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.5949 - loss: 1.1517 - val_accuracy: 0.6095 - val_loss: 1.0965
Epoch 4/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.6414 - loss: 1.0330 - val_accuracy: 0.6260 - val_loss: 1.0635
Epoch 5/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.6690 - loss: 0.9485 - val_accuracy: 0.6622 - val_loss: 0.9833
Epoch 6/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.6951 - loss: 0.8764 - val_accuracy: 0.6783 - val_loss: 0.9413
Epoch 7/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7122 - loss: 0.8167 - val_accuracy: 0.6856 - val_loss: 0.9134
Epoch 8/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7299 - loss: 0.7709 - val_accuracy: 0.7001 - val_loss: 0.8792
Epoch 9/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7467 - loss: 0.7288 - val_accuracy: 0.6992 - val_loss: 0.8821
Epoch 10/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7591 - loss: 0.6982 - val_accuracy: 0.7235 - val_loss: 0.8237
Epoch 11/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7725 - loss: 0.6550 - val_accuracy: 0.7115 - val_loss: 0.8521
Epoch 12/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7808 - loss: 0.6302 - val_accuracy: 0.7051 - val_loss: 0.8823
Epoch 13/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7860 - loss: 0.6101 - val_accuracy: 0.7122 - val_loss: 0.8635
Epoch 14/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7998 - loss: 0.5786 - val_accuracy: 0.7214 - val_loss: 0.8348
Epoch 15/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8117 - loss: 0.5473 - val_accuracy: 0.7139 - val_loss: 0.8835
Epoch 16/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8168 - loss: 0.5267 - val_accuracy: 0.7155 - val_loss: 0.8840
Epoch 17/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8266 - loss: 0.5022 - val_accuracy: 0.7239 - val_loss: 0.8576
Epoch 18/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8374 - loss: 0.4750 - val_accuracy: 0.7262 - val_loss: 0.8756
Epoch 19/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8452 - loss: 0.4505 - val_accuracy: 0.7235 - val_loss: 0.9049
Epoch 20/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8531 - loss: 0.4283 - val_accuracy: 0.7304 - val_loss: 0.8962
</code></pre></div> </div> <hr /> <h2 id="involutional-neural-network">Involutional Neural Network</h2> <div class="codehilite"><pre><code># Build the involution model.
print("building the involution model...")

inputs = keras.Input(shape=(32, 32, 3))
x, _ = Involution(
    channel=3, group_number=1, kernel_size=3, stride=1, reduction_ratio=2, name="inv_1"
)(inputs)
x = keras.layers.ReLU()(x)
x = keras.layers.MaxPooling2D((2, 2))(x)
x, _ = Involution(
    channel=3, group_number=1, kernel_size=3, stride=1, reduction_ratio=2, name="inv_2"
)(x)
x = keras.layers.ReLU()(x)
x = keras.layers.MaxPooling2D((2, 2))(x)
x, _ = Involution(
    channel=3, group_number=1, kernel_size=3, stride=1, reduction_ratio=2, name="inv_3"
)(x)
x = keras.layers.ReLU()(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(64, activation="relu")(x)
outputs = keras.layers.Dense(10)(x)

inv_model = keras.Model(inputs=[inputs], outputs=[outputs], name="inv_model")

# Compile the model with the necessary loss function and optimizer.
print("compiling the involution model...")
inv_model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Train the model.
print("inv model training...")
inv_hist = inv_model.fit(train_ds, epochs=20, validation_data=test_ds)
</code></pre></div> <div class="k-default-codeblock"> <div class="codehilite"><pre><code>building the involution model...
compiling the involution model...
inv model training...
Epoch 1/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 9s 25ms/step - accuracy: 0.1369 - loss: 2.2728 - val_accuracy: 0.2716 - val_loss: 2.1041
Epoch 2/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.2922 - loss: 1.9489 - val_accuracy: 0.3478 - val_loss: 1.8275
Epoch 3/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.3477 - loss: 1.8098 - val_accuracy: 0.3782 - val_loss: 1.7435
Epoch 4/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.3741 - loss: 1.7420 - val_accuracy: 0.3901 - val_loss: 1.6943
Epoch 5/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.3931 - loss: 1.6942 - val_accuracy: 0.4007 - val_loss: 1.6639
Epoch 6/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.4057 - loss: 1.6622 - val_accuracy: 0.4108 - val_loss: 1.6494
Epoch 7/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.4134 - loss: 1.6374 - val_accuracy: 0.4202 - val_loss: 1.6363
Epoch 8/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.4200 - loss: 1.6166 - val_accuracy: 0.4312 - val_loss: 1.6062
Epoch 9/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.4286 - loss: 1.5949 - val_accuracy: 0.4316 - val_loss: 1.6018
Epoch 10/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.4346 - loss: 1.5794 - val_accuracy: 0.4346 - val_loss: 1.5963
Epoch 11/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.4395 - loss: 1.5641 - val_accuracy: 0.4388 - val_loss: 1.5831
Epoch 12/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.4445 - loss: 1.5502 - val_accuracy: 0.4443 - val_loss: 1.5826
Epoch 13/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.4493 - loss: 1.5391 - val_accuracy: 0.4497 - val_loss: 1.5574
Epoch 14/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.4528 - loss: 1.5255 - val_accuracy: 0.4547 - val_loss: 1.5433
Epoch 15/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4575 - loss: 1.5148 - val_accuracy: 0.4548 - val_loss: 1.5438
Epoch 16/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.4599 - loss: 1.5072 - val_accuracy: 0.4581 - val_loss: 1.5323
Epoch 17/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.4664 - loss: 1.4957 - val_accuracy: 0.4598 - val_loss: 1.5321
Epoch 18/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.4701 - loss: 1.4863 - val_accuracy: 0.4575 - val_loss: 1.5302
Epoch 19/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.4737 - loss: 1.4790 - val_accuracy: 0.4676 - val_loss: 1.5233
Epoch 20/20
196/196 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.4771 - loss: 1.4740 - val_accuracy: 0.4719 - val_loss: 1.5096
</code></pre></div> </div> <hr /> <h2 id="comparisons">Comparisons</h2> In this section, we look at both models and compare them on a few points. <h3 id="parameters">Parameters</h3> With a similar architecture, the CNN has far more parameters than the INN (Involutional Neural Network). 
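As a rough cross-check of this claim, the per-layer counts can be reproduced by hand. The sketch below is illustrative and not part of the original example: the helper names (<code>conv2d_params</code>, <code>batchnorm_params</code>, <code>involution_params</code>) are hypothetical. It assumes the kernel-generation branch of the <code>Involution</code> layer defined earlier (a 1x1 convolution, batch normalization with four values per channel, and a second 1x1 convolution) and that every convolution uses a bias, which is the Keras default.

```python
# Hypothetical helpers (not part of Keras) that count layer parameters by hand.


def conv2d_params(kernel_size, in_channels, filters, use_bias=True):
    """Weights plus optional biases of a standard 2D convolution."""
    weights = kernel_size * kernel_size * in_channels * filters
    return weights + (filters if use_bias else 0)


def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance: four values per channel."""
    return 4 * channels


def involution_params(in_channels, channel, group_number, kernel_size, reduction_ratio):
    """Parameters of the kernel-generation branch; the involution itself adds none."""
    reduced = channel // reduction_ratio
    # 1x1 convolution that reduces the channels.
    squeeze = conv2d_params(1, in_channels, reduced)
    # Batch normalization on the reduced channels.
    norm = batchnorm_params(reduced)
    # 1x1 convolution that emits K*K*G kernel values per spatial position.
    span = conv2d_params(1, reduced, kernel_size * kernel_size * group_number)
    return squeeze + norm + span


# First layer of each model, both applied to a 3-channel image.
first_conv = conv2d_params(kernel_size=3, in_channels=3, filters=32)
first_inv = involution_params(
    in_channels=3, channel=3, group_number=1, kernel_size=3, reduction_ratio=2
)
print(f"first Conv2D layer: {first_conv} parameters")
print(f"first Involution layer: {first_inv} parameters")
```

Under these assumptions, the first convolution layer holds 896 parameters and the first involution layer holds 26, which agrees with the <code>Param #</code> values that <code>summary()</code> reports for <code>conv2d_6</code> and <code>inv_1</code>. The involution count does not grow with the number of output channels, because the generated kernel is shared across the channels of each group.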
<div class="codehilite"><pre><code>conv_model.summary()

inv_model.summary()
</code></pre></div>
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">Model: "sequential_3"
</pre>
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape              ┃    Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ conv2d_6 (Conv2D)               │ (None, 32, 32, 32)        │        896 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ relu1 (ReLU)                    │ (None, 32, 32, 32)        │          0 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ max_pooling2d (MaxPooling2D)    │ (None, 16, 16, 32)        │          0 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ conv2d_7 (Conv2D)               │ (None, 16, 16, 64)        │     18,496 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ relu2 (ReLU)                    │ (None, 16, 16, 64)        │          0 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ max_pooling2d_1 (MaxPooling2D)  │ (None, 8, 8, 64)          │          0 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ conv2d_8 (Conv2D)               │ (None, 8, 8, 64)          │     36,928 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ relu3 (ReLU)                    │ (None, 8, 8, 64)          │          0 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ flatten (Flatten)               │ (None, 4096)              │          0 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ dense (Dense)                   │ (None, 64)                │    262,208 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ dense_1 (Dense)                 │ (None, 10)                │        650 │
└─────────────────────────────────┴───────────────────────────┴────────────┘
</pre>
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace"> Total params: 957,536 (3.65 MB)
</pre>
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace"> Trainable params: 319,178 (1.22 MB)
</pre>
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace"> Non-trainable params: 0 (0.00 B)
</pre>
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace"> Optimizer params: 638,358 (2.44 MB)
</pre>
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">Model: "inv_model"
</pre>
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape              ┃    Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ input_layer_4 (InputLayer)      │ (None, 32, 32, 3)         │          0 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ inv_1 (Involution)              │ [(None, 32, 32, 3),       │         26 │
│                                 │ (None, 32, 32, 9, 1, 1)]  │            │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ re_lu_4 (ReLU)                  │ (None, 32, 32, 3)         │          0 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ max_pooling2d_2 (MaxPooling2D)  │ (None, 16, 16, 3)         │          0 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ inv_2 (Involution)              │ [(None, 16, 16, 3),       │         26 │
│                                 │ (None, 16, 16, 9, 1, 1)]  │            │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ re_lu_6 (ReLU)                  │ (None, 16, 16, 3)         │          0 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ max_pooling2d_3 (MaxPooling2D)  │ (None, 8, 8, 3)           │          0 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ inv_3 (Involution)              │ [(None, 8, 8, 3), (None,  │         26 │
│                                 │ 8, 8, 9, 1, 1)]           │            │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ re_lu_8 (ReLU)                  │ (None, 8, 8, 3)           │          0 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ flatten_1 (Flatten)             │ (None, 192)               │          0 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ dense_2 (Dense)                 │ (None, 64)                │     12,352 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ dense_3 (Dense)                 │ (None, 10)                │        650 │
└─────────────────────────────────┴───────────────────────────┴────────────┘
</pre>
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace"> Total params: 39,230 (153.25 KB)
</pre>
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace"> Trainable params: 13,074 (51.07 KB)
</pre>
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace"> Non-trainable params: 6 (24.00 B)
</pre>
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace"> Optimizer params: 26,150 (102.15 KB)
</pre>
<h3 id="loss-and-accuracy-plots">Loss and Accuracy Plots</h3>
The loss and accuracy plots below show that INNs, which have far fewer parameters, are slow learners.
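The curves can also be summarized numerically. The following is a minimal sketch, assuming a hypothetical helper <code>summarize</code> and using the involution validation accuracies copied from the training log above; with a real run one would pass <code>inv_hist.history["val_accuracy"]</code> and <code>conv_hist.history["val_accuracy"]</code> instead of a hard-coded list.

```python
def summarize(values):
    """Return (best_epoch, best_value, mean_gain_per_epoch) for a metric to maximize."""
    best_index = max(range(len(values)), key=values.__getitem__)
    mean_gain = (values[-1] - values[0]) / (len(values) - 1)
    # Epochs are reported 1-based, as in the Keras training log.
    return best_index + 1, values[best_index], mean_gain


# Validation accuracy of the involution model, one value per epoch.
inv_val_accuracy = [
    0.2716, 0.3478, 0.3782, 0.3901, 0.4007,
    0.4108, 0.4202, 0.4312, 0.4316, 0.4346,
    0.4388, 0.4443, 0.4497, 0.4547, 0.4548,
    0.4581, 0.4598, 0.4575, 0.4676, 0.4719,
]

best_epoch, best_value, mean_gain = summarize(inv_val_accuracy)
print(f"best val_accuracy {best_value:.4f} at epoch {best_epoch}")
print(f"mean gain per epoch {mean_gain:.4f}")
```

For this log the best validation accuracy is reached in the final epoch, which suggests the involution model had not yet plateaued after 20 epochs.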
<div class="codehilite"><pre><code>plt.figure(figsize=(20, 5))

plt.subplot(1, 2, 1)
plt.title("Convolution Loss")
plt.plot(conv_hist.history["loss"], label="loss")
plt.plot(conv_hist.history["val_loss"], label="val_loss")
plt.legend()

plt.subplot(1, 2, 2)
plt.title("Involution Loss")
plt.plot(inv_hist.history["loss"], label="loss")
plt.plot(inv_hist.history["val_loss"], label="val_loss")
plt.legend()

plt.show()

plt.figure(figsize=(20, 5))

plt.subplot(1, 2, 1)
plt.title("Convolution Accuracy")
plt.plot(conv_hist.history["accuracy"], label="accuracy")
plt.plot(conv_hist.history["val_accuracy"], label="val_accuracy")
plt.legend()

plt.subplot(1, 2, 2)
plt.title("Involution Accuracy")
plt.plot(inv_hist.history["accuracy"], label="accuracy")
plt.plot(inv_hist.history["val_accuracy"], label="val_accuracy")
plt.legend()

plt.show()
</code></pre></div>
<img alt="png" src="/img/examples/vision/involution/involution_22_0.png" />
<img alt="png" src="/img/examples/vision/involution/involution_22_1.png" />
<hr />
<h2 id="visualizing-involution-kernels">Visualizing Involution Kernels</h2>
To visualize the kernels, we take the sum of the K×K values from each involution kernel. The representatives at all spatial locations together form the corresponding heat map.
The authors mention: "Our proposed involution is reminiscent of self-attention and essentially could become a generalized version of it."
Visualizing the kernel indeed yields an attention map of the image. The learned involution kernels provide attention to individual spatial positions of the input tensor. The location-specific property makes involution a generic space of models to which self-attention belongs.
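The reduction described above can be sketched with NumPy. This is an illustrative sketch rather than the example's own code: the kernel tensor is random stand-in data with the shape reported in the model summary, <code>(batch, height, width, K*K, 1, groups)</code>, the helper name <code>kernel_heatmap</code> is hypothetical, and the min-max scaling is only an assumption about how the map is prepared for display.

```python
import numpy as np


def kernel_heatmap(kernel):
    """Collapse a (B, H, W, K*K, 1, G) kernel tensor to a (B, H, W) map scaled to [0, 1]."""
    # One scalar per spatial location: sum over the K*K, singleton and group axes.
    heatmap = kernel.sum(axis=(-1, -2, -3))
    low = heatmap.min(axis=(1, 2), keepdims=True)
    high = heatmap.max(axis=(1, 2), keepdims=True)
    # Guard against a constant map, where max - min would be zero.
    return (heatmap - low) / np.maximum(high - low, 1e-12)


rng = np.random.default_rng(0)
kernel = rng.normal(size=(1, 32, 32, 9, 1, 1))  # stand-in for a generated kernel
heatmap = kernel_heatmap(kernel)
print(heatmap.shape)  # (1, 32, 32)
```

Each pixel of the resulting map corresponds to one spatial location of the input, which is why the map can be read as an attention map over the image.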
<div class="codehilite"><pre><code>layer_names = ["inv_1", "inv_2", "inv_3"]
# The second output of each Involution layer is its generated kernel.
outputs = [inv_model.get_layer(name).output[1] for name in layer_names]
vis_model = keras.Model(inv_model.input, outputs)

fig, axes = plt.subplots(nrows=10, ncols=4, figsize=(10, 30))

for ax, test_image in zip(axes, test_images[:10]):
    (inv1_kernel, inv2_kernel, inv3_kernel) = vis_model.predict(test_image[None, ...])
    # Sum over the last three axes to get one value per spatial location.
    inv1_kernel = tf.reduce_sum(inv1_kernel, axis=[-1, -2, -3])
    inv2_kernel = tf.reduce_sum(inv2_kernel, axis=[-1, -2, -3])
    inv3_kernel = tf.reduce_sum(inv3_kernel, axis=[-1, -2, -3])

    ax[0].imshow(keras.utils.array_to_img(test_image))
    ax[0].set_title("Input Image")

    ax[1].imshow(keras.utils.array_to_img(inv1_kernel[0, ..., None]))
    ax[1].set_title("Involution Kernel 1")

    ax[2].imshow(keras.utils.array_to_img(inv2_kernel[0, ..., None]))
    ax[2].set_title("Involution Kernel 2")

    ax[3].imshow(keras.utils.array_to_img(inv3_kernel[0, ..., None]))
    ax[3].set_title("Involution Kernel 3")
</code></pre></div>
<div class="k-default-codeblock">
<div class="codehilite"><pre><code> 1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 503ms/step
 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step
 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step
 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step
 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step
 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step
 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step
 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step
 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step
 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step
</code></pre></div>
</div>
<img alt="png" src="/img/examples/vision/involution/involution_24_1.png" />
<hr />
<h2 id="conclusions">Conclusions</h2>
In this example, the main focus was to build an <code>Involution</code> layer that can be easily reused. While our comparisons were based on a specific task, feel free to use the layer for different tasks and report your results.
In my opinion, the key takeaway of involution is its relationship with self-attention.
The intuition behind location-specific and channel-specific processing makes sense in a lot of tasks.
Moving forward, one can:
<ul>
<li>Watch <a href="https://youtu.be/pH2jZun8MoY">Yannic's video</a> on involution for a better understanding.</li>
<li>Experiment with the various hyperparameters of the involution layer.</li>
<li>Build different models with the involution layer.</li>
<li>Try building a different kernel generation method altogether.</li>
</ul>
You can use the trained model hosted on <a href="https://huggingface.co/keras-io/involution">Hugging Face Hub</a> and try the demo on <a href="https://huggingface.co/spaces/keras-io/involution">Hugging Face Spaces</a>.
</div>
<div class='k-outline'>
<div class='k-outline-depth-1'> <a href='#involutional-neural-networks'>Involutional neural networks</a> </div>
<div class='k-outline-depth-2'> ◆ <a href='#introduction'>Introduction</a> </div>
<div class='k-outline-depth-2'> ◆ <a href='#setup'>Setup</a> </div>
<div class='k-outline-depth-2'> ◆ <a href='#convolution'>Convolution</a> </div>
<div class='k-outline-depth-2'> ◆ <a href='#involution'>Involution</a> </div>
<div class='k-outline-depth-2'> ◆ <a href='#testing-the-involution-layer'>Testing the Involution layer</a> </div>
<div class='k-outline-depth-2'> ◆ <a href='#image-classification'>Image Classification</a> </div>
<div class='k-outline-depth-2'> ◆ <a href='#get-the-cifar10-dataset'>Get the CIFAR10 Dataset</a> </div>
<div class='k-outline-depth-2'> ◆ <a href='#visualise-the-data'>Visualise the data</a> </div>
<div class='k-outline-depth-2'> ◆ <a href='#convolutional-neural-network'>Convolutional Neural Network</a> </div>
<div class='k-outline-depth-2'> ◆ <a href='#involutional-neural-network'>Involutional Neural Network</a> </div>
<div class='k-outline-depth-2'> ◆ <a href='#comparisons'>Comparisons</a> </div>
<div class='k-outline-depth-3'> <a href='#parameters'>Parameters</a> </div>
<div class='k-outline-depth-3'> <a href='#loss-and-accuracy-plots'>Loss and
Accuracy Plots</a> </div> <div class='k-outline-depth-2'> ◆ <a href='#visualizing-involution-kernels'>Visualizing Involution Kernels</a> </div> <div class='k-outline-depth-2'> ◆ <a href='#conclusions'>Conclusions</a> </div> </div> </div> </div> </div> </body> <footer style="float: left; width: 100%; padding: 1em; border-top: solid 1px #bbb;"> <a href="https://policies.google.com/terms">Terms</a> | <a href="https://policies.google.com/privacy">Privacy</a> </footer> </html>