<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="description" content="Keras documentation"> <meta name="author" content="Keras Team"> <link rel="shortcut icon" href="https://keras.io/img/favicon.ico"> <link rel="canonical" href="https://keras.io/examples/vision/" /> <!-- Social --> <meta property="og:title" content="Keras documentation: Computer Vision"> <meta property="og:image" content="https://keras.io/img/logo-k-keras-wb.png"> <meta name="twitter:title" content="Keras documentation: Computer Vision"> <meta name="twitter:image" content="https://keras.io/img/k-keras-social.png"> <meta name="twitter:card" content="summary"> <title>Computer Vision</title> <!-- Bootstrap core CSS --> <link href="/css/bootstrap.min.css" rel="stylesheet"> <!-- Custom fonts for this template --> <link href="https://fonts.googleapis.com/css2?family=Open+Sans:wght@400;600;700;800&display=swap" rel="stylesheet"> <!-- Custom styles for this template --> <link href="/css/docs.css" rel="stylesheet"> <link href="/css/monokai.css" rel="stylesheet"> <!-- Google Tag Manager --> <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start': new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0], j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src= 'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f); })(window,document,'script','dataLayer','GTM-5DNGF4N'); </script> <script> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','https://www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-175165319-128', 'auto'); ga('send', 'pageview'); </script> <!-- End Google Tag Manager --> <script async defer 
src="https://buttons.github.io/buttons.js"></script> </head> <body> <!-- Google Tag Manager (noscript) --> <noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-5DNGF4N" height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript> <!-- End Google Tag Manager (noscript) --> <div class='k-page'> <div class="k-nav" id="nav-menu"> <a href='/'><img src='/img/logo-small.png' class='logo-small' /></a> <div class="nav flex-column nav-pills" role="tablist" aria-orientation="vertical"> <a class="nav-link" href="/about/" role="tab" aria-selected="">About Keras</a> <a class="nav-link" href="/getting_started/" role="tab" aria-selected="">Getting started</a> <a class="nav-link" href="/guides/" role="tab" aria-selected="">Developer guides</a> <a class="nav-link" href="/api/" role="tab" aria-selected="">Keras 3 API documentation</a> <a class="nav-link" href="/2.18/api/" role="tab" aria-selected="">Keras 2 API documentation</a> <a class="nav-link active" href="/examples/" role="tab" aria-selected="">Code examples</a> <a class="nav-sublink active" href="/examples/vision/">Computer Vision</a> <a class="nav-sublink2" href="/examples/vision/image_classification_from_scratch/">Image classification from scratch</a> <a class="nav-sublink2" href="/examples/vision/mnist_convnet/">Simple MNIST convnet</a> <a class="nav-sublink2" href="/examples/vision/image_classification_efficientnet_fine_tuning/">Image classification via fine-tuning with EfficientNet</a> <a class="nav-sublink2" href="/examples/vision/image_classification_with_vision_transformer/">Image classification with Vision Transformer</a> <a class="nav-sublink2" href="/examples/vision/attention_mil_classification/">Classification using Attention-based Deep Multiple Instance Learning</a> <a class="nav-sublink2" href="/examples/vision/mlp_image_classification/">Image classification with modern MLP models</a> <a class="nav-sublink2" href="/examples/vision/mobilevit/">A mobile-friendly 
Transformer-based model for image classification</a> <a class="nav-sublink2" href="/examples/vision/xray_classification_with_tpus/">Pneumonia Classification on TPU</a> <a class="nav-sublink2" href="/examples/vision/cct/">Compact Convolutional Transformers</a> <a class="nav-sublink2" href="/examples/vision/convmixer/">Image classification with ConvMixer</a> <a class="nav-sublink2" href="/examples/vision/eanet/">Image classification with EANet (External Attention Transformer)</a> <a class="nav-sublink2" href="/examples/vision/involution/">Involutional neural networks</a> <a class="nav-sublink2" href="/examples/vision/perceiver_image_classification/">Image classification with Perceiver</a> <a class="nav-sublink2" href="/examples/vision/reptile/">Few-Shot learning with Reptile</a> <a class="nav-sublink2" href="/examples/vision/semisupervised_simclr/">Semi-supervised image classification using contrastive pretraining with SimCLR</a> <a class="nav-sublink2" href="/examples/vision/swin_transformers/">Image classification with Swin Transformers</a> <a class="nav-sublink2" href="/examples/vision/vit_small_ds/">Train a Vision Transformer on small datasets</a> <a class="nav-sublink2" href="/examples/vision/shiftvit/">A Vision Transformer without Attention</a> <a class="nav-sublink2" href="/examples/vision/image_classification_using_global_context_vision_transformer/">Image Classification using Global Context Vision Transformer</a> <a class="nav-sublink2" href="/examples/vision/oxford_pets_image_segmentation/">Image segmentation with a U-Net-like architecture</a> <a class="nav-sublink2" href="/examples/vision/deeplabv3_plus/">Multiclass semantic segmentation using DeepLabV3+</a> <a class="nav-sublink2" href="/examples/vision/basnet_segmentation/">Highly accurate boundaries segmentation using BASNet</a> <a class="nav-sublink2" href="/examples/vision/fully_convolutional_network/">Image Segmentation using Composable Fully-Convolutional Networks</a> <a class="nav-sublink2" 
href="/examples/vision/retinanet/">Object Detection with RetinaNet</a> <a class="nav-sublink2" href="/examples/vision/keypoint_detection/">Keypoint Detection with Transfer Learning</a> <a class="nav-sublink2" href="/examples/vision/object_detection_using_vision_transformer/">Object detection with Vision Transformers</a> <a class="nav-sublink2" href="/examples/vision/3D_image_classification/">3D image classification from CT scans</a> <a class="nav-sublink2" href="/examples/vision/depth_estimation/">Monocular depth estimation</a> <a class="nav-sublink2" href="/examples/vision/nerf/">3D volumetric rendering with NeRF</a> <a class="nav-sublink2" href="/examples/vision/pointnet_segmentation/">Point cloud segmentation with PointNet</a> <a class="nav-sublink2" href="/examples/vision/pointnet/">Point cloud classification</a> <a class="nav-sublink2" href="/examples/vision/captcha_ocr/">OCR model for reading Captchas</a> <a class="nav-sublink2" href="/examples/vision/handwriting_recognition/">Handwriting recognition</a> <a class="nav-sublink2" href="/examples/vision/autoencoder/">Convolutional autoencoder for image denoising</a> <a class="nav-sublink2" href="/examples/vision/mirnet/">Low-light image enhancement using MIRNet</a> <a class="nav-sublink2" href="/examples/vision/super_resolution_sub_pixel/">Image Super-Resolution using an Efficient Sub-Pixel CNN</a> <a class="nav-sublink2" href="/examples/vision/edsr/">Enhanced Deep Residual Networks for single-image super-resolution</a> <a class="nav-sublink2" href="/examples/vision/zero_dce/">Zero-DCE for low-light image enhancement</a> <a class="nav-sublink2" href="/examples/vision/cutmix/">CutMix data augmentation for image classification</a> <a class="nav-sublink2" href="/examples/vision/mixup/">MixUp augmentation for image classification</a> <a class="nav-sublink2" href="/examples/vision/randaugment/">RandAugment for Image Classification for Improved Robustness</a> <a class="nav-sublink2" 
href="/examples/vision/image_captioning/">Image captioning</a> <a class="nav-sublink2" href="/examples/vision/nl_image_search/">Natural language image search with a Dual Encoder</a> <a class="nav-sublink2" href="/examples/vision/visualizing_what_convnets_learn/">Visualizing what convnets learn</a> <a class="nav-sublink2" href="/examples/vision/integrated_gradients/">Model interpretability with Integrated Gradients</a> <a class="nav-sublink2" href="/examples/vision/probing_vits/">Investigating Vision Transformer representations</a> <a class="nav-sublink2" href="/examples/vision/grad_cam/">Grad-CAM class activation visualization</a> <a class="nav-sublink2" href="/examples/vision/near_dup_search/">Near-duplicate image search</a> <a class="nav-sublink2" href="/examples/vision/semantic_image_clustering/">Semantic Image Clustering</a> <a class="nav-sublink2" href="/examples/vision/siamese_contrastive/">Image similarity estimation using a Siamese Network with a contrastive loss</a> <a class="nav-sublink2" href="/examples/vision/siamese_network/">Image similarity estimation using a Siamese Network with a triplet loss</a> <a class="nav-sublink2" href="/examples/vision/metric_learning/">Metric learning for image similarity search</a> <a class="nav-sublink2" href="/examples/vision/metric_learning_tf_similarity/">Metric learning for image similarity search using TensorFlow Similarity</a> <a class="nav-sublink2" href="/examples/vision/nnclr/">Self-supervised contrastive learning with NNCLR</a> <a class="nav-sublink2" href="/examples/vision/video_classification/">Video Classification with a CNN-RNN Architecture</a> <a class="nav-sublink2" href="/examples/vision/conv_lstm/">Next-Frame Video Prediction with Convolutional LSTMs</a> <a class="nav-sublink2" href="/examples/vision/video_transformers/">Video Classification with Transformers</a> <a class="nav-sublink2" href="/examples/vision/vivit/">Video Vision Transformer</a> <a class="nav-sublink2" href="/examples/vision/bit/">Image 
Classification using BigTransfer (BiT)</a> <a class="nav-sublink2" href="/examples/vision/gradient_centralization/">Gradient Centralization for Better Training Performance</a> <a class="nav-sublink2" href="/examples/vision/token_learner/">Learning to tokenize in Vision Transformers</a> <a class="nav-sublink2" href="/examples/vision/knowledge_distillation/">Knowledge Distillation</a> <a class="nav-sublink2" href="/examples/vision/fixres/">FixRes: Fixing train-test resolution discrepancy</a> <a class="nav-sublink2" href="/examples/vision/cait/">Class Attention Image Transformers with LayerScale</a> <a class="nav-sublink2" href="/examples/vision/patch_convnet/">Augmenting convnets with aggregated attention</a> <a class="nav-sublink2" href="/examples/vision/learnable_resizer/">Learning to Resize</a> <a class="nav-sublink2" href="/examples/vision/adamatch/">Semi-supervision and domain adaptation with AdaMatch</a> <a class="nav-sublink2" href="/examples/vision/barlow_twins/">Barlow Twins for Contrastive SSL</a> <a class="nav-sublink2" href="/examples/vision/consistency_training/">Consistency training with supervision</a> <a class="nav-sublink2" href="/examples/vision/deit/">Distilling Vision Transformers</a> <a class="nav-sublink2" href="/examples/vision/focal_modulation_network/">Focal Modulation: A replacement for Self-Attention</a> <a class="nav-sublink2" href="/examples/vision/forwardforward/">Using the Forward-Forward Algorithm for Image Classification</a> <a class="nav-sublink2" href="/examples/vision/masked_image_modeling/">Masked image modeling with Autoencoders</a> <a class="nav-sublink2" href="/examples/vision/sam/">Segment Anything Model with 🤗Transformers</a> <a class="nav-sublink2" href="/examples/vision/segformer/">Semantic segmentation with SegFormer and Hugging Face Transformers</a> <a class="nav-sublink2" href="/examples/vision/simsiam/">Self-supervised contrastive learning with SimSiam</a> <a class="nav-sublink2" 
href="/examples/vision/supervised-contrastive-learning/">Supervised Contrastive Learning</a> <a class="nav-sublink2" href="/examples/vision/temporal_latent_bottleneck/">When Recurrence meets Transformers</a> <a class="nav-sublink2" href="/examples/vision/yolov8/">Efficient Object Detection with YOLOV8 and KerasCV</a> <a class="nav-sublink" href="/examples/nlp/">Natural Language Processing</a> <a class="nav-sublink" href="/examples/structured_data/">Structured Data</a> <a class="nav-sublink" href="/examples/timeseries/">Timeseries</a> <a class="nav-sublink" href="/examples/generative/">Generative Deep Learning</a> <a class="nav-sublink" href="/examples/audio/">Audio Data</a> <a class="nav-sublink" href="/examples/rl/">Reinforcement Learning</a> <a class="nav-sublink" href="/examples/graph/">Graph Data</a> <a class="nav-sublink" href="/examples/keras_recipes/">Quick Keras Recipes</a> <a class="nav-link" href="/keras_tuner/" role="tab" aria-selected="">KerasTuner: Hyperparameter Tuning</a> <a class="nav-link" href="/keras_hub/" role="tab" aria-selected="">KerasHub: Pretrained Models</a> <a class="nav-link" href="/keras_cv/" role="tab" aria-selected="">KerasCV: Computer Vision Workflows</a> <a class="nav-link" href="/keras_nlp/" role="tab" aria-selected="">KerasNLP: Natural Language Workflows</a> </div> </div> <div class='k-main'> <div class='k-main-top'> <script> function displayDropdownMenu() { e = document.getElementById("nav-menu"); if (e.style.display == "block") { e.style.display = "none"; } else { e.style.display = "block"; document.getElementById("dropdown-nav").style.display = "block"; } } function resetMobileUI() { if (window.innerWidth <= 840) { document.getElementById("nav-menu").style.display = "none"; document.getElementById("dropdown-nav").style.display = "block"; } else { document.getElementById("nav-menu").style.display = "block"; document.getElementById("dropdown-nav").style.display = "none"; } var navmenu = document.getElementById("nav-menu"); var 
menuheight = navmenu.clientHeight; var kmain = document.getElementById("k-main-id"); kmain.style.minHeight = (menuheight + 100) + 'px'; } window.onresize = resetMobileUI; window.addEventListener("load", (event) => { resetMobileUI() }); </script> <div id='dropdown-nav' onclick="displayDropdownMenu();"> <svg viewBox="-20 -20 120 120" width="60" height="60"> <rect width="100" height="20"></rect> <rect y="30" width="100" height="20"></rect> <rect y="60" width="100" height="20"></rect> </svg> </div> <form class="bd-search d-flex align-items-center k-search-form" id="search-form"> <input type="search" class="k-search-input" id="search-input" placeholder="Search Keras documentation..." aria-label="Search Keras documentation..." autocomplete="off"> <button class="k-search-btn"> <svg width="13" height="13" viewBox="0 0 13 13"><title>search</title><path d="m4.8495 7.8226c0.82666 0 1.5262-0.29146 2.0985-0.87438 0.57232-0.58292 0.86378-1.2877 0.87438-2.1144 0.010599-0.82666-0.28086-1.5262-0.87438-2.0985-0.59352-0.57232-1.293-0.86378-2.0985-0.87438-0.8055-0.010599-1.5103 0.28086-2.1144 0.87438-0.60414 0.59352-0.8956 1.293-0.87438 2.0985 0.021197 0.8055 0.31266 1.5103 0.87438 2.1144 0.56172 0.60414 1.2665 0.8956 2.1144 0.87438zm4.4695 0.2115 3.681 3.6819-1.259 1.284-3.6817-3.7 0.0019784-0.69479-0.090043-0.098846c-0.87973 0.76087-1.92 1.1413-3.1207 1.1413-1.3553 0-2.5025-0.46363-3.4417-1.3909s-1.4088-2.0686-1.4088-3.4239c0-1.3553 0.4696-2.4966 1.4088-3.4239 0.9392-0.92727 2.0864-1.3969 3.4417-1.4088 1.3553-0.011889 2.4906 0.45771 3.406 1.4088 0.9154 0.95107 1.379 2.0924 1.3909 3.4239 0 1.2126-0.38043 2.2588-1.1413 3.1385l0.098834 0.090049z"></path></svg> </button> </form> <script> var form = document.getElementById('search-form'); form.onsubmit = function(e) { e.preventDefault(); var query = document.getElementById('search-input').value; window.location.href = '/search.html?query=' + encodeURIComponent(query); return false; } </script> </div> <div class='k-main-inner' id='k-main-id'> <div 
class='k-location-slug'> <span class="k-location-slug-pointer">►</span> <a href='/examples/'>Code examples</a> / Computer Vision </div> <div class='k-content'> <h2><a href="/examples/vision/">Computer Vision</a></h2> <h3 class="example-subcategory-title">Image classification</h3> <a href="/examples/vision/image_classification_from_scratch"> <div class="example-card"> <div class="example-highlight">★</div> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification from scratch </div> </div> </a> <a href="/examples/vision/mnist_convnet"> <div class="example-card"> <div class="example-highlight">★</div> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Simple MNIST convnet </div> </div> </a> <a href="/examples/vision/image_classification_efficientnet_fine_tuning"> <div class="example-card"> <div class="example-highlight">★</div> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification via fine-tuning with EfficientNet </div> </div> </a> <a href="/examples/vision/image_classification_with_vision_transformer"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification with Vision Transformer </div> </div> </a> <a href="/examples/vision/attention_mil_classification"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Classification using Attention-based Deep Multiple Instance Learning </div> </div> </a> <a href="/examples/vision/mlp_image_classification"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification with modern MLP models </div> </div> </a> <a href="/examples/vision/mobilevit"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> A mobile-friendly Transformer-based model for image classification 
</div> </div> </a> <a href="/examples/vision/xray_classification_with_tpus"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Pneumonia Classification on TPU </div> </div> </a> <a href="/examples/vision/cct"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Compact Convolutional Transformers </div> </div> </a> <a href="/examples/vision/convmixer"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification with ConvMixer </div> </div> </a> <a href="/examples/vision/eanet"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification with EANet (External Attention Transformer) </div> </div> </a> <a href="/examples/vision/involution"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Involutional neural networks </div> </div> </a> <a href="/examples/vision/perceiver_image_classification"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification with Perceiver </div> </div> </a> <a href="/examples/vision/reptile"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Few-Shot learning with Reptile </div> </div> </a> <a href="/examples/vision/semisupervised_simclr"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Semi-supervised image classification using contrastive pretraining with SimCLR </div> </div> </a> <a href="/examples/vision/swin_transformers"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification with Swin Transformers </div> </div> </a> <a href="/examples/vision/vit_small_ds"> <div class="example-card"> 
<div class="example-highlight">V2</div> <div class="example-card-title"> Train a Vision Transformer on small datasets </div> </div> </a> <a href="/examples/vision/shiftvit"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> A Vision Transformer without Attention </div> </div> </a> <a href="/examples/vision/image_classification_using_global_context_vision_transformer"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image Classification using Global Context Vision Transformer </div> </div> </a> <a href="/examples/vision/bit"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image Classification using BigTransfer (BiT) </div> </div> </a> <h3 class="example-subcategory-title">Image segmentation</h3> <a href="/examples/vision/oxford_pets_image_segmentation"> <div class="example-card"> <div class="example-highlight">★</div> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image segmentation with a U-Net-like architecture </div> </div> </a> <a href="/examples/vision/deeplabv3_plus"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Multiclass semantic segmentation using DeepLabV3+ </div> </div> </a> <a href="/examples/vision/basnet_segmentation"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Highly accurate boundaries segmentation using BASNet </div> </div> </a> <a href="/examples/vision/fully_convolutional_network"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image Segmentation using Composable Fully-Convolutional Networks </div> </div> </a> <h3 class="example-subcategory-title">Object detection</h3> <a href="/examples/vision/retinanet"> <div class="example-card"> <div class="example-highlight">V2</div> 
<div class="example-card-title"> Object Detection with RetinaNet </div> </div> </a> <a href="/examples/vision/keypoint_detection"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Keypoint Detection with Transfer Learning </div> </div> </a> <a href="/examples/vision/object_detection_using_vision_transformer"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Object detection with Vision Transformers </div> </div> </a> <h3 class="example-subcategory-title">3D</h3> <a href="/examples/vision/3D_image_classification"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> 3D image classification from CT scans </div> </div> </a> <a href="/examples/vision/depth_estimation"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Monocular depth estimation </div> </div> </a> <a href="/examples/vision/nerf"> <div class="example-card"> <div class="example-highlight">★</div> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> 3D volumetric rendering with NeRF </div> </div> </a> <a href="/examples/vision/pointnet_segmentation"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Point cloud segmentation with PointNet </div> </div> </a> <a href="/examples/vision/pointnet"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Point cloud classification </div> </div> </a> <h3 class="example-subcategory-title">OCR</h3> <a href="/examples/vision/captcha_ocr"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> OCR model for reading Captchas </div> </div> </a> <a href="/examples/vision/handwriting_recognition"> <div class="example-card"> <div 
class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Handwriting recognition </div> </div> </a> <h3 class="example-subcategory-title">Image enhancement</h3> <a href="/examples/vision/autoencoder"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Convolutional autoencoder for image denoising </div> </div> </a> <a href="/examples/vision/mirnet"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Low-light image enhancement using MIRNet </div> </div> </a> <a href="/examples/vision/super_resolution_sub_pixel"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image Super-Resolution using an Efficient Sub-Pixel CNN </div> </div> </a> <a href="/examples/vision/edsr"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Enhanced Deep Residual Networks for single-image super-resolution </div> </div> </a> <a href="/examples/vision/zero_dce"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Zero-DCE for low-light image enhancement </div> </div> </a> <h3 class="example-subcategory-title">Data augmentation</h3> <a href="/examples/vision/cutmix"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> CutMix data augmentation for image classification </div> </div> </a> <a href="/examples/vision/mixup"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> MixUp augmentation for image classification </div> </div> </a> <a href="/examples/vision/randaugment"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> RandAugment for Image Classification for Improved Robustness </div> </div> </a> <h3 
class="example-subcategory-title">Image & Text</h3> <a href="/examples/vision/image_captioning"> <div class="example-card"> <div class="example-highlight">★</div> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image captioning </div> </div> </a> <a href="/examples/vision/nl_image_search"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Natural language image search with a Dual Encoder </div> </div> </a> <h3 class="example-subcategory-title">Vision models interpretability</h3> <a href="/examples/vision/visualizing_what_convnets_learn"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Visualizing what convnets learn </div> </div> </a> <a href="/examples/vision/integrated_gradients"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Model interpretability with Integrated Gradients </div> </div> </a> <a href="/examples/vision/probing_vits"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Investigating Vision Transformer representations </div> </div> </a> <a href="/examples/vision/grad_cam"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Grad-CAM class activation visualization </div> </div> </a> <h3 class="example-subcategory-title">Image similarity search</h3> <a href="/examples/vision/near_dup_search"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Near-duplicate image search </div> </div> </a> <a href="/examples/vision/semantic_image_clustering"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Semantic Image Clustering </div> </div> </a> <a href="/examples/vision/siamese_contrastive"> <div class="example-card"> <div 
class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image similarity estimation using a Siamese Network with a contrastive loss </div> </div> </a> <a href="/examples/vision/siamese_network"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image similarity estimation using a Siamese Network with a triplet loss </div> </div> </a> <a href="/examples/vision/metric_learning"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Metric learning for image similarity search </div> </div> </a> <a href="/examples/vision/metric_learning_tf_similarity"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Metric learning for image similarity search using TensorFlow Similarity </div> </div> </a> <a href="/examples/vision/nnclr"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Self-supervised contrastive learning with NNCLR </div> </div> </a> <h3 class="example-subcategory-title">Video</h3> <a href="/examples/vision/video_classification"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Video Classification with a CNN-RNN Architecture </div> </div> </a> <a href="/examples/vision/conv_lstm"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Next-Frame Video Prediction with Convolutional LSTMs </div> </div> </a> <a href="/examples/vision/video_transformers"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Video Classification with Transformers </div> </div> </a> <a href="/examples/vision/vivit"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Video Vision Transformer </div> </div> </a> <h3 
class="example-subcategory-title">Performance recipes</h3> <a href="/examples/vision/gradient_centralization"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Gradient Centralization for Better Training Performance </div> </div> </a> <a href="/examples/vision/token_learner"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Learning to tokenize in Vision Transformers </div> </div> </a> <a href="/examples/vision/knowledge_distillation"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Knowledge Distillation </div> </div> </a> <a href="/examples/vision/fixres"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> FixRes: Fixing train-test resolution discrepancy </div> </div> </a> <a href="/examples/vision/cait"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Class Attention Image Transformers with LayerScale </div> </div> </a> <a href="/examples/vision/patch_convnet"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Augmenting convnets with aggregated attention </div> </div> </a> <a href="/examples/vision/learnable_resizer"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Learning to Resize </div> </div> </a> <h3 class="example-subcategory-title">Other</h3> <a href="/examples/vision/adamatch"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Semi-supervision and domain adaptation with AdaMatch </div> </div> </a> <a href="/examples/vision/barlow_twins"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Barlow Twins for Contrastive SSL </div> </div> </a> <a 
href="/examples/vision/consistency_training"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Consistency training with supervision </div> </div> </a> <a href="/examples/vision/deit"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Distilling Vision Transformers </div> </div> </a> <a href="/examples/vision/focal_modulation_network"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Focal Modulation: A replacement for Self-Attention </div> </div> </a> <a href="/examples/vision/forwardforward"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Using the Forward-Forward Algorithm for Image Classification </div> </div> </a> <a href="/examples/vision/masked_image_modeling"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Masked image modeling with Autoencoders </div> </div> </a> <a href="/examples/vision/sam"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Segment Anything Model with 🤗Transformers </div> </div> </a> <a href="/examples/vision/segformer"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Semantic segmentation with SegFormer and Hugging Face Transformers </div> </div> </a> <a href="/examples/vision/simsiam"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Self-supervised contrastive learning with SimSiam </div> </div> </a> <a href="/examples/vision/supervised-contrastive-learning"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Supervised Contrastive Learning </div> </div> </a> <a href="/examples/vision/temporal_latent_bottleneck"> <div class="example-card"> <div class="example-highlight">V2</div> 
<div class="example-card-title"> When Recurrence meets Transformers </div> </div> </a> <a href="/examples/vision/yolov8"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Efficient Object Detection with YOLOV8 and KerasCV </div> </div> </a> <hr> </div> </div> </div> </div> <footer style="float: left; width: 100%; padding: 1em; border-top: solid 1px #bbb;"> <a href="https://policies.google.com/terms">Terms</a> | <a href="https://policies.google.com/privacy">Privacy</a> </footer> </body> </html>