<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="description" content="Keras documentation"> <meta name="author" content="Keras Team"> <link rel="shortcut icon" href="https://keras.io/img/favicon.ico"> <link rel="canonical" href="https://keras.io/examples/vision/" /> <!-- Social --> <meta property="og:title" content="Keras documentation: Computer Vision"> <meta property="og:image" content="https://keras.io/img/logo-k-keras-wb.png"> <meta name="twitter:title" content="Keras documentation: Computer Vision"> <meta name="twitter:image" content="https://keras.io/img/k-keras-social.png"> <meta name="twitter:card" content="summary"> <title>Computer Vision</title> <!-- Custom fonts for this template --> <link href="https://fonts.googleapis.com/css2?family=Open+Sans:wght@400;600;700;800&display=swap" rel="stylesheet"> <link href="https://fonts.googleapis.com/css2?family=Montserrat:wght@300;400;600;700;800&display=swap" rel="stylesheet"> <link href="https://fonts.googleapis.com/css2?family=Roboto+Mono:wght@400&display=swap" rel="stylesheet"> <!-- Bootstrap core CSS --> <link href="/css/bootstrap.min.css" rel="stylesheet"> <!-- Custom styles for this template --> <link href="/css/docs.css?v=3" rel="stylesheet"> <link href="/css/monokai.css" rel="stylesheet"> <!-- Google Tag Manager --> <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start': new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0], j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src= 'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f); })(window,document,'script','dataLayer','GTM-5DNGF4N'); </script> <script> (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) 
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-175165319-128', 'auto'); ga('send', 'pageview'); </script> <!-- End Google Tag Manager --> <script async defer src="https://buttons.github.io/buttons.js"></script> <link rel="preconnect" href="https://fonts.googleapis.com"> <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin> </head> <body> <!-- Google Tag Manager (noscript) --> <noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-5DNGF4N" height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript> <!-- End Google Tag Manager (noscript) --> <div class="k-page"> <div class="hidden"></div> <nav class="nav__container"> <div class="nav__wrapper"> <div class="nav__controls--mobile"> <button class="nav__menu--button"><i class="icon--menu"></i></button> <button class="nav__menu--close"><i class="icon--close"></i></button> <a href="/"> <img class="nav__logo nav__logo--mobile" src="/img/k-logo.png" /> </a> <button class="nav__search--mobile"> <i class="icon__search--mobile"></i> </button> </div> <form class="nav__search nav__search-form--mobile"> <input class="nav__search--input" type="search" placeholder="SEARCH" aria-label="Search" /> <button class="nav__search--button" type="submit"> <i class="icon--search"></i> </button> </form> <a href="/"> <img class="nav__logo nav__logo--desktop" src="/img/logo.png" alt="keras.io logo" /> </a> <div class="nav__menu"> <ul class="nav__item--container"> <li class="nav__item"> <a class="nav__link" href="/getting_started/">Get started</a> </li> <li class="nav__item"> <a class="nav__link" href="/guides/">Guides</a> </li> <li class="nav__item"> <a class="nav__link" href="/api/">API Docs</a> </li> <li class="nav__item"> <a class="nav__link nav__link--active" href="/examples/">Examples</a> </li> <li class="nav__item"> <a class="nav__link" href="/keras_tuner/">Keras Tuner</a> </li> <li class="nav__item"> <a class="nav__link" href="/keras_hub/">Keras Hub</a> </li> </ul> <form class="nav__search"> <input class="nav__search--input" type="search" placeholder="SEARCH" aria-label="Search" /> <button class="nav__search--button" type="submit"> <i class="icon--search"></i> </button> </form> </div> </div> </nav> <div class="page__container flex__container"> <div class="nav__side-nav" id="nav-menu"> <div class="nav flex-column nav-pills" role="tablist" aria-orientation="vertical"> <a class="nav-link active" href="/examples/" role="tab" aria-selected=""> Code examples </a> <div class="nav-expanded-panel"> <a class="nav-sublink active" href="/examples/vision/">Computer Vision</a> <a class="nav-sublink2" href="/examples/vision/image_classification_from_scratch/">Image classification from scratch</a> <a class="nav-sublink2" href="/examples/vision/mnist_convnet/">Simple MNIST convnet</a> <a class="nav-sublink2" href="/examples/vision/image_classification_efficientnet_fine_tuning/">Image classification via fine-tuning with EfficientNet</a> <a class="nav-sublink2" href="/examples/vision/image_classification_with_vision_transformer/">Image classification with Vision Transformer</a> <a class="nav-sublink2" href="/examples/vision/attention_mil_classification/">Classification using Attention-based Deep Multiple Instance Learning</a> <a class="nav-sublink2"
href="/examples/vision/mlp_image_classification/">Image classification with modern MLP models</a> <a class="nav-sublink2" href="/examples/vision/mobilevit/">A mobile-friendly Transformer-based model for image classification</a> <a class="nav-sublink2" href="/examples/vision/xray_classification_with_tpus/">Pneumonia Classification on TPU</a> <a class="nav-sublink2" href="/examples/vision/cct/">Compact Convolutional Transformers</a> <a class="nav-sublink2" href="/examples/vision/convmixer/">Image classification with ConvMixer</a> <a class="nav-sublink2" href="/examples/vision/eanet/">Image classification with EANet (External Attention Transformer)</a> <a class="nav-sublink2" href="/examples/vision/involution/">Involutional neural networks</a> <a class="nav-sublink2" href="/examples/vision/perceiver_image_classification/">Image classification with Perceiver</a> <a class="nav-sublink2" href="/examples/vision/reptile/">Few-Shot learning with Reptile</a> <a class="nav-sublink2" href="/examples/vision/semisupervised_simclr/">Semi-supervised image classification using contrastive pretraining with SimCLR</a> <a class="nav-sublink2" href="/examples/vision/swin_transformers/">Image classification with Swin Transformers</a> <a class="nav-sublink2" href="/examples/vision/vit_small_ds/">Train a Vision Transformer on small datasets</a> <a class="nav-sublink2" href="/examples/vision/shiftvit/">A Vision Transformer without Attention</a> <a class="nav-sublink2" href="/examples/vision/image_classification_using_global_context_vision_transformer/">Image Classification using Global Context Vision Transformer</a> <a class="nav-sublink2" href="/examples/vision/temporal_latent_bottleneck/">When Recurrence meets Transformers</a> <a class="nav-sublink2" href="/examples/vision/oxford_pets_image_segmentation/">Image segmentation with a U-Net-like architecture</a> <a class="nav-sublink2" href="/examples/vision/deeplabv3_plus/">Multiclass semantic segmentation using DeepLabV3+</a> <a 
class="nav-sublink2" href="/examples/vision/basnet_segmentation/">Highly accurate boundaries segmentation using BASNet</a> <a class="nav-sublink2" href="/examples/vision/fully_convolutional_network/">Image Segmentation using Composable Fully-Convolutional Networks</a> <a class="nav-sublink2" href="/examples/vision/retinanet/">Object Detection with RetinaNet</a> <a class="nav-sublink2" href="/examples/vision/keypoint_detection/">Keypoint Detection with Transfer Learning</a> <a class="nav-sublink2" href="/examples/vision/object_detection_using_vision_transformer/">Object detection with Vision Transformers</a> <a class="nav-sublink2" href="/examples/vision/3D_image_classification/">3D image classification from CT scans</a> <a class="nav-sublink2" href="/examples/vision/depth_estimation/">Monocular depth estimation</a> <a class="nav-sublink2" href="/examples/vision/nerf/">3D volumetric rendering with NeRF</a> <a class="nav-sublink2" href="/examples/vision/pointnet_segmentation/">Point cloud segmentation with PointNet</a> <a class="nav-sublink2" href="/examples/vision/pointnet/">Point cloud classification</a> <a class="nav-sublink2" href="/examples/vision/captcha_ocr/">OCR model for reading Captchas</a> <a class="nav-sublink2" href="/examples/vision/handwriting_recognition/">Handwriting recognition</a> <a class="nav-sublink2" href="/examples/vision/autoencoder/">Convolutional autoencoder for image denoising</a> <a class="nav-sublink2" href="/examples/vision/mirnet/">Low-light image enhancement using MIRNet</a> <a class="nav-sublink2" href="/examples/vision/super_resolution_sub_pixel/">Image Super-Resolution using an Efficient Sub-Pixel CNN</a> <a class="nav-sublink2" href="/examples/vision/edsr/">Enhanced Deep Residual Networks for single-image super-resolution</a> <a class="nav-sublink2" href="/examples/vision/zero_dce/">Zero-DCE for low-light image enhancement</a> <a class="nav-sublink2" href="/examples/vision/cutmix/">CutMix data augmentation for image 
classification</a> <a class="nav-sublink2" href="/examples/vision/mixup/">MixUp augmentation for image classification</a> <a class="nav-sublink2" href="/examples/vision/randaugment/">RandAugment for Image Classification for Improved Robustness</a> <a class="nav-sublink2" href="/examples/vision/image_captioning/">Image captioning</a> <a class="nav-sublink2" href="/examples/vision/nl_image_search/">Natural language image search with a Dual Encoder</a> <a class="nav-sublink2" href="/examples/vision/visualizing_what_convnets_learn/">Visualizing what convnets learn</a> <a class="nav-sublink2" href="/examples/vision/integrated_gradients/">Model interpretability with Integrated Gradients</a> <a class="nav-sublink2" href="/examples/vision/probing_vits/">Investigating Vision Transformer representations</a> <a class="nav-sublink2" href="/examples/vision/grad_cam/">Grad-CAM class activation visualization</a> <a class="nav-sublink2" href="/examples/vision/near_dup_search/">Near-duplicate image search</a> <a class="nav-sublink2" href="/examples/vision/semantic_image_clustering/">Semantic Image Clustering</a> <a class="nav-sublink2" href="/examples/vision/siamese_contrastive/">Image similarity estimation using a Siamese Network with a contrastive loss</a> <a class="nav-sublink2" href="/examples/vision/siamese_network/">Image similarity estimation using a Siamese Network with a triplet loss</a> <a class="nav-sublink2" href="/examples/vision/metric_learning/">Metric learning for image similarity search</a> <a class="nav-sublink2" href="/examples/vision/metric_learning_tf_similarity/">Metric learning for image similarity search using TensorFlow Similarity</a> <a class="nav-sublink2" href="/examples/vision/nnclr/">Self-supervised contrastive learning with NNCLR</a> <a class="nav-sublink2" href="/examples/vision/video_classification/">Video Classification with a CNN-RNN Architecture</a> <a class="nav-sublink2" href="/examples/vision/conv_lstm/">Next-Frame Video Prediction with 
Convolutional LSTMs</a> <a class="nav-sublink2" href="/examples/vision/video_transformers/">Video Classification with Transformers</a> <a class="nav-sublink2" href="/examples/vision/vivit/">Video Vision Transformer</a> <a class="nav-sublink2" href="/examples/vision/bit/">Image Classification using BigTransfer (BiT)</a> <a class="nav-sublink2" href="/examples/vision/gradient_centralization/">Gradient Centralization for Better Training Performance</a> <a class="nav-sublink2" href="/examples/vision/token_learner/">Learning to tokenize in Vision Transformers</a> <a class="nav-sublink2" href="/examples/vision/knowledge_distillation/">Knowledge Distillation</a> <a class="nav-sublink2" href="/examples/vision/fixres/">FixRes: Fixing train-test resolution discrepancy</a> <a class="nav-sublink2" href="/examples/vision/cait/">Class Attention Image Transformers with LayerScale</a> <a class="nav-sublink2" href="/examples/vision/patch_convnet/">Augmenting convnets with aggregated attention</a> <a class="nav-sublink2" href="/examples/vision/learnable_resizer/">Learning to Resize</a> <a class="nav-sublink2" href="/examples/vision/adamatch/">Semi-supervision and domain adaptation with AdaMatch</a> <a class="nav-sublink2" href="/examples/vision/barlow_twins/">Barlow Twins for Contrastive SSL</a> <a class="nav-sublink2" href="/examples/vision/consistency_training/">Consistency training with supervision</a> <a class="nav-sublink2" href="/examples/vision/deit/">Distilling Vision Transformers</a> <a class="nav-sublink2" href="/examples/vision/focal_modulation_network/">Focal Modulation: A replacement for Self-Attention</a> <a class="nav-sublink2" href="/examples/vision/forwardforward/">Using the Forward-Forward Algorithm for Image Classification</a> <a class="nav-sublink2" href="/examples/vision/masked_image_modeling/">Masked image modeling with Autoencoders</a> <a class="nav-sublink2" href="/examples/vision/sam/">Segment Anything Model with 🤗Transformers</a> <a class="nav-sublink2" 
href="/examples/vision/segformer/">Semantic segmentation with SegFormer and Hugging Face Transformers</a> <a class="nav-sublink2" href="/examples/vision/simsiam/">Self-supervised contrastive learning with SimSiam</a> <a class="nav-sublink2" href="/examples/vision/supervised-contrastive-learning/">Supervised Contrastive Learning</a> <a class="nav-sublink2" href="/examples/vision/yolov8/">Efficient Object Detection with YOLOV8 and KerasCV</a> <a class="nav-sublink" href="/examples/nlp/">Natural Language Processing</a> <a class="nav-sublink" href="/examples/structured_data/">Structured Data</a> <a class="nav-sublink" href="/examples/timeseries/">Timeseries</a> <a class="nav-sublink" href="/examples/generative/">Generative Deep Learning</a> <a class="nav-sublink" href="/examples/audio/">Audio Data</a> <a class="nav-sublink" href="/examples/rl/">Reinforcement Learning</a> <a class="nav-sublink" href="/examples/graph/">Graph Data</a> <a class="nav-sublink" href="/examples/keras_recipes/">Quick Keras Recipes</a> </div> </div> </div> <div class="k-main"> <div class='k-main-inner' id='k-main-id'> <div class='k-content'> <div class='k-location-slug'> <span class="k-location-slug-pointer">►</span> <a href='/examples/'>Code examples</a> / Computer Vision </div> <h2 class="example-category-title"><a href="/examples/vision/">Computer Vision</a></h2> <h3 class="example-subcategory-title">Image classification</h3> <a href="/examples/vision/image_classification_from_scratch"> <div class="example-card"> <div class="example-highlight">★</div> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification from scratch </div> </div> </a> <a href="/examples/vision/mnist_convnet"> <div class="example-card"> <div class="example-highlight">★</div> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Simple MNIST convnet </div> </div> </a> <a href="/examples/vision/image_classification_efficientnet_fine_tuning"> <div 
class="example-card"> <div class="example-highlight">★</div> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification via fine-tuning with EfficientNet </div> </div> </a> <a href="/examples/vision/image_classification_with_vision_transformer"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification with Vision Transformer </div> </div> </a> <a href="/examples/vision/attention_mil_classification"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Classification using Attention-based Deep Multiple Instance Learning </div> </div> </a> <a href="/examples/vision/mlp_image_classification"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification with modern MLP models </div> </div> </a> <a href="/examples/vision/mobilevit"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> A mobile-friendly Transformer-based model for image classification </div> </div> </a> <a href="/examples/vision/xray_classification_with_tpus"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Pneumonia Classification on TPU </div> </div> </a> <a href="/examples/vision/cct"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Compact Convolutional Transformers </div> </div> </a> <a href="/examples/vision/convmixer"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification with ConvMixer </div> </div> </a> <a href="/examples/vision/eanet"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification with EANet (External Attention Transformer) </div> </div> </a> 
<a href="/examples/vision/involution"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Involutional neural networks </div> </div> </a> <a href="/examples/vision/perceiver_image_classification"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification with Perceiver </div> </div> </a> <a href="/examples/vision/reptile"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Few-Shot learning with Reptile </div> </div> </a> <a href="/examples/vision/semisupervised_simclr"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Semi-supervised image classification using contrastive pretraining with SimCLR </div> </div> </a> <a href="/examples/vision/swin_transformers"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image classification with Swin Transformers </div> </div> </a> <a href="/examples/vision/vit_small_ds"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Train a Vision Transformer on small datasets </div> </div> </a> <a href="/examples/vision/shiftvit"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> A Vision Transformer without Attention </div> </div> </a> <a href="/examples/vision/image_classification_using_global_context_vision_transformer"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image Classification using Global Context Vision Transformer </div> </div> </a> <a href="/examples/vision/temporal_latent_bottleneck"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> When Recurrence meets Transformers </div> </div> </a> <a 
href="/examples/vision/bit"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image Classification using BigTransfer (BiT) </div> </div> </a> <h3 class="example-subcategory-title">Image segmentation</h3> <a href="/examples/vision/oxford_pets_image_segmentation"> <div class="example-card"> <div class="example-highlight">★</div> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image segmentation with a U-Net-like architecture </div> </div> </a> <a href="/examples/vision/deeplabv3_plus"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Multiclass semantic segmentation using DeepLabV3+ </div> </div> </a> <a href="/examples/vision/basnet_segmentation"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Highly accurate boundaries segmentation using BASNet </div> </div> </a> <a href="/examples/vision/fully_convolutional_network"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image Segmentation using Composable Fully-Convolutional Networks </div> </div> </a> <h3 class="example-subcategory-title">Object detection</h3> <a href="/examples/vision/retinanet"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Object Detection with RetinaNet </div> </div> </a> <a href="/examples/vision/keypoint_detection"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Keypoint Detection with Transfer Learning </div> </div> </a> <a href="/examples/vision/object_detection_using_vision_transformer"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Object detection with Vision Transformers </div> </div> </a> <h3 class="example-subcategory-title">3D</h3> <a 
href="/examples/vision/3D_image_classification"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> 3D image classification from CT scans </div> </div> </a> <a href="/examples/vision/depth_estimation"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Monocular depth estimation </div> </div> </a> <a href="/examples/vision/nerf"> <div class="example-card"> <div class="example-highlight">★</div> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> 3D volumetric rendering with NeRF </div> </div> </a> <a href="/examples/vision/pointnet_segmentation"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Point cloud segmentation with PointNet </div> </div> </a> <a href="/examples/vision/pointnet"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Point cloud classification </div> </div> </a> <h3 class="example-subcategory-title">OCR</h3> <a href="/examples/vision/captcha_ocr"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> OCR model for reading Captchas </div> </div> </a> <a href="/examples/vision/handwriting_recognition"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Handwriting recognition </div> </div> </a> <h3 class="example-subcategory-title">Image enhancement</h3> <a href="/examples/vision/autoencoder"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Convolutional autoencoder for image denoising </div> </div> </a> <a href="/examples/vision/mirnet"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Low-light image enhancement using MIRNet </div> </div> </a> <a 
href="/examples/vision/super_resolution_sub_pixel"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image Super-Resolution using an Efficient Sub-Pixel CNN </div> </div> </a> <a href="/examples/vision/edsr"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Enhanced Deep Residual Networks for single-image super-resolution </div> </div> </a> <a href="/examples/vision/zero_dce"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Zero-DCE for low-light image enhancement </div> </div> </a> <h3 class="example-subcategory-title">Data augmentation</h3> <a href="/examples/vision/cutmix"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> CutMix data augmentation for image classification </div> </div> </a> <a href="/examples/vision/mixup"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> MixUp augmentation for image classification </div> </div> </a> <a href="/examples/vision/randaugment"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> RandAugment for Image Classification for Improved Robustness </div> </div> </a> <h3 class="example-subcategory-title">Image & Text</h3> <a href="/examples/vision/image_captioning"> <div class="example-card"> <div class="example-highlight">★</div> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image captioning </div> </div> </a> <a href="/examples/vision/nl_image_search"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Natural language image search with a Dual Encoder </div> </div> </a> <h3 class="example-subcategory-title">Vision models interpretability</h3> <a href="/examples/vision/visualizing_what_convnets_learn"> 
<div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Visualizing what convnets learn </div> </div> </a> <a href="/examples/vision/integrated_gradients"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Model interpretability with Integrated Gradients </div> </div> </a> <a href="/examples/vision/probing_vits"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Investigating Vision Transformer representations </div> </div> </a> <a href="/examples/vision/grad_cam"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Grad-CAM class activation visualization </div> </div> </a> <h3 class="example-subcategory-title">Image similarity search</h3> <a href="/examples/vision/near_dup_search"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Near-duplicate image search </div> </div> </a> <a href="/examples/vision/semantic_image_clustering"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Semantic Image Clustering </div> </div> </a> <a href="/examples/vision/siamese_contrastive"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image similarity estimation using a Siamese Network with a contrastive loss </div> </div> </a> <a href="/examples/vision/siamese_network"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Image similarity estimation using a Siamese Network with a triplet loss </div> </div> </a> <a href="/examples/vision/metric_learning"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Metric learning for image similarity search </div> </div> </a> <a 
href="/examples/vision/metric_learning_tf_similarity"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Metric learning for image similarity search using TensorFlow Similarity </div> </div> </a> <a href="/examples/vision/nnclr"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Self-supervised contrastive learning with NNCLR </div> </div> </a> <h3 class="example-subcategory-title">Video</h3> <a href="/examples/vision/video_classification"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Video Classification with a CNN-RNN Architecture </div> </div> </a> <a href="/examples/vision/conv_lstm"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Next-Frame Video Prediction with Convolutional LSTMs </div> </div> </a> <a href="/examples/vision/video_transformers"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Video Classification with Transformers </div> </div> </a> <a href="/examples/vision/vivit"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Video Vision Transformer </div> </div> </a> <h3 class="example-subcategory-title">Performance recipes</h3> <a href="/examples/vision/gradient_centralization"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Gradient Centralization for Better Training Performance </div> </div> </a> <a href="/examples/vision/token_learner"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Learning to tokenize in Vision Transformers </div> </div> </a> <a href="/examples/vision/knowledge_distillation"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div 
class="example-card-title"> Knowledge Distillation </div> </div> </a> <a href="/examples/vision/fixres"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> FixRes: Fixing train-test resolution discrepancy </div> </div> </a> <a href="/examples/vision/cait"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Class Attention Image Transformers with LayerScale </div> </div> </a> <a href="/examples/vision/patch_convnet"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Augmenting convnets with aggregated attention </div> </div> </a> <a href="/examples/vision/learnable_resizer"> <div class="example-card"> <div class="example-highlight"><b>V3</b></div> <div class="example-card-title"> Learning to Resize </div> </div> </a> <h3 class="example-subcategory-title">Other</h3> <a href="/examples/vision/adamatch"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Semi-supervision and domain adaptation with AdaMatch </div> </div> </a> <a href="/examples/vision/barlow_twins"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Barlow Twins for Contrastive SSL </div> </div> </a> <a href="/examples/vision/consistency_training"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Consistency training with supervision </div> </div> </a> <a href="/examples/vision/deit"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Distilling Vision Transformers </div> </div> </a> <a href="/examples/vision/focal_modulation_network"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Focal Modulation: A replacement for Self-Attention </div> </div> </a> <a 
href="/examples/vision/forwardforward"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Using the Forward-Forward Algorithm for Image Classification </div> </div> </a> <a href="/examples/vision/masked_image_modeling"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Masked image modeling with Autoencoders </div> </div> </a> <a href="/examples/vision/sam"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Segment Anything Model with 🤗Transformers </div> </div> </a> <a href="/examples/vision/segformer"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Semantic segmentation with SegFormer and Hugging Face Transformers </div> </div> </a> <a href="/examples/vision/simsiam"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Self-supervised contrastive learning with SimSiam </div> </div> </a> <a href="/examples/vision/supervised-contrastive-learning"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Supervised Contrastive Learning </div> </div> </a> <a href="/examples/vision/yolov8"> <div class="example-card"> <div class="example-highlight">V2</div> <div class="example-card-title"> Efficient Object Detection with YOLOV8 and KerasCV </div> </div> </a> <hr class="examples-separator"> </div> </div> </div> </div> </div> <footer> <div class="footer__container"> <a href="https://policies.google.com/terms">Terms</a> <div>|</div> <a href="https://policies.google.com/privacy">Privacy</a> </div> </footer> <script src="/js/index.js"></script> </body> </html>