<!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml" lang="en-US"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Training - AWS Deep Learning Containers</title><meta name="viewport" content="width=device-width,initial-scale=1" /><meta name="assets_root" content="/assets" /><meta name="target_state" content="deep-learning-containers-ec2-tutorials-training" /><meta name="default_state" content="deep-learning-containers-ec2-tutorials-training" /><link rel="icon" type="image/ico" href="/assets/images/favicon.ico" /><link rel="shortcut icon" type="image/ico" href="/assets/images/favicon.ico" /><link rel="canonical" href="https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" /><meta name="description" content="This section shows how to run training on AWS Deep Learning Containers for Amazon EC2 using PyTorch and TensorFlow." /><meta name="deployment_region" content="IAD" /><meta name="product" content="AWS Deep Learning Containers" /><meta name="guide" content="Developer Guide" /><meta name="abstract" content="Insert abstract text" /><meta name="guide-locale" content="en_us" /><meta name="tocs" content="toc-contents.json" /><link rel="canonical" href="https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" /><link rel="alternative" href="https://docs.aws.amazon.com/id_id/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="id-id" /><link rel="alternative" href="https://docs.aws.amazon.com/id_id/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="id" /><link rel="alternative" href="https://docs.aws.amazon.com/de_de/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="de-de" /><link rel="alternative" 
href="https://docs.aws.amazon.com/de_de/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="de" /><link rel="alternative" href="https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="en-us" /><link rel="alternative" href="https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="en" /><link rel="alternative" href="https://docs.aws.amazon.com/es_es/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="es-es" /><link rel="alternative" href="https://docs.aws.amazon.com/es_es/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="es" /><link rel="alternative" href="https://docs.aws.amazon.com/fr_fr/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="fr-fr" /><link rel="alternative" href="https://docs.aws.amazon.com/fr_fr/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="fr" /><link rel="alternative" href="https://docs.aws.amazon.com/it_it/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="it-it" /><link rel="alternative" href="https://docs.aws.amazon.com/it_it/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="it" /><link rel="alternative" href="https://docs.aws.amazon.com/ja_jp/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="ja-jp" /><link rel="alternative" href="https://docs.aws.amazon.com/ja_jp/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="ja" /><link rel="alternative" 
href="https://docs.aws.amazon.com/ko_kr/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="ko-kr" /><link rel="alternative" href="https://docs.aws.amazon.com/ko_kr/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="ko" /><link rel="alternative" href="https://docs.aws.amazon.com/pt_br/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="pt-br" /><link rel="alternative" href="https://docs.aws.amazon.com/pt_br/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="pt" /><link rel="alternative" href="https://docs.aws.amazon.com/zh_cn/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="zh-cn" /><link rel="alternative" href="https://docs.aws.amazon.com/zh_tw/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="zh-tw" /><link rel="alternative" href="https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" hreflang="x-default" /><meta name="feedback-item" content="AWS Deep Learning Containers" /><meta name="this_doc_product" content="AWS Deep Learning Containers" /><meta name="this_doc_guide" content="Developer Guide" /><script defer="" src="/assets/r/vendor4.js?version=2021.12.02"></script><script defer="" src="/assets/r/vendor3.js?version=2021.12.02"></script><script defer="" src="/assets/r/vendor1.js?version=2021.12.02"></script><script defer="" src="/assets/r/awsdocs-common.js?version=2021.12.02"></script><script defer="" src="/assets/r/awsdocs-doc-page.js?version=2021.12.02"></script><link href="/assets/r/vendor4.css?version=2021.12.02" rel="stylesheet" /><link href="/assets/r/awsdocs-common.css?version=2021.12.02" rel="stylesheet" /><link 
href="/assets/r/awsdocs-doc-page.css?version=2021.12.02" rel="stylesheet" /><script async="" id="awsc-panorama-bundle" type="text/javascript" src="https://prod.pa.cdn.uis.awsstatic.com/panorama-nav-init.js" data-config="{'appEntity':'aws-documentation','region':'us-east-1','service':'deep-learning-containers'}"></script><meta id="panorama-serviceSubSection" value="Developer Guide" /><meta id="panorama-serviceConsolePage" value="Training" /></head><body class="awsdocs awsui"><div class="awsdocs-container"><awsdocs-header></awsdocs-header><awsui-app-layout id="app-layout" class="awsui-util-no-gutters" ng-controller="ContentController as $ctrl" header-selector="awsdocs-header" navigation-hide="false" navigation-width="$ctrl.navWidth" navigation-open="$ctrl.navOpen" navigation-change="$ctrl.onNavChange($event)" tools-hide="$ctrl.hideTools" tools-width="$ctrl.toolsWidth" tools-open="$ctrl.toolsOpen" tools-change="$ctrl.onToolsChange($event)"><div id="guide-toc" dom-region="navigation"><awsdocs-toc></awsdocs-toc></div><div id="main-column" dom-region="content" tabindex="-1"><awsdocs-view class="awsdocs-view"><div id="awsdocs-content"><head><title>Training - AWS Deep Learning Containers</title><meta name="pdf" content="/pdfs/deep-learning-containers/latest/devguide/dlc-guide.pdf.pdf#deep-learning-containers-ec2-tutorials-training" /><meta name="rss" content="dlc-guide.pdf.rss" /><meta name="forums" content="https://repost.aws/tags/TAtQOYCNQXQAypuIl0ZxRowA" /><meta name="feedback" content="https://docs.aws.amazon.com/forms/aws-doc-feedback?hidden_service_name=AWS%20Deep%20Learning%20Containers&amp;topic_url=https://docs.aws.amazon.com/en_us/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" /><meta name="feedback-yes" content="feedbackyes.html?topic_url=https://docs.aws.amazon.com/en_us/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" /><meta name="feedback-no" 
content="feedbackno.html?topic_url=https://docs.aws.amazon.com/en_us/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html" /><script type="application/ld+json"> { "@context" : "https://schema.org", "@type" : "BreadcrumbList", "itemListElement" : [ { "@type" : "ListItem", "position" : 1, "name" : "AWS", "item" : "https://aws.amazon.com" }, { "@type" : "ListItem", "position" : 2, "name" : "AWS Deep Learning Containers", "item" : "https://docs.aws.amazon.com/deep-learning-containers/index.html" }, { "@type" : "ListItem", "position" : 3, "name" : "Developer Guide", "item" : "https://docs.aws.amazon.com/deep-learning-containers/latest/devguide" }, { "@type" : "ListItem", "position" : 4, "name" : "Getting Started with AWS Deep Learning Containers", "item" : "https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/getting-started.html" }, { "@type" : "ListItem", "position" : 5, "name" : "Amazon EC2 Tutorials", "item" : "https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ec2.html" }, { "@type" : "ListItem", "position" : 6, "name" : "Training", "item" : "https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ec2.html" } ] } </script></head><body><div id="main"><div style="display: none"><a href="/pdfs/deep-learning-containers/latest/devguide/dlc-guide.pdf.pdf#deep-learning-containers-ec2-tutorials-training" target="_blank" rel="noopener noreferrer" title="Open PDF"></a></div><div id="breadcrumbs" class="breadcrumb"><a href="https://aws.amazon.com">AWS</a><a href="/index.html">Documentation</a><a href="/deep-learning-containers/index.html">AWS Deep Learning Containers</a><a href="what-is-dlc.html">Developer Guide</a></div><div id="page-toc-src"><a href="#deep-learning-containers-ec2-tutorials-training-pytorch">PyTorch training</a><a href="#deep-learning-containers-ec2-tutorials-training-tf">TensorFlow training</a><a 
href="#deep-learning-containers-ec2-training-pytorch-next">Next steps</a></div><div id="main-content" class="awsui-util-container"><div id="main-col-body"><awsdocs-language-banner data-service="$ctrl.pageService"></awsdocs-language-banner><h1 class="topictitle" id="deep-learning-containers-ec2-tutorials-training">Training</h1><div class="awsdocs-page-header-container"><awsdocs-page-header></awsdocs-page-header><awsdocs-filter-selector id="awsdocs-filter-selector"></awsdocs-filter-selector></div><p>This section shows how to run training on AWS Deep Learning Containers for Amazon EC2 using PyTorch and TensorFlow.</p><div class="highlights" id="inline-topiclist"><h6>Contents</h6><ul><li><a href="#deep-learning-containers-ec2-tutorials-training-pytorch">PyTorch training</a></li><li><a href="#deep-learning-containers-ec2-tutorials-training-tf">TensorFlow training</a></li><li><a href="#deep-learning-containers-ec2-training-pytorch-next">Next steps</a></li></ul></div> <h2 id="deep-learning-containers-ec2-tutorials-training-pytorch">PyTorch training</h2> <p>To begin training with PyTorch from your Amazon EC2 instance, use the following commands to run the container. You must use <code class="userinput">nvidia-docker</code> for GPU images. 
</p> <div class="itemizedlist"> <ul class="itemizedlist"><li class="listitem"> <p>For CPU</p> <pre class="programlisting"><div class="code-btn-container"><div class="btn-copy-code" title="Copy"><awsui-icon name="copy"></awsui-icon></div></div><!--DEBUG: cli ()--><code class="nohighlight"><code class="prompt" copy="false">$ </code>docker run -it <code class="replaceable">&lt;CPU training container&gt;</code></code></pre> </li><li class="listitem"> <p>For GPU</p> <pre class="programlisting"><div class="code-btn-container"><div class="btn-copy-code" title="Copy"><awsui-icon name="copy"></awsui-icon></div></div><!--DEBUG: cli ()--><code class="nohighlight"><code class="prompt" copy="false">$ </code>nvidia-docker run -it <code class="replaceable">&lt;GPU training container&gt;</code></code></pre> </li><li class="listitem"> <p>If you have docker-ce version 19.03 or later, you can use the <code class="code">--gpus</code> flag with <code class="code">docker</code> instead of <code class="code">nvidia-docker</code>. The flag requires a value, such as <code class="code">all</code>, to expose every GPU on the instance to the container:</p> <pre class="programlisting"><div class="code-btn-container"><div class="btn-copy-code" title="Copy"><awsui-icon name="copy"></awsui-icon></div></div><!--DEBUG: cli ()--><code class="nohighlight"><code class="prompt" copy="false">$ </code>docker run -it --gpus all <code class="replaceable">&lt;GPU training container&gt;</code></code></pre> </li></ul></div> <p> Run the following in the container's terminal to begin training. 
</p> <div class="itemizedlist"> <ul class="itemizedlist"><li class="listitem"> <p>For CPU</p> <pre class="programlisting"><div class="code-btn-container"><div class="btn-copy-code" title="Copy"><awsui-icon name="copy"></awsui-icon></div></div><!--DEBUG: cli ()--><code class="nohighlight"><code class="prompt" copy="false">$ </code>git clone https://github.com/pytorch/examples.git
<code class="prompt" copy="false">$ </code>python examples/mnist/main.py --no-cuda</code></pre> </li><li class="listitem"> <p>For GPU</p> <pre class="programlisting"><div class="code-btn-container"><div class="btn-copy-code" title="Copy"><awsui-icon name="copy"></awsui-icon></div></div><!--DEBUG: cli ()--><code class="nohighlight"><code class="prompt" copy="false">$ </code>git clone https://github.com/pytorch/examples.git
<code class="prompt" copy="false">$ </code>python examples/mnist/main.py</code></pre> </li></ul></div> <h3 id="deep-learning-containers-ec2-training-pytorch-apex">PyTorch distributed GPU training with NVIDIA Apex</h3> <p>NVIDIA Apex is a PyTorch extension with utilities for mixed precision and distributed training. For more information on the utilities offered with Apex, see the <a href="https://nvidia.github.io/apex/" rel="noopener noreferrer" target="_blank"><span>NVIDIA Apex website</span><awsui-icon class="awsdocs-link-icon" name="external"></awsui-icon></a>. 
Apex is currently supported by Amazon EC2 instances in the following families:</p> <div class="itemizedlist"> <ul class="itemizedlist"><li class="listitem"><p><a href="https://aws.amazon.com/ec2/instance-types/p3/" rel="noopener noreferrer" target="_blank"><span>Amazon EC2 P3 Instances</span><awsui-icon class="awsdocs-link-icon" name="external"></awsui-icon></a></p></li><li class="listitem"><p><a href="https://aws.amazon.com/ec2/instance-types/p4/" rel="noopener noreferrer" target="_blank"><span>Amazon EC2 P4 Instances</span><awsui-icon class="awsdocs-link-icon" name="external"></awsui-icon></a></p></li><li class="listitem"><p><a href="https://aws.amazon.com/ec2/instance-types/p5/" rel="noopener noreferrer" target="_blank"><span>Amazon EC2 P5 Instances</span><awsui-icon class="awsdocs-link-icon" name="external"></awsui-icon></a></p></li><li class="listitem"><p><a href="https://aws.amazon.com/ec2/instance-types/g5/" rel="noopener noreferrer" target="_blank"><span>Amazon EC2 G5 Instances</span><awsui-icon class="awsdocs-link-icon" name="external"></awsui-icon></a></p></li></ul></div> <p>To begin distributed training using NVIDIA Apex, run the following in the terminal of the GPU training container. This example requires at least two GPUs on your Amazon EC2 instance to run parallel distributed training. 
</p> <pre class="programlisting"><div class="code-btn-container"><div class="btn-copy-code" title="Copy"><awsui-icon name="copy"></awsui-icon></div></div><!--DEBUG: cli ()--><code class="nohighlight"><code class="prompt" copy="false">$ </code>git clone https://github.com/NVIDIA/apex.git &amp;&amp; cd apex
<code class="prompt" copy="false">$ </code>python -m torch.distributed.launch --nproc_per_node=2 examples/simple/distributed/distributed_data_parallel.py</code></pre> <h2 id="deep-learning-containers-ec2-tutorials-training-tf">TensorFlow training</h2> <p>After you log in to your Amazon EC2 instance, you can run TensorFlow and TensorFlow 2 containers with the following commands. You must use <code class="code">nvidia-docker</code> for GPU images. </p> <div class="itemizedlist"> <ul class="itemizedlist"><li class="listitem"> <p>For CPU-based training, run the following.</p> <pre class="programlisting"><div class="code-btn-container"><div class="btn-copy-code" title="Copy"><awsui-icon name="copy"></awsui-icon></div></div><!--DEBUG: cli ()--><code class="nohighlight"><code class="prompt" copy="false">$ </code>docker run -it <code class="replaceable">&lt;CPU training container&gt;</code></code></pre> </li><li class="listitem"> <p>For GPU-based training, run the following.</p> <pre class="programlisting"><div class="code-btn-container"><div class="btn-copy-code" title="Copy"><awsui-icon name="copy"></awsui-icon></div></div><!--DEBUG: cli ()--><code class="nohighlight"><code class="prompt" copy="false">$ </code>nvidia-docker run -it <code class="replaceable">&lt;GPU training container&gt;</code></code></pre> </li></ul></div> <p> Either command runs the container in interactive mode and provides a shell prompt inside the container. You can then run the following to import TensorFlow. 
</p> <pre class="programlisting"><div class="code-btn-container"><div class="btn-copy-code" title="Copy"><awsui-icon name="copy"></awsui-icon></div></div><!--DEBUG: cli ()--><code class="nohighlight"><code class="prompt" copy="false">$ </code><code class="userinput">python</code></code></pre> <pre class="programlisting"><div class="code-btn-container"><div class="btn-copy-code" title="Copy"><awsui-icon name="copy"></awsui-icon></div></div><!--DEBUG: cli ()--><code class="nohighlight"><code class="userinput">&gt;&gt;&gt; import tensorflow</code></code></pre> <p> Press Ctrl+D to exit the Python interpreter and return to the bash prompt. Run the following to begin training: </p> <pre class="programlisting"><div class="code-btn-container"><div class="btn-copy-code" title="Copy"><awsui-icon name="copy"></awsui-icon></div></div><!--DEBUG: cli ()--><code class="nohighlight"><code class="prompt" copy="false">$ </code><code class="userinput">git clone https://github.com/fchollet/keras.git</code></code></pre> <pre class="programlisting"><div class="code-btn-container"><div class="btn-copy-code" title="Copy"><awsui-icon name="copy"></awsui-icon></div></div><!--DEBUG: cli ()--><code class="nohighlight"><code class="prompt" copy="false">$ </code><code class="userinput">cd keras</code></code></pre> <pre class="programlisting"><div class="code-btn-container"><div class="btn-copy-code" title="Copy"><awsui-icon name="copy"></awsui-icon></div></div><!--DEBUG: cli ()--><code class="nohighlight"><code class="prompt" copy="false">$ </code><code class="userinput">python examples/mnist_cnn.py</code></code></pre> <h2 id="deep-learning-containers-ec2-training-pytorch-next">Next steps</h2> <p>To learn about inference on Amazon EC2 using PyTorch with Deep Learning Containers, see <a href="./deep-learning-containers-ec2-tutorials-inference.html#deep-learning-containers-ec2-tutorials-inference-pytorch">PyTorch Inference</a>. 
</p> <awsdocs-copyright class="copyright-print"></awsdocs-copyright><awsdocs-thumb-feedback right-edge="{{$ctrl.thumbFeedbackRightEdge}}"></awsdocs-thumb-feedback></div><noscript><div><div><div><div id="js_error_message"><p><img src="https://d1ge0kk1l5kms0.cloudfront.net/images/G/01/webservices/console/warning.png" alt="Warning" /> <strong>Javascript is disabled or is unavailable in your browser.</strong></p><p>To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.</p></div></div></div></div></noscript><div id="main-col-footer" class="awsui-util-font-size-0"><div id="doc-conventions"><a target="_top" href="/general/latest/gr/docconventions.html">Document Conventions</a></div><div class="prev-next"><div id="previous" class="prev-link" accesskey="p" href="./deep-learning-containers-ec2-setup.html">Amazon EC2 setup</div><div id="next" class="next-link" accesskey="n" href="./deep-learning-containers-ec2-tutorials-inference.html">Inference</div></div></div><awsdocs-page-utilities></awsdocs-page-utilities></div><div id="quick-feedback-yes" style="display: none;"><div class="title">Did this page help you? - Yes</div><div class="content"><p>Thanks for letting us know we're doing a good job!</p><p>If you've got a moment, please tell us what we did right so we can do more of it.</p><p><awsui-button id="fblink" rel="noopener noreferrer" target="_blank" text="Feedback" click="linkClick($event)" href="https://docs.aws.amazon.com/forms/aws-doc-feedback?hidden_service_name=AWS Deep Learning Containers&amp;topic_url=https://docs.aws.amazon.com/en_us/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html"></awsui-button></p></div></div><div id="quick-feedback-no" style="display: none;"><div class="title">Did this page help you? - No</div><div class="content"><p>Thanks for letting us know this page needs work. 
We're sorry we let you down.</p><p>If you've got a moment, please tell us how we can make the documentation better.</p><p><awsui-button id="fblink" rel="noopener noreferrer" target="_blank" text="Feedback" click="linkClick($event)" href="https://docs.aws.amazon.com/forms/aws-doc-feedback?hidden_service_name=AWS Deep Learning Containers&amp;topic_url=https://docs.aws.amazon.com/en_us/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-training.html"></awsui-button></p></div></div></div></body></div></awsdocs-view><div class="page-loading-indicator" id="page-loading-indicator"><awsui-spinner size="large"></awsui-spinner></div></div><div id="tools-panel" dom-region="tools"><awsdocs-tools-panel id="awsdocs-tools-panel"></awsdocs-tools-panel></div></awsui-app-layout><awsdocs-cookie-banner class="doc-cookie-banner"></awsdocs-cookie-banner></div></body></html>
