<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <title>Question about GPU FLops - CUDA Programming and Performance - NVIDIA Developer Forums</title> <meta name="description" content="Recently while learning cuda, I am using a Tesla P100 graphics card. Why use the matrix multiplication sample program in nvidia cuda sample (“cuda-samples-12.2\Samples\0_Introduction\matrixMul”)to test floating-point per&hellip;"> <meta name="generator" content="Discourse 3.4.0.beta3-dev - https://github.com/discourse/discourse version d71016522e8d9bb21c20312388271f8f0dd53069"> <link rel="icon" type="image/png" href="https://global.discourse-cdn.com/nvidia/optimized/2X/0/0372ccc95874f71d7fbff64bbbff6f8c69dd850b_2_32x32.png"> <link rel="apple-touch-icon" type="image/png" href="https://global.discourse-cdn.com/nvidia/optimized/2X/8/819b2855e1f1f3249e77dc713405cf77d1eda57c_2_180x180.png"> <meta name="theme-color" media="all" content="#000000"> <meta name="viewport" content="width=device-width, initial-scale=1.0, minimum-scale=1.0, user-scalable=yes, viewport-fit=cover"> <link rel="canonical" href="https://forums.developer.nvidia.com/t/question-about-gpu-flops/313972" /> <link rel="search" type="application/opensearchdescription+xml" href="https://forums.developer.nvidia.com/opensearch.xml" title="NVIDIA Developer Forums Search"> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/color_definitions_nvidia_4_13_b0e6a9bf93713cf1d1478e25b62f18833c3cf820.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" class="light-scheme"/> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/desktop_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="desktop" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/automation_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="automation" /> <link 
href="https://sea2.discourse-cdn.com/nvidia/stylesheets/checklist_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="checklist" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-ai_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-ai" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-akismet_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-akismet" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-antivirus_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-antivirus" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-assign_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-assign" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-cakeday_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-cakeday" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-calendar_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-calendar" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-chat-integration_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-chat-integration" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-data-explorer_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" 
data-target="discourse-data-explorer" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-details_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-details" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-docs_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-docs" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-jira_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-jira" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-lazy-videos_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-lazy-videos" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-local-dates_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-local-dates" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-policy_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-policy" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-presence_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-presence" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-solved_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-solved" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-templates_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" 
rel="stylesheet" data-target="discourse-templates" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-topic-voting_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-topic-voting" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-translator_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-translator" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-user-notes_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-user-notes" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-yearly-review_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-yearly-review" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/footnote_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="footnote" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/hosted-site_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="hosted-site" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/poll_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="poll" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/spoiler-alert_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="spoiler-alert" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-ai_desktop_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" 
data-target="discourse-ai_desktop" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-calendar_desktop_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-calendar_desktop" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/discourse-topic-voting_desktop_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="discourse-topic-voting_desktop" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/poll_desktop_a0eee66b08799ab07414e8b689b5f4dada707e91.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="poll_desktop" /> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/desktop_theme_11_d44f674dbebc6d04cb93320be0a0aa5b72e0d2d7.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="desktop_theme" data-theme-id="11" data-theme-name="custom header links"/> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/desktop_theme_18_9a79ebc15ad3c4f807e85fe309e8959ee11d4f96.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="desktop_theme" data-theme-id="18" data-theme-name="topic thumbnails"/> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/desktop_theme_19_fb9ac007701ff5ca6e7c095e97fa1b654e7756b3.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="desktop_theme" data-theme-id="19" data-theme-name="versatile banner"/> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/desktop_theme_13_0af64655fecb6a9ed407080d18cfdc208ddbe8ec.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="desktop_theme" data-theme-id="13" data-theme-name="discourse-nvidia-theme"/> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/desktop_theme_17_fa34540ca6989abe2302b4ea9d21a1ad1c16219e.css?__ws=forums.developer.nvidia.com" 
media="all" rel="stylesheet" data-target="desktop_theme" data-theme-id="17" data-theme-name="fix bulk actions button focus"/> <link href="https://sea2.discourse-cdn.com/nvidia/stylesheets/desktop_theme_20_dcf048c51982d668d078dac82d8ac6c5d2966e55.css?__ws=forums.developer.nvidia.com" media="all" rel="stylesheet" data-target="desktop_theme" data-theme-id="20" data-theme-name="versatile banner adjustments"/> <script src="//assets.adobedtm.com/5d4962a43b79/76a76b0bb8ea/launch-7d434965ec64.min.js" async="" nonce="LUEsfrf5JFCuF4bqReHmwXGJM"></script> <meta name="google-site-verification" content="If3jxgzZoS1XODkXDOo83AD2VzBqttpfA2TfyU7YQlk"> <script defer="" src="https://sea2.discourse-cdn.com/nvidia/theme-javascripts/dbd2b5fc7317c86b8d3bb000f99224d82134a5d8.js?__ws=forums.developer.nvidia.com" data-theme-id="13" nonce="LUEsfrf5JFCuF4bqReHmwXGJM"></script> <script defer="" src="https://sea2.discourse-cdn.com/nvidia/theme-javascripts/dc40147ee1dbb3eacb2870c9212fe522658a046b.js?__ws=forums.developer.nvidia.com" data-theme-id="13" nonce="LUEsfrf5JFCuF4bqReHmwXGJM"></script> <!-- OneTrust Cookies Consent Notice start for nvidia.com --> <script src="https://cdn.cookielaw.org/scripttemplates/otSDKStub.js" data-document-language="true" type="text/javascript" charset="UTF-8" data-domain-script="3e2b62ff-7ae7-4ac5-87c8-d5949ecafff5" nonce="LUEsfrf5JFCuF4bqReHmwXGJM"></script> <!-- OneTrust Cookies Consent Notice end for nvidia.com --> <script type="text/javascript" src="https://images.nvidia.com/aem-dam/Solutions/ot-js/ot-custom.js" nonce="LUEsfrf5JFCuF4bqReHmwXGJM"></script><script defer="" src="https://sea2.discourse-cdn.com/nvidia/theme-javascripts/098438cd37b784b1cdbf6a91223ae055e9a0d312.js?__ws=forums.developer.nvidia.com" data-theme-id="22" nonce="LUEsfrf5JFCuF4bqReHmwXGJM"></script> <link rel="alternate nofollow" type="application/rss+xml" title="RSS feed of 'Question about GPU FLops'" href="https://forums.developer.nvidia.com/t/question-about-gpu-flops/313972.rss" /> 
<meta property="og:site_name" content="NVIDIA Developer Forums" /> <meta property="og:type" content="website" /> <meta name="twitter:card" content="summary" /> <meta name="twitter:image" content="https://global.discourse-cdn.com/nvidia/original/3X/e/4/e41d27491ee72562cf68c340597ca95f587879bd.png" /> <meta property="og:image" content="https://global.discourse-cdn.com/nvidia/original/3X/e/4/e41d27491ee72562cf68c340597ca95f587879bd.png" /> <meta property="og:url" content="https://forums.developer.nvidia.com/t/question-about-gpu-flops/313972" /> <meta name="twitter:url" content="https://forums.developer.nvidia.com/t/question-about-gpu-flops/313972" /> <meta property="og:title" content="Question about GPU FLops" /> <meta name="twitter:title" content="Question about GPU FLops" /> <meta property="og:description" content="Recently while learning cuda, I am using a Tesla P100 graphics card. Why use the matrix multiplication sample program in nvidia cuda sample (“cuda-samples-12.2\Samples\0_Introduction\matrixMul”)to test floating-point performance, test speed single precision 1657.76 GFlop/s double precision double 1078.98 GFlop/s. Nowhere near the theoretical performance (only about 1/5). What is the cause of it, what methods can make graphics card test floating-point performance further improve? Is it an optimi..." /> <meta name="twitter:description" content="Recently while learning cuda, I am using a Tesla P100 graphics card. Why use the matrix multiplication sample program in nvidia cuda sample (“cuda-samples-12.2\Samples\0_Introduction\matrixMul”)to test floating-point performance, test speed single precision 1657.76 GFlop/s double precision double 1078.98 GFlop/s. Nowhere near the theoretical performance (only about 1/5). What is the cause of it, what methods can make graphics card test floating-point performance further improve? Is it an optimi..." 
/> <meta property="og:article:section" content="Accelerated Computing" /> <meta property="og:article:section:color" content="76B900" /> <meta property="og:article:section" content="CUDA" /> <meta property="og:article:section:color" content="76B900" /> <meta property="og:article:section" content="CUDA Programming and Performance" /> <meta property="og:article:section:color" content="76B900" /> <meta property="og:article:tag" content="kernel" /> <meta property="og:article:tag" content="cuda" /> <meta property="article:published_time" content="2024-11-19T19:59:03+00:00" /> <meta property="og:ignore_canonical" content="true" /> <script type="application/ld+json">{"@context":"http://schema.org","@type":"QAPage","name":"Question about GPU FLops","mainEntity":{"@type":"Question","name":"Question about GPU FLops","text":"Recently while learning cuda, I am using a Tesla P100 graphics card. Why use the matrix multiplication sample program in nvidia cuda sample (“cuda-samples-12.2\\Samples\\0_Introduction\\matrixMul”)to test floating-point performance, test speed single precision 1657.76 GFlop/s double precision double 10…","upvoteCount":0,"answerCount":0,"datePublished":"2024-11-19T19:59:03.287Z","author":{"@type":"Person","name":"LEEWEI","url":"https://forums.developer.nvidia.com/u/LEEWEI"}}}</script> </head> <body class="crawler browser-update"> <header> <a href="/"> NVIDIA Developer Forums </a> </header> <div id="main-outlet" class="wrap" role="main"> <div id="topic-title"> <h1> <a href="/t/question-about-gpu-flops/313972">Question about GPU FLops</a> </h1> <div class="topic-category" itemscope itemtype="http://schema.org/BreadcrumbList"> <span itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem"> <a href="/c/accelerated-computing/cuda/cuda-programming-and-performance/7" class="badge-wrapper bullet" itemprop="item"> <span class='badge-category-bg' style='background-color: #76B900'></span> <span class='badge-category clear-badge'> <span 
class='category-name' itemprop='name'>Accelerated Computing</span> </span> </a> <meta itemprop="position" content="1" /> </span> <span itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem"> <a href="/c/accelerated-computing/cuda/cuda-programming-and-performance/7" class="badge-wrapper bullet" itemprop="item"> <span class='badge-category-bg' style='background-color: #76B900'></span> <span class='badge-category clear-badge'> <span class='category-name' itemprop='name'>CUDA</span> </span> </a> <meta itemprop="position" content="2" /> </span> <span itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem"> <a href="/c/accelerated-computing/cuda/cuda-programming-and-performance/7" class="badge-wrapper bullet" itemprop="item"> <span class='badge-category-bg' style='background-color: #76B900'></span> <span class='badge-category clear-badge'> <span class='category-name' itemprop='name'>CUDA Programming and Performance</span> </span> </a> <meta itemprop="position" content="3" /> </span> </div> <div class="topic-category"> <div class='discourse-tags list-tags'> <a href='https://forums.developer.nvidia.com/tag/kernel' class='discourse-tag' rel="tag">kernel</a>, <a href='https://forums.developer.nvidia.com/tag/cuda' class='discourse-tag' rel="tag">cuda</a> </div> </div> </div> <div itemscope itemtype='http://schema.org/DiscussionForumPosting'> <meta itemprop='headline' content='Question about GPU FLops'> <link itemprop='url' href='https://forums.developer.nvidia.com/t/question-about-gpu-flops/313972'> <meta itemprop='datePublished' content='2024-11-19T19:59:03Z'> <meta itemprop='articleSection' content='CUDA Programming and Performance'> <meta itemprop='keywords' content='kernel, cuda'> <div itemprop='publisher' itemscope itemtype="http://schema.org/Organization"> <meta itemprop='name' content='NVIDIA'> <div itemprop='logo' itemscope itemtype="http://schema.org/ImageObject"> <meta itemprop='url' 
content='https://global.discourse-cdn.com/nvidia/original/3X/a/1/a1ef6e0c1fbd3fad5bf82538b78dfaa9c5fa1a61.png'> </div> </div> <div id='post_1' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://forums.developer.nvidia.com/u/LEEWEI'><span itemprop='name'>LEEWEI</span></a> </span> <link itemprop="mainEntityOfPage" href="https://forums.developer.nvidia.com/t/question-about-gpu-flops/313972"> <span class="crawler-post-infos"> <time datetime='2024-11-19T19:59:03Z' class='post-time'> November 19, 2024, 7:59pm </time> <meta itemprop='dateModified' content='2024-11-19T19:59:03Z'> <span itemprop='position'>1</span> </span> </div> <div class='post' itemprop='text'> <p>I have recently been learning CUDA on a Tesla P100 graphics card. When I use the matrix multiplication sample program from the NVIDIA CUDA samples (“cuda-samples-12.2\Samples\0_Introduction\matrixMul”) to test floating-point performance, I measure 1657.76 GFlop/s in single precision and 1078.98 GFlop/s in double precision. That is nowhere near the theoretical performance (only about 1/5).<br> What causes this, and what methods can bring the measured floating-point performance closer to the theoretical figure? Is it a matter of optimized programming? 
Or is it that the theoretical floating-point performance of graphics cards can only be achieved by simple mathematical operations such as calculating a*b+c?<br> Thank you.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_2' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://forums.developer.nvidia.com/u/Robert_Crovella'><span itemprop='name'>Robert_Crovella</span></a> </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2024-11-19T20:00:59Z' class='post-time'> November 19, 2024, 8:00pm </time> <meta itemprop='dateModified' content='2024-11-19T20:00:59Z'> <span itemprop='position'>2</span> </span> </div> <div class='post' itemprop='text'> <p>Yes, optimization is needed. Use cuBLAS. 
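</p> <p>For reference, the “about 1/5” figure in the question can be checked with a little arithmetic. The sketch below is only illustrative: the unit counts and the 1.48 GHz boost clock are assumed figures for an SXM2 Tesla P100 (PCIe boards clock lower) and are not given in this thread, and the helper names are hypothetical. It takes peak as units × clock × 2, counting one fused multiply-add (a*b+c) as two floating-point operations.</p>
<pre>
```python
def peak_gflops(units, clock_ghz, flops_per_unit_per_cycle=2):
    """Theoretical peak in GFLOP/s, assuming each unit retires one
    fused multiply-add (2 floating-point operations) per cycle."""
    return units * clock_ghz * flops_per_unit_per_cycle


def fraction_of_peak(measured_gflops, peak):
    """Share of the theoretical peak that a measurement reaches."""
    return measured_gflops / peak


# Assumed Tesla P100 (SXM2) figures; not taken from this thread.
FP32_CORES = 3584   # assumption: single-precision CUDA cores
FP64_UNITS = 1792   # assumption: double-precision units
BOOST_GHZ = 1.48    # assumption: boost clock in GHz

fp32_peak = peak_gflops(FP32_CORES, BOOST_GHZ)
fp64_peak = peak_gflops(FP64_UNITS, BOOST_GHZ)

# Measurements quoted in the first post (GFlop/s).
fp32_frac = fraction_of_peak(1657.76, fp32_peak)
fp64_frac = fraction_of_peak(1078.98, fp64_peak)

print(f"FP32: {fp32_frac:.1%} of {fp32_peak:.0f} GFLOP/s")
print(f"FP64: {fp64_frac:.1%} of {fp64_peak:.0f} GFLOP/s")
```
</pre>
<p>Under these assumptions the single-precision result is somewhat below 1/5 of peak and the double-precision result is close to 1/5.</p> <p>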
There are numerous questions on these forums about it.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_3' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://forums.developer.nvidia.com/u/LEEWEI'><span itemprop='name'>LEEWEI</span></a> </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2024-11-19T20:24:42Z' class='post-time'> November 19, 2024, 8:24pm </time> <meta itemprop='dateModified' content='2024-11-19T20:24:42Z'> <span itemprop='position'>3</span> </span> </div> <div class='post' itemprop='text'> <p>Thank you for your answer. 
A follow-up question: does using the cuBLAS library (for example, the cuda-samples-12.2\Samples\4_CUDA_Libraries\matrixMulCUBLAS sample) to compute matrix multiplication represent the best possible speed for a graphics card?</p> <p>Are there other ways to speed things up even further?</p> <p>Thank you.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_4' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://forums.developer.nvidia.com/u/Robert_Crovella'><span itemprop='name'>Robert_Crovella</span></a> </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2024-11-19T20:29:03Z' class='post-time'> November 19, 2024, 8:29pm </time> <meta itemprop='dateModified' content='2024-11-19T20:29:03Z'> <span itemprop='position'>4</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote no-group" data-username="LEEWEI" data-post="3" data-topic="313972"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" class="avatar" src="https://global.discourse-cdn.com/nvidia/original/3X/d/2/d2a387bcc59afb6f6fc9281deb01d412675e26e6.png" data-dominant-color="DADBDA"> LEEWEI:</div> <blockquote> <p>Are there other ways to speed things up even further?</p> </blockquote> </aside> <p>To do better than cuBLAS?</p> <p>Not for a typical programmer or use case.</p> <p>There are certainly <a href="https://github.com/nervanasystems/maxas/wiki/sgemm">examples</a> of people who have done better. 
I know of no better general recommendations than using cublas for matrix-matrix multiply, and/or to achieve something close to published peak theoretical flops numbers.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_5' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://forums.developer.nvidia.com/u/LEEWEI'><span itemprop='name'>LEEWEI</span></a> </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2024-11-19T20:39:14Z' class='post-time'> November 19, 2024, 8:39pm </time> <meta itemprop='dateModified' content='2024-11-19T20:39:14Z'> <span itemprop='position'>5</span> </span> </div> <div class='post' itemprop='text'> <p>Thanks again for your answer</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> <div id='post_6' itemprop='comment' itemscope itemtype='http://schema.org/Comment' class='topic-body crawler-post'> <div class='crawler-post-meta'> <span class="creator" itemprop="author" itemscope itemtype="http://schema.org/Person"> <a itemprop="url" href='https://forums.developer.nvidia.com/u/Curefab'><span itemprop='name'>Curefab</span></a> </span> <span class="crawler-post-infos"> <time itemprop='datePublished' datetime='2024-11-19T23:20:42Z' class='post-time'> November 19, 2024, 11:20pm </time> <meta itemprop='dateModified' content='2024-11-19T23:20:42Z'> <span 
itemprop='position'>6</span> </span> </div> <div class='post' itemprop='text'> <aside class="quote no-group" data-username="LEEWEI" data-post="1" data-topic="313972"> <div class="title"> <div class="quote-controls"></div> <img loading="lazy" alt="" width="24" height="24" class="avatar" src="https://global.discourse-cdn.com/nvidia/original/3X/d/2/d2a387bcc59afb6f6fc9281deb01d412675e26e6.png" data-dominant-color="DADBDA"> LEEWEI:</div> <blockquote> <p>can only be achieved by simple mathematical operations such as calculating a*b+c</p> </blockquote> </aside> <p>With some effort you can optimize other calculations, but probably not to the theoretical peak.</p> <p>1/5 is not atypical. Even after optimization, reaching 1/2 can still be a good result; more than 80% or 90% is very difficult.</p> <p>There are many parameters to consider, including the memory bandwidth.</p> <p>BTW, it is also difficult to fully use a CPU, even a single core: you would need hand-crafted assembly with AVX or SSE vector instructions to max 
out the computation speed.</p> </div> <div itemprop="interactionStatistic" itemscope itemtype="http://schema.org/InteractionCounter"> <meta itemprop="interactionType" content="http://schema.org/LikeAction"/> <meta itemprop="userInteractionCount" content="0" /> <span class='post-likes'></span> </div> </div> </div> <div id="related-topics" class="more-topics__list " role="complementary" aria-labelledby="related-topics-title"> <h3 id="related-topics-title" class="more-topics__list-title"> Related topics </h3> <div class="topic-list-container" itemscope itemtype='http://schema.org/ItemList'> <meta itemprop='itemListOrder' content='http://schema.org/ItemListOrderDescending'> <table class='topic-list'> <thead> <tr> <th>Topic</th> <th></th> <th class="replies">Replies</th> <th class="views">Views</th> <th>Activity</th> </tr> </thead> <tbody> <tr class="topic-list-item" id="topic-list-item-26269"> <td class="main-link" itemprop='itemListElement' itemscope itemtype='http://schema.org/ListItem'> <meta itemprop='position' content='1'> <span class="link-top-line"> <a itemprop='url' href='https://forums.developer.nvidia.com/t/matrix-multiplication-cant-achieve-peak-performanc/26269' class='title raw-link raw-topic-link'>matrix multiplication can't achieve peak performanc</a> </span> <div class="link-bottom-line"> <a href='/c/accelerated-computing/cuda/cuda-programming-and-performance/7' class='badge-wrapper bullet'> <span class='badge-category-bg' style='background-color: #76B900'></span> <span class='badge-category clear-badge'> <span class='category-name'>CUDA Programming and Performance</span> </span> </a> <div class="discourse-tags"> </div> </div> </td> <td class="replies"> <span class='posts' title='posts'>9</span> </td> <td class="views"> <span class='views' title='views'>2299</span> </td> <td> April 19, 2012 </td> </tr> <tr class="topic-list-item" id="topic-list-item-15454"> <td class="main-link" itemprop='itemListElement' itemscope itemtype='http://schema.org/ListItem'> 
<meta itemprop='position' content='2'> <span class="link-top-line"> <a itemprop='url' href='https://forums.developer.nvidia.com/t/matrix-multiplication-gflops-on-nvidia-quadro-fx-1700/15454' class='title raw-link raw-topic-link'>[Matrix Multiplication] GFlops on Nvidia Quadro FX 1700....</a> </span> <div class="link-bottom-line"> <a href='/c/accelerated-computing/cuda/cuda-programming-and-performance/7' class='badge-wrapper bullet'> <span class='badge-category-bg' style='background-color: #76B900'></span> <span class='badge-category clear-badge'> <span class='category-name'>CUDA Programming and Performance</span> </span> </a> <div class="discourse-tags"> </div> </div> </td> <td class="replies"> <span class='posts' title='posts'>5</span> </td> <td class="views"> <span class='views' title='views'>7759</span> </td> <td> April 16, 2010 </td> </tr> <tr class="topic-list-item" id="topic-list-item-13145"> <td class="main-link" itemprop='itemListElement' itemscope itemtype='http://schema.org/ListItem'> <meta itemprop='position' content='3'> <span class="link-top-line"> <a itemprop='url' href='https://forums.developer.nvidia.com/t/cublas-question-a-question-about-performance-of-cublas/13145' class='title raw-link raw-topic-link'>CUBLAS question a question about performance of CUBLAS</a> </span> <div class="link-bottom-line"> <a href='/c/accelerated-computing/cuda/cuda-programming-and-performance/7' class='badge-wrapper bullet'> <span class='badge-category-bg' style='background-color: #76B900'></span> <span class='badge-category clear-badge'> <span class='category-name'>CUDA Programming and Performance</span> </span> </a> <div class="discourse-tags"> </div> </div> </td> <td class="replies"> <span class='posts' title='posts'>4</span> </td> <td class="views"> <span class='views' title='views'>5972</span> </td> <td> November 11, 2009 </td> </tr> <tr class="topic-list-item" id="topic-list-item-28125"> <td class="main-link" itemprop='itemListElement' itemscope 
itemtype='http://schema.org/ListItem'> <meta itemprop='position' content='4'> <span class="link-top-line"> <a itemprop='url' href='https://forums.developer.nvidia.com/t/a-few-questions-related-to-cuda-and-cublas/28125' class='title raw-link raw-topic-link'>A few Questions related to CUDA and CUBLAS</a> </span> <div class="link-bottom-line"> <a href='/c/accelerated-computing/cuda/cuda-programming-and-performance/7' class='badge-wrapper bullet'> <span class='badge-category-bg' style='background-color: #76B900'></span> <span class='badge-category clear-badge'> <span class='category-name'>CUDA Programming and Performance</span> </span> </a> <div class="discourse-tags"> </div> </div> </td> <td class="replies"> <span class='posts' title='posts'>0</span> </td> <td class="views"> <span class='views' title='views'>905</span> </td> <td> February 1, 2013 </td> </tr> <tr class="topic-list-item" id="topic-list-item-16890"> <td class="main-link" itemprop='itemListElement' itemscope itemtype='http://schema.org/ListItem'> <meta itemprop='position' content='5'> <span class="link-top-line"> <a itemprop='url' href='https://forums.developer.nvidia.com/t/why-matrixmul-from-samples-so-slow/16890' class='title raw-link raw-topic-link'>why matrixMul from samples so slow?</a> </span> <div class="link-bottom-line"> <a href='/c/accelerated-computing/cuda/cuda-programming-and-performance/7' class='badge-wrapper bullet'> <span class='badge-category-bg' style='background-color: #76B900'></span> <span class='badge-category clear-badge'> <span class='category-name'>CUDA Programming and Performance</span> </span> </a> <div class="discourse-tags"> </div> </div> </td> <td class="replies"> <span class='posts' title='posts'>7</span> </td> <td class="views"> <span class='views' title='views'>5069</span> </td> <td> June 7, 2010 </td> </tr> <tr class="topic-list-item" id="topic-list-item-14902"> <td class="main-link" itemprop='itemListElement' itemscope itemtype='http://schema.org/ListItem'> <meta 
itemprop='position' content='6'> <span class="link-top-line"> <a itemprop='url' href='https://forums.developer.nvidia.com/t/performance-query-odd-results-profiling-gpu-speed-of-matrix-multiplication-using-cublas/14902' class='title raw-link raw-topic-link'>Performance query Odd results profiling GPU speed of matrix multiplication using cublas</a> </span> <div class="link-bottom-line"> <a href='/c/accelerated-computing/cuda/cuda-programming-and-performance/7' class='badge-wrapper bullet'> <span class='badge-category-bg' style='background-color: #76B900'></span> <span class='badge-category clear-badge'> <span class='category-name'>CUDA Programming and Performance</span> </span> </a> <div class="discourse-tags"> </div> </div> </td> <td class="replies"> <span class='posts' title='posts'>1</span> </td> <td class="views"> <span class='views' title='views'>1443</span> </td> <td> February 12, 2010 </td> </tr> <tr class="topic-list-item" id="topic-list-item-112863"> <td class="main-link" itemprop='itemListElement' itemscope itemtype='http://schema.org/ListItem'> <meta itemprop='position' content='7'> <span class="link-top-line"> <a itemprop='url' href='https://forums.developer.nvidia.com/t/unexpected-slow-performance/112863' class='title raw-link raw-topic-link'>unexpected slow performance</a> </span> <div class="link-bottom-line"> <a href='/c/accelerated-computing/cuda/cuda-programming-and-performance/7' class='badge-wrapper bullet'> <span class='badge-category-bg' style='background-color: #76B900'></span> <span class='badge-category clear-badge'> <span class='category-name'>CUDA Programming and Performance</span> </span> </a> <div class="discourse-tags"> </div> </div> </td> <td class="replies"> <span class='posts' title='posts'>0</span> </td> <td class="views"> <span class='views' title='views'>367</span> </td> <td> February 29, 2020 </td> </tr> <tr class="topic-list-item" id="topic-list-item-39637"> <td class="main-link" itemprop='itemListElement' itemscope 
itemtype='http://schema.org/ListItem'> <meta itemprop='position' content='8'> <span class="link-top-line"> <a itemprop='url' href='https://forums.developer.nvidia.com/t/help-me-cuda-program-execution-is-slower-than-cpu-did-i-miss-any-settings/39637' class='title raw-link raw-topic-link'>Help me... Cuda program execution is slower than CPU...Did I miss any settings??</a> </span> <div class="link-bottom-line"> <a href='/c/accelerated-computing/cuda/cuda-programming-and-performance/7' class='badge-wrapper bullet'> <span class='badge-category-bg' style='background-color: #76B900'></span> <span class='badge-category clear-badge'> <span class='category-name'>CUDA Programming and Performance</span> </span> </a> <div class="discourse-tags"> </div> </div> </td> <td class="replies"> <span class='posts' title='posts'>5</span> </td> <td class="views"> <span class='views' title='views'>1181</span> </td> <td> September 24, 2015 </td> </tr> <tr class="topic-list-item" id="topic-list-item-68068"> <td class="main-link" itemprop='itemListElement' itemscope itemtype='http://schema.org/ListItem'> <meta itemprop='position' content='9'> <span class="link-top-line"> <a itemprop='url' href='https://forums.developer.nvidia.com/t/how-to-measure-the-performance-of-a-gpu/68068' class='title raw-link raw-topic-link'>How to measure the performance of a GPU?</a> </span> <div class="link-bottom-line"> <a href='/c/accelerated-computing/cuda/cuda-programming-and-performance/7' class='badge-wrapper bullet'> <span class='badge-category-bg' style='background-color: #76B900'></span> <span class='badge-category clear-badge'> <span class='category-name'>CUDA Programming and Performance</span> </span> </a> <div class="discourse-tags"> </div> </div> </td> <td class="replies"> <span class='posts' title='posts'>2</span> </td> <td class="views"> <span class='views' title='views'>956</span> </td> <td> December 3, 2018 </td> </tr> <tr class="topic-list-item" id="topic-list-item-7597"> <td class="main-link" 
itemprop='itemListElement' itemscope itemtype='http://schema.org/ListItem'> <meta itemprop='position' content='10'> <span class="link-top-line"> <a itemprop='url' href='https://forums.developer.nvidia.com/t/confused-about-gpu-vs-cpu-speed-in-multiplication/7597' class='title raw-link raw-topic-link'>Confused about GPU vs CPU speed in multiplication</a> </span> <div class="link-bottom-line"> <a href='/c/accelerated-computing/cuda/cuda-programming-and-performance/7' class='badge-wrapper bullet'> <span class='badge-category-bg' style='background-color: #76B900'></span> <span class='badge-category clear-badge'> <span class='category-name'>CUDA Programming and Performance</span> </span> </a> <div class="discourse-tags"> </div> </div> </td> <td class="replies"> <span class='posts' title='posts'>8</span> </td> <td class="views"> <span class='views' title='views'>6526</span> </td> <td> February 19, 2009 </td> </tr> </tbody> </table> </div> </div> </div> <footer class="container wrap"> <nav class='crawler-nav'> <ul> <li itemscope itemtype='http://schema.org/SiteNavigationElement'> <span itemprop='name'> <a href='/' itemprop="url">Home </a> </span> </li> <li itemscope itemtype='http://schema.org/SiteNavigationElement'> <span itemprop='name'> <a href='/categories' itemprop="url">Categories </a> </span> </li> <li itemscope itemtype='http://schema.org/SiteNavigationElement'> <span itemprop='name'> <a href='/guidelines' itemprop="url">Guidelines </a> </span> </li> <li itemscope itemtype='http://schema.org/SiteNavigationElement'> <span itemprop='name'> <a href='https://www.nvidia.com/en-us/about-nvidia/legal-info/' itemprop="url">Terms of Service </a> </span> </li> <li itemscope itemtype='http://schema.org/SiteNavigationElement'> <span itemprop='name'> <a href='https://www.nvidia.com/en-us/about-nvidia/privacy-policy/' itemprop="url">Privacy Policy </a> </span> </li> </ul> </nav> <p class='powered-by-link'>Powered by <a href="https://www.discourse.org">Discourse</a>, best viewed 
with JavaScript enabled</p> </footer> <script defer="" src="https://sea2.discourse-cdn.com/nvidia/theme-javascripts/d8ea190aad487b8feae316757854a563d15c4c27.js?__ws=forums.developer.nvidia.com" data-theme-id="13" nonce="LUEsfrf5JFCuF4bqReHmwXGJM"></script> <footer> <div class="footer-links"> <div class="container"> <div class="row"> <div class="col-xs-12 col-sm-12 col-md-3 col-lg-3"> <div class="col-xs-12 col-sm-12 col-md-12 col-lg-12"> <div class="padding-md-footer"> <div class="logo-footer"></div> </div> </div> <div class="col-xs-12 col-sm-12 col-md-9 col-lg-9 padding-section-footer"></div> </div> <div class="col-xs-12 col-sm-12 col-md-9 col-lg-9"></div> </div> </div> </div> <div class="footer-boilerplate"> <div class="container"> <div class="boilerplate"> <div class="col-xs-12 col-sm-12 col-lg-9 padding-sm-bottom"> Copyright © 2024 NVIDIA Corporation <ul class="legal_links"> <li class="first leaf"> <a href="https://www.nvidia.com/en-us/about-nvidia/legal-info/" title="">Legal Information</a> </li> <li class="leaf"> <a href="https://developer.nvidia.com/legal/terms" title="">Terms of Use</a> </li> <li class="leaf"> <a href="https://www.nvidia.com/en-us/about-nvidia/privacy-policy/" title="">Privacy Policy</a> </li> <li class="leaf"> <a href="https://developer.nvidia.com/contact" title="">Contact</a> </li> <li class="last leaf"> <a href="https://www.nvidia.com/en-us/about-nvidia/cookie-policy/" title="NVIDIA websites use cookies to deliver and improve the website experience. See our cookie policy for further details on how we use cookies and how to change your cookie settings.">Cookie Policy</a> </li> </ul> </div> </div> </div> </div> </footer> </body> </html>